Cem

cem@lists.gking.harvard.edu

152 discussions

Re: [cem] Few matched strata (and individuals within strata) after implementing cem

by Gary King

Sergio, the idea you describe in your first paragraph has been formalized with this algorithm <http://projects.iq.harvard.edu/frontier/home>, so that is another option. With CEM, you would decide how important it is to get matches for each variable and coarsen more for the less important variables.

Gary
---
http://gking.harvard.edu
617-500-7570

On Wed, Apr 5, 2017 at 12:32 PM, Sergio Salis <Sergio.Salis(a)natcen.ac.uk> wrote:

> Hi Gary,
>
> Thanks very much for your advice. I understand the idea is to try different coarsening strategies (among those which make sense) for each variable and see which one produces the lowest imbalance, measured by means of the multivariate L1 distance (the univariate imbalances should also be looked at individually). Is this correct?
>
> For variables like income and assets I guess it makes sense to use percentiles, as there is no obvious value at which to create cut-off points. If so, shall I use
>
>     cem var1 var2 …. income(P1 P2 …. Pn), treatment(treated)
>
> (where P1=value of the 1st percentile, P2=value of the 2nd percentile ….. Pn=value of the last percentile)?
>
> (#10) will produce 10 equally sized bins, but I am not sure whether equal size means equal base (e.g. bin 1 includes those with income between 1 and 10, bin 2 those with income 11 to 20, etc.) or equal frequencies (in which case a bin defines a percentile). I am also not sure what Sturges' rule and Scott's algorithm are; I cannot find any description in the Stata help file.
>
> Thanks again for your help, very much appreciated.
>
> Sergio
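The equal-base versus equal-frequency distinction raised in the quoted message can be illustrated with a small sketch. It is not taken from the cem help file and does not settle which behaviour `(#10)` has; the income values and the helper names (`equal_width_cutpoints`, `equal_frequency_cutpoints`, `bin_counts`, `sturges_bins`) are invented for illustration. `sturges_bins` uses the textbook form of Sturges' rule, which is an assumption about the general rule and not a statement about how cem implements it.

```python
import math
import statistics

# Hypothetical right-skewed income values, invented for illustration only.
income = [5, 8, 9, 10, 12, 15, 18, 25, 40, 100]

def equal_width_cutpoints(values, n_bins):
    """Interior cutpoints that split the range into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + step * i for i in range(1, n_bins)]

def equal_frequency_cutpoints(values, n_bins):
    """Interior cutpoints at quantiles, so each bin holds roughly the same count."""
    return statistics.quantiles(values, n=n_bins, method="inclusive")

def bin_counts(values, cutpoints):
    """Count values per bin; a value equal to a cutpoint goes to the upper bin."""
    counts = [0] * (len(cutpoints) + 1)
    for v in values:
        counts[sum(v >= c for c in cutpoints)] += 1
    return counts

def sturges_bins(n_obs):
    """Textbook Sturges' rule: number of bins = ceil(log2(n) + 1)."""
    return math.ceil(math.log2(n_obs) + 1)

width_counts = bin_counts(income, equal_width_cutpoints(income, 4))
freq_counts = bin_counts(income, equal_frequency_cutpoints(income, 4))
print("equal-width bin counts:    ", width_counts)
print("equal-frequency bin counts:", freq_counts)
print("Sturges' rule for n=10:    ", sturges_bins(len(income)))
```

With skewed data such as income, equal-width bins put most units in the first bin, while quantile cutpoints spread them evenly. That is the practical reason the choice between the two matters when cutpoints are passed explicitly.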

7 years, 1 month

Re: [cem] Few matched strata (and individuals within strata) after implementing cem

by Gary King

Hi Sergio, you can adjust the coarsening rather than using the defaults in CEM. Coarser bins will generate more matched observations. You want to make the choices based on the substance of the variables and on which ones are more important to match finely on.

Gary
--
*Gary King* - Albert J. Weatherhead III University Professor - Director, IQSS <http://iq.harvard.edu/> - Harvard University
GaryKing.org - King(a)Harvard.edu - @KingGary <https://twitter.com/kinggary> - 617-500-7570 - Assistant <king-assist(a)iq.harvard.edu>: 617-495-9271

On Wed, Apr 5, 2017 at 10:04 AM, Sergio Salis <Sergio.Salis(a)natcen.ac.uk> wrote:

> The first thing I'd be interested in understanding is: How does one handle situations where after running cem the number of matched strata (and units within them) is very small?
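The point that coarser bins yield more matched units can be seen in a toy sketch. The records, the single covariate, and the helper `matched_units` are all hypothetical and are not part of the cem package; the sketch only counts units that fall in strata containing at least one treated and one control unit.

```python
from collections import defaultdict

# Hypothetical (age, treated) records, invented for illustration only.
units = [(21, 1), (24, 0), (33, 1), (38, 0), (47, 1), (52, 0), (58, 1), (59, 0)]

def matched_units(records, bin_width):
    """Number of units in strata that contain both a treated and a control unit.

    Ages are coarsened into bins of bin_width years starting at 0.
    """
    strata = defaultdict(lambda: {0: 0, 1: 0})
    for age, treated in records:
        strata[age // bin_width][treated] += 1
    return sum(c[0] + c[1] for c in strata.values() if c[0] > 0 and c[1] > 0)

fine = matched_units(units, 1)     # exact matching on age
medium = matched_units(units, 10)  # 10-year bins
coarse = matched_units(units, 20)  # 20-year bins
print(fine, medium, coarse)
```

The trade-off is the one described in the reply: wider bins keep more units, but units in the same stratum are then less alike on that variable, so the coarsening should be loosest on the variables that matter least.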

7 years, 1 month

Few matched strata (and individuals within strata) after implementing cem

by Sergio Salis

Hi all,

I'm considering using the cem Stata programme to evaluate the impact of a welfare-to-work programme in the UK. However, I have never used cem before, so I am trying to understand some basic issues before proceeding with the estimation.

The first thing I'd be interested in understanding is: How does one handle situations where after running cem the number of matched strata (and units within them) is very small?

Applying the cem algorithm to data from a previous impact evaluation I get:

    Number of strata: 8883
    Number of matched strata: 132

                   0      1
    All         8208   1584
    Matched      179    141
    Unmatched   8029   1443

If I calculate the ATT using cem matched data I get an impact estimate which is positive (around 5ppts; based on 320 obs only), while using psmatch2 on all data (i.e. not only those in cem matched strata; around 8,237 obs are used) with kernel weights I get an estimate of around -5.7ppts. This means I reach opposite conclusions about the impact of the programme of interest using cem and psmatch2.

I understand the cem-based estimates are based on better matched data (i.e. produce less biased estimates) compared to my psmatch2 estimate with kernel weights, but this comes at the expense of external validity: inference on the initial population is made based on a very small subset of data (estimates based on cem are not statistically significant while my original estimate was highly significant). Any advice about how one can handle situations of this type?

Many thanks,
Sergio

NatCen Social Research
35 Northampton Square
London EC1V 0AX
020 7250 1866

Visit our website. www.natcen.ac.uk
Read our latest blog. natcenblog.blogspot.com
Follow us. @NatCen
Email us. info(a)natcen.ac.uk

NatCen Social Research is certificated to ISO/IEC 27001:2013 for Information Security Management Systems and to ISO 20252:2012, the international standard for market, opinion and social research.

Company limited by guarantee. Registered in England No. 4392418. Charity registered in England and Wales (1091768) and in Scotland (SC038454).

Confidentiality: The information in this email and any attachments are confidential and may include some that is legally privileged. It must not be disclosed to or used by persons other than the intended recipient. If received in error, please notify us immediately and then delete this document.
Content: Any views or opinions expressed do not necessarily represent those of NatCen Social Research. Please note the content of this e-mail may be intercepted, monitored or recorded for compliance purposes. Sensitive personal data should not normally be transmitted by e-mail.
Copyright: Copyright in this e-mail and any attachments created by NatCen Social Research belong to NatCen Social Research unless otherwise stated.
Care: NatCen Social Research shall not be liable to the recipient or any third party for any loss or damage howsoever arising from this e-mail and/or its content, including loss or damage caused by virus. It is the responsibility of the recipient to ensure the opening or use of this message and any attachments shall not adversely affect systems or data.
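As a quick check on how small the matched subset is, the arithmetic behind the reported table can be written out. The sketch assumes that the columns labelled 0 and 1 are the control and treated groups respectively, which the message implies but does not state.

```python
# Counts reported in the cem output above, as (controls, treated).
# Assumption: column 0 is the control group and column 1 the treated group.
all_units = (8208, 1584)
matched = (179, 141)

matched_total = sum(matched)
all_total = sum(all_units)
share_controls = matched[0] / all_units[0]
share_treated = matched[1] / all_units[1]
share_overall = matched_total / all_total

print(f"matched {matched_total} of {all_total} units ({share_overall:.1%})")
print(f"controls retained: {share_controls:.1%}, treated retained: {share_treated:.1%}")
```

Under that assumption, about 9% of treated units and about 2% of controls are retained, which matches the 320 observations mentioned in the message.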

7 years, 1 month

Exact matching

by ASHKAN MOAZZEZ

Hi everyone,

I have two questions.

Question 1. I created this sample dataset (test):

      code Age open outcome
    1 A     12    0       1
    2 B     15    0       0
    3 C     18    0       1
    4 D     12    1       0
    5 E     18    1       1
    6 F     20    1       0

When I run this command:

    todrop <- c("outcome", "code")
    cem2 <- cem(treatment = "open", data = test, drop = todrop, k2k = TRUE)

I get this data back:

      code    Age  open outcome
      <chr> <dbl> <dbl>   <dbl>
    1 A        12     0       1
    2 C        18     0       1
    3 D        12     1       0
    4 F        20     1       0

When I use matchit:

    match <- matchit(open ~ Age, test, method = "exact")

I get this result:

      code Age open outcome weights subclass
    1 A     12    0       1       1        1
    3 C     18    0       1       1        2
    4 D     12    1       0       1        1
    5 E     18    1       1       1        2

So my question is: why does CEM not choose record "E" with age 18, and instead choose the one with age 20? Is the exact method in matchit more accurate than CEM in this case?

Question 2. I have a database with 140k records and 440 variables, which I want to match on only 20 variables. If I want to use CEM, is there an easy way to include those 20 variables without dropping the other 420?

Thanks a lot in advance.
-Ashkan
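One possible reading of Question 1, offered as an assumption and not as a confirmed diagnosis, is that coarsening placed ages 18 and 20 in the same stratum, so that one-to-one pruning then had to pick between E and F. The sketch below reproduces the toy data and compares exact strata with coarsened strata. The cutpoints 14 and 17 and the helpers `matched_strata` and `coarsen` are hypothetical; the thread does not show which cutpoints cem actually used.

```python
from collections import defaultdict

# Toy data from the message: (code, Age, open), where open is the treatment flag.
test = [("A", 12, 0), ("B", 15, 0), ("C", 18, 0),
        ("D", 12, 1), ("E", 18, 1), ("F", 20, 1)]

def matched_strata(records, key):
    """Group records by key(Age); keep strata with both treated and control units."""
    strata = defaultdict(list)
    for code, age, treated in records:
        strata[key(age)].append((code, treated))
    return {
        k: sorted(code for code, _ in members)
        for k, members in strata.items()
        if {t for _, t in members} == {0, 1}
    }

def coarsen(age, cutpoints=(14, 17)):
    """Bin index under HYPOTHETICAL cutpoints: <14, 14 to <17, and >=17."""
    return sum(age >= c for c in cutpoints)

# Exact matching: every distinct Age is its own stratum.
exact = matched_strata(test, key=lambda age: age)
# Coarsened matching: ages sharing a bin share a stratum.
coarsened = matched_strata(test, key=coarsen)

print("exact:    ", exact)
print("coarsened:", coarsened)
```

Under these assumed cutpoints, the exact strata are {A, D} and {C, E}, while the coarsened strata are {A, D} and {C, E, F}. Reducing the second stratum to one treated and one control unit means dropping either E or F, and which one is dropped depends on how the pruning within the stratum is carried out.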

7 years, 3 months

SPSS 24

by ASHKAN MOAZZEZ

Hi,

I am trying to install CEM for SPSS 24. The most recent version is for V23. I receive the error message "SPSS not found. Aborting installation". Am I doing something wrong, or are the versions incompatible?

Best,
-Ashkan

7 years, 3 months

cem by stratum

by Zack Mabel

Hello, I am looking into using CEM in a context where I need to match subsets of the treatment group separately based on the timing of treatment (unfortunately, the simple solution of using time as a matching component won't work in my case). After matching, I would like to combine the matched subsets to estimate a pooled treatment effect. Do the cem_weights need to be revised to account for pooling? If so, is there a reference that describes how the cem_weights are created? I understand their basic purpose - to account for differential strata sizes - but I'm not clear on the actual formula that is used to generate the weights for matched controls. Thank you, Zack
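For reference, the weighting scheme described in the cem documentation gives matched treated units a weight of 1, unmatched units a weight of 0, and a matched control in stratum s the weight (m_C / m_T) * (m_T^s / m_C^s), where m_C and m_T are the total numbers of matched controls and treated units and m_C^s and m_T^s are the counts within the stratum. The sketch below is an independent illustration of that formula and not the package's own code; the function name `cem_style_weights` and the example strata are hypothetical, and it does not address whether weights should be recomputed after pooling separately matched subsets.

```python
from collections import Counter

def cem_style_weights(units):
    """Return one weight per unit for a list of (stratum, treated) pairs.

    Treated units in matched strata get 1, units in unmatched strata get 0,
    and matched controls get (m_C / m_T) * (m_T^s / m_C^s).
    """
    treated_in = Counter(s for s, t in units if t == 1)
    control_in = Counter(s for s, t in units if t == 0)
    matched = {s for s in treated_in if control_in[s] > 0}
    m_t = sum(treated_in[s] for s in matched)
    m_c = sum(control_in[s] for s in matched)
    weights = []
    for s, t in units:
        if s not in matched:
            weights.append(0.0)
        elif t == 1:
            weights.append(1.0)
        else:
            weights.append((m_c / m_t) * (treated_in[s] / control_in[s]))
    return weights

# Hypothetical strata: "a" has 1 treated and 2 controls, "b" has 2 treated
# and 1 control, "c" has a single unmatched control.
example = [("a", 1), ("a", 0), ("a", 0), ("b", 1), ("b", 1), ("b", 0), ("c", 0)]
weights = cem_style_weights(example)
print(weights)
```

In this example the control weights sum to the number of matched controls, which is the normalisation the formula is built to give.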

7 years, 3 months

Re: [cem] 2-level CEM?

by Gary King

Hi Catherine, thanks for your note to the list. It sounds like you could define this as a single level but with multiple (rather than binary) treatment regimes. Our papers on cem explain how that works. Best of luck with your research.

Gary King
---
GaryKing.org
617-500-7570

On Dec 2, 2016 4:49 PM, "Catherine E Hendrick" <emily.hendrick(a)utexas.edu> wrote:

> I haven't seen any literature using CEM that conducts two levels of matching as I am proposing. Do you know of literature where researchers have done this and/or do you see any methodological reason for NOT conducting two levels of matching, as I have proposed, when using CEM?

7 years, 4 months

Re: [cem] 2-level CEM? Cem Digest, Vol 83, Issue 1

by cesare riillo

Dear Catherine E Hendrick,

I think your problem can easily be reduced to the case of multiple treatment doses (e.g., doses of a drug). Among others, see section 6.1.4 of the Stuart (2010) paper:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2943670/
Stat Sci. 2010 Feb 1; 25(1): 1–21. doi: 10.1214/09-STS313

I used CEM to compare, pairwise, the outcomes of three groups of firms with different treatments (degree of environmental commitment):

http://www.sciencedirect.com/science/article/pii/S0959652616313804

However, I have not found a formal or theoretical discussion of this approach, and I would be very interested in proper guidance on this point.

Cesare

Message: 1
Date: Fri, 2 Dec 2016 15:49:08 -0600
From: Catherine E Hendrick <emily.hendrick(a)utexas.edu>
To: cem(a)lists.gking.harvard.edu
Subject: [cem] 2-level CEM?

> I haven't seen any literature using CEM that conducts two levels of matching as I am proposing. Do you know of literature where researchers have done this and/or do you see any methodological reason for NOT conducting two levels of matching, as I have proposed, when using CEM?

7 years, 5 months

2-level CEM?

by Catherine E Hendrick

I am considering using Coarsened Exact Matching in order to study the effects of teen mothers' degree type (GED v. HS diploma) on long-term outcomes. In order to sufficiently address my research questions, it seems that I may need two levels of matching:

1) matching those who attained a GED with those who attained a HS diploma (to isolate the effects of degree type on outcomes)

2) matching teen mothers with women who began childbearing after the teenage years (to determine if diploma type effects on later outcomes are the same or different for women who began childbearing during the teenage years v. later)

I haven't seen any literature using CEM that conducts two levels of matching as I am proposing. Do you know of literature where researchers have done this and/or do you see any methodological reason for NOT conducting two levels of matching, as I have proposed, when using CEM?

Many thanks for any information you're able to pass along.

7 years, 5 months
