Re: [cem] Few matched strata (and individuals within strata) after implementing cem

6 Apr 2017

Hi Gary,

Thanks for letting me know about the MatchingFrontier package, I’ll definitely explore
it.

Sergio

From: Gary King [mailto:thegaryking@gmail.com]
Sent: 05 April 2017 19:00
To: Sergio Salis
Cc: cem(a)lists.gking.harvard.edu
Subject: Re: [cem] Few matched strata (and individuals within strata) after implementing
cem

Sergio, the idea you describe in your first paragraph has been formalized
with this algorithm <http://projects.iq.harvard.edu/frontier/home>, and so
that's another option.  With CEM, you would decide how important it is to
get matches for each variable and coarsen more for less important
variables.

Gary
---
http://gking.harvard.edu
617-500-7570

On Wed, Apr 5, 2017 at 12:32 PM, Sergio Salis
<Sergio.Salis@natcen.ac.uk> wrote:
Hi Gary,

Thanks very much for your advice. I understand the idea is to try different coarsening
strategies (among those which make sense) for each variable and see which one produces the
lowest imbalance, as measured by the multivariate L1 distance (the univariate
imbalances should also be looked at individually). Is this correct?
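
For reference, the multivariate L1 statistic compares the relative frequencies of treated and control units across the cells formed by cross-tabulating the coarsened covariates: L1 = 1/2 * sum over cells of |f_cell - g_cell|, where f and g are the treated and control shares. It is 0 for identical distributions and 1 for completely separated ones. Below is a minimal Python sketch of that formula, not the cem package's own code; the function name and the toy data are made up for illustration.

```python
from collections import Counter

def l1_imbalance(cells, treated):
    """Multivariate L1 distance between treated and control cell frequencies.

    cells   -- one hashable cell label per unit (e.g. a tuple of bin indices)
    treated -- 1 for treated units, 0 for controls
    """
    t_counts = Counter(c for c, t in zip(cells, treated) if t == 1)
    c_counts = Counter(c for c, t in zip(cells, treated) if t == 0)
    n_t = sum(t_counts.values())
    n_c = sum(c_counts.values())
    if n_t == 0 or n_c == 0:
        raise ValueError("need at least one treated and one control unit")
    all_cells = set(t_counts) | set(c_counts)
    # Half the summed absolute difference in relative frequencies.
    return 0.5 * sum(abs(t_counts[c] / n_t - c_counts[c] / n_c) for c in all_cells)

# Hypothetical toy data: each cell is an (age bin, income bin) pair.
cells = [(0, 0), (0, 0), (0, 1), (1, 1), (0, 0), (1, 1), (1, 0), (1, 0)]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
print(l1_imbalance(cells, treated))
```

Comparing this number across candidate coarsenings only makes sense if the cells used to compute it are held fixed, since the statistic depends on the binning.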

For variables like income and assets I guess it makes sense to use percentiles as there is
no obvious value to create cut-off points. If so, shall I use

cem var1 var2 …. income(P1 P2 …. Pn), treatment(treated)

(where P1=value of the 1st percentile, P2=value of the 2nd percentile ….. Pn=value of the
last percentile)?
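
As an illustration of the percentile idea, the sketch below computes interior quantile cutpoints for a variable and formats them as a cutpoint list of the kind written in the command above. It is a Python sketch under stated assumptions: the income values are hypothetical, `percentile_cutpoints` is a made-up helper name, and the "inclusive" interpolation method is only one of several quantile conventions.

```python
import statistics

def percentile_cutpoints(values, n_bins):
    """Return the n_bins - 1 interior quantile cutpoints of `values`."""
    return statistics.quantiles(values, n=n_bins, method="inclusive")

# Hypothetical incomes; real data would come from the evaluation dataset.
income = [12, 15, 18, 21, 25, 30, 36, 44, 60, 95, 150]

cuts = percentile_cutpoints(income, 4)  # quartile boundaries
# Format the cutpoints as a space-separated list inside parentheses.
print("income (" + " ".join(f"{c:g}" for c in cuts) + ")")
```

Using fewer, wider quantile bins (quartiles or deciles rather than all 99 percentiles) keeps the number of strata manageable.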

(#10) will produce 10 equally sized bins, but I am not sure whether equal size means equal
base (e.g. bin 1 includes those with income from 1 to 10, bin 2 those with income from 11 to
20, etc.) or equal frequencies (in which case a bin defines a percentile). I am also not
sure what Sturges' rule and Scott's algorithm are; I cannot find any description in
the Stata help file.
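
To make the two readings of "equally sized" concrete, and to state the textbook forms of the two rules, here is a small Python sketch. It does not settle which convention cem's (#10) follows, and cem's automatic binning may differ in detail; the helper names and the data are hypothetical. Sturges' rule sets the number of bins to ceil(log2(n)) + 1, and Scott's rule sets the bin width to 3.49 * sd * n^(-1/3).

```python
import math
import statistics

def equal_width_edges(values, k):
    """k bins of equal width: k + 1 evenly spaced edges from min to max."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + i * step for i in range(k + 1)]

def equal_frequency_edges(values, k):
    """k bins holding roughly equal counts: interior edges at the quantiles."""
    inner = statistics.quantiles(values, n=k, method="inclusive")
    return [min(values)] + inner + [max(values)]

def sturges_bins(n):
    """Sturges' rule: number of bins for a sample of size n."""
    return math.ceil(math.log2(n)) + 1

def scott_width(values):
    """Scott's rule: bin width from the sample standard deviation."""
    return 3.49 * statistics.stdev(values) * len(values) ** (-1 / 3)

# Hypothetical skewed variable: one large value stretches the range.
income = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(equal_width_edges(income, 2))      # midpoint of the range
print(equal_frequency_edges(income, 2))  # median of the data
print(sturges_bins(len(income)), scott_width(income))
```

On skewed variables such as income the two binnings differ sharply: equal-width bins put almost every unit in the first bin, while equal-frequency bins split the sample evenly.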

Thanks again for your help, very much appreciated.

Sergio

From: Gary King [mailto:king@harvard.edu]
Sent: 05 April 2017 16:06
To: Sergio Salis
Cc: cem@lists.gking.harvard.edu
Subject: Re: [cem] Few matched strata (and individuals within strata) after implementing
cem

Hi Sergio, you can adjust the coarsening rather than using the defaults in CEM.  Coarser
bins will generate more matched observations.  You want to make the choices based on the
substance of the variables and on which ones are more important to match finely on.
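
The effect of coarsening on the number of matches can be seen in a toy example. The Python sketch below coarsens a single covariate into fixed-width bins, treats each bin as a stratum, and counts the units in strata that contain both treated and control units. It is an illustration of the idea only, not the cem implementation; the function name, the ages, and the bin widths are made up.

```python
from collections import defaultdict

def cem_match_counts(x, treated, width):
    """Coarsen x into bins of the given width, then count units in strata
    that contain at least one treated and one control unit."""
    strata = defaultdict(lambda: [0, 0])  # bin -> [n_control, n_treated]
    for xi, ti in zip(x, treated):
        strata[int(xi // width)][ti] += 1
    matched = [s for s in strata.values() if s[0] > 0 and s[1] > 0]
    return {
        "strata": len(strata),
        "matched_strata": len(matched),
        "matched_controls": sum(s[0] for s in matched),
        "matched_treated": sum(s[1] for s in matched),
    }

# Hypothetical ages; 1 = treated, 0 = control.
age = [21, 24, 33, 38, 45, 47, 52, 58]
treated = [1, 0, 1, 0, 0, 1, 0, 1]

fine = cem_match_counts(age, treated, width=5)     # 5-year bins
coarse = cem_match_counts(age, treated, width=10)  # 10-year bins
print(fine)
print(coarse)
```

With several covariates the strata are the cross-product of all bins, so the number of strata grows quickly and many become unmatched unless the less important variables are coarsened more.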

Gary
--
Gary King - Albert J. Weatherhead III University Professor - Director,
IQSS <http://iq.harvard.edu/> - Harvard University
GaryKing.org <http://GaryKing.org> - King@Harvard.edu
- @KingGary <https://twitter.com/kinggary> - 617-500-7570
- Assistant <king-assist@iq.harvard.edu>: 617-495-9271

On Wed, Apr 5, 2017 at 10:04 AM, Sergio Salis
<Sergio.Salis@natcen.ac.uk> wrote:

Hi all,

I’m considering using the cem Stata programme to evaluate the impact of a welfare-to-work
programme in the UK. However, I have never used cem before so I am trying to understand
some basic issues before proceeding with the estimation.

The first thing I’d be interested in understanding is: How does one handle situations
where, after running cem, the number of matched strata (and of units within them) is very
small?

Applying the cem algorithm to data from a previous impact evaluation I get:

Number of strata: 8883
Number of matched strata: 132

               0      1
      All   8208   1584
  Matched    179    141
Unmatched   8029   1443

If I calculate the ATT using cem matched data I get an impact estimate which is positive
(around 5ppts; based on 320 obs only) while using psmatch2 on all data (i.e. not only
those in cem matched strata; around 8,237 obs are used) with kernel weights I get an
estimate of around -5.7ppts. This means I reach opposite conclusions about the impact of
the programme of interest using cem and psmatch2.
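
For reference, an ATT on CEM-matched data is usually computed as a weighted difference in means: matched treated units get weight 1, and matched controls in stratum s get weight (m_C / m_T) * (m_T^s / m_C^s), where m_C and m_T are the total numbers of matched controls and treated and m_C^s, m_T^s are the counts within the stratum. The Python sketch below applies that weighting to made-up data; the function name and all values are hypothetical, and it is not the cem package's code.

```python
from collections import defaultdict

def cem_att(strata, treated, y):
    """ATT as a weighted difference in means over matched strata."""
    groups = defaultdict(lambda: ([], []))  # stratum -> (control ys, treated ys)
    for s, t, yi in zip(strata, treated, y):
        groups[s][t].append(yi)
    # Keep only strata with at least one control and one treated unit.
    matched = [g for g in groups.values() if g[0] and g[1]]
    if not matched:
        raise ValueError("no stratum contains both treated and control units")
    m_c = sum(len(g[0]) for g in matched)
    m_t = sum(len(g[1]) for g in matched)
    treated_mean = sum(sum(g[1]) for g in matched) / m_t
    w_sum = wy_sum = 0.0
    for controls, treateds in matched:
        w = (m_c / m_t) * (len(treateds) / len(controls))  # control weight
        w_sum += w * len(controls)
        wy_sum += w * sum(controls)
    return treated_mean - wy_sum / w_sum

# Hypothetical data: stratum "C" has no treated unit, so it is unmatched.
strata = ["A", "A", "A", "B", "B", "B", "C"]
treated = [1, 1, 0, 1, 0, 0, 0]
y = [10, 12, 8, 20, 15, 17, 100]
print(cem_att(strata, treated, y))
```

The result equals the average of the within-stratum treated-minus-control differences, weighted by each stratum's share of matched treated units, so the estimate describes only the treated units that found a match.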

I understand the cem-based estimates are based on better matched data (i.e. produce less
biased estimates) than my psmatch2 estimate with kernel weights, but this comes at
the expense of external validity: inference on the initial population is made based on a
very small subset of data (estimates based on cem are not statistically significant while
my original estimate was highly significant). Any advice about how one can handle
situations of this type?

Many thanks,
Sergio

NatCen Social Research
35 Northampton Square
London EC1V 0AX
020 7250 1866

