Without seeing the characteristics of your data it is impossible to
determine if this issue is data-related or program-related. However, I would
argue that it is certainly possible that casting the net out wider would
result in fewer matches if in fact those additional "non-treated" units are
substantially different from the treated units (and for that matter,
different from the non-treated units in the more local geography).
You could easily check this hypothesis by comparing balance stats on the
non-treated "near" vs non-treated "far" vs treated. If you see that the
non-treated "far" units are substantially different from the non-treated
"near" or the treated units, that would support this explanation.
Again, this is only based on the assumption that this is an issue with the
characteristics of the groups, and not something else that's driving the
discrepancy.
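As a rough illustration of the balance check described above, here is a minimal sketch in Python (standard library only). Everything in it is an assumption made for illustration: the group labels ("treated", "near", "far"), the helper names balance_table and std_mean_diff, and the records themselves, which are made up and not taken from the data in this thread. It reports a standardized mean difference per covariate, which is one common balance statistic but not the only one.

```python
from statistics import mean, stdev

def balance_table(rows, covariates, group_key="group"):
    """Return {group: {covariate: (mean, sd, n)}} for each group label."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row)
    table = {}
    for label, members in groups.items():
        table[label] = {}
        for cov in covariates:
            values = [m[cov] for m in members]
            sd = stdev(values) if len(values) > 1 else 0.0
            table[label][cov] = (mean(values), sd, len(values))
    return table

def std_mean_diff(table, cov, group_a, group_b):
    """Standardized mean difference: (mean_a - mean_b) / pooled SD."""
    mean_a, sd_a, _ = table[group_a][cov]
    mean_b, sd_b, _ = table[group_b][cov]
    pooled = ((sd_a ** 2 + sd_b ** 2) / 2) ** 0.5
    return (mean_a - mean_b) / pooled if pooled > 0 else 0.0

# Made-up illustrative records; NOT from the data discussed in this thread.
homes = [
    {"group": "treated", "sfla": 1800, "age": 12},
    {"group": "treated", "sfla": 2200, "age": 8},
    {"group": "near", "sfla": 1900, "age": 15},
    {"group": "near", "sfla": 2100, "age": 9},
    {"group": "far", "sfla": 3400, "age": 40},
    {"group": "far", "sfla": 3800, "age": 55},
]

table = balance_table(homes, ["sfla", "age"])
for other in ("near", "far"):
    for cov in ("sfla", "age"):
        smd = std_mean_diff(table, cov, "treated", other)
        print(f"treated vs {other}, {cov}: SMD = {smd:.2f}")
```

Large standardized differences for "far" alongside small ones for "near" would be consistent with the hypothesis; similar values across both groups would point toward something other than group characteristics.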
I hope this helps.
Ariel
From: cem-bounces(a)lists.gking.harvard.edu
[mailto:cem-bounces@lists.gking.harvard.edu] On Behalf Of Ben Hoen
Sent: Monday, July 07, 2014 9:37 PM
To: cem(a)lists.gking.harvard.edu
Subject: [cem] Understaning CEM's use of a categorical variable and #0
Hi all,
I have been using the program cem in Stata (Version 13 MP, with Windows 7
Pro 64 bit), and thought I understood what it was doing well enough but
today something occurred which surprised (read: worried) me, in that it acted
as I would NOT have expected it to.
I am trying to match target (i.e., treated) homes to similar (i.e.,
"comparable") homes that do not have the treatment. In this case, the
"treatment" is whether the home does or does not have a photovoltaic energy
system (pv). I have 100 pv homes (treated), and ~ 5,000 non-pv homes
(comparable).
To match these homes I am using some basic characteristics of the home -
e.g., square feet of living space (sfla), size of the parcel (acres), age of
the home (age), as well as the year in which it sold (sale year) to ensure
the comparable home sold in the same year as the target home and, finally, a
geographic variable (such as the block group) to ensure the comparable home
is located in the same geography. For sale year and the geography, they must
match perfectly; i.e., the comparable homes must have sold in the same year
as the target (pv) home and also be located in the same geography. For the
purposes of this discussion those geographies could be either the census
block group (blockgroup) or the county (county). All of the block groups
fall within the counties, and there are many more block groups than counties
delineated in the data. For example, I have approximately 30 block groups
(each with at least one treated and one comparable case) and 10 counties
(each with at least one treated and one comparable). In practice, though, in
most geographies the number of comparables available to match to is ~ 20-50
times the number of pv homes.
Using the sample data and talking to local experts, I have established
appropriate cut points for my various characteristics and run a command
similar to the following, when blockgroup is used as the geography:
cem sfla(0 1000 2000 3000 5000) age(0 1 10 20 100) acres(0.05 0.15 0.5 1 10) saleyear(#0) blockgroup(#0), treatment(pv)
And the following, when county is used as the geography:
cem sfla(0 1000 2000 3000 5000) age(0 1 10 20 100) acres(0.05 0.15 0.5 1 10) saleyear(#0) county(#0), treatment(pv)
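For readers less familiar with the matching step, the following is a minimal sketch in Python (standard library only) of the general idea behind coarsened exact matching as described above: bin the continuous variables at the stated cut points, require the remaining variables to match exactly, and keep only strata that contain both treated and untreated homes. It is not the cem implementation. The helper names stratum and match, the left-closed bin convention, and the four sample homes are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

# Cut points copied from the commands above; exact-match variables get none.
CUTPOINTS = {
    "sfla": [0, 1000, 2000, 3000, 5000],
    "age": [0, 1, 10, 20, 100],
    "acres": [0.05, 0.15, 0.5, 1, 10],
}
EXACT = ["saleyear", "blockgroup"]

def stratum(home):
    """Stratum key: one bin index per coarsened variable plus the raw exact values."""
    # Assumption: bins are left-closed [lo, hi); cem's own boundary rule may differ.
    bins = tuple(bisect_right(CUTPOINTS[v], home[v]) for v in CUTPOINTS)
    exact = tuple(home[v] for v in EXACT)
    return bins + exact

def match(homes, treatment="pv"):
    """Keep homes whose stratum holds at least one treated and one untreated home."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_untreated, n_treated]
    for h in homes:
        counts[stratum(h)][h[treatment]] += 1
    return [h for h in homes if min(counts[stratum(h)]) > 0]

# Made-up homes for illustration only; pv is coded 0/1.
homes = [
    {"pv": 1, "sfla": 1500, "age": 5, "acres": 0.2, "saleyear": 2012, "blockgroup": "A"},
    {"pv": 0, "sfla": 1900, "age": 8, "acres": 0.3, "saleyear": 2012, "blockgroup": "A"},
    {"pv": 0, "sfla": 1900, "age": 8, "acres": 0.3, "saleyear": 2013, "blockgroup": "A"},
    {"pv": 1, "sfla": 2500, "age": 30, "acres": 0.8, "saleyear": 2012, "blockgroup": "B"},
]

matched = match(homes)
n_treated = sum(h["pv"] for h in matched)
n_control = len(matched) - n_treated
print(f"matched pv homes: {n_treated}, matched comparables: {n_control}")
```

In this toy data the first two homes share a stratum and are kept, the third differs only in sale year and is dropped, and the fourth has no comparable in its stratum.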
So, here's the confusing part:
I will have ~ 70 matching pv homes, and 300 comparable homes if blockgroup
is used, but only 20 matching pv homes, and 100 comparable homes if county
is used. In other words, when I allow a broader geography of comparables to
be drawn from, I get fewer matching cases. I would think the exact opposite
would be the case; if I cast a broader geographic net, I would have more
matches, not fewer.
Any ideas why this would occur?
Thanks, in advance, for any insight you could offer.
Ben
Berkeley Lab
Ben Hoen
Staff Research Associate
Lawrence Berkeley National Laboratory
Office: 845-758-1896
Cell: 718-812-7589
bhoen(a)lbl.gov
http://emp.lbl.gov/staff/ben-hoen