Hi there,
This was an issue in an earlier version of CEM, but the latest version on
SSC should have this fixed. Perhaps you can try to reinstall from the repo
and see if it is still an issue? Note that the matching still works and
cem_matched is correct in these situations. Hope that helps!
Cheers,
Matt
On Thu, Jan 11, 2018 at 3:43 AM LEE Matthew <matthew.lee(a)insead.edu> wrote:
Dear all,
I am writing to follow up on a past thread with the same subject line
(original message below). I have encountered the same issue: after
calling cem in Stata, some observations have cem_matched == 1 but
cem_strata missing. I have not been able to solve it, but I do have
some additional clues and would be interested to know if the community has
any ideas.
It seems that the cem command is truncating the assignment of cem_strata
at a fixed limit of 32,740 strata (I don’t know if this value is general or
specific to my data). When executing my original match (which has many
theoretical strata based on the coarsened variables: 28 buckets X 10 X 5 X
5 X 5 X 5 = 175,000), the assignment of strata stops at 32,740. If I
coarsen further so that the number of theoretical buckets < 32,740 and
re-run CEM, there are no longer missing observations for cem_strata
(unfortunately this further coarsening does not work for my study).
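As a quick check of the arithmetic above, here is a minimal sketch in Python (not Stata; the names theoretical_strata and STATA_INT_MAX are hypothetical, chosen only for illustration). It multiplies the bin counts quoted above and compares the product with 32,740. That number is also the largest value Stata's "int" storage type can hold, so one possible explanation is that the stratum identifier is stored as an int. That is an assumption on my part and not something confirmed from the cem source.

```python
from math import prod

# Upper bound of Stata's 2-byte "int" storage type (documented range: -32,767 to 32,740).
# Assumption: cem_strata being stored as an int is a guess, not confirmed from the cem code.
STATA_INT_MAX = 32_740


def theoretical_strata(bins_per_variable):
    """Return the number of cells in the full cross of the coarsened variables."""
    return prod(bins_per_variable)


# Bin counts quoted in the message: 28 x 10 x 5 x 5 x 5 x 5.
bins = [28, 10, 5, 5, 5, 5]
n_strata = theoretical_strata(bins)

print(f"theoretical strata: {n_strata:,}")
print(f"exceeds Stata int maximum: {n_strata > STATA_INT_MAX}")
```

If this guess is right, any coarsening whose product stays at or below 32,740 would avoid the truncation, which is consistent with what I observe after coarsening further.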
One more clue: the truncation appears to operate according to ordered
values of the first matching variable called by CEM. In my original
matching attempt described above, the first variable was a year variable,
which ranges from 2009-2013. In the results, the cem_strata values are
defined for 2009 and 2010 and stop somewhere in 2011 — subsequent years
have cem_strata missing.
It would be great to know if anyone has further ideas about what might be
going wrong here. Does cem in STATA have a theoretical maximum number of
strata? Could it be a working memory issue?
Many thanks,
Matthew
*Matthew Lee*
Assistant Professor of Strategy
INSEAD | 1 Ayer Rajah Avenue, Singapore 138676
*matthew.lee(a)insead.edu | matthewscottlee.com*
*--*
https://lists.gking.harvard.edu/pipermail/cem/2014-September/000154.html
Ben Hoen bhoen at lbl.gov
Tue Sep 9 15:42:53 EDT 2014
Hi all,
I had been using cem matching output to run regressions and have just now
found that a large set of the output has the variable "cem_matched" == 1
while "cem_strata" == . (i.e., missing). For those cases, there is also
a weight stored in "cem_weights".
Is this a common occurrence? If so, would you be able to explain when/why
this occurs?
Ben
Ben Hoen
Staff Research Associate
Lawrence Berkeley National Laboratory
Office: 845-758-1896
Cell: 718-812-7589
bhoen at lbl.gov
http://emp.lbl.gov/staff/ben-hoen
--