Just a quick answer: to experiment with coarsening, use the option
eval.imb=FALSE
inside the cem() command [or the equivalent in Stata].
What slows down the computation is the L1 measure, not the coarsening or cem itself;
we are working on this issue for the next release.
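For readers unfamiliar with the L1 measure: it compares the multivariate histograms of the treated and control groups, L1 = 1/2 * sum over cells of |f_cell - g_cell|, where f and g are the relative frequencies of treated and control units in each cell. It is 0 when the two distributions are identical and 1 when they do not overlap at all. The Python sketch below shows the statistic on precomputed cells. It assumes each unit has already been coarsened into a tuple of bin indices, the name l1_imbalance is hypothetical, and it is not a reproduction of how cem computes L1 internally.

```python
from collections import Counter


def l1_imbalance(treated_cells, control_cells):
    """Multivariate L1 imbalance between two groups (illustrative sketch).

    Each element of treated_cells / control_cells is a hashable cell label,
    e.g. a tuple of coarsened (binned) covariate values for one unit.
    Returns 0.5 * sum over cells of |f_cell - g_cell|, where f and g are the
    relative frequencies of treated and control units in each cell.
    """
    f, g = Counter(treated_cells), Counter(control_cells)
    n_t, n_c = len(treated_cells), len(control_cells)
    cells = set(f) | set(g)  # every cell occupied by either group
    # Counter returns 0 for a cell that one group never occupies.
    return 0.5 * sum(abs(f[c] / n_t - g[c] / n_c) for c in cells)


# Hypothetical toy data: cells are (age_bin, income_bin) tuples.
treated = [(0, 1), (0, 1), (1, 0)]
control = [(0, 1), (1, 0), (1, 1), (1, 1)]
print(l1_imbalance(treated, control))  # approximately 0.5
```

Because the statistic is computed over its own histogram of all covariates, evaluating it is extra work on top of the matching itself, which is consistent with switching it off while experimenting with cutpoints.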
stefano
On 7 Apr 2011, at 16:25, "Scherer, Ethan"
<escherer(a)prgs.edu> wrote:
Dear CEM list,
I am using CEM to match individuals receiving workers' compensation for an injury to
non-injured workers. We have several continuous variables (e.g., income, firm size, age),
a categorical variable (industry), and some dichotomous variables (e.g., gender, born in
state).
The sample is quite large, with many more potential controls (1.2 million) than injured
workers (4 thousand). Before using CEM, I coarsened the data myself, putting income into
quintiles, firm size into 4 categories, age into 4 groups, and industry into 10
categories. I then ran CEM with automatic cutpoints. However, given the sample size,
Sturges' rule creates 22 bins for each variable, many of which cannot be populated (e.g.,
a bin for half a woman). The resulting bins are not very "coarse": there are approximately
2,000 strata.
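As a check on the numbers above: Sturges' rule sets the number of bins to ceil(log2 n) + 1, so with roughly 1.2 million potential controls plus 4 thousand injured workers it gives 22. The Python sketch below reproduces that count and shows that a 0/1 variable occupies only 2 of 22 equal-width bins. The function names are hypothetical, and it assumes equal-width bins over the observed range, which may differ from how cem places its automatic cutpoints.

```python
import math


def sturges_bins(n):
    """Number of bins from Sturges' rule: ceil(log2(n)) + 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return math.ceil(math.log2(n)) + 1


def occupied_equal_width_bins(values, k):
    """Split [min, max] into k equal-width bins; count the bins holding data."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 1  # a constant variable fills exactly one bin
    width = (hi - lo) / k
    # The maximum value may compute to index k, so clamp it into the last bin.
    return len({min(int((v - lo) / width), k - 1) for v in values})


n = 1_200_000 + 4_000  # potential controls + injured workers
k = sturges_bins(n)
print(k)  # 22
print(occupied_equal_width_bins([0, 1, 1, 0, 1], k))  # 2
```

The other 20 bins of a dichotomous variable are the "half a woman" bins: they can never hold data.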
To try to improve this, I supplied my own cutpoints, similar to (or coarser than) those
mentioned above, and the program never seemed to finish running (I killed it after 2 days).
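The matching step that user-supplied cutpoints feed into can be sketched as follows: each covariate is mapped to the interval its cutpoints define, units sharing the same vector of coarsened values form a stratum, and only strata containing both treated and control units are kept. This is a minimal Python illustration under assumed data structures (records as (treated, covariates) pairs, cutpoints as a dict of sorted lists); the function names are hypothetical and this is not the cem package's code. Coarser cutpoints produce fewer distinct coarsened vectors and therefore fewer, larger strata.

```python
from bisect import bisect_right
from collections import defaultdict


def coarsen(value, cutpoints):
    """Index of the interval that value falls into, given sorted cutpoints.

    Assumption: a value equal to a cutpoint goes into the upper interval.
    """
    return bisect_right(cutpoints, value)


def cem_strata(records, cutpoints):
    """Group units into strata and keep those with treated and control units.

    records:   list of (is_treated, {covariate_name: value}) pairs
    cutpoints: {covariate_name: sorted list of cutpoints}; covariates without
               an entry (e.g. dichotomous ones) are matched exactly.
    Returns {signature: [n_control, n_treated]} for the matched strata.
    """
    counts = defaultdict(lambda: [0, 0])
    for is_treated, covs in records:
        signature = tuple(
            coarsen(covs[name], cutpoints[name]) if name in cutpoints else covs[name]
            for name in sorted(covs)  # fixed covariate order
        )
        counts[signature][int(is_treated)] += 1
    return {s: c for s, c in counts.items() if c[0] > 0 and c[1] > 0}


# Hypothetical cutpoints and units.
cutpoints = {"age": [30, 45, 60], "income": [25_000, 50_000, 100_000]}
records = [
    (True, {"age": 34, "income": 40_000, "female": 1}),
    (False, {"age": 41, "income": 45_000, "female": 1}),
    (False, {"age": 70, "income": 45_000, "female": 1}),
    (True, {"age": 25, "income": 120_000, "female": 0}),
]
print(cem_strata(records, cutpoints))  # {(1, 1, 1): [1, 1]}
```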
Thus, I am thinking of using a different automatic cutpoint method, but I think the
Freedman-Diaconis rule would yield even more cutpoints, and I wasn't sure what other
algorithms were available (none are listed in the Stata Journal article).
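To compare the two rules: the Freedman-Diaconis rule sets the bin width to 2 * IQR * n^(-1/3), so the number of bins grows roughly like n^(1/3), while Sturges' count grows only like log2 n. For samples in the millions it will usually give many more bins, as suspected above. The Python sketch below is illustrative only: the function names are hypothetical, and the IQR uses the default ("exclusive") method of statistics.quantiles, which may not match how cem or Stata computes quartiles.

```python
import math
import statistics


def sturges_bins(n):
    """Sturges' rule: ceil(log2(n)) + 1 bins."""
    return math.ceil(math.log2(n)) + 1


def freedman_diaconis_bins(values):
    """Freedman-Diaconis rule: width = 2 * IQR * n**(-1/3); bins span the range."""
    n = len(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (Python 3.8+)
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero, so the Freedman-Diaconis width is undefined")
    width = 2 * iqr * n ** (-1 / 3)
    return math.ceil((max(values) - min(values)) / width)


small = list(range(1_000))
print(freedman_diaconis_bins(small), sturges_bins(len(small)))  # 10 11

large = list(range(200_000))
print(freedman_diaconis_bins(large), sturges_bins(len(large)))
```

On this evenly spaced toy data, Freedman-Diaconis gives fewer bins than Sturges at n = 1,000 but roughly three times as many at n = 200,000; with real, skewed variables such as income the crossover point will differ.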
Do you have any suggestions on how to coarsen the data further so that I can get the most
out of the program?
Thanks in advance for your help!
Ethan Scherer MPP, CPA
Doctoral Fellow, Pardee RAND Graduate School
1776 Main St., Mailstop M1N
Santa Monica, CA 90401
W: 310-393-0411 x6056
E: escherer(a)rand.org