[cem] RE: Stata Question - Types of Autocuts and "Coarseness"

7 Apr 2011

Dear CEM list,

I am using CEM to match individuals receiving workers' compensation for
an injury to non-injured workers.  We have several continuous variables
(e.g., income, firm size, age), a categorical variable (e.g., industry),
as well as some dichotomous variables (e.g., gender, born in state).

The sample is quite large, with many more potential controls (1.2
million) than injured workers (4 thousand).  Prior to using CEM I
coarsened the data myself by putting income into quintiles, firm size
into four categories, age into four groups, and industry into 10
categories.  I then ran CEM with automatic cuts.  However, based upon
the sample size, Sturges' rule creates 22 bins for each variable, many
of which cannot exist (e.g., 1/2 a woman).  The resulting bins are not
very "coarse": I end up with approximately 2,000 strata.
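
For reference, the figure of 22 is consistent with Sturges' rule,
k = ceil(log2(n) + 1), if the rule is applied to the pooled sample of
controls and injured workers. That pooling is an assumption here, not
something confirmed from the cem source. A minimal Python sketch of the
arithmetic (the function name is illustrative and not part of cem):

```python
import math


def sturges_bins(n: int) -> int:
    """Number of bins from Sturges' rule: ceil(log2(n) + 1)."""
    return math.ceil(math.log2(n) + 1)


# Sample sizes quoted in the message; pooling them is an assumption.
n_controls = 1_200_000
n_injured = 4_000

bins = sturges_bins(n_controls + n_injured)
print(bins)  # 22
```

Because the rule depends only on n, it would assign the same 22 bins to
a dichotomous variable as to income, which is why most of those bins
are empty.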

To try to improve this, I supplied my own cut points (similar to, but
coarser than, those mentioned above), but then the program never seemed
to finish running (I killed it 2 days later).

Thus, I am thinking of using a different set of auto cuts, but I think
the Freedman-Diaconis rule would yield even more cutpoints, and I am not
sure what other algorithms are available (none are listed in the Stata
Journal article).
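
A rough check of that intuition, as a sketch only: the textbook
Freedman-Diaconis rule sets the bin width to 2 * IQR / n^(1/3), so the
number of bins grows roughly like n^(1/3) rather than log2(n). The
variable below is hypothetical (spread evenly over [0, 1]), and the
function names are illustrative. How cem itself computes these cuts is
not asserted here.

```python
import math


def sturges_bins(n: int) -> int:
    """Sturges' rule: ceil(log2(n) + 1)."""
    return math.ceil(math.log2(n) + 1)


def freedman_diaconis_bins(value_range: float, iqr: float, n: int) -> int:
    """Textbook Freedman-Diaconis rule: bin width = 2 * IQR / n^(1/3)."""
    width = 2 * iqr / n ** (1 / 3)
    return math.ceil(value_range / width)


n = 1_204_000  # pooled sample size from the message (pooling is assumed)

# Hypothetical variable spread evenly over [0, 1]: range 1.0, IQR 0.5.
fd = freedman_diaconis_bins(value_range=1.0, iqr=0.5, n=n)
st = sturges_bins(n)
print(fd, st)  # 107 22
```

For a heavier-tailed variable such as income, the range is large
relative to the IQR, so the Freedman-Diaconis count would be higher
still.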

Do you have any suggestions for how to coarsen the data further so that
I can get the most out of the program?

Thanks in advance for your help!

Ethan Scherer MPP, CPA

Doctoral Fellow, Pardee RAND Graduate School

1776 Main St., Mailstop M1N

Santa Monica, CA 90401

W: 310-393-0411 x6056

E: escherer(a)rand.org

