Hi Antonio,
One way to think about missing data is that it is a selection problem
of its own. Listwise deletion selects observations based on their
completeness, so it's only unbiased if the partially observed data has
the same distribution as the fully observed data (and even then you
lose some efficiency). The flip side of this is that selection problems
are just a special type of missing data problem.
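To make the listwise deletion point concrete, here is a rough simulation sketch. The function name, the missingness probabilities, and the two mechanism labels are all made up for illustration. It draws a standard normal variable (true mean 0), deletes values under two different missingness mechanisms, and takes the mean of what is left.

```python
import random
from statistics import mean


def complete_case_mean(n=100_000, mechanism="mcar", seed=0):
    """Mean of the values that survive listwise deletion.

    Values are drawn from a standard normal, so the true mean is 0.
    mechanism="mcar":         every value is dropped with probability 0.5.
    mechanism="depends_on_y": positive values are dropped with probability
                              0.8, the rest with probability 0.2
                              (illustrative numbers only).
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)
        if mechanism == "mcar":
            p_missing = 0.5
        else:
            p_missing = 0.8 if y > 0 else 0.2
        # Keep the observation only if it is not deleted.
        if rng.random() >= p_missing:
            kept.append(y)
    return mean(kept)


if __name__ == "__main__":
    print("MCAR:", complete_case_mean(mechanism="mcar"))
    print("Missingness depends on y:", complete_case_mean(mechanism="depends_on_y"))
```

Under the first mechanism the complete-case mean should land close to 0, just with about half the sample thrown away. Under the second it should come out clearly negative, because the complete cases no longer have the same distribution as the full data.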
You could try to build a bigger model that addresses both the
missingness and the selection together, but this would be quite a
modeling task. Your suggested approach of imputing, then applying the
selection model to the imputations, is a simpler alternative that
should work. You also might want to check out CEM
(http://gking.harvard.edu/cem/), as it has ways of matching with
multiply imputed data.
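For the impute-then-analyze route, the usual way to combine results is to fit the selection model separately on each imputed data set and then pool with Rubin's rules. Below is a minimal sketch of the pooling step for a single coefficient. The function name pool_rubin is hypothetical, and the numbers in the usage part are invented purely to show the call; in practice the inputs would be the estimate and squared standard error from each of your m fits.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one scalar estimate across m imputed data sets (Rubin's rules).

    estimates: the coefficient estimate from each imputed data set
    variances: the squared standard error of that estimate from each fit
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from at least 2 imputations")
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 denominator)
    t = w + (1 + 1 / m) * b        # total variance
    return q_bar, math.sqrt(t)


if __name__ == "__main__":
    # Made-up numbers, only to show how the function is called.
    est = [0.42, 0.38, 0.45, 0.40, 0.44]
    se = [0.10, 0.11, 0.10, 0.12, 0.10]
    print(pool_rubin(est, [s ** 2 for s in se]))
```

The between-imputation term is what carries the extra uncertainty from the missing data, so the pooled standard error will generally be larger than the average of the per-fit standard errors.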
Cheers,
matt.
On Tue, Feb 1, 2011 at 5:12 PM, Antonio P. Ramos
<ramos.grad.student(a)gmail.com> wrote:
Hi all,
I have a question about multiple imputation for a data set that will later
be analysed using selection models. I am re-analysing chapter 5 of
Przeworski et al., Democracy and Development, on the effects of political
regime on demography, where the authors use selection models (a dynamic probit
version of the Heckman model) to account for regime selection effects.
There is a massive amount of missing data in their analysis (sometimes they
use less than 1/3 of the observations). However, my main concern is that
the missing data mechanism and the selection process may not be
sufficiently independent of each other.
From my understanding, both selection models and multiple imputation are
trying to account for missing data, but perhaps in different ways. Yet it
is not clear to me how to compare them. One way is this: multiple imputation
tries to help us use all the information available in the data set,
without actually creating new information. In fact, the missing cells should
be filled in based on the information available from the other cells. On the
other hand, selection models are actually supposed to generate new data, as
if there were no selection process going on and thus the assignment to the
treatment and control groups were actually random. If this is the right
way to think about it, then maybe running selection models after
multiple imputation with Amelia would not be a problem. But I am not
sure... any suggestions?
Help and advice really appreciated,
Best,
Antonio.
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive:
http://lists.gking.harvard.edu/?info=amelia
More info about Amelia:
http://gking.harvard.edu/amelia