To follow up briefly: the amount of information in the missing data
(the "fraction of missing information") dictates how many imputations
you need to get a stable answer. We can't, of course, observe this
quantity directly, since it is defined by the missing data.
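As a rough illustration of how the number of imputations interacts with
the missing information, here is a minimal Python sketch of Rubin's
(1987) relative-efficiency approximation, 1 / (1 + gamma / m), where
gamma is the fraction of missing information and m is the number of
imputations. The gamma values below are made-up inputs, not estimates
from any real dataset, and gamma is not the same thing as the raw rate
of missingness. The function name is hypothetical.

```python
def relative_efficiency(gamma: float, m: int) -> float:
    """Efficiency of m imputations relative to infinitely many.

    gamma: fraction of missing information, between 0 and 1 (assumed known
           here for illustration; in practice it can only be estimated).
    m:     number of imputed datasets.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must be between 0 and 1")
    if m < 1:
        raise ValueError("m must be at least 1")
    return 1.0 / (1.0 + gamma / m)


# Hypothetical values of gamma, chosen only to show the trend.
for gamma in (0.3, 0.65, 0.9):
    row = ", ".join(
        f"m={m}: {relative_efficiency(gamma, m):.3f}" for m in (5, 10, 20)
    )
    print(f"gamma={gamma}: {row}")
```

The efficiency rises toward 1 as m grows, and it rises more slowly when
gamma is large, which is the sense in which more missing information
calls for more imputations.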
A different issue (very much related to Gary's point) is the validity
of the assumptions underlying the procedure. The central assumption
(called MAR, or missing at random) is the following: whether a value
is missing can depend on variables we observe, but not on unobserved
values. This, again, is an untestable assumption, at least from the
data at hand. In addition, increasing the number of imputations will
not fix a violation of it. The best solution, as Gary points out, is to
collect more data.
A more satisfying approach may be to make sure the imputation model is
rich enough for the amount of missing data. That is, make sure it
includes enough observed variables to make the MAR assumption plausible.
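To make the MAR point concrete, here is a small simulation sketch in
Python. It is not Amelia's algorithm: it uses a single deterministic
regression imputation on simulated data, purely as a toy under stated
assumptions. The variable names, sample size, and missingness rules are
all hypothetical. In the first case missingness in y depends only on
the observed x (MAR); in the second it depends on the unobserved y
itself (not MAR).

```python
import random


def ols(xs, ys):
    """Return (intercept, slope) of a least-squares line of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def imputed_mean(xs, ys, is_missing):
    """Mean of y after filling missing y with regression predictions from x."""
    observed = [(x, y) for x, y, m in zip(xs, ys, is_missing) if not m]
    intercept, slope = ols([x for x, _ in observed], [y for _, y in observed])
    completed = [
        intercept + slope * x if m else y
        for x, y, m in zip(xs, ys, is_missing)
    ]
    return sum(completed) / len(completed)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000
xs = [rng.gauss(0, 1) for _ in range(n)]
ys = [x + rng.gauss(0, 1) for x in xs]  # true mean of y is 0 by construction

mar = [x > 0 for x in xs]    # missingness depends only on observed x
mnar = [y > 0 for y in ys]   # missingness depends on the unobserved y

mar_mean = imputed_mean(xs, ys, mar)
mnar_mean = imputed_mean(xs, ys, mnar)
print(f"completed-data mean of y, MAR case:     {mar_mean:.3f}")
print(f"completed-data mean of y, non-MAR case: {mnar_mean:.3f}")
```

The true mean of y is 0 by construction. In the MAR case the
completed-data mean should land close to 0; in the non-MAR case it
stays well below 0, even though roughly half of y is missing in both
cases and the imputation step is identical.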
regards,
matt.
On Fri, Apr 25, 2008 at 4:46 AM, Gary King <king(a)harvard.edu> wrote:
increasing the number of imputations will help with simulation error if you
have lots of missingness.
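One way to see where simulation error enters: under Rubin's combining
rules, the variance of the pooled estimate is W + (1 + 1/m) B, where W
is the average within-imputation variance, B is the between-imputation
variance, and m is the number of imputations. The B/m term is the part
that shrinks as m grows. A minimal Python sketch follows; the function
name `pool` and the input numbers are hypothetical, not output from
Amelia.

```python
from statistics import mean, variance


def pool(estimates, variances):
    """Combine m point estimates and their variances with Rubin's rules.

    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two estimates, one variance each")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # W: average within-imputation variance
    between = variance(estimates)  # B: sample variance across imputations
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Made-up coefficient estimates and squared standard errors from m = 5 runs.
est = [0.42, 0.51, 0.38, 0.47, 0.44]
var = [0.010, 0.012, 0.009, 0.011, 0.010]
q, t = pool(est, var)
print(f"pooled estimate: {q:.3f}, total variance: {t:.4f}")
```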
but the big problem in this situation is model-dependence. you don't want
your answers to depend heavily on your choices of an imputation model. but
the more missingness you have, the more model dependent your inferences will
be. this is true whether you use Amelia II or any other method. there
isn't much you can do about this other than either (a) go out and collect
some of the missing observations, and/or (b) remove imputations that require
inferences outside of or far from the convex hull (see the first 2 papers at
http://gking.harvard.edu/projects/cause.shtml)
Gary
On Thu, 24 Apr 2008, Gustavo de las Casas wrote:
Is there a cut-off for the rate of missingness, past which we should
employ other methods (i.e., not Amelia 2)? Or does it depend on the
diagnostic results?
More specifically, if:
a) my imputations don't give me error 34 (which says there is not enough
data to do imputations properly); and
b) my diagnostics seem kosher (distributions of imputed/actual
observations overlap nicely, there is convergence, etc.),
can I relax about the rate of missingness in the original data?
Simply: I have a time-series, cross-sectional dataset: 10 years, 50
countries, 6 independent vars. Of the 6, 3 have 65% missingness. Yet
these 3 vars have significant relationships with the rest of the vars,
and Amelia 2 was able to give me a decent-looking imputation. [I can
offer the misschk results from Stata if necessary to answer this
question.]
Is there a cut-off in the fraction of missingness past which I must worry?
Or would Amelia have already told me so?
King also mentions that upping the imputations (to, say, 10) can help deal
with higher rates of missingness. Is that something I should do just to
make sure?
You can also direct me to somewhere in the literature where you think this
is specifically addressed. Thanks much.
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive:
http://lists.gking.harvard.edu/?info=amelia