Fraction of missingness question

Gustavo de las Casas

24 Apr 2008

8:24 a.m.

Is there a cut-off for the rate of missingness past which we should employ other methods (i.e., not Amelia 2)? Or does it depend on the diagnostic results?

More specifically, if my imputations:
a) don't give me error 34 (which says there is not enough data to do imputations properly), and
b) my diagnostics seem kosher (distributions of imputed and actual observations overlap nicely, there is convergence, etc.),

can I relax about the rate of missingness in the original data?

Simply put: I have a time-series, cross-sectional dataset: 10 years, 50 countries, 6 independent variables. Of the 6, 3 have 65% missingness. Yet these 3 variables have significant relationships with the rest of the variables, and Amelia 2 was able to give me a decent-looking imputation. [I can offer the misschk results from Stata if necessary to answer this question.]

Is there a cut-off in the fraction of missingness past which I must worry? Or would Amelia have already told me so?

King also mentions that upping the number of imputations (to, say, 10) can help deal with higher rates of missingness. Is that something I should do just to make sure?

You can also direct me to somewhere in the literature where you think this is specifically addressed. Thanks much.

- Amelia mailing list served by Harvard-MIT Data Center [Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
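The per-variable fraction of missingness in the question (similar in spirit to what Stata's misschk summarizes) can be sketched in a few lines of Python. The data layout (one dictionary per country-year, `None` for a missing value) and the function names are hypothetical; the second function counts how many rows listwise deletion would keep.

```python
from typing import Dict, List, Optional

# Hypothetical layout: one dict per country-year; None marks a missing value.
Row = Dict[str, Optional[float]]


def missingness_by_variable(rows: List[Row], variables: List[str]) -> Dict[str, float]:
    """Fraction of country-years in which each variable is missing."""
    if not rows:
        raise ValueError("no rows")
    n = len(rows)
    # sum() over booleans counts the rows where the variable is None
    return {v: sum(r.get(v) is None for r in rows) / n for v in variables}


def complete_case_fraction(rows: List[Row], variables: List[str]) -> float:
    """Fraction of country-years observed on every listed variable."""
    if not rows:
        raise ValueError("no rows")
    kept = sum(all(r.get(v) is not None for v in variables) for r in rows)
    return kept / len(rows)


if __name__ == "__main__":
    rows = [
        {"x1": 1.0, "x2": None},
        {"x1": 2.0, "x2": 3.0},
        {"x1": None, "x2": None},
        {"x1": 4.0, "x2": 5.0},
    ]
    print(missingness_by_variable(rows, ["x1", "x2"]))
    print(complete_case_fraction(rows, ["x1", "x2"]))
```

With three variables each 65% missing, at most 35% of country-years can be complete on all three, and fewer if their gaps fall in different rows.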

Gary King

25 Apr 2008

4:46 a.m.

...

Matt Blackwell

11:18 a.m.

To follow up slightly: the amount of information in the missing data dictates how many imputations you need to get the right answer. We can't, of course, observe this, since it is defined by the missing data.

A different issue (very much related to Gary's point) is the validity of the assumptions underlying the procedure. The central assumption, called MAR (missing at random), is the following: whether a value is missing can depend on variables we observe, but not on unobserved variables (including the missing values themselves). This, again, is an untestable assumption, at least from the data at hand, and increasing the number of imputations will not fix a violation of it. The best solution, as Gary points out, is to collect more data. A more satisfying approach may be to make sure the imputation model is rich enough for the amount of missing data; that is, that there are enough observed variables to make the MAR assumption plausible.

regards,
matt

On Fri, Apr 25, 2008 at 4:46 AM, Gary King <king(a)harvard.edu> wrote:

...
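Matt's point about the information in the missing data, and Gary's point (quoted further down) about simulation error, correspond to Rubin's (1987) combining rules. The sketch below is plain Python, not part of Amelia, and `Pooled`, `pool`, and `relative_efficiency` are hypothetical names. It pools one coefficient across m imputed datasets and reports an estimated fraction of missing information (FMI). That estimate is conditional on the imputation model being right, so it does not contradict Matt's point that the true quantity cannot be observed; it also is not the raw missingness rate, and is often smaller when the observed variables are predictive.

```python
import math
from statistics import mean, variance
from typing import NamedTuple, Sequence


class Pooled(NamedTuple):
    estimate: float  # q-bar: average of the m point estimates
    within: float    # W: average within-imputation variance
    between: float   # B: variance of the point estimates across imputations
    total: float     # T = W + (1 + 1/m) * B
    df: float        # Rubin's degrees of freedom for the pooled estimate
    fmi: float       # estimated fraction of missing information


def pool(estimates: Sequence[float], variances: Sequence[float]) -> Pooled:
    """Combine one coefficient across m imputed datasets with Rubin's rules."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates with matching variances")
    qbar = mean(estimates)
    w = mean(variances)
    if w <= 0:
        raise ValueError("within-imputation variances must be positive")
    b = variance(estimates)  # sample variance (divisor m - 1)
    t = w + (1 + 1 / m) * b
    if b == 0:
        # Identical estimates in every imputation: no estimated missing information.
        return Pooled(qbar, w, b, t, math.inf, 0.0)
    r = (1 + 1 / m) * b / w  # relative increase in variance due to missingness
    df = (m - 1) * (1 + 1 / r) ** 2
    fmi = (r + 2 / (df + 3)) / (r + 1)
    return Pooled(qbar, w, b, t, df, fmi)


def relative_efficiency(fmi: float, m: int) -> float:
    """Efficiency of m imputations relative to infinitely many: 1 / (1 + fmi / m)."""
    return 1 / (1 + fmi / m)


if __name__ == "__main__":
    # Hypothetical coefficient estimates and squared standard errors from 5 imputations.
    est = [0.42, 0.51, 0.38, 0.47, 0.55]
    var = [0.010, 0.012, 0.011, 0.010, 0.013]
    print(pool(est, var))
    for m in (5, 10, 20):
        print(m, round(relative_efficiency(0.65, m), 3))
```

With an FMI of 0.65, relative efficiency is about 0.885 with 5 imputations and about 0.939 with 10. More imputations shrink the (1 + 1/m)B simulation term, but they do nothing about a violated MAR assumption or about model dependence.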

Gustavo de las Casas

26 Apr 2008

2:09 p.m.

Follow-up question to the answers by King and Blackwell. First of all, thanks for the help. The follow-up is whether we can or should use linear interpolation to check against the multiple imputations. More below.

Going back to my story (skip this paragraph if you remember): I have a time-series, cross-sectional dataset: 10 years, 50 countries, 6 independent variables. Of the 6, 3 have 65% missingness. Yet these 3 variables have significant relationships with the rest of the variables, and Amelia 2 was able to give me a decent-looking imputation, i.e., the diagnostics look fine.

Of the 3 independent variables with missingness, 2 have this feature: out of 10 possible years, each of these two variables has values in only 2 years, and those 2 years are 8 years apart. Remember, though, that the other variables in the dataset have data for all the in-between years, that these other variables have significant relationships with the variables with missingness, and that Amelia has been able to give me good results so far. Finally, because these 2 variables really came from 2 cross-sectional datasets (at t and t+8), I didn't treat the data as time-series in Amelia 2 (that is, I performed the MIs without a time indicator).

A suggestion has been to linearly interpolate between t and t+8 for these 2 variables, as a check on the Amelia results. Is it necessary in principle? I see it as an inexpensive "just to make sure" test. Any thoughts or particular cautions I should take? I'm implementing the previous advice I got here, by the way.

My position on this stuff, as mentioned before, is that the best way to know whether MI can work is to try it and then diagnose the results. That is, we should not dismiss a dataset for MI treatment just because it looks to have a worrisome pattern of missingness; in fact, these are some of the best opportunities to put MI to work. Is this on the money?

Thanks.

Quoting Gary King <king(a)harvard.edu>:

...

Increasing the number of imputations will help with simulation error if you have lots of missingness. But the big problem in this situation is model dependence. You don't want your answers to depend heavily on your choice of an imputation model, but the more missingness you have, the more model-dependent your inferences will be. This is true whether you use Amelia II or any other method. There isn't much you can do about this other than to (a) go out and collect some of the missing observations, and/or (b) remove imputations that require inferences outside of or far from the convex hull (see the first 2 papers at http://gking.harvard.edu/projects/cause.shtml).

Gary
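The interpolation check Gustavo describes could be sketched as follows, in Python with hypothetical function names and a toy layout (one dictionary per country, keyed by year): draw a straight line between the values observed at t and t+8, then see whether it falls inside the spread of the imputed values for the in-between years. This is only a rough consistency check. Linear interpolation assumes a straight-line trend and carries no uncertainty, and because these imputations were run without a time indicator and draw on the other observed variables, they need not follow a straight line. A systematic disagreement is a prompt to look more closely at the imputation model, not proof that either set of values is wrong.

```python
from statistics import mean
from typing import Dict, List, Tuple


def interpolate(t0: int, y0: float, t1: int, y1: float, years: List[int]) -> Dict[int, float]:
    """Straight-line values at `years`, given observations at t0 and t1."""
    if t1 == t0:
        raise ValueError("need two distinct observed years")
    slope = (y1 - y0) / (t1 - t0)
    return {t: y0 + slope * (t - t0) for t in years}


def compare_to_imputations(
    interp: Dict[int, float], imputations: List[Dict[int, float]]
) -> Dict[int, Tuple[float, float, bool]]:
    """Per year: (interpolated value, mean imputed value, whether the
    interpolated value lies within the range of the imputed values)."""
    result = {}
    for t, y_lin in interp.items():
        draws = [imp[t] for imp in imputations]
        result[t] = (y_lin, mean(draws), min(draws) <= y_lin <= max(draws))
    return result


if __name__ == "__main__":
    # One hypothetical country, observed at t = 0 and t + 8 = 8.
    lin = interpolate(0, 10.0, 8, 26.0, years=list(range(1, 8)))
    # Hypothetical imputed values for the in-between years from m = 2 datasets.
    imps = [
        {t: 10.0 + 2.1 * t for t in range(1, 8)},
        {t: 9.5 + 1.9 * t for t in range(1, 8)},
    ]
    for year, row in compare_to_imputations(lin, imps).items():
        print(year, row)
```

Checking against the range of the imputed values, rather than only their mean, respects the fact that multiple imputations are meant to vary.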
