I have encountered this question in my own research recently, although with large-T, small-N data. For what it's worth, this is what I've pieced together from the manual, the papers, and my other reading. Can anyone tell me if I'm way off on this?
There will always be some uncertainty about our estimates, because
they are simulations that represent possible values of data that we do
not have. The only sure-fire means of validating the imputations is to
have the actual values, which would eliminate the need for imputation.
Ultimately, you have to make a judgment about the credibility of the
imputation model itself — does it create reasonable estimates?
Amelia II offers two diagnostic tools for judging imputed values, compare and overimpute (both explained in Prof. King's recommended readings).
The former lets you compare the distributions of a variable's reported and imputed values. Ask yourself: should the missing values have the same distribution as the reported values, or should their distribution differ in central tendency, dispersion, and/or skew? The graphs produced by compare let you check whether these expectations are fulfilled. The distribution of imputed values does not have to match the distribution of reported values, but the differences between the two should be explainable.
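For what it's worth, in the R version of Amelia the calls look roughly like this. This is only a sketch from my own workflow; the data frame, variable, and object names (mydata, year, country, gdp, a.out) are placeholders, so check the manual for your version:

    library(Amelia)
    # 'mydata' is a data frame with a time index and a unit index;
    # all names below are placeholders, not taken from the manual
    a.out <- amelia(mydata, m = 5, ts = "year", cs = "country")
    # overlay the density of the imputed values on the density of the
    # observed values for one variable
    compare.density(a.out, var = "gdp")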
Overimpute treats observed values in your data as if they were missing, then allows you to compare the simulated values with the actual ones. For this test, you are basically trying to see whether your imputation model renders predictions that approximate the actual values.
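Again in R, a minimal sketch with the same placeholder names as above:

    # treat each observed value of "gdp" as missing in turn, re-impute it, and
    # plot the imputation intervals against the values actually observed
    overimpute(a.out, var = "gdp")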
Under some circumstances, your model will not predict extreme values well. This has happened to me sometimes. My principal concern here is that these extreme values do not constitute much of your sample, or do not represent data points with undue influence on your results. If such observations do influence your model, then you have a concern with which I have not yet dealt.
In addition, you can use your preferred spreadsheet or statistical package to graph reported and imputed values within panels, and ask yourself whether the imputed values make sense. If the variable is expected to take the form of a random walk, do the imputed values also suggest such a walk? If the variable is one that maintains stable trends within panels over time, do the imputed values roughly approximate this stable trend?
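In R, for example, something along these lines will mark the imputed points on one panel's series. All the names here are placeholders from my own data; recent versions of Amelia also include a tscsPlot() function that draws something similar with confidence bands:

    # eyeball one panel: plot the completed series from the first imputed
    # data set and mark the values that were originally missing
    # ("Canada", "gdp", "country", and "year" are placeholder names)
    imp1 <- a.out$imputations[[1]]
    rows <- mydata$country == "Canada"
    plot(imp1$year[rows], imp1$gdp[rows], type = "l",
         xlab = "year", ylab = "gdp (observed and imputed)")
    miss <- rows & is.na(mydata$gdp)      # which of these values were imputed
    points(imp1$year[miss], imp1$gdp[miss], col = "red", pch = 19)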
There is always judgment involved, and it is important that you are
able to make a case for the reader to believe in your imputations.
This is how I've interpreted the materials recommended by Prof. King, but I am not a leading expert in missing data imputation. If I am completely mistaken, someone please tell me.
Joe
Joseph Nathan Cohen
Assistant Professor of Sociology
City University of New York, Queens College
Powdermaker 252CC
65-30 Kissena Blvd
Flushing, NY 11367
e-mail: joseph.cohen(a)qc.cuny.edu
web: www.josephncohen.com
----- Original Message ----
From: "Blais, Martin" <blais.martin(a)uqam.ca>
To: king(a)harvard.edu; amelia(a)lists.gking.harvard.edu
Cc: "Raymond, Sarah" <raymond.sarah(a)uqam.ca>
Sent: Tuesday, January 29, 2008 8:56:30 PM
Subject: RE : [amelia] longitudinal data imputation
Thank you, I'll have a look at it!
Martin
-------- Original Message --------
From: Gary King [mailto:king@harvard.edu]
Date: Tue 2008-01-29 18:32
To: Blais, Martin; amelia(a)lists.gking.harvard.edu
Cc: Raymond, Sarah
Subject: Re: [amelia] longitudinal data imputation
Have a look at the paper by Honaker and King on this subject at the web site. That and the manual should do it.
---
Sent from my phone; please excuse the terse note.
Gary King
http://gking.harvard.edu
-----Original Message-----
From: "Blais, Martin" <blais.martin(a)uqam.ca>
Date: Tue, 29 Jan 2008 17:06:03
To: amelia@lists.gking.harvard.edu
Cc: "Raymond, Sarah" <raymond.sarah(a)uqam.ca>
Subject: [amelia] longitudinal data imputation
Hello
We are using longitudinal data (3 time points) to evaluate the effects of an intervention program. We have missing data (about 80% missing cases at T2 for the control group, and some missing cases for both the experimental and control groups at T3) and want to impute the missing data using Amelia II.
I am looking for detailed procedures for imputing longitudinal missing data with Amelia II, to answer simple questions like: Should the data be organised in long or wide format? Are there any tutorials (websites, technical papers) available for this purpose besides the Amelia II documentation?
Any references on imputation of longitudinal data (mostly about the options and best practices in the context of program evaluation) are also welcome.
Very many thanks for any help,
Martin Blais, Ph.D.
Professor
Département de sexologie
Université du Québec à Montréal
C.P. 8888, succ. Centre-ville
Montréal (Québec)
Canada H3C 3P8
Vox: (514) 987-3000 ext. 4031
Fax : (514) 987-6787
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
Hi,
I got the following error message from Amelia:
> ameliaoutput10 <- amelia(merge24, p2s = 2,
    lgstc = c(8, 9, 33, 40, 82:96),
    ords = c(10, 37, 38, 39, 49, 50, 51, 52, 53, 72:81),
    logs = c(3:7, 25, 26, 29, 36, 41:47, 54:71),
    ts = 1, cs = 2, polytime = 2, intercs = TRUE, archive = TRUE)
amelia starting
beginning prep functions
running bootstrap
-- Imputation 1 --
setting up EM chain indicies
1(347357!) 2(51802!) 3(45797!) 4(40634!) 5(46795!) 6(40806!) 7(40043!)
8(35876!) 9(33598!)10(34136!)
11(28776!)12(25746!)13(23438!)14(21293!)15(30009!)16(32754!)17(24128!)18(23706!)19
Loading required package: foreign
Error in e[e > tol] <- 1/e[e > tol] :
NAs are not allowed in subscripted assignments
Calls: amelia -> emarch -> emfred -> amsweep -> mpinv
Execution halted
Any suggestions?
Thanks,
Anders