Dear Joseph
This is great. I have wanted to see such a discussion for a while. Here
is my $0.02.
> There will always be some uncertainty about our estimates, because
> they are simulations that represent possible values of data that we
> do not have. The only sure-fire means of validating the imputations
> is to have the actual values, which would eliminate the need for
> imputation. Ultimately, you have to make a judgment about the
> credibility of the imputation model itself — does it create
> reasonable estimates?
Yes, this is true. But you make it sound like this uncertainty is a
bad thing. It is a good thing, as the imputation model needs to model
the uncertainty. If we were to impute the expected values (à la SPSS
MVA), we would be artificially reducing our standard errors.
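The variance-shrinkage point can be sketched numerically. Everything below (the normal distribution, the 30% missingness rate) is made up purely for illustration; the contrast between filling holes with the mean versus drawing from a predictive distribution is the part that matters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical complete variable and a 30% completely-at-random mask.
x = rng.normal(loc=50, scale=10, size=10_000)
missing = rng.random(x.size) < 0.3
obs = x[~missing]

# "Expected value" imputation: fill every hole with the observed mean.
mean_imputed = x.copy()
mean_imputed[missing] = obs.mean()

# Proper imputation draw: sample from a predictive distribution instead.
draw_imputed = x.copy()
draw_imputed[missing] = rng.normal(obs.mean(), obs.std(), missing.sum())

# Mean imputation visibly shrinks the spread; drawn values preserve it,
# which is what keeps downstream standard errors honest.
print(round(x.std(), 2), round(mean_imputed.std(), 2), round(draw_imputed.std(), 2))
```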
> AMELIA II offers two diagnostic tools for judging imputed values —
> compare and overimpute (both explained in Prof. King's recommended
> readings). The former command lets you compare the distributions of
> reported and imputed values of a variable. Ask yourself: should the
> missing values have the same distribution as the reported values, or
> should their distribution have a different central tendency,
> dispersion, and/or skew? The graphs produced here will allow you to
> assess your imputation's conformity to these expectations. Compare
> will allow you to check whether these expectations about imputed
> value distributions are fulfilled. Imputed values' distributions do
> not have to match the distribution of reported values, but the
> differences between the two should be explainable.
Let's formalize this just a little more. If your missingness is MCAR,
you should expect the distribution of the imputed values to be the
same as that of the observed values. If your missing data are MAR,
you might see differences. If you have good predictors of income (or
of the missingness of income), and low-income people tend not to
respond to the income question, you will see more imputations at the
lower end of the observed distribution.
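That MAR scenario is easy to simulate. In this toy sketch (all numbers hypothetical, and a simple regression-with-noise imputer standing in for Amelia's actual model), education predicts income, low-education respondents skip the income question more often, and the imputations land at the low end:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical data: income depends on education.
edu = rng.normal(12, 3, n)
income = 2_000 * edu + rng.normal(0, 5_000, n)

# MAR: low-education respondents are more likely to skip the income item.
p_miss = 1 / (1 + np.exp(edu - 10))        # high probability when edu is low
missing = rng.random(n) < p_miss

obs_income, obs_edu = income[~missing], edu[~missing]

# Simple model-based imputation among responders: regress income on
# education, then draw imputations with residual noise.
b1, b0 = np.polyfit(obs_edu, obs_income, 1)
resid_sd = np.std(obs_income - (b0 + b1 * obs_edu))
imputed = b0 + b1 * edu[missing] + rng.normal(0, resid_sd, missing.sum())

# As described above: imputed incomes sit below the observed average.
print(round(obs_income.mean()), round(imputed.mean()))
```

Here the difference between the two distributions is exactly the "explainable" kind: it follows from the stated predictor of missingness, not from a broken model.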
You should justify why you are assuming MAR missingness. Therefore
you should be making claims like: I believe education is a good
predictor of the missingness of income, and therefore for low-education
people I expect imputations at the lower end of the distribution. I
do not believe either graphical diagnostic is very useful for making
such statements (and I wonder if there is a good way of producing such
a visual).
So I really do wonder what the purpose of the graphical comparison
of the two distributions is. It can tell me whether my data are more
MAR or MCAR (I guess; maybe not). It can tell me whether the
imputations look the same as the observed values. While seeing this is
kinda *fun* in a very geekish way, I would love to know if anyone else
makes better use of this information in arguing any points, etc.
> Under some circumstances, your model will not predict extreme values
> well. This has happened to me sometimes. My principal concern here
> is that these extreme values do not constitute much of your sample,
> or do not represent data points with undue influence on your results.
> If such observations influence your model, then you have a concern
> with which I have not yet dealt.
Interesting. Very good point. You should see how many of these cases
are off. I never thought of it this way.
In my book I drew the conclusion that our inferences for cases with
extreme values will be less reliable. I suggested being more cautious
when drawing inferences for the variables that suffered from this issue.
But I have no fixes. There might be practical implications not too
well discussed in the manual. Sorry, I have not yet read the newer
paper carefully, but I skimmed it and it did not seem helpful in these
respects. I just found it a week or so ago. I should have put a note
here when you posted it. Please do so when you post future materials,
publish papers on or using Amelia, etc. Everyone, not just Matt and
Gary. Here is mine:
Levente Littvay (2007). Corruption and Democratic Performance. VDM
Verlag Dr. Mueller e.K.
http://www.amazon.com/Corruption-Democratic-Performance-Levente-Littvay/dp/…
> In addition, you can use your preferred spreadsheet or statistical
> package to graph reported and imputed values within panels. Compare
> the imputed and reported values within panels, and ask yourself
> whether the imputed values make sense. If the variable is expected
> to take the form of a random walk, then do the imputed values also
> suggest such a walk? If the variable is one that maintains stable
> trends within panels over time, do the imputed values roughly
> approximate this stable trend? There is always judgment involved,
> and it is important that you are able to make a case for the reader
> to believe in your imputations.
Again, the imputations do model a level of uncertainty. So I would
argue that what you describe above would be easy to do only if the
expected values were imputed. You can average your imputations across
the m datasets and you should get exactly what you describe. But it is
not the expected values we use in the analysis. So if one imputation
is off the trend line, that could be because it was drawn from the
tail end of the predictive distribution. From the perspective of
trends it would look off. (Or am I off on this?)
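A quick numeric sketch of that averaging point, with a made-up linear panel trend and made-up noise (none of these numbers come from Amelia; they just illustrate why a single draw can legitimately sit off the trend while the average of m draws does not):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical panel following a stable linear trend, with one missing
# time point at t = 4 (expected value 3*4 + 5 = 17).
t = np.arange(10)
true_trend = 3.0 * t + 5.0
m = 50  # number of imputed datasets

# Each imputation is a draw around the expected value, so any single
# draw can look visibly off the trend line without being "wrong".
draws = true_trend[4] + rng.normal(0, 4.0, m)

print(round(draws[0], 1))       # one imputation: may sit off the trend
print(round(draws.mean(), 1))   # average over m: close to the expected value
```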
I wonder if there is an easy way of doing this. I often wish I could
ask Amelia to produce datasets where only the imputed values are
present and the originals are not, or to output a file where 0s mark
observed cells and 1s mark imputed cells. (I guess this would not be
hard in R, but I am not that good with R, so I tend to use the GUI.
One day...)
> This is how I've interpreted the materials recommended by Prof.
> King, but I am not a leading expert in missing data imputation. If
> I am completely mistaken, someone please tell me.
What I am missing from Dr. King and coauthors is good practical
prescriptions for the diagnostic features. If you see X, what could
that mean, how can you figure out what it means, is it a concern,
how much of a concern, is there anything you can do to alleviate the
concern, anything you can do to fix it, etc.? A practical guide like
that would be VERY useful for all of us.
Levi
PS: Matt, have you gotten anywhere with the bug that Sheeder and Lynne
ran into? I saw a recent posting with a similar error message. I'd
appreciate an update on this. I need to talk to Lynne about
something I want from him. He might ask. Thanks.
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive:
http://lists.gking.harvard.edu/?info=amelia