Thanks so much for the suggestion, Shane. Too bad they don't have a trial period. Once I have determined whether I should use imputation, and whether I can master the techniques, I will give it consideration.

 

Glenys Lafrance
Ph.D candidate
University of Toronto
Ontario Institute for Studies in Education
Theory & Policy Studies, Higher Education


From: Shane MacDonald [mailto:shane_is@yahoo.com]
Sent: Friday, August 31, 2007 4:28 AM
To: James Honaker; Glenys Lafrance
Cc: amelia@lists.gking.harvard.edu
Subject: Re: [amelia] Combine m datasets?

 

Dear List members,

 

I got Zumastat last year and found it very useful. Check out the home page to see what it does apart from combining m data sets. It's a standalone statistical program that can be integrated easily with the menu bars of SPSS and Excel.

 

http://www.zumastat.com/Home.htm

 

COMBINING MULTIPLE IMPUTATIONS (in Zumastat)

 

This utility uses the results from multiple imputation analyses
and combines them to yield an average parameter value and a
"pooled" standard error (using formulas from the computer
program Amelia).

For each imputed data set, enter the value of the parameter
and the standard error of the parameter. Enter the number
of imputed data sets.

Check t test to divide the parameter by its standard error and
test for significance using a t distribution. Check z test
to use a normal distribution instead. If you check t test, enter
the degrees of freedom you want to use.

You can input data from an ASCII file. List the parameter
estimates first, one per line. Then enter the standard errors,
also one per line. If you have 5 imputed data sets, you
will have 10 lines of data.
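
As a rough illustration of the file layout and the z test described above, here is a minimal Python sketch. It is not Zumastat's code: the function names parse_pooling_input and z_test and the sample numbers are made up for the example. The pooling step itself is left out, and the t test is omitted because Python's standard library has no t distribution.

```python
from statistics import NormalDist


def parse_pooling_input(text, m):
    """Split a file's contents into (estimates, standard_errors).

    Assumed layout: m parameter estimates, one per line, followed by
    m standard errors, one per line (2 * m lines of data in total).
    """
    values = [float(line) for line in text.splitlines() if line.strip()]
    if len(values) != 2 * m:
        raise ValueError(f"expected {2 * m} lines of data, found {len(values)}")
    return values[:m], values[m:]


def z_test(estimate, std_error):
    """Divide the estimate by its standard error; two-sided normal p-value."""
    z = estimate / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical file contents for 5 imputed data sets: 10 lines of data.
sample = "0.50\n0.54\n0.46\n0.52\n0.48\n0.10\n0.11\n0.09\n0.10\n0.10\n"
estimates, std_errors = parse_pooling_input(sample, m=5)
print(estimates)   # the five parameter estimates
print(std_errors)  # the five standard errors

# z test on a made-up estimate and standard error.
z, p = z_test(0.5, 0.25)
print(z, p)
```

With 5 imputed data sets, a file holding anything other than 10 lines of data raises an error rather than silently pairing estimates with the wrong standard errors.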

 

Regards

 

/Shane MacDonald



James Honaker <tercer@ucla.edu> wrote:


Dear Glenys,

There do exist single imputation methods that people use, for example,
interpolating the data, but while the best ones (such as EM) can give
you a good estimate of the expected value of the missing value, they
generally fail to provide a rigorous measure of the uncertainty in
that imputation. We don't want to treat the data that we didn't observe
with the same confidence as the data that we were able to measure
directly.

With the multiply imputed datasets you have created, the distribution
across the values of any missing observation reflects the uncertainty in
that imputation. If the observed data predict that observation with great
confidence, then all the imputations will be very similar. If no
information exists to impute the observation, the imputations will be
widely spread (with variance close to the unconditional variance of the
observed values of that variable).
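
A toy Python sketch of this point, under strong simplifying assumptions: x and y are standard bivariate normal with a known correlation rho, y is missing, and imputations are drawn from the conditional distribution of y given x. This is not Amelia's algorithm, which also accounts for uncertainty in the estimated parameters; the function name draw_imputations and all numbers are hypothetical.

```python
import math
import random
from statistics import stdev


def draw_imputations(x_observed, rho, m, rng):
    """Draw m imputations of a missing y given an observed x.

    Assumes x and y are standard bivariate normal with correlation rho,
    so y | x is normal with mean rho * x and variance 1 - rho**2.
    """
    cond_mean = rho * x_observed
    cond_sd = math.sqrt(1 - rho ** 2)
    return [rng.gauss(cond_mean, cond_sd) for _ in range(m)]


rng = random.Random(2007)  # fixed seed so the run is repeatable

# Many draws are used only to show the spread clearly; in practice m is small.
strong = draw_imputations(1.0, rho=0.99, m=1000, rng=rng)
none = draw_imputations(1.0, rho=0.0, m=1000, rng=rng)

# Strong predictor: imputations cluster tightly (spread near 0.14).
print(stdev(strong))
# No information: spread is close to the unconditional value of 1.
print(stdev(none))
```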

So, collapsing your imputations into one dataset might give you good
predictions for your missing values, but no measure of how confident you
can be in those predictions and how much weight you should give them, or
how much like the observed data you should treat them. This can trip up
your analysis, including causing bias in parameters of interest.

However, one of the computational benefits of multiple imputation (and
perhaps the shorter answer) is that each one of these imputed datasets
you have generated can be treated as if there were no missing data problem.
This is why multiple imputation is sometimes described as a two-step
approach. You can run whatever analysis you would have run if you had the
complete data in each imputed dataset (like you had hoped to run in the
single imputed dataset). Generally (and in SPSS) this just requires
taking whatever command you were going to run, and placing it into a loop (if
you are using a script). Now you have m sets of results (where m is the
number of imputed datasets; 5 or 10 is plenty in most cases).
To get the final answer you actually want to report to your
reader, there is a simple formula to combine these results. For most
quantities of interest (like regression coefficients) you just average
each quantity of interest across the m datasets. The standard errors on
these averaged quantities are slightly trickier, as they reflect both the
average error across the models and the additional error between the
models. In Stata and R we have simple functions that will do all of this
seamlessly for you. I'm not sure if something similar exists in SPSS
already. If anyone on the list knows of a function to combine a
list of imputed results (or even to Bayesian model average) please chime
in. If such a thing does not exist presently, I'm pretty sure we can whip
one up for you.
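
The combining step described above can be sketched in a few lines of Python. This follows the standard combining rules usually attributed to Rubin, which appear to be what is described here; it is not the Stata or R functions mentioned. The name pool_estimates and the example numbers are hypothetical.

```python
import math
from statistics import mean, variance


def pool_estimates(estimates, std_errors):
    """Combine one quantity of interest across m imputed datasets.

    estimates:  the m point estimates (e.g. one regression coefficient)
    std_errors: the m standard errors of those estimates
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching lists from at least 2 imputed datasets")
    pooled = mean(estimates)                        # average across datasets
    within = mean([se ** 2 for se in std_errors])   # average error within models
    between = variance(estimates)                   # extra error between models
    total = within + (1 + 1 / m) * between          # (1 + 1/m) corrects for finite m
    return pooled, math.sqrt(total)


# Made-up results for one coefficient from m = 5 imputed datasets.
coefs = [0.50, 0.54, 0.46, 0.52, 0.48]
ses = [0.10, 0.10, 0.10, 0.10, 0.10]
pooled, pooled_se = pool_estimates(coefs, ses)
print(pooled, pooled_se)  # pooled estimate 0.50, pooled SE about 0.106
```

Note that the pooled standard error (about 0.106) is larger than any single dataset's 0.10, because the disagreement between the imputed datasets is added on top.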

regards,
James Honaker

On Fri, 31 Aug 2007, Glenys Lafrance wrote:

> Hi Listers,
>
>
>
> Hope rookie questions are welcome here, as I don't have a strong
> quantitative background.
>
>
>
> Can I impute missing data from a survey (with mainly dichotomous and ordinal
> questions) and end up with 1 credible dataset with which to conduct the
> analysis? I have been able to impute the m datasets using AMELIA, but I'd
> like to combine them appropriately to use just 1 file with SPSS and AMOS
> (SEM). If anyone can recommend a procedure I'd be super grateful.
>
>
>
> If it is not something I can do competently, are there folks out there who
> do fee-for-service consulting or analysis? If so, I'd like to hear from
> you. Please respond off-list. I need to keep it very economical and expedient:
> I'm full-time, my research is not funded and I am very close to the deadline
> for PhD completion. Thanks,
>
>
>
> Glenys Lafrance
>
> Ph.D candidate
>
> University of Toronto
>
> Ontario Institute for Studies in Education
>
> Theory & Policy Studies, Higher Education
>
>
>
>
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia

 

 

