On 12/11/2008 07:58 AM, Catherine Hughes wrote:

Hi

I am a PhD student and a statistics novice. I would very much appreciate your advice regarding my approach to using Amelia. I have administered 4 research instruments to 218 study participants. The instruments and number of items in each is as follows:

CDI – 36 items

SDQII - 66 items

Parental Bonding Instrument (PBI) – 25 items for Female Parent form and 25 items for Male Parent Form

“I am … Test” – yields a single score for each subject.

Out of the total sample of 218, only 140 (64.22%) provided complete information across all research questionnaires. The proportion of cases with missing data is consistent with King, Honaker, Joseph, & Scheve’s (2001) estimate that on average one third of cases in a data set will have missing values. To preserve as many cases as possible I want to use Amelia II to multiply impute missing values.

Initially this is what I did:

1. Data were not MCAR, but the MAR assumption appeared to hold; therefore Amelia II seemed promising.

2. When all items of the CDI, PBI, SDQ, and “I am …” Test were combined into one large data set, the number of cases or observations (n = 218) was only slightly greater than the number of variables in the combined data set. Even a ridge prior of 11 (5.05% of 218) was found to be insufficient to estimate the imputations. Following the failure of Amelia II to estimate the imputations with a moderate ridge prior, the CDI data set, the PBI data set, and the SDQII data set were submitted to Amelia II separately. This ensured that the number of observations was adequate for the number of imputations to be estimated by Amelia II. For each of the CDI, SDQ, PBI Mother Form and PBI Father Form, this preserved all cases except where the participant did not provide any data whatsoever for the questionnaire concerned. Amelia II produced 5 imputed data sets for each instrument. The “I am …” Test could not be submitted to Amelia II for multiple imputation of missing values because the test yields only one score (the %S score, which is the % of Collectivist responses based on content analysis of answers provided by the participant).

right. it's not only that the number of variables is approximately equal to the number of cases, but the number of parameters in your original run (p*(p+3)/2, where p = number of variables) is much larger than the number of observations. so you need to cut down on the variables. if you need to cut more, cut down to the variables you actually use (or which can predict the missingness) rather than the whole data set.
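
As a rough check of that parameter count, the Python sketch below tallies the means, variances, and covariances of a multivariate normal model, which comes to p*(p+3)/2 for p variables. The item counts are the ones given in the message; treating the combined data set as 36 + 66 + 25 + 25 + 1 = 153 variables is an assumption, and the function name is purely illustrative.

```python
def mvn_parameter_count(p):
    """Parameters of a p-variable multivariate normal:
    p means plus p*(p+1)/2 unique variances and covariances."""
    return p * (p + 3) // 2


n_obs = 218  # study participants

# Item counts per instrument, as listed in the message above.
items = {"CDI": 36, "SDQII": 66, "PBI female": 25, "PBI male": 25, "I am": 1}

# Assumption: the combined run used every item as its own variable.
p_total = sum(items.values())
print("combined:", p_total, "variables,",
      mvn_parameter_count(p_total), "parameters,", n_obs, "observations")

# Parameter counts if each instrument is imputed on its own.
for name, p in items.items():
    print(name, p, "variables,", mvn_parameter_count(p), "parameters")
```

Comparing each parameter count with the 218 observations shows how quickly the model grows as variables are added, which is the reason for keeping only the variables that are actually used or that predict missingness.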

3. I calculated scale scores for each of the instruments, involving each of the 5 datasets produced by Amelia II for each instrument. The scale scores were to be used for multiple regression analysis.

My new problem is that where study participants did not complete any of the items in an instrument, Amelia II inserted NA in the entire row. This means that there are still missing values in the data set comprised of scale scores that I wish to analyse using multiple regression.

if the observations are independent and you know nothing about some people, then only magic would enable you to impute anything other than NAs. so if you know something about these folks, code them up as variables.

Another approach I had in mind is to:

1. Before submitting the instruments to Amelia II, calculate scale scores for all instruments, except for cases that have skipped items and therefore have missing values on one or more items in a scale.

2. Submit the scale scores for all instruments to Amelia II to produce 5 data sets with imputed missing values.

3. Run multiple regression analysis with each of the 5 datasets, each comprising the same observed values and different estimates of the missing values for scale scores.
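
Step 3 leaves five sets of regression results that still have to be combined into one. A minimal Python sketch of the standard combining rules for multiple imputation (Rubin's rules) is shown here for a single coefficient. The function name and the five estimates and standard errors are made-up illustrations, not output from Amelia II or from this study.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Combine one coefficient across m imputed data sets.

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    pooled = mean(estimates)                      # average point estimate
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                 # variance across imputations (m - 1 divisor)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical results for one regression coefficient from 5 imputed data sets.
coefs = [0.42, 0.45, 0.40, 0.47, 0.44]
ses = [0.10, 0.10, 0.10, 0.10, 0.10]

pooled_coef, pooled_se = pool_rubin(coefs, ses)
print(pooled_coef, pooled_se)
```

The pooled standard error is larger than any single run's standard error because it adds the disagreement between imputations to the ordinary sampling variance.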

This approach would mean that Amelia II would be imputing values for missing scale scores even when a study participant did not complete any items whatsoever in the scale concerned, or indeed, the entire instrument. This is like "getting something out of nothing". Would this be an appropriate or inappropriate use of Amelia II?

it's not inappropriate and you might need to do it, but in general it's better to impute the component measures rather than the summary (index) score. the reason is that if you have say 10 measures of something and 1 of them is missing, it would be easy for Amelia to impute the missing one since the other 9 measures are observed.
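
The point about the other 9 measures can be illustrated with a toy simulation in Python. This is not Amelia's algorithm: it assumes, purely for illustration, that all 10 items measure one underlying trait plus independent noise, hides the tenth item, and compares a guess based on the 9 observed items with a guess that uses no information about the person.

```python
import random

random.seed(0)
n_people, n_items = 2000, 10  # simulated respondents and items (assumed values)

sq_err_items = 0.0  # squared error when guessing from the 9 observed items
sq_err_mean = 0.0   # squared error when guessing the overall item mean

for _ in range(n_people):
    trait = random.gauss(0, 1)  # the person's underlying score
    items = [trait + random.gauss(0, 0.5) for _ in range(n_items)]
    hidden, observed = items[-1], items[:-1]

    guess_from_items = sum(observed) / len(observed)
    guess_from_mean = 0.0  # the population mean of every item in this toy model

    sq_err_items += (hidden - guess_from_items) ** 2
    sq_err_mean += (hidden - guess_from_mean) ** 2

mse_items = sq_err_items / n_people
mse_mean = sq_err_mean / n_people
print(mse_items, mse_mean)
```

In this toy setup the guess based on the observed items has a much smaller mean squared error, which is the sense in which item-level imputation has more to work with than imputation of a wholly missing scale score.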

Gary
---
http://gking.harvard.edu

Your assistance would be greatly appreciated. I apologise for the length of the query. I also apologise if my query is somewhat elementary.

Regards

Cathy Hughes

- Amelia mailing list served by Harvard-MIT Data Center [Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia More info about Amelia: http://gking.harvard.edu/amelia