I am a PhD student and a statistics novice. I would very much appreciate your advice regarding my approach to using Amelia. I have administered 4 research instruments to 218 study participants. The instruments and the number of items in each are as follows:

CDI – 36 items
SDQII – 66 items
Parental Bonding Instrument (PBI) – 25 items for the Female Parent Form and 25 items for the Male Parent Form
“I am … Test” – yields a single score for each subject.

Of the total sample of 218, only 140 participants (64.22%) provided complete information across all research questionnaires. The proportion of cases with missing data is consistent with King, Honaker, Joseph, & Scheve’s (2001) estimate that on average one third of cases in a data set will have missing values. To preserve as many cases as possible I want to use Amelia II to multiply impute missing values.

Initially this is what I did:

1. The data were not MCAR, but the MAR assumption appeared to hold, so Amelia II seemed promising.

2. When all items of the CDI, PBI, SDQII, and “I am … Test” were combined into one large data set, the number of cases or observations (n = 218) was only slightly greater than the number of variables in the combined data set. Even a ridge prior of 11 (5.05% of 218) was found to be insufficient to estimate the imputations. Following the failure of Amelia II to estimate the imputations with a moderate ridge prior, the CDI data set, the PBI data set, and the SDQII data set were submitted to Amelia II separately. This ensured that the number of observations was adequate for Amelia II to estimate the imputations. For each of the CDI, SDQII, PBI Mother Form and PBI Father Form, this preserved all cases except where the participant did not provide any data whatsoever for the questionnaire concerned. Amelia II produced 5 imputed data sets for each instrument. The “I am … Test” could not be submitted to Amelia II for multiple imputation of missing values because the test yields only one score (the %S score, which is the % of Collectivist responses based on content analysis of the answers provided by the participant).

3. I calculated scale scores for each of the instruments, using each of the 5 data sets produced by Amelia II for each instrument. The scale scores were to be used for multiple regression analysis.
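For concreteness, the counts behind step 2 can be checked with a few lines of arithmetic. This is only a minimal Python sketch: the dictionary keys and variable names are made up for illustration, the numbers are the ones quoted above, and Amelia II itself is not called.

```python
# Item counts per instrument, taken from the list at the top of this post.
# The key names are hypothetical labels, not variable names from my data.
items_per_instrument = {
    "CDI": 36,
    "SDQII": 66,
    "PBI_female_parent": 25,
    "PBI_male_parent": 25,
    "I_am_test": 1,  # single %S score per participant
}

n_cases = 218      # total sample
n_complete = 140   # cases with complete data on every questionnaire
ridge_prior = 11   # ridge prior tried with the combined data set

# Number of variables if every item is put into one combined data set
n_variables = sum(items_per_instrument.values())

# Percentages quoted in the text
pct_complete = round(100 * n_complete / n_cases, 2)
ridge_pct = round(100 * ridge_prior / n_cases, 2)

# Observations available per variable in the combined data set
cases_per_variable = round(n_cases / n_variables, 2)

print(n_variables, pct_complete, ridge_pct, cases_per_variable)
```

The same calculation applied to a single instrument (for example 218 cases against the 36 CDI items) shows how much more favourable the ratio becomes when the instruments are submitted separately.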

My new problem is that where study participants did not complete any of the items in an instrument, Amelia II returned NA for the entire row. This means that there are still missing values in the data set of scale scores that I wish to analyse using multiple regression.
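To show which rows I mean, here is a minimal sketch that separates participants who skipped some items from participants who left an entire instrument blank. It is written in Python with NumPy purely for illustration; the function name and the toy responses are hypothetical and are not my study data.

```python
import numpy as np

def missingness_pattern(items):
    """Label each row 'complete', 'partial' (some items skipped),
    or 'empty' (no items answered at all)."""
    missing = np.isnan(np.asarray(items, dtype=float))
    all_missing = missing.all(axis=1)
    any_missing = missing.any(axis=1)
    return np.where(all_missing, "empty",
                    np.where(any_missing, "partial", "complete"))

# Toy item responses: 4 participants x 3 items (made-up values)
toy_items = np.array([
    [1, 2, 3],
    [2, np.nan, 1],                 # skipped one item
    [np.nan, np.nan, np.nan],       # did not complete the instrument
    [0, 1, 1],
])

labels = missingness_pattern(toy_items)
n_empty = int((labels == "empty").sum())
print(labels.tolist(), n_empty)
```

The "empty" rows are the ones that remain NA after the instrument-by-instrument imputation described above, while the "partial" rows are the ones that were filled in.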

Another approach I had in mind is to:

1. Before submitting the instruments to Amelia II, calculate scale scores for all instruments, leaving the scale score missing for any case that skipped one or more items in that scale.

2. Submit the scale scores for all instruments to Amelia II to produce 5 data sets with imputed missing values.

3. Run multiple regression analysis with each of the 5 data sets, each containing the same observed scale scores and different imputed estimates of the missing scale scores.
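The three steps above can be sketched roughly as follows. This is a minimal Python/NumPy illustration under several assumptions: scale scores are taken to be simple item sums, the function names and toy values are hypothetical, and step 2 is replaced by random placeholder fills that merely stand in for the 5 data sets Amelia II would return (they are not a real imputation).

```python
import numpy as np

def scale_score(items):
    """Step 1: sum the items per participant. NaN propagates through the sum,
    so a case with any skipped item gets a missing scale score."""
    return np.asarray(items, dtype=float).sum(axis=1)

def ols_coefficients(y, x):
    """Step 3: ordinary least squares of y on x with an intercept."""
    design = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return beta  # [intercept, slope]

# Toy data (made-up values, not study data): 4 participants x 3 items
toy_items = np.array([
    [1, 2, 3],
    [2, np.nan, 1],             # one skipped item -> missing scale score
    [np.nan, np.nan, np.nan],   # blank instrument -> missing scale score
    [0, 1, 1],
])
predictor = scale_score(toy_items)
outcome = np.array([10.0, 7.0, 5.0, 4.0])  # hypothetical observed outcome

# Step 2 stand-in: 5 completed copies of the scale score. Observed values are
# identical in every copy; only the filled-in values differ. In practice these
# copies would come from Amelia II, not from a random number generator.
rng = np.random.default_rng(0)
gaps = np.isnan(predictor)
completed = []
for _ in range(5):
    filled = predictor.copy()
    filled[gaps] = rng.uniform(0, 6, size=gaps.sum())
    completed.append(filled)

# Step 3: run the same regression on each completed data set
fits = [ols_coefficients(outcome, x) for x in completed]
for i, beta in enumerate(fits, start=1):
    print(f"data set {i}: intercept={beta[0]:.3f}, slope={beta[1]:.3f}")
```

Note that in this sketch the second and third toy participants are treated identically at step 2, even though one skipped a single item and the other answered nothing.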

This approach would mean that Amelia II would be imputing values for missing scale scores even when a study participant did not complete any items whatsoever in the scale concerned, or indeed, the entire instrument. This is like "getting something out of nothing". Would this be an appropriate or inappropriate use of Amelia II?

Your assistance would be greatly appreciated. I apologise for the length of the query. I also apologise if my query is somewhat elementary.

Regards

Cathy Hughes