Hi all,
I do not have a licence for Stata to combine the imputations from Amelia. I have thought about using an aggregate function in SPSS to generate means and standard deviations for cases across all five files (my gut instinct is that this is far too simple). I have also thought about using NORM to combine the parameters from the five files (again, this seems too simple). Any advice on how to combine the Amelia data files without Stata would be much appreciated.
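From what I have read, NORM applies Rubin's (1987) combining rules, which can also be done by hand: average the point estimates across the imputed data sets, and pool the within- and between-imputation variances. A rough Python sketch of my understanding (the function name and the example numbers are purely illustrative):

```python
import math

def combine_rubin(estimates, std_errors):
    """Pool a parameter over m imputed data sets using Rubin's rules."""
    m = len(estimates)
    q_bar = sum(estimates) / m                             # pooled point estimate
    u_bar = sum(se ** 2 for se in std_errors) / m          # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1) # between-imputation variance
    t = u_bar + (1 + 1 / m) * b                            # total variance
    return q_bar, math.sqrt(t)

# Illustrative: one coefficient estimated on each of five imputed data sets
est, se = combine_rubin([0.52, 0.48, 0.55, 0.50, 0.47], [0.10] * 5)
```

Note that the pooled standard error comes out larger than the average per-data-set standard error, since it also carries the between-imputation uncertainty.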
Kind regards,
Paul
Dear Dr. King and other list members,
We are testing some models on survey data that look sort of like this:
Model A: Y = a + (ba1*xa1 + ba2*xa2 + ... + ban*xan) + e
Model B: Y = a + (ba1*xa1 + ... + ban*xan) + (bb1*xb1 + ... + bbn*xbn) + e
(and so on for Model C and Model D)
That is, we start by regressing Y on a block A of predictors (xa1 through xan), then add block B of predictors (xb1 through xbn), then block C, then block D. The test statistics of primary interest are the p values for the F tests of the change in R-squared between models A and B, B and C, and C and D. Of lesser interest are the F values for those changes themselves, the R-squared values for the models, the regression coefficients at each stage, and the F values and p values associated with each coefficient.
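On a single complete data set, the incremental F statistic we have in mind is computed like this (a Python sketch; the function name and numbers are our own illustration):

```python
def f_change(r2_reduced, r2_full, n, k_full, q):
    """F statistic for the change in R-squared when q predictors are
    added, with n observations and k_full predictors in the full model."""
    df2 = n - k_full - 1  # residual degrees of freedom of the full model
    return ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df2)

# e.g. R-squared rising from .20 to .25 after adding 2 predictors,
# with n = 100 and 5 predictors in the full model
f = f_change(0.20, 0.25, 100, 5, 2)  # an F(2, 94) statistic
```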
If we were doing this with one data set with no missing values, all of this would be straightforward, and we would have a table of regression coefficients, F values, R-squareds, and p values. However, we are using Amelia to impute missing values, and presenting the results in the article we are writing poses some questions that are new to us. Basically, if we run the models on the five data sets created by Amelia and average all the numbers over the five sets, we get a set of numbers that don't really mesh.
For example, suppose that block B has only one new predictor in it. In that case, on any single data set, the F value for the change in R-squared from model A to model B will be the same as the F value for the regression coefficient of that predictor in model B. But the average of those F values is NOT exactly the same as the F value we get if we start with the averaged R-squareds for models A and B and compute the test from that difference with the appropriate degrees of freedom.
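To make the mismatch concrete, here is a toy calculation with made-up R-squared values (n = 100, five predictors in model B, one added predictor); the two pooled F numbers come out different because F is a nonlinear function of the R-squareds:

```python
def f_change(r2_reduced, r2_full, n, k_full, q):
    """F statistic for the change in R-squared when q predictors are added."""
    df2 = n - k_full - 1
    return ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df2)

# Hypothetical (model A, model B) R-squareds from five imputed data sets
pairs = [(0.18, 0.24), (0.20, 0.25), (0.22, 0.29), (0.19, 0.23), (0.21, 0.27)]
n, k_full, q = 100, 5, 1

# Route 1: average the five per-data-set F statistics
f_avg = sum(f_change(a, b, n, k_full, q) for a, b in pairs) / len(pairs)

# Route 2: average the R-squareds first, then compute one F from the averages
a_bar = sum(a for a, _ in pairs) / len(pairs)
b_bar = sum(b for _, b in pairs) / len(pairs)
f_of_avg = f_change(a_bar, b_bar, n, k_full, q)

# f_avg and f_of_avg disagree, even though each data set is internally consistent
```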
(I should say that we have not actually averaged the p values - we have averaged the F statistics and computed the p values from the F distribution. Is that wrong?)
So, what should we really be doing here?
Can I assume that this kind of phenomenon is normal for Amelia, or is it a sign that we are doing something very wrong?
If it is normal, is it a mistake to try to present the whole table of numbers that we would normally present if we were using a data set with no missing values? Should we somehow say, "in using this method, we are concentrating on a small number of test statistics which are important; presenting all the other data would be useless and misleading"?
Or is it ok to present them, with a note to the reader that "these numbers are averages and therefore it is normal that they don't mesh together the way they would if they had been computed from a single data set with no missing values"?
Should we be actually averaging the p values, or is it ok to average the F values and compute the p values based on the averages?
Any other advice?
Sincerely,
Peter A. Kimball
Data Analyst/Statistician
American College of Healthcare Executives
(312) 424-9442
pkimball(a)ache.org
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive: http://lists.hmdc.harvard.edu/?info=amelia