Hi Erin, 

The added variability from running the separate analyses and then combining the results is actually a feature, not a bug, of multiple imputation. The goal of MI is to give you complete datasets while letting you honestly represent your uncertainty, including the uncertainty due to the imputed values. So, even with your more complicated modeling, it is still important to fit separate models and then combine them afterward.
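To make the combining step concrete, here is a minimal sketch of the Rubin rules in Python (the function name and the toy numbers are mine, purely for illustration): the pooled estimate is the mean across imputations, and the total variance adds the between-imputation variance on top of the average within-imputation variance, which is exactly where the "extra" variability gets counted.

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Pool m imputation-specific estimates via Rubin's rules."""
    m = len(estimates)
    qbar = np.mean(estimates)        # pooled point estimate
    ubar = np.mean(variances)        # within-imputation variance
    b = np.var(estimates, ddof=1)    # between-imputation variance
    t = ubar + (1 + 1 / m) * b       # total variance
    return qbar, t

# Toy example: one coefficient's estimate and squared SE from 5 imputed datasets
est = [0.52, 0.48, 0.55, 0.50, 0.51]
var = [0.04, 0.05, 0.04, 0.05, 0.04]
qbar, t = rubin_pool(est, var)
```

Note that t is always at least as large as the average within-imputation variance: the imputation uncertainty can only widen your intervals, which is the honest answer.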

Now, the only wrinkle I can see for your approach is the model searching you are doing after imputation. If there is a particular parameter you are interested in that will always be in the model, then you can do the model selection in each imputed dataset and combine the results for that parameter as you always would, using the Rubin rules. If you are interested in something more complicated, or in a quantity that cannot be defined because of variation in the model selection (though a parameter that the imputations can select out of the model might not be a great one to focus on), then you could use a single, imputation-averaged dataset to choose the model and then run that chosen model on each of the imputed datasets separately. Then combine as usual.
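The first recipe, fit the model in each imputed dataset and pool the focal coefficient, looks something like the sketch below. The simulated (x, y) pairs are a hypothetical stand-in for your imputed datasets (in practice they would come from Amelia's output), and I use plain least squares rather than your ARIMA setup just to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for m = 5 imputed datasets; true slope is 0.5.
def make_imputed_dataset(n=100):
    x = rng.normal(size=n)
    y = 0.5 * x + rng.normal(scale=1.0, size=n)
    return x, y

imputed = [make_imputed_dataset() for _ in range(5)]

estimates, variances = [], []
for x, y in imputed:
    X = np.column_stack([np.ones_like(x), x])
    beta, res, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Var(beta_hat) = sigma^2 (X'X)^{-1}; the focal parameter is the slope
    sigma2 = res[0] / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    estimates.append(beta[1])
    variances.append(cov[1, 1])

# Rubin's rules on the focal coefficient
m = len(estimates)
qbar = np.mean(estimates)
t = np.mean(variances) + (1 + 1 / m) * np.var(estimates, ddof=1)
```

The key point is that the model-fitting step inside the loop can be as complicated as you like (decomposition, prewhitening, ARIMA), so long as the same focal parameter comes out of each fit.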

Hope that helps!

Cheers,
Matt

~~~~~~~~~~~
Matthew Blackwell
Assistant Professor of Government
Harvard University
url: http://www.mattblackwell.org

On Thu, Jun 25, 2015 at 1:49 AM Graham, Erin <Erin.Graham@oregonstate.edu> wrote:

I have a question about combining imputed data from Amelia. I understand the rationale for running your end analysis on each imputed data set separately, and then combining the model results. However, what if your analysis is more complicated than a simple LM? For example, for my analysis, I am using imputed data sets (5) of time series variables (12 independent water quality variables, all time series, and 1 dependent time series). I am decomposing each series using loess and extracting the trend only. Then, I am using prewhitening and cross correlation to identify lags of variables that may be useful predictors. Finally, I am differencing each series and creating and comparing ARIMA models with external regressors to find the best model. I am having a hard time understanding how going through each of these steps with each imputed data set separately (and trying to combine the best models) is not going to create more variability and decrease the confidence of the model compared to averaging the imputed data sets before doing any analysis.


In short, if my imputed data sets are not "that" different, and the range of values for each of my predictors is relatively small, could it possibly be better to average the data first instead of trying to combine the best model from each?


I would greatly appreciate any comments or suggestions.


Thank you for your help.

--
Amelia mailing list served by HUIT
[Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
More info about Amelia: http://gking.harvard.edu/amelia