Hi Erin,
The possibility of added variability from running the separate analyses and
then combining the results is actually a feature, not a bug, of multiple
imputation. The goal of MI is to give you completed datasets while letting you
honestly represent your uncertainty, which includes the uncertainty due to
the imputed values. So, even with your more complicated modeling, it is still
important to fit separate models and then combine them afterward.
Now, the only issue I can see with that for your approach is the model
searching that you are doing after imputation. If there is a particular
parameter that you are interested in and that parameter will always be in
the model, then you can do the model selection within each imputed dataset and
then combine the estimates of that parameter as you always would,
using Rubin's rules. If you are interested in something more complicated,
or in a quantity that might not be defined in every imputed dataset because
the model selection varies (though a parameter that the imputations can cause
to be selected out of the model might not be a great one to focus on), then
you could use a single, imputation-averaged dataset to choose the model, and
then fit that chosen model on each of the imputed datasets separately. Then
combine as usual.
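For the combining step, a minimal sketch of Rubin's rules in Python (the `rubin_pool` helper and the example numbers are purely illustrative, not from this thread; Amelia itself is an R package, but the arithmetic is the same in any language):

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Combine per-imputation results via Rubin's rules.

    estimates, variances: one point estimate and one squared standard
    error per imputed dataset, for a parameter that appears in every
    model (e.g. an ARIMA regression coefficient).
    Returns the pooled estimate and its total variance.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()              # pooled point estimate
    u_bar = u.mean()              # average within-imputation variance
    b = q.var(ddof=1)             # between-imputation variance
    t = u_bar + (1 + 1 / m) * b   # total variance
    return q_bar, t

# Hypothetical coefficient from each of 5 imputed datasets:
est = [0.52, 0.48, 0.55, 0.50, 0.49]
var = [0.04, 0.05, 0.04, 0.05, 0.04]
pooled, total_var = rubin_pool(est, var)
```

Note how the total variance adds the between-imputation spread on top of the average within-imputation variance; that extra term is exactly the "added variability" mentioned above, honestly reflected in the pooled standard error.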
Hope that helps!
Cheers,
Matt
~~~~~~~~~~~
Matthew Blackwell
Assistant Professor of Government
Harvard University
url: http://www.mattblackwell.org
On Thu, Jun 25, 2015 at 1:49 AM Graham, Erin <Erin.Graham(a)oregonstate.edu>
wrote:
I have a question about combining imputed data from
Amelia. I understand
the rationale for running your end analysis on each imputed data set
separately, and then combining the model results. However, what if your
analysis is more complicated than a simple LM? For example, for my
analysis, I am using imputed data sets (5) of time series variables (12
independent water quality variables, all time series, and 1 dependent time
series). I am decomposing each series using loess and extracting the trend
only. Then, I am using prewhitening and cross correlation to identify lags
of variables that may be useful predictors. Finally, I am differencing each
series and creating and comparing ARIMA models with external regressors to
find the best model. I am having a hard time understanding how going
through each of these steps with each imputed data set separately (and
trying to combine the best models) is not going to create more variability
and decrease the confidence of the model compared to averaging the imputed
data sets before doing any analysis.
In short, if my imputed data sets are not "that" different, and the range
of values for each of my predictors is relatively small, could it possibly
be better to average the data first instead of trying to combine the best
model from each?
I would greatly appreciate any comments or suggestions.
Thank you for your help.
--
Amelia mailing list served by HUIT
[Un]Subscribe/View Archive:
http://lists.gking.harvard.edu/?info=amelia
More info about Amelia:
http://gking.harvard.edu/amelia