If the rows are all arguably independent, then I'd randomly (be sure it
really is random) split the data into nonoverlapping chunks, run each
through Amelia, and then restack (one on top of the other, vertically)
each of the imputed data sets. That should work as is. If, in addition,
you pass through an index variable (with, say, the observation number),
then you can sort on it at the end and verify that the order of the
observations is as you want.
Or just get a bigger computer!
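In case a sketch helps, the split/impute/restack/sort recipe might look like the following in R. This is only an illustration, not tested on your data: the data frame name `dat`, the index column `obs_id`, the number of chunks (20), and m = 5 imputations are all placeholders you'd adjust; `idvars` is Amelia's argument for columns that should be passed through but kept out of the imputation model.

```r
library(Amelia)

## add an observation-number index so the original order can be restored
dat$obs_id <- seq_len(nrow(dat))

n_chunks <- 20   # illustrative; pick whatever fits in memory
m <- 5           # number of imputed data sets per chunk

## random, nonoverlapping assignment of rows to chunks
chunk_id <- sample(rep(seq_len(n_chunks), length.out = nrow(dat)))

stacked <- vector("list", m)
for (k in seq_len(n_chunks)) {
  chunk <- dat[chunk_id == k, ]
  ## idvars keeps obs_id out of the imputation model but passes it through
  am <- amelia(chunk, m = m, idvars = "obs_id")
  for (i in seq_len(m)) {
    ## restack: append this chunk's i-th completed data set
    stacked[[i]] <- rbind(stacked[[i]], am$imputations[[i]])
  }
}

## sort each completed data set back into the original observation order
imputed <- lapply(stacked, function(d) d[order(d$obs_id), ])
```

The result is a list of m completed data sets covering all the rows, which you'd then analyze and combine in the usual multiple-imputation way.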
Gary
On Thu, 31 May 2007, Dennis Feehan wrote:
Hi,
I'm working with a very large dataset that has a relatively small amount of
missingness in a few of the variables. (Any one variable has at most, say,
10% missingness). Amelia won't run on the entire thing since R runs out of
memory. This happens even when I pare the dataset down to only those
variables used in the analysis. I can get Amelia to run on 5% subsets of
the data, but even 10% subsets are too large.
So, is the best thing to do here to randomly split the data into 20 5%
chunks and impute separately within each chunk? If so, how should I
recombine the subsets of imputed data to perform my analysis?
Thanks in advance for your help,
Dennis
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive:
http://lists.gking.harvard.edu/?info=amelia