If the rows are all arguably independent, then I'd randomly (be sure it
really is random) split the data into nonoverlapping chunks, run each
through Amelia, and then restack (one on top of the other, vertically)
each of the imputed data sets. That should work as is. If, in addition,
you pass through an index variable (with, say, the observation number),
then you can sort on it at the end and verify that the order of the
observations is as you want.
Or just get a bigger computer!
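In case a sketch helps, the split/impute/restack/sort recipe might look like the following in R. This is only an illustration, not tested on your data: the data frame name `dat`, the index column `obs_id`, the number of chunks (20), and m = 5 imputations are all placeholders you'd adjust; `idvars` is Amelia's argument for columns that should be passed through but kept out of the imputation model.

```r
library(Amelia)

## add an observation-number index so the original order can be restored
dat$obs_id <- seq_len(nrow(dat))

n_chunks <- 20   # illustrative; pick whatever fits in memory
m <- 5           # number of imputed data sets per chunk

## random, nonoverlapping assignment of rows to chunks
chunk_id <- sample(rep(seq_len(n_chunks), length.out = nrow(dat)))

stacked <- vector("list", m)
for (k in seq_len(n_chunks)) {
  chunk <- dat[chunk_id == k, ]
  ## idvars keeps obs_id out of the imputation model but passes it through
  am <- amelia(chunk, m = m, idvars = "obs_id")
  for (i in seq_len(m)) {
    ## restack: append this chunk's i-th completed data set
    stacked[[i]] <- rbind(stacked[[i]], am$imputations[[i]])
  }
}

## sort each completed data set back into the original observation order
imputed <- lapply(stacked, function(d) d[order(d$obs_id), ])
```

The result is a list of m completed data sets covering all the rows, which you'd then analyze and combine in the usual multiple-imputation way.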
Gary
On Thu, 31 May 2007, Dennis Feehan wrote:
Hi,
I'm working with a very large dataset that has a relatively small amount of
missingness in a few of the variables. (Any one variable has at most, say,
10% missingness). Amelia won't run on the entire thing since R runs out of
memory. This happens even when I pare the dataset down to only those
variables used in the analysis. I can get Amelia to run on 5% subsets of
the data, but even 10% subsets are too large.
So, is the best thing to do here to randomly split the data into 20 5%
chunks and impute separately within each chunk? If so, how should I
recombine the subsets of imputed data to perform my analysis?
Thanks in advance for your help,
Dennis
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive:
http://lists.gking.harvard.edu/?info=amelia