Hi,
I've just started working with Amelia II to do multiple imputation for
large data sets. It works great, but I have some questions about how
well it scales.
In the Honaker & King paper "What to do about Missing Values...", the
authors mention imputing data sets with 240 variables and 32,000
observations, which I would love to do, but I estimate this would take
~10^6 hours for a single imputation.
I did some test runs, and it seems like computing time grows
exponentially with the number of variables. I timed several runs in R
2.10.1 (on an Intel Xeon desktop) and fit a regression that gave me
roughly the following:

time [seconds] = 10^-4 * (# of imputations) * (# of subjects)^0.92 *
1.118^(# of variables)
In these runs I used up to 25,000 subjects and 24 variables. Missing
rates were ~7-12% for most variables.
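In case it's useful, here is roughly the kind of timing run I did. The
data frame df and the counts below are placeholders, not my actual data:

library(Amelia)

## Time one amelia() call on the first n.subj rows and first
## n.vars columns of df (df = a test data set with missingness).
time.run <- function(df, n.subj, n.vars, m = 1) {
  sub <- df[seq_len(n.subj), seq_len(n.vars)]
  system.time(amelia(sub, m = m))["elapsed"]
}

## With the timings collected in a data frame 'runs' (columns:
## seconds, m, n.subj, n.vars), the model above is linear on the
## log scale: the 0.92 exponent is the log(n.subj) coefficient,
## and 1.118 = exp(coefficient on n.vars).
fit <- lm(log(seconds) ~ log(m) + log(n.subj) + n.vars, data = runs)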
Based on this, it looks like using ~200 variables would take O(10^6)
hours, while 120 variables could be done in about a week.
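For concreteness, that's just the fitted formula evaluated directly:

## Predicted run time in hours from the regression above.
pred.hours <- function(m, n.subj, n.vars)
  1e-4 * m * n.subj^0.92 * 1.118^n.vars / 3600

pred.hours(1, 25000, 120)   # ~200 hours -- about a week
pred.hours(1, 25000, 200)   # ~1.5e6 hours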
And since parallelization only divides the number of imputations across
processors, not the number of variables, it doesn't look like that
would help.
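To be clear about what I mean by parallelization: a sketch like the one
below (using the snow package; the cluster size and df are
placeholders) spreads the imputations across workers, but wall time is
still bounded by one full imputation, which is the part that explodes
with the number of variables.

library(snow)

cl <- makeCluster(4, type = "SOCK")   # one worker per imputation
clusterEvalQ(cl, library(Amelia))
clusterExport(cl, "df")               # df = the full data set
## Each worker runs a single imputation (m = 1); in practice each
## worker should also get a distinct RNG seed.
fits <- parLapply(cl, 1:4, function(i) amelia(df, m = 1))
imps <- lapply(fits, function(f) f$imputations[[1]])
stopCluster(cl)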
Can anyone comment on run times for large data sets? It's possible I've
missed something, or that the exponential relation doesn't hold for
larger numbers of variables.
Thanks!
Kurt
--
Kurt Smith, PhD
Scientist II
Archimedes Inc
201 Mission Street, 29th Floor
San Francisco, CA 94105