Mark, Are you running Amelia on hmdc condor? I found running 50 separate
Amelia programs simultaneously on hmdc relatively quick and easy -- the
first 5 imputed datasets were returned within two or three weeks, and then
after that I got about 1 more condor egg ;) per day. I could start my data
analysis with the first five datasets and then additional datasets were
added as they came in. Anders
Note to the powers that be: it would be useful to configure condor to give
streaming output. It is a little difficult to know what is going on with
Amelia (you can't take advantage of the verbose option) when this standard
condor feature isn't enabled. Thank you for such a great service, hmdc!
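For what it's worth, Condor can stream a job's standard output as it runs if the submit description asks for it. A minimal sketch of a submit file that fans out 50 Amelia runs; the wrapper script name and file layout here are hypothetical, and whether streaming is honored may also depend on how the pool administrators have configured things:

```
universe      = vanilla
# run_amelia.sh is a hypothetical wrapper that invokes R on one chunk
executable    = run_amelia.sh
arguments     = $(Process)
output        = amelia_$(Process).out
error         = amelia_$(Process).err
log           = amelia.log
# ask Condor to write stdout as it is produced, not only at job exit
stream_output = True
queue 50
```

This is only what the submit side can request; the pool policy has the last word.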
On Fri, 14 Mar 2008, Mark Manger wrote:
Hi,
I apologize in advance for the lengthy question, but it's representative of
many issues I face when working with large panels of economic data, so I
would be extremely grateful for your suggestions, best practices, experiences
etc.
I'm wondering what I could do to speed up the imputation of my rather large
dataset (a panel of N = 2120 x T = 80 = 169,600 obs). At this pace, my
imputations would run for months. Memory is not the issue; rather, I think
that I have too many priors and/or too many missing values on certain
variables. See below, especially lnAid and lnFDI. Note that the missing
values are concentrated at certain time points (early in the panel) rather
than in specific cross-sectional units.
          Variable |     Obs       Mean   Std. Dev.        Min        Max
-------------------+-----------------------------------------------------
            Polity |  168160   .8924833   6.955011        -10         10
        Corruptlvl |  157820   5.441431   1.799652          0         10
         RuleofLaw |  157820   5.247434   2.204846          0         10
           GovStab |  157820   5.935454   2.064963          0         10
 log of bilat. Aid |   76079   1.919392   2.338255  -2.302585   9.692112
log of FDI in host |   32080   3.918487   2.928901  -2.372018   10.98025
  Capital openness |  155200  -.2888318   1.379179  -1.766966   2.602508
          Polcon V |  154320   .3490876   .3158385          0        .89
log of GDPcap_host |  154560    7.95649   1.053043   4.933741   10.48464
   log of GDP_host |  166480   29.62135    3.04193   22.97718   43.12974
   log of GDP_home |  147381    31.1144   2.128313   26.15253   37.36032
If I don't set range priors, I get nonsensical values for most of the
variables: negative GDP (real GDP, not negative log values), polity scores
out of range, etc. I haven't even tried higher-order polynomials or
interactions with cross-sectional units, although I would prefer to, given
that FDI exhibits a clear trend. Breaking up the dataset randomly into
pieces by cross-section doesn't improve speed.
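On the out-of-range values: instead of (or alongside) a large matrix of observation-level priors, Amelia's `bounds` argument takes a three-column matrix of (column index, lower bound, upper bound) and resamples any imputed cell that falls outside its range, which is typically much cheaper for the EM step than many individual priors. A minimal sketch, assuming the data frame is called `panel` and the column positions are as shown (both hypothetical):

```r
library(Amelia)

# bounds: one row per constrained variable --
# c(column index, lower bound, upper bound).
# Column indices here are illustrative, not taken from the real data.
bds <- rbind(
  c(3, -10, 10),   # Polity
  c(4,   0, 10),   # Corruptlvl
  c(5,   0, 10)    # RuleofLaw
)

a.out <- amelia(panel,
                m        = 5,         # number of imputed datasets
                ts       = "year",    # time index (hypothetical name)
                cs       = "country", # cross-section index
                bounds   = bds,
                polytime = 2)         # quadratic time trend
```

Note that adding `intercs = TRUE` interacts the time polynomial with every cross-sectional unit; with N = 2120 units that adds thousands of parameters, so it may be worth trying `polytime` alone first.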
It seems that I have to make tradeoffs. What do you think would be the best
thing to do, i.e., what is the most time-consuming issue for the EM algorithm?

1. Constrain/shorten the sample to have a higher proportion of observed
   values on lnAid and lnFDI?
2. Accept imputations that are out of range (probably not)?
3. Break up the dataset "vertically" into one with the Aid and one with the
   FDI variable, run two sets of imputations, and merge them again?
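The vertical split in the last option could be sketched as below; the obvious cost is that each slice's imputations ignore the covariance between lnAid and lnFDI, since neither sees the other variable. All column names here are hypothetical:

```r
library(Amelia)

ids    <- c("country", "year")
shared <- c("Polity", "GovStab", "lnGDPcap_host")  # illustrative covariates

# Two vertical slices, each keeping the identifiers and shared covariates
aid.data <- panel[, c(ids, shared, "lnAid")]
fdi.data <- panel[, c(ids, shared, "lnFDI")]

a.aid <- amelia(aid.data, m = 5, ts = "year", cs = "country")
a.fdi <- amelia(fdi.data, m = 5, ts = "year", cs = "country")

# Merge the i-th completed datasets back together on the identifiers
merged <- lapply(seq_len(5), function(i) {
  merge(a.aid$imputations[[i]],
        a.fdi$imputations[[i]][, c(ids, "lnFDI")],
        by = ids)
})
```

Pairing the i-th imputation from each slice keeps m = 5 completed datasets overall, but the lost cross-slice covariance means this is a speed hack, not a statistically equivalent procedure.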
Many thanks,
Mark
--
Mark S. Manger, PhD
Assistant Professor
Department of Political Science, McGill University
mark.manger(a)mcgill.ca
on leave 2007-08:
Advanced Research Fellow, Program on US-Japan Relations
Weatherhead Center for International Affairs
Harvard University
61 Kirkland Street, Room 301
Cambridge, MA 02138
617-495-5998
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive:
http://lists.gking.harvard.edu/?info=amelia