Hi,
I apologize in advance for the lengthy question, but it is representative
of many issues I face when working with large panels of economic data,
so I would be extremely grateful for your suggestions, best practices,
experiences, etc.
I'm wondering what I could do to speed up the imputation of my rather
large dataset (a panel of N = 2120 x T = 80 = 169,600 observations). At
this pace, my imputations would take months to run. Memory is not the
issue; rather, I think that I have too many priors and/or too many
missing values on certain variables. See below, especially lnAid and
lnFDI. Note that the missing values are concentrated at certain time
points (early in the series) rather than in specific cross-sectional
units.
Variable           |    Obs       Mean   Std. Dev.        Min        Max
-------------------+----------------------------------------------------
Polity             | 168160   .8924833   6.955011        -10         10
Corruptlvl         | 157820   5.441431   1.799652          0         10
RuleofLaw          | 157820   5.247434   2.204846          0         10
GovStab            | 157820   5.935454   2.064963          0         10
log of bilat. Aid  |  76079   1.919392   2.338255  -2.302585   9.692112
log of FDI in host |  32080   3.918487   2.928901  -2.372018   10.98025
Capital openness   | 155200  -.2888318   1.379179  -1.766966   2.602508
Polcon V           | 154320   .3490876   .3158385          0        .89
log of GDPcap_host | 154560    7.95649   1.053043   4.933741   10.48464
log of GDP_host    | 166480   29.62135    3.04193   22.97718   43.12974
log of GDP_home    | 147381    31.1144   2.128313   26.15253   37.36032
If I don't set range priors, I get nonsensical values for most of the
variables: negative GDP (real GDP, not negative log values), polity
scores out of range, etc. I haven't even tried higher-order polynomials
or interactions with cross-sectional units, although I would prefer to,
given that FDI exhibits a clear trend. Breaking up the dataset randomly
into pieces by cross-sections doesn't improve speed.
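As a sanity check on the out-of-range problem, the logical bounds can at
least be verified after each run. A minimal sketch (pandas, not Amelia
syntax; the variable names and toy values are just stand-ins matching the
table above):

```python
import pandas as pd

# Logical ranges from the codebooks (subset of the variables above).
BOUNDS = {
    "Polity": (-10, 10),
    "Corruptlvl": (0, 10),
    "RuleofLaw": (0, 10),
    "GovStab": (0, 10),
}

def out_of_range_counts(df: pd.DataFrame, bounds: dict) -> dict:
    """Count values outside each variable's logical range (NaNs are ignored)."""
    return {
        var: int(((df[var] < lo) | (df[var] > hi)).sum())
        for var, (lo, hi) in bounds.items()
        if var in df.columns
    }

# Toy "imputed" data: one Polity and one GovStab value out of range.
imputed = pd.DataFrame({"Polity": [3.0, -12.5, 7.1],
                        "GovStab": [5.0, 6.0, 11.2]})
print(out_of_range_counts(imputed, BOUNDS))  # {'Polity': 1, 'GovStab': 1}
```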
It seems that I have to make tradeoffs. What do you think would be the
best thing to do, i.e. what is the most time-consuming issue for the EM
algorithm?
1. Constrain/shorten the sample to get a higher proportion of observed
values on lnAid and lnFDI?
2. Accept imputations that are out of range (probably not)?
3. Break up the dataset "vertically" into one piece with the Aid
variable and one with the FDI variable, run two sets of imputations, and
merge them again?
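In outline, the vertical split-and-merge from the last option would look
something like this (a pandas sketch with toy data and hypothetical
column names; the actual imputation of each piece would still happen in
Amelia between the split and the merge):

```python
import pandas as pd

# Toy panel standing in for the real N = 2120 x T = 80 data.
panel = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year":    [1, 2, 3, 1, 2, 3],
    "lnGDP":   [29.1, 29.2, 29.3, 30.0, 30.1, 30.2],
    "lnAid":   [1.9, None, 2.1, None, 1.8, 2.0],
    "lnFDI":   [None, 3.9, 4.0, 3.5, None, 3.7],
})

keys = ["country", "year"]    # panel identifiers
shared = ["lnGDP"]            # covariates that go into both pieces

# "Vertical" split: each piece keeps the panel identifiers, the shared
# covariates, and only one of the sparse variables.
aid_piece = panel[keys + shared + ["lnAid"]].copy()
fdi_piece = panel[keys + shared + ["lnFDI"]].copy()

# ... impute each piece separately (e.g. in Amelia) ...

# Merge back on the panel identifiers; drop the duplicated covariates
# from one side so they appear only once.
merged = aid_piece.merge(fdi_piece.drop(columns=shared), on=keys,
                         validate="one_to_one")
```

One caveat I can see with this route: the imputation model for lnAid
would then never see lnFDI (and vice versa), so any covariance between
the two would be lost.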
Many thanks,
Mark
--
Mark S. Manger, PhD
Assistant Professor
Department of Political Science, McGill University
mark.manger(a)mcgill.ca
on leave 2007-08:
Advanced Research Fellow, Program on US-Japan Relations
Weatherhead Center for International Affairs
Harvard University
61 Kirkland Street, Room 301
Cambridge, MA 02138
617-495-5998
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive:
http://lists.gking.harvard.edu/?info=amelia