Hi,
I apologize in advance for the lengthy question, but it is representative
of many issues I face when working with large panels of economic data,
so I would be extremely grateful for your suggestions, best practices,
experiences, etc.
I'm wondering what I could do to speed up the imputation of my rather
large dataset (a panel of N = 2120 x T = 80 = 169,600 observations). At
this pace, my imputations would take months to run. Memory is not the
issue; rather, I think that I have too many priors and/or too many
missing values on certain variables. See below, especially lnAid and
lnFDI. Note that the missing values are concentrated at certain time
points (early in the series) rather than in specific cross-sectional
units.
Variable           |    Obs       Mean   Std. Dev.        Min        Max
-------------------+----------------------------------------------------
Polity             | 168160   .8924833   6.955011        -10         10
Corruptlvl         | 157820   5.441431   1.799652          0         10
RuleofLaw          | 157820   5.247434   2.204846          0         10
GovStab            | 157820   5.935454   2.064963          0         10
log of bilat. Aid  |  76079   1.919392   2.338255  -2.302585   9.692112
log of FDI in host |  32080   3.918487   2.928901  -2.372018   10.98025
Capital openness   | 155200  -.2888318   1.379179  -1.766966   2.602508
Polcon V           | 154320   .3490876   .3158385          0        .89
log of GDPcap_host | 154560    7.95649   1.053043   4.933741   10.48464
log of GDP_host    | 166480   29.62135    3.04193   22.97718   43.12974
log of GDP_home    | 147381    31.1144   2.128313   26.15253   37.36032
If I don't set range priors, I get nonsensical values for most of the
variables: negative GDP (real GDP, not negative log values), polity
scores out of range, etc. I haven't even tried higher-order polynomials
or interactions with cross-sectional units, although I would prefer to,
given that FDI exhibits a clear trend. Breaking up the dataset randomly
into pieces by cross-sections doesn't improve speed.
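As a sanity check on the out-of-range problem, the logical bounds can at
least be verified after each run. A minimal sketch (pandas, not Amelia
syntax; the variable names and toy values are just stand-ins matching the
table above):

```python
import pandas as pd

# Logical ranges from the codebooks (subset of the variables above).
BOUNDS = {
    "Polity": (-10, 10),
    "Corruptlvl": (0, 10),
    "RuleofLaw": (0, 10),
    "GovStab": (0, 10),
}

def out_of_range_counts(df: pd.DataFrame, bounds: dict) -> dict:
    """Count values outside each variable's logical range (NaNs are ignored)."""
    return {
        var: int(((df[var] < lo) | (df[var] > hi)).sum())
        for var, (lo, hi) in bounds.items()
        if var in df.columns
    }

# Toy "imputed" data: one Polity and one GovStab value out of range.
imputed = pd.DataFrame({"Polity": [3.0, -12.5, 7.1],
                        "GovStab": [5.0, 6.0, 11.2]})
print(out_of_range_counts(imputed, BOUNDS))  # {'Polity': 1, 'GovStab': 1}
```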
It seems that I have to make tradeoffs. What do you think would be the
best thing to do, i.e. what is the most time-consuming issue for the EM
algorithm?
1. Constrain/shorten the sample to get a higher proportion of observed
values on lnAid and lnFDI?
2. Accept imputations that are out of range (probably not)?
3. Break up the dataset "vertically" into one piece with the Aid
variable and one with the FDI variable, run two sets of imputations, and
merge them again?
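In outline, the vertical split-and-merge from the last option would look
something like this (a pandas sketch with toy data and hypothetical
column names; the actual imputation of each piece would still happen in
Amelia between the split and the merge):

```python
import pandas as pd

# Toy panel standing in for the real N = 2120 x T = 80 data.
panel = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year":    [1, 2, 3, 1, 2, 3],
    "lnGDP":   [29.1, 29.2, 29.3, 30.0, 30.1, 30.2],
    "lnAid":   [1.9, None, 2.1, None, 1.8, 2.0],
    "lnFDI":   [None, 3.9, 4.0, 3.5, None, 3.7],
})

keys = ["country", "year"]    # panel identifiers
shared = ["lnGDP"]            # covariates that go into both pieces

# "Vertical" split: each piece keeps the panel identifiers, the shared
# covariates, and only one of the sparse variables.
aid_piece = panel[keys + shared + ["lnAid"]].copy()
fdi_piece = panel[keys + shared + ["lnFDI"]].copy()

# ... impute each piece separately (e.g. in Amelia) ...

# Merge back on the panel identifiers; drop the duplicated covariates
# from one side so they appear only once.
merged = aid_piece.merge(fdi_piece.drop(columns=shared), on=keys,
                         validate="one_to_one")
```

One caveat I can see with this route: the imputation model for lnAid
would then never see lnFDI (and vice versa), so any covariance between
the two would be lost.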
Many thanks,
Mark
--
Mark S. Manger, PhD
Assistant Professor
Department of Political Science, McGill University
mark.manger(a)mcgill.ca
on leave 2007-08:
Advanced Research Fellow, Program on US-Japan Relations
Weatherhead Center for International Affairs
Harvard University
61 Kirkland Street, Room 301
Cambridge, MA 02138
617-495-5998
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive:
http://lists.gking.harvard.edu/?info=amelia