Mark,

Are you running Amelia on hmdc condor? I found running 50 separate
Amelia programs simultaneously on hmdc relatively quick and easy -- the first
5 imputed datasets were returned within two or three weeks, and after that
I got about 1 more condor egg ;) per day. I could start my data analysis
with the first five datasets and add the additional ones as they came in.

Anders
Note to the powers that be: it would be useful to set condor to give
streaming output. It is a little difficult to know what is going on with
Amelia (you can't take advantage of the verbose option) when this standard
condor feature isn't enabled. Thank you for such a great service, hmdc!
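A minimal sketch of this one-imputation-per-job pattern, assuming the R
version of Amelia; the data frame "mydata", the ts/cs column names, and the
JOB_ID environment variable are hypothetical stand-ins for whatever your
Condor submit file passes to each job:

    library(Amelia)

    ## each Condor job draws one completed dataset and saves it;
    ## the m result files are pooled afterwards for the analysis
    job.id <- as.integer(Sys.getenv("JOB_ID"))  # e.g. Condor's $(Process)
    set.seed(1000 + job.id)                     # distinct seed per job

    a.out <- amelia(mydata, m = 1, ts = "year", cs = "country")
    save(a.out, file = paste("imputation-", job.id, ".RData", sep = ""))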
On Fri, 14 Mar 2008, Mark Manger wrote:
Hi,
I apologize in advance for the lengthy question, but it's representative
of many issues I face when working with large panels of economic data, so
I would be extremely grateful for your suggestions, best practices,
experiences, etc.
I'm wondering what I could do to speed up the imputation of my rather
large dataset (a panel of N = 2120 x T = 80 = 169,600 obs). At this pace, my
imputations would run for months. Memory is not the issue; rather, I think
I have too many priors and/or too many missing values on certain variables.
See below, especially lnAid and lnFDI. Note that the missing values are
concentrated at certain time points (early in the panel) rather than in
specific cross-sectional units.
Variable           |    Obs        Mean   Std. Dev.         Min         Max
-------------------+-------------------------------------------------------
Polity             | 168160    .8924833    6.955011         -10          10
Corruptlvl         | 157820    5.441431    1.799652           0          10
RuleofLaw          | 157820    5.247434    2.204846           0          10
GovStab            | 157820    5.935454    2.064963           0          10
log of bilat. Aid  |  76079    1.919392    2.338255   -2.302585    9.692112
log of FDI in host |  32080    3.918487    2.928901   -2.372018    10.98025
Capital openness   | 155200   -.2888318    1.379179   -1.766966    2.602508
Polcon V           | 154320    .3490876    .3158385           0         .89
log of GDPcap_host | 154560     7.95649    1.053043    4.933741    10.48464
log of GDP_host    | 166480    29.62135     3.04193    22.97718    43.12974
log of GDP_home    | 147381     31.1144    2.128313    26.15253    37.36032
If I don't set range priors, I get nonsensical values for most of the
variables: negative GDP (real GDP, not negative log values), polity scores
out of range, etc. I haven't even tried higher-order polynomials or
interactions with cross-sectional units, although I would prefer to, given
that FDI exhibits a clear trend. Breaking up the dataset randomly into
pieces by cross-sections doesn't improve speed.
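For concreteness, one way to express such range restrictions in the R
Amelia II interface is, as far as I can tell, the bounds argument, a
three-column matrix of (column index, lower, upper); the indices below are
hypothetical and would have to match the actual data frame. Out-of-range
draws are resampled, which itself adds running time:

    library(Amelia)

    ## hypothetical column indices; logical ranges from the summary above
    bds <- rbind(c(2, -10,  10),   # Polity runs from -10 to 10
                 c(3,   0,  10),   # Corruptlvl runs from 0 to 10
                 c(8,   0, .89))   # Polcon V runs from 0 to .89

    a.out <- amelia(mydata, m = 5, ts = "year", cs = "country", bounds = bds)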
It seems that I have to make tradeoffs. What do you think would be the
best thing to do, i.e. what is the most time-consuming issue for the EM
algorithm?

1. Constrain/shorten the sample to have a higher proportion of observed
   values on lnAid and lnFDI?
2. Accept imputations that are out of range (probably not)?
3. Break up the dataset "vertically" into one with the Aid variable and
   one with the FDI variable, run two sets of imputations, and merge them
   again (see the sketch below)?
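In case it clarifies what I mean by option 3, a minimal sketch with
hypothetical variable names, assuming the completed datasets sit in the
imputations component of the amelia output as in recent versions of the
package:

    library(Amelia)

    id.vars <- c("country", "year")
    aid.set <- c(id.vars, "Polity", "Corruptlvl", "lnGDPcap", "lnAid")
    fdi.set <- c(id.vars, "Polity", "Corruptlvl", "lnGDPcap", "lnFDI")

    a.aid <- amelia(mydata[, aid.set], m = 5, ts = "year", cs = "country")
    a.fdi <- amelia(mydata[, fdi.set], m = 5, ts = "year", cs = "country")

    ## merge the i-th completed dataset from each run on the ids, keeping
    ## only lnFDI from the second run so the shared covariates (imputed
    ## separately in each run) are not duplicated
    completed <- lapply(1:5, function(i)
      merge(a.aid$imputations[[i]],
            a.fdi$imputations[[i]][, c(id.vars, "lnFDI")],
            by = id.vars))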
Many thanks,
Mark
--
Mark S. Manger, PhD
Assistant Professor
Department of Political Science, McGill University
mark.manger@mcgill.ca
on leave 2007-08:
Advanced Research Fellow, Program on US-Japan Relations
Weatherhead Center for International Affairs
Harvard University
61 Kirkland Street, Room 301
Cambridge, MA 02138
617-495-5998
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive:
http://lists.gking.harvard.edu/?info=amelia