Isaac,
In a time-series cross-sectional setting, I might suggest 5% (to 10%) of the n in each
cross-section (which is typically going to be smaller than 1% of the total n). So in
your series of 21 observations per cross-section, an empri = 1 (or 2) should again aid
stability without shrinking the coefficients significantly. This advice comes from a mix
of intuition, exploration, and experience with use cases, but of course it could really
vary in some settings.
Off list I got some follow-up email about my earlier note, which made it clear that I
wasn't very clear. The "tolerance" argument I also suggested adjusting
changes how the EM algorithm judges whether it has converged. This is a separate thing
you might adjust in addition to the empirical/ridge prior. Larger values mean that
the model parameters (on z-transformed data) can change more between EM steps and
still be considered converged.
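Concretely, both arguments are passed straight to the amelia() call; a minimal sketch for your 585-unit by 21-time-point setting (the data frame and the "unit"/"year" column names here are hypothetical placeholders):

```r
library(Amelia)

# Hypothetical TSCS data frame: 585 cross-sectional units x 21 time points.
# empri = 1 is roughly 5% of the 21 observations within each cross-section;
# tolerance is loosened from the package default of 0.0001.
a.out <- amelia(mydata, m = 5,
                cs = "unit",        # cross-section identifier column
                ts = "year",        # time index column
                empri = 1,          # small empirical/ridge prior
                tolerance = 0.001)  # looser EM convergence threshold
```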
James
--
James Honaker, Senior Research Scientist
//// Institute for Quantitative Social Science, Harvard University
-----Original message-----
From: Isaac Petersen <dadrivr@gmail.com>
To: "Honaker, James" <jhonaker@iq.harvard.edu>
Cc: Amelia Mailing List <amelia@lists.gking.harvard.edu>
Sent: Tue, Jul 2, 2013 21:13:19 GMT+00:00
Subject: Re: [amelia] Is single imputation faster in parallel? Need help speeding up
imputation.
Thanks, James. Your response was very helpful. Just to clarify on the ridge prior:
My matrix to be imputed is 12,285 rows by 62 columns, composed of 585 cross-sectional
units and 21 time points. Would a good ridge prior be 1 percent of 21 (where 21 is
the number of rows, i.e., time points, within each cross-sectional unit)?
Thanks for clarifying.
-Isaac
On Tue, Jul 2, 2013 at 10:41 AM, Honaker, James <jhonaker@iq.harvard.edu> wrote:
Isaac,
In addition to the newer "multicore" abilities you mention, a small empirical
prior will speed up convergence. The "empri" argument sets an empirical/ridge
prior. A value of half to 1 percent of the sample size would be small, aid numerical
stability, and be unlikely to noticeably change results (unless you are using time-series
cross-sectional data, in which case you might use 1 percent of the sample within any
cross-sectional unit).
The "tolerance" argument changes the point at which the EM algorithm is judged to have
converged, and setting it larger (like .001, or even .005) is probably quite safe. We
were very conservative with this tolerance choice, and should reexamine other options to
set it dynamically.
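For reference, both knobs appear directly in the amelia() call; a hedged sketch for a plain cross-sectional data frame (the name "mydata" is a placeholder):

```r
library(Amelia)

# Hypothetical data frame with n rows and no time-series structure.
n <- nrow(mydata)
a.out <- amelia(mydata, m = 5,
                empri = 0.01 * n,   # ridge prior at ~1% of the sample size
                tolerance = 0.005)  # looser EM convergence threshold
```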
Best,
James.
--
James Honaker, Senior Research Scientist
//// Institute for Quantitative Social Science, Harvard University
________________________________
From: amelia-bounces@lists.gking.harvard.edu
[amelia-bounces@lists.gking.harvard.edu]
on behalf of Isaac Petersen [dadrivr@gmail.com]
Sent: Tuesday, July 02, 2013 9:55 AM
To: Amelia Mailing List
Subject: [amelia] Is single imputation faster in parallel? Need help speeding up
imputation.
I'm looking to speed up the run time of a single imputation on a large data set with
repeated measures that takes many hours. Will running the imputation in parallel with the
parallel="multicore" option and 6 cores speed up the run time of a single
imputation, or will it only speed up the run time of multiple imputations (by running them
simultaneously)? What are my best options for making the single imputation run faster
while minimizing any sacrifices in imputation accuracy?
Many thanks!
-Isaac