Re: [amelia] Three questions - Amelia

31 May 2020

Hi Matthew, a few notes below...
Gary
--
*Gary King* - Albert J. Weatherhead III University Professor - Director,
IQSS <http://iq.harvard.edu/> - Harvard University
GaryKing.org <http://garyking.org/> - King(a)Harvard.edu - @KingGary
<https://twitter.com/kinggary> - 617-500-7570 - Assistant
<king-assist(a)iq.harvard.edu>: 617-495-9271

On Mon, May 25, 2020 at 7:29 PM Matthew Simonson <
simonson.m(a)northeastern.edu> wrote:

...
  Hello Amelia Team,

 Three questions about a dataset I'm working with.

 1) My dataset consists of a 10-wave survey in which some survey questions
 were only asked in the final 2 waves. As a block matrix it looks like this

 A M
 B C

 where all values in block M are structurally missing and not of interest.
 I want to impute the missing values in blocks A, B, and C.  I could
 either a) run amelia on the full dataset, b) split it into two datasets,
 one with A and B, the other with only C, or c) split it into two
 overlapping datasets, one with A and B, the other with B and C. I'm
 hesitant to use the full dataset because including block M increases the
 missingness from 25% to 55%, but I don't know if there are theoretical
 objections to the other two approaches. What do you suggest?

There isn't a single correct answer here, but your suggestions are reasonable. I'd
mainly just make sure that the imputation model fits the data; that will
enable you to pick an approach. (You might also have a look at this somewhat
related paper <https://gking.harvard.edu/files/abs/not-abs.shtml>,
but I'd still use Amelia rather than coding it up separately.)

...

 2) In my analysis, I examine the interaction between treatment and
 ideology. I run treatment*(ideology>4) in one regression, but I also try other
 models with ideology>5, 6, etc. to see if the cutoff makes a difference.
 For Amelia, would it be sufficient to include a continuous
 treatment*ideology column in my data, or do I need to dichotomize this
 column in Amelia as well (and hence run Amelia multiple times, once for
 each cutoff)?

You definitely want to include at least as much information in the
imputation model as in your analysis model.
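
As an illustration of this point (a sketch, not code from the thread): one
reading of the advice is that every interaction term used in an analysis model
should also exist as a column in the data handed to the imputation. The
function name, the column names, and the toy rows below are all hypothetical,
and the call to Amelia itself is not shown.

```python
def add_cutoff_interactions(rows, cutoffs):
    """Return copies of rows with one treatment*(ideology > c) column per cutoff.

    A derived value is left as None (missing) whenever treatment or
    ideology is missing, so the imputation step can fill it in later.
    """
    out = []
    for row in rows:
        new = dict(row)
        t, ideo = row["treatment"], row["ideology"]
        for c in cutoffs:
            key = f"treat_x_ideo_gt{c}"
            if t is None or ideo is None:
                new[key] = None
            else:
                new[key] = t * int(ideo > c)
        out.append(new)
    return out


# Hypothetical toy rows; None marks a missing value.
survey = [
    {"treatment": 1, "ideology": 6},
    {"treatment": 1, "ideology": 3},
    {"treatment": 0, "ideology": 7},
    {"treatment": None, "ideology": 5},
]
augmented = add_cutoff_interactions(survey, cutoffs=[4, 5, 6])
for row in augmented:
    print(row)
```

Building all cutoff columns once would allow a single imputation run to serve
every analysis model, but whether that is preferable to separate runs is a
judgment call that the reply above does not settle.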

...

 3) In order to get bootstrapped confidence intervals in my analysis, I
 bootstrap the original data 1000 times and then run Amelia (with m=5) on
 each bootstrapped dataset before analyzing. Although the original dataset
 works just fine, the bootstrapped versions throw errors about 1/3 of the
 time: first a few hundred "chol(): given matrix is not symmetric" warnings
 followed by an "inv_sympd(): matrix is singular or not positive definite"
 error. Usually all 5 imputations fail for a given bootstrapped data set,
 but sometimes only some of them do. Suggestions?

I'd hunt this down. I'm guessing that you have included some dummy
variables that don't exclude the baseline (so they all sum to 1), or
something close to that. You probably have either too small an n or
perfect collinearity. I'd track this down since it might be a data error
that affects everything else too.
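
A minimal sketch of how one might start tracking this down (an illustration,
not code from the thread): check whether a set of dummy columns sums to 1 in
every row, and count how often a bootstrap resample leaves some column with no
variation at all. The helper names, column names, and toy data are
hypothetical, and the data are assumed to be complete for simplicity.

```python
import random


def resample(rows, rng):
    """Draw a bootstrap sample: len(rows) rows, with replacement."""
    return [rng.choice(rows) for _ in rows]


def constant_columns(rows, names):
    """Names of columns that take a single value in this sample."""
    return [name for j, name in enumerate(names)
            if len({row[j] for row in rows}) == 1]


def dummies_sum_to_one(rows, names, dummy_names):
    """True if the listed dummies add up to 1 in every row,
    i.e. no baseline category was dropped."""
    idx = [names.index(d) for d in dummy_names]
    return all(sum(row[j] for j in idx) == 1 for row in rows)


# Hypothetical toy data: three party dummies with no baseline dropped,
# plus a flag that equals 1 for only one respondent.
names = ["party_dem", "party_rep", "party_ind", "rare_flag"]
rows = ([(1, 0, 0, 0)] * 4 + [(0, 1, 0, 0)] * 4
        + [(0, 0, 1, 0)] * 3 + [(0, 0, 1, 1)])

rng = random.Random(2020)
n_boot = 1000
degenerate = sum(
    1 for _ in range(n_boot)
    if constant_columns(resample(rows, rng), names)
)

print("dummies sum to 1:", dummies_sum_to_one(rows, names, names[:3]))
print(f"{degenerate} of {n_boot} resamples contain a constant column")
```

A column that is fine in the original data can become constant in a resample
when a category is rare, which is one way the original dataset could run
cleanly while some bootstrapped versions fail. Whether that is what happens in
this dataset would need to be checked against the real columns.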

Best of luck with your work.

Gary

...

 Thanks,

 Matthew Simonson
 Doctoral Student, Northeastern University, Boston
 * The COVID-19 Consortium for Understanding the Public’s Policy
 Preferences Across States *
 *Research Areas: Networks, Civil Wars, COVID-19, Causal Inference*
 *www.msimonson.com <http://www.msimonson.com/>*
 *www.covidstates.org <http://www.covidstates.org>*