Hi Matthew, a few notes below...
Gary
--
*Gary King* - Albert J. Weatherhead III University Professor - Director,
IQSS <http://iq.harvard.edu/> - Harvard University
GaryKing.org <http://garyking.org/> - King(a)Harvard.edu - @KingGary
<https://twitter.com/kinggary> - 617-500-7570 - Assistant
<king-assist(a)iq.harvard.edu>: 617-495-9271
On Mon, May 25, 2020 at 7:29 PM Matthew Simonson <
simonson.m(a)northeastern.edu> wrote:
Hello Amelia Team,
Three questions about a dataset I'm working with.
1) My dataset consists of a 10-wave survey in which some survey questions
were only asked in the final 2 waves. As a block matrix it looks like this:
A M
B C
where all values in block M are structurally missing and not of interest.
I want to impute the missing values in blocks A, B, and C. I could
either a) run Amelia on the full dataset, b) split it into two datasets,
one with A and B, the other with only C, or c) split it into two
overlapping datasets, one with A and B, the other with B and C. I'm
hesitant to use the full dataset because including block M increases the
missingness from 25% to 55%, but I don't know if there are theoretical
objections to the other two approaches. What do you suggest?
There isn't a correct answer here, but your suggestions are reasonable. I'd
mainly just make sure that the imputation model fits the data; that will
enable you to pick an approach. (You might also have a look at this
somewhat related paper <https://gking.harvard.edu/files/abs/not-abs.shtml>,
but I'd still use Amelia rather than coding it up separately.)
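To see why including block M inflates the missingness, here is a toy version of the layout in Python/NumPy (Amelia itself is an R package, and nothing here calls it). The sizes and the 20% nonresponse rate are hypothetical; the sketch only builds a missingness mask with M structurally missing and reports the share missing in each candidate dataset from options a) to c). Whichever split you choose, Amelia's own diagnostics (for example `overimpute()`) are one way to check the fit described above.

```python
import numpy as np

# Toy layout (hypothetical sizes): rows = respondent-waves, columns = survey items.
# Top rows are waves 1-8, bottom rows are waves 9-10; left columns were asked in
# every wave, right columns only in waves 9-10.
N_TOP, N_BOT, P_LEFT, P_RIGHT = 800, 200, 6, 4

rng = np.random.default_rng(0)
# True = missing. Assume 20% item nonresponse at random in the observed blocks.
missing = rng.random((N_TOP + N_BOT, P_LEFT + P_RIGHT)) < 0.20
# Block M (top rows x right columns) is structurally missing: never asked.
missing[:N_TOP, P_LEFT:] = True

# Candidate datasets for options a) to c).
options = {
    "(a) full [A M; B C]": missing,
    "(b)/(c) [A; B]": missing[:, :P_LEFT],
    "(b) C only": missing[N_TOP:, P_LEFT:],
    "(c) [B C]": missing[N_TOP:, :],
}

for label, mask in options.items():
    print(f"{label:22s} share missing = {mask.mean():.2f}")
```

In this toy, M is 3,200 of 10,000 cells, so option a) lands at roughly 0.32 + 0.68 × 0.20 ≈ 0.46 missing in expectation, versus about 0.20 for the split datasets: the same kind of jump as 25% to 55%.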
2) In my analysis, I examine the interaction between treatment and
ideology. I run treatment*(ideology>4) in one regression, but I also try other
models with ideology>5, 6, etc. to see if the cutoff makes a difference.
For Amelia, would it be sufficient to include a continuous
treatment*ideology column in my data, or do I need to dichotomize this
column in Amelia as well (and hence run Amelia multiple times, once for
each cutoff)?
You definitely want to include at least as much information in the
imputation model as in your analysis model.
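One reading of that advice: the continuous treatment*ideology product alone does not carry the dichotomized interactions the analysis uses, so each cutoff's product could be added to the imputation data as its own column (possibly in a single run rather than one run per cutoff, which is an assumption, not something stated above). The Python/NumPy sketch below builds those columns; `add_interactions` and the column names are hypothetical. It keeps a row missing when either input is missing, because `NaN > 4` evaluates to False and would otherwise silently turn a missing ideology into a 0.

```python
import numpy as np

def add_interactions(treatment, ideology, cutoffs):
    """Return interaction columns (hypothetical names) for the imputation data.

    Rows where treatment or ideology is NaN stay NaN, so the imputation
    model sees them as missing rather than as zero.
    """
    # Continuous product: NaN propagates on its own.
    cols = {"treat_x_ideology": treatment * ideology}
    missing = np.isnan(treatment) | np.isnan(ideology)
    for k in cutoffs:
        # (NaN > k) is False, so restore NaN explicitly for missing rows.
        dich = treatment * (ideology > k)
        cols[f"treat_x_ideo_gt{k}"] = np.where(missing, np.nan, dich)
    return cols

# Usage with made-up values.
treatment = np.array([1.0, 0.0, 1.0, np.nan])
ideology = np.array([5.0, 6.0, np.nan, 7.0])
cols = add_interactions(treatment, ideology, cutoffs=[4, 5, 6])
for name, values in cols.items():
    print(name, values)
```

One caution if ideology is integer-valued: ideology equals its minimum plus the sum of the indicators 1(ideology > k) over every possible cutoff, so including treatment, the continuous product, and indicators for all cutoffs makes the columns exactly collinear. Including only the cutoffs actually analyzed avoids that.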
3) In order to get bootstrapped confidence intervals in my analysis, I
bootstrap the original data 1000 times and then run Amelia (with m=5) on
each bootstrapped dataset before analyzing. Although the original dataset
works just fine, the bootstrapped versions throw errors about 1/3 of the
time: first a few hundred "chol(): given matrix is not symmetric" warnings
followed by an "inv_sympd(): matrix is singular or not positive definite"
error. Usually all 5 imputations fail for a given bootstrapped data set,
but sometimes only some of them do. Suggestions?
I'd hunt this down. I'm guessing that you have included some dummy
variables that don't exclude the baseline (so they all sum to 1), or
something close to that. You probably have either too small an n or
perfect collinearity. It's worth tracking down, since it might be a data
error that affects everything else too.
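A quick screen for the problems listed above can be run on each bootstrapped dataset before imputing. The Python/NumPy sketch below is not Amelia's own check (Amelia is R); it looks only at complete rows and flags constant columns and a rank-deficient covariance. `check_design` and the variable names are hypothetical. It also illustrates one mechanism consistent with failures in about a third of resamples: a column that is nonzero in a single row is constant in about (1 - 1/n)^n ≈ 37% of bootstrap resamples. That is offered as a possibility to check, not a diagnosis.

```python
import numpy as np

def check_design(X, names):
    """List reasons the complete-case covariance of X would be singular."""
    complete = X[~np.isnan(X).any(axis=1)]
    if complete.shape[0] < 2:
        return [f"only {complete.shape[0]} complete rows"]
    issues = []
    for j, name in enumerate(names):
        # Zero variance, e.g. a rare dummy that is all 0 in a resample.
        if np.ptp(complete[:, j]) == 0:
            issues.append(f"constant column: {name}")
    # Exact linear dependence, e.g. a full set of dummies summing to 1,
    # or fewer complete rows than columns.
    centered = complete - complete.mean(axis=0)
    rank = np.linalg.matrix_rank(centered)
    if rank < X.shape[1]:
        issues.append(f"covariance rank {rank} < {X.shape[1]} columns")
    return issues

# 1) Dummy-variable trap: one dummy per category, so every row sums to 1.
region = np.arange(30) % 3
trap = np.column_stack([(region == k).astype(float) for k in range(3)])
print(check_design(trap, ["north", "south", "west"]))

# 2) Rare dummy: positive in exactly one of n = 50 rows.
n = 50
age = 20.0 + np.arange(n)
rare = np.zeros(n)
rare[0] = 1.0
X = np.column_stack([age, rare])
print(check_design(X, ["age", "rare"]))  # prints [] (original data pass)

# Screen bootstrap resamples the way they would be drawn before imputing.
rng = np.random.default_rng(1)
B = 2000
failed = sum(
    bool(check_design(X[rng.integers(0, n, n)], ["age", "rare"])) for _ in range(B)
)
print(f"share of resamples flagged: {failed / B:.3f}")
print(f"chance a resample misses the one positive row: {(1 - 1 / n) ** n:.3f}")
```

Resamples flagged this way could be inspected (or the rare category merged) before running Amelia on them, rather than discovering the problem through the chol()/inv_sympd() errors.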
Best of luck with your work.
Gary
Thanks,
Matthew Simonson
Doctoral Student, Northeastern University, Boston
*The COVID-19 Consortium for Understanding the Public’s Policy
Preferences Across States*
*Research Areas: Networks, Civil Wars, COVID-19, Causal Inference*
*www.msimonson.com <http://www.msimonson.com/>*
*www.covidstates.org <http://www.covidstates.org>*