Hello Amelia Team,
I have three questions about a dataset I'm working with.
1) My dataset consists of a 10-wave survey in which some survey questions were only asked in the final 2 waves. As a block matrix, it looks like this:
A M
B C
where all values in block M are structurally missing and not of interest. I want to impute the missing values in blocks A, B, and C. I could either
a) run Amelia on the full dataset,
b) split it into two datasets, one with A and B, the other with only C, or
c) split it into two overlapping datasets, one with A and B, the other with B and C.
I'm hesitant to use the full dataset because including block M increases the missingness from 25% to 55%, but I don't know if there are theoretical objections to the other two approaches. What do you suggest?
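For reference, the arithmetic behind those two figures can be sketched as below. This is only an illustration in Python (not Amelia code), and it assumes that the 25% figure is the missing share of cells in A, B, and C taken together and that every cell in M is missing. The function names are hypothetical.

```python
# Hypothetical illustration: what share of the full matrix block M must occupy
# for overall missingness to rise from 25% to 55%.
# Assumptions: 25% is the missing share of cells in A, B, and C combined,
# and every cell in M is missing.

def overall_missingness(share_m: float, rate_abc: float) -> float:
    """Missing share of the full matrix when block M is entirely missing."""
    return share_m * 1.0 + (1.0 - share_m) * rate_abc

def implied_share_m(rate_abc: float, rate_full: float) -> float:
    """Solve overall_missingness(share_m, rate_abc) == rate_full for share_m."""
    return (rate_full - rate_abc) / (1.0 - rate_abc)

share_m = implied_share_m(0.25, 0.55)
print(round(share_m, 3))                             # share of cells in M
print(round(overall_missingness(share_m, 0.25), 3))  # recovers the full-data rate
```

Under those assumptions, block M would account for roughly 40% of all cells in the full dataset.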
2) In my analysis, I examine the interaction between treatment and ideology. I run treatment*(ideology>4) in one regression, but I also try other models with ideology>5, 6, etc. to see if the cutoff makes a difference. For Amelia, would it be sufficient to include a continuous treatment*ideology column in my data, or do I need to dichotomize this column in Amelia as well (and hence run Amelia multiple times, once for each cutoff)?
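To make the two options concrete, the columns in question could be built roughly as follows. This is a minimal sketch in Python for illustration only. The toy rows, the binary coding of treatment, the 1-7 ideology scale, and the column names are all assumptions rather than details of the actual dataset.

```python
# Hypothetical toy rows of (treatment, ideology).
# Assumptions: treatment is coded 0/1 and ideology is on a 1-7 scale.
rows = [(0, 2), (1, 3), (1, 5), (0, 6), (1, 7)]

def interaction_columns(rows, cutoffs):
    """Build the continuous interaction plus one dichotomized interaction per cutoff."""
    out = []
    for treatment, ideology in rows:
        record = {"treat_x_ideology": treatment * ideology}
        for c in cutoffs:
            # Mirrors treatment*(ideology>c) from the regression formula.
            record[f"treat_x_ideology_gt{c}"] = treatment * int(ideology > c)
        out.append(record)
    return out

columns = interaction_columns(rows, cutoffs=[4, 5, 6])
for record in columns:
    print(record)
```

The first key corresponds to the single continuous column; the remaining keys correspond to the cutoff-specific columns, each of which would require its own imputation run under the second option.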
3) In order to get bootstrapped confidence intervals in my analysis, I bootstrap the original data 1000 times and then run Amelia (with m=5) on each bootstrapped dataset before analyzing. Although the original dataset works just fine, the bootstrapped versions throw errors about 1/3 of the time: first a few hundred "chol(): given matrix is not symmetric" warnings, followed by an "inv_sympd(): matrix is singular or not positive definite" error. Usually all 5 imputations fail for a given bootstrapped dataset, but sometimes only some of them do. Suggestions?
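One possible diagnostic, offered only as a sketch and not as a confirmed cause of the errors, is to check whether a resample leaves some column with no variation among its observed values, since a constant column makes a covariance matrix singular. The example below is plain Python on made-up toy data; `bootstrap_rows` and `degenerate_columns` are hypothetical helper names, and `None` stands in for a missing value.

```python
import random

def bootstrap_rows(data, rng):
    """Resample rows with replacement, keeping the original number of rows."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

def degenerate_columns(sample):
    """Indices of columns whose observed (non-None) values are absent or all identical."""
    bad = []
    for j in range(len(sample[0])):
        observed = {row[j] for row in sample if row[j] is not None}
        if len(observed) <= 1:
            bad.append(j)
    return bad

# Made-up toy data; the last column is nearly constant on purpose.
data = [
    (1.0, 2.0, 0),
    (2.0, None, 0),
    (3.0, 1.5, 0),
    (4.0, 3.0, 1),
    (None, 2.5, 0),
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
flagged = 0
for _ in range(1000):
    sample = bootstrap_rows(data, rng)
    if degenerate_columns(sample):
        flagged += 1
print(flagged, "of 1000 resamples contain a degenerate column")
```

A check like this could be run on each bootstrapped dataset before imputation to see whether the failing resamples are the ones with rare categories or near-constant columns.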
Thanks,
Matthew Simonson
Doctoral Student,
Northeastern University, Boston
The COVID-19 Consortium for Understanding the Public’s Policy Preferences Across States
Research Areas: Networks, Civil Wars, COVID-19, Causal Inference
www.covidstates.org