1) My dataset consists of a 10-wave survey in which some survey questions were only asked in the final 2 waves. As a block matrix it looks like this

where all values in block M are structurally missing and not of interest. I want to impute the missing values in blocks A, B, and C. I could a) run Amelia on the full dataset, b) split it into two datasets, one with A and B and the other with only C, or c) split it into two overlapping datasets, one with A and B and the other with B and C. I'm hesitant to use the full dataset because including block M raises the share of missing cells from 25% to 55%, but I don't know whether there are theoretical objections to the other two approaches. What do you suggest?
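To make the arithmetic behind those two percentages explicit, here is a small sketch in Python (used only for illustration; Amelia itself is an R package). It assumes that 25% is the share of missing cells within A, B, and C combined, that 55% is the share over the full matrix, and that every cell in M counts as missing. The function name is hypothetical.

```python
def structural_share(miss_without_m, miss_with_m):
    """Fraction f of all cells that lie in block M.

    Solves miss_with_m = miss_without_m * (1 - f) + 1.0 * f for f,
    i.e. M is treated as 100% missing (an assumption).
    """
    return (miss_with_m - miss_without_m) / (1.0 - miss_without_m)


share_m = structural_share(0.25, 0.55)
print(round(share_m, 3))  # 0.4
```

Under those assumptions, block M makes up about 40% of the cells in the full matrix.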

2) In my analysis, I examine the interaction between treatment and ideology. I run treatment*(ideology>4) in one regression, but I also fit other models with ideology>5, ideology>6, etc. to see whether the cutoff makes a difference. For Amelia, would it be sufficient to include a continuous treatment*ideology column in my data, or do I need to dichotomize this column for Amelia as well (and hence run Amelia multiple times, once for each cutoff)?
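To be concrete about the two alternatives, here is a minimal sketch of the columns I have in mind, written in Python purely for illustration. The data values and the function name are made up, and missing entries are represented as None.

```python
def interaction_columns(treatment, ideology, cutoffs):
    """Build treatment * 1(ideology > cutoff) for each cutoff.

    A cell stays missing (None) whenever either input is missing.
    """
    columns = {}
    for cutoff in cutoffs:
        column = []
        for t, x in zip(treatment, ideology):
            if t is None or x is None:
                column.append(None)
            else:
                column.append(t * int(x > cutoff))
        columns[cutoff] = column
    return columns


# Hypothetical toy data: binary treatment, ideology on an ordinal scale.
treatment = [1, 0, 1, 1, 0, 1]
ideology = [7, 6, 5, None, 3, 6]

# One dichotomized interaction per cutoff (one imputation run each) ...
dichotomized = interaction_columns(treatment, ideology, cutoffs=[4, 5, 6])

# ... versus a single continuous interaction column (one imputation run).
continuous = [
    None if t is None or x is None else t * x
    for t, x in zip(treatment, ideology)
]
```

The question is whether the single continuous column is enough for the imputation model, or whether each dichotomized column calls for its own run.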

3) To get bootstrapped confidence intervals in my analysis, I bootstrap the original data 1000 times and then run Amelia (with m=5) on each bootstrapped dataset before analyzing it. Although Amelia runs without problems on the original dataset, the bootstrapped versions throw errors about 1/3 of the time: first a few hundred "chol(): given matrix is not symmetric" warnings, followed by an "inv_sympd(): matrix is singular or not positive definite" error. Usually all 5 imputations fail for a given bootstrapped dataset, but sometimes only some of them do. Suggestions?
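In case the structure of the procedure matters, here is a rough sketch of the loop, in Python for illustration only. The imputer below is a deterministic column-mean placeholder, not Amelia and not a real multiple-imputation method; it stands in for the call that fails so that the sketch can log which replicates error out. All names and the toy data are hypothetical.

```python
import random


def mean_impute(rows, m):
    """Placeholder imputer: fill each missing cell with its column mean.

    Returns m identical completed copies. Raises ValueError when a column
    has no observed values in the resample.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        means.append(sum(observed) / len(observed))
    completed = [
        [means[j] if v is None else v for j, v in enumerate(r)] for r in rows
    ]
    return [completed for _ in range(m)]


def bootstrap_then_impute(rows, n_boot, m, impute, seed=0):
    """Resample rows with replacement, impute each resample, log failures."""
    rng = random.Random(seed)
    log = []
    for b in range(n_boot):
        sample = [rows[rng.randrange(len(rows))] for _ in range(len(rows))]
        try:
            imputations = impute(sample, m)
            log.append({"replicate": b, "n_ok": len(imputations), "error": None})
        except Exception as exc:  # record the failure instead of stopping
            log.append({"replicate": b, "n_ok": 0, "error": str(exc)})
    return log


# Hypothetical toy data with missing cells marked as None.
data = [[1.0, 2.0], [None, 3.0], [4.0, None], [5.0, 6.0], [None, None]]
log = bootstrap_then_impute(data, n_boot=1000, m=5, impute=mean_impute)
failed = [entry for entry in log if entry["error"] is not None]
print(len(log), len(failed))
```

Keeping a per-replicate log like this is how I can tell that most failing replicates lose all 5 imputations rather than just some.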

Matthew Simonson

Doctoral Student, Northeastern University, Boston

The COVID-19 Consortium for Understanding the Public’s Policy Preferences Across States

Research Areas: Networks, Civil Wars, COVID-19, Causal Inference