Amelia May 2020

amelia@lists.gking.harvard.edu

3 participants
3 discussions

by Gary King

Hi Matthew, a few notes below...

Gary
--
*Gary King* - Albert J. Weatherhead III University Professor - Director, IQSS <http://iq.harvard.edu/> - Harvard University
GaryKing.org <http://garyking.org/> - King(a)Harvard.edu - @KingGary <https://twitter.com/kinggary> - 617-500-7570 - Assistant <king-assist(a)iq.harvard.edu>: 617-495-9271

On Mon, May 25, 2020 at 7:29 PM Matthew Simonson <simonson.m(a)northeastern.edu> wrote:

> Hello Amelia Team,
>
> Three questions about a dataset I'm working with.
>
> 1) My dataset consists of a 10-wave survey in which some survey questions
> were only asked in the final 2 waves. As a block matrix it looks like this
>
>     A M
>     B C
>
> where all values in block M are structurally missing and not of interest.
> I want to impute the missing values in blocks A, B, and C. I could
> either a) run amelia on the full dataset, b) split it into two datasets,
> one with A and B, the other with only C, or c) split it into two
> overlapping datasets, one with A and B, the other with B and C. I'm
> hesitant to use the full dataset because including block M increases the
> missingness from 25% to 55%, but I don't know if there are theoretical
> objections to the other two approaches. What do you suggest?

There isn't a correct answer here, but your suggestions are reasonable. I'd mainly just make sure that the imputation model fits the data; that will enable you to pick an approach. (You might also have a look at this <https://gking.harvard.edu/files/abs/not-abs.shtml> somewhat related paper, but I'd still use Amelia rather than coding it up separately.)

> 2) In my analysis, I examine the interaction between treatment and
> ideology. I run treatment*(ideology>4) in one regression, but I also try other
> models with ideology>5, 6, etc. to see if the cutoff makes a difference.
> For Amelia, would it be sufficient to include a continuous
> treatment*ideology column in my data, or do I need to dichotomize this
> column in Amelia as well (and hence run Amelia multiple times, once for
> each cutoff)?

You definitely want to include at least as much information in the imputation model as in your analysis model.

> 3) In order to get bootstrapped confidence intervals in my analysis, I
> bootstrap the original data 1000 times and then run Amelia (with m=5) on
> each bootstrapped dataset before analyzing. Although the original dataset
> works just fine, the bootstrapped versions throw errors about 1/3 of the
> time: first a few hundred "chol(): given matrix is not symmetric" warnings
> followed by a "inv_sympd(): matrix is singular or not positive definite"
> error. Usually all 5 imputations fail for a given bootstrapped data set,
> but sometimes only some of them do. Suggestions?

I'd hunt this down. I'm guessing that you have included some dummy variables that don't exclude the baseline (so they all sum to 1), or something close to that. You probably have either too small an n or perfect collinearity. I'd track this down since it might be a data error that affects everything else too.

Best of luck with your work.

Gary

> Thanks,
>
> Matthew Simonson
> Doctoral Student, Northeastern University, Boston
> *The COVID-19 Consortium for Understanding the Public’s Policy Preferences Across States*
> *Research Areas: Networks, Civil Wars, COVID-19, Causal Inference*
> *www.msimonson.com <http://www.msimonson.com/>*
> *www.covidstates.org <http://www.covidstates.org>*
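One way to "hunt this down", as suggested in the reply to question 3, is to check each bootstrap resample for constant columns and for a rank-deficient (singular) covariance matrix before handing it to the imputation step. The sketch below is only an illustration in plain Python, not part of Amelia: the function names (`constant_columns`, `matrix_rank`, `diagnose`), the toy data, and the column names `d_a`, `d_b`, `d_c`, `x` are all hypothetical. It also assumes, without confirmation from the thread, that one source of the intermittent failures could be a rare category dropping out of some resamples.

```python
def constant_columns(rows, names):
    """Names of columns that take a single value across all rows."""
    return [name for j, name in enumerate(names)
            if len({row[j] for row in rows}) <= 1]


def centered(rows):
    """Subtract each column's mean, as a covariance computation would."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - mu for x, mu in zip(row, means)] for row in rows]


def matrix_rank(rows, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [[float(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


def diagnose(rows, names):
    """Report constant columns and whether the covariance matrix is singular."""
    rank = matrix_rank(centered(rows))
    return {"constant": constant_columns(rows, names),
            "rank": rank,
            "singular": rank < len(names)}


# Hypothetical toy data: one categorical variable (category "c" is rare)
# and one continuous variable.
category = ["a", "a", "b", "b", "a", "c", "b", "a"]
x = [1.0, 2.5, 0.3, 4.1, 3.3, 2.2, 5.0, 0.7]

# Full dummy set: d_a + d_b + d_c == 1 in every row.
full = [[int(c == "a"), int(c == "b"), int(c == "c"), xi]
        for c, xi in zip(category, x)]
# Baseline "a" dropped.
dropped = [row[1:] for row in full]
# A resample (indices chosen by hand) that happens to miss the only "c" row.
resample = [dropped[i] for i in [0, 1, 2, 3, 4, 6, 7, 0]]

report_full = diagnose(full, ["d_a", "d_b", "d_c", "x"])
report_dropped = diagnose(dropped, ["d_b", "d_c", "x"])
report_resample = diagnose(resample, ["d_b", "d_c", "x"])
print(report_full)
print(report_dropped)
print(report_resample)
```

Running a check like this on the resamples that fail, and comparing them with the ones that succeed, may show whether the failures line up with a particular column or category.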

3 years, 10 months

Create composite before or after imputation

by Brandon Mccormick

Hi All,

For my analysis, I have a number of event count variables (22) I want to add up to make a composite. My original plan was to square-root transform the counts during imputation, then add them up to make the composite after imputation. But alternatively, I could add them up prior to imputation, and then root transform them during imputation. Which is better? (Sorry if it’s a bad question; I have little experience with MI.)

--
Brandon McCormick
Doctoral Student
Clinical Psychology - Psychology and Law
The University of Alabama <https://www.ua.edu/>
101 McMillan
Tuscaloosa, AL 35401
Phone 205-460-8678
bfmccormick(a)crimson.ua.edu
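The two orderings described in this question are not interchangeable, which a small sketch can make concrete. The Python below is illustrative only: the function names and the three-item toy rows are hypothetical (the post mentions 22 count variables), and it does not say which ordering is preferable. It shows two things: summing square roots differs from taking the square root of a sum, and a composite built before imputation is missing whenever any single item is missing.

```python
import math


def composite_of_roots(row):
    """Transform each count, then add: sum of sqrt(count)."""
    return sum(math.sqrt(c) for c in row)


def root_of_composite(row):
    """Add the counts first, then transform: sqrt of the sum."""
    return math.sqrt(sum(row))


def presummed(row):
    """Composite built before imputation: missing if any item is missing."""
    if any(c is None for c in row):
        return None
    return sum(row)


complete_row = [0, 1, 4]      # hypothetical counts for one respondent
partial_row = [1, None, 2]    # the same items with one value missing

print(composite_of_roots(complete_row))
print(root_of_composite(complete_row))
print(presummed(partial_row))
```

With item-level imputation the observed items in `partial_row` still inform the imputed value, whereas the pre-summed composite discards them for that respondent; whether that matters depends on how much item-level missingness the data have.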

3 years, 10 months

Three questions

by Matthew Simonson

Hello Amelia Team,

Three questions about a dataset I'm working with.

1) My dataset consists of a 10-wave survey in which some survey questions were only asked in the final 2 waves. As a block matrix it looks like this

    A M
    B C

where all values in block M are structurally missing and not of interest. I want to impute the missing values in blocks A, B, and C. I could either a) run amelia on the full dataset, b) split it into two datasets, one with A and B, the other with only C, or c) split it into two overlapping datasets, one with A and B, the other with B and C. I'm hesitant to use the full dataset because including block M increases the missingness from 25% to 55%, but I don't know if there are theoretical objections to the other two approaches. What do you suggest?

2) In my analysis, I examine the interaction between treatment and ideology. I run treatment*(ideology>4) in one regression, but I also try other models with ideology>5, 6, etc. to see if the cutoff makes a difference. For Amelia, would it be sufficient to include a continuous treatment*ideology column in my data, or do I need to dichotomize this column in Amelia as well (and hence run Amelia multiple times, once for each cutoff)?

3) In order to get bootstrapped confidence intervals in my analysis, I bootstrap the original data 1000 times and then run Amelia (with m=5) on each bootstrapped dataset before analyzing. Although the original dataset works just fine, the bootstrapped versions throw errors about 1/3 of the time: first a few hundred "chol(): given matrix is not symmetric" warnings followed by a "inv_sympd(): matrix is singular or not positive definite" error. Usually all 5 imputations fail for a given bootstrapped data set, but sometimes only some of them do. Suggestions?

Thanks,

Matthew Simonson
Doctoral Student, Northeastern University, Boston
The COVID-19 Consortium for Understanding the Public’s Policy Preferences Across States
Research Areas: Networks, Civil Wars, COVID-19, Causal Inference
www.msimonson.com <http://www.msimonson.com/>
www.covidstates.org

3 years, 11 months
