Hi Stuart,
Ah, `combine.output()` actually just takes multiple Amelia runs that were
done separately and combines them into one object, as if you had run them
all together. This is helpful if you want to run additional imputations
after a first batch. Here is some code that will take Amelia output and
create a stacked data frame of all imputations with a column for the
imputation number:
library(Amelia)
data(africa)
imps <- 5
a.out <- amelia(africa, cs = "country", ts = "year", m = imps)
# a.out$imputations is a list of data frames, one per imputation
stacked_df <- do.call(rbind, a.out$imputations)
# label each block of rows with its imputation number
stacked_df$imp_number <- rep(1:imps, each = nrow(africa))
Having said all of that, you probably don't want to do this. Instead, you
probably want to apply your analysis model to each of the imputed data sets
and then combine the coefficients/model parameters using the Rubin rules
described in the various Amelia papers.
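For reference, the combining step can be sketched in base R. This is a
minimal example with made-up estimates and standard errors (in practice,
Amelia's mi.meld() does the same pooling for you):

```r
# Rubin's rules for combining estimates across m imputations.
# q: vector of point estimates, se: their standard errors.
rubin_combine <- function(q, se) {
  m <- length(q)
  qbar  <- mean(q)                # pooled point estimate
  ubar  <- mean(se^2)             # average within-imputation variance
  b     <- var(q)                 # between-imputation variance
  t_var <- ubar + (1 + 1/m) * b   # total variance
  c(estimate = qbar, se = sqrt(t_var))
}

# Made-up coefficients from m = 5 imputed analyses:
q  <- c(0.52, 0.48, 0.55, 0.50, 0.49)
se <- c(0.10, 0.11, 0.10, 0.12, 0.10)
rubin_combine(q, se)
```

The pooled point estimate is just the mean across imputations; the pooled
variance adds the between-imputation spread to the average
within-imputation variance.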
Cheers,
Matt
~~~~~~~~~~~
Matthew Blackwell
Associate Professor of Government
Harvard University
url: http://www.mattblackwell.org
On Fri, Oct 9, 2020 at 10:33 PM <stuart.reece(a)bigpond.com> wrote:
> Hi Amelia Users.
>
> I am running a Windows computer: i9-9900K CPU at 3.6GHz, 64GB RAM, 64-bit
> system.
>
> I have RStudio 1.3.1093 based on R 4.0.2, just re-installed today.
>
> Amelia works on my data and runs models very nicely. The parallel
> routines work really well, which I very much appreciate on my 16 CPUs.
>
> However, I use complex geospatial models and would love to model the
> complete imputed geospatial data in R::splm.
>
> So combining all the imputations into one data frame would be a fantastic
> help.
>
> I think combine.output should do this very nicely.
>
> I think the syntax for combine.output is probably like that of ameliabind
> (really simple), but I can't find the syntax online.
>
> But whenever I run combine.output, with whatever syntax, I always get the
> same error message, which reads:
>
> CombAmelia1616 <- Amelia::combine.output(a.r.CS.Raw.ETOPFA.LIR.02.16,
> a.r.CS.Raw.ETOPFA.LIR.02.16e)
>
> Error: 'combine.output' is not an exported object from 'namespace:Amelia'
>
> I was wondering if something is wrong?
>
> Also, could someone please confirm that the correct syntax for
> combine.output is the same as for ameliabind?
>
> Thanks so much,
>
> Stuart Reece.
>
Hi Nick, the way to think about it is that if you omit a variable, you're
implicitly including the variable and restricting its coefficient (in any
regression) to 0, unless that relationship is picked up by other variables
you do include.
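A tiny simulated illustration of that point (hypothetical data; the zero
offset makes the restriction explicit):

```r
# Simulated data: y depends on both x and z.
set.seed(1)
x <- rnorm(200); z <- rnorm(200)
y <- x + 2 * z + rnorm(200)

# Omitting z ...
b_omit <- coef(lm(y ~ x))["x"]
# ... gives exactly the same fit as including z with its coefficient
# restricted to 0 (here, via an offset of 0 * z).
b_restrict <- coef(lm(y ~ x + offset(0 * z)))["x"]
all.equal(b_omit, b_restrict)  # TRUE
```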
Gary
--
*Gary King* - Albert J. Weatherhead III University Professor - Director,
IQSS <http://iq.harvard.edu/> - Harvard University
GaryKing.org <http://garyking.org/> - King(a)Harvard.edu - @KingGary
<https://twitter.com/kinggary> - 617-500-7570 - Assistant
<king-assist(a)iq.harvard.edu>: 617-495-9271
On Mon, Jul 13, 2020 at 7:28 PM Nick Eubank <nick(a)nickeubank.com> wrote:
> Hi All,
>
> I'm working with panel data (state-years) and a difference-in-difference
> design. I'm looking to fill some values in the dependent variable time
> series.
>
> My main specification is thus just two-way fixed effects (state FEs and
> year FEs) and a treatment variable.
>
> I've gotten the best results modeling the missing data by running a local
> polynomial regression (a la Honaker and King 2010) for each state, and then
> using those predicted values as the predictor in Amelia. (i.e., I run a
> local polynomial of my DV against time for state S, fill those values in to
> my predictor variable for state S, and repeat for all states.)
>
> Everything I've read says I should definitely include all variables I plan
> to use in my analysis in Amelia, but I worry about failing to meet
> multi-variate normality conditions with the FEs, and running the model with
> and without them, I'm not sure they're adding much.
>
> Do I *need* to include all those FEs (i.e. will I introduce some weird
> bias in my subsequent analysis if I don't)? And if I do, is there anything
> I can do to deal with them definitely not being multi-variate normal (or do
> I not need to worry about that)?
>
> Thanks!
>
> Nick
>
Dear all,
I have installed AmeliaView, but I am hitting a bug. When I start it, a
terminal window appears and closes quickly. I tried running it as
administrator, and it showed an error dialog saying "Can not find script
file C:\WINDOWS\System32\amelia.vbs." I tried copying that file from
AmeliaView's folder to System32, and it returned another error message:
"Script: C:\WINDOWS\System32\amelia.vbs
"Script: C:\WINDOWS\System32\amelia.vbs
Line: 5
Char: 1
Error: File not found
Code: 800A0035
Source: Microsoft VBScript runtime error"
Could someone help me to solve this?
Thanks in advance!
Gustavo.
--
*|MSc. Gustavo Simões Libardi - "Napister"*
*|*Biologist and Master of Sciences (Universidade de São Paulo/Brazil)
*|*CV Lattes: http://lattes.cnpq.br/8451514538020691
|CRBio: 72563/01-D
Hi Matthew, a few notes below...
Gary
--
*Gary King* - Albert J. Weatherhead III University Professor - Director,
IQSS <http://iq.harvard.edu/> - Harvard University
GaryKing.org <http://garyking.org/> - King(a)Harvard.edu - @KingGary
<https://twitter.com/kinggary> - 617-500-7570 - Assistant
<king-assist(a)iq.harvard.edu>: 617-495-9271
On Mon, May 25, 2020 at 7:29 PM Matthew Simonson <
simonson.m(a)northeastern.edu> wrote:
> Hello Amelia Team,
>
> Three questions about a dataset I'm working with.
>
> 1) My dataset consists of a 10-wave survey in which some survey questions
> were only asked in the final 2 waves. As a block matrix it looks like this
>
> A M
> B C
>
> where all values in block M are structurally missing and not of interest.
> I want to impute the missing values in blocks A, B, and C. I could
> either a) run amelia on the full dataset, b) split it into two datasets,
> one with A and B, the other with only C, or c) split it into two
> overlapping datasets, one with A and B, the other with B and C. I'm
> hesitant to use the full dataset because including block M increases the
> missingness from 25% to 55%, but I don't know if there are theoretical
> objections to the other two approaches. What do you suggest?
>
There isn't a correct answer here, but your suggestions are reasonable. I'd
mainly just make sure that the imputation model fits the data; that will
enable you to pick an approach. (You might also have a look at this
<https://gking.harvard.edu/files/abs/not-abs.shtml> somewhat related paper,
but I'd still use Amelia rather than coding it up separately.)
>
> 2) In my analysis, I examine the interaction between treatment and
> ideology. I run treatment*(ideology>4) in one regression, but I also try other
> models with ideology>5, 6, etc. to see if the cutoff makes a difference.
> For Amelia, would it be sufficient to include a continuous
> treatment*ideology column in my data, or do I need to dichotomize this
> column in Amelia as well (and hence run Amelia multiple times, once for
> each cutoff)?
>
You definitely want to include at least as much information in the
imputation model as in your analysis model.
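For example (a hypothetical sketch with invented column names), one could
add both the continuous interaction and each dichotomized version used in
an analysis model to the data frame before calling amelia:

```r
# Toy data; column names are invented for illustration.
df <- data.frame(treatment = c(0, 1, 1, 0, 1),
                 ideology  = c(3, 6, 5, 2, NA))

# Continuous interaction for the imputation model:
df$treat_x_ideo <- df$treatment * df$ideology
# Dichotomized versions matching each analysis-model cutoff:
df$treat_x_gt4 <- df$treatment * as.numeric(df$ideology > 4)
df$treat_x_gt5 <- df$treatment * as.numeric(df$ideology > 5)
```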
>
>
> 3) In order to get bootstrapped confidence intervals in my analysis, I
> bootstrap the original data 1000 times and then run Amelia (with m=5) on
> each bootstrapped dataset before analyzing. Although the original dataset
> works just fine, the bootstrapped versions throw errors about 1/3 of the
> time: first a few hundred "chol(): given matrix is not symmetric" warnings
> followed by a "inv_sympd(): matrix is singular or not positive definite"
> error. Usually all 5 imputations fail for a given bootstrapped data set,
> but sometimes only some of them do. Suggestions?
>
I'd hunt this down. I'm guessing that you have included some dummy
variables that don't exclude the baseline (so they all sum to 1), or
something close to that. You probably have either too small an n or
perfect collinearity. I'd track this down since it might be a data error
that affects everything else too.
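One quick way to check a (bootstrapped) design matrix for that kind of rank
deficiency is a QR rank test; here is a small self-contained sketch where
dummies for all factor levels are included alongside an intercept:

```r
# Toy example: an intercept plus dummies for ALL factor levels.
df <- data.frame(g = factor(c("a", "b", "c", "a", "b")))
X <- cbind(intercept = 1, model.matrix(~ g - 1, data = df))

# The level dummies sum to the intercept column, so X is rank deficient:
qr(X)$rank < ncol(X)  # TRUE signals perfect collinearity
```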
Best of luck with your work.
Gary
>
> Thanks,
>
> Matthew Simonson
> Doctoral Student, Northeastern University, Boston
> * The COVID-19 Consortium for Understanding the Public’s Policy
> Preferences Across States *
> *Research Areas: Networks, Civil Wars, COVID-19, Causal Inference*
> *www.msimonson.com <http://www.msimonson.com/>*
> *www.covidstates.org <http://www.covidstates.org>*
>
>
>
Hi All,
For my analysis, I have a number of event count variables (22) I want to
add up to make a composite.
My original plan was to square-root transform the counts during imputation
and then add them up to make the composite after imputation. But
alternatively, I could add them up prior to imputation and then square-root
transform the sum during imputation.
Which is better? (Sorry if it's a bad question; I have little experience
with MI.)
--
Brandon McCormick
Doctoral Student
Clinical Psychology - Psychology and Law
The University of Alabama <https://www.ua.edu/>
101 McMillan
Tuscaloosa, AL 35401
Phone 205-460-8678
bfmccormick(a)crimson.ua.edu
Hi Matt,
Thanks again for your help. I am running into another issue with Amelia
imputation and I hope you can point me in the right direction.
I am using Amelia to impute historical CCP party membership and
population at the county level. I have used the following Amelia
imputation command, which specifies the TSCS structure and adds lags,
leads, and squared terms to improve the imputation results.
a.out <- amelia(data, m = 10,
                idvars = c("province", "prefecture", "county", "prov_id",
                           "pref_id", "sgn_base2", "js_base2", "jcj_base2",
                           "base1", "base2"),
                ts = "year", cs = "county_id", empri = 100,
                lags  = c("population", "ccp_member"),
                leads = c("population", "ccp_member"),
                logs  = c("population", "ccp_member", "population_sq",
                          "ccp_member_sq", "ccp_branch", "ccp_comission",
                          "ccp_branch_sq", "ccp_comission_sq"),
                polytime = 2, incheck = TRUE, max.resample = 1000,
                tolerance = 0.01)
However, the imputed data have very high variance across years, which
defies the patterns we observe in the non-missing data. For example, the
original party membership (with missing values) and the imputed values for
one county look like the following:
Year   Original   Imputed
1938   75         75
1939   588        588
1940              25.15989
1941              29.70148
1942              17.14454
1943              5.593282
1944              35.50387
1945              288.5483
1946              248.8093
1947              82.34441
1948              124.5035
1949   358        358
What worries me is that the imputed party membership jumps up and down so
much between 1940 and 1948. Having seen other counties with complete
time-trend data, I don't think the imputed values in this case are in any
way close to reality. What can I do to smooth the time trend of the
imputed values? Any suggestions would be much appreciated.
Meanwhile, I also have an unrelated question: how can I analyze only a
subset of the imputed data? The "subset" command seems to be incompatible
with the imputed datasets, as I cannot use the following command:
"a.out.1945 <- subset(a.out$imputations$year<=1945)"
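For context, a.out$imputations is a list of data frames, so the condition
has to be applied within each element of the list; a base-R sketch with
stand-in data:

```r
# Stand-in for a.out$imputations (hypothetical values):
imputations <- list(imp1 = data.frame(year = 1938:1949, ccp_member = 1:12),
                    imp2 = data.frame(year = 1938:1949, ccp_member = 13:24))

# Subset each imputed data frame, keeping the list structure for pooling:
imps_1945 <- lapply(imputations, function(d) d[d$year <= 1945, ])
```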
Thanks a ton in advance!
Best,
Xiaobo
--
Xiaobo Lü
Associate Professor
Department of Government
University of Texas at Austin
Tel: (512) 232-7257
Fax: (512) 471-1061
Website: www.xiaobolu.com
Hi there,
This may sound like a stupid question, so I hope someone can help me out.
I have created five imputed datasets using Amelia II in R, and I am using
Zelig to analyze the data. I would like to create a new variable based on
the imputed variables in these imputed datasets. How can I do this, given
that the data live inside "a.out"?
Thanks a ton in advance!
Best,
Xiaobo
When exactly one case is missing in a data column, plot.amelia includes
that column among those to be plotted. However, if compare=TRUE (the
default), the call to compare.density() fails with the fatal error message
"need at least 2 points to select a bandwidth automatically". Now that I
know, I can work around it; but perhaps either this case could be avoided,
or Amelia could provide its own, more informative message?
Cheers, Paul.