Hello,
Can anyone tell me how to get and report bivariate correlations in
multiply imputed datasets created with Amelia? Should I calculate the
correlations between the variables of interest for each of the MI
datasets and then average them? Is there a way to do this with "miest"
(the program I am using to combine the imputations)? And what about
measures of goodness of fit?
I have used miest, but it doesn't work with some models, nor with
matching. Is there an alternative, not overly complex way to combine
the MI datasets for analysis?
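For concreteness, here is my current understanding of how Rubin's combining rules would apply to a correlation, sketched in Python rather than Stata (the correlations and sample size are made up, and I gather Fisher's z-transform is the usual scale on which to pool; please correct me if this is off):

```python
import math

def fisher_z(r):
    # Fisher's z-transform roughly stabilizes the variance of a correlation
    return 0.5 * math.log((1 + r) / (1 - r))

def combine_correlations(rs, n):
    # rs: correlation estimates from each of the m imputed datasets
    # n: sample size (the variance of z is approximately 1/(n - 3))
    m = len(rs)
    zs = [fisher_z(r) for r in rs]
    zbar = sum(zs) / m                       # Rubin's rule: average the estimates
    within = 1.0 / (n - 3)                   # within-imputation variance of z
    between = sum((z - zbar) ** 2 for z in zs) / (m - 1)
    total = within + (1 + 1 / m) * between   # total variance (Rubin 1987)
    return math.tanh(zbar), total            # back-transform the point estimate

# hypothetical correlations from m = 5 imputed datasets, n = 700
r_combined, var_z = combine_correlations([0.42, 0.45, 0.40, 0.44, 0.43], n=700)
```

If this is right, simply averaging the raw correlations would give roughly the same point estimate, but the z-scale is what makes the combined variance sensible.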
Thanks again!!
Carlos
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
More info about Amelia: http://gking.harvard.edu/amelia
Good evening,
I am new to Amelia and have been reading the related papers (King et al.
2001, Honaker and King 2008, and Honaker and King 2007) and the manual, but
I still have three doubts, and I would appreciate your help, or perhaps you
could suggest further readings on how to deal with these practical issues.
*1)* In the APSR article by King et al (2001) it says: "The prediction
required is not causal [...]. To an extent then, the analyst, rather than
the world that generates the data, controls the degree to which the MAR
assumption fits. It can be made to fit the data by including more variables
in the imputation process to predict the pattern of missingness."
*Should this be taken to mean that the more variables included in the
imputation model (even if they are not very relevant to the estimation
model), the better? If so, I assume that such variables should have no
missing observations (otherwise it would be demanding more from Amelia in
terms of imputation rather than improving the imputation). Is this right?*
At some other point in the text, the authors hint (or so I interpreted)
that "overcontrolling" is not a problem in the imputation model. However, I
have TSCS data (700 obs.) and *when I ran Amelia II, I had to drop many
variables (even some interaction terms that are relevant for my analysis
model) as well as the country dummies, because I kept getting an error
message reading, more or less, "the number of parameters to estimate is too
large relative to observations", and another message about
multicollinearity*. So I ended up with fewer variables in my imputation
model than in my analysis model, which leads to my second question:
*2)* On p. 57 the authors mention that the imputation model should contain
at least as much information as the analysis model. Does this mean that *if
I am using fixed effects in the analysis model, the country dummies cannot
be excluded from the imputation model?*
*3)* Finally, I have read about the *polynomials of time* option in the
manual, but I am a little confused. Is the polynomials-of-time option
related to the *lags*? If I tick "1" in the polynomials box, would this be
equivalent to L1, and if I tick "2", equivalent to L2, etc.? Is it used for
the lags only when the polynomials are interacted with the cross-section?
Or are they completely different options? If the polynomials of time refer
to "*trends*", I am wondering whether it would be redundant to include in
the imputation model a variable that belongs in my analysis model and which
is a time-trend variable.
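To make my confusion concrete, here is how I currently picture the difference between time polynomials (trends) and lags, in a little Python sketch (purely illustrative toy data, not Amelia's actual code):

```python
def time_polynomials(times, degree):
    # Columns t, t^2, ..., t^degree: these model smooth trends over time
    # (in Amelia they can be interacted with the cross-section), not lags.
    return [[t ** d for d in range(1, degree + 1)] for t in times]

def lag(series, k):
    # A k-period lag shifts the series back; the first k entries are
    # missing (None) because no earlier observations exist.
    return [None] * k + series[:-k]

y = [10, 12, 15, 14, 18]                             # a toy time series
trend = time_polynomials([1, 2, 3, 4, 5], degree=2)  # t and t^2 columns
lag1 = lag(y, 1)                                     # the series shifted by one period
```

So on my reading, ticking "2" in the polynomials box adds t and t^2 as trend terms, which is a different thing from including L1 or L2 of a variable, but I would welcome a correction.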
Sorry about so many questions, but I have been trying to solve these
issues on my own and with the readings, and I am unsure whether what I am
doing is OK.
Thank you so much.
Sincerely,
Helena Brown
Hello,
What would be evidence that the MAR assumption holds? Could I, for
example, run a regression in which the dependent variable is a dummy
for "missing obs." and the independent variables are the other
covariates in the analysis model? Or would it be enough to show
relatively strong bivariate correlations between missingness and the
control variables?
I guess a very strong correlation between missingness on a variable and a
dummy for time period or country would pose a problem for multiple
imputation, because it may be a sign of truncation or systematic
missingness; then MAR would not hold, right?
Are there any formal tests for showing that the data are not missing
completely at random (MCAR) but rather MAR?
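To illustrate what I have in mind, here is a toy Python sketch of the indicator check (entirely made-up data, and I realize MAR itself cannot be verified directly; at best one can find evidence against MCAR):

```python
import math

def missing_indicator(xs):
    # 1 where the value is missing, 0 where it is observed
    return [1 if x is None else 0 for x in xs]

def corr(a, b):
    # Pearson correlation; with a 0/1 indicator this is the
    # point-biserial correlation between missingness and a covariate
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

y = [3.1, None, 2.8, None, 4.0, 3.5]  # variable with missing values
z = [10, 20, 11, 22, 9, 12]           # fully observed covariate
r = corr(missing_indicator(y), z)     # strong r suggests missingness depends on z
```

In this toy example the missingness lines up with high values of z, which (as I understand it) is consistent with MAR given z, but would be evidence against MCAR.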
THANKS TO ALL and have a nice day,
Carlos
Good evening,
Could anyone please tell me the best way to get lagged variables with multiply imputed data in Amelia II?
Should I do the multiple imputation and then have Stata generate the lagged variables in each of the imputed datasets?
Should I first generate the lagged variables (even where data are missing) and then do the imputation?
Or should I get Amelia II to create the lagged variables at the same time it imputes the missing values? I have tried using the "lag" option under TSCS in Amelia II, but no lagged variables were created, so I'm not sure how it works.
Thanks for your help, and sorry about the two postings tonight.
best regards,
carola
Hello,
I'm using site dummies in lieu of fixed effects, with the "miest" command
in Stata. When I do this, I get a "conformability" error. I am pretty
sure it is a consequence of different sites being dropped from different
imputed datasets when I run the logit model. For example, the imputation
process generated a situation in which site A is dropped from the
analysis in imputed dataset 1 because there is no variation on the
dependent variable for site A in dataset 1. However, for dataset 2, the
imputation process generated variation in the dependent variable for
site A, so site A is not dropped from that analysis. Then, when "miest"
attempts to combine the coefficients and standard errors from the
imputed datasets, it is doing so with different matrices and running
into problems.
Since I don't really care about the coefficients for the site dummies,
can I just run the regressions one at a time on each of the imputed
datasets and then manually combine the coefficients on the variables
with which I am concerned? I understand it is necessary to sum the
within and between variance to get proper standard errors; my question
is whether the fact that different sites will be dropped from different
analyses invalidates the combining of the coefficients and their
standard errors.
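For what it's worth, my understanding of the manual combining step I'm proposing is the following (a Python sketch with made-up numbers, not miest's actual code):

```python
import math

def pool(estimates, variances):
    # Rubin's rules for one coefficient estimated in each of m imputed
    # datasets: average the estimates, then combine within- and
    # between-imputation variance for the pooled standard error.
    m = len(estimates)
    qbar = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - qbar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return qbar, math.sqrt(total)  # pooled coefficient and standard error

# hypothetical logit coefficients and SEs for one substantive variable,
# taken from m = 5 separate runs on the imputed datasets
coefs = [0.51, 0.48, 0.55, 0.50, 0.53]
ses = [0.10, 0.11, 0.09, 0.10, 0.12]
beta, se = pool(coefs, [s ** 2 for s in ses])
```

My question is whether applying this variable by variable to the coefficients I care about remains valid when the set of retained site dummies differs across the imputed datasets.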
Thanks to anyone who can help.
Steve Shewfelt
PhD Candidate
Department of Political Science
Yale University