Hello,
My question is not about how Amelia works, but about when I should use it in my
analysis.
I am analyzing survey data, and some of the variables I intend to analyze are
derived from others. An example: I want to create a
single variable (say, an index) of political engagement using information
from variables about different political activities -- let's suppose I want
to create an additive index of political activism by summing five variables on
political activities, where 1 = the respondent took part in the activity and 0
= s/he didn't. However, I have missing cases in the five variables I
would like to sum. Let's also suppose I have no prior justification for
attributing either 0 or 1 to the missing cases. As a result, when I add up these
five variables, I will also have some missing cases in my index.
To eliminate these missing cases, how should I proceed? Should
I impute data in my original variables and then create my index (adding up
the already "complete" cases)? Or should I create the index from the variables
with missing cases and impute data afterwards (that is, impute data in the
final index)?
I am using an additive index as an example, but I believe this question may
also apply to other techniques, such as factor analysis and so on. Should
missing data be imputed before or after this kind of data processing?
Thanks,
Fabricio Fialho
-- web: sites.google.com/site/fabriciofialho/
Hi,
I would like to ask a question about a dataset of survey data in which
each person has an id. Different objects were evaluated with the same
set of categorical variables, so a person occurs in the data multiple times
and several rows share the same id. Should I set this variable as ids =
"id" or cs = "id"? Or do you think it is OK to treat this id as a normal
variable, i.e., not to make any special settings? Thanks.
Best regards,
Tomas
I often work with survey datasets for which n<p(p+3)/2 but where the
number of variables with missing data is a lot less than n. My
intuition tells me that in these cases, there is no reason to impute
from a model with p(p+3)/2 parameters. It seems excessive and
requires that we fiddle with dropping variables. I know we could
ridge the covariance matrix, but I'd like to bracket that for now to
consider the question of whether we really need those p(p+3)/2
parameters in the first place.
As an example, suppose we have data with 4 variables, A,B,C, and D.
We have complete data for A and B. C and D exhibit arbitrary
missingness (maybe monotone, maybe not). In this case, a full joint
normal approach such as Amelia's algorithm would estimate parameters
for a 4-dimensional normal distribution---that is, 4 means and 10
variance-covariance terms, or 14 parameters. Suppose, as an alternative, we impute
using a bivariate normal model, where C and D are modeled as bivariate
normal, with means that are regressions onto A and B, and conditional
on A and B, C and D covary according to a nonzero Cov(C,D). This
distribution is thus characterized by 4 regression coefficients,
Var(C), Var(D), and Cov(C,D)---that is, 7 parameters (i.e., half as
many as in the full joint model). Is there any reason that the
second, bivariate, approach wouldn't be preferred? (If so, the
obvious follow-up is, why are we trying to build more model than we
need?)
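The parameter counts in the question are just arithmetic on the formulas given, and can be checked directly; the function names below are purely illustrative:

```python
def full_joint_params(p):
    """Parameters of a p-variate normal: p means plus p(p+1)/2 unique
    (co)variance terms, which simplifies to p(p+3)/2."""
    return p * (p + 3) // 2

def conditional_params(n_outcomes, n_predictors):
    """Parameters of the conditional model sketched above: a regression
    coefficient for each outcome on each predictor, plus the unique
    (co)variance terms of the conditional outcome distribution."""
    coefs = n_outcomes * n_predictors
    cov_terms = n_outcomes * (n_outcomes + 1) // 2
    return coefs + cov_terms

print(full_joint_params(4))      # 14: 4 means + 10 (co)variance terms
print(conditional_params(2, 2))  # 7: 4 coefficients + Var(C), Var(D), Cov(C,D)
```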
Thanks for any illumination!
Cyrus
--
Cyrus Samii
Political Science
Columbia University
cds81(a)columbia.edu
Burundi Survey: www.columbia.edu/~cds81/burundisurvey/
ISERP Statistical Consulting:
www.iserp.columbia.edu/services/statistical_consulting.html
Comparative Political Economy Blog: cpecolumbia.blogspot.com
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
More info about Amelia: http://gking.harvard.edu/amelia
Hi
I am new to multiple imputation, so my question is probably very basic. But still: so far I have been unable to find any examples showing
how to report results based on, say, 5 imputed datasets. How should one report the results?
Should I *simply* take the average of each regression coefficient/s.e. across all imputed datasets?
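For reference, the standard combining procedure (Rubin's rules) does average the point estimates across the imputed datasets, but it does not simply average the standard errors: the combined variance adds the between-imputation variance to the average within-imputation variance. A minimal sketch, with made-up numbers:

```python
import math

def rubin_combine(estimates, std_errors):
    """Combine m point estimates and standard errors via Rubin's rules.

    Point estimate: the average across imputations.
    Variance: within-imputation variance W plus (1 + 1/m) times the
    between-imputation variance B, so the combined s.e. is NOT just
    the average of the per-dataset standard errors.
    """
    m = len(estimates)
    q_bar = sum(estimates) / m                     # combined point estimate
    w = sum(se ** 2 for se in std_errors) / m      # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation
    total_var = w + (1 + 1 / m) * b
    return q_bar, math.sqrt(total_var)

# Invented coefficients and s.e.s from 5 imputed datasets, for illustration.
coef, se = rubin_combine([0.52, 0.48, 0.50, 0.55, 0.45],
                         [0.10, 0.11, 0.10, 0.12, 0.10])
```

Note that the combined s.e. here comes out larger than any individual s.e., reflecting the extra uncertainty from the imputations disagreeing with each other.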
Many thanks
Jan
On 09/02/2009 12:20 AM, Donald Braman wrote:
> I can't seem to subscribe to the MatchIt list, so I'm posting here in
> hopes that the same people read both.
>
> I'm working with multiply imputed data from Amelia & wondering if any
> progress has been made at integrating Amelia, MatchIt, and Zelig -- in
> particular, pre-processing multiply imputed data and preparing them
> for use in Zelig.
>
> Not at all pressing, and (again) apologies for posting here rather
> than the MatchIt list.
>
> Cheers, Don
At the moment, MatchIt doesn't allow missing data on input (unless you
temporarily code it as observed and exact match on the missing values).
In part, this is because most matching algorithms don't have procedures
to deal with missingness. An exception is CEM, which has a procedure for
multiply imputed data, but to access that you'll need to use the (R or
Stata) version of CEM directly (see http://gking.harvard.edu/cem)
instead of the one in MatchIt.
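A minimal sketch of the workaround mentioned above (recode missing values as an observed category and exact match on them), in Python for illustration only, since MatchIt and CEM are R/Stata packages; the units and covariates below are made up:

```python
from collections import defaultdict

MISSING = "NA"  # sentinel standing in for a missing value, coded as observed

units = [
    {"id": 1, "x1": "a", "x2": MISSING},
    {"id": 2, "x1": "a", "x2": MISSING},
    {"id": 3, "x1": "b", "x2": "high"},
]

# Exact matching: group units by their full (recoded) covariate signature,
# so a missing value only ever matches another missing value.
strata = defaultdict(list)
for u in units:
    key = (u["x1"], u["x2"])
    strata[key].append(u["id"])
```

Units 1 and 2 land in the same stratum because they share both the observed value of x1 and the missingness of x2; unit 3 is stratified separately.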
(separate message coming to fix your matchit subscription...)
Gary
---
Gary King
Albert J. Weatherhead III University Professor
Director, Institute for Quantitative Social Science
Harvard University, 1737 Cambridge St, Cambridge, MA 02138
http://GKing.Harvard.Edu, King(a)Harvard.Edu
Direct 617-495-2027, Assistant 495-9271, eFax 812-8581
On this point, Frank, I would suggest that any information you generate via your imputation process depends on the other information available in the imputation model. For example, if I wanted to impute a variable that would be used as a predictor in a regression, and I had a lot of salient information about my respondents (associated attitudes, demographics, etc.) that helped me predict this variable accurately, then my imputation process contributes substantially to my final estimate of that value. That does not differ when you are imputing a nominal or ordinal variable.
Gary has tools available in Amelia to evaluate the model's performance: the density comparisons and overimpute offer this validation. Linking overimpute to the data-mining literature, you could think of it as the use of 'holdout cases', a method of model validation typical there.
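A minimal sketch of that holdout idea, with a simple leave-one-out mean imputer standing in for the real imputation model (illustration only; Amelia's overimpute does a more principled version of this, and the values below are invented):

```python
values = [2.0, 4.0, 6.0, 8.0, 10.0]  # observed values of one variable

# Hold out each observed value in turn, "impute" it from the remaining
# values, and compare the imputation against the held-out truth.
errors = []
for i, truth in enumerate(values):
    rest = values[:i] + values[i + 1:]
    imputed = sum(rest) / len(rest)  # stand-in imputation: mean of the rest
    errors.append(abs(imputed - truth))

mean_abs_error = sum(errors) / len(errors)
```

The same masking logic underlies holdout validation generally: the model never sees the held-out values, so the errors estimate how well it recovers data it did not observe.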
Thanks,
Paul
> Gary King <king(a)harvard.edu> wrote:
>