Amelia October 2012

amelia@lists.gking.harvard.edu

5 participants
4 discussions

by N. Janz

Dear all,

I have two questions and would be very grateful for your help:

1) Is there a problem with running imputations on different subsets of your full data set when I use the same variables in my models from different imputations?

2) Do I have to include lags in the imputation specification that I expect I 'might' use in my models (although I'm not sure yet)? For example, all independent variables 'might' be lagged one year to allow for their effect to 'spread' to the outcome variable. If I don't include them and decide to use lags after a first run of imputations, do I have to go back to Amelia, include the lags, and run it again?

Best,
Nicole

Nicole Janz, PhD Cand.
Lecturer at Social Sciences Research Methods Centre 2012/13
University of Cambridge
Department of Politics and International Studies
www.nicolejanz.de | nj248(a)cam.ac.uk | Mobile: +44 (0) 7905 70 1 69 4
Skype: nicole.janz
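To make the one-year lag in question 2 concrete, here is a minimal Python sketch of how such a lagged column could be built within each country before any imputation is run. It is an illustration only, not part of Amelia: the function name `add_one_year_lag`, the field names `country`, `year`, and `fdi`, and the toy values are all assumptions.

```python
def add_one_year_lag(rows, unit_key, time_key, var):
    """Return copies of rows with a '<var>_lag1' field (hypothetical helper).

    The lag is the same unit's value one year earlier. It is None when that
    year is absent for the unit or when the earlier value is itself missing.
    """
    # Index every observed (unit, year) pair for constant-time lookup.
    by_unit_time = {(r[unit_key], r[time_key]): r.get(var) for r in rows}
    out = []
    for r in rows:
        lagged = dict(r)  # copy so the input rows are left untouched
        lagged[var + "_lag1"] = by_unit_time.get((r[unit_key], r[time_key] - 1))
        out.append(lagged)
    return out


# Toy panel (made-up values); None marks a missing observation.
panel = [
    {"country": "A", "year": 2000, "fdi": 1.0},
    {"country": "A", "year": 2001, "fdi": None},
    {"country": "A", "year": 2002, "fdi": 3.0},
    {"country": "B", "year": 2001, "fdi": 5.0},
]
lagged_panel = add_one_year_lag(panel, "country", "year", "fdi")
```

Note that a missing value in one year produces a missing lag in the following year, so lagging before imputation increases the number of missing cells.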

11 years, 5 months

Imputation of subsets for different Models / comparability

by N. Janz

Dear all,

I have a panel data set with economic and political variables for country-years. I want to estimate 2 slightly different models and compare the results. One model includes Var1 and some controls; the second includes a breakdown of Var1 into its components, and the same (!) controls. I claim that Var1 can better explain the outcome when broken down into its components. More concretely, I compare the effects of total FDI with the effects of FDI broken down into business sectors.

Model 1 is: Y = Var1 + Controls
Model 2 is: Y = Var1a + Var1b + Var1c + Controls

Ideally I would run an imputation that includes all variables (in Model 1 and 2), and then estimate the models. However, my problem is: for Model 2 I have very bad data availability for a bulk of country-years in Var1a, Var1b, Var1c.

Solution 1: I could kick out country-years with >80% missingness from the complete panel data set and run one overall imputation. I would then estimate Model 1 and 2 with the same imputed data set(s) and the estimates would remain comparable. However, this means that Model 1 would be estimated with fewer country-years than originally possible, just because I want to compare it with Model 2, which has high missingness in its variables.

Solution 2: I am also thinking of building subsets of my master table to run 2 separate imputations. Subset 1 for Model 1 would include all years and countries, but not Var1a + Var1b + Var1c from Model 2. Subset 2 for Model 2 would include all variables, but I would cut out 5 years and 20 countries which have >80% missingness. Var1a + Var1b + Var1c would obviously remain in the subset.

I am hesitating because there is an overlap in the variables in the two models (controls are the same, Var1 is broken down into its components), and I want to compare the results. What do you think conceptually and from the imputation point of view about the trade-off between comparability and preservation of data points for Model 1?

Thank you very much in advance!

Best,
Nicole

-
Nicole Janz
Doctoral Researcher
University of Cambridge
Politics and International Studies
www.nicolejanz.de | nj248(a)cam.ac.uk | +44 (0) 7905 70 1 69 4
Skype: nicole.janz
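The two subsets described under Solution 2 can be sketched in plain Python as follows. This is a hedged illustration, not a recommendation between the two solutions. It assumes that ">80% missingness" means the share of missing variables within a single country-year row, which the post does not spell out; the helper names and the toy rows are hypothetical.

```python
def missing_share(row, variables):
    """Fraction of the listed variables that are missing (None) in one row."""
    return sum(1 for v in variables if row.get(v) is None) / len(variables)


def drop_sparse_rows(rows, variables, threshold=0.8):
    """Keep only rows whose missing share does not exceed the threshold."""
    return [r for r in rows if missing_share(r, variables) <= threshold]


def drop_columns(rows, columns):
    """Return copies of the rows without the named columns."""
    return [{k: v for k, v in r.items() if k not in columns} for r in rows]


components = ["var1a", "var1b", "var1c"]
all_vars = ["y", "var1", "control"] + components

# Toy country-years (made-up values); None marks a missing cell.
master = [
    {"country": "A", "year": 2000, "y": 1.0, "var1": 2.0, "control": 0.5,
     "var1a": None, "var1b": None, "var1c": None},   # 3 of 6 missing
    {"country": "B", "year": 2000, "y": None, "var1": None, "control": 0.7,
     "var1a": None, "var1b": None, "var1c": None},   # 5 of 6 missing
]

# Subset 1: every country-year, but without the component variables.
subset1 = drop_columns(master, components)
# Subset 2: all variables, minus rows with more than 80% missing.
subset2 = drop_sparse_rows(master, all_vars)
```

In this toy example Subset 1 keeps both country-years while Subset 2 keeps only the first, which is the loss of comparability the post is asking about.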

11 years, 6 months

help: income panel data

by Francesco Giudici

Dear Amelia Users and Developers,

I just started to use Amelia II to impute missing values in a longitudinal dataset. I have a specific question about that and would like to know if you could help me understand the correct way to impute the data in my case.

My data are really simple: units are individuals with ID, gender, race, years of education, etc. and the income per hour for every year between age 30 and age 50. I would like to impute the missing values of income per hour. I have already tried to impute the data, but it seems that the imputed values do not take prior and later observations into consideration. For example, if income at age 38 is missing, I would like to impute a value based on the income at age 37 and age 39, which is not the case at the moment.

On this topic, I also found this post on a blog: http://stats.stackexchange.com/questions/12873/multiple-imputation-for-miss… but I am not sure if it corresponds to my situation. The imputation I made was with the format I had (one line - one unit: first line = ID 1, income.age.30, income.age.31, ...; second line = ID 2, income.age.30, income.age.31...). Attached to this email you will find an extract of the data in this form. Based on that post, if I understood it correctly, I have to create a time-series variable and transpose the same individual onto different lines (e.g. first line = ID 1, income.age.30, income.age.31; second line = ID 1, income.age.30, income.age.31, ...). But I am not really sure about that.

I would be happy if you could give me some advice or tell me whether the topic was already discussed on this mailing list.

Thank you very much!

Best wishes,
Francesco Giudici

--
Francesco Giudici
Postdoctoral Fellow
Teachers College, Columbia University
New York, NY 10027
fg2296(a)columbia.edu
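The reshaping discussed above, from one line per person to one line per person and age, can be sketched in plain Python. This is only an illustration of the layout change. The column prefix `income.age.` follows the names used in the post, while the function name, the `age` and `income` output fields, and the toy records are assumptions.

```python
def wide_to_long(wide_rows, id_key, prefix="income.age."):
    """Turn one row per person into one row per person-age (hypothetical helper).

    Columns starting with `prefix` become (age, income) pairs. All other
    columns (ID, gender, ...) are repeated on every resulting row.
    """
    long_rows = []
    for row in wide_rows:
        # Time-invariant fields are everything that is not an income column.
        static = {k: v for k, v in row.items() if not k.startswith(prefix)}
        for key, value in row.items():
            if key.startswith(prefix):
                long_row = dict(static)
                long_row["age"] = int(key[len(prefix):])  # explicit time index
                long_row["income"] = value                # None stays missing
                long_rows.append(long_row)
    long_rows.sort(key=lambda r: (r[id_key], r["age"]))
    return long_rows


# Toy wide data (made-up values); None marks a missing income.
wide = [
    {"ID": 1, "gender": "f", "income.age.30": 12.5, "income.age.31": None},
    {"ID": 2, "gender": "m", "income.age.30": 9.0, "income.age.31": 9.5},
]
long_data = wide_to_long(wide, "ID")
```

In the long layout each person contributes several rows with a single `income` column, and `age` is available as an explicit time variable alongside the person `ID`.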

11 years, 6 months

Relationships to other variables

by Patrick Lam

Is there a way to bound variables via relationships with other variables in the dataset when multiply imputing with Amelia? For example, if we have a household income variable and a household savings variable (both with some missingness) in our dataset, is there a way to specify that savings must be less than income for each observation in the imputed datasets? Priors and logical bounds currently in Amelia don't seem like they are set up to do this exactly.

Thanks!

--
Patrick Lam
Department of Government and Institute for Quantitative Social Science, Harvard University
http://www.people.fas.harvard.edu/~plam
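The constraint in question, savings less than income in every imputed row, can at least be checked after the fact. The Python sketch below does not enforce the relationship during imputation and is not an Amelia feature. It only counts violations in each completed data set; the function name, field names, and toy values are assumptions.

```python
def constraint_violations(imputed_rows, lower="savings", upper="income"):
    """Return the indices of rows where `lower` is not strictly below `upper`."""
    return [i for i, r in enumerate(imputed_rows) if not r[lower] < r[upper]]


# Two toy completed data sets (made-up values), as if m = 2 imputations were run.
imputed_sets = [
    [{"income": 50.0, "savings": 10.0}, {"income": 30.0, "savings": 35.0}],
    [{"income": 50.0, "savings": 10.0}, {"income": 30.0, "savings": 12.0}],
]

# Violating row indices per imputed data set.
violations = [constraint_violations(rows) for rows in imputed_sets]
```

A check like this shows how often the unconstrained imputations break the relationship, which helps judge whether the issue matters in practice for a given data set.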

11 years, 6 months
