Dear all,
I have two questions and would be very grateful for your help:
1) Is there a problem with running imputations on different subsets of my
full data set when the models I estimate from the different imputations
use the same variables?
2) Do I have to include lags in the imputation specification that I expect
I 'might' use in my models (although I'm not sure yet)? For example, all
independent variables 'might' be lagged one year to allow for their effect
to 'spread' to the outcome variable. If I don't include them and decide to
use lags after a first run of imputations, do I have to go back to Amelia,
include the lags, and run it again?
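
To make question 2 concrete, here is a minimal sketch of what a one-year lag means in a country-year panel. It is plain Python rather than Amelia code, and the column names (country, year, fdi) and the helper add_lag are hypothetical. It only shows that a lagged variable is a shifted copy of the original and therefore carries over its missing values. It does not settle whether the imputation has to be rerun.

```python
def add_lag(rows, var, unit="country", time="year", lag=1):
    """Return copies of rows with a `<var>_lag<lag>` column added.

    The lag is looked up within the same unit and only for the exact
    earlier year, so gaps in the panel give None instead of a value
    taken from the wrong year.
    """
    by_key = {(r[unit], r[time]): r for r in rows}
    out = []
    for r in rows:
        prev = by_key.get((r[unit], r[time] - lag))
        new = dict(r)
        new[f"{var}_lag{lag}"] = prev[var] if prev is not None else None
        out.append(new)
    return out


# Hypothetical toy panel; None marks a missing value.
panel = [
    {"country": "A", "year": 2000, "fdi": 1.0},
    {"country": "A", "year": 2001, "fdi": None},
    {"country": "A", "year": 2002, "fdi": 3.0},
    {"country": "B", "year": 2001, "fdi": 5.0},
    {"country": "B", "year": 2002, "fdi": 6.0},
]
lagged = add_lag(panel, "fdi")
```

In this toy panel the lag for A/2002 is missing because fdi for A/2001 is missing, which is why lags built after imputation and lags built before it can differ.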
Best,
Nicole
Nicole Janz, PhD Cand.
Lecturer at Social Sciences Research Methods Centre 2012/13
University of Cambridge
Department of Politics and International Studies
www.nicolejanz.de | nj248(a)cam.ac.uk | Mobile: +44 (0) 7905 70 1 69 4
Skype: nicole.janz
Dear all,
I have a panel data set with economic and political variables for
country-years. I want to estimate 2 slightly different models and compare
the results. Model 1 includes Var1 and some controls, the second includes
a break-down of Var1 into its components, and the same (!) controls. I
claim that Var1 can better explain the outcome when broken down into its
components. More concretely, I compare the effects of total FDI with
effects of FDI broken down into business sectors.
Model 1 is:
Y = Var1 + Controls
Model 2 is:
Y = Var1a + Var1b + Var1c + Controls
Ideally I would run an imputation that includes all variables (in model 1
and 2), and then estimate the models. However, my problem is: For Model 2 I
have very bad data availability for a bulk of country-years in Var1a,
Var1b, Var1c.
Solution 1: I could kick out country-years with >80% missingness from the
complete panel data set and run one overall imputation. I would then
estimate Model 1 and 2 with the same imputed data set(s) and the estimates
would remain comparable. However, this means that Model 1 would be
estimated with fewer country-years than originally possible, just because I
want to compare it with Model 2 which has high missingness in its
variables.
Solution 2: I am also thinking of building subsets of my master table to
run 2 separate imputations. Subset 1 for Model 1 would include all years
and countries - but not Var1a + Var1b + Var1c from Model 2. Subset 2 for
Model 2 would include all variables; but I would cut out 5 years and 20
countries which have >80% missingness. Var1a + Var1b + Var1c would
obviously remain in the subset. I am hesitating because there is an overlap
in the variables in the two models (controls are the same, Var1 is broken
down into its components), and I want to compare the results.
What do you think conceptually and from the imputation point of view about
the trade-off between comparability and preservation of data points for
Model 1?
Thank you very much in advance!
Best,
Nicole
-
Nicole Janz
Doctoral Researcher
University of Cambridge
Politics and International Studies
www.nicolejanz.de | nj248(a)cam.ac.uk | +44 (0) 7905 70 1 69 4
Skype: nicole.janz
Dear Amelia Users and Developers,
I just started to use Amelia II to impute missing values in a longitudinal
dataset. I have a specific question about that and would like to know if
you could help me to understand what is the correct way to impute the
data in my case. My data are really simple: units are individuals with
ID, gender, race, years of education, etc. and the income per hour for
every year between age 30 and age 50. I would like to impute the missing
values on the income per hour. I tried to impute the data already but it
seems that the imputed values do not take into consideration prior and
later observations. For example, if income at age 38 is missing, I would
like to impute a value based on the income at age 37 and age 39, which
is not what happens at the moment.
On this topic, I also found this post on a blog:
http://stats.stackexchange.com/questions/12873/multiple-imputation-for-miss…
but I am not sure if this corresponds to my situation. The imputation I
made was with the format I had (one line - one unit: first line = ID 1,
income.age.30, income.age.31, ...; second line = ID 2, income.age.30,
income.age.31...). Attached to this email you will find an extract of
the data in this form. Based on this post, if I understood it
correctly, I have to create a time-series variable and transpose the
same individual on different lines (e.g. first line = ID 1,
income.age.30, income.age.31; second line = ID 1, income.age.30,
income.age.31, ...). But I am not really sure about that.
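
For reference, the reshaping described above (one line per individual turned into one line per individual and age) can be sketched as follows. This is plain Python, not Amelia code. The helper wide_to_long and the toy values are hypothetical, and only the column pattern income.age.NN is taken from the message. The resulting layout has an explicit unit identifier (ID) and an explicit time variable (age), which is the shape a time-series cross-section imputation would normally work from.

```python
def wide_to_long(wide_rows, prefix="income.age.", id_col="ID"):
    """Turn one row per individual into one row per individual and age.

    Columns that do not start with `prefix` (ID, gender, ...) are treated
    as fixed characteristics and repeated on every row of that individual.
    """
    long_rows = []
    for row in wide_rows:
        fixed = {k: v for k, v in row.items() if not k.startswith(prefix)}
        for key, value in row.items():
            if key.startswith(prefix):
                age = int(key[len(prefix):])
                long_rows.append({**fixed, "age": age, "income": value})
    long_rows.sort(key=lambda r: (r[id_col], r["age"]))
    return long_rows


# Hypothetical extract; None marks a missing hourly income.
wide = [
    {"ID": 1, "gender": "f",
     "income.age.30": 12.5, "income.age.31": None, "income.age.32": 14.0},
    {"ID": 2, "gender": "m",
     "income.age.30": None, "income.age.31": 9.0, "income.age.32": 9.5},
]
long_rows = wide_to_long(wide)
```

After reshaping, the missing income of ID 1 at age 31 sits between that person's rows for ages 30 and 32, so neighbouring observations of the same individual are adjacent in time.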
I would be happy if you could give me some advice or tell me
whether this topic was already discussed on this mailing list.
Thank you very much!
Best wishes,
Francesco Giudici
--
Francesco Giudici
Postdoctoral Fellow
Teachers College, Columbia University
New York, NY 10027
fg2296(a)columbia.edu
Is there a way to bound variables via relationships with other variables in
the dataset when multiply imputing with Amelia? For example, if we have a
household income variable and a household savings variable (both with some
missingness) in our dataset, is there a way to specify that savings must be
less than income for each observation in the imputed datasets? Priors and
logical bounds currently in Amelia don't seem like they are set up to do
this exactly.
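
One possible workaround, offered only as an assumption and not as an Amelia feature, is to impute savings as a share of income so that the relational constraint becomes a constant upper bound of 1 on a single variable, and to convert back afterwards. The sketch below is plain Python with hypothetical helper names and values. It assumes income is strictly positive, and it includes a check that counts completed rows still violating savings < income.

```python
def to_share(income, savings):
    """Savings as a share of income; None if either value is missing
    or income is not strictly positive (the share is undefined then)."""
    if income is None or savings is None or income <= 0:
        return None
    return savings / income


def from_share(income, share):
    """Back-transform an (imputed) share into a savings amount."""
    return income * share


def count_violations(rows):
    """Number of complete rows where savings is not below income."""
    return sum(
        1 for r in rows
        if r["income"] is not None and r["savings"] is not None
        and r["savings"] >= r["income"]
    )


# Hypothetical observed data; None marks a missing value.
observed = [
    {"income": 50000.0, "savings": 5000.0},
    {"income": 30000.0, "savings": None},
]
shares = [to_share(r["income"], r["savings"]) for r in observed]

# Suppose an imputation bounded above by 1 returned a share of 0.25
# for the second household (a made-up number for illustration).
completed = [
    observed[0],
    {"income": 30000.0, "savings": from_share(30000.0, 0.25)},
]
```

Any share below 1 maps back to savings below income as long as income is positive, so the constraint holds by construction in the completed data.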
Thanks!
--
Patrick Lam
Department of Government and Institute for Quantitative Social Science,
Harvard University
http://www.people.fas.harvard.edu/~plam