Hello,
Can anyone tell me how to get and report bivariate correlations in
multiply imputed datasets created with Amelia? Should I calculate the
correlations between the variables of interest for each of the MI
datasets and then average them? Is there a way to do this with "miest"
(the program I am using to combine the imputations)? And what about
measures of goodness of fit?
I have used miest, but it doesn't work with some models, nor with
matching. Is there an alternative, not overly complex way to combine
the MI datasets for analysis?
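For concreteness, here is my current understanding of how Rubin's combining rules would apply to a correlation, sketched in Python rather than Stata (the correlations and sample size are made up, and I gather Fisher's z-transform is the usual scale on which to pool; please correct me if this is off):

```python
import math

def fisher_z(r):
    # Fisher's z-transform roughly stabilizes the variance of a correlation
    return 0.5 * math.log((1 + r) / (1 - r))

def combine_correlations(rs, n):
    # rs: correlation estimates from each of the m imputed datasets
    # n: sample size (the variance of z is approximately 1/(n - 3))
    m = len(rs)
    zs = [fisher_z(r) for r in rs]
    zbar = sum(zs) / m                       # Rubin's rule: average the estimates
    within = 1.0 / (n - 3)                   # within-imputation variance of z
    between = sum((z - zbar) ** 2 for z in zs) / (m - 1)
    total = within + (1 + 1 / m) * between   # total variance (Rubin 1987)
    return math.tanh(zbar), total            # back-transform the point estimate

# hypothetical correlations from m = 5 imputed datasets, n = 700
r_combined, var_z = combine_correlations([0.42, 0.45, 0.40, 0.44, 0.43], n=700)
```

If this is right, simply averaging the raw correlations would give roughly the same point estimate, but the z-scale is what makes the combined variance sensible.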
Thanks again!!
Carlos
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
More info about Amelia: http://gking.harvard.edu/amelia
Good evening,
I am new to Amelia and have been reading the related papers (King et al.
2001, Honaker and King 2008, and Honaker and King 2007) and the manual, but
I still have three doubts, and I would appreciate your help, or perhaps you
could suggest further readings on how to deal with these practical issues.
*1)* In the APSR article by King et al (2001) it says: "The prediction
required is not causal [...]. To an extent then, the analyst, rather than
the world that generates the data, controls the degree to which the MAR
assumption fits. It can be made to fit the data by including more variables
in the imputation process to predict the pattern of missingness."
*Should this be taken to mean that the more variables included in the
imputation model (even if they are not very relevant to the estimation
model), the better? If so, I assume that such variables should have no
missing observations (otherwise it would be demanding more from Amelia in
terms of imputation rather than improving the imputation). Is this right?*
At some other point in the text, the authors hint (or so I interpreted)
that "overcontrolling" is not a problem in the imputation model. However, I
have TSCS data (700 obs.) and *when I ran Amelia II, I had to drop many
variables (even some interaction terms that are relevant for my analysis
model) as well as the country dummies, because I kept getting an error
message reading, more or less, "the number of parameters to estimate is too
large relative to observations", and another message about
multicollinearity*. So I ended up with fewer variables in my imputation
model than in my analysis model, which leads to my second question:
*2)* On p. 57 the authors mention that the imputation model should contain
at least as much information as the analysis model. Does this mean that *if
I am using fixed effects in the analysis model, the country dummies cannot
be excluded from the imputation model?*
*3)* Finally, I have read about the *polynomials of time* option in the
manual, but I am a little confused. Is the polynomials-of-time option
related to the *lags*? If I tick "1" in the polynomials box, would this be
equivalent to L1, and if I tick "2", equivalent to L2, etc.? Is it used for
the lags only when the polynomials are interacted with the cross-section?
Or are they completely different options? If the polynomials of time refer
to "*trends*", I am wondering whether it would be redundant to include in
the imputation model a variable that belongs in my analysis model and which
is a time-trend variable.
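To make my confusion concrete, here is how I currently picture the difference between time polynomials (trends) and lags, in a little Python sketch (purely illustrative toy data, not Amelia's actual code):

```python
def time_polynomials(times, degree):
    # Columns t, t^2, ..., t^degree: these model smooth trends over time
    # (in Amelia they can be interacted with the cross-section), not lags.
    return [[t ** d for d in range(1, degree + 1)] for t in times]

def lag(series, k):
    # A k-period lag shifts the series back; the first k entries are
    # missing (None) because no earlier observations exist.
    return [None] * k + series[:-k]

y = [10, 12, 15, 14, 18]                             # a toy time series
trend = time_polynomials([1, 2, 3, 4, 5], degree=2)  # t and t^2 columns
lag1 = lag(y, 1)                                     # the series shifted by one period
```

So on my reading, ticking "2" in the polynomials box adds t and t^2 as trend terms, which is a different thing from including L1 or L2 of a variable, but I would welcome a correction.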
Sorry about so many questions, but I have been trying to solve these
issues on my own and with the readings, and I am unsure whether what I am
doing is OK.
Thank you so much.
Sincerely,
Helena Brown
Hello,
What would be evidence that the MAR assumption holds? Could I, for
example, run a regression in which the dependent variable is a dummy
for "missing obs." and the independent variables are the other
covariates in the analysis model? Or would it be enough to show
relatively strong bivariate correlations between missingness and the
control variables?
I guess a very strong correlation between missingness on a variable and a
dummy for time period or country would pose a problem for multiple
imputation, because it may be a sign of truncation or systematic
missingness; then MAR would not hold, right?
Are there any formal tests for showing that the data are not missing
completely at random (MCAR) but rather MAR?
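To illustrate what I have in mind, here is a toy Python sketch of the indicator check (entirely made-up data, and I realize MAR itself cannot be verified directly; at best one can find evidence against MCAR):

```python
import math

def missing_indicator(xs):
    # 1 where the value is missing, 0 where it is observed
    return [1 if x is None else 0 for x in xs]

def corr(a, b):
    # Pearson correlation; with a 0/1 indicator this is the
    # point-biserial correlation between missingness and a covariate
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

y = [3.1, None, 2.8, None, 4.0, 3.5]  # variable with missing values
z = [10, 20, 11, 22, 9, 12]           # fully observed covariate
r = corr(missing_indicator(y), z)     # strong r suggests missingness depends on z
```

In this toy example the missingness lines up with high values of z, which (as I understand it) is consistent with MAR given z, but would be evidence against MCAR.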
THANKS TO ALL and have a nice day,
Carlos
Good evening,
Could anyone please tell me the best way to get lagged variables with multiply imputed data in Amelia II?
Should I do the multiple imputation and then have Stata generate the lagged variables in each of the imputed datasets?
Should I first generate the lagged variables (even where data are missing) and then do the imputation?
Or should I get Amelia II to create the lagged variables at the same time it imputes the missing values? I have tried using the "lag" option under TSCS in Amelia II, but no lagged variables were created, so I'm not sure how it works.
Thanks for your help, and sorry about the two postings tonight.
best regards,
carola
Hello,
I'm using site dummies in lieu of fixed effects, with the "miest" command
in Stata. When I do this, I get a "conformability" error. I am pretty
sure it is a consequence of different sites being dropped from different
imputed datasets when I run the logit model. For example, the imputation
process generated a situation in which site A is dropped from the
analysis in imputed dataset 1 because there is no variation on the
dependent variable for site A in dataset 1. However, for dataset 2, the
imputation process generated variation in the dependent variable for
site A, so site A is not dropped from that analysis. Then, when "miest"
attempts to combine the coefficients and standard errors from the
imputed datasets, it is doing so with different matrices and running
into problems.
Since I don't really care about the coefficients for the site dummies,
can I just run the regressions one at a time on each of the imputed
datasets and then manually combine the coefficients on the variables
with which I am concerned? I understand it is necessary to sum the
within and between variance to get proper standard errors; my question
is whether the fact that different sites will be dropped from different
analyses invalidates the combining of the coefficients and their
standard errors.
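For what it's worth, my understanding of the manual combining step I'm proposing is the following (a Python sketch with made-up numbers, not miest's actual code):

```python
import math

def pool(estimates, variances):
    # Rubin's rules for one coefficient estimated in each of m imputed
    # datasets: average the estimates, then combine within- and
    # between-imputation variance for the pooled standard error.
    m = len(estimates)
    qbar = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - qbar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return qbar, math.sqrt(total)  # pooled coefficient and standard error

# hypothetical logit coefficients and SEs for one substantive variable,
# taken from m = 5 separate runs on the imputed datasets
coefs = [0.51, 0.48, 0.55, 0.50, 0.53]
ses = [0.10, 0.11, 0.09, 0.10, 0.12]
beta, se = pool(coefs, [s ** 2 for s in ses])
```

My question is whether applying this variable by variable to the coefficients I care about remains valid when the set of retained site dummies differs across the imputed datasets.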
Thanks to anyone who can help.
Steve Shewfelt
PhD Candidate
Department of Political Science
Yale University