If the imputation model assumes that Var(r)=0, then the analysis model's
estimate of Var(r) will be biased toward 0, and the bias can spill over
into other parameters as well. For example, Var(e) may be overestimated
and some of the fixed effects may have underestimated standard errors.
Whether the bias is serious depends on the situation. There are some
simulations in this paper
J.P. Reiter, T. E. Raghunathan, and S. K. Kinney. "The importance of
modeling the sampling design in multiple imputation for missing data."
Survey Methodology.
which frames the question in terms of sample clustering rather than
multilevel models.
One way to reduce the bias is to impute using a misspecified imputation
model that ignores r, but then omit the cases with imputed Y values from
your analysis. Much of the bias that comes from this kind of
misspecification will be in the imputed Ys rather than the imputed Xs.
This approach is discussed in this paper:
von Hippel, P.T. (2007). Regression with Missing Ys: An Improved
Strategy for Analyzing Multiply-Imputed Data. Sociological Methodology
37(1).
which is available here:
http://www.sociology.ohio-state.edu/ptv/publications/publications.html
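The "impute, then omit the imputed Ys" step can be sketched in a few lines. This is only an illustration, not code from the paper: the function name `drop_imputed_y`, the row layout, and the numbers are hypothetical, and the imputation itself is assumed to have been done elsewhere (for example in Amelia).

```python
def drop_imputed_y(imputed_datasets, y_was_missing):
    """Keep only rows whose Y was observed in the original data.

    imputed_datasets: list of completed datasets, each a list of rows.
    y_was_missing: list of booleans, one per row, True where Y was
        missing before imputation.
    """
    kept = []
    for dataset in imputed_datasets:
        if len(dataset) != len(y_was_missing):
            raise ValueError("dataset and missingness mask differ in length")
        kept.append(
            [row for row, miss in zip(dataset, y_was_missing) if not miss]
        )
    return kept


# Hypothetical example: two imputed datasets with three rows each.
# Row 1 had a missing Y (imputed as 2.9 and 3.3); row 2 had a missing X
# (imputed as 0.7 and 0.9) but an observed Y.
imputed = [
    [{"X": 1.0, "Y": 2.0}, {"X": 1.5, "Y": 2.9}, {"X": 0.7, "Y": 1.0}],
    [{"X": 1.0, "Y": 2.0}, {"X": 1.5, "Y": 3.3}, {"X": 0.9, "Y": 1.0}],
]
y_was_missing = [False, True, False]

analysis_sets = drop_imputed_y(imputed, y_was_missing)
```

Rows with an imputed X but an observed Y stay in the analysis; only rows whose Y was imputed are dropped. Each reduced dataset would then be analyzed and the results combined in the usual way.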
Best wishes --
Paul von Hippel
Ohio State University
Levi Littvay (UNL) wrote:
Thanks for the quick response.
I gather that whatever program you're using to run the multilevel model
specifies explicitly which coefficient is varying in which way. You can
approximate this pretty closely by using the right combination of
covariates in regression or Amelia. e.g., if E(Y)=a+b*X, and you want b
to vary by T[ime], you can say b=c+dT, and substitute this eqn into the
first, giving E(Y)=a+(c+dT)X = a + cX +d(T*X), so if you just put into
Amelia X and T*X, you'd be set. You can extend this pretty far of
course.
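The substitution above can be sketched in code. This is a minimal illustration and not Amelia's interface: the function name `add_interaction`, the column names `X`, `T`, and `TX`, and the numbers are all hypothetical. It only builds the product column that would then go into the imputation model alongside X.

```python
def add_interaction(rows, x="X", t="T", out="TX"):
    """Return copies of rows with an extra column out = t * x.

    A missing x or t (None) yields a missing product.
    """
    result = []
    for row in rows:
        new_row = dict(row)
        if row[x] is None or row[t] is None:
            new_row[out] = None
        else:
            new_row[out] = row[t] * row[x]
        result.append(new_row)
    return result


# Made-up example rows (hypothetical data).
rows = [
    {"X": 2.0, "T": 1.0},
    {"X": 3.0, "T": 0.0},
    {"X": None, "T": 4.0},
]
augmented = add_interaction(rows)

# Check the algebra: a + (c + d*T)*X equals a + c*X + d*(T*X).
a, c, d = 1.0, 0.5, 0.25
first = augmented[0]
lhs = a + (c + d * first["T"]) * first["X"]
rhs = a + c * first["X"] + d * first["TX"]
```

The original rows are left untouched, so the same helper can be applied again for other products (say, a second level 2 variable times X).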
This makes perfect sense. (Actually, right now my choice of MLM
package is R's lme4 (lmer), which requires this exact specification. If
a level 1 coefficient varies by a level 2 variable, it is modeled as an
interaction.)
But what I am really concerned with is the case when there is a residual
(random effect) across the level 2 units associated with b. (In
English, b is allowed to vary across, let's say, countries.) To
use the above notation, b=c+dT+r, where r is normally distributed random
variation with a mean of 0. This would mean that Y=a+(c+dT+r)X+e (where
e is the level 1 residual term).
What I am concerned with is the omission of r from the imputation model.
(What I really need to do is simulate up some examples and test bias
that way, but I figured I'd ask first. :)
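A starting point for such a simulation might look like the sketch below. It assumes r enters through the slope (b=c+dT+r multiplies X), and everything else is an assumption made for illustration: the function names, the standard normal draws for X and T, and the parameter values. It only generates data and compares within-group slopes; it does not run an imputation or fit a multilevel model, so by itself it says nothing about the size of the bias.

```python
import random
import statistics


def simulate_random_slope(n_groups, n_per_group, a, c, d, sd_r, sd_e, seed=0):
    """Simulate Y = a + (c + d*T + r_j) * X + e for grouped data.

    T and r_j are drawn once per group (level 2); X and e are drawn
    once per observation (level 1). Setting sd_r = 0 gives data with
    no random slope variation.
    """
    rng = random.Random(seed)
    data = []
    for j in range(n_groups):
        t = rng.gauss(0.0, 1.0)        # level 2 covariate
        r = rng.gauss(0.0, sd_r)       # random part of the slope
        for _ in range(n_per_group):
            x = rng.gauss(0.0, 1.0)    # level 1 covariate
            e = rng.gauss(0.0, sd_e)   # level 1 residual
            y = a + (c + d * t + r) * x + e
            data.append({"group": j, "T": t, "X": x, "Y": y, "r": r})
    return data


def group_slopes(data):
    """Least-squares slope of Y on X within each group."""
    by_group = {}
    for row in data:
        by_group.setdefault(row["group"], []).append(row)
    slopes = {}
    for j, rows in by_group.items():
        xbar = sum(row["X"] for row in rows) / len(rows)
        ybar = sum(row["Y"] for row in rows) / len(rows)
        sxy = sum((row["X"] - xbar) * (row["Y"] - ybar) for row in rows)
        sxx = sum((row["X"] - xbar) ** 2 for row in rows)
        slopes[j] = sxy / sxx
    return slopes


# Same design with and without slope variation (hypothetical values).
with_r = simulate_random_slope(200, 20, a=1.0, c=0.5, d=0.3, sd_r=1.0, sd_e=1.0)
without_r = simulate_random_slope(200, 20, a=1.0, c=0.5, d=0.3, sd_r=0.0, sd_e=1.0)

var_with = statistics.pvariance(list(group_slopes(with_r).values()))
var_without = statistics.pvariance(list(group_slopes(without_r).values()))
```

The spread of the within-group slopes should be visibly larger when sd_r > 0. A fuller test of bias would delete some values, impute with and without r in the imputation model, and compare the multilevel estimates.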
Thanks
L
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive:
http://lists.gking.harvard.edu/?info=amelia
--
Paul von Hippel
Department of Sociology / Initiative in Population Research
Ohio State University