Hi Philippe,
Unfortunately, there is no hard and fast rule about when to include or exclude a variable
from the imputation model. As you point out, there is a bias-variance trade-off inherent
in the choice of covariates. At a minimum, you should include any covariates that will be
in the analysis model. Beyond that, try to include the variables that are most
predictive of the missingness in the main variables of interest. This is the most efficient
strategy for satisfying the "missing at random" assumption.
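
As a rough illustration of that screening step, here is a minimal sketch in plain Python. It is not part of Amelia: the function names (pearson, screen_auxiliaries), the toy data, and the use of a simple correlation with a missingness indicator are all assumptions made for the example. For each candidate auxiliary variable it reports two things: how strongly the candidate's observed values correlate with whether the variable of interest is missing, and how much of the candidate itself is missing.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if undefined)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def screen_auxiliaries(data, target, candidates):
    """Hypothetical helper: summarize each candidate auxiliary variable.

    data maps column names to lists of floats, with None marking a missing value.
    """
    # 1.0 where the variable of interest is missing, 0.0 where it is observed.
    miss_target = [1.0 if v is None else 0.0 for v in data[target]]
    report = {}
    for name in candidates:
        col = data[name]
        # Only rows where the candidate itself is observed can be used.
        pairs = [(x, m) for x, m in zip(col, miss_target) if x is not None]
        frac_missing = 1 - len(pairs) / len(col)
        r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
        report[name] = {
            "frac_missing": frac_missing,
            "abs_corr_with_missingness": abs(r),
        }
    return report


# Toy data, invented for the example: income tends to be missing for older people.
data = {
    "income": [50.0, None, 42.0, None, 61.0, 38.0, None, 55.0],
    "age": [30.0, 62.0, 28.0, 70.0, 35.0, 25.0, 66.0, 40.0],
    "noise": [1.0, None, None, 2.0, None, 1.0, None, 2.0],
}
report = screen_auxiliaries(data, "income", ["age", "noise"])
for name, stats in report.items():
    print(name, stats)
```

In this toy data, age is fully observed and tracks the missingness of income closely, while noise is half missing and tracks it less well. The sketch only puts the two quantities side by side; it does not supply a cutoff, since there is no agreed threshold for either one.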
Hope that helps!
Cheers,
matt.
~~~~~~~~~~~
Matthew Blackwell
Institute for Quantitative Social Science
Department of Government
Harvard University
url: http://www.mattblackwell.org
On Wednesday, June 20, 2012 at 8:59 AM, Philippe Sulger wrote:
Dear all
I have a question concerning the inclusion of (auxiliary) variables in the missing data
procedure. I understand that a rather "inclusive" strategy can increase
efficiency and reduce bias.
Now, I also have the feeling that including a certain (auxiliary) variable can
carry an additional cost that depends on the degree of missingness of this (auxiliary)
variable itself. If that missingness is "too high", couldn't the disadvantage of
including the variable outweigh the advantage (increased efficiency and/or reduced
bias) that its inclusion could bring? If so, is there a measure or rule of thumb
for evaluating this trade-off?
Thank you for your efforts.
Philippe
--
Amelia mailing list served by HUIT
[Un]Subscribe/View Archive:
http://lists.gking.harvard.edu/?info=amelia
More info about Amelia:
http://gking.harvard.edu/amelia