Hi Philippe, 

Unfortunately, there is no hard and fast rule about when to include or exclude a variable from the imputation model. As you point out, there is a bias-variance trade-off inherent in the choice of covariates. At a minimum, you should include every covariate that will appear in the analysis model. Beyond that, try to include the variables that are most predictive of missingness in your main variables of interest. This is the most efficient strategy for making the "missing at random" assumption plausible.
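
As a rough illustration of that screening idea, here is a minimal Python sketch. It is not part of Amelia, and the function names (`pearson`, `screen_auxiliaries`), the variable names, and the toy data are all made up for this example. For each candidate auxiliary variable it reports a simple correlation with a 0/1 missingness indicator for the variable of interest, next to the candidate's own share of missing values, so the two sides of the trade-off can be looked at together. A plain correlation is only a crude screen, and this does not give a cutoff or a rule of thumb.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def screen_auxiliaries(target, candidates):
    """Summarise each candidate auxiliary variable (missing values are None).

    corr_with_missingness: correlation between the candidate and a 0/1
        indicator of missingness in `target`, using rows where the
        candidate itself is observed.
    frac_missing: share of the candidate's own values that are missing.
    """
    miss = [1.0 if v is None else 0.0 for v in target]
    report = {}
    for name, values in candidates.items():
        pairs = [(m, v) for m, v in zip(miss, values) if v is not None]
        r = pearson([m for m, _ in pairs], [v for _, v in pairs])
        frac_missing = sum(v is None for v in values) / len(values)
        report[name] = {"corr_with_missingness": r, "frac_missing": frac_missing}
    return report

# Toy data (invented): income is missing more often for older respondents.
income = [30, 42, None, 55, None, None, 38, None]
candidates = {
    "age": [25, 31, 58, 40, 63, 70, 29, 66],
    "shoe_size": [42, None, 41, None, 43, 42, None, 41],
}
report = screen_auxiliaries(income, candidates)
for name, stats in report.items():
    print(name, stats)
```

In this toy data, age tracks the missingness in income closely and is itself fully observed, while shoe_size is weakly related to it and is often missing itself, so age is the more attractive candidate.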

Hope that helps!

Cheers,
matt.

~~~~~~~~~~~
Matthew Blackwell
Institute for Quantitative Social Science
Department of Government
Harvard University
url: http://www.mattblackwell.org

On Wednesday, June 20, 2012 at 8:59 AM, Philippe Sulger wrote:

Dear all

I have a question concerning the inclusion of (auxiliary) variables in the missing data procedure. I understand that a rather "inclusive" strategy can increase efficiency and reduce bias.

Now, I also have the feeling that including a certain (auxiliary) variable can carry an additional cost that depends on the degree of missingness of this (auxiliary) variable itself. If that missingness is "too high", couldn't the disadvantage of including the variable outweigh the advantage (increased efficiency and/or reduced bias) that its inclusion could bring? If so, is there a measure or rule of thumb for evaluating and judging this trade-off?

Thank you for your efforts.

Philippe
--
Amelia mailing list served by HUIT
[Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
More info about Amelia: http://gking.harvard.edu/amelia