As a heads up, this was the product of a lot of collective frustration,
error messages, and core dumps on the part of Jens, Ben, Federico, me,
and others. Here are some things to watch out for on other datasets (some
of which are in the notes in the code):
1) Try to use as few categorical variables as possible; reserve them for
things that really have no logical ordering. If you get an "empty cells"
error message, this might be the problem.
2) If you declare a variable to be categorical, it really can't have zeros
in it; mix expects the levels to be coded 1, 2, 3, ..., so a 0/1 dummy
needs to be recoded as 1/2. A zero causes R to crash on my computer, and I
think it is the reason for the core dumps on the server.
3) If a continuous variable is strongly skewed, try transforming it (taking
logs, for example) so that it is closer to normal. Skewness can be a
reason for the em.mix stage to go haywire.
4) As always, watch out for linear dependencies in the data. I
accidentally put log(x) and log(x^2) in at the same time (the second is
just 2*log(x)) and it wasn't happy. Likewise, if you have a dummy variable
(say a country) for which all of the values of a different variable are
missing (say GDP), then you can't include the dummy; there is nothing to
pin down the values.
5) Make sure the data object is a matrix before you start the mix
functions. Leaving it as a data frame produces a "list cannot be coerced
to double" error message.
6) Most of the problems seem to arise when the data doesn't look exactly
like you think it should look; look at it before running the mix
functions.
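
A rough sketch of how the checks above might be run before calling the mix
functions. It uses base R only. check_mix_input is a made-up helper, not
part of the mix package; it assumes the categorical variables sit in the
first p columns, and the skewness cutoff of 2 is arbitrary.

```r
# Hypothetical pre-flight checks for a data set headed into the mix functions.
# Assumes columns 1..p are categorical and the rest are continuous.
check_mix_input <- function(x, p, skew_cutoff = 2) {
  # (5) Must be a numeric matrix, not a data frame.
  if (!is.matrix(x) || !is.numeric(x)) {
    return("x is not a numeric matrix; try data.matrix(x) first")
  }
  problems  <- character(0)
  cat_cols  <- seq_len(p)
  cont_cols <- setdiff(seq_len(ncol(x)), cat_cols)

  # (2) Categorical codes should be whole numbers starting at 1 (no zeros).
  for (j in cat_cols) {
    v <- x[!is.na(x[, j]), j]
    if (any(v < 1) || any(v != round(v))) {
      problems <- c(problems,
        sprintf("column %d: categorical codes must be 1, 2, ...", j))
    }
  }

  # (1) Empty cells in the cross-classification of the categorical columns.
  if (p > 0) {
    cells <- table(as.data.frame(x[, cat_cols, drop = FALSE]))
    if (any(cells == 0)) {
      problems <- c(problems,
        sprintf("%d empty cell(s) among the categorical variables",
                sum(cells == 0)))
    }
  }

  # (3) Strongly skewed continuous columns (sample skewness, arbitrary cutoff).
  for (k in cont_cols) {
    v <- x[!is.na(x[, k]), k]
    if (length(v) > 2 && sd(v) > 0) {
      skew <- mean((v - mean(v))^3) / sd(v)^3
      if (abs(skew) > skew_cutoff) {
        problems <- c(problems,
          sprintf("column %d: skewness %.2f, consider transforming", k, skew))
      }
    }
  }

  # (4) Linear dependencies among the continuous columns (complete rows only).
  if (length(cont_cols) > 0) {
    keep <- complete.cases(x[, cont_cols, drop = FALSE])
    cc   <- x[keep, cont_cols, drop = FALSE]
    if (nrow(cc) > ncol(cc) && qr(cbind(1, cc))$rank < ncol(cc) + 1) {
      problems <- c(problems, "continuous columns are linearly dependent")
    }
  }

  # (4) A category for which some continuous variable is entirely missing.
  for (j in cat_cols) for (k in cont_cols) {
    for (lev in unique(x[!is.na(x[, j]), j])) {
      rows <- which(x[, j] == lev)
      if (all(is.na(x[rows, k]))) {
        problems <- c(problems,
          sprintf("column %d is all missing where column %d equals %s",
                  k, j, format(lev)))
      }
    }
  }
  problems
}

# Made-up example data: a 0/1 dummy plus log(x) and log(x^2).
dat <- data.frame(region = rep(0:1, 10),
                  lx     = log(1:20),
                  lx2    = log((1:20)^2))
check_mix_input(dat, p = 1)               # still a data frame
check_mix_input(data.matrix(dat), p = 1)  # zeros in region; lx2 = 2 * lx
```

An empty result only means none of these particular checks fired; it is
still worth looking at the data by eye.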
Cheers,
Mike
On Thu, 16 Dec 2004, Olivia Lau wrote:
Mike gets, I think, major brownie points for this. 8) This means that
everyone should at least try multiple imputation for their papers and
report whether their coefficients and quantities of interest display any
bias before imputation. 8)