multilevel model in Amelia
by Natalia & David, Freedman & Pinto
Dear list server group,
Because I've previously analyzed already-imputed data (DXA data from
NHANES) and performed some simple imputations in cross-sectional data,
I've volunteered (been nominated?) to help co-workers perform multiple
imputation in a longitudinal, multilevel data set.
The sample is ~1500 infants who were visited every month for the first year
of life, with the main exposure being parent-reported sugar-sweetened
beverage (SSB) consumption over the last week (yes/no); the outcome is
obesity at age 6 y. Overall, about 17% of the data for SSB is missing,
with the amount of missing data increasing at the later monthly visits.
While SSB consumption generally increases from 1% to 11% over the 12
months, about 10% of the sample has a ‘yes’ that is followed by a ‘no’ for
SSB intake. There is also a large amount of missing data on other level-1
variables, such as solid food introduction, and for level-2 covariates such
as family income, birth weight, mother’s weight status, etc.
In Amelia, I've been treating each child as a cross-sectional unit
(cs='childID') and using month of visit as the time-series variable
(ts='month'). I've included SSB in the lags and leads options. An initial
attempt at using polytime=2 (with or without the intercs=T option) failed
to converge even after an hour.
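
To make the setup concrete, here is a minimal sketch of the kind of call
described above, run on simulated stand-in data rather than the real study
file. Only childID and month come from the description; the other column
names (ssb, bw), the toy sample size, and declaring SSB as nominal via noms=
are assumptions made for illustration.

```r
library(Amelia)

set.seed(1)

# Hypothetical toy data: 50 children x 12 monthly visits (the real file has ~1500).
n_child <- 50
d <- expand.grid(month = 1:12, childID = 1:n_child)

# Assumed level-2 covariate (birth weight, kg), constant within child.
d$bw <- rep(rnorm(n_child, mean = 3.4, sd = 0.5), each = 12)

# Assumed level-1 exposure: SSB yes/no, with prevalence rising over the months.
d$ssb <- rbinom(nrow(d), size = 1, prob = plogis(-4 + 0.2 * d$month))

# Set about 17% of the SSB values to missing, completely at random.
d$ssb[sample(nrow(d), round(0.17 * nrow(d)))] <- NA

# Child as cross-sectional unit, month as time index, SSB lagged and led.
# noms = "ssb" is an assumption: it asks Amelia to impute SSB as a category.
imp <- amelia(d, m = 5, ts = "month", cs = "childID",
              noms = "ssb", lags = "ssb", leads = "ssb", p2s = 0)

# Imputed SSB values in the first completed data set.
table(imp$imputations[[1]]$ssb)
```

The failed attempt corresponds to adding polytime = 2 (and optionally
intercs = TRUE) to the same call.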
So, my question is whether this approach, based on using cs=, ts=, lags=,
and leads= is adequate for dealing with multilevel data of this type? Or
should I really be using polytime and intercs=T in Amelia, or using
mice.impute.2l.norm in MICE? None of the intercorrelations in the data are
very strong, with the highest being about r=0.20.
I’ve been running the imputations in Ubuntu with
options(amelia.parallel='multicore', amelia.ncpus=4).
Thanks very much for any help/suggestions.
David Freedman, Division of Nutrition, CDC Atlanta