Dear list server group,

Because I've previously analyzed already-imputed data (DXA data from NHANES) and performed some simple imputations in cross-sectional data, I've volunteered (been nominated?) to help co-workers perform multiple imputation in a longitudinal, multilevel data set.

The sample is ~1500 infants who were visited every month for the first year of life. The main exposure is parent-reported sugar-sweetened beverage (SSB) consumption over the last week (yes/no), and the outcome is obesity at age 6 y. Overall, about 17% of the SSB data is missing, with the amount of missing data increasing in the later monthly visits. While the prevalence of SSB consumption generally increases from 1% to 11% over the 12 months, about 10% of the sample has a ‘yes’ that is followed by a ‘no’ for SSB intake. There is also a large amount of missing data on other level-1 variables, such as solid food introduction, and on level-2 covariates such as family income, birth weight, mother’s weight status, etc.

In Amelia, I've been treating each child as a cross-sectional unit (cs='childID') and using month of visit as the time-series variable (ts='month'). I've included SSB in the lags and leads options. An initial attempt at using polytime=2 (with or without the intercs=T option) failed to converge even after an hour.
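
For concreteness, the setup described above corresponds roughly to the sketch below. It is only an illustration: the data frame name (dat), the column names SSB and birthwt, the simulated toy data, and m = 5 are all placeholders or assumptions, not the real data or the exact call. The amelia() call runs only if the Amelia package is installed.

```r
# Toy long-format data: one row per child per monthly visit.
# dat, SSB and birthwt are placeholder names; the values are simulated
# only so that the example runs and do not reflect the real study data.
set.seed(1)
n_child <- 50
dat <- data.frame(
  childID = rep(seq_len(n_child), each = 12),
  month   = rep(1:12, times = n_child)
)
dat$birthwt <- rep(rnorm(n_child, mean = 3.4, sd = 0.5), each = 12)  # level-2 covariate
dat$SSB <- rbinom(nrow(dat), 1, 0.01 + 0.10 * (dat$month - 1) / 11)  # level-1 exposure (0/1)
dat$SSB[sample(nrow(dat), round(0.17 * nrow(dat)))] <- NA            # about 17% missing

# Parallel settings; amelia() reads these options as its defaults.
options(amelia.parallel = "multicore", amelia.ncpus = 4)

if (requireNamespace("Amelia", quietly = TRUE)) {
  # Child as cross-sectional unit, month as time index,
  # SSB entered as both a lag and a lead. m = 5 is an assumed value.
  imp <- Amelia::amelia(dat, m = 5,
                        cs = "childID", ts = "month",
                        lags = "SSB", leads = "SSB")
  # The attempt that did not converge would add, for example:
  #   polytime = 2, intercs = TRUE
}
```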

So, my question is whether this approach, based on using cs=, ts=, lags=, and leads=, is adequate for dealing with multilevel data of this type. Or should I really be using polytime and intercs=T in Amelia, or using mice.impute.2l.norm in MICE? None of the intercorrelations in the data are very strong, with the highest being about r=0.20.

I’ve been running the imputations on Ubuntu with options(amelia.parallel='multicore', amelia.ncpus=4).

Thanks very much for any help/suggestions.

David Freedman, Division of Nutrition, CDC Atlanta