Hi all,
I'm using Amelia to impute missing values on a large-ish dataset
(~100,000 observations and 18 variables). Several of the variables are
ordinal, two are id variables, and two are nominal. After running Amelia,
one or more of the ordinal variables still has missing values in the five
imputed datasets it returns. If I let Amelia treat the ordinal variables
as continuous, it imputes all of the missing values. I receive no error
messages, and one warning because one of my nominal variables has more
than 10 categories. (That variable represents US states.)
Does anyone know offhand what might be causing this? Here is my call to
Amelia:
aout <- amelia(data = subdata,
               p2s = 2,
               m = num.imputed.datasets,
               noms = c("state", "delyr"),
               ords = c("meduc6", "gpc", "mager8", "mrace4", "tobacco",
                        "alcohol", "chyper", "phyper", "eclamp",
                        "sex", "congen", "singleton"),
               idvars = c("recwt", "ourid"),
               write.out = FALSE,
               tolerance = 0.0005)
Here is the (presumably unrelated) warning message:
Warning message:
The number of catagories in one of the variables marked nominal has greater
than 10 categories. Check nominal specification.
in: amcheck(data = data, m = m, idvars = numopts$idvars, priors = priors,
And here is a summary of the imputed data
summary(aout[[1]])
     delyr         outcome             state           mager8
 Min.   :1997   Min.   :0.000000   Min.   : 1.00   Min.   :1.000
 1st Qu.:1999   1st Qu.:0.000000   1st Qu.:12.00   1st Qu.:3.000
 Median :2000   Median :0.000000   Median :27.00   Median :4.000
 Mean   :2000   Mean   :0.007362   Mean   :27.30   Mean   :4.033
 3rd Qu.:2001   3rd Qu.:0.000000   3rd Qu.:41.00   3rd Qu.:5.000
 Max.   :2002   Max.   :1.000000   Max.   :56.00   Max.   :9.000

     mrace4          meduc6            gpc             sex
 Min.   :1.000   Min.   :1.000   Min.   :17.00   Min.   :0.0000
 1st Qu.:1.000   1st Qu.:3.000   1st Qu.:38.00   1st Qu.:0.0000
 Median :1.000   Median :3.000   Median :39.00   Median :1.0000
 Mean   :1.318   Mean   :3.399   Mean   :38.54   Mean   :0.5117
 3rd Qu.:1.000   3rd Qu.:4.000   3rd Qu.:40.00   3rd Qu.:1.0000
 Max.   :4.000   Max.   :5.000   Max.   :47.00   Max.   :1.0000

     chyper              phyper           eclamp              tobacco
 Min.   :-8.674e-19   Min.   :0.0000   Min.   :0.000e+00   Min.   :0.000e+00
 1st Qu.: 0.000e+00   1st Qu.:0.0000   1st Qu.:0.000e+00   1st Qu.:0.000e+00
 Median : 0.000e+00   Median :0.0000   Median :0.000e+00   Median :0.000e+00
 Mean   : 7.651e-03   Mean   :0.0397   Mean   :2.981e-03   Mean   :1.228e-01
 3rd Qu.: 0.000e+00   3rd Qu.:0.0000   3rd Qu.:0.000e+00   3rd Qu.:0.000e+00
 Max.   : 1.000e+00   Max.   :1.0000   Max.   :1.000e+00   Max.   :1.000e+00
                                       NA's   :1.283e+03   NA's   :1.549e+04

     alcohol            recwt            lbw
 Min.   :0.000000   Min.   :1.000   Min.   :5.425
 1st Qu.:0.000000   1st Qu.:1.000   1st Qu.:8.008
 Median :0.000000   Median :1.000   Median :8.115
 Mean   :0.009487   Mean   :1.000   Mean   :8.083
 3rd Qu.:0.000000   3rd Qu.:1.000   3rd Qu.:8.212
 Max.   :1.000000   Max.   :1.178   Max.   :8.842

     congen           singleton          ourid
 Min.   :0.00000   Min.   :0.0000   Min.   :      80
 1st Qu.:0.00000   1st Qu.:1.0000   1st Qu.: 4966824
 Median :0.00000   Median :1.0000   Median : 9951050
 Mean   :0.01376   Mean   :0.9687   Mean   : 9989422
 3rd Qu.:0.00000   3rd Qu.:1.0000   3rd Qu.:15000000
 Max.   :1.00000   Max.   :1.0000   Max.   :20000000
Notice that eclamp and tobacco still have missing values.
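In case it helps anyone reproduce the check, here is a rough sketch of how the
remaining NAs can be counted per variable across all imputed datasets rather
than by reading summary() output. It assumes the imputed datasets are
available as a plain list of data frames; the toy list `imputed` below is a
hypothetical stand-in, not my real data or Amelia's actual output structure.

```r
# Toy stand-ins for the imputed datasets (hypothetical values, for illustration).
imputed <- list(
  data.frame(eclamp = c(0, NA, 1), tobacco = c(NA, NA, 0), sex = c(0, 1, 1)),
  data.frame(eclamp = c(0, 0, 1),  tobacco = c(1, NA, 0),  sex = c(0, 1, 1))
)

# One row per imputed dataset, one column per variable:
# each cell is the number of values that are still NA.
na_counts <- t(sapply(imputed, function(d) colSums(is.na(d))))
print(na_counts)

# Names of the variables with at least one NA left in any dataset.
still_missing <- colnames(na_counts)[colSums(na_counts) > 0]
print(still_missing)
```

With the real output, the list would be built from the imputed datasets
themselves in place of the toy data.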
I suppose I can just impute the ordinal variables as continuous and sort them
back into categories afterwards, but that doesn't really seem optimal.
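For concreteness, the fallback I have in mind would look roughly like the
sketch below: each continuously imputed value is snapped to the nearest
allowed category. The helper name `snap_to_levels` is my own invention and
not part of Amelia, and the input values are made up. Nearest-level rounding
is only one possible rule, and I'd expect it to distort the category
distribution somewhat.

```r
# Snap continuous values to the nearest allowed ordinal level.
# `snap_to_levels` is a hypothetical helper, not an Amelia function.
snap_to_levels <- function(x, levels) {
  levels <- as.numeric(sort(unique(levels)))
  out <- rep(NA_real_, length(x))
  ok <- !is.na(x)                      # leave any remaining NAs untouched
  out[ok] <- vapply(
    x[ok],
    function(v) levels[which.min(abs(levels - v))],  # ties go to the lower level
    numeric(1)
  )
  out
}

# Made-up continuous imputations for a 0/1 variable such as tobacco.
imputed_tobacco <- c(-0.3, 0.2, 0.6, 1.4, NA)
snapped <- snap_to_levels(imputed_tobacco, levels = c(0, 1))
print(snapped)
```

The same helper would apply to a multi-level variable such as meduc6 by
passing levels = 1:5.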
Thanks in advance,
Dennis
-
Amelia mailing list served by Harvard-MIT Data Center