Hi James,
A few thoughts. If the truck length variable changes over time, then
you can impute it along with the counts in the same Amelia run. If
there is no empirical time dependence, then Amelia will not use time
to impute the truck lengths.
As for the truly missing truck lengths, you can always go back into
your imputed data and manually code those as missing. R code would
look something like:
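a.out <- amelia(***your call here***)
# "count" and "length" stand in for your own column names
for (i in seq_along(a.out$imputations)) {
  # periods with a true count of zero have no truck length to report
  mask <- a.out$imputations[[i]]$count == 0
  # blank only the length column; is.na(<data frame>) <- mask would
  # treat the mask as a column index rather than a row index
  a.out$imputations[[i]]$length[mask] <- NA
}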
You'll want to double check that it works (I haven't tested it). The
reason this will work is that an imputed cell adds no new information
to the data; it only carries the information from the observed values
in that row. Omitting that observation from the imputation entirely,
on the other hand, would bias the imputation.
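To see the masking step in isolation, here is a small sketch that does not need Amelia at all. It assumes the imputations are a list of completed data frames, as in a.out$imputations, and the column names "count" and "length" and all the numbers are made up for illustration.

```r
# Stand-in for a.out$imputations: a list of completed data frames.
# Column names and values are hypothetical.
imputations <- list(
  imp1 = data.frame(count = c(4, 0, 2, 0), length = c(18.0, 16.3, 21.5, 17.1)),
  imp2 = data.frame(count = c(4, 0, 3, 0), length = c(18.0, 15.8, 20.9, 17.6))
)

for (i in seq_along(imputations)) {
  # rows with a true zero count saw no trucks, so there is no length
  mask <- imputations[[i]]$count == 0
  imputations[[i]]$length[mask] <- NA
}

# Mean length within each imputation, skipping the re-masked cells
mean_lengths <- sapply(imputations, function(d) mean(d$length, na.rm = TRUE))
```

Any later summary of the lengths then has to skip the NAs explicitly (na.rm = TRUE here), otherwise the zero-count periods will turn the whole average into NA.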
I hope that helps.
Cheers,
matt.
On Tue, Sep 14, 2010 at 1:03 AM, James Marca
<jmarca(a)translab.its.uci.edu> wrote:
Hi,
I have a data set that is based on observations of vehicles by lane.
For example, each truck that passes the detector will be counted, and
its characteristics recorded (length, weight etc). By summing up the
counts into higher time periods, say an hour, I can use Amelia to
impute missing counts of vehicles. (Statisticians, look the other way:
I tell Amelia that the time series varies by time of day, with the ts
variable running from 0 to 24, and I insert day of week as the cs
(cross section) variable, 0 through 6. While that may be a
non-standard perversion of the input parameters, it seems to work
pretty well.) I have other data for the missing periods from other
detectors, so I think it makes sense to try to use Amelia rather than
simply estimating a time series model for the missing counts.
Now that I can impute counts I want to impute missing characteristics.
For example, in an hour of good observation, every truck will have a
length recorded. When the detector is kaput for some reason, I want
to impute the missing average lengths along with the missing truck
counts.
The problem is that sometimes there are no observations (a true count
of zero) for a period, and so the expected length for the period is a
"true" NA, rather than just a missing variable. This is quite common;
while the trucks are *usually* in the right hand lanes, they are
sometimes detected in the middle lanes. The middle lane detectors
therefore *usually* have a count of zero and indeterminate characteristics.
My question is how to proceed using Amelia. My naive strategy would
be to run Amelia once to impute the counts, and then run Amelia again
for each imputation (5 times), for the characteristics of the vehicles
(as a non-time dependent imputation) *only* for the non-zero periods
and lanes, and then use Zelig to compute average lengths. Does this
make sense, or have I crossed the line from imputation to imagination?
My other thought would be to aggregate up to daily periods and make it
so there should never be zero counts, but I'd really like to preserve
the hourly variation in the data.
One other note: I've coded my data by observation time (with multiple
lanes of data). I could also code it as one record per lane per
observation time, which would allow me to drop zero count lanes. I
just can't see how this would help.
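For concreteness, the two layouts might look roughly like the sketch below in base R. The column names (hour, count.1, length.1 and so on) and the values are invented for illustration; the real data will differ.

```r
# Hypothetical wide layout: one row per observation time, with one set
# of columns per lane.
wide <- data.frame(
  hour     = 0:2,
  count.1  = c(3, 0, 5),
  count.2  = c(0, 0, 1),
  length.1 = c(18.2, NA, 17.5),
  length.2 = c(NA, NA, 20.1)
)

# Long layout: one row per lane per observation time.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("count.1", "count.2", "length.1", "length.2"),
  timevar   = "lane",
  idvar     = "hour",
  sep       = "."
)

# In the long layout the zero-count lanes can simply be dropped.
nonzero <- long[long$count > 0, ]
```

In the long layout a zero-count lane is just a row that can be filtered out, whereas in the wide layout it is a cell that has to stay in place as NA.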
Any advice would be appreciated.
Regards,
James Marca
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive:
http://lists.gking.harvard.edu/?info=amelia
More info about Amelia:
http://gking.harvard.edu/amelia