Hi,
I have a data set that is based on observations of vehicles by lane.
For example, each truck that passes the detector will be counted, and
its characteristics recorded (length, weight, etc.). By summing up the
counts into longer time periods, say an hour, I can use Amelia to
impute missing counts of vehicles. (Statisticians, look the other way:
I tell Amelia that the time series varies by time of day, with the ts
variable running from 0 to 24, and I insert day of week, 0 through 6,
as the cs (cross section) variable. While that may be a non-standard
perversion of the input parameters, it seems to work pretty well.)
I have other data for the missing periods from other
detectors, so I think it makes sense to try to use Amelia rather than
simply estimating a time series model for the missing counts.
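
To make the setup concrete, here is a minimal sketch of the aggregation and indexing step. It is written in plain Python purely for illustration, not in R; the record layout, the sample timestamps and lengths, and the function names are all made up, and the Amelia call itself is not shown.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw detector records: (timestamp, lane, length in metres).
records = [
    (datetime(2008, 3, 3, 7, 5), 1, 18.2),
    (datetime(2008, 3, 3, 7, 40), 1, 21.0),
    (datetime(2008, 3, 3, 8, 15), 2, 16.5),
]


def hourly_counts(records):
    """Sum per-vehicle records into counts keyed by (date, hour, lane)."""
    counts = defaultdict(int)
    for stamp, lane, _length in records:
        counts[(stamp.date(), stamp.hour, lane)] += 1
    return dict(counts)


def index_columns(day, hour):
    """Return (ts, cs): hour of day as ts, day of week (Monday = 0) as cs."""
    return hour, day.weekday()


counts = hourly_counts(records)

# One row per (date, hour, lane), carrying the two index columns described above.
rows = []
for (day, hour, lane), n in sorted(counts.items()):
    ts, cs = index_columns(day, hour)
    rows.append({"ts": ts, "cs": cs, "lane": lane, "count": n})
```

Hours in which the detector was down would simply have no trustworthy count, and those are the cells handed to the imputation.
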
Now that I can impute counts, I want to impute missing characteristics as well.
For example in an hour of good observation, every truck will have a
length recorded. When the detector is kaput for some reason, I want
to impute the missing average lengths along with the missing truck
counts.
The problem is that sometimes there are no observations (a true count
of zero) for a period, and so the expected length for the period is a
"true" NA, rather than just a missing value. This is quite common;
while the trucks are *usually* in the right hand lanes, they are
sometimes detected in the middle lanes. The middle lane detectors
therefore *usually* have a count of zero and indeterminate characteristics.
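
The distinction can be spelled out as three states per lane and hour. The sketch below is illustrative Python only; the function name, the state labels, and the sample numbers are hypothetical.

```python
def mean_length_state(detector_ok, count, total_length):
    """Classify one lane-hour and return (state, mean_length)."""
    if not detector_ok:
        # Detector was down: count and length are unknown, so impute them.
        return "missing", None
    if count == 0:
        # Detector worked and saw no trucks: no length exists to impute.
        return "structural_na", None
    return "observed", total_length / count


examples = [
    mean_length_state(True, 4, 72.0),      # four trucks, 72 m in total
    mean_length_state(True, 0, 0.0),       # working detector, empty lane
    mean_length_state(False, None, None),  # detector down
]
```

Only the "missing" state is something I would want imputed; the "structural_na" state has to be kept away from the imputation somehow.
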
My question is how to proceed using Amelia. My naive strategy would
be to run Amelia once to impute the counts, and then run Amelia again
for each imputation (5 times), for the characteristics of the vehicles
(as a non-time dependent imputation) *only* for the non-zero periods
and lanes, and then use Zelig to compute average lengths. Does this
make sense, or have I crossed the line from imputation to imagination?
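
A rough sketch of the selection step between the two passes, again in illustrative Python with made-up values: the dictionaries stand in for completed count data sets from the first pass (only two of the five are shown), and rounding and clipping the imputed counts is my own assumption about how to decide what counts as "non-zero".

```python
def rows_needing_lengths(imputed_rows):
    """Keep lane-hours with a non-zero (imputed) count and an unknown length."""
    keep = []
    for row in imputed_rows:
        # Imputed counts may come back fractional or negative; round and clip
        # (an assumption made for this sketch).
        count = max(0, round(row["count"]))
        if count > 0 and row["mean_length"] is None:
            keep.append({**row, "count": count})
    return keep


# Hypothetical stand-ins for two completed count data sets from the first pass.
imputations = [
    [
        {"lane": 1, "hour": 7, "count": 3.6, "mean_length": None},
        {"lane": 2, "hour": 7, "count": 0.2, "mean_length": None},
        {"lane": 1, "hour": 8, "count": 5, "mean_length": 19.1},
    ],
    [
        {"lane": 1, "hour": 7, "count": 3.1, "mean_length": None},
        {"lane": 2, "hour": 7, "count": 0.7, "mean_length": None},
        {"lane": 1, "hour": 8, "count": 5, "mean_length": 19.1},
    ],
]

second_stage = [rows_needing_lengths(imp) for imp in imputations]
```

Note that the set of lane-hours passed to the second run can differ from one imputation to the next, because a lane-hour imputed as zero in one data set may be non-zero in another.
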
My other thought would be to aggregate up to daily periods and make it
so there should never be zero counts, but I'd really like to preserve
the hourly variation in the data.
One other note: I've coded my data by observation time (with multiple
lanes of data). I could also code it as one record per lane per
observation time, which would allow me to drop zero count lanes. I
just can't see how this would help.
Any advice would be appreciated.
Regards,
James Marca