Re: [amelia] question about characteristics of counted events

14 Sep 2010

Hi James,

A few thoughts. If the truck length variable changes over time, then
you can impute it along with the counts in the same Amelia run. If
there is no empirical time dependence, then Amelia will not use time
to impute the truck lengths.

As for the truly missing truck lengths, you can always go back into
your imputed data and manually code those as missing. R code would
look something like:

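a.out <- amelia(***your call here***)

# "count" and "length" are placeholder column names; substitute your own.
for (i in seq_along(a.out$imputations)) {
  # rows where the count is zero, so the length is truly undefined
  mask <- a.out$imputations[[i]]$count == 0
  # blank out only the length in those rows, not the whole row
  a.out$imputations[[i]]$length[mask] <- NA
}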

You'll want to double check that it works (I haven't tested it). The
reason this works is that the imputed cell has not added any new
information to the data itself; it only reflects information from the
observed values. Omitting that observation from the imputation
entirely, by contrast, would bias the imputation.
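
To make the masking step concrete, here is a minimal base-R sketch. It
does not call Amelia at all; it fakes the imputations list with two
small data frames, and the column names "count" and "length" as well as
the helper mask_zero_counts are assumptions standing in for your own.
It masks the length wherever the count is zero, then averages a
count-weighted mean length across the imputations.

```r
# Stand-in for a.out$imputations: a list of completed data frames.
# The column names "count" and "length" are hypothetical.
imputations <- list(
  data.frame(count = c(3, 0, 5), length = c(10, 7.2, 12)),
  data.frame(count = c(3, 0, 5), length = c(10, 6.8, 14))
)

# Set length to NA wherever the count is zero; other columns are untouched.
mask_zero_counts <- function(d) {
  d$length[d$count == 0] <- NA
  d
}
masked <- lapply(imputations, mask_zero_counts)

# Count-weighted mean length per imputation, skipping the masked rows.
weighted_len <- sapply(masked, function(d) {
  ok <- !is.na(d$length)
  sum(d$count[ok] * d$length[ok]) / sum(d$count[ok])
})

# Average the per-imputation estimates into a single point estimate.
pooled <- mean(weighted_len)
```

This only pools the point estimate; if you need standard errors you
would still have to combine the within- and between-imputation variance
in the usual way.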

I hope that helps.

Cheers,
matt.

On Tue, Sep 14, 2010 at 1:03 AM, James Marca
<jmarca(a)translab.its.uci.edu> wrote:
  Hi,

 I have a data set that is based on observations of vehicles by lane.
 For example, each truck that passes the detector will be counted, and
 its characteristics recorded (length, weight etc).  By summing up the
 counts into higher time periods, say an hour, I can use Amelia to
 impute missing counts of vehicles.  (Statisticians, look the other way:
 I tell Amelia that the time series varies by time of day (the ts
 variable runs from 0 to 24) and I insert day of week as the cs
 (cross section) variable (0 through 6).  While that may be a
 non-standard perversion of the input parameters, it seems to work
 pretty well.)  I have other data for the missing periods from other
 detectors, so I think it makes sense to try to use Amelia rather than
 simply estimating a time series model for the missing counts.

 Now that I can impute counts I want to impute missing characteristics.
 For example in an hour of good observation, every truck will have a
 length recorded.  When the detector is kaput for some reason, I want
 to impute the missing average lengths along with the missing truck
 counts.

 The problem is that sometimes there are no observations (a true count
 of zero) for a period, and so the expected length for the period is a
 "true" NA, rather than just a missing variable.  This is quite common;
 while the trucks are *usually* in the right hand lanes, they are
 sometimes detected in the middle lanes.  The middle lane detectors
 therefore *usually* have a count of zero and indeterminate characteristics.

 My question is how to proceed using Amelia.  My naive strategy would
 be to run Amelia once to impute the counts, and then run Amelia again
 for each imputation (5 times), for the characteristics of the vehicles
 (as a non-time dependent imputation) *only* for the non-zero periods
 and lanes, and then use Zelig to compute average lengths.  Does this
 make sense, or have I crossed the line from imputation to imagination?

 My other thought would be to aggregate up to daily periods and make it
 so there should never be zero counts, but I'd really like to preserve
 the hourly variation in the data.

 One other note: I've coded my data by observation time (with multiple
 lanes of data).  I could also code it as one record per lane per
 observation time, which would allow me to drop zero count lanes.  I
 just can't see how this would help.

 Any advice would be appreciated.

 Regards,
 James Marca
 -
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
More info about Amelia: http://gking.harvard.edu/amelia
