Re: [amelia] question about characteristics of counted events

24 Sep 2010

On Fri, Sep 24, 2010 at 04:23:55PM -0400, Matt Blackwell wrote:
...
> > I have two follow-up questions. First, I tried to use tscsPlot to
> > compare before and after setting the "zero-count weight estimates" to
> > NA, and it didn't appear to do the right thing. For example, on
> > Mondays in 2008 for a site, there should have been no imputed truck
> > weights at all after NA-ing the zero-count records, but I see lots in
> > that day's tscsPlot. Aside from cutting, pasting and hacking, is
> > there a way to get tscsPlot to plot the segment bars with the non-NA
> > imputed estimates and the observed values?
>
> Unfortunately, tscsPlot works by taking a bunch of draws from the
> imputation model, so it wouldn't know that certain observations really
> shouldn't be imputed. We might build in a feature to mark observations
> as "truly" missing, which would also make tscsPlot work for you, but
> this is still in development.
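The "truly missing" marker did not exist yet, so what follows is only a sketch of the idea. It is written in Python rather than R and assumes a hypothetical layout: one list of cell values per imputed draw, aligned with a boolean `structural` mask. `mask_structural` and `draw_ranges` are illustrative names, not part of Amelia. Flagged cells are reset to NaN in every draw before the per-cell ranges are computed (a crude stand-in for the segment bars), so those cells show no imputed values.

```python
import math
from typing import List, Optional, Sequence, Tuple


def mask_structural(draws: Sequence[Sequence[float]],
                    structural: Sequence[bool]) -> List[List[float]]:
    """Return copies of the imputed draws with structurally missing cells set to NaN."""
    masked = []
    for draw in draws:
        if len(draw) != len(structural):
            raise ValueError("every draw must align with the structural mask")
        # Keep the drawn value unless the cell is flagged as "truly" missing.
        masked.append([math.nan if flag else value
                       for value, flag in zip(draw, structural)])
    return masked


def draw_ranges(draws: Sequence[Sequence[float]]) -> List[Optional[Tuple[float, float]]]:
    """Per-cell (min, max) across draws, skipping NaN; None if every draw is NaN."""
    ranges: List[Optional[Tuple[float, float]]] = []
    for cell in zip(*draws):
        values = [v for v in cell if not math.isnan(v)]
        ranges.append((min(values), max(values)) if values else None)
    return ranges


# Two hypothetical imputed draws over three cells; the middle cell is a zero-count period.
draws = [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5]]
print(draw_ranges(mask_structural(draws, [False, True, False])))
# [(1.0, 1.5), None, (3.0, 3.5)]
```

The mask is applied after drawing, so the imputation model itself is unchanged; only what gets summarized or plotted differs.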

I figured that out by looking at the code after I sent the email. It was
actually quite instructive to read the code and see what was being
done; now I have more of a clue about what Amelia is doing: estimating
the parameters of the distribution with the EM chains, and _then_
doing the random draws. This explains the visual "pause" (for lack of
a better term) between one EM chain converging and the next one
starting: it is doing the random draws, and if I put a hard bound at
zero, it sometimes does 1000 draws or so before settling on zero.
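The bound behaviour described above amounts to rejection sampling with a fallback: keep redrawing until a value satisfies the bound, and after a fixed number of failed attempts use the bound itself. Here is a minimal Python sketch of that logic as described in this thread, not Amelia's code; `bounded_draw` and `max_draws` are hypothetical names, and the normal distribution centred at -5 is an assumed example.

```python
import random
from typing import Callable, Tuple


def bounded_draw(draw: Callable[[], float], lower: float,
                 max_draws: int = 1000) -> Tuple[float, int]:
    """Redraw until a value >= lower appears; after max_draws failures, return lower.

    Also returns the number of attempts, which is where the "pause" comes from.
    """
    for attempt in range(1, max_draws + 1):
        value = draw()
        if value >= lower:
            return value, attempt
    # No draw satisfied the bound: fall back to the bound itself.
    return lower, max_draws


rng = random.Random(42)
# A conditional distribution centred well below zero: almost every draw fails the bound,
# so this cell costs close to max_draws draws before falling back to 0.0.
value, attempts = bounded_draw(lambda: rng.gauss(-5.0, 1.0), lower=0.0)
print(value, attempts)
```

When the imputation distribution puts little mass above the bound, every bounded cell can cost up to `max_draws` draws, which is consistent with the slowdown between chains.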

...

> > I can set the max chain length with emburn, but I'm not sure a priori
> > how long to set it (200 is okay, 2000 is too slow). I also see there
> > is a "tolerance" argument, but I'm not sure how to use it or what it
> > means. Is it better to leave tolerance alone and just cut off the
> > chains, or up the tolerance to something like 0.01?
>
> Cutting off the chains is problematic because you cannot be sure that
> the parameters that you care about have converged. One idea might be
> to simply use totals instead of averages and then set those
> "structurally" missing cells to zero.

Yeah, that is what I was thinking: not converged is not
converged.
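To make the tolerance versus chain-length trade-off concrete, here is a small EM loop in Python for a bivariate normal where `x` is fully observed and some `y` values are missing. It is not Amelia's implementation. The stopping rule (largest absolute parameter change below `tol`, capped at `max_iter` iterations) is an assumption chosen to mirror the roles of `tolerance` and the emburn cap discussed above. The loop reports whether it stopped because it converged or because it hit the cap; only the first case gives estimates you can rely on.

```python
from typing import List, Optional, Tuple

Params = Tuple[float, float, float, float, float]  # mu_x, mu_y, s_xx, s_xy, s_yy


def em_bivariate(x: List[float], y: List[Optional[float]],
                 tol: float = 1e-4, max_iter: int = 200) -> Tuple[Params, int, bool]:
    """EM for a bivariate normal with missing y values (None).

    Returns (params, iterations, converged). converged is False when the
    loop was cut off at max_iter before the change fell below tol.
    """
    n = len(x)
    observed = [v for v in y if v is not None]
    # x is fully observed, so its mean and variance are fixed from the start.
    mu_x = sum(x) / n
    s_xx = sum((v - mu_x) ** 2 for v in x) / n
    mu_y = sum(observed) / len(observed)
    s_yy = sum((v - mu_y) ** 2 for v in observed) / len(observed)
    params: Params = (mu_x, mu_y, s_xx, 0.0, s_yy)  # start with zero covariance

    for iteration in range(1, max_iter + 1):
        _, mu_y, _, s_xy, s_yy = params
        beta = s_xy / s_xx              # slope of the regression of y on x
        resid_var = s_yy - beta * s_xy  # Var(y | x)

        # E-step: expected y and y^2 for each row under the current parameters.
        ey, ey2 = [], []
        for xi, yi in zip(x, y):
            if yi is None:
                m = mu_y + beta * (xi - mu_x)
                ey.append(m)
                ey2.append(m * m + resid_var)
            else:
                ey.append(yi)
                ey2.append(yi * yi)

        # M-step: maximum-likelihood moments from the expected sufficient statistics.
        new_mu_y = sum(ey) / n
        new_s_xy = sum(xi * e for xi, e in zip(x, ey)) / n - mu_x * new_mu_y
        new_s_yy = sum(ey2) / n - new_mu_y ** 2
        new_params: Params = (mu_x, new_mu_y, s_xx, new_s_xy, new_s_yy)

        change = max(abs(a - b) for a, b in zip(new_params, params))
        params = new_params
        if change < tol:
            return params, iteration, True
    return params, max_iter, False


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, None, 8.2, None, 12.1]
print(em_bivariate(x, y, tol=1e-8, max_iter=5))     # cut off: converged flag is False
print(em_bivariate(x, y, tol=1e-8, max_iter=1000))  # runs until the change drops below tol
```

Raising `tol` makes the loop stop earlier with looser estimates; lowering `max_iter` without checking the flag simply hides non-convergence.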

Also, as you suggest, using totals instead of averages is the way
to go! I think it makes more sense, and it definitely runs faster. I am
now imputing the sum total of all truck weights (and so on) observed in
each period and lane. My concern was that this would introduce a
stair-step pattern into the data, but I guess the randomness of the
data plus the robustness of Amelia combine to make this a non-issue.
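A minimal Python sketch of the totals approach follows. It assumes a hypothetical record layout rather than the actual data: a list of observed trucks as (period, lane, weight) tuples, plus a set of (period, lane) cells where the sensor was down. `cell_totals` is an illustrative name. Each cell becomes the sum of observed weights. A cell with no trucks counted is a structural zero, and only cells where the count itself is unknown stay missing (None) for the imputation model.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

Cell = Tuple[str, int]  # (period, lane)


def cell_totals(trucks: Iterable[Tuple[str, int, float]],
                periods: Iterable[str], lanes: Iterable[int],
                sensor_down: Set[Cell]) -> Dict[Cell, Optional[float]]:
    """Total observed weight per (period, lane) cell.

    Cells with no trucks become 0.0 (structural zero); cells in sensor_down
    become None so that only genuinely unobserved cells are imputed.
    """
    sums: Dict[Cell, float] = defaultdict(float)
    for period, lane, weight in trucks:
        sums[(period, lane)] += weight

    lane_list = list(lanes)  # iterated once per period
    totals: Dict[Cell, Optional[float]] = {}
    for period in periods:
        for lane in lane_list:
            cell = (period, lane)
            if cell in sensor_down:
                totals[cell] = None                 # unknown: leave for imputation
            else:
                totals[cell] = sums.get(cell, 0.0)  # no trucks counted: structural zero
    return totals


trucks = [("Mon", 1, 30.0), ("Mon", 1, 12.5), ("Tue", 2, 20.0)]
print(cell_totals(trucks, ["Mon", "Tue"], [1, 2], sensor_down={("Tue", 1)}))
```

Because zero-count cells are filled in before imputation, the model no longer has to estimate values that are known to be zero, which may be part of why this version runs faster.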

Regards,
James
