Hi Dan,
For unit-specific linear trends, you can use the polynomial of
time argument (polytime = 1) and the cross-section interaction
argument (intercs = TRUE) to impute with a linear trend within each
unit. Your interpretation of how this works is right on--a time
trend with interactions to account for between-unit variation.
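For concreteness, a minimal sketch of that call (here "plants",
"year", and "plant" are placeholders for your data frame and your
time and cross-section identifiers; substitute your own):

  library(Amelia)
  # polytime = 1 adds a linear time term; intercs = TRUE lets that
  # trend (and the intercept) vary across the cross-section units
  a.out <- amelia(plants, m = 5, ts = "year", cs = "plant",
                  polytime = 1, intercs = TRUE)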
Cheers,
matt.
On Thu, Oct 8, 2009 at 12:19 PM, Dan Matisoff <dmatisof(a)umail.iu.edu> wrote:
Matt -
Thank you, this is very helpful, and it has given me a lot to think about.
When I leave out polynomials of time, the data seem quite stochastic
(I'm not sure if it's worse or better than with polynomials of time). It
does seem that the imputations are being heavily influenced by variance
between units. There is a lot of variation in the between-unit data.
Power plants in the dataset range from over a hundred years old to brand
new. New plants often have negative cost data due to capital
depreciation schedules (which is why I can't use Bayesian priors to
bound the data), while older plants have higher costs. Many plants are
not operated in certain years, or are operated at very low capacity,
which leads them to have costs of 0. Plant age, capacity, electricity
generation, etc., provide a lot of explanatory power for costs, but
there is certainly a lot of variation between units. This is why it's
particularly important to use an MI program that accounts for unit
effects.
I also understand why I can't interact the cross-section with time due to
computational resources...
My remaining / follow-up questions are:
Is it possible to interact a linear trend with a cross-section using Amelia?
I'm not positive exactly what you mean... does this require estimating
a single time trend and interacting it with fixed-effect intercepts for
each unit?
Kind regards,
Dan Matisoff
On Oct 8, 2009, at 9:12 AM, Matt Blackwell wrote:
Hi Dan,
I am curious what happens to these imputations when you leave out
polynomials of time. It sounds to me like the imputations are being
heavily influenced by the variance between units and not within units.
Perhaps you could simply impute with fixed effects.
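As a rough sketch of that - this is my reading of the arguments, so do
check the resulting imputations - a constant level interacted with the
cross-section should reduce to unit-specific intercepts (again with
"plants", "year", and "plant" standing in for your own names):

  library(Amelia)
  # polytime = 0 fits constant levels; intercs = TRUE lets those
  # levels vary across units, i.e., unit fixed effects
  a.fe <- amelia(plants, m = 5, ts = "year", cs = "plant",
                 polytime = 0, intercs = TRUE)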
The problem with adding an interaction with the cross section is that
it adds N x T variables to the dataset, where N is the number of units
and T is the order of the polynomial of time. With your roughly 1000
plants and a cubic polynomial, that is about 3000 added variables, and
you can see why this would slow down your imputations considerably.
Using just polynomials of time, you only add 3 (or fewer) variables to
the regression.
Perhaps you could try a linear trend interacted with the cross section,
which by the same arithmetic adds only N variables rather than 3N.
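To make the contrast concrete, a sketch with the same hypothetical
names as above:

  library(Amelia)
  # cubic trend interacted with ~1000 units: ~3000 added columns (slow)
  a.cubic <- amelia(plants, m = 5, ts = "year", cs = "plant",
                    polytime = 3, intercs = TRUE)
  # linear trend interacted with the cross section: ~1000 added columns
  a.lin <- amelia(plants, m = 5, ts = "year", cs = "plant",
                  polytime = 1, intercs = TRUE)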
I hope that helps.
On Wed, Oct 7, 2009 at 6:03 PM, Dan Matisoff <dmatisof(a)umail.iu.edu> wrote:
Hi all-
I have a (panel) dataset of about 1000 power plants in the U.S. over 13
years, including cost data: total non-fuel expenditures, fixed costs,
operations & maintenance expenses, and hours of operation. For each of
these variables, I am missing about 20% of the observations; overall, I
have about 50% complete observations, meaning that with listwise
deletion I would lose 50% of my observations. Thus, this seems like a
perfect application of Amelia.
If I were to run a simple OLS, I could predict each of the variables
with an R-squared of 75% to 95%, depending on whether I include lagged
values. However, I can't use this to fill in missing data, because of
the many missing values of the predictor variables. Again, the perfect
reason to use Amelia.
When I run Amelia, however, I am running into several problems.
First, regardless of whether I use polynomials of time of order 0, 1,
2, or 3, my imputed dataset seems highly, highly stochastic - much more
so than the original data. Fixed cost data fluctuates from -$5 million
one year to $17 million the next, when all of the observed data for the
same power plant are a relatively stable $8 million to $11 million over
8 years. In another case, where I don't have observed data to compare
it to, cost data varies from -$5 million, to +$2 million, to -$13
million, and then to +$11 million, all in a 4-year span! Even when I
average the five datasets, the imputed data seem extremely stochastic
and unrealistic, and highly dependent upon the polynomial of time I
select.
Am I doing something wrong? Why is the imputed dataset so stochastic?
It appears that much of the advice on the listserv suggests that one
should proceed with the regressions and not worry about the
stochasticity of 20% of the observations; however, I am going to be
using this data as part of a dependent variable in a
difference-in-differences model and in many other complex techniques -
several steps down the road, after performing matching, etc. - and I
would strongly prefer not to have to do each statistical step five
times, and instead come up with a reasonable dataset from which to
proceed with my regression (and perform the regression steps once). Can
I simply average the datasets to generate one useful dataset? (And
again, why is the imputed data so variable?)
Second, if I attempt to allow an individual time trend to be estimated
for each individual by interacting with the cross-section, my computer
grinds for hours and never produces anything. Once, after several
hours, I got an unknown error. Normally - with a polynomial of time of
order 1, 2, or 3 - it takes my computer (a MacBook running OS X, with 2
GB of RAM and a 2.4 GHz Intel Core Duo chip) about 20 seconds to 1
minute to produce all five datasets. Do I need a supercomputer, or am I
doing something wrong?
Thanks in advance for your help,
Daniel Matisoff
Indiana University
Georgia Institute of Technology
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive:
http://lists.gking.harvard.edu/?info=amelia
More info about Amelia:
http://gking.harvard.edu/amelia