below...
Gary
--
*Gary King* - Albert J. Weatherhead III University Professor - Director,
IQSS - Harvard University
GKing.Harvard.edu <http://gking.harvard.edu/> - King(a)Harvard.edu -
@kinggary <http://twitter.com/kinggary> - 617-500-7570 - Asst 495-9271 -
Fax 812-8581
On Wed, Sep 29, 2010 at 8:01 AM, Wardt, Marc van de <M.vandeWardt(a)uva.nl> wrote:
Dear fellow Amelia users,
For my PhD project I am currently working on a pooled cross-sectional
time-series dataset which contains many missing values. Since I’m new to
Amelia, I have a couple of questions on the use of the program. I also tried
to find the answers to my questions in the archive, but some issues are
still unclear to me and I hope some of you could help me out. My questions
are as follows:
1) My dataset on political parties spans the 1984-2006 period and I only
have observations for 1984, 1988, 1992, 1996, 1999, 2002 and 2006, as data
is only gathered around elections. Altogether this entails that I have far
more missing values than observations. Would you still recommend using
Amelia with such an enormous degree of missingness?
If you do anything, yes. But we offer no guarantee that there's enough information
in your data to support the inferences you want to make!
All my other questions deal with possible violations of the criterion that
the imputation model should include at least as much information as will be
used in the analysis model. This is stressed in the Amelia manual (page 10)
and journal article.
2) My analysis model contains interaction-effects and Euclidean distance
measures, and I understand that I also have to add these to the imputation
model. However, the consequence of this approach is that I end up with
imputed interaction-effects and Euclidean distance measures that don’t make
any sense, as Amelia does not know how these variables are constructed. For
example: in my analysis model, the interaction effect C is meant to be A
multiplied by B, but the Amelia algorithm will replace missing values of C
by something different than A*B. Since this fundamentally alters the goal of
my analysis, I wonder whether it is also allowed to transform the data AFTER
running Amelia. In case of the example above, this would imply that I would
only include A and B in my imputation model and compute interaction-effect C
myself after running Amelia. Is this a good procedure, or would it bias my
results?
Yes, it's a good idea. Impute the basic variables, including nonlinearities
such as interactions. Afterwards you can make things consistent by
discarding the imputed interactions and recreating them from the imputed
main effects.
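The post-imputation step described here could be sketched as follows. This is only an illustration in plain Python, not Amelia's own interface: the list-of-rows layout, the function name rebuild_interaction, and the numbers are all made up for the example, and the column names A, B and C follow the example in the question.

```python
# Hypothetical stand-in for the m completed datasets an imputation run returns:
# each dataset is a list of rows, each row a dict of column name -> value.
def rebuild_interaction(imputed_datasets, a="A", b="B", c="C"):
    """Return copies of the datasets with column c overwritten by a * b."""
    rebuilt = []
    for dataset in imputed_datasets:
        # Discard whatever was imputed for c and recreate it from the
        # (possibly imputed) main effects, so c == a * b holds in every row.
        rebuilt.append([{**row, c: row[a] * row[b]} for row in dataset])
    return rebuilt


# Made-up values: in the first row of each dataset, C was imputed
# inconsistently with A * B.
imputations = [
    [{"A": 2.0, "B": 3.0, "C": 5.1}, {"A": 1.0, "B": 4.0, "C": 4.0}],
    [{"A": 2.0, "B": 2.5, "C": 6.3}, {"A": 1.0, "B": 4.0, "C": 4.0}],
]
consistent = rebuild_interaction(imputations)
```

The same pattern would apply to a distance measure: recompute it from the imputed components in each completed dataset before running the analysis model.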
3) In my analysis model I focus on lagged or first-differenced effects of my
X variables on Y. Do I also have to make these transformations before I run
the imputation model, or can I lag/take first differences of my variables
after running Amelia? The latter would be much more practical to me, because
I always have regular 3-4 year intervals of missingness between 2 datapoints
in my time-series, which means that I will be unable to take first
differences of any variable before running the imputation model (as this
will generate a variable that is always missing).
In principle yes, but at some point, given the relatively little information you
have, you will have to just make some assumptions and leave some stuff out.
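If the differencing is done after imputation, the mechanical step might look like the sketch below. It is plain Python under assumed names: the columns party, year and x, the function first_differences, and the values are hypothetical, and the sketch says nothing about whether the data carry enough information to support the result.

```python
from collections import defaultdict


def first_differences(rows, unit="party", time="year", var="x"):
    """Add var + '_diff': the change in var since the unit's previous year.

    The earliest year of each unit has no predecessor, so it gets None.
    """
    by_unit = defaultdict(list)
    for row in rows:
        by_unit[row[unit]].append(row)

    out = []
    for unit_rows in by_unit.values():
        previous = None
        # Sort within each unit so differences follow time order.
        for row in sorted(unit_rows, key=lambda r: r[time]):
            diff = None if previous is None else row[var] - previous[var]
            out.append({**row, var + "_diff": diff})
            previous = row
    return out


# One completed (imputed) dataset with made-up values.
completed = [
    {"party": "P1", "year": 1985, "x": 4.0},
    {"party": "P1", "year": 1984, "x": 3.5},
    {"party": "P2", "year": 1984, "x": 6.0},
    {"party": "P2", "year": 1985, "x": 5.0},
]
differenced = first_differences(completed)
```

The function would be applied separately to each completed dataset, so that the differenced variable is consistent with the imputed levels in that dataset.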
4) My final question is whether it is allowed to add new data to the
analysis model after the data has been imputed by Amelia. I want to do this,
because I would like to merge parties with their supporters on the basis of
left-right positions. However, in order to know the left-right positions of
the parties for every year, I first have to impute the missing data on the
parties’ left-right positions. After running these imputations I have all
the information I need in order to merge the parties with another dataset
that contains information on their supporters. Do I want to do this, or
would I again violate the assumption that the analysis model must contain
the same information as the imputation model?
It would be better in principle to include the other data set in there and
make the imputations with both if possible, but the same qualification as
above applies about the lack of information. It's also reasonable to use the
imputations to do other things, including using other data, to calculate
some specific quantity of interest.
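When a quantity of interest is calculated separately in each imputed dataset, the results are usually pooled with the standard multiple-imputation combining rules (Rubin's rules). A minimal sketch in plain Python, where the function name combine_rubin and the input numbers are invented for illustration:

```python
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Pool m per-imputation estimates and their squared standard errors."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Made-up estimates of one quantity from m = 3 imputed datasets,
# each with its squared standard error.
estimates = [1.0, 2.0, 3.0]
variances = [0.5, 0.5, 0.5]
pooled, total_variance = combine_rubin(estimates, variances)
```

The square root of total_variance is the pooled standard error. The between-imputation term is what carries the extra uncertainty due to the missing data.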
I hope my questions make sense (I’m sorry in case they don’t) and hope that
some of you have some advice. Your help is very much appreciated. Let me know
if anything is unclear, so I can clarify it.
Best regards,
Marc