Dear fellow Amelia users,
For my PhD project I am currently working on a pooled cross-sectional time-series dataset
which contains many missing values. Since I'm new to Amelia, I have a couple of questions
on the use of the program. I also tried to find the answers to my questions in the
archive, but some issues are still unclear to me and I hope some of you could help me out.
My questions are as follows:
1) My dataset on political parties spans the 1984-2006 period, but I only have observations
for 1984, 1988, 1992, 1996, 1999, 2002 and 2006, because data are only gathered around
elections. This means that I have far more missing values than observations.
Would you still recommend using Amelia with such an enormous degree of missingness?
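To make the scale concrete, here is a quick count for one variable of one party, sketched
in plain Python rather than R. It assumes a yearly panel, so every non-election year counts
as missing, and that the variable is fully observed in the election years.

```python
# Years covered by the panel versus years with actual observations.
all_years = list(range(1984, 2007))  # 1984..2006 inclusive
observed_years = [1984, 1988, 1992, 1996, 1999, 2002, 2006]

n_total = len(all_years)
n_observed = len(observed_years)
n_missing = n_total - n_observed
share_missing = n_missing / n_total  # fraction of party-years without data

print(n_total, n_observed, n_missing, round(share_missing, 2))
```

That is 16 missing years against 7 observed ones, so roughly 70% of the party-years would
have to be imputed.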
All my other questions deal with possible violations of the criterion that the imputation
model should include at least as much information as will be used in the analysis model.
This is stressed in the Amelia manual (page 10) and in the journal article.
2) My analysis model contains interaction effects and Euclidean distance measures, and I
understand that I also have to add these to the imputation model. However, the consequence
of this approach is that I end up with imputed interaction effects and Euclidean distance
measures that don't make any sense, as Amelia does not know how these variables are
constructed. For example: in my analysis model, the interaction effect C is meant to be A
multiplied by B, but the Amelia algorithm will replace missing values of C with something
other than A*B. Since this fundamentally alters the goal of my analysis, I wonder
whether it is acceptable to transform the data AFTER running Amelia. In the
example above, this would imply that I only include A and B in my imputation model
and compute the interaction effect C myself after running Amelia. Is this a good procedure, or
would it bias my results?
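To be concrete about the procedure I have in mind, here is a small sketch in plain Python
rather than R. The list `imputed_datasets`, the variable names and all values are made up;
they simply stand in for the completed datasets that Amelia returns. Only the component
variables would be in the imputation model, and the derived variables are rebuilt in every
completed dataset afterwards. Whether this is statistically sound is exactly my question.

```python
import math

# Hypothetical stand-in for m = 2 completed datasets (all values invented).
# p_* are party positions and v_* are supporter positions on two dimensions.
imputed_datasets = [
    [{"A": 0.5, "B": 4.0, "p_lr": 3.0, "p_econ": 4.0, "v_lr": 0.0, "v_econ": 0.0}],
    [{"A": 1.0, "B": 3.0, "p_lr": 6.0, "p_econ": 8.0, "v_lr": 0.0, "v_econ": 0.0}],
]


def add_derived(rows):
    """Return copies of the rows with C = A*B and a Euclidean distance added."""
    out = []
    for row in rows:
        new = dict(row)  # copy so the completed data stay untouched
        new["C"] = new["A"] * new["B"]
        new["dist"] = math.hypot(new["p_lr"] - new["v_lr"],
                                 new["p_econ"] - new["v_econ"])
        out.append(new)
    return out


# Rebuild the derived variables separately within each completed dataset.
transformed = [add_derived(rows) for rows in imputed_datasets]

for i, rows in enumerate(transformed, start=1):
    print(i, rows[0]["C"], rows[0]["dist"])
```

This way C always equals A*B within each completed dataset, which is the property I lose
when C itself is imputed.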
3) In my analysis model I focus on lagged or first-differenced effects of my X variables on
Y. Do I also have to make these transformations before I run the imputation model, or can
I lag/take first differences of my variables after running Amelia? The latter would be
much more practical for me, because there is always a 3-4 year gap of missing values
between two data points in my time series, which means that I am unable to take yearly first
differences of any variable before running the imputation model (as this would generate a
variable that is always missing).
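The problem can be illustrated with a short sketch, again in plain Python rather than R.
The helper `first_difference` and both series are invented for illustration; `completed`
stands for one hypothetical imputed version of the observed series.

```python
def first_difference(series):
    """Yearly first difference; None wherever either of the two years is missing."""
    diffs = [None]  # the first year has no predecessor
    for prev, cur in zip(series, series[1:]):
        diffs.append(None if prev is None or cur is None else cur - prev)
    return diffs


# 1984..1988 as observed: only the election years have values (numbers invented).
observed = [4.0, None, None, None, 6.0]
# One hypothetical completed version of the same series after imputation.
completed = [4.0, 4.5, 5.0, 5.5, 6.0]

diff_before = first_difference(observed)   # every entry is None
diff_after = first_difference(completed)   # defined from the second year on

print(diff_before)
print(diff_after)
```

Differencing before imputation leaves nothing to impute from, whereas differencing each
completed dataset afterwards gives a usable variable.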
4) My final question is whether it is acceptable to add new data to the analysis model after
the data have been imputed by Amelia. I want to do this because I would like to match
parties with their supporters on the basis of left-right positions. However, in order to
know the left-right positions of the parties for every year, I first have to impute the
missing data on the parties' left-right positions. After running these imputations I have
all the information I need to merge the parties with another dataset that
contains information on their supporters. Is this acceptable, or would I again violate
the requirement that the imputation model must contain at least as much information as the
analysis model?
I hope my questions make sense (I'm sorry if they don't) and that some of you
have some advice. Your help is very much appreciated. Let me know if anything is unclear,
so I can clarify it.
Best regards,
Marc