below...
Gary
--
Gary King
Albert J. Weatherhead III University Professor - Director, IQSS - Harvard University
GKing.Harvard.edu - King@Harvard.edu - @kinggary - 617-500-7570 - Asst 495-9271 - Fax 812-8581



On Wed, Sep 29, 2010 at 8:01 AM, Wardt, Marc van de <M.vandeWardt@uva.nl> wrote:


Dear fellow Amelia users,

For my PhD project I am currently working on a pooled cross-sectional time-series dataset which contains many missing values. Since I’m new to Amelia, I have a couple of questions on the use of the program. I also tried to find the answers to my questions in the archive, but some issues are still unclear to me and I hope some of you could help me out. My questions are as follows:

1) My dataset on political parties spans the 1984-2006 period and I only have observations for 1984, 1988, 1992, 1996, 1999, 2002 and 2006, as data is only gathered around elections. Altogether this entails that I have far more missing values than observations. Would you still recommend using Amelia with such an enormous degree of missingness?


if you do anything, yes.  but we offer no guarantee that there's enough info in your data to make the inferences you want to make!  
 


All my other questions deal with possible violations of the criterion that the imputation model should include at least as much information as will be used in the analysis model. This is stressed in the Amelia manual (page 10) and journal article.

2) My analysis model contains interaction effects and Euclidean distance measures, and I understand that I also have to add these to the imputation model. However, the consequence of this approach is that I end up with imputed interaction effects and Euclidean distance measures that don’t make any sense, as Amelia does not know how these variables are constructed. For example: in my analysis model, the interaction effect C is meant to be A multiplied by B, but the Amelia algorithm will replace missing values of C with something other than A*B. Since this fundamentally alters the goal of my analysis, I wonder whether it is also allowed to transform the data AFTER running Amelia. In the example above, this would imply that I would only include A and B in my imputation model and compute the interaction effect C myself after running Amelia. Is this a good procedure, or would it bias my results?


yes, it's a good idea.  impute the basic variables, including nonlinearities such as interactions.  but then afterwards you can make things consistent by discarding the imputed interactions and recreating them from the imputed main effects.
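
A minimal sketch of that recipe in Python, not Amelia's own API. It assumes the m completed datasets have already been exported as lists of rows; the variable names A, B, C follow the question, and the data values and the helper rebuild_interaction are hypothetical.

```python
# Hypothetical completed datasets, standing in for the m imputations.
# Each row holds the main effects A and B plus an imputed interaction C,
# which need not equal A * B after imputation.
imputations = [
    [{"A": 2.0, "B": 3.0, "C": 5.5}, {"A": 1.0, "B": 4.0, "C": 4.0}],
    [{"A": 2.0, "B": 2.5, "C": 6.1}, {"A": 1.0, "B": 4.0, "C": 4.0}],
]


def rebuild_interaction(rows):
    """Return copies of the rows with C overwritten by A * B."""
    return [{**row, "C": row["A"] * row["B"]} for row in rows]


# Apply the same fix to every completed dataset before running the analysis model.
consistent = [rebuild_interaction(rows) for rows in imputations]
```

The same pattern would apply to the Euclidean distance measures: keep them in the imputation model, then recompute them from the imputed components in each completed dataset.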
 


3) In my analysis model I focus on lagged or first-differenced effects of my X variables on Y. Do I also have to make these transformations before I run the imputation model, or can I lag/take first differences of my variables after running Amelia? The latter would be much more practical for me, because I always have regular 3-4 year intervals of missingness between 2 data points in my time-series, which means that I will be unable to take first differences of any variable before running the imputation model (as this will generate a variable that is always missing).


in principle yes, but at some point given the relatively little info you have you will have to just make some assumptions and leave some stuff out.
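
If the lags and differences are among the things left out of the imputation model, they can be computed afterwards within each completed dataset. A rough sketch in Python, assuming one row per party-year with no gaps after imputation; the column names party, year, x and the helper add_lag_and_diff are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical completed dataset (one of the m imputations): one row per party-year.
completed = [
    {"party": "P1", "year": 1984, "x": 4.0},
    {"party": "P1", "year": 1985, "x": 4.5},
    {"party": "P1", "year": 1986, "x": 5.5},
    {"party": "P2", "year": 1984, "x": 7.0},
    {"party": "P2", "year": 1985, "x": 6.0},
]


def add_lag_and_diff(rows, unit="party", time="year", var="x"):
    """Add <var>_lag and <var>_diff within each unit.

    The first period of each unit has no predecessor, so both values are None.
    """
    out = []
    ordered = sorted(rows, key=itemgetter(unit, time))
    for _, group in groupby(ordered, key=itemgetter(unit)):
        previous = None
        for row in group:
            diff = None if previous is None else row[var] - previous
            out.append({**row, var + "_lag": previous, var + "_diff": diff})
            previous = row[var]
    return out


with_lags = add_lag_and_diff(completed)
```

Running this on each completed dataset in turn keeps the lagged and differenced variables consistent with the imputed levels.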


4) My final question is whether it is allowed to add new data to the analysis model after the data has been imputed by Amelia. I want to do this, because I would like to merge parties with their supporters on the basis of left-right positions. However, in order to know the left-right positions of the parties for every year, I first have to impute the missing data on the parties’ left-right positions. After running these imputations I have all the information I need in order to merge the parties with another dataset that contains information on their supporters. Can I do this, or would I again violate the assumption that the analysis model must contain the same information as the imputation model?


it would be better in principle to include the other data set in there and make the imputations with both if possible.  but the same qualification as above applies about the lack of info.  it's also reasonable to use the imputations to do other things, including using other data, to calculate some specific quantity of interest.
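
One standard way to get a single answer for such a quantity is to compute it separately in each completed dataset (after merging in the other data) and then combine the m results with Rubin's rules. A minimal sketch in Python; the function name combine_rubin and the example numbers are hypothetical, not output from any real analysis.

```python
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Combine m per-imputation estimates and their squared standard errors.

    Returns the pooled point estimate and its total variance:
    total = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    pooled = mean(estimates)          # average of the m point estimates
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # sample variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical estimates of one quantity of interest from m = 3 completed datasets.
pooled, total = combine_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what carries the extra uncertainty due to the missing data, so with very sparse data it can dominate the total variance.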
 


I hope my questions make sense (I’m sorry in case they don’t) and hope that some of you have some advice. Your help is very much appreciated. Let me know if anything is unclear, so I can clarify it.

Best regards,
Marc