Your expertise - Amelia

23 May 2012

Dear Amelia developers,

Because my dataset has a large number of missing values, and because I also plan to
evaluate counterfactuals, I would like to use the Amelia software.

In my study I am exploring the relationship between oil income and corruption. I use
corruption data from the ICRG dataset, which range from 0 to 6. The scale is not strictly
integer-valued, however: there are many in-between values (92 unique values in the current
version of my dataset). Nevertheless, since most of the observations fall into a limited
number of categories, I feel obliged to use an ordered logit model. For the ordered logit
analysis I have rounded the original corruption data.

My dataset has 3640 observations and 117 variables. It is a panel dataset covering 167
countries over 23 years. When I tried to import my data into Amelia, I received several
error messages and the imputation process failed. I therefore dropped all “unnecessary”
variables, which leaves 21 variables in the dataset.

(1)	For the Amelia imputations I used the “raw” corruption data, i.e. without rounding it
beforehand, and tried to declare corruption as an ordinal variable. This produced an error
message in the Amelia output. I therefore imported corruption into Amelia as a continuous
variable. After imputation, the corruption variable ranges from -0.99700195 to 8.0574379.
I am afraid that I cannot use these data for my analysis because the imputed values fall
outside the original range of 0 to 6. Would you recommend that I round the corruption data
before running the Amelia imputations?

(2)	My main independent variable is oil income per capita. I took the log of oil income
before imputation. As a result of this transformation, the number of missing values
increased from 54 to 1663 (mainly because of the zero values for oil income). The
imputation ran successfully and there are no missing values left in the dataset. However,
values were also imputed for non-oil-producing countries. Is it possible to restrict the
imputation to oil-producing countries only? I have a similar problem with the income data
for other resources (e.g. gas and coal).

(3)	If I kept only oil-producing countries in my dataset before imputation, I suppose
this might cause another problem: there are other quantities of interest, for both oil-
and non-oil-producing countries, that I also want to impute. Would it make sense to run
the imputations on different subsets of my original dataset and merge them afterwards?
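
To make the problem in (2) concrete, here is a minimal sketch of how the log
transformation turns zeros into missing values. It is written in Python only for
illustration and is not my actual workflow; the values and the names oil_income and
safe_log are made up and do not come from my dataset.

```python
import math

# Hypothetical oil-income values (NOT from the real dataset): None marks a
# value that was already missing, 0.0 marks a non-oil-producing country-year.
oil_income = [120.5, 0.0, None, 0.0, 35.2, 0.0]


def safe_log(value):
    """Return log(value), or None where the log is undefined (missing or <= 0)."""
    if value is None or value <= 0:
        return None
    return math.log(value)


log_oil_income = [safe_log(v) for v in oil_income]

# Count missing values before and after the transformation.
missing_before = sum(v is None for v in oil_income)
missing_after = sum(v is None for v in log_oil_income)

# Zeros are real observations, not gaps; after the log they look like gaps.
structural_zeros = sum(v == 0 for v in oil_income if v is not None)

print(missing_before, missing_after, structural_zeros)
```

In this toy example the extra missing values after the transformation are exactly the
zero observations, which is why the imputation then also fills in values for
non-oil-producing countries.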

Thank you very much in advance for your help.

Best regards, 
Nurjamal Omurkanova

-- 
Nurjamal Omurkanova M.A.
Department of Politics and Management
University of Konstanz
Room D 229
P.O. Box 86
D-78457 Konstanz
Germany
Phone: +49-7531-88-2311
Fax: +49-7531-882774
nurjamal.omurkanova(a)uni-konstanz.de
http://www.polver.uni-konstanz.de/en/gschneider/members-of-staff/nurjamal-o…