Dear Ameliaists,
Trying to install the latest version of AmeliaView on my laptop, I get an error message at the "Choose install location" stage of the setup wizard telling me that the R directory is incorrect.
I am not an R user, but I have R (version 2.12.0) installed and it seems to be working fine. Selecting the R directory manually at this stage of the setup wizard does not get me past it either. The R directory is C:\Program Files\R\R-2.12.0.
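As far as I understand (again, I am not an R user), running the following at the R console prints R's installation directory, which I assume should match the path above:

    R.home()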
I would be grateful for any hints as to where I'm going wrong.
Best wishes,
Patrick
***
University of Aberdeen
Department of Politics and International Relations
Edward Wright Building
Dunbar Street
Aberdeen, AB24 3QY
United Kingdom
Phone: +44 (0)1224 272720
Fax: +44 (0)1224 272552
E-mail: p.bernhagen(a)abdn.ac.uk
Web: http://www.abdn.ac.uk/~pol209/
The University of Aberdeen is a charity registered in Scotland, No SC013683.
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
More info about Amelia: http://gking.harvard.edu/amelia
Hi,
I'm experimenting for the first time with MI and Amelia so I apologize if I'm
missing something obvious. I'm also perhaps trying to do something that is
infeasible and/or inadvisable. I'm trying to do MI for a nominal variable with
many possible values, and many of those values are very uncommon. In certain
cases Amelia is giving me results that are highly suspicious. In particular, it
seems to be greatly reducing the probability of imputing the most common value,
and at times dropping this value completely. For example, value "b" accounts
for 85% of the complete cases, and yet not a single one of the imputed values is
assigned "b" in any of the five sets of imputations. This doesn't seem right.
Here are more details about the specifics of what I'm trying to do. I'm looking
to do a rough approximation of an MI approach covered in the following paper:
Clogg, C.C., D.B. Rubin, Nathaniel Schenker, Bradley Schultz, and Lynn Weidman.
1991. "Multiple Imputation of Industry and Occupation Codes in Census Public-use
Samples Using Bayesian Logistic Regression." Journal of the American Statistical
Association 86:68–78. http://www.jstor.org/stable/2289716.
The authors used a sub-sample of Census observations that were double-coded
under both the 1970 and 1980 occupation coding schemes to multiply impute 1980
occupation codes for the entire 1970 Census. I'm looking to do a similar thing
but for the 1990 to 2000 change in occupation coding schemes. Clogg et al.'s
approach was to tackle each 1970 occupation code separately. So, for instance,
they would take all observations with 1970 occupation "funeral director" and
make this a separate sample (the sample would include both double-coded funeral
directors (complete cases) and those without 1980 codes (missing values)). They
examined the variety of 1980 occupation codes that were assigned to the "funeral
directors" in the double-coded dataset, and used observed characteristics (sex,
education, industry, etc.) to impute 1980 occupation codes for those funeral
directors that were not double-coded. I'm looking to do a similar procedure, but
assigning 1990 occupation codes to observations with only 2000 codes. I have a
large sample of double-coded observations.
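In code, the per-occupation procedure I have in mind is roughly the following (the data frame and variable names are placeholders for my actual data; within each subsample, occ1990 is missing for the cases that were not double-coded):

    # impute the 1990 code separately within each 2000 occupation code,
    # in the spirit of Clogg et al.'s occupation-by-occupation approach
    library(Amelia)
    imputed <- lapply(split(dat, dat$occ2000), function(sub) {
        # occ2000 is constant within a subsample, so exclude it from the model
        amelia(sub, m = 5, noms = "occ1990", idvars = c("id", "occ2000"))
    })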
The challenge is that some occupations have a very large number of possible 1990
codes. For instance, I have 7,463 "chief executives" in my double-coded dataset,
and they are assigned to 183 different 1990 occupation codes. Most of these 183
codes are very uncommon, though, and over 75% of the double-coded observations
are assigned to a single code of "managers n.e.c.". When I use Amelia to do MI
and impute 1990 occupation codes for the "chief executives" in my dataset,
though, not a single observation in any of the five imputations is assigned the
"managers n.e.c." code. Instead they are distributed across pretty much every
code except the "managers n.e.c." code.
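The quick check I am doing (placeholder names again, with a.out holding the Amelia output for the "chief executives" subsample) is to compare the share of "managers n.e.c." among the double-coded cases with its share among the imputed values in each of the five imputations:

    miss <- is.na(dat$occ1990)
    # share among complete (double-coded) cases: over 0.75
    mean(dat$occ1990[!miss] == "managers.nec")
    # share among imputed values in each imputation: 0 in my runs
    sapply(a.out$imputations,
           function(imp) mean(imp$occ1990[miss] == "managers.nec"))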
I think this has to do with the very large number of possible values being imputed
in this nominal variable. Similar cases where there are a large number of
possible values tend either to have the same problem (no imputations at all of
the most common value) or to vastly under-represent the most common category
(e.g., 96% of the double-coded dataset has a particular code but only 22% of the
imputed values do). Cases where the number of possible codes is small seem to
have distributions that are more similar between the complete (double-coded) and
imputed values.
Does this have to do with how nominal variables are treated within Amelia? The
documentation indicates that nominal variables are transformed into a set of
dummy variables for the MI process, and then converted back to a nominal
variable at the end. Does the transformation to the set of dummy variables leave
the most common value as the omitted group? Is it possible that each of the
dummy variables is given a slightly higher probability than it should, so that
by the time it gets to the omitted group it's much less likely to be imputed
than it should be?
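To illustrate the kind of distortion I am imagining (this is purely a guess at a mechanism, not something I know about Amelia's internals):

    # Suppose a 183-level nominal is coded as 182 dummies with the most common
    # level (75% of cases) as the omitted group, and suppose each dummy's
    # imputation probability is inflated by a tiny amount.
    k <- 183
    p.true <- c(0.75, rep(0.25 / (k - 1), k - 1))  # true category shares
    p.dummy <- p.true[-1] + 0.005                  # slightly inflated dummies
    1 - sum(p.dummy)   # implied share of the omitted level: -0.16, never drawn

With 182 dummies, even a tiny per-dummy inflation wipes out the probability left over for the omitted group, which would match the pattern I am seeing.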
These are only vague guesses. As I said, I realize that trying to impute a
nominal variable with so many possible values is quite unusual, but at the same
time I am trying to use it for an application for which MI was originally
developed.
Any thoughts, advice, or criticism would be greatly appreciated. I am happy to
provide a sample dataset (just 200k) that demonstrates this problem.
Thank you for your help,
Matissa Hollister