Dear Ameliaists,
Trying to install the latest version of AmeliaView on my laptop, I get an error message at the "Choose install location" stage of the setup wizard telling me that the R directory is incorrect.
I am not an R user, but I have R (version 2.12.0) installed and it seems to be working fine. Selecting the R directory manually at this stage of the setup wizard does not get me past it either. The R directory is C:\Program Files\R\R-2.12.0.
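As far as I understand (again, I am not an R user), running the following at the R console prints R's installation directory, which I assume should match the path above:

    R.home()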
I would be grateful for any hints as to where I'm going wrong.
Best wishes,
Patrick
***
University of Aberdeen
Department of Politics and International Relations
Edward Wright Building
Dunbar Street
Aberdeen, AB24 3QY
United Kingdom
Phone: +44 (0)1224 272720
Fax: +44 (0)1224 272552
E-mail: p.bernhagen(a)abdn.ac.uk
Web: http://www.abdn.ac.uk/~pol209/
The University of Aberdeen is a charity registered in Scotland, No SC013683.
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
More info about Amelia: http://gking.harvard.edu/amelia
Hi,
I'm experimenting for the first time with MI and Amelia so I apologize if I'm
missing something obvious. I'm also perhaps trying to do something that is
infeasible and/or inadvisable. I'm trying to do MI for a nominal variable with
many possible values, and many of those values are very uncommon. In certain
cases Amelia is giving me results that are highly suspicious. In particular, it
seems to be greatly reducing the probability of imputing the most common value,
and at times dropping this value completely. For example, value "b" accounts
for 85% of the complete cases, and yet not a single one of the imputed values is
assigned "b" in any of the five sets of imputations. This doesn't seem right.
Here are more details about the specifics of what I'm trying to do. I'm looking
to do a rough approximation of an MI approach covered in the following paper:
Clogg, C.C., D.B. Rubin, Nathaniel Schenker, Bradley Schultz, and Lynn Weidman.
1991. "Multiple Imputation of Industry and Occupation Codes in Census Public-use
Samples Using Bayesian Logistic Regression." Journal of the American Statistical
Association 86:68–78. http://www.jstor.org/stable/2289716.
The authors used a sub-sample of Census observations that were double-coded
under both the 1970 and 1980 occupation coding schemes to multiply impute 1980
occupation codes for the entire 1970 Census. I'm looking to do a similar thing
but for the 1990 to 2000 change in occupation coding schemes. Clogg et al.'s
approach was to tackle each 1970 occupation code separately. So, for instance,
they would take all observations with 1970 occupation "funeral director" and
make this a separate sample (the sample would include both double-coded funeral
directors (complete cases) and those without 1980 codes (missing values)). They
examined the variety of 1980 occupation codes that were assigned to the "funeral
directors" in the double-coded dataset, and used observed characteristics (sex,
education, industry, etc.) to impute 1980 occupation codes for those funeral
directors that were not double-coded. I'm looking to do a similar procedure, but
assigning 1990 occupation codes to observations with only 2000 codes. I have a
large sample of double-coded observations.
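In code, the per-occupation procedure I have in mind is roughly the following (the data frame and variable names are placeholders for my actual data; within each subsample, occ1990 is missing for the cases that were not double-coded):

    # impute the 1990 code separately within each 2000 occupation code,
    # in the spirit of Clogg et al.'s occupation-by-occupation approach
    library(Amelia)
    imputed <- lapply(split(dat, dat$occ2000), function(sub) {
        # occ2000 is constant within a subsample, so exclude it from the model
        amelia(sub, m = 5, noms = "occ1990", idvars = c("id", "occ2000"))
    })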
The challenge is that some occupations have a very large number of possible 1990
codes. For instance, I have 7,463 "chief executives" in my double-coded dataset,
and they are assigned to 183 different 1990 occupation codes. Most of these 183
codes are very uncommon, though, and over 75% of the double-coded observations
are assigned to a single code of "managers n.e.c.". When I use Amelia to do MI
and impute 1990 occupation codes for the "chief executives" in my dataset,
though, not a single observation in any of the five imputations is assigned the
"managers n.e.c." code. Instead they are distributed across pretty much every
code except the "managers n.e.c." code.
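The quick check I am doing (placeholder names again, with a.out holding the Amelia output for the "chief executives" subsample) is to compare the share of "managers n.e.c." among the double-coded cases with its share among the imputed values in each of the five imputations:

    miss <- is.na(dat$occ1990)
    # share among complete (double-coded) cases: over 0.75
    mean(dat$occ1990[!miss] == "managers.nec")
    # share among imputed values in each imputation: 0 in my runs
    sapply(a.out$imputations,
           function(imp) mean(imp$occ1990[miss] == "managers.nec"))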
I think this has to do with the very large number of possible values being imputed
in this nominal variable. Similar cases where there are a large number of
possible values tend either to have the same problem (no imputations at all of
the most common value) or to vastly under-represent the most common category
(e.g., 96% of the double-coded dataset has a particular code but only 22% of the
imputed values do). Cases where the number of possible codes is small seem to
have distributions that are more similar between the complete (double-coded) and
imputed values.
Does this have to do with how nominal variables are treated within Amelia? The
documentation indicates that nominal variables are transformed into a set of
dummy variables for the MI process, and then converted back to a nominal
variable at the end. Does the transformation to the set of dummy variables leave
the most common value as the omitted group? Is it possible that each of the
dummy variables is given a slightly higher probability than it should, so that
by the time it gets to the omitted group it's much less likely to be imputed
than it should be?
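To illustrate the kind of distortion I am imagining (this is purely a guess at a mechanism, not something I know about Amelia's internals):

    # Suppose a 183-level nominal is coded as 182 dummies with the most common
    # level (75% of cases) as the omitted group, and suppose each dummy's
    # imputation probability is inflated by a tiny amount.
    k <- 183
    p.true <- c(0.75, rep(0.25 / (k - 1), k - 1))  # true category shares
    p.dummy <- p.true[-1] + 0.005                  # slightly inflated dummies
    1 - sum(p.dummy)   # implied share of the omitted level: -0.16, never drawn

With 182 dummies, even a tiny per-dummy inflation wipes out the probability left over for the omitted group, which would match the pattern I am seeing.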
These are only vague guesses. As I said, I realize that trying to impute a
nominal variable with so many possible values is quite unusual, but at the same
time I am trying to use it for an application for which MI was originally
developed.
Any thoughts, advice, or criticism would be greatly appreciated. I am happy to
provide a sample dataset (just 200k) that demonstrates this problem.
Thank you for your help,
Matissa Hollister