Hi,
I am running Amelia on a cleaned-up subset of a single round of the National Longitudinal Survey of Youth data.
The subset used in the analysis has 64 variables.
I only kept factor variables with 2 to 10 levels in the subset, but when I run the amelia command I get the following error:
****
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
****
On the other hand, amelia runs if I specify "incheck = FALSE" in the command options.
I am also able to do multiple imputation on this dataset using other R software but am keen to work with Amelia, since I want to utilize the TSCS aspect of multiple imputation that the software provides.
Could you shed some light on why I am seeing the error above, and on whether the imputation is reliable when the input checks are suppressed?
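In case it helps with diagnosing this, here is the check I could run on my end: the error usually means some factor, among its *observed* (non-missing) values, covers fewer than two levels, even if more levels are declared. This is a minimal sketch on a toy data frame (`df` stands in for my actual subset):

```r
# Toy data frame: x is a factor whose observed values cover only one level,
# which is the usual trigger for the contrasts error.
df <- data.frame(
  x = factor(c("a", "a", NA, "a")),   # only one observed level
  y = factor(c("a", "b", "b", "a")),  # two observed levels: fine
  z = c(1.2, 3.4, 5.6, 7.8)           # numeric, not checked
)

# Count observed (non-NA) levels for each factor column.
is.fac <- sapply(df, is.factor)
n.obs.levels <- sapply(df[is.fac],
                       function(f) length(unique(na.omit(as.character(f)))))

# Factors that would trip the check:
names(n.obs.levels[n.obs.levels < 2])
```

If any columns show up here, dropping them (or their unused levels via `droplevels()`) before calling amelia may be the fix rather than setting "incheck = FALSE".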
Thanks,
Nandana Sengupta
University of Chicago, Knowledge Lab
Dear list & Matt,
After running a successful imputation in AmeliaView I wish to produce a
further few sets of (5) imputations and view the corresponding Diagnostics
for each set. I wish to do this without exiting & restarting AmeliaView.
If I do restart AmeliaView and run an (apparently) identically specified set
of imputations on identical data, I notice that the corresponding diagnostic
plots vary slightly from set to set. I assume this is normal?
However, after running a second and subsequent sets of imputations, the Output
log does not seem to change. Without restarting AmeliaView, I change the
output directory before each fresh set of imputations; AmeliaView behaves
well, producing fresh sets of (5) imputations and saving them to the
allocated directory (csv files). However, the log does not appear to change,
and the diagnostic plots appear identical to those produced for the first
set of imputations. Am I missing something, please?
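For what it is worth, I have also considered bypassing the GUI: calling amelia() directly from the R console gives each run its own output object, so the diagnostics for each set can be plotted separately. A rough sketch (the data frame name and variable are placeholders for my data, not actual AmeliaView settings):

```r
library(Amelia)

# Two independent runs of 5 imputations each on the same data.
run1 <- amelia(mydata, m = 5)
run2 <- amelia(mydata, m = 5)

# Diagnostics for each run separately (observed vs. imputed densities).
plot(run1)
plot(run2)

# Save each set of imputed datasets to its own csv files.
write.amelia(run1, file.stem = "run1_imp", format = "csv")
write.amelia(run2, file.stem = "run2_imp", format = "csv")
```

But I would prefer to stay inside AmeliaView if the behaviour above has a simple explanation.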
many thanks
Simon UK
Hello,
I have TSCS data for 540 Brazilian municipalities from 1992 to 2012
collected from the national Treasury Department related to local budget
receipts and expenditures. I have around 15-20 variables of interest,
including total receipts and expenditures which are then broken down into
categories such as "own-source tax revenue" or "spending on health and
sanitation." With extremely few exceptions, the data for a given
municipality-year are either entirely missing or entirely present; complete
missingness occurs in 427 of my 8,000+ observations.
Approximately 40% of municipalities have at least one missing year in the
time series (13% total missing only one year, 10% missing two, 6% missing
three, and tapering off with only 11 of 560 missing more than 6).
There are two clear patterns that I've identified:
1) There is a marked spike for 1998-1999, with over 90 municipalities missing
for both years, and a municipality missing one of these two years is almost
always missing the other. Beyond this, there are rarely any consecutive runs
of missing values, and these 90 municipalities are no more or less likely to
have missing observations outside of this time period.
2) The municipalities with more than three missing years are clustered in a
handful of states frequently associated with poverty and/or corruption.
The reason a given year is missing is relevant - it means
that they failed to turn over their annual accounts data to the federal
government as required by law. As of 2001, there are even (in theory)
sanctions for not providing this data that could lead to the withholding of
grants or the removal of a mayor's ability to run for any elected office
for the following eight years, although this is only sporadically enforced.
There is a corresponding drop in the average number of missing observations
after this point, with the very real possibility that at least the post-2001
missing data are related to administrative improprieties.
My biggest bind is that there are exceedingly few other reliably available
annual data at the municipal level up until 1999, leaving me with
population size and age distribution as my only continuously changing
controls for this time period. I am running a fixed effects model to test
for the effect of a particular policy that was implemented in different
cities beginning in 1989, and if I were to consider only the budgetary data
from 1999 onwards I would have no "pre-implementation" observations for
over half of the implementing municipalities. This leaves me highly
dependent on properly addressing the missing data issue, but I am still a
little unclear as to how I should use AMELIA in this particular situation.
If I were just working with, say, 2000-2012, then I think I understand all
the necessary steps.
Most of the budgetary variables have a fairly stable tendency to increase
over time, with only certain categories of taxes and spending categories
going through drastic annual changes. I used a polynomial time trend unique
to each unit, setting "polytime = 2" and "intercs = TRUE". Looking at
different graphs as diagnostics, the imputations seem to perform reasonably
well, but I'm sure I have missed something.
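For concreteness, the call I have in mind looks roughly like this (the data frame and variable names are simplified stand-ins for my actual data):

```r
library(Amelia)

# Municipality-year panel: 'budget' is the data frame, with 'year' as the
# time index and 'municipality' as the cross-section index.
a.out <- amelia(budget,
                m        = 5,
                ts       = "year",
                cs       = "municipality",
                polytime = 2,        # quadratic time trend
                intercs  = TRUE,     # separate trend per municipality
                idvars   = "state")  # identifier, excluded from the model

# Diagnostics: overall densities, plus the imputed time series for one unit.
plot(a.out)
tscsPlot(a.out, cs = "some-municipality-id", var = "total.receipts")
```
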
My questions are:
1) Is there a specific procedure when nearly all of the variables are
missing for a given observation, assuming that the few continuous variables
present satisfy the MAR assumption?
2) To improve both the strength of the imputation as well as the MAR
assumption behind it, can I use variables that are only measured at a few
points in time (i.e. decennial census data) or that are time-constant?
3) Is it better to perform the imputations for all of the missing variables
at once, or should they be done incrementally?
4) Is this all a fool's errand because it is unrealistic to assume my data
are MAR?
Thank you,
Trevor
I've been attempting to work with Amelia to multiply overimpute values for
a specific variable that have been measured with error. I know which
observations were measured with error (they all have a value in a
particular range) but I cannot use moPrep to properly prepare the data for
overimputation. I want to use the "subset" and "gold.standard" arguments,
but depending on how I write these, some error is always returned.
I cannot find a simple example of sample code that uses this means of
prepping the data, rather than specifying the error proportion or sd (which
I don't have/know). Can you either point me to an example, or suggest a
simple line of code, using the following setup:
data frame: data
variable with error: A
subset with error: rows where A < 5
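My best guess at the call, based on the documented arguments, is something like the sketch below, but I am not confident this is how the "subset" and "gold.standard" arguments are meant to interact, which is exactly my question:

```r
library(Amelia)

# Flag rows with A < 5 as measured with error; treat all other observed
# rows of A as gold-standard (error-free) measurements.
data.mo <- moPrep(data,
                  A ~ A,                 # A is the mismeasured variable
                  subset = A < 5,        # rows believed to contain error
                  gold.standard = TRUE)

# Then run the overimputation itself.
m.out <- amelia(data.mo, m = 5)
```
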
Thanks for any help or guidance you can give. All the best,
Sean