Amelia February 2008

amelia@lists.gking.harvard.edu

7 participants
7 discussions

How to handle "conditional variables" in survey data with intentional NAs?

by Marcus M. Dapp

Hello,

I guess this is a common problem when imputing data, but I am rather confused by it. The availability (non-NA-ness) of my survey data looks like this (see screenshot): it shows the number of non-NA entries for each variable in the data frame. Areas A and C have rather complete answer sets. B is white because those questions were only asked conditional on answers given earlier (in A). But part of the white area (in D) could be imputed.

What does the imputation process look like? What should I do with B? I can think of two variants, both of which are more or less unclear to me:

Variant A) Should I split the data frame and impute in two steps? If so, how do I reintegrate the results of the first imputation?
  i) remove B+D and impute A+C;
  ii) reinsert B/D and impute D.

Variant B) How do I specify the process as a single MI step, without splitting?

Thank you very much for your thoughts on this.

Best regards,
Marcus

--
Marcus M. Dapp | PhD student | ETH Zurich | www.ib.ethz.ch/people/mdapp
Prof. Thomas Bernauer, International Relations | www.ib.ethz.ch
On the shoulders of giants? http://science.creativecommons.org
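One pragmatic way to handle questions that were skipped by design (area B) is to impute all variables and then set the never-asked cells back to NA in every completed dataset, so that imputed values there are never analysed as answers. This is a generic post-processing step, not Amelia's own method, and the skipped cells still enter the imputation model itself. The sketch below assumes the completed datasets are available as a list of data frames (for example the `imputations` component of the object returned by amelia(), if your version stores them that way); the helper `mask_not_asked()` and the columns `uses_foss` and `foss_years` are hypothetical.

```r
# Hypothetical helper (not part of Amelia): set conditional questions back
# to NA in each completed dataset wherever the filter question routed the
# respondent past them, so imputed values there are never analysed.
mask_not_asked <- function(imputations, asked_fun, cols) {
  # imputations: list of completed data frames, one per imputation
  # asked_fun:   function(d) returning TRUE for rows that were asked the
  #              conditional block; evaluated on each completed dataset, so
  #              an imputed filter answer is respected
  # cols:        names of the conditional (area B) columns
  lapply(imputations, function(d) {
    not_asked <- which(!asked_fun(d))   # which() drops NA routing results
    d[not_asked, cols] <- NA
    d
  })
}

# Toy example: "foss_years" is only asked when "uses_foss" == 1
imps <- list(
  data.frame(uses_foss = c(1, 0, 1), foss_years = c(3.2, 1.7, 5.0)),
  data.frame(uses_foss = c(1, 1, 0), foss_years = c(2.9, 4.1, 0.6))
)
masked <- mask_not_asked(imps, function(d) d$uses_foss == 1, "foss_years")
```

Because the routing rule is re-evaluated on each completed dataset, a respondent whose filter answer was itself imputed can be routed differently in different imputations, which keeps the masking consistent with each dataset.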

16 years, 1 month

MI with categorical survey data: specifying vars as noms or ords?

by Marcus M. Dapp

Hello,

I am working out an amelia() command for my survey data, starting with a small subset to get going and learn along the way. My question is how to decide whether to specify variables as noms or ords. Looking at my data (mostly ordered factors), I think I should use the ords specification (variant B below) rather than noms, but unfortunately ords gives an error message, whereas noms only gives warnings. So my two questions are:

1) Which specification, A or B, is the correct (or better) one? More generally, should all factor variables go into noms= and all ordered factors into ords=?
2) What do the warnings and the error message mean, and what should I do about them?

Thanks for having a look at it!
Marcus

    > t<-md[20:1000,1:12] # subset of full data frame
    > # First, to give you a feel for the data (output truncated)
    > t[1:8,1:10] # to get a feel for the data frame
          id camp startyr exp.pjs insp.need insp.foss insp.pcss insp.feat
    20 30019 free    1990       5         0         0         0         0
    21 30020 <NA>      NA       1         0         0         0         0
    22 30021 free    2005       1         0         0         0         0
    23 30022 both    2004       1         0         0         0         0
    24 30023 both    2005       1         0         0         0         0
    25 30024 both    2000    <NA>         0         0         0         1
    26 30025 open    2004       5         0         0         0         1
    27 30026 open    1985      30         0         1         0         0
    > str(t, give.attr=FALSE)
    'data.frame': 981 obs. of 12 variables:
     $ id       : int 30019 30020 30021 30022 30023 30024 30025 30026 40001
     $ camp     : Factor w/ 3 levels "both","free",..: 2 NA 2 1 1 1 3 3 1 3
     $ startyr  : int 1990 NA 2005 2004 2005 2000 2004 1985 2002 1999 ...
     $ exp.pjs  : Ord.factor w/ 6 levels "1"<"5"<"10"<"20"<..: 2 1 1 1 1 NA
     $ insp.need: Ord.factor w/ 2 levels "0"<"1": 1 1 1 1 1 1 1 1 2 2 ...
     $ insp.foss: Ord.factor w/ 2 levels "0"<"1": 1 1 1 1 1 1 1 2 2 1 ...
     $ insp.pcss: Ord.factor w/ 2 levels "0"<"1": 1 1 1 1 1 1 1 1 1 1 ...
     $ insp.feat: Ord.factor w/ 2 levels "0"<"1": 1 1 1 1 1 2 2 1 2 2 ...
     $ insp.bugr: Ord.factor w/ 2 levels "0"<"1": 1 1 1 1 1 1 1 1 2 2 ...
     $ insp.insc: Ord.factor w/ 2 levels "0"<"1": 1 1 1 1 1 1 1 2 2 1 ...
     $ insp.outc: Ord.factor w/ 2 levels "0"<"1": 1 1 1 1 1 1 1 1 2 1 ...
     $ insp.offr: Ord.factor w/ 2 levels "0"<"1": 1 2 1 1 1 2 1 1 1 1 ...

===== A) MI with variant "noms" =====

    > # specifying indep. variables as "noms"... works...
    > t_noms<-amelia(t, p2s=2, m=5, idvars=c("id"), noms=c("camp","exp.pjs","insp.need","insp.foss","insp.pcss","insp.feat","insp.bugr","insp.insc"))

... but it gives warnings (usually 30-50; multiples of ten, it seems):

    -- Imputation 5 --
    setting up EM chain indicies
    1(140) 2(0)
    saving and cleaning
    There were 50 or more warnings (use warnings() to see the first 50)
    > warnings()
    50: the condition has length > 1 and only the first element will be used in: if (class(x.orig[, i]) == "logical") x.imp[, i] <- as.logical(x.imp[,

Plus, I occasionally get this warning (or error message?) in some imputations:

    The resulting variance matrix was not invertible. Please check your data for highly collinear variables.

===== B) MI with variant "ords" =====

    > # specifying vars as ords throws an error message:
    > t_ords<-amelia(t, p2s=2, m=5, idvars=c("id"), noms=c("camp"), ords=c("exp.pjs","insp.need","insp.foss","insp.pcss","insp.feat","insp.bugr","insp.insc","insp.outc","insp.offr"))
    amelia starting
    Error in if (any(unique(na.omit(data[, i]))%%1 != 0)) { :
      missing value where TRUE/FALSE needed
    In addition: Warning message:
    '%%' is not meaningful for ordered factors in: Ops.ordered(unique(na.omit(data[, i])), 1)

- Amelia mailing list served by Harvard-MIT Data Center [Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
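Judging from the transcript, both messages come from ordered-factor columns reaching code that expects plain numbers. An ordered factor's class() has two entries, so a test like `if (class(x) == "logical")` compares two values at once (a warning in the R versions of that time; R 4.2.0 and later make it an error). And `%%` is not defined for ordered factors, so the integer check returns NA and `if` fails with "missing value where TRUE/FALSE needed". The sketch below shows the class issue and one possible workaround, passing integer codes instead of ordered factors. `ords_to_codes()` is a hypothetical helper, not an Amelia function, and it does not settle whether a given variable should be modelled as nominal or ordinal.

```r
# An ordered factor has a class vector of length 2, so a test like
# if (class(x) == "logical") compares two values at once: R 2.x warned
# "the condition has length > 1", and R >= 4.2 makes this an error.
exp_pjs <- factor(c("1", "5", "10", NA), levels = c("1", "5", "10"),
                  ordered = TRUE)
print(class(exp_pjs))   # "ordered" "factor"

# Hypothetical workaround (not an Amelia function): pass ordered factors to
# amelia() as integer codes, which arithmetic such as x %% 1 accepts.
ords_to_codes <- function(df, cols) {
  for (v in cols) {
    df[[v]] <- as.integer(df[[v]])   # level position: "1"->1, "5"->2, ...
  }
  df
}

coded <- ords_to_codes(data.frame(exp.pjs = exp_pjs, id = 1:4), "exp.pjs")
print(coded$exp.pjs)    # 1 2 3 NA
```

Note that as.integer() returns level positions, not the numbers printed on the labels. For a variable like exp.pjs, whose levels are themselves numbers ("1" < "5" < "10" < "20" ...), as.numeric(as.character(x)) keeps the original spacing instead.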

16 years, 1 month

range values using priors still contain negative values

by Anshu Mohllajee

Hi,

I have a question similar to the one tdavis asked in April of last year. When I set range values using the priors option in AmeliaView, I still get imputed values outside the range I set. For instance, the variable "months of breastfeeding during one year" can only range from 0 to 12, so I set a minimum of 0 and a maximum of 12 with a 0.99 confidence level. However, I still get negative values and values above 12. Is there any way around this?

Thanks for your help!
Best,
Anshu
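Assuming, as the 0.99 confidence level suggests, that such a prior expresses a belief rather than a hard limit, some imputed values can still fall outside 0 to 12. Before deciding how to handle them, it may help to measure how often this happens in each completed dataset. The helper `share_out_of_range()` and the column name `bf_months` are hypothetical, and the toy data are illustrative only.

```r
# Hypothetical diagnostic (not an Amelia function): for each completed
# dataset, the share of originally missing cells whose imputed value lies
# outside [lower, upper].
share_out_of_range <- function(original, imputations, col, lower, upper) {
  was_missing <- is.na(original[[col]])
  sapply(imputations, function(d) {
    v <- d[[col]][was_missing]          # imputed values only
    mean(v < lower | v > upper)
  })
}

orig <- data.frame(bf_months = c(3, NA, NA, 12))
imps <- list(
  data.frame(bf_months = c(3, -0.5, 4, 12)),
  data.frame(bf_months = c(3, 6, 7, 12))
)
share_out_of_range(orig, imps, "bf_months", lower = 0, upper = 12)
# 0.5 for the first dataset, 0 for the second
```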

16 years, 1 month

Compute results from multiply imputed dataset- STREG

by Marguerite Duponchel

Hi,

I am trying to compute results from datasets multiply imputed with Amelia for a survival analysis model with a log-log distribution. It seems that Clarify does not support streg. Is there any other way to do this in Stata?

Thanks,
Marguerite
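Independently of Clarify, results from any model fitted once per imputed dataset can be combined by hand with Rubin's rules: the combined estimate is the mean of the m estimates, and its variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The sketch is in R, the language used elsewhere on this list. With streg, `est` and `se` would be one coefficient and its standard error from each of the m runs; how to collect them in Stata is not shown, and this scalar version ignores covariances between coefficients. `rubin_combine()` is a hypothetical helper name.

```r
# Rubin's rules: combine one coefficient estimated separately in each of
# the m imputed datasets.
rubin_combine <- function(est, se) {
  m <- length(est)
  q_bar <- mean(est)                  # combined point estimate
  within <- mean(se^2)                # average within-imputation variance
  between <- var(est)                 # between-imputation variance
  total <- within + (1 + 1 / m) * between
  c(estimate = q_bar, se = sqrt(total))
}

# Toy numbers: one coefficient and its standard error from m = 3 fits
res <- rubin_combine(est = c(1, 2, 3), se = c(1, 1, 1))
res   # estimate 2, se sqrt(7/3), about 1.528
```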

16 years, 1 month

Are identification variables used in the MI calculations or not?

by Marcus M. Dapp

Hello,

The manual says about identification variables: "idvars: a vector of column numbers or column names that indicates identification variables. These will be dropped from the analysis but copied into the imputed datasets."

The "dropping" is not 100% clear to me in one important respect: whether idvars are considered in the MI calculations at all. Which is correct, a) or b)?

a) The idvars are used in the MI process to calculate imputations but are not themselves modified (imputed); they are copied 1:1 to the output dataset.
b) The idvars are only copied and are not even considered in the MI process to calculate imputations.

Thank you,
Marcus

16 years, 1 month

installation problem

by Steve Shewfelt

I'm new to the list and am trying to install AmeliaView II on my new Windows Vista machine. I'm currently running R 2.6, though the problem also occurred when I had R 2.8 loaded. I'm a bit of a novice with both programs, so I'd be more comfortable using the GUI than running things through R. After I start R and start Amelia, I get the following message:

    Script: C:\Program Files\AmeliaView\lib\invisible.vbs
    Line: 1
    Char: 1
    Error: The system cannot find the file specified
    Code: 80070002
    Source: (null)

When I looked in Explorer, I found this file, so it is there but for some reason is not being recognized. Any suggestions?

Thanks,
Steve Shewfelt
PhD Candidate, Political Science
Yale University

16 years, 2 months

How to format monthly dates for Time series analysis

by Candy Day

Hi,

I am relatively new to R and am now trying Amelia on health indicator data. I have a CSV file, created from Excel, with the following fields (columns):

- ID: autonumber identifier field.
- Prov: name of the province (geographic area), used as the cross-section variable. This seems to work whether it is a string (such as the province acronym) or coded as a number, although it doesn't work if I set the variable type to 'nominal', only if I leave it as 'no transformation'.
- Month: currently formatted as ISO dates, e.g. 2006-12-01, 2007-01-01, 2007-02-01, 2007-03-01. Amelia (using AmeliaView) treats this field as a factor, so it does not work for time-series cross-section analysis.
- Value: the actual indicator values, as numbers. This is the field with missing values that I want imputed.

The only way I have been able to get this to work is to have Excel convert the Month variable to a number, which gives data like this: 39052.00, 39083.00, 39114.00, 39142.00. But I'm not sure whether this is the correct way to do it, or whether Amelia will understand its date/time meaning. Otherwise, how should date variables be formatted so that they are interpreted correctly?

Regards,
Candy

--
Candy Day
HealthLink, Health Systems Trust
http://www.hst.org.za
Mobile: 084 960 9014
Fax: 086 524 9563
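The numbers Excel produced are its date serials: in the Windows (1900) date system a date is stored as days since 1899-12-30, so 39052 is 2006-12-01 (Excel for Mac's 1904 system uses a different origin). They keep the order of the months but are not evenly spaced (gaps of 31, 31 and 28 days). If an evenly spaced time variable is wanted, one option is to number the months consecutively. The sketch below assumes that an integer month index is an acceptable time variable for Amelia, which is not confirmed here; `month_index()` is a hypothetical helper.

```r
# Excel (Windows, 1900 date system) stores a date as days since 1899-12-30,
# so the serial numbers above convert back to the original months:
serial <- c(39052, 39083, 39114, 39142)
dates <- as.Date(serial, origin = "1899-12-30")
print(dates)   # "2006-12-01" "2007-01-01" "2007-02-01" "2007-03-01"

# Day counts are not evenly spaced by month (gaps of 31, 31 and 28 days).
# Hypothetical helper: number the months consecutively from the first one.
month_index <- function(d) {
  y <- as.integer(format(d, "%Y"))
  mo <- as.integer(format(d, "%m"))
  first <- min(d)
  y0 <- as.integer(format(first, "%Y"))
  m0 <- as.integer(format(first, "%m"))
  12L * (y - y0) + (mo - m0) + 1L
}

# ISO date strings, as in the original Month column, give the same result
iso <- as.Date(c("2006-12-01", "2007-01-01", "2007-02-01", "2007-03-01"))
month_index(iso)   # 1 2 3 4
```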

16 years, 2 months
