Hello,
I have a dataset with a very large number of columns (>10000) and fewer
rows, but still a lot (>4000). I want to impute some data with Amelia for
use in Zelig. I know what variables I am interested in using in the
regressions, but there are a number of missing values, so I was hoping I
could leverage the rest of the dataset to come up with imputations for the
variables of interest.
Hence, I only want to impute missing values in a few columns of my dataset, rather than spending loads of time imputing all the other values I don't need (and hitting error code 34 in the attempt). The problem is that I can't figure out how to exclude columns from being imputed without excluding them from the entire analysis.
I assumed this would be straightforward, but I have searched and searched and come up with nothing, so if anyone can help me sort this out, I would very much appreciate it.
Thanks,
Drew
Dear list members,
I have a question concerning lags and leads. It is easy to tell Amelia which variables I want to include lags and leads for, but how many lags and leads does it take into account? Is this customizable? Can specific leads be specified, e.g. the 3rd, 5th, and 10th?
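For reference, a hedged reading of the documentation is that lags and leads each add a single one-period lag/lead of the named variables; a specific lead such as t+3 would have to be constructed by hand as an ordinary column. A sketch, with placeholder names:

```r
# Hedged sketch: build a 3-period lead manually with dplyr, then pass the
# panel to amelia() alongside the built-in one-period lag/lead of x.
# "panel", "unit", "year", "x" are placeholders for your own names.
library(dplyr)
library(Amelia)
panel <- panel %>%
  group_by(unit) %>%
  mutate(x_lead3 = lead(x, 3)) %>%
  ungroup()
a.out <- amelia(as.data.frame(panel), ts = "year", cs = "unit",
                lags = "x", leads = "x")
```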
Thank you very much in advance!
Tom
--
Bc. Tomáš Kudláček
Junior Researcher
Phytophthora Research Centre
CZ.02.1.01/0.0/0.0/15_003/0000453
Department of Forest Protection and Wildlife Management
Faculty of Forestry and Wood Technology
Mendel University in Brno
Zemědělská 3, Brno, 613 00
Tel: +420 775680314
Web: www.mendelu.cz
Hello,
I am new to amelia and have been unable to find any info on the error I am
receiving:
> tscsPlot(bd.am.poly1, cs = "Fraud", var = "n")
Error: Unsupported use of matrix or array for column indexing
The data has 3 variables: a date variable used for ts, a grouping variable used for cs, and a count variable, "n", which is my outcome variable that requires imputation.
My initial amelia call looks like this (note that it wasn't clear if amelia
supported datetime objects, so I converted to seconds since epoch - if it
can handle datetime objects and this conversion is unnecessary, it would be
great to know that!):
> bd.am.poly1 <- amelia(bdsum, idvars = "date", ts = "datenum",
                        cs = "call.type", polytime = 1, intercs = TRUE)
Here's a sample of the data:
date call.type n datenum
<date> <fctr> <dbl> <dbl>
1 2016-08-17 Criminal Mis C NA 17030
2 2014-02-10 TRAFFB-Traffic Complaint NA 16111
3 2015-11-03 WEAPO1B-Weapon I/P 1 16742
4 2016-09-03 Disturbance 0 17047
5 2015-02-24 Assault 0 16490
6 2014-08-06 Unknown Problem 0 16288
7 2014-08-28 INJACC3B-Injury Accident C 0 16310
8 2015-06-11 Recovered Stolen Prop 0 16597
9 2016-02-15 Lost Property 0 16846
10 2015-05-09 Welfare Check 2 16564
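For what it's worth, that error text matches the stricter column-indexing behavior of tibbles (the sample above prints like one), so a hedged guess is that coercing to a base data.frame before running amelia()/tscsPlot() may help:

```r
# Hedged guess: tscsPlot() may use old-style matrix indexing that tibbles
# reject; coerce to a plain data.frame first and re-run.
bdsum <- as.data.frame(bdsum)
bd.am.poly1 <- amelia(bdsum, idvars = "date", ts = "datenum",
                      cs = "call.type", polytime = 1, intercs = TRUE)
tscsPlot(bd.am.poly1, cs = "Fraud", var = "n")
```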
Thanks much,
Jon
---
Jonathan Zadra, PhD
Data Scientist
Sorenson Impact Center
David Eccles School of Business, University of Utah
www.sorensonimpactcenter.com
--
Dear list members,
I realize that this is perhaps more of a conceptual issue than a practical
one, but I wonder how would you deal with survey responses such as "don't
know" or "not applicable." Specifically:
(1) Do you regard "don't know" and "not applicable" as missing?
(2) If not, do you regard them as valid responses as other options (e.g., a
scale of 1 to 7), and use all these values to impute missing data? That is,
if someone did not answer this item, the imputed value could be don't know,
not applicable, or any value from 1 to 7. If this is the correct approach,
how to do it in Amelia or other software?
(3) Is it possible to only impute the "true" missing data (i.e., not for
"don't know" or "not applicable" responses), with valid responses from 1 to
7 in Amelia or other software? (Listwise removing participants who select
"don't know" or "not applicable" in one variable before imputing is not a
good idea because those participants may contribute to MAR/MCAR missing in
other variables.)
(4) Are there other approaches to deal with "don't know" or "not
applicable" responses?
Many thanks for your help!
Gu
--
Gu Li, PhD
Visiting International Research Scholar
University of British Columbia
E-mail: guli(a)alumni.ubc.ca; ligu.sysu(a)gmail.com
Hi,
I have been using the development version of Amelia for the past few months and finding it very useful (Thanks Matt!).
However, for the past two days an installation error has popped up when trying to install using the following command:
install.packages("Amelia", repos="http://r.iq.harvard.edu", type = "source")
The error:
Warning: unable to access index for repository http://r.iq.harvard.edu/src/contrib:
cannot open URL 'http://r.iq.harvard.edu/src/contrib/PACKAGES'
Warning message:
package ‘Amelia’ is not available (for R version 3.2.3)
Error in library("Amelia") : there is no package called ‘Amelia’
Execution halted
Would be great if the permissions could be restored so we can continue using the development version.
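In the meantime, a hedged workaround (assuming the development sources are also mirrored on the IQSS GitHub repository) is to install directly from there:

```r
# Hedged workaround while http://r.iq.harvard.edu is unreachable: install
# the development version straight from GitHub (requires the devtools
# package and a working build toolchain).
install.packages("devtools")
devtools::install_github("IQSS/Amelia")
```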
Thanks a lot,
Nandana Sengupta
University of Chicago, Knowledge Lab
________________________________
From: Matt Blackwell [mblackwell(a)gov.harvard.edu]
Sent: Tuesday, September 29, 2015 9:48 PM
To: Nandana Sengupta; amelia(a)lists.gking.harvard.edu
Subject: Re: [amelia] Error: contrasts can be applied only to factors with 2 or more levels
Hi Nandana,
This is a bug in the current version of Amelia that occurs when listwise deletion eliminates all observations. We have a fix for it in the development version of Amelia. You can find installation instructions for that version here:
https://github.com/IQSS/Amelia
We will be submitting this development version to CRAN in the upcoming weeks.
Cheers,
Matt
~~~~~~~~~~~
Matthew Blackwell
Assistant Professor of Government
Harvard University
url: http://www.mattblackwell.org
On Mon, Sep 28, 2015 at 12:43 PM Nandana Sengupta <nandana(a)uchicago.edu<mailto:nandana@uchicago.edu>> wrote:
Hi,
I am running Amelia on a cleaned-up subset of a single round of the National Longitudinal Survey of Youth data.
The subset used in the analysis has 64 variables.
I only kept factor variables with 2 to 10 levels in the subset, but when I run the amelia command I get the following error:
****
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
****
On the other hand, the amelia code runs if I specify "incheck = FALSE" in the command options.
I am also able to do multiple imputation on this dataset using other R software, but am keen to work with Amelia, since I want to utilize the TSCS aspect of multiple imputation that the software provides.
Could you shed some light on why I am seeing the error above, and on whether the imputation with the input checks disabled is reliable?
Thanks,
Nandana Sengupta
University of Chicago, Knowledge Lab
--
Amelia mailing list served by HUIT
[Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
More info about Amelia: http://gking.harvard.edu/amelia
Do you have a rough sense of how long it might take to run 30 imputations on
3821 records, n=764? It seems stuck at the "beginning prep functions"
stage.
Below is my code:
install.packages("Amelia")
update.packages()
require(Amelia)
summary(data)
a.out <- amelia(data, m = 30, ts = "visit", cs = "id", boot.type = "none",
                noms = c("mar", "abst", "famalc1", "spousalc", "gender",
                         "race", "income", "A", "D"),
                logs = c("ASIalc", "ASIdrg", "ASIemp", "ASIleg", "ASImed",
                         "ASIpsy", "ASIsoc", "ASIpdgt", "pcntbpov"),
                bounds = rbind(c(12, 0, Inf), c(13, 0, Inf), c(14, 0, Inf),
                               c(15, 0, Inf), c(16, 0, Inf), c(17, 0, Inf),
                               c(18, 0, Inf), c(19, 0, Inf), c(8, 0, Inf)),
                polytime = 1, intercs = TRUE, p2s = 2,
                empri = 0.1 * nrow(data))
Any help would be greatly appreciated.
Sincerely,
Deysia
--
Deysia Levin, MPH
Epidemiology Doctoral Candidate
Department of Epidemiology
UC Berkeley School of Public Health
101 Haviland Hall
Berkeley, CA. 94720
Phone: (510) 926-2496
Email: Deysia(a)gmail.com
Hello list members!
I am writing to ask about methods of pooling Amelia outputs for standard
deviation, Cohen's d, and model fit statistics such as F-statistic and
R-squared.
Specifically: (1) For SD, can I use mi.meld() to pool SDs estimated from
individual imputed datasets, similarly to pooling standard errors for
regression coefficients?
(2) For Cohen's d, can I use zelig-ls to pool the t-statistic for the dummy
predictor, and then transform the pooled t-statistic into Cohen's d?
Alternatively, can I calculate Cohen's d for each imputed dataset and then
take the mean of the ds? Or, a third approach, calculate Cohen's d from the
pooled mean and SD? These approaches do not always lead to identical
results; which one is best? Or is there yet another, better approach?
(3) For R-squared - I understand that Dr. King recommends not to focus on
model fit statistics - but just out of curiosity: mice has a function that
uses the procedure proposed by Harel (2009):
http://www.tandfonline.com/doi/pdf/10.1080/02664760802553000
a) In each 'complete' dataset:
• calculate R²
• take its square root, R
• use the Fisher z-transformation to obtain the normalized estimate and its variance (Q(i), V(i))
b) With the m sets of estimates and variances:
• combine the results using Rubin's rules
• the confidence interval (CI) for Q is QT ± z(α/2)√(T), where T is the total variance
• inverse-transform back to the proportion scale
• square the result.
Is this approach superior to directly taking the mean of the estimated R² values from the imputed datasets?
(4) For the F-statistic - Is there any recommendation other than taking the
mean of Fs from the imputed datasets?
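Re (3), the Harel (2009) procedure quoted above can be sketched directly in base R. Assumptions in this sketch: r2 holds the m complete-data R² values, n is the sample size, and V(i) = 1/(n − 3) is used as the usual within-imputation variance of a Fisher z estimate.

```r
# Hedged sketch of the Harel (2009) R-squared pooling described above.
pool_r2 <- function(r2, n, alpha = 0.05) {
  m <- length(r2)
  z <- atanh(sqrt(r2))                 # Fisher z of R
  qbar <- mean(z)                      # pooled estimate (Rubin's rules)
  u <- 1 / (n - 3)                     # within-imputation variance
  b <- var(z)                          # between-imputation variance
  t <- u + (1 + 1 / m) * b             # total variance
  ci <- qbar + c(-1, 1) * qnorm(1 - alpha / 2) * sqrt(t)
  # inverse-transform to the proportion scale and square
  list(r2 = tanh(qbar)^2, ci = tanh(ci)^2)
}
```

Whether this beats a simple mean of the R² values is the open question; at least it propagates the between-imputation uncertainty.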
My apologies for the many questions! Thank you in advance for any of your
help! :)
Best wishes,
Gu
--
Gu Li, MS
PhD Candidate
University of Cambridge
Department of Psychology
Free School Lane, Cambridge, CB2 3RQ
United Kingdom
Dear Amelia users/creators,
I want to write a stack of Amelia imputed data sets into a Stata format for some specific analyses and tests that I find easier in Stata.
I know that write.amelia enables this when the separate argument is set to false, and have tried the following code:
write.amelia(am.output, format = "dta", file.stem = "outdata",
             separate = FALSE, orig.data = TRUE)
However, I get an error message: “Error in write.dta(dataframe= list…) empty string is not valid in Stata's documented format”.
Stack Overflow has a thread on this error for write.dta, which suggests overwriting a data frame; however, I cannot do this with the Amelia output: http://stackoverflow.com/questions/27574055/converting-r-file-to-stata-with…
Any advice?
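One hedged workaround, in case it helps: write.dta() (from the foreign package) rejects empty character strings, so recoding them to NA in each imputed dataset before export may get around the error. A sketch, assuming am.output is the amelia() output:

```r
# Hedged workaround: recode empty strings to NA in every imputed dataset,
# then export each one with foreign::write.dta().
library(foreign)
for (i in seq_along(am.output$imputations)) {
  imp <- am.output$imputations[[i]]
  chr <- vapply(imp, is.character, logical(1))
  imp[chr] <- lapply(imp[chr], function(x) replace(x, x == "", NA))
  write.dta(imp, paste0("outdata", i, ".dta"))
}
```

This writes one file per imputation; a stacked data frame could be cleaned the same way before handing it back to write.amelia().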
Grateful for this great MI package, and for any suggestions!
Sophie
Sophie Moullin
Sociology & Social Policy PhD Student
Princeton University
smoullin(a)princeton.edu<mailto:smoullin@princeton.edu>
After updating to the newest version of Amelia (1.7.4), I tried
overimputing a dataset that has incorrect values in one of its variables.
All of the error observations are measured identically (as zeros, where
they should be positive). The code I originally used is below, and it
triggers a warning of the type: "Some observations estimated with negative
measurement error variance. Set to gold standard."
dat<-data.frame(A, B, C, VS)
mopd<-moPrep(dat, VS~VS, subset=VS<.0001)
I looked through the github code as to what causes this error (other than,
of course, the negative error variance), and more importantly, how to
activate the gold.standard (which for my purposes is the rest of the values
for VS) and presumably fix this issue. After trying quite a few different
possible codings, I can't get it to work. I either receive the same error,
or a host of errors surrounding how I've included gold.standard in the
code. I would think it should be easy, since I'm basically bifurcating my
data (all data under some amount is the subset measured with error; all
data over the amount can be considered gold-standard data), but can't
figure it out. Thanks for any help you can give,
Sean
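For what it's worth, a hedged reading of the moPrep() documentation is that gold.standard is a logical flag on moPrep() itself, which treats the fully observed, error-free values as gold-standard data. An untested sketch:

```r
# Hedged sketch: per ?moPrep, gold.standard = TRUE tells moPrep() to use the
# error-free observations (here, VS >= .0001) as gold-standard data.
library(Amelia)
mopd <- moPrep(dat, VS ~ VS, subset = VS < .0001, gold.standard = TRUE)
a.out <- amelia(mopd, m = 5)
```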
Hi there,
I have basically 2 questions related to setting observation-level priors on nominal variables.
I am trying to do an overimputation on a dichotomous variable, say y1.
My 1st question:
I am aware that using the arguments "priors" and "overimp", I can specify observation-level priors via a 4-column matrix (row, column, prior.mean, prior.sd) or a 5-column matrix (row, column, lower confidence bound, upper confidence bound, confidence level). I am attempting the 4-column matrix, but I am not sure how to specify prior.mean and prior.sd when my prior is the dichotomous variable itself. I read somewhere that prior.mean can be set to y1 itself? Is prior.sd similar to the proportion of variance attributable to measurement error? I would appreciate advice on how to specify prior.sd in this case.
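Purely to illustrate the mechanics of the 4-column matrix (row 5 and sd = 0.25 are arbitrary placeholders, and whether such priors combine cleanly with a nominal y1 is exactly the open question here):

```r
# Hedged sketch of a 4-column observation-level prior plus overimputation.
# The prior is centered at the observed value of y1 for row 5; the sd is an
# arbitrary stand-in for the assumed measurement-error scale.
library(Amelia)
j <- which(names(dat) == "y1")
pr <- matrix(c(5, j, dat$y1[5], 0.25), nrow = 1)   # row, column, mean, sd
a.out <- amelia(dat, priors = pr, overimp = cbind(5, j))
```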
My 2nd question:
I am also aware of generating priors using the "moPrep" command from the Amelia package. The argument "error.proportion" in "moPrep" is easy enough to understand (the proportion of variance attributable to measurement error). But what is the difference between setting priors via "moPrep" and via "priors"? Should the output be the same?
Please kindly advise. Many thanks!
Huiying