Hello,
I have a dataset with a very large number of columns (>10000) and fewer
rows, but still a lot (>4000). I want to impute some data with Amelia for
use in Zelig. I know what variables I am interested in using in the
regressions, but there are a number of missing values, so I was hoping I
could leverage the rest of the dataset to come up with imputations for the
variables of interest.
Hence, I only want to impute missing values in a few columns of my dataset, rather than spending loads of time imputing all the other values I don't need (and hitting error code 34 in the attempt). The problem is that I can't figure out how to exclude columns from being imputed without excluding them from the entire analysis.
I assumed this would be straightforward, but I have searched and searched and come up with nothing, so if anyone can help me sort this out, I would very much appreciate it.
Thanks,
Drew
Dear list members,
I have a question concerning lags and leads. It is easy to tell Amelia which variables I want to include lags and leads for, but how many lags and leads does it take into account? Is this customizable? Can specific leads be specified, e.g. the 3rd, 5th, and 10th?
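For reference, a hedged reading of the documentation is that lags and leads each add a single one-period lag/lead of the named variables; a specific lead such as t+3 would have to be constructed by hand as an ordinary column. A sketch, with placeholder names:

```r
# Hedged sketch: build a 3-period lead manually with dplyr, then pass the
# panel to amelia() alongside the built-in one-period lag/lead of x.
# "panel", "unit", "year", "x" are placeholders for your own names.
library(dplyr)
library(Amelia)
panel <- panel %>%
  group_by(unit) %>%
  mutate(x_lead3 = lead(x, 3)) %>%
  ungroup()
a.out <- amelia(as.data.frame(panel), ts = "year", cs = "unit",
                lags = "x", leads = "x")
```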
Thank you very much in advance!
Tom
--
Bc. Tomáš Kudláček
Junior Researcher
Phytophthora Research Centre
CZ.02.1.01/0.0/0.0/15_003/0000453
Department of Forest Protection and Wildlife Management
Faculty of Forestry and Wood Technology
Mendel University in Brno
Zemědělská 3, Brno, 613 00
Tel: +420 775680314
Web: www.mendelu.cz
Hello,
I am new to amelia and have been unable to find any info on the error I am
receiving:
> tscsPlot(bd.am.poly1, cs = "Fraud", var = "n")
Error: Unsupported use of matrix or array for column indexing
The data has 3 variables: a date variable used for ts, a grouping variable used for cs, and a count variable, "n", which is my outcome variable that requires imputation.
My initial amelia call looks like this (note that it wasn't clear if amelia
supported datetime objects, so I converted to seconds since epoch - if it
can handle datetime objects and this conversion is unnecessary, it would be
great to know that!):
> bd.am.poly1 <- amelia(bdsum, idvars = "date", ts = "datenum",
                        cs = "call.type", polytime = 1, intercs = TRUE)
Here's a sample of the data:
date call.type n datenum
<date> <fctr> <dbl> <dbl>
1 2016-08-17 Criminal Mis C NA 17030
2 2014-02-10 TRAFFB-Traffic Complaint NA 16111
3 2015-11-03 WEAPO1B-Weapon I/P 1 16742
4 2016-09-03 Disturbance 0 17047
5 2015-02-24 Assault 0 16490
6 2014-08-06 Unknown Problem 0 16288
7 2014-08-28 INJACC3B-Injury Accident C 0 16310
8 2015-06-11 Recovered Stolen Prop 0 16597
9 2016-02-15 Lost Property 0 16846
10 2015-05-09 Welfare Check 2 16564
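For what it's worth, that error text matches the stricter column-indexing behavior of tibbles (the sample above prints like one), so a hedged guess is that coercing to a base data.frame before running amelia()/tscsPlot() may help:

```r
# Hedged guess: tscsPlot() may use old-style matrix indexing that tibbles
# reject; coerce to a plain data.frame first and re-run.
bdsum <- as.data.frame(bdsum)
bd.am.poly1 <- amelia(bdsum, idvars = "date", ts = "datenum",
                      cs = "call.type", polytime = 1, intercs = TRUE)
tscsPlot(bd.am.poly1, cs = "Fraud", var = "n")
```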
Thanks much,
Jon
---
Jonathan Zadra, PhD
Data Scientist
Sorenson Impact Center
David Eccles School of Business, University of Utah
www.sorensonimpactcenter.com
--
Dear list members,
I realize that this is perhaps more of a conceptual issue than a practical
one, but I wonder how would you deal with survey responses such as "don't
know" or "not applicable." Specifically:
(1) Do you regard "don't know" and "not applicable" as missing?
(2) If not, do you regard them as valid responses as other options (e.g., a
scale of 1 to 7), and use all these values to impute missing data? That is,
if someone did not answer this item, the imputed value could be don't know,
not applicable, or any value from 1 to 7. If this is the correct approach,
how to do it in Amelia or other software?
(3) Is it possible to only impute the "true" missing data (i.e., not for
"don't know" or "not applicable" responses), with valid responses from 1 to
7 in Amelia or other software? (Listwise removing participants who select
"don't know" or "not applicable" in one variable before imputing is not a
good idea because those participants may contribute to MAR/MCAR missing in
other variables.)
(4) Are there other approaches to deal with "don't know" or "not
applicable" responses?
Many thanks for your help!
Gu
--
Gu Li, PhD
Visiting International Research Scholar
University of British Columbia
E-mail: guli(a)alumni.ubc.ca; ligu.sysu(a)gmail.com
Hi,
I have been using the development version of Amelia for the past few months and finding it very useful (Thanks Matt!).
However, for the past two days an installation error has popped up when trying to install using the following command:
install.packages("Amelia", repos="http://r.iq.harvard.edu", type = "source")
The error:
Warning: unable to access index for repository http://r.iq.harvard.edu/src/contrib:
cannot open URL 'http://r.iq.harvard.edu/src/contrib/PACKAGES'
Warning message:
package ‘Amelia’ is not available (for R version 3.2.3)
Error in library("Amelia") : there is no package called ‘Amelia’
Execution halted
Would be great if the permissions could be restored so we can continue using the development version.
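In the meantime, a hedged workaround (assuming the development sources are also mirrored on the IQSS GitHub repository) is to install directly from there:

```r
# Hedged workaround while http://r.iq.harvard.edu is unreachable: install
# the development version straight from GitHub (requires the devtools
# package and a working build toolchain).
install.packages("devtools")
devtools::install_github("IQSS/Amelia")
```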
Thanks a lot,
Nandana Sengupta
University of Chicago, Knowledge Lab
________________________________
From: Matt Blackwell [mblackwell(a)gov.harvard.edu]
Sent: Tuesday, September 29, 2015 9:48 PM
To: Nandana Sengupta; amelia(a)lists.gking.harvard.edu
Subject: Re: [amelia] Error: contrasts can be applied only to factors with 2 or more levels
Hi Nandana,
This is a bug in the current version of Amelia that occurs when listwise deletion eliminates all observations. We have a fix for it in the development version of Amelia. You can find installation instructions for that version here:
https://github.com/IQSS/Amelia
We will be submitting this development version to CRAN in the upcoming weeks.
Cheers,
Matt
~~~~~~~~~~~
Matthew Blackwell
Assistant Professor of Government
Harvard University
url: http://www.mattblackwell.org
On Mon, Sep 28, 2015 at 12:43 PM Nandana Sengupta <nandana(a)uchicago.edu<mailto:nandana@uchicago.edu>> wrote:
Hi,
I am running Amelia on a cleaned-up subset of a single round of the National Longitudinal Survey of Youth data.
The subset used in the analysis has 64 variables.
I only kept factor variables with 2 to 10 levels in the subset, but when I run the amelia command I get the following error:
****
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
****
On the other hand, the amelia code runs if I specify "incheck = FALSE" in the command options.
I am also able to do multiple imputation on this dataset using other R software, but am keen to work with Amelia, since I want to utilize the TSCS aspect of multiple imputation that the software provides.
Could you shed some light on why I am seeing the error above, and on whether the imputation with the input checks disabled is reliable?
Thanks,
Nandana Sengupta
University of Chicago, Knowledge Lab
--
Amelia mailing list served by HUIT
[Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
More info about Amelia: http://gking.harvard.edu/amelia
Do you have a rough sense of how long it might take to run 30 imputations on
3821 records, n=764? It seems stuck at the "beginning prep functions"
stage.
Below is my code:
install.packages("Amelia")
update.packages()
require(Amelia)
summary(data)
a.out <- amelia(data, m = 30, ts = "visit", cs = "id", boot.type = "none",
                noms = c("mar", "abst", "famalc1", "spousalc", "gender",
                         "race", "income", "A", "D"),
                logs = c("ASIalc", "ASIdrg", "ASIemp", "ASIleg", "ASImed",
                         "ASIpsy", "ASIsoc", "ASIpdgt", "pcntbpov"),
                bounds = rbind(c(12, 0, Inf), c(13, 0, Inf), c(14, 0, Inf),
                               c(15, 0, Inf), c(16, 0, Inf), c(17, 0, Inf),
                               c(18, 0, Inf), c(19, 0, Inf), c(8, 0, Inf)),
                polytime = 1, intercs = TRUE, p2s = 2,
                empri = 0.1 * nrow(data))
Any help would be greatly appreciated.
Sincerely,
Deysia
--
Deysia Levin, MPH
Epidemiology Doctoral Candidate
Department of Epidemiology
UC Berkeley School of Public Health
101 Haviland Hall
Berkeley, CA. 94720
Phone: (510) 926-2496
Email: Deysia(a)gmail.com
Hello list members!
I am writing to ask about methods of pooling Amelia outputs for standard
deviation, Cohen's d, and model fit statistics such as F-statistic and
R-squared.
Specifically: (1) For SD, can I use mi.meld() to pool SDs estimated from
individual imputed datasets, similarly to pooling standard errors for
regression coefficients?
(2) For Cohen's d, can I use zelig-ls to pool the t-statistic for the dummy
predictor, and then transform the pooled t-statistic into Cohen's d?
Alternatively, can I calculate Cohen's d for each imputed dataset and then
take the mean of the ds? Or, a third approach, calculate Cohen's d from the
pooled mean and SD? These approaches do not always lead to identical
results; which one is best? Or is there yet another, better approach?
(3) For R-squared - I understand that Dr. King recommends not to focus on
model fit statistics - but just out of curiosity: mice has a function that
uses the procedure proposed by Harel (2009):
http://www.tandfonline.com/doi/pdf/10.1080/02664760802553000
a) In each 'complete' dataset:
• calculate R²
• take its square root, R
• use the Fisher z-transformation to obtain the normalized estimate and its variance (Q(i), V(i))
b) With the m sets of estimates and variances:
• combine the results using Rubin's rules
• the confidence interval (CI) for Q is QT ± z(α/2)√(T), where T is the total variance
• inverse-transform back to the proportion scale
• square the result.
Is this approach superior to directly taking the mean of the estimated R² values from the imputed datasets?
(4) For the F-statistic - Is there any recommendation other than taking the
mean of Fs from the imputed datasets?
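Re (3), the Harel (2009) procedure quoted above can be sketched directly in base R. Assumptions in this sketch: r2 holds the m complete-data R² values, n is the sample size, and V(i) = 1/(n − 3) is used as the usual within-imputation variance of a Fisher z estimate.

```r
# Hedged sketch of the Harel (2009) R-squared pooling described above.
pool_r2 <- function(r2, n, alpha = 0.05) {
  m <- length(r2)
  z <- atanh(sqrt(r2))                 # Fisher z of R
  qbar <- mean(z)                      # pooled estimate (Rubin's rules)
  u <- 1 / (n - 3)                     # within-imputation variance
  b <- var(z)                          # between-imputation variance
  t <- u + (1 + 1 / m) * b             # total variance
  ci <- qbar + c(-1, 1) * qnorm(1 - alpha / 2) * sqrt(t)
  # inverse-transform to the proportion scale and square
  list(r2 = tanh(qbar)^2, ci = tanh(ci)^2)
}
```

Whether this beats a simple mean of the R² values is the open question; at least it propagates the between-imputation uncertainty.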
My apologies for the many questions! Thank you in advance for any of your
help! :)
Best wishes,
Gu
--
Gu Li, MS
PhD Candidate
University of Cambridge
Department of Psychology
Free School Lane, Cambridge, CB2 3RQ
United Kingdom
Dear Amelia users/creators,
I want to write a stack of Amelia imputed data sets into a Stata format for some specific analyses and tests that I find easier in Stata.
I know that write.amelia enables this when the separate argument is set to false, and have tried the following code:
write.amelia(am.output, format = "dta", file.stem = "outdata",
             separate = FALSE, orig.data = TRUE)
However, I get an error message: “Error in write.dta(dataframe= list…) empty string is not valid in Stata's documented format”.
Stack Overflow has a thread on this error for write.dta, which suggests overwriting a data frame; however, I cannot do this with the Amelia output: http://stackoverflow.com/questions/27574055/converting-r-file-to-stata-with…
Any advice?
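One hedged workaround, in case it helps: write.dta() (from the foreign package) rejects empty character strings, so recoding them to NA in each imputed dataset before export may get around the error. A sketch, assuming am.output is the amelia() output:

```r
# Hedged workaround: recode empty strings to NA in every imputed dataset,
# then export each one with foreign::write.dta().
library(foreign)
for (i in seq_along(am.output$imputations)) {
  imp <- am.output$imputations[[i]]
  chr <- vapply(imp, is.character, logical(1))
  imp[chr] <- lapply(imp[chr], function(x) replace(x, x == "", NA))
  write.dta(imp, paste0("outdata", i, ".dta"))
}
```

This writes one file per imputation; a stacked data frame could be cleaned the same way before handing it back to write.amelia().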
Grateful for this great MI package, and for any suggestions!
Sophie
Sophie Moullin
Sociology & Social Policy PhD Student
Princeton University
smoullin(a)princeton.edu<mailto:smoullin@princeton.edu>
After updating to the newest version of Amelia (1.7.4), I tried
overimputing a dataset that has incorrect values in one of its variables.
All of the error observations are measured identically (as zeros, where
they should be positive). The code I originally used is below, and it
triggers a warning of the type: "Some observations estimated with negative
measurement error variance. Set to gold standard."
dat<-data.frame(A, B, C, VS)
mopd<-moPrep(dat, VS~VS, subset=VS<.0001)
I looked through the github code as to what causes this error (other than,
of course, the negative error variance), and more importantly, how to
activate the gold.standard (which for my purposes is the rest of the values
for VS) and presumably fix this issue. After trying quite a few different
possible codings, I can't get it to work. I either receive the same error,
or a host of errors surrounding how I've included gold.standard in the
code. I would think it should be easy, since I'm basically bifurcating my
data (all data under some amount is the subset measured with error; all
data over the amount can be considered gold-standard data), but can't
figure it out. Thanks for any help you can give,
Sean
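For what it's worth, a hedged reading of the moPrep() documentation is that gold.standard is a logical flag on moPrep() itself, which treats the fully observed, error-free values as gold-standard data. An untested sketch:

```r
# Hedged sketch: per ?moPrep, gold.standard = TRUE tells moPrep() to use the
# error-free observations (here, VS >= .0001) as gold-standard data.
library(Amelia)
mopd <- moPrep(dat, VS ~ VS, subset = VS < .0001, gold.standard = TRUE)
a.out <- amelia(mopd, m = 5)
```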
Hi there,
I have basically 2 questions related to setting observation-level priors on nominal variables.
I am trying to do an overimputation on a dichotomous variable, say y1.
My 1st question:
I am aware that using the arguments "priors" and "overimp", I can specify observation-level priors via a 4-column matrix (row, column, prior.mean, prior.sd) or a 5-column matrix (row, column, lower confidence bound, upper confidence bound, confidence level). I am attempting the 4-column matrix, but I am not sure how to specify prior.mean and prior.sd when my prior is the dichotomous variable itself. I read somewhere that prior.mean can be set to y1 itself? Is prior.sd similar to the proportion of variance attributable to measurement error? I would appreciate advice on how to specify prior.sd in this case.
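Purely to illustrate the mechanics of the 4-column matrix (row 5 and sd = 0.25 are arbitrary placeholders, and whether such priors combine cleanly with a nominal y1 is exactly the open question here):

```r
# Hedged sketch of a 4-column observation-level prior plus overimputation.
# The prior is centered at the observed value of y1 for row 5; the sd is an
# arbitrary stand-in for the assumed measurement-error scale.
library(Amelia)
j <- which(names(dat) == "y1")
pr <- matrix(c(5, j, dat$y1[5], 0.25), nrow = 1)   # row, column, mean, sd
a.out <- amelia(dat, priors = pr, overimp = cbind(5, j))
```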
My 2nd question:
I am also aware of generating priors using the "moPrep" command from the Amelia package. The argument "error.proportion" in "moPrep" is easy enough to understand (the proportion of variance attributable to measurement error). But what is the difference between setting priors via "moPrep" and via "priors"? Should the output be the same?
Please kindly advise. Many thanks!
Huiying