Do you have a rough sense of how long it might take to run 30 imputations on
3,821 records (n = 764)? It seems to be stuck at the "beginning prep
functions" stage.
Below is my code:
install.packages("Amelia")
update.packages()
require(Amelia)
summary(data)
a.out <- amelia(data, m=30, ts = "visit", cs = "id", boot.type =
"none", noms=c("mar", "abst", "famalc1", "spousalc", "gender", "race",
"income", "A", "D"),
logs=c("ASIalc", "ASIdrg", "ASIemp", "ASIleg", "ASImed", "ASIpsy",
"ASIsoc", "ASIpdgt", "pcntbpov"),
bound = rbind(c(12, 0, Inf), c(13, 0, Inf), c(14, 0, Inf), c(15, 0,
Inf), c(16, 0, Inf), c(17, 0, Inf), c(18, 0, Inf), c(19, 0, Inf), c(8,
0, Inf)),
polytime = 1, intercs = TRUE, p2s = 2, empri = .1*nrow(data))
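In the meantime, one rough way I could gauge the total runtime is to time a
single imputation with identical settings and scale up (a ballpark sketch
using the objects defined above; EM convergence varies across imputations,
so this is only approximate):

one.run <- system.time(
  amelia(data, m = 1, ts = "visit", cs = "id", boot.type = "none",
         noms = noms.vars, logs = logs.vars, bounds = bds,
         polytime = 1, intercs = TRUE, p2s = 2, empri = 0.1 * nrow(data))
)
one.run["elapsed"] * 30  # crude estimate of the m = 30 runtime, in seconds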
Any help would be greatly appreciated.
Sincerely,
Deysia
--
Deysia Levin, MPH
Epidemiology Doctoral Candidate
Department of Epidemiology
UC Berkeley School of Public Health
101 Haviland Hall
Berkeley, CA. 94720
Phone: (510) 926-2496
Email: Deysia(a)gmail.com
Hello list members!
I am writing to ask about methods of pooling Amelia outputs for the standard
deviation, Cohen's d, and model fit statistics such as the F-statistic and
R-squared.
Specifically: (1) For the SD, can I use mi.meld() to pool the SDs estimated
from the individual imputed datasets, similarly to how standard errors of
regression coefficients are pooled?
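To make this concrete, here is the kind of sketch I have in mind, assuming
a.out is an amelia() output and "x" is a placeholder variable name; the SE
of an SD is the usual large-sample normal approximation, sd/sqrt(2(n - 1)):

# pool the SD of one variable across the imputed datasets
sds <- sapply(a.out$imputations, function(d) sd(d$x))
n   <- nrow(a.out$imputations[[1]])
ses <- sds / sqrt(2 * (n - 1))  # approximate SE of each SD
pooled <- mi.meld(q = matrix(sds, ncol = 1), se = matrix(ses, ncol = 1))
pooled$q.mi   # pooled SD (the mean of the per-dataset SDs)
pooled$se.mi  # combined standard error under Rubin's rules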
(2) For Cohen's d, can I use zelig-ls to pool the t-statistic for the dummy
predictor and then transform the pooled t-statistic into Cohen's d?
Alternatively, can I calculate Cohen's d in each imputed dataset and then
take the mean of the d values? Or, as a third approach, calculate Cohen's d
from the pooled mean and SD? These approaches do not always lead to
identical results; which one is best? Or is there yet another, better
approach?
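For reference, the second approach would look something like this ("y" and
"group" are placeholder names for the outcome and the binary dummy):

# compute Cohen's d in each imputed dataset, then average the estimates
cohens.d <- function(d) {
  g  <- split(d$y, d$group)
  n1 <- length(g[[1]]); n2 <- length(g[[2]])
  sp <- sqrt(((n1 - 1) * var(g[[1]]) + (n2 - 1) * var(g[[2]])) /
               (n1 + n2 - 2))  # pooled SD
  (mean(g[[1]]) - mean(g[[2]])) / sp
}
ds <- sapply(a.out$imputations, cohens.d)
mean(ds)  # pooled point estimate of d (no combined variance here)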
(3) For R-squared - I understand that Dr. King recommends not focusing on
model fit statistics - but just out of curiosity: mice has a function that
uses the procedure proposed by Harel (2009):
http://www.tandfonline.com/doi/pdf/10.1080/02664760802553000
1) In each completed dataset:
• calculate R-squared,
• take its square root, R,
• apply the Fisher z-transformation to obtain the normalized estimate and
its variance (Q(i), V(i)).
2) With the m sets of estimates and variances:
• combine the results using Rubin's rules,
• form the confidence interval for Q as Q̄ ± z(α/2)·√T, where T is the total
(within- plus between-imputation) variance,
• inverse-transform back to the proportion scale and square the results.
Is this approach superior to simply taking the mean of the estimated
R-squared values from the imputed datasets?
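For what it is worth, my reading of Harel's procedure as R code would be
roughly the following, where r2s is the vector of per-dataset R-squared
values and n is the sample size (the variance 1/(n - 3) is the usual
approximation for Fisher's z):

pool.r2 <- function(r2s, n) {
  q    <- atanh(sqrt(r2s))   # Fisher z-transform of R = sqrt(R-squared)
  m    <- length(q)
  qbar <- mean(q)            # Rubin's rules point estimate
  tvar <- 1 / (n - 3) + (1 + 1 / m) * var(q)  # within + between variance
  ci   <- qbar + c(-1, 1) * qnorm(0.975) * sqrt(tvar)
  # back-transform to the proportion scale and square
  c(R2 = tanh(qbar)^2, lower = tanh(ci[1])^2, upper = tanh(ci[2])^2)
}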
(4) For the F-statistic - is there any recommendation other than taking the
mean of the F values from the imputed datasets?
My apologies for the many questions! Thank you in advance for any of your
help! :)
Best wishes,
Gu
--
Gu Li, MS
PhD Candidate
University of Cambridge
Department of Psychology
Free School Lane, Cambridge, CB2 3RQ
United Kingdom