Hi Gu,
See below for responses.
On Thu, Jul 14, 2016 at 8:44 AM, Gu Li <ligu.sysu(a)gmail.com> wrote:
Dear all,
I am re-sending my questions to see if you have any thoughts. I have
spoken to several colleagues and found that they had similar problems...
Any help from you is very much appreciated!
Best,
Gu
2016-06-23 13:13 GMT+01:00 Gu Li <ligu.sysu(a)gmail.com>:
Hello list members!
I am writing to ask about methods of pooling Amelia outputs for standard
deviation, Cohen's d, and model fit statistics such as F-statistic and
R-squared.
Specifically: (1) For SD, can I use mi.meld() to pool SDs estimated from
individual imputed datasets, similarly to pooling standard errors for
regression coefficients?
If you just want to report the descriptive SD of a variable, you can simply
take the average of the within-imputation SDs. The more complicated
combining formula for the SEs of regression coefficients and means is for
quantifying the uncertainty of an estimate, whereas the sample SD is itself
just an estimate.
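As a concrete sketch of that averaging (Python, with made-up numbers purely for illustration; on an Amelia object in R you would do the same with sapply over the imputations):

```python
import statistics

# Sample SDs of the same variable computed in each of m = 5 imputed
# datasets (illustrative numbers, not real output).
within_sds = [2.31, 2.28, 2.40, 2.35, 2.33]

# Pooled descriptive SD: simply the mean of the within-imputation SDs.
pooled_sd = statistics.mean(within_sds)
```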
> (2) For Cohen's d, can I use zelig-ls to pool the t-statistic for the
> dummy predictor, and then transform the pooled t-statistic into Cohen's d?
> Alternatively, can I calculate Cohen's d by each imputed dataset and then
> calculate the mean of the ds? Or a third approach, to calculate Cohen's d
> based on pooled mean and SD? These approaches do not always lead to
> identical results; which one is best? Or is there yet another, better
> approach?
The Rubin rules generally state to estimate a pooled statistic you should
take the average of the within-imputation statistics. Then use the variance
formula to get a pooled variance estimate for the statistic.
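To make those two steps concrete, here is a sketch of Rubin's rules for a scalar statistic such as Cohen's d (Python, with invented per-imputation numbers; in R, mi.meld() performs the same combination):

```python
import statistics

def rubin_pool(estimates, variances):
    """Combine m completed-data estimates and variances by Rubin's rules."""
    m = len(estimates)
    qbar = statistics.mean(estimates)      # pooled point estimate
    ubar = statistics.mean(variances)      # average within-imputation variance
    b = statistics.variance(estimates)     # between-imputation variance
    total_var = ubar + (1 + 1 / m) * b     # total variance of the estimate
    return qbar, total_var

# Illustrative Cohen's d estimates and their sampling variances
# from m = 5 imputed datasets (not real output).
ds = [0.42, 0.45, 0.40, 0.44, 0.43]
vs = [0.010, 0.011, 0.010, 0.012, 0.010]
d_pooled, var_pooled = rubin_pool(ds, vs)
```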
(3) For R-squared - I understand that Dr. King recommends not focusing on
model fit statistics - but just out of curiosity:

mice has a function that uses the procedure proposed by Harel (2009):
http://www.tandfonline.com/doi/pdf/10.1080/02664760802553000
a) In each 'complete' dataset:
   • calculate R²
   • take its square root, R
   • use the Fisher z-transformation to obtain the normalized estimate and
     its variance (Q(i), V(i))
b) With the m sets of estimates and variances:
   • combine the results using Rubin's rules
   • the confidence interval (CI) for Q is Q̄ ± z(α/2)√T, where T is the
     total (pooled) variance
   • inverse-transform back to the correlation scale
   • square your results.
Is this approach superior to taking the mean of the estimated R-squared
values from the imputed datasets directly?
I'm not very familiar with this approach, but it sounds reasonable. I'm
sure the two procedures will lead to very similar estimates of the R^2.
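For the curious, the Harel procedure might look roughly like this in code (a Python sketch with invented numbers; the mice package implements this in R). It uses the standard approximation that Fisher's z of a correlation has variance 1/(n − 3):

```python
import math
import statistics

def pool_r_squared(r2_values, n):
    """Pool R^2 across m imputed datasets via Fisher's z (Harel 2009)."""
    m = len(r2_values)
    # Step 1: transform each completed-data R^2 to Fisher's z scale.
    q = [math.atanh(math.sqrt(r2)) for r2 in r2_values]
    v = 1 / (n - 3)                        # approximate variance of Fisher's z
    # Step 2: combine on the z scale with Rubin's rules.
    qbar = statistics.mean(q)
    b = statistics.variance(q)
    total_var = v + (1 + 1 / m) * b
    # Step 3: back-transform the pooled z and square to return to R^2 scale.
    r2_pooled = math.tanh(qbar) ** 2
    return r2_pooled, total_var

# Illustrative: R^2 from m = 5 imputed datasets of n = 200 observations.
r2_pooled, _ = pool_r_squared([0.30, 0.32, 0.29, 0.31, 0.33], n=200)
```

Because the back-transform is monotone, the pooled value will land between the smallest and largest per-imputation R², which is why the two procedures rarely disagree much in practice.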
(4) For the F-statistic - Is there any recommendation other than taking the
mean of Fs from the imputed datasets?
The average is probably an OK way to do this, but more generally you might
want to look at likelihood-ratio tests to assess model fit. For those, you
can use the procedure of Meng and Rubin (1992, Biometrika). Here's a link:
http://biomet.oxfordjournals.org/content/79/1/103
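As a rough sketch of that combination (Python; the function name and input values are mine, for illustration only): it takes the average of the m completed-data LR statistics, each evaluated at its own dataset's estimates, the average LR statistic re-evaluated at the pooled estimates, and k, the number of parameters being tested:

```python
def meng_rubin_d3(dbar, dtilde, k, m):
    """Pooled likelihood-ratio statistic of Meng & Rubin (1992).

    dbar   -- average of the m completed-data LR statistics, each evaluated
              at that dataset's own parameter estimates
    dtilde -- average LR statistic with every dataset evaluated at the
              pooled (averaged) parameter estimates
    k      -- number of parameters being tested
    m      -- number of imputations
    Returns the D3 statistic, referred to an F(k, nu) distribution.
    """
    # Relative increase in variance due to missingness.
    r3 = (m + 1) / (k * (m - 1)) * (dbar - dtilde)
    return dtilde / (k * (1 + r3))

# Illustrative values only, not from a real model.
d3 = meng_rubin_d3(dbar=12.4, dtilde=11.8, k=3, m=5)
```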
Hope that helps!
Cheers,
Matt
~~~~~~~~~~~
Matthew Blackwell
Assistant Professor of Government
Harvard University
url:
http://www.mattblackwell.org
My apologies for the many questions! Thank you in advance for any help! :)
Best wishes,
Gu
--
Gu Li, MS
PhD Candidate
University of Cambridge
Department of Psychology
Free School Lane, Cambridge, CB2 3RQ
United Kingdom
--
Amelia mailing list served by HUIT
[Un]Subscribe/View Archive:
http://lists.gking.harvard.edu/?info=amelia
More info about Amelia:
http://gking.harvard.edu/amelia