Dear Developers,
I have been a strong advocate of Amelia II. To date I have taught
four missing data workshops in the US and Europe demoing Amelia II.
Two of my workshops had a hands-on component in which participants
ran their own imputations. My experience teaching Amelia II has
pointed out one feature that is badly missing from the GUI: a way to
set the random number generator seed. It would be really nice if we
could set a seed in the GUI so that we could get the same imputations
on every computer in the classroom. It would also help with logging
replication information, of which I am a strong proponent (mainly due
to the efforts of Gary King himself). But right now (as far as I
know) appropriate logging through random number generator seeds is
only available to command-line users of Amelia II. Additionally, an
easy-to-use logging function that records the R syntax of every step
taken in the GUI would also be beneficial.
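For comparison, command-line users can already do something like the
following (a minimal sketch; mydata stands in for any data frame):

library(Amelia)
set.seed(12345)                 # fix R's random number generator
a.out <- amelia(mydata, m = 5)
# rerunning this script on any machine reproduces the same imputations

Exposing the same seed in the GUI would give classroom users this
reproducibility without ever touching the command line.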
Please include these features in a future release of Amelia II.
Thanks.
Levi Littvay
Central European University
Hi (and especially, Hi Matt!)
I'm getting the (known) error
Error in chol.default(copy.theta[c(FALSE, m[ss, ]), c(FALSE, m[ss, ])]) :
the leading minor of order 6 is not positive definite
whenever I try to set a distribution or range prior for a whole variable
in AmeliaView. I know this has been discussed before, but no solution
was sent to the list... what's the problem here? I'm using Amelia II
version 1.1-27 and the latest version of R.
Here's my session file:
`amelia.list` <-
structure(list(amelia.args = structure(list(
    outname = "blimputationbig",
    m = 5, empri = 0, ts = 2, cs = 1,
    am.filename = "H:/mmanger/Data/blsmallv3.dta",
    file.type = 2,
    lags = c(8L, 9L, 10L, 13L), leads = 8:10,
    polytime = 2, intercs = 0, output.select = 4,
    priors = structure(c(0, 0, 3, 6, 0.97, 5.9, 6.9, 2.05),
                       .Dim = c(2L, 4L),
                       .Dimnames = list(c("newPrior", "newPrior"),
                                        c("", "polity", "", "")))),
    .Names = c("outname", "m", "empri", "ts", "cs", "am.filename",
               "file.type", "lags", "leads", "polytime", "intercs",
               "output.select", "priors"))),
.Names = "amelia.args")
Any help would be greatly appreciated.
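P.S. In case it helps to decode the priors matrix above: assuming the
layout is (case, column, mean, sd), with case 0 meaning "every missing
cell in that column" (my reading, so treat it with caution), the two
rows come out as:

# the 2 x 4 priors matrix from the dump, filled column-major by R
priors <- matrix(c(0, 0, 3, 6, 0.97, 5.9, 6.9, 2.05), nrow = 2)
# row 1: all missing cells of column 3, prior mean 0.97, prior sd 6.90
# row 2: all missing cells of column 6, prior mean 5.90, prior sd 2.05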
--Mark
--
Mark S. Manger, PhD
Assistant Professor
Department of Political Science, McGill University
mark.manger(a)mcgill.ca
on leave 2007-08:
Advanced Research Fellow, Program on US-Japan Relations
Weatherhead Center for International Affairs
Harvard University
61 Kirkland Street, Room 301
Cambridge, MA 02138
617-495-5998
Hi,
I apologize in advance for the lengthy question, but it's representative
of many issues I face when working with large panels of economic data,
so I would be extremely grateful for your suggestions, best practices,
experiences, etc.
I'm wondering what I could do to speed up the imputation of my rather
large dataset (a panel of N = 2120 x T = 80 = 169,600 observations). At
the current pace, my imputations would take months to run. Memory is
not the issue; rather, I think I have too many priors and/or too many
missing values on certain variables. See the summary below, especially
lnAid and lnFDI. Note that the missing values are concentrated at
certain time points (early in the panel) rather than in specific
cross-sectional units.
Variable             Obs      Mean        Std. Dev.   Min          Max
Polity               168160    .8924833   6.955011    -10          10
Corruptlvl           157820   5.441431    1.799652      0          10
RuleofLaw            157820   5.247434    2.204846      0          10
GovStab              157820   5.935454    2.064963      0          10
log of bilat. Aid     76079   1.919392    2.338255     -2.302585    9.692112
log of FDI in host    32080   3.918487    2.928901     -2.372018   10.98025
Capital openness     155200   -.2888318   1.379179     -1.766966    2.602508
Polcon V             154320    .3490876    .3158385     0            .89
log of GDPcap_host   154560   7.95649     1.053043      4.933741   10.48464
log of GDP_host      166480   29.62135    3.04193      22.97718    43.12974
log of GDP_home      147381   31.1144     2.128313     26.15253    37.36032
If I don't set range priors, I get nonsensical values for most of the
variables: negative GDP (real GDP, not negative log values), polity
scores out of range, etc. I haven't even tried higher-order polynomials
or interactions with cross-sectional units, although I would prefer to,
given that FDI exhibits a clear trend. Breaking up the dataset randomly
into pieces by cross-sections doesn't improve speed.
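For what it's worth, this is roughly how I set such a range prior at
the command line (a sketch; I'm assuming the five-column priors layout
of (case, column, lower, upper, confidence), with case 0 covering all
missing cells in a column, and the column number is a placeholder):

# keep imputed Polity scores inside [-10, 10] with 95% confidence
pol.prior <- matrix(c(0, 2, -10, 10, 0.95), nrow = 1)
a.out <- amelia(paneldata, m = 5, priors = pol.prior)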
It seems that I have to make tradeoffs. What do you think would be the
best thing to do, i.e. what is the most time-consuming issue for the EM
algorithm? The options I see are:

1. Constrain/shorten the sample to have a higher proportion of observed
   values on lnAid and lnFDI?
2. Accept imputations that are out of range (probably not)?
3. Break up the dataset "vertically" into one piece with the Aid
   variable and one with the FDI variable, run two sets of imputations,
   and merge them again (see the sketch below)?
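To make option 3 concrete, I mean something like this (a sketch;
id.vars, shared.vars and the variable names are placeholders, and I'm
assuming the completed datasets come back in a list called imputations,
as in recent versions):

# two "vertical" pieces sharing the id and covariate columns
aid.part <- paneldata[, c(id.vars, shared.vars, "lnAid")]
fdi.part <- paneldata[, c(id.vars, shared.vars, "lnFDI")]
imp.aid <- amelia(aid.part, m = 5)
imp.fdi <- amelia(fdi.part, m = 5)
# merge each pair of completed datasets back together on the ids
done1 <- merge(imp.aid$imputations[[1]],
               imp.fdi$imputations[[1]][, c(id.vars, "lnFDI")],
               by = id.vars)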
Many thanks,
Mark
--
Mark S. Manger, PhD
Assistant Professor
Department of Political Science, McGill University
mark.manger(a)mcgill.ca
on leave 2007-08:
Advanced Research Fellow, Program on US-Japan Relations
Weatherhead Center for International Affairs
Harvard University
61 Kirkland Street, Room 301
Cambridge, MA 02138
617-495-5998
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia