PROBLEM: (1) The guidance on setting the parameter "empri" in the User
Manual is not consistent and thus might be confusing.
(2) There is a small typo in the text addressing the parameter "empri" in
the User Manual and in the "Amelia" help file in the R package.
BACKGROUND:
1. Inconsistent Guidance
(a) User Manual, sec. 7.2 (p. 51), and the file "Amelia" in the R package
(at \library\Amelia\help\Amelia, which I read as a Windows *.txt file)
state:
empri: number indicating level of the empirical (or ridge) prior.
This prior shinks the covariances of the data, but keeps the
means and variances the same for problems of high
missingness, small N's or large correlations among the
variables. Should be kept small; a reasonable upper bound
is around 10% of the rows of the data.
(b) User Manual, sec. 5.6.1 (p.21) reads:
"A recommendation of 0.5 to 1 percent of the number of observations, n, is a
reasonable starting value, and often useful in large datasets to add some
numerical stability. For example, in a dataset of two thousand observations,
this would translate to a prior value of 10 or 20 respectively. A prior of
up to 5 percent is moderate in most applications.
For our data, it is easy to code up a 1 percent ridge prior:
> a.out.time2 <- amelia(freetrade, ts = "year", cs = "country",
+ polytime = 2, intercs = TRUE, p2s = 0, empri = 0.01 *
+ nrow(freetrade))...."
Since the example in sec. 5.6.1 uses a value equal to 1% of the number of
rows of data, I have favored the interpretation that "n" means the number
of rows.
(My experiments indicate that a value of up to 5% of the number of rows of
data works better than a value of 0.1 to 1%, or up to 5%, of the number of
observations.)
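The two passages can be compared by turning each rule of thumb into an empri value. The Python sketch below is illustrative only; the helper names are hypothetical, and it assumes "n" means the number of rows, as in the sec. 5.6.1 example (empri = 0.01 * nrow(freetrade)).

```python
def empri_guides(n_rows):
    """Return the empri values implied by each rule of thumb in the manual.

    Assumption: "n" is the number of rows, as in the sec. 5.6.1 example.
    """
    return {
        "start_low_0.5pct": 0.005 * n_rows,   # sec. 5.6.1 starting value, low end
        "start_high_1pct": 0.01 * n_rows,     # sec. 5.6.1 starting value, high end
        "moderate_5pct": 0.05 * n_rows,       # sec. 5.6.1 "moderate" prior
        "upper_bound_10pct": 0.10 * n_rows,   # sec. 7.2 "reasonable upper bound"
    }


def empri_as_percent_of_rows(empri, n_rows):
    """Express a given empri value as a percentage of the number of rows."""
    return 100.0 * empri / n_rows


if __name__ == "__main__":
    for label, value in empri_guides(2000).items():
        print(f"{label}: {value:g}")
    print(f"empri = 100 on 1044 rows: {empri_as_percent_of_rows(100, 1044):.1f}% of rows")
```

For n = 2000 this gives 10, 20, 100 and 200, matching the "10 or 20" in the sec. 5.6.1 quote. For the 1044-row data set in the later messages, empriX <- 100 is about 9.6% of the rows: near the 10% upper bound of passage (a), and far above the 0.5 to 1% starting range of passage (b).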
2. Typo
The User Manual, sec. 7.2 (p. 51) and the file "Amelia" in the R package
have the same typo: "shinks" instead of "shrinks"...
RECOMMENDATIONS:
1. The "Amelia" help file in the R package and both sections of the User
Manual should reflect the best guidance and be consistent with each other.
2. Fix the small typo identified in item 2 above.
Wayne A. Thornton
thornton(a)fas.harvard.edu
Please disregard all of the issues/questions I raised in my email below,
EXCEPT for one:
Q: Does whether or not the input file has a header row (variable names)
affect how Amelia works?
(Matt Blackwell's response to my first issue (subj: Amelia for R produces no
imputed data output files [WAT Issue #1]) resolved the other issues in my
earlier email below.)
I have changed the subject line of this message accordingly.
DISCUSSION: It seems that Amelia (and AmeliaView) assume that the input
data set has a header row. However, I cannot find any discussion in the
documentation to confirm this.
I have observed the following:
-- When I write the data.frame to a csv file to be read by AmeliaView and the
csv file has no header row, AmeliaView -> Summarize Data reports
"Missing: x / [total]", where the "total" is one less than the number of
rows actually in the data set.
-- When I pass the data.frame to Amelia for R directly, it doesn't seem to
have this problem.
To prevent problems of this kind, should Amelia and AmeliaView have an input
parameter telling them whether or not the input data set has a header row?
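One plausible explanation, offered as an assumption rather than a confirmed description of AmeliaView's reader: if the CSV reader treats the first line as variable names, a file without a header row loses its first data row, which produces exactly the "one less than the rows" count. R's read.csv() also uses header = TRUE by default. The Python sketch below (standard-library csv module; the sample data and the helper count_rows are made up) shows the effect.

```python
import csv
import io

# Three data rows, written WITHOUT a header line (as in the report).
HEADERLESS_CSV = "1,2.5,3\n4,,6\n7,8.5,9\n"


def count_rows(text, assume_header):
    """Count data rows, optionally treating the first line as variable names."""
    rows = list(csv.reader(io.StringIO(text)))
    if assume_header and rows:
        rows = rows[1:]  # the first line is consumed as column names
    return len(rows)


print(count_rows(HEADERLESS_CSV, assume_header=False))  # all three rows
print(count_rows(HEADERLESS_CSV, assume_header=True))   # one row "lost"
```

Writing the csv file with variable names in the first line avoids the ambiguity regardless of how the reader is configured.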
Wayne Thornton
thornton(a)fas.harvard.edu
_____
From: owner-amelia_at_lists_gking_harvard_edu(a)mail.hmdc.harvard.edu
[mailto:owner-amelia_at_lists_gking_harvard_edu@mail.hmdc.harvard.edu] On
Behalf Of Wayne Thornton
Sent: Sunday, June 28, 2009 16:24
To: amelia(a)lists.gking.harvard.edu
Subject: [amelia] Amelia output extracted from output[[ ]] looks odd [WAT Issue #2]
RE: Amelia output extracted from output[[ ]] looks odd [WAT Issue #2]
PROBLEM: After running Amelia to generate 5 imputed files, the output files
extracted using output[[ ]] look odd....
BACKGROUND: Here is my command line to run Amelia:
# *******************
# CONTROL PANEL
# *******************
impruns   <- 5
tolX      <- 0.0001
empriX    <- 100
autopriX  <- 0.05
resampleX <- 100
# *******************
# CONTROL PANEL
# *******************
imputed <- amelia(DATA8i,
                  m = impruns, p2s = 2,
                  idvars = c(3, 4, 5),
                  ts = 1, cs = 2, polytime = NULL,
                  startvals = 0,
                  tolerance = tolX,
                  noms = nomIV8i,
                  ords = ordIV8i, incheck = TRUE, collect = FALSE,
                  outname = "DATA8imp",
                  write.out = TRUE, archive = TRUE,
                  keep.data = TRUE,
                  empri = empriX,
                  autopri = autopriX,
                  bounds = IVlims, max.resample = resampleX)
After a run, I am able to extract output info from:
imputed[[ ]]
The user guide (p. 27, under "Output") says:
"...you can refer to any of the datasets by referencing output[[i]], where i
is the number of the dataset you wish to reference.
These datasets will be returned in the same format which you passed
them...."
However, the files imputed[[1]], imputed[[2]], etc. are quite different
from the original input file, and different from each other.
-- The input file is a data frame (1044 x 487) with no header.
-- Output files:
   imputed[[1]]   1044 x 2435   numeric; looks like imputed values
                  NOTE: 2435 = 5 * 487
   imputed[[2]]   1 x 1         "5"
   imputed[[3]]   TRUE / FALSE
   imputed[[4]]   483 x 2415    numeric; does NOT look like imputed values
                  NOTE: 483 = number of IVs minus 4; the data set includes
                  3 identity variables, 1 time-series variable, and
                  1 cross-section variable
   imputed[[5]]   483 x 5       numeric; does NOT look like imputed values
These output files raise the following comments/questions:
(1) Contrary to the info in the user guide, the output files extracted from
output[[i]] do not match the format of the input file.
(2) Does whether or not the input file has a header row (variable names)
affect how Amelia works?
(This question may be an artifact of my lack of understanding about working
with data frames. But if you read in the output csv file and compute
nrow(file), the result is one less than the number of rows actually in the
csv file.)
(3) Is the first output file [[1]] the 5 sets of imputed data?
(4) I have no idea what the other files are... Are they for diagnostics?
Thanks,
Wayne
SUBMITTED BY: Wayne A. Thornton
Harvard Univ.
thornton(a)fas.harvard.edu
781-492-3131
PROBLEM: The Amelia functions summary.Amelia() and compare.density()
apparently report the fraction of missing values (expressed as a decimal).
However, these values are labeled "Percent Missing" in:
-- the output of summary.Amelia()
-- the legends on plots generated by compare.density()
EXAMPLE from my data set:
Variable                     summary.Amelia()   my computation (%)
Tension_avg_vics             0.01149            1.149
Tension_avg_vics_no_zeros    0.01245            1.245
Tension_bads_count           0.01245            1.245
Tension_bads_div_tokens      0.01245            1.245
Tension_distrusts_count      0.02011            2.011
....                         ....               ....
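The discrepancy is a factor of 100. The Python sketch below (hypothetical helper missing_summary, made-up column) computes both quantities for a column with 12 missing values out of 1044 rows, the row count reported in the other messages. It prints "fraction = 0.01149, percent = 1.149", the same pair of numbers as the first row of the table above.

```python
def missing_summary(values):
    """Return (fraction, percent) of missing entries; None marks a missing value."""
    n_missing = sum(1 for v in values if v is None)
    fraction = n_missing / len(values)
    return fraction, 100.0 * fraction


# Hypothetical column: 12 missing out of 1044 rows.
column = [None] * 12 + [0.0] * 1032
frac, pct = missing_summary(column)
print(f"fraction = {frac:.5f}, percent = {pct:.3f}")
```

A label of "Fraction Missing", or multiplying the reported value by 100, would make the output agree with the "Percent Missing" heading.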
Wayne A. Thornton
thornton(a)fas.harvard.edu
Matt---
...Weird....
...If you unzip the HELP file from the attached zip file, THEN it
works...!!...
Wayne
_____
From: Wayne Thornton [mailto:thornton@fas.harvard.edu]
Sent: Sunday, June 28, 2009 18:13
To: 'Matt Blackwell'
Cc: 'amelia(a)lists.gking.harvard.edu'
Subject: Amelia documentation seems to differ between websites; Amelia package Help file does not work right [WAT issue #3]
Matt---
I was about to respond with an email saying, "Matt... WHAT documentation
are you talking about?" when I realized my Amelia installation must be
one version behind (all the docs I had were 6 weeks out of date).
However, having realized that and tried to fix it, I still note two
problems.
(I realize this kind of stuff is hard to keep up with, and I am most
grateful that anyone is even trying. So please accept my comments as
constructive and not critical.)
(1) The version of the PDF manual available at the R website and included in
the Amelia R package (dated 04-27-09) seems substantially different from the
version available at Gary King's website (dated 05-17-09).
See:
http://cran.r-project.org/web/packages/Amelia/Amelia.pdf
http://gking.harvard.edu/amelia/docs/amelia.pdf
(2) I can't open the entries in the Help file provided with the Amelia
package. After opening
\library\Amelia\chtm\Amelia.chm
if you click on an entry in the left window, nothing comes up in the
right (main) window.
(See attached zip file.)
Thanks...
Wayne
-----Original Message-----
From: Matt Blackwell [mailto:mblackwell@gmail.com]
Sent: Sunday, June 28, 2009 16:12
To: Wayne Thornton
Cc: amelia(a)lists.gking.harvard.edu
Subject: Re: [amelia] Amelia for R produces no imputed data output files [WAT Issue #1]
Hi Wayne,
Amelia recently changed its output to fit better with R. You will now
find the imputed datasets in the following locations:
imputed$imputations[[1]]
imputed$imputations[[2]]
...
imputed$imputations[[m]]
Amelia no longer saves the imputed datasets from the amelia()
function. You can write the datasets to file using the write.amelia()
function:
write.amelia(imputed, file.stem = "outdata")
For further details, take a look at the manual page for write.amelia()
or see the Amelia documentation (section 5.2.1 should have the
relevant information).
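The write-out step can be pictured with the Python sketch below. It is not Amelia code: the function write_imputations is hypothetical, and the naming pattern <file.stem><i>.csv (outdata1.csv, outdata2.csv, ...) is an assumption about what write.amelia() produces; the write.amelia() help page is the authority on the actual behavior. Writing the variable names as the first line also sidesteps the header-row question raised elsewhere in this thread.

```python
import csv
import os
import tempfile


def write_imputations(imputations, file_stem, out_dir):
    """Write each imputed dataset to <file_stem><i>.csv, i = 1..m.

    Each dataset is a (header, rows) pair. The naming pattern is an
    assumption meant to mirror write.amelia(..., file.stem = "outdata").
    """
    paths = []
    for i, (header, rows) in enumerate(imputations, start=1):
        path = os.path.join(out_dir, f"{file_stem}{i}.csv")
        with open(path, "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)  # variable names go in the first line
            writer.writerows(rows)
        paths.append(path)
    return paths


# Two tiny made-up "imputations" of the same 2 x 2 data set.
imps = [
    (["x", "y"], [[1.0, 2.0], [3.0, 4.1]]),
    (["x", "y"], [[1.0, 2.0], [3.0, 3.9]]),
]
out_dir = tempfile.mkdtemp()
written = write_imputations(imps, "outdata", out_dir)
print([os.path.basename(p) for p in written])
```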
I hope that helps,
matt.
---------------------------------
** INITIAL EMAIL DELETED **
RE: Amelia for R produces no imputed output files [WAT Issue #1]
PROBLEM: After running Amelia to generate 5 imputed files, no imputed data
output files were generated.
BACKGROUND: Here is my command line to run Amelia:
# *******************
# CONTROL PANEL
# *******************
impruns   <- 5
tolX      <- 0.0001
empriX    <- 100
autopriX  <- 0.05
resampleX <- 100
# *******************
# CONTROL PANEL
# *******************
imputed <- amelia(DATA8i,
                  m = impruns, p2s = 2,
                  idvars = c(3, 4, 5),
                  ts = 1, cs = 2, polytime = NULL,
                  startvals = 0,
                  tolerance = tolX,
                  noms = nomIV8i,
                  ords = ordIV8i, incheck = TRUE, collect = FALSE,
                  outname = "DATA8imp",
                  write.out = TRUE, archive = TRUE,
                  keep.data = TRUE,
                  empri = empriX,
                  autopri = autopriX,
                  bounds = IVlims, max.resample = resampleX)
After a run in which Amelia did not generate the expected output files
(DATA8imp1, DATA8imp2, etc.), I am able to extract output info from:
imputed[[ ]]
imputed.rData
SUBMITTED BY: Wayne A. Thornton
Harvard Univ.
thornton(a)fas.harvard.edu
781-492-3131
Hello,
I am trying to find out what specific bootstrapping algorithm is used to generate the m incomplete data sets, which are then used to get m sets of point estimates for mu and Sigma. Specifically, are you using a block, paired, wild or other type of bootstrapping to sample m data sets of size n from the data D={D_obs, D_mis}?
I checked the documentation and the accompanying paper but could not find the answer.
Many thanks for any response you can give.
Best,
Tanja
Tanja Srebotnjak, PhD, MSc, Dipl. Stat.
Postgraduate Fellow
Institute for Health Metrics and Evaluation
University of Washington
2301 5th Ave, Suite 600
Seattle, WA 98121
Email: tanjas(a)u.washington.edu
Tel: +1-206-897-2866
www.healthmetricsandevaluation.org
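No reply to this question appears in this part of the archive. For concreteness, the sketch below shows the simplest of the options the question lists: an ordinary nonparametric bootstrap that resamples whole rows (cases) with replacement, so each row carries its own pattern of observed and missing cells. It is a Python illustration under that assumption only, with a hypothetical helper row_bootstrap; whether Amelia II uses this scheme, or something else for time-series cross-section data, is the open question and is not settled here.

```python
import random


def row_bootstrap(data, m, seed=None):
    """Draw m bootstrap samples, each of n whole rows sampled with replacement.

    Rows are kept intact, including their missing cells (None), so the
    observed/missing pattern of each row travels with it.
    """
    rng = random.Random(seed)
    n = len(data)
    return [[data[rng.randrange(n)] for _ in range(n)] for _ in range(m)]


# Made-up data: None marks a missing cell (part of D_mis).
D = [(1.0, None), (2.0, 3.5), (None, 4.0), (5.0, 6.5)]
samples = row_bootstrap(D, m=5, seed=42)
for s in samples:
    print(s)
```

A block bootstrap would instead resample contiguous runs of rows, and a wild bootstrap perturbs residuals rather than resampling rows; neither is shown here.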
Hello,
I'm using Amelia 2 with R version 2.8.1 and have the following
question: Is there a way to get non-graphical output of
overimputation? More precisely, is there a way to get the exact
imputed values and confidence intervals obtained with
overimputation, i.e. the values of all mean imputed values and
confidence intervals shown on the overimputation diagnostic graph?
It would be very useful for me to get "standard" output instead of
the graphical one, but unfortunately I don't know how to do it
in R.
Thank you in advance for your answer.
Best wishes,
Sandra
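Once the overimputed draws are in hand, the summary asked for is a short computation. The Python sketch below assumes a hypothetical layout (one list of overimputed draws per observed value) and reports, for each observation, the mean of the draws and a 90% interval from the 5th and 95th percentiles. The 90% level is an assumption about what the diagnostic graph displays, and extracting the draws from Amelia itself is outside this sketch. statistics.quantiles requires Python 3.8 or later.

```python
import statistics


def overimpute_summary(draws_by_obs, observed):
    """Summarize overimputation draws for each observed value.

    draws_by_obs[i] holds the values imputed for observation i when its
    observed value was treated as missing. Returns one tuple per observation:
    (observed, mean of draws, 5th percentile, 95th percentile).
    """
    rows = []
    for obs, draws in zip(observed, draws_by_obs):
        # 19 cut points at 5%, 10%, ..., 95%; "inclusive" stays within the data range.
        cuts = statistics.quantiles(draws, n=20, method="inclusive")
        rows.append((obs, statistics.mean(draws), cuts[0], cuts[-1]))
    return rows


# Made-up draws for two observed values.
observed = [10.0, 20.0]
draws = [
    [9.0, 9.5, 10.0, 10.5, 11.0],
    [18.0, 19.0, 20.0, 21.0, 22.0],
]
for obs, mean, lo, hi in overimpute_summary(draws, observed):
    print(f"observed={obs:.1f} mean={mean:.2f} 90% interval=({lo:.2f}, {hi:.2f})")
```

An observed value falling inside its interval corresponds to a bar crossing the diagonal line on the graphical diagnostic.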
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
More info about Amelia: http://gking.harvard.edu/amelia
I am receiving error message 43:
Amelia Error Code: 43
You have a variable in your dataset that does not vary. Please remove this
variable.
However, when I pinpoint the variable in question, it appears that it does
in fact vary.
What could be causing this problem?
Kind regards,
Harry
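One way a variable can appear to vary by eye yet fail a "does not vary" check is if its only changes are between a single value and missing cells. The Python sketch below flags such columns by counting distinct non-missing values. It is a hypothetical diagnostic (the helper non_varying_columns and the sample rows are made up), not Amelia's own check, and other causes are possible.

```python
def non_varying_columns(rows, names):
    """Return names of columns with fewer than two distinct observed values.

    Missing cells (None) are ignored, so a column such as [None, 3, 3, None]
    counts as non-varying even though it "changes" between missing and 3.
    """
    flagged = []
    for j, name in enumerate(names):
        observed = {row[j] for row in rows if row[j] is not None}
        if len(observed) < 2:
            flagged.append(name)
    return flagged


rows = [(1, 3, "a"), (2, None, "a"), (3, 3, None)]
names = ["x", "z", "w"]
print(non_varying_columns(rows, names))  # z and w have one observed value each
```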