Dear Amelia users/authors:
I was able to impute CSTS data with logical bounds before the latest update
to Amelia. I was wondering whether anyone else experienced similar problems
with the latest update (maybe I am just doing something wrong...)
Thanks for any help, best,
Milan
---
Milan Svolik
Assistant Professor
Department of Political Science
University of Illinois at Urbana-Champaign
https://netfiles.uiuc.edu/msvolik/www/
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
More info about Amelia: http://gking.harvard.edu/amelia
The actual reference to the paper I mentioned in my previous email is:
Andrew Gelman; Gary King; Chuanhai Liu. Not Asked and Not Answered: Multiple Imputation for Multiple Surveys. Journal of the American Statistical Association, Vol. 93, No. 443. (Sep., 1998), pp. 846-857.
Thanks,
Victor Herrera MD, MSc.
Hello Amelia users:
I am working with a pool of surveys and I want to impute missing values in the pooled dataset while keeping the design variables and re-calculated weights (and the variables from which those weights were derived). From the paper by Gelman, King & Liu (1998) on multiple imputation for multiple surveys, I understand that a hierarchical approach is the appropriate one for this problem; however, after reading the documentation of the software (Amelia II), I am not sure whether this task can be accomplished with it.
I would appreciate your help on this issue.
Thanks,
Victor Herrera MD. MSc.
Hello,
I believe the latest update to the Amelia package for R has inadvertently
broken some essential functionality.
Using a sample dataset available
here<http://www.princeton.edu/%7Ewbullock/sampleData.RData>,
I'm using the following code:
library(Amelia)
load("sampleData.RData")
test <- amelia(sampleData, idvars=c("st"), noms=c("female","imports"),
ords=c("age", "edu", "inc"))
On one computer using Version 1.2-2, built: 2009-04-27 available via the
CRAN archives, everything goes well.
On a second computer using Version 1.2-9, built: 2009-07-02 I get:
Error in sum(sapply(x[, fact], is.factor)) :
invalid 'type' (list) of argument
It appears as though amelia is having trouble taking in a vector of column
names, but I must admit I haven't done extensive testing of the problem.
Any help would be much appreciated.
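For anyone trying to reproduce this, here is a minimal base-R sketch of one way this exact error message can arise. It is an assumption, not a confirmed diagnosis of Amelia 1.2-9: if the column selection inside `x[, fact]` ends up empty, `sapply()` returns an empty list and `sum()` rejects it. The data frame `x` and the selector `fact` below are hypothetical stand-ins, not the objects inside amelia().

```r
# Hypothetical stand-in data; not the sampleData from the post.
x <- data.frame(st = c("NJ", "NY"), age = c(1, 2), stringsAsFactors = FALSE)

# Assumption: the selector comes out empty.
fact <- character(0)

# sapply() over zero columns returns an empty list, not a logical vector.
res <- sapply(x[, fact], is.factor)

# sum() on a list fails; capture the message instead of stopping the script.
msg <- tryCatch(sum(res), error = function(e) conditionMessage(e))
print(msg)

# vapply() always returns a logical vector, so the empty case sums cleanly.
safe <- sum(vapply(x[, fact, drop = FALSE], is.factor, logical(1)))
print(safe)
```

If the failure does come from an empty selection, the `vapply()` form shows why a type-stable call avoids it; whether that applies to the package code is untested here.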
Sincerely,
--Will Bullock
Department of Politics
Princeton University
PROBLEM: (1) The guidance on setting the parameter "empri" in the user
manual is not consistent, and thus might be confusing.
(2) There is a small typo in the text addressing the
parameter "empri" in the User Manual and the "Amelia" file in the R package
BACKGROUND:
1. Inconsistent Guidance
(a) User Manual, sec. 7.2 (p. 51) and the file "Amelia" in the R package
(at \ library \ Amelia \ help \ Amelia, which I read as a Windows *.txt
file...) state:
empri: number indicating level of the empirical (or ridge) prior.
This prior shinks the covariances of the data, but keeps the
means and variances the same for problems of high
missingness, small N's or large correlations among the
variables. Should be kept small; a reasonable upper bound
is around 10% of the rows of the data.
(b) User Manual, sec. 5.6.1 (p.21) reads:
"A recommendation of 0.5 to 1 percent of the number of observations, n, is a
reasonable starting value, and often useful in large datasets to add some
numerical stability. For example, in a dataset of two thousand observations,
this would translate to a prior value of 10 or 20 respectively. A prior of
up to 5 percent is moderate in most applications.
For our data, it is easy to code up a 1 percent ridge prior:
> a.out.time2 <- amelia(freetrade, ts = "year", cs = "country",
+ polytime = 2, intercs = TRUE, p2s = 0, empri = 0.01 *
+ nrow(freetrade))...."
Since the example in sec. 5.6.1 uses a value equal to 1% of the number of
rows of data, I have favored this interpretation...
(My experimenting indicates that using a value up to 5% of the number of
rows of data works better than trying to use a value of 0.1 to 1% or up to
5% of the number of observations.)
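To make the gap between the two passages concrete, here is a small sketch of the arithmetic in R. It only converts the percentages quoted above into prior values for the manual's two-thousand-observation example; the helper name `empri_for` is hypothetical and not part of Amelia.

```r
# Express the manual's percentages as empri values for a given number of rows.
# The helper name `empri_for` is hypothetical, not part of Amelia.
empri_for <- function(n_rows, percent) {
  percent / 100 * n_rows
}

n <- 2000  # the two-thousand-observation example from sec. 5.6.1

start_vals  <- empri_for(n, c(0.5, 1))  # sec. 5.6.1 starting values
moderate    <- empri_for(n, 5)          # sec. 5.6.1 "moderate" prior
upper_bound <- empri_for(n, 10)         # sec. 7.2 "reasonable upper bound"

print(list(start = start_vals, moderate = moderate, upper = upper_bound))
```

On the same data set, the sec. 7.2 upper bound is twice the value that sec. 5.6.1 calls moderate, which is the inconsistency at issue.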
2. Typo
The User Manual, sec. 7.2 (p. 51) and the file "Amelia" in the R package
have the same typo: "shinks" instead of "shrinks"...
RECOMMENDATIONS:
1. Recommend that the "Amelia" file in the R package and both sections of
the User Manual reflect the best guidance, and be consistent.
2. Fix the little typo identified in 2. above
Wayne A. Thornton
thornton(a)fas.harvard.edu
Please disregard all of the issues/questions I raised in my email below,
EXCEPT for one:
Q: Does whether or not the input file has a header row (variable names)
affect how Amelia works?
(Matt Blackwell's response to my first issue (subj: Amelia for R produces no
imputed data output files [WAT Issue #1]) resolved the other issues in my
earlier email below.)
I changed the subject line of this message accordingly...
DISCUSSION: It seems that Amelia (and AmeliaView) assume that the input
data set has a header row.
However I cannot find any discussion in the documentation to confirm this.
I have observed the following:
-- When I write the data.frame to a csv file to be read by AmeliaView... if
the csv file has no header row, then in AmeliaView -> Summarize Data ->
"Missing: x / [total]"... The "total" listed is one less than the rows
actually in the data set.
-- When I pass the data.frame to Amelia for R directly, it doesn't seem to
have this problem.
To prevent any problems of this nature, should Amelia and AmeliaView have an
input parameter telling it whether or not the input data set has a header
row?
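The off-by-one count can be reproduced in base R alone, as in the sketch below. It uses a hypothetical three-row data set and R's own read.csv(); whether AmeliaView reads files the same way is an assumption on my part.

```r
# Hypothetical three-row data set written to a CSV with no header row.
d <- data.frame(a = c(1, 2, 3), b = c(4, NA, 6))
f <- tempfile(fileext = ".csv")
write.table(d, f, sep = ",", row.names = FALSE, col.names = FALSE)

# Default header = TRUE: the first data row is consumed as variable names.
with_header <- read.csv(f)
# header = FALSE: every line is treated as data.
no_header <- read.csv(f, header = FALSE)

n_lines <- length(readLines(f))
print(c(lines = n_lines,
        header_true = nrow(with_header),
        header_false = nrow(no_header)))
```

A data.frame passed directly to amelia() never goes through this parsing step, which would be consistent with the problem appearing only on the csv route.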
Wayne Thornton
thornton(a)fas.harvard.edu
_____
From: owner-amelia_at_lists_gking_harvard_edu(a)mail.hmdc.harvard.edu
[mailto:owner-amelia_at_lists_gking_harvard_edu@mail.hmdc.harvard.edu] On
Behalf Of Wayne Thornton
Sent: Sunday, June 28, 2009 16:24
To: amelia(a)lists.gking.harvard.edu
Subject: [amelia] Amelia output extracted from output[[ ]] looks odd [WAT
Issue #2]
RE: Amelia output extracted from output[[ ]] looks odd [WAT Issue #2]
PROBLEM: After running Amelia to generate 5 imputed files, the output files
extracted using output[[ ]] look odd....
BACKGROUND: Here is my command line to run Amelia:
# *******************
# CONTROL PANEL
# *******************
impruns   <- 5
tolX      <- 0.0001
empriX    <- 100
autopriX  <- 0.05
resampleX <- 100
# *******************
# CONTROL PANEL
# *******************
imputed <- amelia(DATA8i,
                  m = impruns, p2s = 2,
                  idvars = c(3, 4, 5),
                  ts = 1, cs = 2, polytime = NULL,
                  startvals = 0,
                  tolerance = tolX,
                  noms = nomIV8i,
                  ords = ordIV8i, incheck = T, collect = F,
                  outname = "DATA8imp",
                  write.out = T, archive = T,
                  keep.data = T,
                  empri = empriX,
                  autopri = autopriX,
                  bounds = IVlims, max.resample = resampleX)
After a run I am able to extract output info from...
imputed[[ ]]
The user guide (p.27, under "Output") says:
"...you can refer to any of the datasets by referencing output[[i]], where i
is the number of the dataset you wish to reference.
These datasets will be returned in the same format which you passed
them...."
However, the files imputed[[1]], imputed[[2]], etc.......are quite different
from the original input file, and different from each other.
-- The input file is a data frame (1044 x 487) with no header.
-- Output files:
   imputed[[1]]   1044 x 2435   numeric; looks like imputed values
                                NOTE: 2435 = 5 * 487...
   imputed[[2]]   1 x 1         "5"
   imputed[[3]]                 TRUE / FALSE
   imputed[[4]]   483 x 2415    numeric; does NOT look like imputed values
                                NOTE: 483 = number of IVs minus 4;
                                data set includes 3 identity variables,
                                1 time series var, 1 cross-section var
   imputed[[5]]   483 x 5       numeric; does NOT look like imputed values
These output files raise the following comments/questions:
(1) Contrary to the info in the user guide, the output files extracted from
output[[i]] do not match the format of the input file.
(2) Does whether or not the input file has a header row (variable names)
affect how Amelia works?
(This question may be an artifact of my lack of understanding about working
with data frames... But if you read in the output csv file and compute
nrow(file), the result is one less than the number of rows actually in the
csv file.)
(3) Is the first output file [[1]] the 5 sets of imputed data?
(4) I have no idea what the other files are... Are they for diagnostics?
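For questions (3) and (4), one generic way to see what each element of a returned list holds is sketched below. The list `out` is a hypothetical, scaled-down stand-in built by hand; its shapes are guesses that only mimic the pattern reported above and say nothing authoritative about the real structure of an amelia() result.

```r
# Hypothetical stand-in for the returned list; the shapes are scaled-down
# guesses, not the real structure of an amelia() result.
m <- 5; n <- 10; p <- 4
out <- list(
  matrix(rnorm(n * p * m), nrow = n),  # wide numeric block
  m,                                   # a single number
  TRUE,                                # a logical flag
  matrix(rnorm(p * p * m), nrow = p)   # another numeric block
)

# Describe each element by class and dimensions (or length if no dims).
describe <- function(e) {
  d <- dim(e)
  size <- if (is.null(d)) length(e) else paste(d, collapse = " x ")
  paste(class(e)[1], size)
}
info <- vapply(out, describe, character(1))
print(info)

# If the wide element holds m data sets side by side, its column count
# should be an exact multiple of m.
is_multiple <- ncol(out[[1]]) %% m == 0
print(is_multiple)
```

Running the same `describe()` over the real `imputed` object, along with names(imputed), would show whether the elements are labelled and which ones have the dimensions of the input data.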
Thanks,
Wayne
SUBMITTED BY: Wayne A. Thornton
Harvard Univ.
thornton(a)fas.harvard.edu
781-492-3131
PROBLEM: The Amelia functions summary.Amelia() and compare.density()
apparently report the fraction of missing values (expressed as a decimal).
However, these values are identified as "Percent
Missing" in...
-- output of summary.Amelia()
-- legends on plots generated by compare.density()
EXAMPLE from my data set:
                               summary.Amelia()   my computation
Tension_avg_vics                   0.01149            1.149
Tension_avg_vics_no_zeros          0.01245            1.245
Tension_bads_count                 0.01245            1.245
Tension_bads_div_tokens            0.01245            1.245
Tension_distrusts_count            0.02011            2.011
....
.... ....
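The two columns differ by a factor of 100, which is the difference between a fraction and a percent. A minimal base-R sketch of both computations follows, using a hypothetical data frame with made-up column names rather than my actual data.

```r
# Hypothetical data with known missingness; column names are made up.
d <- data.frame(x = c(1, NA, 3, 4), y = c(NA, NA, 3, 4), z = 1:4)

frac_missing <- colMeans(is.na(d))  # fraction of rows missing, in [0, 1]
pct_missing  <- 100 * frac_missing  # the same quantity as a percent

print(cbind(fraction = frac_missing, percent = pct_missing))
```

A label of "Percent Missing" would match the second column; the values reported by summary.Amelia() look like the first.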
Wayne A. Thornton
thornton(a)fas.harvard.edu