Amelia July 2009

amelia@lists.gking.harvard.edu

6 participants
7 discussions

by Milan Svolik

Dear Amelia users/authors:

I was able to impute CSTS data with logical bounds before the latest update to Amelia. I was wondering whether anyone else has experienced similar problems with the latest update (maybe I am just doing something wrong...).

Thanks for any help,
best,
Milan

---
Milan Svolik
Assistant Professor
Department of Political Science
University of Illinois at Urbana-Champaign
https://netfiles.uiuc.edu/msvolik/www/

-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
More info about Amelia: http://gking.harvard.edu/amelia

14 years, 9 months

Fwd: [amelia] MI: Multiple surveys

by Victor Mauricio Herrera

The actual reference to the paper I mentioned in my previous email is: Andrew Gelman; Gary King; Chuanhai Liu. Not Asked and Not Answered: Multiple Imputation for Multiple Surveys. Journal of the American Statistical Association, Vol. 93, No. 443. (Sep., 1998), pp. 846-857. Thanks, Victor Herrera MD, MSc.

14 years, 9 months

MI: Multiple surveys

by Victor Mauricio Herrera

Hello Amelia users: I am working with a pool of surveys and I want to impute missing values in the pooled dataset while keeping the design variables and re-calculated weights (and the variables from which those weights were derived). From the paper by King & Liu (1998) on multiple imputation for multiple surveys now I know that a hierarchical approach to this problem is the appropriate one; however, after reading the documentation of the software (Amelia II) I am not sure whether this task can be accomplished. I will appreciate your help on this issue. Thanks, Victor Herrera MD. MSc.

14 years, 9 months

Broken Functionality in Amelia package for R (Version 1.2-9, built: 2009-07-02)

by Will Bullock

Hello,

I believe the latest update to the Amelia package for R has inadvertently broken some essential functionality. Using a sample dataset available here <http://www.princeton.edu/%7Ewbullock/sampleData.RData>, I'm using the following code:

    library(Amelia)
    load("sampleData.RData")
    test <- amelia(sampleData, idvars=c("st"), noms=c("female","imports"), ords=c("age", "edu", "inc"))

On one computer using Version 1.2-2, built: 2009-04-27 (available via the CRAN archives), everything goes well. On a second computer using Version 1.2-9, built: 2009-07-02, I get:

    Error in sum(sapply(x[, fact], is.factor)) : invalid 'type' (list) of argument

It appears as though amelia is having trouble taking in a vector of column names, but I must admit I haven't done extensive testing of the problem. Any help would be much appreciated.

Sincerely,
--Will Bullock
Department of Politics
Princeton University
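One way to check whether name handling is involved is to pass column positions instead of names, since amelia documents idvars, noms and ords as accepting either. The sketch below is only a diagnostic idea, not a confirmed fix for Version 1.2-9. The data frame is a hypothetical stand-in for sampleData, and name_to_index is a made-up helper, not part of Amelia.

```r
# Hypothetical stand-in for sampleData; the real file is not reproduced here.
sampleData <- data.frame(
  st      = c("NJ", "NY", "PA", "NJ"),
  female  = c(0, 1, NA, 1),
  imports = c(1, NA, 0, 1),
  age     = c(2, 3, 1, NA),
  edu     = c(1, NA, 3, 2),
  inc     = c(NA, 2, 4, 1)
)

# Translate column names to positions. match() returns NA for a name that
# is not present, so a misspelled name is caught before the amelia() call.
name_to_index <- function(data, vars) {
  idx <- match(vars, names(data))
  if (anyNA(idx)) {
    stop("columns not found: ", paste(vars[is.na(idx)], collapse = ", "))
  }
  idx
}

id_idx  <- name_to_index(sampleData, "st")
nom_idx <- name_to_index(sampleData, c("female", "imports"))
ord_idx <- name_to_index(sampleData, c("age", "edu", "inc"))

# The call from the message with positions in place of names
# (commented out so the sketch runs without Amelia installed):
# test <- amelia(sampleData, idvars = id_idx, noms = nom_idx, ords = ord_idx)
```

If the call succeeds with positions but fails with names on the same version, that would support the guess that name handling is at fault.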

14 years, 9 months

Amelia: Guidance on setting parameter "empri" is somewhat confusing [WAT issue #5]

by Wayne Thornton

PROBLEM:

(1) The guidance on setting the parameter "empri" in the User Manual is not consistent, and thus might be confusing.
(2) There is a small typo in the text addressing the parameter "empri" in the User Manual and the "Amelia" file in the R package.

BACKGROUND:

1. Inconsistent Guidance

(a) User Manual, sec. 7.2 (p. 51) and the file "Amelia" in the R package (at \ library \ Amelia \ help \ Amelia, which I read as a Windows *.txt file...) state:

"empri: number indicating level of the empirical (or ridge) prior. This prior shinks the covariances of the data, but keeps the means and variances the same for problems of high missingness, small N's or large correlations among the variables. Should be kept small; a reasonable upper bound is around 10% of the rows of the data."

(b) User Manual, sec. 5.6.1 (p. 21) reads:

"A recommendation of 0.5 to 1 percent of the number of observations, n, is a reasonable starting value, and often useful in large datasets to add some numerical stability. For example, in a dataset of two thousand observations, this would translate to a prior value of 10 or 20 respectively. A prior of up to 5 percent is moderate in most applications. For our data, it is easy to code up a 1 percent ridge prior:

    > a.out.time2 <- amelia(freetrade, ts = "year", cs = "country",
    +     polytime = 2, intercs = TRUE, p2s = 0, empri = 0.01 *
    +     nrow(freetrade))
...."

Since the example in sec. 5.6.1 uses a value equal to 1% of the number of rows of data, I have favored this interpretation... (My experimenting indicates that using a value up to 5% of the number of rows of data works better than trying to use a value of 0.1 to 1% or up to 5% of the number of observations.)

2. Typo

The User Manual, sec. 7.2 (p. 51) and the file "Amelia" in the R package have the same typo: "shinks" instead of "shrinks"...

RECOMMENDATIONS:

1. Recommend that the "Amelia" file in the R package and both sections of the User Manual reflect the best guidance, and be consistent.
2. Fix the little typo identified in 2. above.

Wayne A. Thornton
thornton(a)fas.harvard.edu
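To make the competing recommendations concrete, here is a small sketch that turns each quoted percentage into the absolute value that would be passed as empri, using the two-thousand-observation example from sec. 5.6.1. The helper empri_from_percent is hypothetical (it is not part of Amelia); it does the same arithmetic as the manual's empri = 0.01 * nrow(freetrade).

```r
# Convert a ridge prior stated as a percentage of the rows into the
# absolute number that would be passed as `empri`.
# `empri_from_percent` is a hypothetical helper, not part of Amelia.
empri_from_percent <- function(n_rows, percent) {
  stopifnot(n_rows > 0, percent >= 0)
  n_rows * percent / 100
}

n <- 2000  # the two-thousand-observation example quoted from sec. 5.6.1

# 0.5% and 1% are the suggested starting values, 5% is called moderate
# (sec. 5.6.1), and 10% is the upper bound given in sec. 7.2.
priors <- sapply(c(0.5, 1, 5, 10), function(p) empri_from_percent(n, p))
names(priors) <- c("0.5%", "1%", "5%", "10%")
print(priors)
```

For n = 2000 the first two values reproduce the manual's "10 or 20". The other two show how far the "moderate" and "upper bound" figures sit from the suggested starting range.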

14 years, 9 months

FW: [amelia] Does whether or not the input file has a header row (variable names) affect how Amelia works? [WAT Issue #2]

by Wayne Thornton

Please disregard all of the issues/questions I raised in my email below, EXCEPT for one:

Q: Does whether or not the input file has a header row (variable names) affect how Amelia works?

(Matt Blackwell's response to my first issue (subj: Amelia for R produces no imputed data output files [WAT Issue #1]) resolved the other issues in my earlier email below.) I changed the subject line of this message accordingly...

DISCUSSION:

It seems that Amelia (and AmeliaView) assume that the input data set has a header row. However, I cannot find any discussion in the documentation to confirm this. I have observed the following:

-- When I write the data.frame to a csv file to be read by AmeliaView... if the csv file has no header row, then in AmeliaView -> Summarize Data -> "Missing: x / [total]"... the "total" listed is one less than the rows actually in the data set.
-- When I pass the data.frame to Amelia for R directly, it doesn't seem to have this problem.

To prevent any problems of this nature, should Amelia and AmeliaView have an input parameter telling it whether or not the input data set has a header row?

Wayne Thornton
thornton(a)fas.harvard.edu

_____

From: owner-amelia_at_lists_gking_harvard_edu(a)mail.hmdc.harvard.edu [mailto:owner-amelia_at_lists_gking_harvard_edu@mail.hmdc.harvard.edu] On Behalf Of Wayne Thornton
Sent: Sunday, June 28, 2009 16:24
To: amelia(a)lists.gking.harvard.edu
Subject: [amelia] Amelia output extracted from output[[ ]] looks odd [WAT Issue #2]

PROBLEM:

After running Amelia to generate 5 imputed files, the output files extracted using output[[ ]] look odd....

BACKGROUND:

Here is my command line to run Amelia:

    # ******************* CONTROL PANEL *******************
    impruns   <- 5
    tolX      <- 0.0001
    empriX    <- 100
    autopriX  <- 0.05
    resampleX <- 100
    # ******************* CONTROL PANEL *******************

    imputed <- amelia(DATA8i, m = impruns, p2s = 2, idvars = c(3,4,5),
                      ts = 1, cs = 2, polytime = NULL, startvals = 0,
                      tolerance = tolX, noms = nomIV8i, ords = ordIV8i,
                      incheck = T, collect = F, outname = "DATA8imp",
                      write.out = T, archive = T, keep.data = T,
                      empri = empriX, autopri = autopriX, bounds = IVlims,
                      max.resample = resampleX)

After a run I am able to extract output info from... imputed[[ ]]

The user guide (p. 27, under "Output") says: "...you can refer to any of the datasets by referencing output[[i]], where i is the number of the dataset you wish to reference. These datasets will be returned in the same format which you passed them...."

However, the files imputed[[1]], imputed[[2]], etc. are quite different from the original input file, and different from each other.

-- The input file is a data frame (1044 x 487) with no header.
-- Output files:

    imputed[[1]]   1044 x 2435 numeric; looks like imputed values
                   NOTE: 2435 = 5 * 487...
    imputed[[2]]   1 x 1 "5"
    imputed[[3]]   TRUE / FALSE
    imputed[[4]]   483 x 2415 numeric; does NOT look like imputed values
                   NOTE: 483 = number of IVs minus 4; data set includes
                   3 identity variables, 1 time series var, 1 cross-section var
    imputed[[5]]   483 x 5 numeric; does NOT look like imputed values

These output files raise the following comments/questions:

(1) Contrary to the info in the user guide, the output files extracted from output[[i]] do not match the format of the input file.
(2) Does whether or not the input file has a header row (variable names) affect how Amelia works? (This question may be an artifact of my lack of understanding about working with data frames... But if you read in the output csv file and compute nrow(file), the result is one less than the number of rows actually in the csv file.)
(3) Is the first output file [[1]] the 5 sets of imputed data?
(4) I have no idea what the other files are... Are they for diagnostics?

Thanks,
Wayne

SUBMITTED BY:
Wayne A. Thornton
Harvard Univ.
thornton(a)fas.harvard.edu
781-492-3131
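The off-by-one row count described above is what base R produces when a CSV's first line is treated as column names. The sketch below uses a made-up three-row data frame and a temporary file. It illustrates read.csv() and write.csv() only and makes no claim about how AmeliaView reads files internally.

```r
# Write a small data frame to a temporary CSV. write.csv() emits a header
# line containing the column names by default.
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

lines_in_file <- length(readLines(path))         # header line + 3 data rows
with_header   <- read.csv(path, header = TRUE)   # first line becomes names
no_header     <- read.csv(path, header = FALSE)  # first line read as data

cat("lines in file:        ", lines_in_file, "\n")
cat("nrow, header = TRUE:  ", nrow(with_header), "\n")
cat("nrow, header = FALSE: ", nrow(no_header), "\n")

unlink(path)  # remove the temporary file
```

With header = TRUE the row count is one less than the number of lines in the file. If a file without a header row is read this way, its first data row is consumed as variable names, which would match a "total" that is one short.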

14 years, 9 months

Amelia: summary.Amelia() and compare.density() report fraction (decimal) missing, but labeled as "Percent Missing" (WAT issue #4)

by Wayne Thornton

PROBLEM:

The Amelia functions summary.Amelia() and compare.density() apparently report the fraction of missing values (expressed as a decimal). However, these values are identified as "Percent Missing" in...
-- output of summary.Amelia()
-- legends on plots generated by compare.density()

EXAMPLE from my data set:

                                 summary.Amelia()   my computation
    Tension_avg_vics             0.01149            1.149
    Tension_avg_vics_no_zeros    0.01245            1.245
    Tension_bads_count           0.01245            1.245
    Tension_bads_div_tokens      0.01245            1.245
    Tension_distrusts_count      0.02011            2.011
    ....                         ....               ....

Wayne A. Thornton
thornton(a)fas.harvard.edu
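The two columns in the example differ by a factor of 100 (for instance 0.01149 * 100 = 1.149). A minimal base R sketch of both quantities, using a made-up data frame rather than the data set from the message:

```r
# Made-up data with known missingness: a has 1 of 4 missing, b has 2 of 4,
# c has none.
df <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(NA, NA, 3, 4),
  c = c(1, 2, 3, 4)
)

# Fraction missing per column: the mean of a logical vector is the
# proportion of TRUE values.
frac_missing <- colMeans(is.na(df))

# Percent missing is the same quantity scaled by 100.
pct_missing <- 100 * frac_missing

print(data.frame(fraction = frac_missing, percent = pct_missing))
```

Only the second column should carry a "Percent Missing" label. The first is a proportion between 0 and 1.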

14 years, 9 months
