Hi Wayne,
Issues with headers are more about file input to R than Amelia per se,
but there is something tricky here that I should explain. When
importing data files into R, functions like read.table(), read.csv(),
etc., need to know whether or not there is a header (a line at the top
of the file indicating column names). If there is a header, this
information is put into the column names of the data.frame resulting
from the call.
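A minimal base-R sketch of the header behavior described above (the three data lines are made up for illustration). The same headerless lines are read twice, once with header = TRUE and once with header = FALSE. With header = TRUE, the first data line is consumed as column names, so one row is lost.

```r
# Three headerless CSV lines (made-up values), passed to read.csv() via `text`.
csv_lines <- c("1,10,2.5",
               "2,20,3.5",
               "3,30,4.5")

# header = TRUE: the first line is consumed as column names
# (make.names() turns "1", "10", "2.5" into "X1", "X10", "X2.5").
with_header <- read.csv(text = csv_lines, header = TRUE)

# header = FALSE: every line is data; R supplies default names V1, V2, V3.
no_header <- read.csv(text = csv_lines, header = FALSE)

nrow(with_header)   # 2
nrow(no_header)     # 3
names(with_header)  # "X1" "X10" "X2.5"
names(no_header)    # "V1" "V2" "V3"
```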
The only time that this is an issue for Amelia is when using
AmeliaView. When loading a CSV file in AmeliaView, the program assumes
that your file has a header. Thus, it will use the first line of the
data as the column names, whether or not these are actually column
names. This was not stated in the manual before, but I have added it.
As a result, the number of rows can differ between loading the file
manually and loading it through AmeliaView, depending on how you loaded
it manually (i.e., whether you passed header = TRUE or header = FALSE).
Cheers,
matt.
On Mon, Jun 29, 2009 at 6:14 PM, Wayne Thornton <thornton(a)fas.harvard.edu> wrote:
Please disregard all of the issues/questions I raised in my email below,
EXCEPT for one:
Q: Does whether or not the input file has a header row (variable names)
affect how Amelia works?
(Matt Blackwell's response to my first issue (subj: Amelia for R produces no
imputed data output files [WAT Issue #1]) resolved the other issues in my
earlier email below.)
I changed the subject line of this message accordingly...
DISCUSSION: It seems that Amelia (and AmeliaView) assume that the input
data set has a header row.
However, I cannot find any discussion in the documentation to confirm this.
I have observed the following:
-- When I write the data.frame to a csv file to be read by AmeliaView, and
the csv file has no header row, AmeliaView -> Summarize Data shows
"Missing: x / [total]", where the "total" listed is one less than the
number of rows actually in the data set.
-- When I pass the data.frame to Amelia for R directly, it doesn't seem to
have this problem.
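A sketch reproducing this off-by-one row count in plain R. AmeliaView itself is not involved: read.csv(header = TRUE) stands in for a loader that assumes a header, which is an assumption based on Matt's description of AmeliaView. The data frame and temporary file are hypothetical.

```r
# Hypothetical data frame with one missing value.
df <- data.frame(id = 1:4, x = c(2.5, NA, 3.1, 4.0))

# Write it WITHOUT a header row. write.csv() always writes column names,
# so write.table() with sep = "," and col.names = FALSE is used instead.
path <- tempfile(fileext = ".csv")
write.table(df, path, sep = ",", row.names = FALSE, col.names = FALSE)

# Read back as a header-assuming loader would (stand-in for AmeliaView).
assumed_header <- read.csv(path, header = TRUE)

# Read back correctly, telling R there is no header.
correct <- read.csv(path, header = FALSE)

nrow(df)              # 4
nrow(assumed_header)  # 3: the first data row became the column names
nrow(correct)         # 4

unlink(path)  # clean up the temporary file
```

Passing the data.frame straight to amelia() skips the file round trip entirely, which is consistent with the direct call not showing the problem.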
To prevent any problems of this nature, should Amelia and AmeliaView have an
input parameter telling them whether or not the input data set has a header
row?
Wayne Thornton
thornton(a)fas.harvard.edu
_____
From: owner-amelia_at_lists_gking_harvard_edu(a)mail.hmdc.harvard.edu
[mailto:owner-amelia_at_lists_gking_harvard_edu@mail.hmdc.harvard.edu] On
Behalf Of Wayne Thornton
Sent: Sunday, June 28, 2009 16:24
To: amelia(a)lists.gking.harvard.edu
Subject: [amelia] Amelia output extracted from output[[ ]] looks odd [WAT Issue #2]
RE: Amelia output extracted from output[[ ]] looks odd [WAT Issue #2]
PROBLEM: After running Amelia to generate 5 imputed files, the output files
extracted using output[[ ]] look odd....
BACKGROUND: Here is my command line to run Amelia:
# *******************
# CONTROL PANEL
# *******************
impruns   <- 5
tolX      <- 0.0001
empriX    <- 100
autopriX  <- 0.05
resampleX <- 100
# *******************
# CONTROL PANEL
# *******************
imputed <- amelia(DATA8i,
                  m = impruns, p2s = 2,
                  idvars = c(3, 4, 5),
                  ts = 1, cs = 2, polytime = NULL,
                  startvals = 0,
                  tolerance = tolX,
                  noms = nomIV8i,
                  ords = ordIV8i, incheck = TRUE, collect = FALSE,
                  outname = "DATA8imp",
                  write.out = TRUE, archive = TRUE,
                  keep.data = TRUE,
                  empri = empriX,
                  autopri = autopriX,
                  bounds = IVlims, max.resample = resampleX)
After a run, I am able to extract output from imputed[[ ]].
The user guide (p. 27, under "Output") says:
"...you can refer to any of the datasets by referencing output[[i]], where i
is the number of the dataset you wish to reference.
These datasets will be returned in the same format which you passed
them...."
However, the files imputed[[1]], imputed[[2]], etc., are quite different
from the original input file, and different from each other.
-- The input file is a data frame (1044 x 487) with no header.
-- Output files:
     imputed[[1]]   1044 x 2435, numeric; looks like imputed values
                    NOTE: 2435 = 5 * 487
     imputed[[2]]   1 x 1, "5"
     imputed[[3]]   TRUE / FALSE
     imputed[[4]]   483 x 2415, numeric; does NOT look like imputed values
                    NOTE: 483 = number of IVs minus 4; the data set includes
                    3 identity variables, 1 time-series variable, and
                    1 cross-section variable
     imputed[[5]]   483 x 5, numeric; does NOT look like imputed values
These output files raise the following comments/questions:
(1) Contrary to the info in the user guide, the output files extracted from
output[[i]] do not match the format of the input file.
(2) Does whether or not the input file has a header row (variable names)
affect how Amelia works?
(This question may be an artifact of my lack of understanding about working
with data frames, but if you read in the output csv file and compute
nrow(file), the result is one less than the number of rows actually in the
csv file.)
(3) Is the first output file [[1]] the 5 sets of imputed data?
(4) I have no idea what the other files are... Are they for diagnostics?
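One general way to see what each element of a returned list contains, without relying on the internals of any particular Amelia version, is to print the class and dimensions of every element (base R's str(imputed, max.level = 1) gives a similar overview). The helper describe_elements() below is hypothetical, and the demo list is a made-up stand-in, not real Amelia output.

```r
# Hypothetical helper: print one line per list element with its name,
# class, and dimensions (or length, for objects without dimensions).
describe_elements <- function(x) {
  lines <- character(length(x))
  for (i in seq_along(x)) {
    el <- x[[i]]
    nm <- names(x)[i]
    if (is.null(nm) || is.na(nm) || !nzchar(nm)) nm <- "(unnamed)"
    shape <- if (is.null(dim(el))) {
      paste("length", length(el))
    } else {
      paste(dim(el), collapse = " x ")
    }
    lines[i] <- sprintf("[[%d]] %s: %s, %s", i, nm, class(el)[1], shape)
  }
  cat(lines, sep = "\n")
  invisible(lines)
}

# Made-up stand-in list mimicking some of the shapes reported above.
demo <- list(matrix(0, nrow = 4, ncol = 10), "5", c(TRUE, FALSE))
out <- describe_elements(demo)
```

In practice, describe_elements(imputed) would summarize each component of the object returned by the amelia() call above; any element names it prints come from the object itself.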
Thanks,
Wayne
SUBMITTED BY: Wayne A. Thornton
Harvard Univ.
thornton(a)fas.harvard.edu
781-492-3131
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive:
http://lists.gking.harvard.edu/?info=amelia
More info about Amelia:
http://gking.harvard.edu/amelia