Hello
I guess that is a common problem when imputing data, but I am rather
confused by it. The availability (= not-NA-ness) of my survey data looks
like this (-> screenshot). It shows the number of non-NA entries for each
variable in the data frame.
Areas A and C have rather complete answer sets. B is white because these
questions have only been asked conditional on answers given beforehand
(in A). But a part of the white area (in D) -could- be imputed.
What does the imputation process look like? What should I do with B? I
can think of two variants, both of which are more or less unclear to me:
A) Should I cut the dataframe and impute in two steps? But how do I then
reintegrate the results from the first imputation?
i) remove B+D, impute A+C
ii) reinsert B/D and impute D.
B) How do I specify the process for one MI step without cutting?
Thank you very much for your thoughts on this.
Best regards,
Marcus
--
Marcus M. Dapp | PhD student | ETH Zurich | www.ib.ethz.ch/people/mdapp
Prof. Thomas Bernauer, International Relations | www.ib.ethz.ch
On the shoulders of giants? http://science.creativecommons.org
Hello
I am in the process of coming up with an amelia() command for my survey
data, starting with a small subset to get a start and learn in the process.
My question is how to decide whether to specify variables as noms or ords.
When I look at my data (mostly ordered factors), I think I should use
the ords specification (see at bottom) and not "noms", but unfortunately
"ords" gives an error message whereas "noms" only gives warnings.
Thus, my two questions are:
1) Which specification approach A or B is the correct (or better) one?
Or, more generally, should all factor variables be "noms=" and all
ordered factors be "ords="?
2) What do the warnings or the error message mean? What to do about them?
Thanks for having a look at it!
Marcus
> t<-md[20:1000,1:12] # subset of full data frame
> # First, to give you a feel for the data (output truncated)
> t[1:8,1:10] # to get a feel for the data frame
      id camp startyr exp.pjs insp.need insp.foss insp.pcss insp.feat
20 30019 free    1990       5         0         0         0         0
21 30020 <NA>      NA       1         0         0         0         0
22 30021 free    2005       1         0         0         0         0
23 30022 both    2004       1         0         0         0         0
24 30023 both    2005       1         0         0         0         0
25 30024 both    2000    <NA>         0         0         0         1
26 30025 open    2004       5         0         0         0         1
27 30026 open    1985      30         0         1         0         0
> str(t, give.attr=FALSE)
'data.frame': 981 obs. of 12 variables:
$ id : int 30019 30020 30021 30022 30023 30024 30025 30026 40001
$ camp : Factor w/ 3 levels "both","free",..: 2 NA 2 1 1 1 3 3 1 3
$ startyr : int 1990 NA 2005 2004 2005 2000 2004 1985 2002 1999 ...
$ exp.pjs : Ord.factor w/ 6 levels "1"<"5"<"10"<"20"<..: 2 1 1 1 1 NA
$ insp.need: Ord.factor w/ 2 levels "0"<"1": 1 1 1 1 1 1 1 1 2 2 ...
$ insp.foss: Ord.factor w/ 2 levels "0"<"1": 1 1 1 1 1 1 1 2 2 1 ...
$ insp.pcss: Ord.factor w/ 2 levels "0"<"1": 1 1 1 1 1 1 1 1 1 1 ...
$ insp.feat: Ord.factor w/ 2 levels "0"<"1": 1 1 1 1 1 2 2 1 2 2 ...
$ insp.bugr: Ord.factor w/ 2 levels "0"<"1": 1 1 1 1 1 1 1 1 2 2 ...
$ insp.insc: Ord.factor w/ 2 levels "0"<"1": 1 1 1 1 1 1 1 2 2 1 ...
$ insp.outc: Ord.factor w/ 2 levels "0"<"1": 1 1 1 1 1 1 1 1 2 1 ...
$ insp.offr: Ord.factor w/ 2 levels "0"<"1": 1 2 1 1 1 2 1 1 1 1 ...
===== A) MI with variant "noms" =====
> # specifying indep. variables as "noms"... works...
> t_noms<-amelia(t, p2s=2, m=5, idvars=c("id"),
noms=c("camp","exp.pjs","insp.need","insp.foss","insp.pcss","insp.feat","insp.bugr","insp.insc"))
> # ... but gives warnings (usually 30-50, in multiples of ten it seems):
-- Imputation 5 --
setting up EM chain indicies
1(140) 2(0)
saving and cleaning
There were 50 or more warnings (use warnings() to see the first 50)
> warnings() # translated from German
50: the condition has length > 1 and only the first element will be used in:
if (class(x.orig[, i]) == "logical") x.imp[, i] <- as.logical(x.imp[,
# Plus: I occasionally get this warning (or error message?) in some
# Imputations:
The resulting variance matrix was not invertible. Please check your
data for highly collinear variables.
===== B) MI with variant "ords" =====
> # specifying vars as ords throws an error message:
> t_ords<-amelia(t, p2s=2, m=5, idvars=c("id"), noms=c("camp"),
ords=c("exp.pjs","insp.need","insp.foss","insp.pcss","insp.feat","insp.bugr","insp.insc","insp.outc","insp.offr"))
amelia starting
Error in if (any(unique(na.omit(data[, i]))%%1 != 0)) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
'%%' is not meaningful for ordered factors in:
Ops.ordered(unique(na.omit(data[, i])), 1)
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
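The error in variant B above comes from the `%%` check being applied to ordered factors, for which arithmetic is not defined. The "condition has length > 1" warnings in variant A are consistent with the same cause, since class() returns two elements ("ordered", "factor") for such columns. One possible workaround, offered as a sketch and not a confirmed fix, is to convert the ordered factors to their integer level codes before calling amelia(). The helper name `ordered_to_int` and the toy data frame are made up for illustration.

```r
# Hypothetical helper: replace every ordered-factor column with its integer
# level codes (1, 2, 3, ...), leaving all other columns untouched.
ordered_to_int <- function(df) {
  is_ord <- vapply(df, is.ordered, logical(1))
  df[is_ord] <- lapply(df[is_ord], as.integer)
  df
}

# Toy data that mimics the structure shown in the str(t) output (assumed values).
toy <- data.frame(
  id        = 1:4,
  exp.pjs   = factor(c("1", "5", NA, "10"),
                     levels = c("1", "5", "10"), ordered = TRUE),
  insp.need = factor(c("0", "1", "0", NA),
                     levels = c("0", "1"), ordered = TRUE)
)

# Missing values stay NA; observed values become whole-number codes.
toy_int <- ordered_to_int(toy)
```

The converted data frame could then be passed to amelia() with the same columns listed in `ords=`. Note that the imputed values would be level codes (so a "0"/"1" item becomes 1/2) and would need to be mapped back to the original labels afterwards.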
Hi -
I have a question similar to the one tdavis had in April of last year. When I
set range values using the priors option in AmeliaView, I still get values
outside the range I set. For instance, the variable "months breastfeeding
during one year" can only range from 0 to 12, so I set a minimum of 0 and a
maximum of 12 with a 0.99 confidence level. However, I still get negative
values and values above 12. Is there any way to get around this?
Thanks for your help!
Best,
Anshu
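A range prior only pulls the imputations toward the stated interval; it does not truncate them, which would explain the values below 0 and above 12. As far as I know, later versions of the Amelia package let amelia() take a `bounds` argument: a three-column matrix with one row per variable, giving the column number, the lower bound, and the upper bound. The sketch below builds such a matrix; the data frame, the column names, and the wrapper `impute_bounded` are made up, and whether `bounds` is available should be checked against the installed version.

```r
# Hypothetical example data: months of breastfeeding (0 to 12) with gaps.
bf <- data.frame(
  age       = c(24, 31, 28, 35, 22, 29),
  months_bf = c(3, NA, 12, 0, NA, 6)
)

# One row per bounded variable: column number, lower bound, upper bound.
bds <- matrix(c(which(names(bf) == "months_bf"), 0, 12), nrow = 1, ncol = 3)

# Wrapper that is only defined here, not run. It assumes the Amelia package
# is installed and that its amelia() accepts a `bounds` matrix.
impute_bounded <- function(data, bounds, m = 5) {
  Amelia::amelia(data, m = m, bounds = bounds)
}
```

A call such as impute_bounded(bf_full, bds) on the real data set would then request imputations restricted to the 0 to 12 range for that column.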
-
Hi,
I am trying to compute results from datasets multiply imputed with Amelia for a
survival analysis model with a log log distribution.
It seems that Clarify doesn't support streg. Is there another way to do this
in Stata?
Thanks,
--
Marguerite
-
Hello
The manual says about identification variables: "idvars : a vector of
column numbers or column names that indicates identification variables.
These will be dropped from the analysis but copied into the imputed
datasets."
The "dropping" is not 100% clear to me in an important aspect, namely,
whether idvars are -considered- in the MI calculation process at all. So
is a) or b) the correct answer?
a) The idvars are -used- in the MI process to calculate imputations, but
are themselves not modified (imputed). They are copied to the output
dataset 1:1.
b) The idvars are only copied and -not- (even) considered in the MI
process to calculate imputations.
Thank you,
Marcus
-
I'm new to the list and currently trying to install AmeliaView II on my
new Windows Vista machine. I'm currently running on R 2.6, though the
problem occurred when I had R 2.8 loaded as well. I'm a bit of a novice
at both programs, so I'd be more comfortable running the GUI than
running things through R. After I start R and start Amelia, I get the
following message:
Script: C:\Program Files\AmeliaView\lib\invisible.vbs
Line: 1
Char: 1
Error: The system cannot find the file specified
Code: 80070002
Source: (null)
When I looked in Explorer I was able to find this file, so it is there
but for some reason is not being recognized. Any suggestions?
Thanks,
Steve Shewfelt
PhD Candidate, Political Science
Yale University
-
Hi
I am relatively new to R and am now trying Amelia on health indicator data.
I have a CSV file created from Excel with the following fields (columns):
ID - autonumber identifier field
Prov - name of the province (geographic area) for use as cross-section -
this seems to work whether it is a string such as a province acronym or
coded as a number, but only if I leave the variable type as 'no
transformation'; it does not work if I set the type to 'nominal'
Month - this field is currently formatted as ISO dates, but Amelia (using
AmeliaView) sees it as a factor, and it does not work for time-series
cross-section analysis
Example data:
2006-12-01
2007-01-01
2007-02-01
2007-03-01
Value - the actual data values for the indicator as numbers - this is the
field that has some missing data values that I want imputed.
The only way I have been able to get this to work is to get Excel to convert
the Month variable to a number - you then get data that looks like this
39052.00
39083.00
39114.00
39142.00
but I'm not sure if this is the correct way to do it and whether Amelia will
understand the date/time significance.
If not, how should date variables be formatted so that they are interpreted
correctly?
Regards
Candy
--
Candy Day
HealthLink, Health Systems Trust
http://www.hst.org.za
Mobile: 084 960 9014
Fax: 086 524 9563
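On the date question: the Excel serial numbers count days, so consecutive months end up 31, 31, and 28 units apart and are not evenly spaced. One option, sketched here under assumptions and not taken from the Amelia documentation, is to build a consecutive month counter in R and use that as the time variable. The column names follow the message above; the province code "EC", the values, and the new column `month_index` are made up for illustration.

```r
# Toy panel following the column layout described above (assumed values).
d <- data.frame(
  Prov  = c("EC", "EC", "EC", "EC"),
  Month = c("2006-12-01", "2007-01-01", "2007-02-01", "2007-03-01"),
  Value = c(10.2, NA, 11.0, 11.4),
  stringsAsFactors = FALSE
)

# Parse the ISO strings into Date objects.
dt <- as.Date(d$Month, format = "%Y-%m-%d")

# Consecutive month counter: year * 12 + month, shifted so the first month is 1.
ym <- as.integer(format(dt, "%Y")) * 12L + as.integer(format(dt, "%m"))
d$month_index <- ym - min(ym) + 1L
```

The numeric column could then be named as the time variable (for example ts = "month_index" together with cs = "Prov" in amelia()), with the original Month column dropped or listed under idvars so it is not treated as a factor.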