we've done some runs like that. Glad it's worked for you.
It's just that after 40 vars, users run out of patience!
(One person told me they ran it with 650,000 observations)
Gary
On Thu, 19 Sep 2002, bob wrote:
Gary,
I don't know whether you keep track of these things, but I managed to impute
a data set containing 47 variables and 21,000 cases. It did take almost 12
hours though (running on my laptop overnight). Is this out of the ordinary?
The guidelines seemed to suggest that 40 variables was pretty much the
outer limit of the program.
Bob
Robert Mattes
Democracy in Africa Research Unit
University of Cape Town
Afrobarometer
----- Original Message -----
From: "Gary King" <king(a)harvard.edu>
To: <adamt(a)who.int>
Cc: <evansd(a)who.int>; "Amelia Listserv" <amelia(a)latte.harvard.edu>
Sent: Thursday, September 19, 2002 3:03 PM
Subject: [amelia] Re: A question about using Amelia in a prediction model
pls see below...
On Thu, 19 Sep 2002 adamt(a)who.int wrote:
>
> > Dear Dr King, I am currently writing a paper on the work that I have been
> > doing using Amelia, and for which your team and yourself had provided me
> > with great help in using the software.
> >
> > This work involves the development of a model to predict hospital unit
> > cost (the dependent variable) using a set of explanatory variables - for
> > which missing values have been imputed by Amelia. I have been trying to
> > find out from the economic literature on multiple imputation what kind of
> > tests were used to discuss the goodness of fit of the models that were
> > developed after imputation - i.e., other than the adjusted R squared or the
> > F statistics, for example, which cannot be computed from the average
> > equation. I was not able to find any reference to model validation from my
> > search and I am not sure how I can defend my model or provide a way to
> > validate it especially that I do not have a gold standard to compare my
> > results with. I would very much appreciate if you could let me know of any
> > references or have any suggestions about ways to convince an economic
> > readership that the average equation fits well.
any quantity (including R^2 if that makes sense to you) can be computed.
You follow the same rules for combining these quantities across the
multiply imputed data sets as for any other quantity. see
http://gking.harvard.edu/amelia/node3.html
I'd add that I don't find R^2 of much use (see
http://gking.harvard.edu/files/abs/mist-abs.shtml and
http://gking.harvard.edu/files/abs/truth-abs.shtml for example) and would
instead look at some plots, such as the residuals by yhat or by some of
the X's. For the latter, you can look at the scatterplot of all M
datasets together (the nonmissing points will be plotted on top of one
another of course; the others will spread out).
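
[Illustration] A minimal Python sketch of the combining step described above, assuming the standard multiple-imputation combining rules (often called Rubin's rules): the combined estimate is the mean of the M per-dataset estimates, and its variance is the average squared standard error plus the between-imputation variance inflated by (1 + 1/M). The function name `combine_mi` and its arguments are hypothetical and are not part of Amelia or of the Stata ado file mentioned in this thread.

```python
import math
from statistics import mean, variance


def combine_mi(estimates, std_errors):
    """Combine one quantity estimated on each of M imputed data sets.

    estimates  -- the M point estimates (one per imputed data set)
    std_errors -- the M standard errors of those estimates
    Returns (combined_estimate, combined_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two estimates with matching standard errors")

    q_bar = mean(estimates)                       # combined point estimate
    within = mean(se ** 2 for se in std_errors)   # average within-imputation variance
    between = variance(estimates)                 # sample variance across imputations
    total = within + (1 + 1 / m) * between        # total variance
    return q_bar, math.sqrt(total)


# Example: one regression coefficient estimated on M = 3 imputed data sets.
beta, se = combine_mi([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The same function can be applied to any scalar quantity that comes with a standard error, one quantity at a time.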
> >
> > Another related question: I used the ado file prepared by Kenneth Scheve
> > to estimate the combined beta coefficients and standard errors from the
> > five data sets generated by Amelia. The STATA output does not provide the
> > root mean square error which I need to estimate the fundamental
> > uncertainty around the predicted values. Does this mean that using the
> > multiple imputation technique can only allow for parameter uncertainty?
> > and if so, is there a way to justify this in the paper?
no, multiple imputation includes both fundamental and estimation
uncertainty. If you want this quantity, you can compute it for each data
set and combine them as with other quantities.
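
[Illustration] A hedged sketch of the per-dataset computation suggested above: compute the root mean square error from the residuals of the regression fitted to each imputed data set, then take the average as the combined point estimate. The helper names `rmse` and `combined_rmse` are hypothetical, the residuals are assumed to come from whatever regression was fitted to each data set, and averaging the RMSE itself (rather than, say, the residual variance) is an assumption of this sketch.

```python
import math


def rmse(residuals, n_params):
    """Root mean square error with a degrees-of-freedom correction."""
    dof = len(residuals) - n_params
    if dof <= 0:
        raise ValueError("need more observations than estimated parameters")
    return math.sqrt(sum(r * r for r in residuals) / dof)


def combined_rmse(residuals_by_dataset, n_params):
    """Average the RMSE computed separately on each imputed data set."""
    values = [rmse(res, n_params) for res in residuals_by_dataset]
    return sum(values) / len(values)


# Example with M = 2 imputed data sets and a one-parameter model.
example = combined_rmse([[1, -1, 1, -1, 0], [2, -2, 0, 0, 0]], n_params=1)
```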
> >
> > I would very much appreciate your input on these two questions.
best of luck with your research.
Gary
: Gary King, King(a)Harvard.Edu http://GKing.Harvard.Edu :
: Center for Basic Research Direct (617) 495-2027 :
: in the Social Sciences Assistant (617) 495-9271 :
: 34 Kirkland Street, Rm. 2 HU-MIT DC (617) 495-4734 :
: Harvard U, Cambridge, MA 02138 eFax (928) 832-7022 :
> >
> > Best regards, Taghreed
> >
> >
> > >-----Original Message-----
> > >From: Gary King [mailto:king@harvard.edu]
> > >Sent: Tuesday, 19 March 2002 19:21
> > >To: adamt
> > >Cc: evansd; James Honaker; Kenneth Scheve
> > >Subject: Re: problem with Amelia
> > >
> > >
> > >
> > >why don't you first see whether Amelia (through DataLoad) loaded the data
> > >in properly. Load it in and look at the descriptive statistics to make
> > >sure. If that doesn't work, try the new version of Amelia which has a new
> > >Dataload incorporated. If that doesn't do it, you can save the data in
> > >ascii, and load it into Amelia that way, which we know always works.
> > >It sounds like that is the issue, but let me know if not. I'm CCing my
> > >coauthors in case they have other ideas.
> > >
> > >Gary
> > >
> > > : Gary King, King(a)Harvard.Edu http://GKing.Harvard.Edu :
> > > : Center for Basic Research Direct (617) 495-2027 :
> > > : in the Social Sciences Assistant (617) 495-9271 :
> > > : 34 Kirkland Street, Rm. 2 HU-MIT DC (617) 495-4734 :
> > > : Harvard U, Cambridge, MA 02138 eFax (928) 832-7022 :
> > >
> > >On Tue, 19 Mar 2002 adamt(a)who.ch wrote:
> > >
> > > >
> > > > > Dear Dr King,
> > > > >
> > > > > My name is Taghreed Adam and I am working in WHO in Chris Murray's
> > > > > cluster. Chris asked me to use Amelia to replace the missing values in
> > > > > the dataset that I am working with. This is prior to running a
> > > > > regression to predict unit costs per bed day in hospitals. The dataset
> > > > > I am using for Amelia includes 21 variables: the log of the unit cost
> > > > > per bed day ($), a set of explanatory variables such as log of GDP per
> > > > > capita, occupancy rate (%), average length of stay (days) etc. There
> > > > > are also variables that describe the nature of the unit cost data,
> > > > > e.g. whether capital costs, drugs, and other incidental costs are
> > > > > included (all dummies). Then there are descriptors like the country
> > > > > code, the region code, whether it is a public or private hospital
> > > > > (dummy) etc. The total number of hospitals for which we have
> > > > > observations is 1097. The maximum percentage of missingness of
> > > > > observations per variable is 53%, i.e., the least number of
> > > > > observations I have for any variable is around 600. They are all
> > > > > numeric variables.
> > > > >
> > > > > What I first did is to make sure that all variables included in the
> > > > > model are normally distributed. Then the data is saved in excel version
> > > > > 2, with no headings.
> > > > > In Amelia, I set _AMempri option to 1 to control for the high degree of
> > > > > missingness of some of the variables, I identified fully observed
> > > > > variables using the _AMfully and the one nominal variable for which
> > > > > there is missing data using the _AMnoms. (The other nominal variables
> > > > > do not have missing data)
> > > > >
> > > > > What happens when I run Amelia is either that it crashes just after I
> > > > > specify the input file name or it gives me the following message:
> > > > > elements of m can not be zero. I tried to check whether I have any
> > > > > variable that is coded as a string variable that might explain this
> > > > > message but it is not the case. I have no observations or variables
> > > > > that are all zeros or missing. I do not think it is a memory problem as
> > > > > my computer has 550 MHz memory and I do not work with other software
> > > > > while it is running.
> > > > >
> > > > > I tried to delete some of the variables, e.g., some of the dummies or
> > > > > those that might be highly correlated with other variables and I tried
> > > > > to run it again with a total of 8 variables. I used stata this time as
> > > > > the type of input file. It started running about 7 hours ago and is
> > > > > still running but clearly something is wrong as the number of
> > > > > iterations is now 115000.
> > > > >
> > > > > I would be grateful if you could give me some advice on what could be
> > > > > the source of the problem and what else I can try to do. I would be
> > > > > happy to call you if it will make it easier to discuss. We have
> > > > > discussed this question extensively with Josh Salomon who has also run
> > > > > out of ideas about what we can try next.
> > >
> > > I am looking forward to hearing from you.
> > > Yours sincerely,
> > > Taghreed
> > >
> > >
> > > Dr Taghreed Adam
> > > Global Programme on Evidence for Health Policy (GPE) and;
> > > Child and Adolescent Health Department ( CAH)
> > >
> > > World Health Organization
> > >
> > > 20 Avenue Appia
> > >
> > > CH-1211 Geneva 27
> > >
> > > Switzerland
> > >
> > > Tel: +41 22 791 3487
> > > Fax: +41 22 791 4328
> > > office: 3164
> > > e-mail: adamt(a)who.int
> > >
> > >
> >
-
amelia mailing list served by Harvard-MIT Data Center
List Address: amelia(a)latte.harvard.edu
Subscribe/Unsubscribe:
http://lists.hmdc.harvard.edu/?info=amelia