Hello,
My question is not about how Amelia works, but about when I should use it in my
analysis.
I am analyzing survey data, and some of the variables I intend to analyze are
derived from others. An example: I want to create a
single variable (say, an index) of political engagement using information
from variables about different political activities -- let's suppose I want
to create an additive index of political activism by summing five variables on
political activities, where 1 = the respondent took part in the activity and 0
= s/he didn't. However, I have missing cases in the five variables I
would like to sum. Let's also suppose I have no prior justification for
attributing either 0 or 1 to the missing cases. As a result, when I add up these
five variables, I will also have some missing cases in my index.
To eliminate these missing cases, how should I proceed? Should
I impute data in my original variables and then create my index (adding up
the already "complete" cases)? Or should I create the index from the variables
with missing cases and impute data afterwards (that is, impute data in the
final index)?
I am using an additive index as an example, but I believe this question may
also apply to other techniques, such as factor analysis and so on. Should
missing data be imputed before or after this kind of data processing?
Thanks,
Fabricio Fialho
-- web: sites.google.com/site/fabriciofialho/
Hi,
I would like to ask a question about a dataset of survey data in which
each person has an id. Different objects were evaluated with the same
set of categorical variables, so a person occurs in the data multiple times
and several rows share the same id. Should I set this variable as ids =
"id" or cs = "id"? Or do you think it is OK to treat this id as a normal
variable, i.e., not to make any special settings? Thanks.
Best regards,
Tomas
I often work with survey datasets for which n<p(p+3)/2 but where the
number of variables with missing data is a lot less than n. My
intuition tells me that in these cases, there is no reason to impute
from a model with p(p+3)/2 parameters. It seems excessive and
requires that we fiddle with dropping variables. I know we could
ridge the covariance matrix, but I'd like to bracket that for now to
consider the question of whether we really need those p(p+3)/2
parameters in the first place.
As an example, suppose we have data with 4 variables, A,B,C, and D.
We have complete data for A and B. C and D exhibit arbitrary
missingness (maybe monotone, maybe not). In this case, a full joint
normal approach such as Amelia's algorithm would estimate parameters
for a 4-dimensional normal distribution---that is, 4 means and 10
variance-covariance terms, or 14 parameters. Suppose, as an alternative, we impute
using a bivariate normal model, where C and D are modeled as bivariate
normal, with means that are regressions onto A and B, and conditional
on A and B, C and D covary according to a nonzero Cov(C,D). This
distribution is thus characterized by 4 regression coefficients,
Var(C), Var(D), and Cov(C,D)---that is, 7 parameters (i.e., half as
many as in the full joint model). Is there any reason that the
second, bivariate, approach wouldn't be preferred? (If so, the
obvious follow-up is, why are we trying to build more model than we
need?)
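The parameter counts in the question are just arithmetic on the formulas given, and can be checked directly; the function names below are purely illustrative:

```python
def full_joint_params(p):
    """Parameters of a p-variate normal: p means plus p(p+1)/2 unique
    (co)variance terms, which simplifies to p(p+3)/2."""
    return p * (p + 3) // 2

def conditional_params(n_outcomes, n_predictors):
    """Parameters of the conditional model sketched above: a regression
    coefficient for each outcome on each predictor, plus the unique
    (co)variance terms of the conditional outcome distribution."""
    coefs = n_outcomes * n_predictors
    cov_terms = n_outcomes * (n_outcomes + 1) // 2
    return coefs + cov_terms

print(full_joint_params(4))      # 14: 4 means + 10 (co)variance terms
print(conditional_params(2, 2))  # 7: 4 coefficients + Var(C), Var(D), Cov(C,D)
```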
Thanks for any illumination!
Cyrus
--
Cyrus Samii
Political Science
Columbia University
cds81(a)columbia.edu
Burundi Survey: www.columbia.edu/~cds81/burundisurvey/
ISERP Statistical Consulting:
www.iserp.columbia.edu/services/statistical_consulting.html
Comparative Political Economy Blog: cpecolumbia.blogspot.com
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
More info about Amelia: http://gking.harvard.edu/amelia
Hi
I am new to multiple imputation, so my question is probably very basic. But still: so far I have been unable to find any examples showing
how to report results based on, say, 5 imputed datasets. How should one report the results?
Should I *simply* take the average of each regression coefficient/s.e. across all imputed datasets?
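For reference, the standard combining procedure (Rubin's rules) does average the point estimates across the imputed datasets, but it does not simply average the standard errors: the combined variance adds the between-imputation variance to the average within-imputation variance. A minimal sketch, with made-up numbers:

```python
import math

def rubin_combine(estimates, std_errors):
    """Combine m point estimates and standard errors via Rubin's rules.

    Point estimate: the average across imputations.
    Variance: within-imputation variance W plus (1 + 1/m) times the
    between-imputation variance B, so the combined s.e. is NOT just
    the average of the per-dataset standard errors.
    """
    m = len(estimates)
    q_bar = sum(estimates) / m                     # combined point estimate
    w = sum(se ** 2 for se in std_errors) / m      # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation
    total_var = w + (1 + 1 / m) * b
    return q_bar, math.sqrt(total_var)

# Invented coefficients and s.e.s from 5 imputed datasets, for illustration.
coef, se = rubin_combine([0.52, 0.48, 0.50, 0.55, 0.45],
                         [0.10, 0.11, 0.10, 0.12, 0.10])
```

Note that the combined s.e. here comes out larger than any individual s.e., reflecting the extra uncertainty from the imputations disagreeing with each other.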
Many thanks
Jan
On 09/02/2009 12:20 AM, Donald Braman wrote:
> I can't seem to subscribe to the MatchIt list, so I'm posting here in
> hopes that the same people read both.
>
> I'm working with multiply imputed data from Amelia & wondering if any
> progress has been made at integrating Amelia, MatchIt, and Zelig -- in
> particular, pre-processing multiply imputed data and preparing them
> for use in Zelig.
>
> Not at all pressing, and (again) apologies for posting here rather
> than the MatchIt list.
>
> Cheers, Don
At the moment, MatchIt doesn't allow missing data on input (unless you
temporarily code it as observed and exact match on the missing values).
In part, this is because most matching algorithms don't have procedures
to deal with missingness. An exception is CEM, which has a procedure for
multiply imputed data, but to access that you'll need to use the (R or
Stata) version of CEM directly (see http://gking.harvard.edu/cem)
instead of the one in MatchIt.
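A minimal sketch of the workaround mentioned above (recode missing values as an observed category and exact match on them), in Python for illustration only, since MatchIt and CEM are R/Stata packages; the units and covariates below are made up:

```python
from collections import defaultdict

MISSING = "NA"  # sentinel standing in for a missing value, coded as observed

units = [
    {"id": 1, "x1": "a", "x2": MISSING},
    {"id": 2, "x1": "a", "x2": MISSING},
    {"id": 3, "x1": "b", "x2": "high"},
]

# Exact matching: group units by their full (recoded) covariate signature,
# so a missing value only ever matches another missing value.
strata = defaultdict(list)
for u in units:
    key = (u["x1"], u["x2"])
    strata[key].append(u["id"])
```

Units 1 and 2 land in the same stratum because they share both the observed value of x1 and the missingness of x2; unit 3 is stratified separately.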
(separate message coming to fix your matchit subscription...)
Gary
---
Gary King
Albert J. Weatherhead III University Professor
Director, Institute for Quantitative Social Science
Harvard University, 1737 Cambridge St, Cambridge, MA 02138
http://GKing.Harvard.Edu, King(a)Harvard.Edu
Direct 617-495-2027, Assistant 495-9271, eFax 812-8581
On this point, Frank, I would suggest that any information you generate via your imputation process depends on the other information available in the imputation model. For example, if I wanted to impute a variable that would be used as a predictor in a regression, and I had a lot of salient information about my respondents (associated attitudes, demographics, etc.) that helped me predict this variable accurately, then my imputation process contributes substantially to my final estimate of that value. That does not differ when you are imputing a nominal or ordinal variable.
Gary has tools available in Amelia to evaluate the model's performance: the density comparisons and overimpute offer this validation. Linking overimpute to the data-mining literature, you could think of it as the use of 'holdout cases', a method of model validation typical there.
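A minimal sketch of that holdout idea, with a simple leave-one-out mean imputer standing in for the real imputation model (illustration only; Amelia's overimpute does a more principled version of this, and the values below are invented):

```python
values = [2.0, 4.0, 6.0, 8.0, 10.0]  # observed values of one variable

# Hold out each observed value in turn, "impute" it from the remaining
# values, and compare the imputation against the held-out truth.
errors = []
for i, truth in enumerate(values):
    rest = values[:i] + values[i + 1:]
    imputed = sum(rest) / len(rest)  # stand-in imputation: mean of the rest
    errors.append(abs(imputed - truth))

mean_abs_error = sum(errors) / len(errors)
```

The same masking logic underlies holdout validation generally: the model never sees the held-out values, so the errors estimate how well it recovers data it did not observe.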
Thanks,
Paul
> Gary King <king(a)harvard.edu> wrote:
>