On Thu, 13 Mar 2003, Julia Lynch wrote:
> Gary,
>
> Just realized there is a complicating factor:
> One of my independent variables is itself an imputed variable, the result of
> a 2SAIV process (which imputes a brand new variable into your working
> dataset using information shared by your working dataset and an outside
> dataset). Under ordinary circumstances, I can just bootstrap the standard
> error of the coefficient on the new variable. But if I'm also doing
> multiple imputation of missing data, that suddenly makes the calculation of
> the standard error on that coefficient more complicated...
> I guess what I'm wondering is if the SEs generated by the MI process that
> Amelia uses will be radically different from what I would have gotten by
> bootstrapping?
Interesting... Well, I'd do multiple imputation on all the variables
other than the one that 2SAIV will save you from, then run 2SAIV as an
analysis procedure. Alternatively, you could stack the two datasets and
use Amelia to impute the entire variable and skip 2SAIV. I don't think
you need bootstrapping here.
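The stacking idea can be sketched as follows. This is only an illustration of the data arrangement, with hypothetical column names ("x1", "x2" for the covariates shared by the two datasets and "z" for the variable 2SAIV would otherwise impute); the imputation itself would then be run on the stacked file.

```python
import numpy as np
import pandas as pd

# Working dataset: shared covariates observed, the target variable
# ("z" -- a hypothetical name) entirely missing.
working = pd.DataFrame({"x1": [1.0, 2.0, 3.0],
                        "x2": [0.1, 0.2, 0.3],
                        "z":  [np.nan, np.nan, np.nan]})

# Outside dataset: the same shared covariates, with z observed.
outside = pd.DataFrame({"x1": [1.5, 2.5],
                        "x2": [0.15, 0.25],
                        "z":  [4.2, 5.1]})

# Stack the two. An imputation model run on the stacked data can then
# borrow the x-z relationship from the outside rows to fill in z for
# the working rows.
stacked = pd.concat([working, outside], ignore_index=True)
```

Amelia would be run on the stacked file; pandas here only shows the shape of the input it would see.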
> Hmmm. More fodder for the listserv, I guess.
Indeed; great questions!
Gary
>
> Julie
>
> Julia Lynch
> Assistant Professor
> Department of Political Science
> University of Pennsylvania
> 202 Stiteler Hall
> Philadelphia, PA 19104
> tel 215 898 4240
> fax 215 573 2073
> email jflynch(a)sas.upenn.edu
>
On Wed, 12 Mar 2003, Julia Lynch wrote:
> Gary,
>
> I'm stumped. Can I ask what you would do in the following situation?
>
> I have an index composed of 4 opinion items. For each of the 4 items, I
> have DK responses from 5-20% of respondents, but typically the DK is on only
> 1 of the 4 items (i.e. DK responses are not highly correlated across the
> items of the index). Do I:
>
> a. delete listwise and lose 20% of my respondents, losing efficiency and
> introducing massive bias (my DK respondents are poor, female, low educ, low
> political interest, etc.) -- clearly not my preferred option
>
> b. Impute the missing data for each item and construct the index using the
> imputed values. I wouldn't normally want to impute for an opinion variable,
> but if the respondent was able to answer 3 other closely related questions,
> why should I believe that s/he couldn't also answer the fourth? But does
> EMis still work if I've transformed the imputed data, e.g., by smushing it
> into an index?
>
> c. impute the index score. Again, some of these DKs are legitimate, but
> others aren't, and this would get around the issue of transforming the
> imputed data. But it would also mean throwing out information from the
> three items that the respondent DID answer. (OK, not throwing out, because
> I'd use that info to impute, but still...)
>
> d. compute the index score for respondents with one DK out of 4 items by
> setting the value of missing items at the mean of the remaining items. This
> seems to me to be taking less than full advantage of the other information
> in the dataset about how these people might have responded.
>
> e. treat each item separately as a categorical variable. Messy and not
> nearly as much fun as working with this index.
>
> I'm not crazy about any of these choices. What do you think? Any advice
> much appreciated...
>
> Julie
>
Great question.
Definitely (b). You will add a lot of power to the imputation model by
using three observed answers to impute a fourth -- much better than using
the index and having to assume that the index value is either missing or
biased because it's now based on only three of the four questions.
You can always smush or do whatever you think is appropriate with the
simulations/imputations, such as creating an index.
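To make the combining step concrete: build the index within each of the m imputed datasets, run the analysis on each, and combine the m estimates with Rubin's (1987) rules. A minimal sketch (the numbers are hypothetical, not from any real analysis):

```python
import numpy as np

def combine_mi(estimates, std_errors):
    """Combine point estimates and standard errors from m imputed
    datasets using Rubin's (1987) rules."""
    q = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    m = len(q)
    qbar = q.mean()                 # combined point estimate
    W = np.mean(se ** 2)            # within-imputation variance
    B = q.var(ddof=1)               # between-imputation variance
    T = W + (1 + 1 / m) * B         # total variance
    return qbar, np.sqrt(T)

# e.g., the slope on an index built within each of m = 5 imputed
# datasets (made-up numbers):
est, se = combine_mi([0.42, 0.45, 0.40, 0.43, 0.44],
                     [0.10, 0.11, 0.10, 0.12, 0.10])
```

The between-imputation variance B is the piece a single filled-in dataset would miss; it carries the uncertainty from the imputation itself.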
Gary
: Gary King, King(a)Harvard.Edu http://GKing.Harvard.Edu :
: Center for Basic Research Direct (617) 495-2027 :
: in the Social Sciences Assistant (617) 495-9271 :
: 34 Kirkland Street, Rm. 2 HU-MIT DC (617) 495-4734 :
: Harvard U, Cambridge, MA 02138 eFax (928) 832-7022 :
>
>
>
>
On Wed, 5 Mar 2003, alison holman wrote:
> Dear Dr. King, I was given your name by the statistical consultant on
> our grant b/c I am trying to figure out the best approach to handling
> the missing data in our study. I am the data analyst for a study
> addressing the health (mental and physical) consequences of the 9-11
> terrorist attacks. Our study is a longitudinal survey of adults
> (nationwide random sample, repeated measures over time). We have data
> collected at 4 time points (9-14 days, 2 mo, 6mo, 12 mo post attacks).
> Some respondents have only 2 time points, others have responded at all 4
> time points.
>
> I am writing you b/c I am struggling with learning the best way to
> deal with the missing data issue. I would like to make the most of
> these data, as this is the most interesting and richest dataset I have
> had an opportunity to work on. We have health data collected pre-9-11
> with approximately 12-19% missing. After reading through a few papers,
> I have realized that I still may be able to do MI *even though* I
> suspect the data are not really MAR. However, since I cannot directly
> test the MAR assumption, I am not sure how exactly to proceed. I have
> been trying to identify potential biases in the missing value patterns
> using SPSS. I have identified that the missing health data are
> associated with being younger.....a rather strong association. The
> older folks are 80% less likely to have missing health data than the
> 18-30 yr olds. Given these differences, I was considering imputing
> values *within* age categories, using the other demographic data I have
> available in the dataset as well.
> I have looked at the descriptions of the programs you offer on your
> website and I am not sure which of these programs would be best to use
> for my purposes. The health data I have are *completely categorical*
> (never diagnosed, self-diagnosed, md diagnosed) ailments. But as I said
> earlier, my dataset is a national probability sample with oversampling
> in 4 communities, and with repeated measures on each participant over a
> year. I also have post-stratification weights that I need to use for my
> analyses. Given that I have complex survey data, what would you
> recommend vis a vis:
>
> (a) is one of the MI programs a reasonable and valid way to solve my
> missing data problem?
> (b) which (if any) of the programs would you recommend for me to use for
> imputing my missing values?
Have a look at Amelia. It is not designed specifically for sample
attrition (which is your problem), but it has been used for that purpose.
> (c) are there special considerations I need to be aware of for complex
> survey data?
Not really. You can try to include the sample weights as a (fully
observed) variable in the analysis.
> (d) do I need to use weights when doing the imputations? If so, can
> Amelia accommodate them?
Since you're not computing causal effects as part of this first-stage
(imputation) analysis, you don't need them as weights. But if they're not
functions of other variables in your analysis, you might control for them
as above.
> (e) I intuitively (and perhaps naively) think I should use the
> individual items rather than a scale score when imputing values, am I
> right? (I have had some stats people advise me to impute at the scale
> score level, but that seems to me to compound any potential biases there
> may be in the data)...
You're absolutely right. The only qualification is that if the individual
items are always missing whenever one is missing, then you might as well
use the scale score. Since this is not normally the case, your intuition
is right.
> (f) I am a beginner at using STATA--can Amelia run in STATA and can you
> refer me to someone or some article that describes how to use it in
> STATA?
Amelia is a stand-alone program, but it produces imputations that you can
use in another program like Stata. If you use Stata, I'd suggest you use
Clarify (also at the same web page), which will automatically combine the
separate imputations.
> Finally, I have a more general, philosophical question about doing
> imputation. Please pardon my naivete about this stuff, but I am a
> novice at doing this....and I am very serious about wanting to do the
> right thing with these data. I understand that there is some debate re
> whether it is considered legitimate to impute DVs in a dataset...yet the
> variables I am hoping to impute are going to be both IVs and DVs in
> different theoretical analyses (the health data). Many of my colleagues
> are uncomfortable imputing DVs--given the nature of my dataset, is this
> something I should avoid? I understand that if I don't impute I could
> introduce bias simply by deleting cases or using the mean, but when the
> data are not MAR, other than having a rich mix of variables to use in
> making the imputations, what other precautions do you recommend?
There is some misunderstanding about this issue, but I don't think there
is any real disagreement. You should certainly include the dependent
variables. Omitting them will cause bias. Since the imputations are
drawn from the posterior, there is no endogeneity bias.
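A toy simulation (my illustration, not part of the original exchange) makes the point: when missingness in a regressor depends on the outcome, an imputation model that ignores the outcome biases the slope, while one that conditions on it recovers the truth. Stochastic regression imputation stands in here for a full MI model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)   # true slope is 2

# x is missing whenever y > 0: missingness depends on the outcome
miss = y > 0
obs = ~miss

def slope(xv, yv):
    """OLS slope of y on x."""
    return np.cov(xv, yv)[0, 1] / np.var(xv, ddof=1)

# (1) Impute x ignoring y: fill with the observed-case mean
x_mean = x.copy()
x_mean[miss] = x[obs].mean()

# (2) Impute x using y: stochastic regression imputation fit on the
#     complete cases (prediction plus a draw of residual noise)
b = np.cov(x[obs], y[obs])[0, 1] / np.var(y[obs], ddof=1)
a = x[obs].mean() - b * y[obs].mean()
resid_sd = np.std(x[obs] - (a + b * y[obs]), ddof=2)
x_reg = x.copy()
x_reg[miss] = a + b * y[miss] + rng.normal(scale=resid_sd,
                                           size=miss.sum())

b_without = slope(x_mean, y)   # attenuated away from 2
b_with = slope(x_reg, y)       # near the true slope of 2
```

Selecting on y leaves the regression of x on y intact, which is why the y-aware imputation restores the joint distribution; the mean-fill version cannot.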
> Thank you in advance for your expert advice.....I really appreciate
> your help with this!
Good luck! Sounds like a great project.
Gary
>
> Sincerely,
>
> Alison Holman, Ph.D.
> Professional Researcher
> Center for Health Policy and Research
> University of California, Irvine
> Irvine, CA 92697
> (949) 824-6849 (phone)
> (949) 824-3002 (fax)
>
Hello,
I am trying to use imputed data with Clarify, but I am getting an
error message in Stata that says "Obs. nos. out of range" when it
tries to simulate sigma-squared. Here is what Stata is telling me:
. clear
. use imp1
. estsimp reg y x1 x2 x3 x4 x5 x6, mi(imp)
Estimation number : 1 of 5
Dataset being used: imp1
Simulating main parameters. Please wait....
% of simulations completed: 14% 28% 42% 57% 71% 85% 100%
Simulating sigma-squared. Please wait
Obs. nos. out of range
r(198);
.
Any suggestions on what might be happening and how to fix it?
Thanks!
Strom
--
Strom C. Thacker
Susan Louise Dyer Peace Fellow
Hoover Institution
Stanford University
Stanford, CA 94305-6010
650-725-3432
650-723-1687 (fax)
sthacker(a)bu.edu
http://www.bu.edu/sthacker
Looking for some suggestions here -
I've got NES data (2000 Presidential election, to be exact) merged with
county-level crime rate data from the FBI. I'm imputing 48
variables (10 fully observed), of which two are missing at 50% (n =
1768). I have already transformed all the variables to be as close to
normal as possible, à la the suggestions in Honaker et al. (2001); I'm
using a ridge prior of 6, and I've increased the _AMsn global to 100.
Despite all this, Amelia still crashes during stage 3 (importance
sampling) and reports this error message:
There are insufficient valid draws from the approximating
distribution so the program will end. This error may occur
for a number of reasons including severe missingness and
data which badly violates the assumptions of the imputation
model. The user should be able correct the problem by
increasing the draws from the approximating distribution
(see _AMsn global); by transforming the variables to more
closely meet the model's distributional assumptions;
by using or increasing the strength of the prior; by using
the t-distribution for the approximating distribution (see
_AMst global); and/or adjusting the _AMsfac global.
Press any key to exit
Currently active call: RNDISMP [154]
Now, I don't think the problem is severe missingness, because I was
able to get Amelia to run on these data before I merged the crime data
into the set. (Note: this was before I learned that scales, etc. require
both the scale and its components to be entered -- thus the old dataset
had only about 20 variables in it.)
My question is, is it worth my time to keep increasing the _AMsn global
(how high should you go?) or should I just bite the bullet and start
playing with the _AMst and _AMsfac globals? If I change these, how
much am I compromising the robustness of my imputation model?
Matthew Vile,
University of New Orleans