I thought this was a sterling example of the importance of replication:
Some might remember a handful of Berkeley students "proving"
electronic voting irregularities in Florida:
http://ucdata.berkeley.edu/new_web/VOTE2004/election04_WP.pdf
It got a lot of press and convinced a lot of leftist techno-wonks.
Two professors (from Drexel and SUNY-Binghamton) decided to take a look at
the numbers themselves, noting:
"As professors who teach statistics and econometrics to undergraduate
and graduate students, we are always on the lookout for good examples of
'what not to do' so that we may better instruct our students in the
responsible use of statistics. Therefore we have examined the HMCB study
with a critical eye. We conclude that the study is entirely without merit
and its 'results' are meaningless."
The rather brutal smackdown, complete with an alternative model, is
available here:
http://election04.ssrc.org/research/critique-of-hmcb.pdf
/\llan
Allan Friedman
Doctoral Student, Public Policy
Kennedy School of Government
Hi Olivia,
I have several questions.
Can we use Zelig this time?
As for #2(a), should we include an intercept in addition to the treatment indicator?
As for #2(c), if we fit separate regressions for the control and treatment groups, does that mean treatment cannot be an independent variable?
I may be misunderstanding the problem; could you give me a hint?
How can we export an R data frame to another format, such as CSV?
Thank you in advance.
Kentaro
Thanks for asking, Ian. For 3(a) focus only on the phone and door
treatments. For 3(b), redo 2(*b*) and 2(c). (Not 2(a).) I've put the
changes up on the web -- please refresh your browsers if you've visited
that page before. 8) Yours, Olivia.
---------- Forwarded message ----------
Date: Tue, 7 Dec 2004 01:33:50 -0500
From: Ian Yohai <yohai(a)fas.harvard.edu>
To: 'Olivia Lau' <olau(a)fas.harvard.edu>
Subject: problem 3
Hi Olivia,
On Problem 3, I assume in part (a) you mean "for each of the two potential
treatments," as you say to focus only on phone & visit. Also, for 3(b), does
it make sense to rerun 2(a), since in 2(a) we assumed that receiving treatment
and intent to treat were the same? (Our matched dataset includes only people
receiving treatment and the control group.)
Hi, everyone.
I made a slight change to the problem set. Rather than impute 1,000
predicted values for each unit, just impute 100 and calculate 100
treatment effects per unit, which you will average (over the units) into
100 ATEs. This is so we don't bring the servers to a grinding halt. 8)
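A minimal Python sketch of the averaging order described above, assuming the per-unit treatment-effect draws have already been simulated. The effect values here are fake noise around 0.3, purely for illustration, and all names are hypothetical. The key step is averaging over units within each simulation, which gives one ATE per simulation and 100 ATEs in total:

```python
import random
import statistics

random.seed(42)
N_UNITS, N_SIMS = 50, 100  # 100 imputations per unit, as in the revised problem set

# Hypothetical per-unit effect draws: effects[i][s] is the s-th simulated
# treatment effect for unit i (observed outcome minus an imputed counterfactual,
# or vice versa). Faked here as noise around a true effect of 0.3.
effects = [[random.gauss(0.3, 1.0) for _ in range(N_SIMS)] for _ in range(N_UNITS)]

# Average over units within each simulation: one ATE per simulation.
ates = [statistics.mean(effects[i][s] for i in range(N_UNITS)) for s in range(N_SIMS)]

# Summarize the 100 ATEs: point estimate and a rough 95% interval.
point = statistics.mean(ates)
cuts = statistics.quantiles(ates, n=40)  # 39 cut points in 2.5% steps
lo, hi = cuts[0], cuts[-1]               # roughly the 2.5th and 97.5th percentiles
```

The spread of the 100 ATEs is what carries the uncertainty; averaging in the other order (over simulations within each unit first) would collapse that spread before you could summarize it.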
Have a good weekend!
Olivia.
Thanks for pointing this out, Ian. A corrected solution set has been posted. Note that the point of PS7 was to make you appreciate Zelig -- I sure did when I was done writing up the solution set. 8)
----- Original Message -----
From: Ian Yohai
To: 'Olivia Lau'
Sent: Friday, December 03, 2004 5:23 PM
Subject: solution set 7 q's
Olivia:
Sorry to bother you. I just had a few notes on solution set 7.
On Question 1(c), I think the FD values for COOP 3 -> 4 are incorrect: the mean first difference should be something like 0.21 rather than a negative number, and the CIs should differ as well. When I ran your code, that is what I got, so it might just have been a transfer error.
Also on 1(c), I think the code had the wrong values for the interaction term in the TARGET scenarios. Median(COOP) = 1, so COOP*TARGET should be 1*1, 1*2, 1*3, which equals 1, 2, 3, not 2, 4, 6.
Finally, on question 2(c)(2), I think you have plotted scenarios (1) and (3) rather than scenarios (2) and (3), since PRESINC=1 in both cases in your code. Granted, this makes for a better comparison, but it differs from the way the question was asked.
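The scenario construction Ian describes can be sketched in Python as follows. This is an illustration, not the solution-set code (which was R), and the helper `scenario` and the column name `COOP_x_TARGET` are hypothetical. The point is that the interaction column is recomputed from each scenario's own COOP and TARGET values:

```python
# Median(COOP) = 1, per the note above; COOP is held there while TARGET varies.
MEDIAN_COOP = 1

def scenario(coop, target):
    """One hypothetical covariate profile; the interaction is recomputed from its parts."""
    return {"COOP": coop, "TARGET": target, "COOP_x_TARGET": coop * target}

target_scenarios = [scenario(MEDIAN_COOP, t) for t in (1, 2, 3)]
interactions = [s["COOP_x_TARGET"] for s in target_scenarios]
print(interactions)  # [1, 2, 3]; holding COOP at 2 instead would give [2, 4, 6]
```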
Thanks,
Ian
Hi, everyone.
I posted some stuff to the course web site:
1) the (correct) solution set to PS7
2) the new PS8, due next week in section
I have some hints for you for PS8:
* You can use Zelig for this problem set and your life will be much
easier if you do. See demo(match) for an example.
* Write a function to calculate the ATE and ATT given certain
generalized inputs, such as the formula, model, data, and treatment
indicator. You should use zelig(), setx() and sim() in this high-level
function. This will make your week much easier.
* Per the discussion we had in section, I gave very specific definitions
for the treatment and control groups. Please follow these definitions
even if you disagree with the reasoning.
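Olivia's suggested function is built on zelig(), setx() and sim() in R. As a language-neutral sketch of the same structure, here is a Python version in which a hypothetical `sim_outcome(row, t)` callable stands in for setx() plus sim() on an already fitted model. It imputes each unit's missing potential outcome, then averages the unit-level effects over all units (ATE) and over treated units only (ATT). The toy model and data are made up:

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]

def ate_att(data: List[Row], treat_var: str, outcome_var: str,
            sim_outcome: Callable[[Row, int], float],
            n_sims: int = 100) -> Tuple[List[float], List[float]]:
    """Return n_sims simulated ATEs and n_sims simulated ATTs.

    sim_outcome(row, t) is a hypothetical stand-in for setx()/sim(): it returns
    one simulated outcome for `row` with the treatment indicator set to t.
    """
    ates, atts = [], []
    for _ in range(n_sims):
        all_effects, treated_effects = [], []
        for row in data:
            if row[treat_var] == 1:
                # Treated unit: observed Y(1) minus an imputed Y(0).
                effect = row[outcome_var] - sim_outcome(row, 0)
                treated_effects.append(effect)
            else:
                # Control unit: imputed Y(1) minus observed Y(0).
                effect = sim_outcome(row, 1) - row[outcome_var]
            all_effects.append(effect)
        ates.append(statistics.mean(all_effects))    # average over all units
        atts.append(statistics.mean(treated_effects))  # average over treated only
    return ates, atts

# Toy usage with a made-up fitted model: E[Y] = 1 + 2*t + x, residual sd 0.5.
rng = random.Random(1)

def toy_sim(row: Row, t: int) -> float:
    return rng.gauss(1 + 2 * t + row["x"], 0.5)

data = [{"x": 0.0, "treat": 1, "y": 3.1},
        {"x": 1.0, "treat": 0, "y": 2.2},
        {"x": 2.0, "treat": 1, "y": 4.8}]
ates, atts = ate_att(data, "treat", "y", toy_sim)
```

Keeping the formula, model, data and treatment indicator as arguments is what makes the function reusable across the matched and unmatched datasets; in R the `sim_outcome` role would be played by setx() with the treatment variable fixed, followed by sim().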
Finally, the problem set looks long, but that's because most of the
answers this week will be numeric rather than graphical and I can't check
numbers outside of tables, so I have provided the tables for you to fill
in. Please use the same row and column layout in your solution sets!
If you have any questions, please let me know. Thanks!
Olivia.
Good question, Wei.
The key thing to remember for Question 2 part C (1)-(3) is that you should reuse the same values of betahat over and over again. That is: You fit the model once in 2(b). This gives you point estimates for hat{beta} and hat{Sigma}. Now, use these point estimates and simulate tilde{beta} from the multivariate normal distribution. For each subpart in (C), re-use the tilde{beta} with different hypothetical X (e.g., for each value in the ADAACA range you select).
It doesn't matter whether you use logistic regression or probit regression to get the point estimates hat{beta} and hat{Sigma}.
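A Python sketch of the "simulate once, reuse everywhere" workflow, assuming made-up point estimates for a logit with only an intercept and ADAACA (the real model in 2(b) has more covariates), and assuming ADAACA runs from 0 to 100. The multivariate normal draws use a small hand-written Cholesky factorization so the block needs only the standard library; in R, `MASS::mvrnorm` does the same job:

```python
import math
import random
import statistics

def cholesky(a):
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def draw_mvn(mean, cov, n, rng):
    """n draws of beta-tilde ~ N(mean, cov), computed as mean + L z with z standard normal."""
    L = cholesky(cov)
    draws = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in mean]
        draws.append([m + sum(L[i][k] * z[k] for k in range(i + 1))
                      for i, m in enumerate(mean)])
    return draws

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical point estimates from fitting the model ONCE (made-up numbers).
beta_hat = [-1.0, 0.03]                      # intercept, ADAACA
sigma_hat = [[0.04, -0.0005], [-0.0005, 0.00002]]

rng = random.Random(7)
beta_tilde = draw_mvn(beta_hat, sigma_hat, 1000, rng)  # simulate parameters once

# Re-use the same draws for every hypothetical ADAACA value.
ev = {}
for adaaca in range(0, 101, 10):
    x = [1.0, adaaca]
    sims = [inv_logit(sum(b * xi for b, xi in zip(bt, x))) for bt in beta_tilde]
    ev[adaaca] = statistics.mean(sims)
```

Because every point on the curve is computed from the same `beta_tilde`, differences between points reflect only the change in the hypothetical X, not fresh simulation noise.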
Hope this helps!
Olivia
----- Original Message -----
From: weiha
To: olau(a)fas.harvard.edu
Sent: Thursday, December 02, 2004 12:39 AM
Subject: first difference for probit
Olivia,
In part (c)(2) of the second homework question, we are asked to draw a graph similar to the one in the lecture notes (the one on page 27), right?
But I have one question: the second step Gary described is to run a logistic regression. Does that mean we need to run a separate regression for each value of ADAACA (which seems weird), or run the regression once and get betahat?
Sincerely,
Wei Ha
PhD Candidate in Public Policy
Harvard University
Fax: 1-801-605-1455
Hi, everyone.
One note about getting expected values and selecting x values: you need to
examine the hypothetical, so for the interaction term, this is the
hypothetical x1 * the hypothetical x2, not the mean or median of x1 * x2.
If you're confused, try setx() on the regressions...
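A small numeric illustration with made-up data, in Python: the mean of the observed products x1*x2 is generally not the product of the chosen hypothetical values, so the interaction entry in the hypothetical x has to be computed from the hypothetical x1 and x2 themselves. Here the hypothetical values are taken to be the means, purely as an example:

```python
import statistics

# Made-up data for two variables that enter the model with an interaction.
x1 = [0, 0, 1, 1, 2, 4]
x2 = [1, 3, 2, 0, 5, 2]

# Wrong: set the interaction to the mean of the observed products.
wrong = statistics.mean(a * b for a, b in zip(x1, x2))  # 20/6, about 3.33

# Right: choose the hypothetical x1 and x2 first, then multiply them.
h1, h2 = statistics.mean(x1), statistics.mean(x2)
right = h1 * h2                                          # (8/6)*(13/6), about 2.89

# Hypothetical covariate vector: intercept, x1, x2, x1*x2.
x_hyp = [1.0, h1, h2, h1 * h2]
```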
Yours,
Olivia
Hi, guys.
For this week's problem set, please identify and write out the stochastic
and systematic components (as the problem set requests).
Here's a summary of the questions that have gone to the list about this
week's problem set:
For both 1 and 2, you should only simulate parameters once. You should
re-use those draws to calculate the different quantities of interest.
For 1c, change the interaction term when you change the relevant variable.
For 2a1, the systematic component is the standard normal CDF, with mean 0
(not mu) and sd = 1.
For 2, use PRESINC as the incumbency variable. Do *not* include INC in the
regression.
For 2c, remember to change the interaction term along with everything else.
For 2c3, this is the plot Gary went over in class today.
Again, you aren't allowed to use either Zelig or Clarify for this
assignment.
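For reference, a Python sketch of the probit components mentioned in the 2a1 note, with made-up coefficients for an intercept and PRESINC. The stochastic component is Y_i ~ Bernoulli(pi_i), and the systematic component is pi_i = Phi(x_i beta), where Phi is the standard normal CDF (mean 0, sd 1) evaluated at the linear predictor; `statistics.NormalDist` supplies Phi:

```python
import random
from statistics import NormalDist

# Phi: the standard normal CDF, mean 0 and sd 1 (not mu).
PHI = NormalDist(mu=0.0, sigma=1.0).cdf

def probit_pi(x, beta):
    """Systematic component: pi = Phi(x . beta)."""
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    return PHI(eta)

def draw_y(pi, rng):
    """Stochastic component: one Bernoulli(pi) draw."""
    return 1 if rng.random() < pi else 0

# Hypothetical coefficients for an intercept and PRESINC (made-up values).
beta = [0.0, 0.5]
pi_inc = probit_pi([1.0, 1.0], beta)  # Phi(0.5), about 0.69
y = draw_y(pi_inc, random.Random(0))
```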
Yours,
Olivia