Quick Question:
I'm trying to figure out what the syntax is for the "na.action="
specification in the zelig() command. What are the possible inputs for
this specification, and what do they do?
Thanks in advance.
-Bernard
-----------------------
Bernard L. Fraga
Ph.D. Student, Harvard University
Government and Social Policy
bfraga at fas.harvard.edu
-----------------------
Dear all,
has anyone bumped into this error message?
Error in qr.default(XX) : NA/NaN/Inf in foreign function call (arg 1)
I get it when running a logit in Zelig. It works perfectly with one variable, but when
I add a new dichotomous variable it comes up with that error.....
Thanks
Charlotte
Hi,
Given the confusion, I wanted to clarify what we want for question 3.
Essentially, we want to make sure that you all understand the logic behind
propensity score matching. This is why we ask you to calculate the
propensity score independent of the matchit() function. The key point of
the wording in question 3 is that your call to matchit() should use the
propensity scores that you estimate outside of matchit() to conduct the
matching. In other words, no matter how you write the formula in matchit(),
the propensity scores that you estimate outside of matchit() should be the
distance metric used in the matching. I alluded to this in a few of the prior
emails about this question and apologize if I was being too vague.
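A minimal sketch of what this might look like in R (the data frame, variable names, and the use of a numeric vector for the distance argument are all my assumptions; check the behavior against your installed version of MatchIt):

```r
## Hedged sketch: estimate the propensity score outside matchit(),
## then pass it in as the distance metric.
## Assumes a data frame `dat` with treatment `treat` and covariates x1, x2.
library(MatchIt)

pscore.model <- glm(treat ~ x1 + x2, data = dat,
                    family = binomial(link = "logit"))
pscore <- fitted(pscore.model)

## In recent versions of MatchIt, `distance` accepts a numeric vector,
## which is then used as the distance metric for the matching.
m.out <- matchit(treat ~ x1 + x2, data = dat, method = "nearest",
                 distance = pscore)
```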
Miya
--
Miya Woolfalk
Ph.D. Student
Harvard University
Government and Social Policy
Hi all,
This is a reminder that we will only be having one section at 8-9pm tomorrow
night. We will be covering some R stuff, multiple equation models, and
multinomial logit in more detail.
--
Patrick Lam
Department of Government and Institute for Quantitative Social Science,
Harvard University
http://www.people.fas.harvard.edu/~plam
Despite an earlier email chain on a related subject, the syntax for using matchit() with propensity scores rather than covariates is still not clear to me. Can anyone help? Do you set "distance=" to the propensity score vector? What do you put in the "formula" argument? Thanks!
Hi Folks,
We have decided to cancel the early section time (7pm-8pm) and only hold the
later section (8pm-9pm) for the rest of the semester. Please make note of
this in your schedules.
See you on Thursday from 8-9!
Best,
Miya
--
Miya Woolfalk
Ph.D. Student
Harvard University
Government and Social Policy
Hi all,
Please read Ch. 8 of Unifying Political Methodology for this week.
--
Patrick Lam
Department of Government and Institute for Quantitative Social Science,
Harvard University
http://www.people.fas.harvard.edu/~plam
Hi all,
Problem set 7 has been graded. A couple things:
1. The Zero-Inflated Poisson model has two stochastic components. The
first stochastic component is the choice of the data generating process,
either process 1 or process 2; this choice follows a Bernoulli distribution.
The second component is conditional on that choice: Y is either degenerate
at zero or Poisson distributed. This problem was almost universally missed.
Most of you got the second component but not the first.
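As a rough R sketch of the two stochastic components (my notation; p.zero is the probability of the structural-zero process):

```r
## Simulate from a zero-inflated Poisson:
## first a Bernoulli choice between the two processes,
## then, conditional on that choice, Y is zero or Poisson.
sim.zip <- function(n, p.zero, lambda) {
  z <- rbinom(n, 1, p.zero)             # component 1: process choice
  ifelse(z == 1, 0, rpois(n, lambda))   # component 2: zero or Poisson
}
```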
2. When drawing your parameters from the multivariate normal, you need to
draw all of them together at once. Some of you drew betas and gammas
separately from different multivariate normals, using subsets of the
variance covariance matrix. This is wrong, as you are discarding the
covariances between parameters. In the same way you don't draw each beta
separately, you shouldn't draw any parameters separately regardless of what
Greek letter they are.
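For example, assuming you have the full estimated coefficient vector and variance-covariance matrix from your optimization (the object names here are illustrative):

```r
library(MASS)

## Draw ALL parameters jointly from ONE multivariate normal,
## so the covariances between betas and gammas are preserved.
par.sims <- mvrnorm(1000, mu = par.hat, Sigma = vcov.hat)

## Split the draws afterward, not before:
beta.sims  <- par.sims[, 1:ncol(X)]
gamma.sims <- par.sims[, (ncol(X) + 1):(ncol(X) + ncol(Z))]
```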
3. In many of your problem sets, there is an excessive use of for loops.
This is a very bad coding habit. In general, unless you are genuinely
iterating a process, you should not use a for loop, because it is usually the
slowest way to do things in R. I noticed that many of you used for loops for
simple vector operations. For example, if you have a vector x, and you want
to multiply every element by 2, you should do:
new.x <- x * 2
and NOT
new.x <- c()
for(i in 1:length(x)){
  new.x[i] <- 2 * x[i]
}
This may seem to be a trivial example, but in almost all your problem sets,
there is something exactly like this, especially in log-likelihood
functions. This whole problem set could have (and should have) been done
without a single for loop, so I encourage you all to look at the answer key
and code. In future methods classes and your future work, you need to be
mindful of this and NOT use for loops. Code that takes seconds to run may
end up taking hours because of excessive for loops. I cannot stress this
point enough.
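To illustrate for log-likelihoods, a Poisson log-likelihood can be written entirely with vector operations (a sketch; the constant term in y is dropped since it does not depend on beta):

```r
## Vectorized Poisson log-likelihood: no for loop needed.
ll.pois <- function(beta, y, X) {
  lambda <- exp(X %*% beta)
  sum(y * log(lambda) - lambda)   # log(y!) dropped as a constant in beta
}
```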
4. Finally, remember not to hardcode anything in your functions, especially
when indicating which elements of your parameter vector are betas and
gammas. For example, don't do this in your log-likelihood functions:
betas <- par[1:6]
gammas <- par[7:8]
Instead do
betas <- par[1:ncol(X)]
gammas <- par[(ncol(X) + 1):(ncol(X) + ncol(Z))]
--
Patrick Lam
Department of Government and Institute for Quantitative Social Science,
Harvard University
http://www.people.fas.harvard.edu/~plam
I am trying to work through an issue hopefully the list can help me with.
Oftentimes we are interested in estimating a quantity such as the
"effect" of gender on wages. In one respect this is incredibly easy,
gender is random within known proportions so we can just estimate the
causal effect without controlling for anything (mean of male wages -
mean of female wages). To add any variables would be to induce
posttreatment bias and distort the causal effect of gender on wages.
Plausibly, though, we aren't really interested in the causal effect of
gender on wages, but instead the remaining "effect" of gender once we
have accounted for education, experience, race etc. ("the resumes are
the same but the man is paid more" hypothesis). At this point our
"effect" is no longer a causal effect because there is no clear
treatment intervention, but it is a plausible quantity of interest,
let's call it an 'attributable' effect.
So now I raise two questions: 1) Can we still use matching to reduce
model dependence even though we recognize we are not estimating a
causal effect?, 2) Can we estimate multiple attributable effects from
the same model by matching first on one treatment variable (gender)
and then on another (college education)?
King and Zeng (2007) address this by suggesting that we estimate
multiple-variable causal effects by moving two variables
simultaneously. While this may help us get at a causal effect, as I
argue above we may not actually be interested in that quantity at all.
Of course it doesn't change the fact that the quantity has to be
clearly interpretable (to use an example from the article, what does
it mean to test the effect of unemployment on the duration of a
dictatorship while controlling for a well-armed resistance outside the
front gate of the palace?); however, this strikes me as all the more
reason to use matching as there will probably be no match for low
unemployment levels with well-armed resistances. I'd say that our
counterfactual needs to be interpretable, but unfortunately I don't
think "counterfactual" is the appropriate word any longer (as again,
there is no plausible intervention).
I raise the issue because I think these attributable effects are quite
common in the literature and make a better argument for articles that
include posttreatment variables.
Brandon