2) The
difference is one of parameterization. Remember how we
fixed tau1 = 0? If you fix tau1 to a different value, you will
get a different answer for the second intercept term (tau2) and
different signs on the coefficients (beta). As long as you use
the same parameterization to calculate the quantities of
interest, the predicted probability of falling into each bin
will be invariant to reparameterization of the systematic
component.
I buy that; what I don't understand is why you would get different signs
on the coefficients (beta) when you are getting the same estimates (signs
and magnitudes) on the taus.
The linear predictor in VGAM (Zelig) is tau - x \beta, not x \beta. That's
why all the coefficients are negative. Just make sure that you use the same
parameterization to calculate the predicted probability of falling into each
category and you'll be ok irrespective of the sign on beta. Remember that
the beta is entirely dependent on the systematic component. If the
systematic component is different, the point estimates for beta will differ,
but the quantities of interest calculated from the systematic components
will be the same.
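The point about the sign can be checked numerically. The sketch below is only an illustration: the cut points, the covariate value, and the coefficient are made-up numbers, and the function names are hypothetical rather than anything from Zelig, VGAM, or MASS. It assumes an ordered probit (standard normal CDF) and compares a tau - x beta convention with a tau + x beta convention in which beta has the opposite sign.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def bin_probs(cumulative):
    """Convert P(Y <= j) at each finite cut point into per-bin probabilities."""
    cum = list(cumulative) + [1.0]  # the top bin closes at probability 1
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

def probs_minus(x, beta, taus):
    """Convention A: P(Y <= j) = Phi(tau_j - x * beta)."""
    return bin_probs(normal_cdf(t - x * beta) for t in taus)

def probs_plus(x, beta, taus):
    """Convention B: P(Y <= j) = Phi(tau_j + x * beta)."""
    return bin_probs(normal_cdf(t + x * beta) for t in taus)

# Hypothetical values, chosen only for illustration.
taus = [0.0, 0.8, 1.5, 2.3]  # four cut points give five bins; lowest fixed at 0
x = 1.2                      # a single covariate value
beta_a = 0.7                 # coefficient as reported under convention A
beta_b = -beta_a             # the same fit reported under convention B

p_minus = probs_minus(x, beta_a, taus)
p_plus = probs_plus(x, beta_b, taus)

for j, (a, b) in enumerate(zip(p_minus, p_plus), start=1):
    print(f"bin {j}: A = {a:.4f}  B = {b:.4f}")
```

The two columns agree bin by bin, so the sign of beta carries no information until it is paired with the convention that produced it.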
3) The
substantive question of interest can't be the unobserved
underlying variable (because the stochastic component is
multinomial). Thus, the expected value for this model is the
predicted probability that an observation i falls into each bin.
So let's say that you're interested in presidential approval.
Someone conducts a survey asking: "The president is doing an
effective job. Strongly agree (1), agree (2), neutral (3),
disagree (4), strongly disagree (5)." We observe the number of
people in bins 1:5 and estimate the model and find beta and
tau2-tau5. If we were just interested in the unobserved
underlying distribution Y*, the quantity we would calculate is
x'beta = mu. Now what does mu mean? mu relative to what? It
has to be mu relative to the cut points.
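One way to write down "mu relative to the cut points", assuming an ordered probit with a standard normal error on Y* and the tau - x \beta convention discussed in this thread. The symbols l_j and u_j (the cut points bounding bin j from below and above, taken as -\infty and +\infty for the outermost bins) are notation introduced here for illustration and do not appear in the original messages.

```latex
\Pr(Y_i \in \text{bin } j) = \Phi(u_j - \mu_i) - \Phi(l_j - \mu_i),
\qquad \mu_i = x_i \beta
```

Only the differences between the cut points and mu enter the expression, which is why mu has no interpretation by itself.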
This is what I don't understand. Say you were interested in the effects
of certain covariates on the heights of children. You don't collect the
data, and for some reason the researchers who did collect the data only
tell you which of five categories they are in - really short, kind of
short, average, kind of tall, really tall - and you don't know what the
cutpoints were that they used to divide the children into categories.
Maybe I'm missing something, but this would seem to be the kind of
question one might be interested in answering using something like ordered
probit. In this case, what bin you are in is of no substantive interest,
it is just an artifact of the data collection process.
This and the question below are separate issues. On the negative sign on
beta, see above. On predicting the unobserved variable Y* from the observed
categories Y: Remember how we identify the model by pinning down one cut
point to 0 and constraining the other cut points to be greater than 0?
Well, we're effectively shifting the normal distribution along R+. If you
have to do that to estimate the model, then you can't say anything about the
original distribution Y* because the cut points are defined relative to 0
(and not relative to the original position of Y* on R+). I can show this
pretty easily graphically, but lack the capacity to do that over email. I'll
draw the picture in section tonight.
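In place of the picture, here is a rough numerical version of the same argument. It is a sketch under stated assumptions: an ordered probit with unit variance, and made-up values for mu, the cut points, and the shift. The function names are hypothetical. Moving Y* and every cut point by the same constant leaves the bin probabilities unchanged, so the observed categories cannot reveal where Y* originally sat.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def bin_probs(mu, cuts):
    """Bin probabilities when Y* ~ Normal(mu, 1) and cuts are sorted cut points."""
    cum = [normal_cdf(c - mu) for c in cuts] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical values, chosen only for illustration.
mu = 0.84                    # x'beta for one observation
cuts = [0.0, 0.8, 1.5, 2.3]  # lowest cut point pinned at 0
shift = 3.0                  # an arbitrary relocation of the latent scale

base = bin_probs(mu, cuts)
# Shift the latent mean and all cut points together: same likelihood.
shifted = bin_probs(mu + shift, [c + shift for c in cuts])
# Shift the latent mean alone: a genuinely different model.
mu_only = bin_probs(mu + shift, cuts)

max_gap = max(abs(a - b) for a, b in zip(base, shifted))
print("largest difference after a joint shift:", max_gap)
print("base bins:   ", [round(p, 4) for p in base])
print("mu-only bins:", [round(p, 4) for p in mu_only])
```

The joint shift changes nothing up to floating-point error, while shifting mu alone changes every bin. That is the sense in which mu is only defined relative to the cut points.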
It is the mu that
we are interested in, even if we don't have it in meaningful units. Now,
if I estimate this model with Zelig and the coefficient on (say) income is
negative and in polr() it is positive, what is one supposed to think?
Cheers
Mike
Olivia
----- Original Message -----
From: "Michael Richard Kellermann" <kellerm(a)fas.harvard.edu>
To: <gov2001-l(a)lists.fas.harvard.edu>
Sent: Thursday, December 16, 2004 10:24 AM
Subject: [gov2001-l] Re: ordered probit coefficients
Hi -
I know that we are not supposed to be interested in the raw coefficient
estimates from something like ordered probit, but how should we think
about the fact that the coefficient estimates from Zelig are of the
opposite sign while the intercept/threshold estimates are of the same sign
as what we are getting from our own code (and from what you get using
polr() in the MASS package)? What if the substantive question of interest
is the underlying unobserved variable?
Cheers
Mike
_______________________________________________
gov2001-l mailing list
gov2001-l(a)lists.fas.harvard.edu
http://lists.fas.harvard.edu/mailman/listinfo/gov2001-l