On Thu, 16 Dec 2004, Olivia Lau wrote:
Excellent questions, Mike.
1) Just so everyone is clear on this: polr() is for ordinal
*logit* not ordinal probit.
polr has an ordered probit option. Quoting from the polr help file:
The ordered factor which is observed is which bin
Y_i falls into with breakpoints
zeta_0 = -Inf < zeta_1 < ... < zeta_K = Inf
This leads to the model
logit P(Y <= k | x) = zeta_k - eta
with _logit_ replaced by _probit_ for a normal latent variable,
and eta being the linear predictor, a linear function of the
explanatory variables.
At any rate, it gives the same estimates as Stata (and as Zelig, except
for the sign change).
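The help-file formula quoted above can be turned into bin probabilities
directly. The following is a minimal sketch under the probit link, not
polr's actual implementation; the cutpoints and eta values are made-up
numbers chosen only for illustration, and norm_cdf and bin_probs are
hypothetical helper names.

```python
from math import erf, sqrt, inf

def norm_cdf(z):
    """Standard normal CDF, with the infinite end cutpoints handled explicitly."""
    if z == inf:
        return 1.0
    if z == -inf:
        return 0.0
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def bin_probs(zeta, eta):
    """P(Y = k | x) for each bin, using probit P(Y <= k | x) = zeta_k - eta.

    zeta holds the interior cutpoints zeta_1 < ... < zeta_{K-1};
    zeta_0 = -Inf and zeta_K = Inf are added here.
    """
    cuts = [-inf] + list(zeta) + [inf]
    cdf = [norm_cdf(c - eta) for c in cuts]
    # Each bin probability is the difference of adjacent cumulative probabilities.
    return [cdf[k] - cdf[k - 1] for k in range(1, len(cuts))]

zeta = [-1.0, 0.0, 1.5]            # hypothetical cutpoints, four bins
low = bin_probs(zeta, eta=0.0)     # linear predictor at 0
high = bin_probs(zeta, eta=1.0)    # larger linear predictor

print(low)
print(high)
```

Under this parameterization a larger eta lowers every P(Y <= k | x), so
probability mass moves toward the higher bins.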
2) The difference is one in parameterization. Remember how we
fixed tau1 = 0? If you fix tau1 to a different value, you will
get a different answer for the second intercept term (tau2) and
different signs on the coefficients (beta). As long as you use
the same parameterization to calculate the quantities of
interest, the predicted probability of falling into each bin
will be invariant to reparameterization of the systematic
component.
I buy that; what I don't understand is why you would get different signs
on the coefficients (beta) when you are getting the same estimates (signs
and magnitudes) on the taus.
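One way the pattern described here (same taus, opposite-signed betas)
could arise is if the two programs differ only in the sign attached to
the linear predictor: zeta_k - x*beta, as in the polr help file, versus
tau_k + x*beta. That the other software uses the second form is an
assumption made for illustration, not something confirmed in this
thread. A minimal sketch with made-up numbers shows that the two
conventions then give identical probabilities:

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def cum_probs_minus(cuts, beta, x):
    # Convention A (polr help file): P(Y <= k | x) = Phi(zeta_k - x*beta)
    return [norm_cdf(c - x * beta) for c in cuts]

def cum_probs_plus(cuts, beta, x):
    # Convention B (assumed for illustration): P(Y <= k | x) = Phi(tau_k + x*beta)
    return [norm_cdf(c + x * beta) for c in cuts]

cuts = [-1.0, 0.0, 1.5]   # hypothetical cutpoints, identical in both conventions
beta_a = 0.8              # hypothetical coefficient under convention A
beta_b = -beta_a          # same fit under convention B has the opposite sign
x = 1.3                   # hypothetical covariate value

pa = cum_probs_minus(cuts, beta_a, x)
pb = cum_probs_plus(cuts, beta_b, x)

print(pa)
print(pb)
```

If that is what is going on, the sign of beta is only meaningful once
you know which convention produced it: under convention A a positive
beta pushes observations toward higher bins, under convention B toward
lower ones.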
3) The substantive question of interest can't be the unobserved
underlying variable (because the stochastic component is
multinomial). Thus, the expected value for this model is the
predicted probability that an observation i falls into each bin.
So let's say that you're interested in presidential approval.
Someone conducts a survey asking: "The president is doing an
effective job. Strongly agree (1), agree (2), neutral (3),
disagree (4), strongly disagree (5)." We observe the number of
people in bins 1:5 and estimate the model and find beta and
tau2-tau5. If we were just interested in the unobserved
underlying distribution Y*, the quantity we would calculate is
x'beta = mu. Now what does mu mean? mu relative to what? It
has to be mu relative to the cut points.
This is what I don't understand. Say you were interested in the effects
of certain covariates on the heights of children. You don't collect the
data, and for some reason the researchers who did collect the data only
tell you which of five categories they are in - really short, kind of
short, average, kind of tall, really tall - and you don't know what the
cutpoints were that they used to divide the children into categories.
Maybe I'm missing something, but this would seem to be the kind of
question one might be interested in answering using something like ordered
probit. In this case, what bin you are in is of no substantive interest,
it is just an artifact of the data collection process. It is the mu that
we are interested in, even if we don't have it in meaningful units. Now,
if I estimate this model with Zelig and the coefficient on (say) income is
negative and in polr it is positive, what is one supposed to think?
Cheers
Mike
Olivia
----- Original Message -----
From: "Michael Richard Kellermann" <kellerm(a)fas.harvard.edu>
To: <gov2001-l(a)lists.fas.harvard.edu>
Sent: Thursday, December 16, 2004 10:24 AM
Subject: [gov2001-l] Re: ordered probit coefficients
Hi -
I know that we are not supposed to be interested in the raw coefficient
estimates from something like ordered probit, but how should we think
about the fact that the coefficient estimates from Zelig are of the
opposite sign while the intercept/threshold estimates are of the same sign
as what we are getting from our own code (and from what you get using
polr() in the MASS package)? What if the substantive question of interest
is the underlying unobserved variable?
Cheers
Mike
_______________________________________________
gov2001-l mailing list
gov2001-l(a)lists.fas.harvard.edu
http://lists.fas.harvard.edu/mailman/listinfo/gov2001-l