[gov2001-l] section in Science Center 316 - Gov2001

jens_hainmueller＠ksg05.harvard.edu

15 Dec

9:12 a.m.

New subject: [gov2001-l] mix package

is 'imp.mix' the right command to use in the mix package for the imputations? it's the only imputation-type command i could find there. 'imp.mix' apparently requires pre-processing with 'da.mix' or 'dabipf.mix'. do we need to use any of those too? thanks! in da mix, j.


jens_hainmueller＠ksg05.harvard.edu

1:20 p.m.

New subject: AW: [gov2001-l] mix package

http://cran.r-project.org/doc/packages/mix.pdf contains detailed and very helpful documentation of the mix package. best, j.

-----Original Message-----
From: gov2001-l-bounces(a)lists.fas.harvard.edu [mailto:gov2001-l-bounces@lists.fas.harvard.edu]
Sent: Wednesday, December 15, 2004 2:23 PM
To: gov2001-l(a)lists.fas.harvard.edu
Subject: Re: [gov2001-l] mix package

You should follow the steps in the example at the bottom of help(imp.mix)... so yes, you do need to do some pre-processing before you use imp.mix().

Also, note that I forgot to mention something in section last week: if we fix tau1 = 0, we need to constrain tau2 > 0. So you need to use exp(par[k+1]) in place of par[k+1] to constrain tau2 to be greater than zero, but I'm sure that everyone has figured that out by now... 8)

Olivia.

On Wed, 15 Dec 2004, Jens Hainmueller wrote:

> is 'imp.mix' the right command to use in the mix package for the
> imputations? it's the only imputation-type command i could find there.
>
> 'imp.mix' apparently requires pre-processing with 'da.mix' or 'dabipf.mix'.
> do we need to use any of those too?
>
> thanks!
>
> in da mix,
> j.

_______________________________________________
gov2001-l mailing list
gov2001-l(a)lists.fas.harvard.edu
http://lists.fas.harvard.edu/mailman/listinfo/gov2001-l
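
The exp() constraint mentioned in the quoted reply can be sketched roughly as follows. This is only an illustration for a three-category ordered probit, not the course's solution code: the names k and par come from the email, while oprobit_ll, the simulated data, and the starting values are assumptions made up for the example.

```r
# Ordered probit log-likelihood for three categories (y takes values 1, 2, 3).
# par[1:k] are the coefficients (X includes an intercept column);
# par[k + 1] is unconstrained, and exp() maps it to a positive second cut point.
oprobit_ll <- function(par, X, y) {
  k <- ncol(X)
  mu <- as.vector(X %*% par[1:k])
  tau1 <- 0                    # first cut point fixed at 0
  tau2 <- exp(par[k + 1])      # second cut point forced to be > 0
  p1 <- pnorm(tau1 - mu)
  p2 <- pnorm(tau2 - mu) - p1
  p3 <- 1 - pnorm(tau2 - mu)
  p <- ifelse(y == 1, p1, ifelse(y == 2, p2, p3))
  sum(log(pmax(p, 1e-12)))     # guard against log(0)
}

# Simulated data (hypothetical values, for illustration only).
set.seed(42)
n <- 500
x <- rnorm(n)
X <- cbind(1, x)
ystar <- 0.5 + 1.0 * x + rnorm(n)
y <- 1 + (ystar > 0) + (ystar > 1.2)

# fnscale = -1 makes optim() maximize the log-likelihood.
fit <- optim(c(0, 0, 0), oprobit_ll, X = X, y = y,
             method = "BFGS", control = list(fnscale = -1))
tau2_hat <- exp(fit$par[3])    # back-transform to the cut point scale
```

Because the optimizer works on par[k + 1] rather than on tau2 itself, no box constraints are needed; the back-transformed tau2_hat is positive by construction.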


olau＠fas.harvard.edu

5:49 a.m.

New subject: [gov2001-l] Re: ordered probit coefficients

Excellent questions, Mike.

1) Just so everyone is clear on this: polr() is for ordinal *logit*, not ordinal probit.

2) The difference is one in parameterization. Remember how we fixed tau1 = 0? If you fix tau1 to a different value, you will get a different answer for the second intercept term (tau2) and different signs on the coefficients (beta). As long as you use the same parameterization to calculate the quantities of interest, the predicted probability of falling into each bin will be invariant to reparameterization of the systematic component.

3) The substantive question of interest can't be the unobserved underlying variable (because the stochastic component is multinomial). Thus, the expected value for this model is the predicted probability that an observation i falls into each bin. So let's say that you're interested in presidential approval. Someone conducts a survey asking: "The president is doing an effective job. Strongly agree (1), agree (2), neutral (3), disagree (4), strongly disagree (5)." We observe the number of people in bins 1:5 and estimate the model and find beta and tau2-tau5. If we were just interested in the unobserved underlying distribution Y*, the quantity we would calculate is x'beta = mu. Now what does mu mean? mu relative to what? It has to be mu relative to the cut points.

Olivia

----- Original Message -----
From: "Michael Richard Kellermann" <kellerm(a)fas.harvard.edu>
To: <gov2001-l(a)lists.fas.harvard.edu>
Sent: Thursday, December 16, 2004 10:24 AM
Subject: [gov2001-l] Re: ordered probit coefficients

Hi - I know that we are not supposed to be interested in the raw coefficient estimates from something like ordered probit, but how should we think about the fact that the coefficient estimates from Zelig are of the opposite sign while the intercept/threshold estimates are of the same sign as what we are getting from our own code (and from what you get using polr() in the MASS package)? What if the substantive question of interest is the underlying unobserved variable?

Cheers
Mike


olau＠fas.harvard.edu

6:47 a.m.

New subject: [gov2001-l] Re: ordered probit coefficients

>> 2) The difference is one in parameterization. Remember how we fixed tau1 = 0? If you fix tau1 to a different value, you will get a different answer for the second intercept term (tau2) and different signs on the coefficients (beta). As long as you use the same parameterization to calculate the quantities of interest, the predicted probability of falling into each bin will be invariant to reparameterization of the systematic component.

> I buy that; what I don't understand is why you would get different signs on the coefficients (beta) when you are getting the same estimates (signs and magnitudes) on the taus.

The linear predictor in VGAM (Zelig) is tau - x \beta, not x \beta. That's why all the coefficients are negative. Just make sure that you use the same parameterization to calculate the predicted probability of falling into each category and you'll be ok irrespective of the sign on beta. Remember that the beta is entirely dependent on the systematic component. If the systematic component is different, the point estimates for beta will differ, but the quantities of interest calculated from the systematic components will be the same.
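
The point about sign conventions can be checked numerically. The sketch below is an illustration only: all numbers are made up, probs_from_eta is a hypothetical helper, and it makes no claim about which package uses which convention. It just shows that the same cut points with sign-flipped coefficients give identical bin probabilities, provided the probabilities are computed under the matching parameterization.

```r
# Cut points with tau1 fixed at 0 (illustrative values).
tau <- c(0, 0.8, 1.7, 2.5)
x <- c(1.0, -0.5)      # covariates for one observation (hypothetical)
beta <- c(0.6, 0.3)    # coefficients under the "tau - x'beta" convention

# Turn cumulative probabilities P(Y <= j) = pnorm(eta_j) into bin probabilities.
probs_from_eta <- function(eta) diff(c(0, pnorm(eta), 1))

# Convention A: eta_j = tau_j - x'beta.
p_a <- probs_from_eta(tau - sum(x * beta))

# Convention B: eta_j = tau_j + x'gamma; the same fit requires gamma = -beta.
gamma <- -beta
p_b <- probs_from_eta(tau + sum(x * gamma))

# Same taus, opposite-signed coefficients, identical predicted probabilities.
print(rbind(convention_A = p_a, convention_B = p_b))
```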

>> 3) The substantive question of interest can't be the unobserved underlying variable (because the stochastic component is multinomial). Thus, the expected value for this model is the predicted probability that an observation i falls into each bin. So let's say that you're interested in presidential approval. Someone conducts a survey asking: "The president is doing an effective job. Strongly agree (1), agree (2), neutral (3), disagree (4), strongly disagree (5)." We observe the number of people in bins 1:5 and estimate the model and find beta and tau2-tau5. If we were just interested in the unobserved underlying distribution Y*, the quantity we would calculate is x'beta = mu. Now what does mu mean? mu relative to what? It has to be mu relative to the cut points.

> This is what I don't understand. Say you were interested in the effects of certain covariates on the heights of children. You don't collect the data, and for some reason the researchers who did collect the data only tell you which of five categories they are in - really short, kind of short, average, kind of tall, really tall - and you don't know what the cutpoints were that they used to divide the children into categories. Maybe I'm missing something, but this would seem to be the kind of question one might be interested in answering using something like ordered probit. In this case, what bin you are in is of no substantive interest, it is just an artifact of the data collection process.

This and the question below are separate issues. On the negative sign on beta, see above. On predicting the unobserved variable Y* from observed categories Y: Remember how we identify the model by pinning down one cut point to 0 and constraining the other cut points to be greater than 0? Well, we're effectively shifting the normal distribution along R+. If you have to do that to estimate the model, then you can't say anything about the original distribution Y* because the cut points are defined relative to 0 (and not relative to the original position of Y* on R+). I can show this pretty easily graphically, but lack the capacity to do that over email. I'll draw the picture in section tonight.
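
In lieu of the picture, here is a small numerical sketch of the same idea. The numbers and the helper bin_probs are made up for illustration. Shifting mu and every cut point by the same constant leaves all five bin probabilities unchanged, so the data can only inform mu relative to the cut points, not the absolute location of Y*.

```r
# Bin probabilities for an ordered probit with mean mu and cut points tau.
bin_probs <- function(mu, tau) diff(c(0, pnorm(tau - mu), 1))

mu <- 0.4                      # hypothetical x'beta for one observation
tau <- c(0, 0.8, 1.7, 2.5)     # cut points with the first pinned at 0
shift <- 3                     # arbitrary relocation of the latent scale

p_original <- bin_probs(mu, tau)
p_shifted <- bin_probs(mu + shift, tau + shift)

# The two rows agree: the likelihood cannot tell these locations apart.
print(rbind(p_original, p_shifted))
```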

> It is the mu that we are interested in, even if we don't have it in meaningful units. Now, if I estimate this model with zelig and the coefficient on (say) income is negative and in polr it is positive, what is one supposed to think?
>
> Cheers
> Mike

Olivia

olau＠fas.harvard.edu

4:21 p.m.

New subject: [gov2001-l] mix package

Mike gets, I think, major brownie points for this. 8) This means that everyone should at least try multiple imputation for their papers and report whether their coefficients and quantities of interest display any bias before imputation. 8)

----- Original Message -----
From: "Michael Richard Kellermann" <kellerm(a)fas.harvard.edu>
To: <gov2001-l(a)lists.fas.harvard.edu>
Sent: Thursday, December 16, 2004 9:11 PM
Subject: Re: [gov2001-l] mix package

This code should work for the mix package on R 2.0.1:

library(mix)
immig <- read.table("C:/immigration.dat", header = TRUE)

# This turns gender into a 1,2 categorical variable
# We treat the three ordinal variables as continuous rather than
# categorical; otherwise, it has to estimate 686 cells, which is way
# too many given the amount of data.
immig$gender <- immig$gender + 1

# This puts gender on the left of the matrix
immig <- cbind(immig$gender, immig[, 1:4])
names(immig) <- c("gender", names(immig[, 2:5]))

# This does a whole bunch of stuff to identify the missing data
# I think the data needs to be a matrix, not a frame
s <- prelim.mix(as.matrix(immig), 1)

# This calculates initial estimates of the means and var-covar matrix
thetahat <- em.mix(s)

# This sets the random number seed
rngseed(1234567)

# This generates the imputed dataset. It cranks through the da.mix
# function 5000 times, draws a dataset, and then picks up where it left
# off. A new dataset is drawn after every 5000 steps.
newtheta <- da.mix(s, thetahat, steps = 5000)
getparam.mix(s, newtheta)
x1 <- as.data.frame(imp.mix(s, newtheta, immig))
newtheta <- da.mix(s, newtheta, steps = 5000)
x2 <- as.data.frame(imp.mix(s, newtheta, immig))
newtheta <- da.mix(s, newtheta, steps = 5000)
x3 <- as.data.frame(imp.mix(s, newtheta, immig))
newtheta <- da.mix(s, newtheta, steps = 5000)
x4 <- as.data.frame(imp.mix(s, newtheta, immig))
newtheta <- da.mix(s, newtheta, steps = 5000)
x5 <- as.data.frame(imp.mix(s, newtheta, immig))

# This coerces ipip back into an ordinal variable
x1$ipip <- round(x1$ipip)
x2$ipip <- round(x2$ipip)
x3$ipip <- round(x3$ipip)
x4$ipip <- round(x4$ipip)
x5$ipip <- round(x5$ipip)

# This gets rid of any 6s that creep into the imputed data
# If you end up with any 0s after the round step, get rid of
# those too.
x1$ipip[x1$ipip > 5] <- 5
x2$ipip[x2$ipip > 5] <- 5
x3$ipip[x3$ipip > 5] <- 5
x4$ipip[x4$ipip > 5] <- 5
x5$ipip[x5$ipip > 5] <- 5

# This turns the imputed datasets into a list for Zelig
immig.mi <- list(x1, x2, x3, x4, x5)

Cheers,
Mike
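
The rounding and truncation steps in the code above, including the 0s that its comments mention but do not handle, could be folded into one helper. This is a minimal sketch, assuming that stray 0s should be set to 1 in the same way that 6s are set to 5; clamp_ordinal and the toy data frames are hypothetical and merely stand in for x1 through x5.

```r
# Round imputed values, then clamp them to the valid range of the ordinal scale.
clamp_ordinal <- function(v, lo = 1, hi = 5) {
  pmin(pmax(round(v), lo), hi)
}

# Toy stand-ins for the imputed datasets; the values are made up.
imputed <- list(
  data.frame(ipip = c(0.3, 2.6, 5.7)),
  data.frame(ipip = c(1.2, 4.4, 6.1))
)

# Apply the same cleaning step to every imputed dataset in the list.
cleaned <- lapply(imputed, function(d) {
  d$ipip <- clamp_ordinal(d$ipip)
  d
})
```

Keeping the datasets in a list from the start also avoids repeating each line five times, and the list has the same shape as the immig.mi object built at the end of the code above.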


weiha＠fas.harvard.edu

19 Dec

2:59 p.m.

New subject: [gov2001-l] mix package

Hi Olivia,

I have a question about question no. 2. What do Rubin's rules mean exactly, and what are we supposed to do? Also, why do we use the mix package in R rather than Amelia for this week's problem set?

Thanks!
Wei

Sincerely,
Wei Ha
PhD Candidate in Public Policy
Harvard University
Fax: 1-801-605-1455


olau＠fas.harvard.edu

3:27 p.m.

New subject: [gov2001-l] mix package

Hi, Wei.

The Rubin rules are described in the Rubin reading that Gary assigned in the lecture notes. In practice, it means: "Use Zelig because it does it for you." 8)

We are using mix because it's in R. Amelia isn't in R (yet).

Olivia
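
For readers who want to see what the Rubin rules in the reply above amount to when done by hand, here is a minimal sketch for a single coefficient. The function name rubin_combine and the five estimates and standard errors are hypothetical. The pooled estimate is the average across imputations, and the pooled variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```r
# Combine m point estimates and their standard errors using Rubin's rules.
rubin_combine <- function(q, se) {
  m <- length(q)
  q_bar <- mean(q)                       # pooled point estimate
  w_bar <- mean(se^2)                    # average within-imputation variance
  b <- var(q)                            # between-imputation variance
  total_var <- w_bar + (1 + 1 / m) * b   # total variance of the pooled estimate
  c(estimate = q_bar, se = sqrt(total_var))
}

# Hypothetical estimates of one coefficient from five imputed datasets.
q_hat <- c(0.52, 0.48, 0.55, 0.50, 0.45)
se_hat <- c(0.10, 0.11, 0.10, 0.12, 0.10)

pooled <- rubin_combine(q_hat, se_hat)
print(pooled)
```

The pooled standard error is never smaller than the root of the average within-imputation variance, which reflects the extra uncertainty due to the missing data.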


ferrara＠fas.harvard.edu

20 Dec

7:01 a.m.

New subject: [gov2001-l] assignment 9

Olivia, I attached a pdf copy of assignment 9. Happy holidays, Federico


modi＠fas.harvard.edu

8:40 p.m.

New subject: [gov2001-l] paper

Sorry for awakening a finally restful list, my unfounded hearsay, and the spam from accidentally replying to the wrong address. Apologetically, Amit
