Hey Everyone,
Sana had two great questions that will be helpful for the problem set, so I
wanted to forward them to the list.
Here is the first:
> In problem 2.1, we're asked to control "for the above-mentioned
> covariates". Does this include turnout, which
> was a dependent variable in the first model?
We do not want you to include turnout, but we do want you to include the
interaction term that was in the first model.
And here is the second question:
> Again about problem 2.1:
> Zelig reports back the coefficients, as well as three 'intercepts'. I
> thought the intercepts referred to the threshold parameters, but if
> that's the case, why are there three instead of two, since the
> dependent variable (level of attention) has only four categories?
Zelig is reporting three "intercepts" (which I'll call thresholds)
because it identifies the model by assuming a variance for the latent
variable, and by assuming that the intercept (\beta_0) is 0. This allows
Zelig to estimate three threshold parameters. Remember, these parameters
need to describe three walls for a four-category dependent variable: the
wall between categories 1 and 2, 2 and 3, and 3 and 4.
In the section code, we used a different identifying assumption: that the
first threshold is zero. In that case, we estimate only two threshold
parameters plus an intercept. Also remember that these two identifying
assumptions estimate the same model, just under different
parameterizations.
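If you want to see this concretely, here is a minimal sketch using polr
from the MASS package (which, as far as I know, is what Zelig calls under
the hood for ordered models; the data here are made up):

library(MASS)
set.seed(1)
## made-up data: a 4-category ordered outcome yields 3 thresholds
d <- data.frame(attention = factor(sample(1:4, 200, replace = TRUE),
                                   ordered = TRUE),
                x = rnorm(200))
summary(polr(attention ~ x, data = d))  # "Intercepts": 1|2, 2|3, 3|4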
Thanks for the great questions--please do not hesitate to email the list
with any others.
Cheers,
Justin
Hi all,
I have another R question about subsetting data. We're trying to
construct a subset of everyone in our dataset who answered yes to a
certain question (BPMED=1, n=5091), but because of the skip patterns, not
everyone was asked the question, so there are a lot of missing values.
When we try to subset based on BPMED==1, the result still includes the
rows with NA values. Does anyone know how to exclude NA values when
subsetting? Our code is shown below, in case that's more helpful than my
explanation. Basically, we want a subset that contains only the 5091
people for whom BPMED=1.
> #subset among BP medicines users
> hrs.BPmeds <- hrs[hrs$BPMED==1,]
> #verify
> dim(hrs.BPmeds)
[1] 9760 129
> table(hrs.BPmeds$BPMED)
1
5091
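We suspect the NAs come along because hrs$BPMED == 1 evaluates to NA for
the skipped respondents, and subsetting with a logical vector keeps those
positions as all-NA rows. If that's right, is something like

## which() keeps only the TRUE positions, silently dropping the NAs
hrs.BPmeds <- hrs[which(hrs$BPMED == 1), ]
## or, with the NA handling made explicit
hrs.BPmeds <- hrs[!is.na(hrs$BPMED) & hrs$BPMED == 1, ]

the right fix?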
Thanks so much for any advice you have.
Katy and Sheila
--
Katy Backes Kozhimannil, M.P.A.
Ph.D. Program in Health Policy
Resident Tutor, Adams House
Harvard University
474 Adams Mail Center
Cambridge, MA 02138
kbackes at fas.harvard.edu
Hello all,
here are some more details regarding the upcoming replication
assignment. The papers are due AT THE BEGINNING of class on April 2nd.
Please bring the following things with you:
For each group,
1) THREE CDs that contain all the data needed to replicate the paper,
the article you chose to replicate as a pdf, electronic codebooks
for the data, and your R code. If you have several datasets,
please also include a readme file that describes the overall structure
of the data.
2) TWO hardcopies of the article you chose to replicate for Justin and me.
The replication paper should mostly consist of the replication of the
key tables of the original article. Quickly describe to what extent you
were able to replicate the original findings, and indicate in what ways
you plan to improve/extend the original paper (2-3 pages).
At the end of lecture, please stick around for a couple of minutes so
that we can give you the CDs prepared by one of the other groups. We
will try to match groups according to research interests/background.
Note that extension school students do not write final papers and thus
do not replicate articles. They do, however, replicate other groups'
replications. Several groups will therefore receive comments from two
sources: another group and a long-distance student. We will send an
additional email to long-distance students with more details later. Also
note that the reaction papers to the replications are co-authored by each
group, not written individually.
Your replication of another group's replication will be due on Monday,
April 9. You are graded according to how constructive, not destructive,
you are. You should make sure that the other group successfully
replicated the major claims of the original article by running their R
code and then comment on their plans for extending/improving it. The
more concrete advice you can provide on how to improve the original
research, the better. Your comments should generally run between 3 and 5
pages.
cheers,
Holger
--
Holger Lutz Kern
Graduate Student
Department of Government
Cornell University
Institute for Quantitative Social Science
Harvard University
1737 Cambridge Street N350
Cambridge, MA 02138
www.people.cornell.edu/pages/hlk23
Hi all,
we had a small typo in the solutions for ps 5. The p-value for the LR
test was calculated incorrectly. We've posted corrected versions of the
writeup and R code.
cheers,
Holger
--
Holger Lutz Kern
Graduate Student
Department of Government
Cornell University
Institute for Quantitative Social Science
Harvard University
1737 Cambridge Street N350
Cambridge, MA 02138
www.people.cornell.edu/pages/hlk23
Okay, I know there have already been a million questions about this, but
here's one more 11th-hour attempt:
Because there are 23 actual observations, "holding pressure at its observed
values" implies that there will be 23 observations even when temperature is
pinned to a particular value. Therefore, when we calculate the probability
of failure for a given temperature (say, 31 degrees), we will get a VECTOR
of probabilities for any single beta vector.
Then, since we of course want to draw many beta-vectors from the relevant
distribution, we end up with many VECTORS of probabilities.
One of the preceding emails stated that we should average across the 23 to
get a single probability. This seems fishy, as we would then be averaging
once across the 23 observed pressures to get a single probability for a
given beta, and then again across our draws of beta. This would introduce
two layers of averaging, each with variability that in theory we would
have to propagate through.
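Concretely, the bookkeeping I have in mind looks something like this (a
sketch; the object names are made up):

library(MASS)
fit   <- glm(failure ~ temp + pressure, family = binomial, data = orings)
betas <- mvrnorm(1000, coef(fit), vcov(fit))   # draws of beta
X31   <- cbind(1, 31, orings$pressure)         # temp pinned at 31; the 23
                                               # observed pressures
probs <- plogis(betas %*% t(X31))              # 1000 x 23 matrix
ev31  <- rowMeans(probs)        # average across the 23 obs WITHIN each draw
quantile(ev31, c(.025, .975))   # spread ACROSS draws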
Is this indeed correct, or am I totally misunderstanding the situation?
Phew.
Any thoughts?
Tom
Hi all,
My partner Sheila and I are both new to R this semester, and have come
across a few challenges as we approach our replication project. We have
a few questions about some of the basics of using R for analysis and
were hoping that some of you may have thoughts or suggestions:
1) How do you change the R/Zelig default settings so that output does
not appear in scientific notation?
2) How do you subset data to run a regression on just that subset in
Zelig?
3) What is the best way to view cross tabulations of data and to add row
or column percentages? (Our partial attempts at 1) through 3) are sketched
after this list.)
4) What is a general rule of thumb for determining the most appropriate
R format for different types of variables (i.e., as.matrix, as.factor,
as.numeric)? We have run into a few different error messages regarding
formatting, and that has been difficult for us to troubleshoot without
much intuition for the different formats.
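Here is roughly what we have pieced together so far for 1) through 3) (a
sketch; the variable names are made up)--corrections very welcome:

## 1) discourage scientific notation in printed output
options(scipen = 10)

## 2) subset first (which() drops NAs), then hand the subset to zelig()
hrs.sub <- hrs[which(hrs$BPMED == 1), ]
## z.out <- zelig(y ~ x1 + x2, model = "logit", data = hrs.sub)

## 3) cross tabulation with row percentages (margin = 2 for columns)
tab <- table(hrs$BPMED, hrs$FEMALE)   # FEMALE is a made-up variable
prop.table(tab, margin = 1) * 100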
Thanks so much for any tips you can provide - we really appreciate it.
Best,
Katy
--
Katy Backes Kozhimannil, M.P.A.
Ph.D. Program in Health Policy
Resident Tutor, Adams House
Harvard University
474 Adams Mail Center
Cambridge, MA 02138
kbackes at fas.harvard.edu
Hi everyone,
I'm trying to tackle 1.6. Maybe some of you can help.
I'm basically trying to follow what Gary does on pp. 84-86 of his book. I
ran a Zelig logit twice, once using the date variable and once without it.
Then I took the coefficient estimates and plugged them into our logit
model, \pi = 1/(1 + exp(-X\beta)), thus getting two sets of \pi estimates
(again, one with the date variable and one without).
But when I took the ratio of the two \pi estimates according to the
formula on pg. 84, R = (-2) * log(logit.no.date/logit.date), I got
multiple R values, some of which are negative (which I know is wrong--R
should always be positive, since the model with more variables will always
have more explanatory power).
What's wrong with this approach?
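(I wonder whether the problem is that the formula compares the models'
likelihoods--the product of the \pi_i over all the observations, or
equivalently the sum of their logs--so that R comes out as a single
number, something like this sketch with made-up names:

logit.date    <- glm(y ~ x + date, family = binomial, data = d)
logit.no.date <- glm(y ~ x,        family = binomial, data = d)
R <- -2 * (logLik(logit.no.date) - logLik(logit.date))  # one scalar, >= 0
pchisq(as.numeric(R), df = 1, lower.tail = FALSE)       # p-value

rather than one R per observation?)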
thanks for any help--
Maya
Hi Gavril,
it should be one graph, with the x-axis displaying
the temperature and the y-axis displaying the
expected probability given this temperature,
leaving pressure at its observed values. The
expected probabilities at each temperature will be
the averages of the expected probabilities for all
23 observations.
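In code, the graph would look something like this sketch (the object
names are made up):

fit   <- glm(failure ~ temp + pressure, family = binomial, data = orings)
temps <- seq(31, 81, by = 1)
ev    <- sapply(temps, function(t) {
  X <- cbind(1, t, orings$pressure)   # pin temperature, keep the 23
                                      # observed pressures
  mean(plogis(X %*% coef(fit)))       # average over the 23 observations
})
plot(temps, ev, type = "l", xlab = "Temperature",
     ylab = "Expected probability of failure")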
Hope that helps,
Holger
Bilev, Gavril wrote:
> Hey Holger,
> sorry to bug you again - but just a quick question to make sure - in 1.4 you expect 3 different graphs, correct? 1 for every different level of Pressure (there are only 3) or do you expect us to average them somehow and combine them into 1?
> Best,
> Gav
--
Holger Lutz Kern
Graduate Student
Department of Government
Cornell University
Institute for Quantitative Social Science
Harvard University
1737 Cambridge Street N350
Cambridge, MA 02138
www.people.cornell.edu/pages/hlk23
I know this has been discussed on at least one prior thread, but I'm not
sure I understand (or even see) the conceptual difference between problems
1.3 and 1.4 on the problem set.
I know that's a vague question, so thanks for any help--
Maya
Hi Patrick,
you are almost right. However, even in a logit
model, expected and predicted values are not the
same. The shortcut you're referring to saves you
steps 4 and 5 of the expected value algorithm.
Compare that to the 4 steps in the algorithm for
the predicted value ...
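In sketch form (with made-up objects), the distinction is:

library(MASS)
fit   <- glm(y ~ x, family = binomial, data = d)
betas <- mvrnorm(1000, coef(fit), vcov(fit))  # step 1: draw betas
x0    <- c(1, 0.5)                            # a chosen covariate profile
pis   <- as.vector(plogis(betas %*% x0))      # steps 2-3: probabilities

## expected values: steps 4-5 average M Bernoulli draws per beta; in
## logit that average converges back to pi, hence the shortcut
ev <- sapply(pis, function(p) mean(rbinom(1000, 1, p)))

## predicted values: stop after ONE Bernoulli draw per beta, keeping the
## fundamental uncertainty--these are 0s and 1s, not probabilities
pv <- rbinom(length(pis), 1, pis)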
Holger
Patrick Lam wrote:
> Hi Holger,
>
> I had a question on the problem set. In 1.4 and 1.5, when you refer to
> expected probabilities versus predicted probabilities, I'm not sure what
> you mean. According to King, Tomz, and Wittenberg, we should take the
> following steps.
>
> 1) Draw a value of beta.
> 2) Multiply it by X
> 3) Transform that into a probability.
> 4) Draw M simulations (1s and 0s) from the Bernoulli distribution using
> that probability. (this accounts for fundamental uncertainty)
> 5) Average out the simulations. This is the expected probability and in
> logit, is equal to the probability derived in step 3.
> 6) Repeat for all betas
>
> My question is what you mean by expected versus predicted
> probabilities. I think you are trying to get at how the expected and
> predicted probabilities are the same in the logit case. So is the
> expected probability that you are referring to just the probabilities
> without accounting for fundamental uncertainty ( i.e. just taking the
> probabilities from step 3) and the predicted probabilities are just
> going through all the steps? Or do you mean that the expected
> probabilities are when M is large and the predicted probabilities are
> when M is small in step 4.
>
> Thanks
>
> -Patrick
--
Holger Lutz Kern
Graduate Student
Department of Government
Cornell University
Institute for Quantitative Social Science
Harvard University
1737 Cambridge Street N350
Cambridge, MA 02138
www.people.cornell.edu/pages/hlk23