For our replication project, Justin Grimmer and I have been exploring
missing data problems. We were given a data set by the authors that
was stripped of most identifiable information. Alas, it retains only
those variables necessary for the authors' OLS regressions, even
though it has a rich data history: it was produced by merging 1990
Census data with a detailed 1986 survey of municipalities. The
authors omitted any uniquely identifying variable that would let us
link back to the original databases, copies of which we have in our
possession.
To identify the missing data points (the unit of analysis is
municipalities), we are trying to link the data table they used in
their regressions with the 1986 raw data or the 1990 Census. Lacking
a unique identifier, we have been unable to get an exact match, but we
have been trying to link on the following variables:
- Log of 1990 Census population, rounded to either 2 or 3 decimal
places. (The problem here is that increasing precision captures
rounding errors between the two data sets, while decreasing precision
leads to too many false matches when we merge.) Because the authors
rounded the log of population to 3 (or so) decimal places, we also get
rounding errors if we simply exponentiate.
- Census region.
- A variety of dummy variables for various survey responses. These
have been only somewhat useful, since they do not form a unique
identifier.
- We think the states are listed in alphabetical order in their data
table (based on the sequencing of regions), but we aren't sure how to
use this fact.
Linking on any combination of these produces either too few
identifiable matches or too many. We've been using the merge()
function in R. It seems to work okay, but it is not good at
identifying what's causing mismatches. We have gotten close to linking
the 1990 Census data with the 1986 survey, but this only gets us to
their first step; it doesn't let us work backward to match with their
table.
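For concreteness, here is the kind of tolerance-based matching we have
been experimenting with (a rough sketch; the data frames regtab and
census and their columns logpop and region are placeholder names):

# match each regression-table row to Census rows in the same region
# whose log population agrees to within half a unit in the third
# decimal place
tol <- 5e-4
candidates <- lapply(1:nrow(regtab), function(i) {
    which(census$region == regtab$region[i] &
          abs(census$logpop - regtab$logpop[i]) < tol)
})
# 0 = no match, 1 = unique link, >1 = ambiguous
table(sapply(candidates, length))

The unique links can then be checked against the alphabetical state
ordering as a sanity check, but we still end up with many ambiguous
cases.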
Has anyone on the list dealt with such a problem in the past? Any
suggestions are greatly appreciated. We're considering going back to
the authors, but we suspect they may not have what we need.
Thanks,
Clayton and Justin
Hi,
I was thinking about how to do exponential regression and came up with this
quick optimization. I was considering using it to fill in missing parts of a
Zipf distribution, but I have decided that the assumptions to do so are not
met, particularly considering that political manifestos in German (which has a
lot of compound words) have a pretty well-filled Zipf distribution whereas
manifestos in English (which has a lot of multi-word terms) do not. I am
curious as to how to adapt this estimator to time-series analysis and to make
it more robust.
Geoff
# the function to optimize: the weighted sum of absolute errors on the
# original (untransformed) scale; beta_2 is passed in as its reciprocal
# to keep the optimizer well scaled
f <- function(par, X, Y, W = rep(1, nrow(X)))
{
    beta <- par
    beta[2] <- 1 / par[2]
    sum(t(W) %*% abs(exp(X %*% beta) - Y))
}

# read in the data and set up variables (note that "table" shadows the
# base R function of the same name)
table <- read.csv("table.csv", header = TRUE)
Y <- as.matrix(table[7])
X <- cbind(1, rev(1:nrow(Y)))
W <- as.matrix(table[6])

# linear regression on log-transformed Y to get starting values,
# storing the reciprocal of beta_2 for optimization simplicity
lm.out <- lm(log(Y) ~ X[, 2], weights = as.vector(W))
par <- c(coefficients(lm.out)[1], 1 / coefficients(lm.out)[2])
b0 <- par

# minimize the absolute-error criterion, then undo the reciprocal
betahat <- optim(par, f, method = "CG", X = X, Y = Y, W = W)$par
betahat[2] <- 1 / betahat[2]

# plot the data with both fits
plot(X[, 2], Y, main = "Price of IBM vs Time",
     xlab = "day", ylab = "adjusted price")
lines(X[, 2], exp(X %*% betahat), col = "blue")
b0[2] <- 1 / b0[2]
lines(X[, 2], exp(X %*% b0), col = "red")
legend(x = 0, y = 120,
       legend = c("Naive Transformed Least Squares",
                  "Absolute Least Error Predictor"),
       fill = c("red", "blue"), bty = "n")
when you write up your replication, it would be helpful if you
explained what the model is that you are replicating. when you do
this, don't just write down the name of the model, since the same
models are often called different things.
what you should do is to write down the model in full mathematical
notation so we can all figure out what it is. the way to do this is
exactly as i've been doing in class, including all the first principles
necessary to produce the likelihood function. this will always include a
stochastic component and a systematic component, and often more.
when you write mathematical notation, be sure that each and every symbol
is defined. if you have an index, such as for an observation number, be
sure to say what it goes from and to (e.g., Y_i for observation i,
i=1,...,n).
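for instance, a complete write-up of a simple linear-normal model (a
generic example, not any particular model from class) would look like
this:

    Y_i \sim N(\mu_i, \sigma^2)      (stochastic component)
    \mu_i = x_i \beta                (systematic component)

where i = 1, ..., n indexes observations, Y_i is the outcome for
observation i, x_i is a 1 x k vector of explanatory variables, \beta
is a k x 1 vector of effect parameters, and \sigma^2 is the variance.
from these first principles the likelihood follows as

    L(\beta, \sigma^2 | y) \propto \prod_{i=1}^n \sigma^{-1}
        \exp\{ -(y_i - x_i \beta)^2 / (2 \sigma^2) \}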
Gary
Hi All
Can somebody help me understand the two types of simulation that Gary
lectured on? I am still a bit confused. I use SPSS for my logit work,
but I strongly believe that we have to move beyond calculating simple
betas and odds and instead give quantities of interest along with
uncertainty.
Suppose Beta = .02501 for education and Beta = .06531 for income in a
logistic regression equation: logit(turnout) = .02501 education +
.06531 income. I would like to see through an example how you would
simulate the impact of race on turnout
1. while holding income and education constant at their means.
2. with income in the 30,000 to 45,000 dollar bracket and less than a
high school education.
Can somebody give an example by drawing three to four samples?
Also, suppose you have used a logistic regression model to produce a
predicted probability of voting for each case in the sample of a state
or an area, counting probabilities below .50 as not voting and above
.50 as voting. How can you show the impact of changing the value of a
variable, e.g., moving everyone with less than a high school education
up to at least a high school education, on the predicted turnout of,
say, 45 percent for the sample?
That is, I would like to be able to say that changing a certain
variable (a kind of first difference) would improve total turnout from
45 percent to 50 percent, or whatever.
I know I can do that in SPSS, but it won't give me uncertainty or
confidence intervals, which most analysts don't give for this type of
"what if" analysis. I am going through Wolfinger and Rosenstone's "Who
Votes?"; excellent work, but no confidence intervals or uncertainty
around the quantities of interest they calculate through probit.
How can you use Zelig for producing such quantities of interest?
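From the lecture I think the workflow looks something like the sketch
below, but please correct me if I have the syntax wrong (the data
frame d and its variable names are made up for illustration):

library(Zelig)
# estimate the logit model
z.out <- zelig(turnout ~ educ + income + race, model = "logit", data = d)
# scenario 1: vary race, holding education and income at their means
# (setx defaults unspecified covariates to their means)
x.0 <- setx(z.out, race = 0)
x.1 <- setx(z.out, race = 1)
# scenario 2: mid-bracket income (37,500 chosen as the midpoint of
# 30,000-45,000) with less than a high school education (11 years)
x.2 <- setx(z.out, income = 37500, educ = 11)
# simulate: summary() reports expected probabilities and the first
# difference between x and x1, each with simulation-based intervals
s.out <- sim(z.out, x = x.0, x1 = x.1)
summary(s.out)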
Bilal
Hi everyone,
The gate in the basement of CGIS has gone down, but I'm still in the
training lab if anyone wants to stop by for the next 30 minutes. You can
enter through the front door of the Fung Library.
Best,
Ian
Hi everyone,
A quick reminder that I will be holding office hours today from 4:30-6:30
in CGIS N018. If you are not in some tropical paradise, and have
questions/concerns/frustrations about the replication project, please feel
free to stop by.
Best,
Ian
Hi,
I'm having trouble installing Zelig on a MacBook laptop (the kind
with the Intel CPU).
For example, running this line
> install.packages("Zelig", repos = "http://gking.harvard.edu")
gives errors
Warning in install.packages("Zelig", repos = "http://gking.harvard.edu") :
argument 'lib' is missing: using ~/Library/R/library
Warning: unable to access index for repository
http://gking.harvard.edu/bin/macosx/i686/contrib/2.2
Warning in download.packages(pkgs, destdir = tmpd, available = available, :
no package 'Zelig' at the repositories
and indeed the URL there doesn't work.
Other repositories and commands (like
source("http://gking.../install.R")) give similar errors.
Is there a Zelig binary somewhere that's compatible with intel macs?
-aram
Hi everyone,
I got an email from the FAS computing staff tonight saying they have
completed the migration of everyone's home directories, and as a result
they have to do some final upgrade on the icegov servers tonight.
They claim this will only affect three users on icegov1 (whom I have
already contacted), but just to be safe, I would recommend backing up all
your files to your local machine tonight if at all possible.
Please let me know ASAP if you are having any server issues.
Best,
Ian