In the tobit model for problem 2, \mu_i is just the mean, not a vector of
means.
--
Patrick Lam
Department of Government and Institute for Quantitative Social Science,
Harvard University
http://www.people.fas.harvard.edu/~plam
Hi everyone, I hope all your papers are going well.
I am trying to write an R function that will produce a fairly complicated
graph by taking in, among other things, a variable name, and that at one
point uses setx in Zelig to produce a number of values. In setx, the argument
name is actually the variable name, and here is where I run into trouble: I
can't seem to take an argument into my function and have it come out as an
argument name in another function. For example, imagine this reduced example:
myfunction <- function(var) {
  result <- setx(modelobj, data=data,
                 var=seq(from=xrange[1], to=xrange[2], length.out=100))
}
The key piece is the `var` argument: it's not the value of the variable I
need there, but the name of the variable. I have tried a character string,
assign(), paste(), substitute() and attribute(). For those who know Stata,
this would be fairly easily solved by calling a local (in that case the
local is created automatically and you just write `var'). R is not a macro
language, though, so I assume there is a better way that I just haven't
found yet. Any thoughts?
Brandon
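One approach that works here is to build the argument list programmatically and hand it to the inner function with do.call(), which lets a character string become an argument name. A sketch, assuming the modelobj, data, and xrange objects from the question and Zelig's setx():

```r
## Sketch: pass a character string through as an argument *name* via do.call().
## modelobj, data, and xrange are assumed to exist; setx() is Zelig's function.
myfunction <- function(varname) {
  args <- list(modelobj, data = data)
  args[[varname]] <- seq(from = xrange[1], to = xrange[2], length.out = 100)
  do.call(setx, args)
}
## e.g. myfunction("gdp") is equivalent to calling
## setx(modelobj, data = data, gdp = seq(from = xrange[1], to = xrange[2], length.out = 100))
```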
Dear Professor King, Miya, and Patrick,
Just wanted to mention something that happened to me: this piece of R code
using Amelia only runs on one of the HMDC computers. I have tried four other
computers and get an error message about unused arguments (concerning the
variable with NAs). I will stick to the working computer, hoping that it runs
because that machine has the right Amelia package (and not because something
is wrong that, for weird reasons, still goes our way!).
Thought I would mention it in case anyone else has had the same
issue... really weird!
model.1.new <- scrugg.f[, c("uercovch", "laguercov", "lagleftc", "lagtradeopen",
                            "lagopenn", "laguerate", "laggrow", "hrsveto",
                            "siaroff", "ggdeflag", "bbb", "blaguercov",
                            "blagleftc", "blagtradeopen", "blagopenn",
                            "blaguerate", "blaggrow", "bhrsveto", "bsiaroff",
                            "bggdef", "dum1", "dum2", "dum3", "dum4", "dum5",
                            "dum6", "dum7", "dum8", "dum10", "dum12", "dum13",
                            "dum14", "dum18", "dum19", "dum20", "dum21",
                            "counter", "year")]
## use amelia to impute missing data
model.1.new.am.list <- amelia(x=model.1.new, m=5,
                              idvars=c("dum1", "dum2", "dum3", "dum4", "dum5",
                                       "dum6", "dum7", "dum8", "dum10", "dum12",
                                       "dum13", "dum14", "dum18", "dum19",
                                       "dum20", "dum21"),
                              ts="year", cs="counter", polytime=3)$imputations
best regards
Charlotte
hi!
A quick (but fundamental) question: in exercise 1, gamma needs to be
positive, so I re-parameterized. To get the right MLE from optim(), I didn't
forget to transform back, and I get the same result as the one I found
analytically. However, things change when I look for the SE: I get different
answers analytically and with R (using the Hessian). I know this comes from
the re-parameterization, because I don't have this issue when I change the
whole function and do not re-parameterize gamma. So my questions are:
- If I re-parameterize, how do I apply the transformation to the Hessian to
get the right result?
- How does that fit with the section notes that follow? Why do we take pnorm
(the transformation applied in the ll.binom function) of "opt.1000 - 1.96*se"
and not of "se", for instance?
# binomial log-likelihood (N = # of trials for each observation)
ll.binom <- function(par, y, N){
  # reparameterize pi; only search over [0,1]
  p <- pnorm(par)
  # log-likelihood
  out <- sum(y*log(p) + (N-y)*log(1-p))
  return(out)
}
# compare to wald ci
se <- sqrt(solve(-optim(par=2, fn=ll.binom, y=samp.1000, N=10, method="BFGS",
                        control=list(fnscale=-1), hessian=TRUE)$hessian))
wald.ci <- c(pnorm(opt.1000 - 1.96*se), pnorm(opt.1000 + 1.96*se))
wald.ci # 0.7364839 0.7535663
- If I do not re-parameterize, so that my analytical and R results agree,
how can I justify not re-parameterizing gamma?!
thanks!
charlotte
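On the first question above: the standard tool is the delta method, which says the variance of a transformation g(theta.hat) is approximately g'(theta.hat)^2 times the variance of theta.hat. A sketch, using opt.1000 and se from the section-notes snippet (where the transformation is pnorm, whose derivative is dnorm):

```r
## Delta-method sketch (uses opt.1000 and se from the snippet above).
## The reparameterization is p = pnorm(theta), so dp/dtheta = dnorm(theta),
## and se(p.hat) is approximately dnorm(theta.hat) * se(theta.hat).
se.p <- dnorm(opt.1000) * se
wald.ci.delta <- c(pnorm(opt.1000) - 1.96*se.p,
                   pnorm(opt.1000) + 1.96*se.p)
```

The snippet in the notes instead transforms the two CI endpoints directly, which is also defensible (and keeps the interval inside [0,1]); the two intervals agree to first order but need not be identical.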
Hi all,
A few announcements:
1) Please take the time to fill out course evaluations if you have not done
so. They are very valuable to us.
2) For those doing a replication project, please remember that your papers
are due on May 4 at 5pm EST. Please submit your papers to the Problem Set
Dropbox under the folder "Final". We only need one copy per group, so only
one of you has to submit it. Also, we only need your papers. We do not
need any of your code or data, although you should all keep copies of it
neatly formatted for future use.
3) For extension school students who are not doing a replication project,
your final has been posted on the class website in the same folder as the
problem sets. Your final is also due on May 4 at 5pm EST. Please submit
your final writeup and R code file to the Dropbox under the "Final" folder
just like any other problem set. You are NOT allowed to collaborate on the
final. If you have any questions, please email both TFs, and NOT the class
email list.
--
Patrick Lam
Department of Government and Institute for Quantitative Social Science,
Harvard University
http://www.people.fas.harvard.edu/~plam
Hi, we are trying to get predicted values for a negative binomial model. We
have our X matrix of observed values and some beta parameters (which we will
draw from a multivariate normal). Our problem is the dispersion parameter,
sigma squared: how do we draw it? We think we should get one sigma squared
per draw of k + 1 betas (k being the number of covariates), but we are a
little confused. Would sigma squared just be the variance of the mvrnorm
distribution at each draw?
Then, how do we incorporate the sigma squared values into the link function?
Or is it the case that, when we are drawing our y's, we incorporate the mean
of the draws of sigma squared values (so the mean of 1000 values, right)?
thanks!
best,
sparsha
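One common recipe, sketched under assumptions: suppose the model was fit with MASS::glm.nb() (the fit object nb.fit and the covariate vector x.vec below are hypothetical names). You draw one dispersion value per draw of the betas, and the dispersion enters the stochastic component (the rnbinom() draw), not the link function. Note also that glm.nb() parameterizes dispersion as theta (rnbinom()'s size), not sigma squared:

```r
## Sketch: simulating predicted values from a negative binomial fit.
## nb.fit (from MASS::glm.nb) and x.vec (one row of covariate values,
## including the intercept) are assumed, hypothetical objects.
library(MASS)
sims   <- 1000
betas  <- mvrnorm(sims, coef(nb.fit), vcov(nb.fit))   # k+1 betas per draw
thetas <- rnorm(sims, nb.fit$theta, nb.fit$SE.theta)  # one dispersion per draw
                                                      # (normal approx.; drawing on the
                                                      # log scale avoids negative values)
mu     <- exp(betas %*% x.vec)                        # systematic component: link uses betas only
y.pred <- rnbinom(sims, size = thetas, mu = mu)       # stochastic component uses the dispersion
```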
I have a question that I wanted to ask Gary that has to do with
developing research questions in methodology itself, but I figured I
might as well do it here on the list in case others are interested.
I have a background in some "machine learning" (SVMs, the Ising model
and exact sampling, MCMC, k-nearest neighbor, HMMs), but have seen and
used those methods in relation to classic problems, e.g. vision, image
processing, robotics, and natural language processing.
I was wondering if you could provide insight into how to bring
complicated, difficult-to-understand methods from more "technical"
fields like computer science and statistics into political methodology.
I see two possible routes: (1) Do you stay on top of the statistics
and comp sci literature as it's developing and then say "oh this might
apply to this problem in political science"? Or (2) is it more often
the case that you see some problem in the political science lit or the
real world, and then you search for solutions to those in other
fields? How fruitful is it to look to what others have done outside
political science vs. spending the time to try to come up with your
own algorithm or model?
You seem to develop many methods on your own; do computer scientists
ever come look at your work and say, "wow, this solves a problem
we've been having"? I know we're all in universities because being
around other scholars makes everyone more productive, but how much do
the social scientists and the computer scientists interact, for
example?
To give a concrete example, I've been thinking about this as I go over
hidden Markov models in my artificial intelligence class (here at
Brown). So if you assume that some process is a Markov process with
unobserved states, one of the canonical problems is to figure out,
from some output sequence, the most likely state transitions and output
probabilities. To solve this, most people use a special case of the EM
algorithm, which is what brought these two separate classes together
for me. So I have this model that has been shown to be useful in gene
prediction, cryptanalysis, and such things, and it seems like there
could be some real political science applications but I'm not sure
what those are exactly. This is a case where I'm operating under the
first method from above-- I've found some cool solution to a problem
I'm not sure I have, although I often think the second method from
above makes more sense.
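For concreteness, the likelihood computation that underlies EM for HMMs (the forward algorithm) fits in a few lines of R; everything below (forward.lik, A, B, p0) is a toy, made-up illustration:

```r
## Forward algorithm sketch: likelihood of an observed symbol sequence under
## an HMM with transition matrix A, emission matrix B (rows = states,
## columns = symbols), and initial state distribution p0. Toy objects only.
forward.lik <- function(obs, A, B, p0) {
  alpha <- p0 * B[, obs[1]]                         # initialize with first emission
  for (t in seq_along(obs)[-1]) {
    alpha <- as.vector(alpha %*% A) * B[, obs[t]]   # propagate states, then emit
  }
  sum(alpha)                                        # total probability of obs
}
## e.g., with 2 states and 2 symbols:
## A  <- matrix(c(.9, .1, .2, .8), 2, byrow = TRUE)
## B  <- matrix(c(.5, .5, .1, .9), 2, byrow = TRUE)
## p0 <- c(.5, .5)
## forward.lik(c(1, 2, 2), A, B, p0)
```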
To give a slightly more personal explanation for my curiosity:
substantively I'm interested in what are often thought of as political
science questions, but I get this relative joy in reading statistics
and computer science articles, and I'm trying to figure out how to
bring that together. You've given us a lot of intuition and skills to
take substantive empirical questions and develop methods to solve the
wide variety of problems we might have, but how do you wake up one day
with the goal of creating a new clustering methodology or trying to
model something using an HMM when few people in political science even
know what those are?
Cheers,
Zac
Hi all,
A few reminders:
1. Like last week, we will be having only one section this week at 8-9pm.
This will likely be our last section, so we encourage you all to attend.
We will be covering a LOT of material. The general topic will be
"Introduction to Bayesian Statistics", and we will try to cover (time
permitting) missing data, hierarchical (random effects) models, and
item-response theory (ideal point estimation) models from both the Bayesian
and non-Bayesian perspectives. At the end, if there's time, I'll give a
brief overview of using BibTeX with LaTeX.
2. The party at Gary's house is this Saturday at noon. Please RSVP if you
haven't done so.
3. There will be no more problem sets for the rest of the semester. Your
replication papers are due on May 4 by 5pm. For extension school students
who are not doing the replication project, your final assignment will be
emailed out on Monday April 27. It will also be due on May 4 by 5pm.
4. Last, but most important, the course evaluations for this class are now
up online. Please take some time to fill them out. They are very valuable
and important to all of us.
--
Patrick Lam
Department of Government and Institute for Quantitative Social Science,
Harvard University
http://www.people.fas.harvard.edu/~plam
Dear all,
In thinking about recent lectures, I'm a bit confused. If I understood
correctly, Gary mentioned one day that, really, nobody is much
interested in non-causal associations. But if that's true, then I'm
unclear about the implications. In a garden-variety social science
journal article with some sort of regression, the authors will go
through the models they report, and comment on the various independent
variables that appear to have significant "effects" on the dependent
variable--which sounds like they're trying to talk about a number of
causal relationships simultaneously, consistent with the idea that
"nobody is much interested in non-causal associations".
However, the recent lectures about matching, checking for balance,
research design, post-treatment bias, counterfactuals, etc. suggest
that to talk about even just a *single* causal effect you need to bear
down, and check and do a whole bunch of things... which, as I see it,
few journal articles generally do.
So what gives? Is Gary saying that existing practice is just not up to
snuff--that they're being wildly unrealistic in trying to parse out
several causal relationships in a single article? What then is the
implication for our own best practice? Should one just pick out a
single covariate on which to focus (using matching, checking balance,
etc.)? Or should one go through any single regression model and, for
*every* (categorical) independent variable that appears to be
significant, use matching on all the other covariates?
As an example, suppose you stick in religion as a covariate in a
regression with countries as the unit of analysis, and while religion
isn't really what you're interested in, you happen to find an effect for,
say, being Catholic. To talk about the surprising, apparent "effect" of
religion on your outcome of interest, should you then use matching,
check whether counterfactuals are inside the convex hull, etc.?
Any clarification would be much appreciated.
- Malcolm