Hi! Sorry to keep bothering everyone. Patrick, thank you for your last
response; it was incredibly helpful. I have a question about deviance
residuals. Here is an example:
Deviance Residuals:
    Min      1Q  Median      3Q     Max
 -1.793  -1.037  -0.753   0.449   2.755
Compare that to another model we ran using the same distribution (negative
binomial) but more explanatory variables:
Deviance Residuals:
    Min      1Q  Median      3Q     Max
 -1.997  -0.960  -0.644   0.428   2.879
So how do we know which one has the better "fit"? Are bigger or smaller
numbers better?
Sorry to keep bugging everyone on the list!
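P.S. In case it helps to say what we're comparing: here is how we've been
pulling out overall fit numbers, as a sketch assuming the two models were
fit with MASS::glm.nb() and saved as m1 and m2 (the names are ours, not
from our actual script):

library(MASS)
deviance(m1); deviance(m2)   # residual deviance: lower means a closer fit
AIC(m1); AIC(m2)             # AIC penalizes the extra covariates
anova(m1, m2)                # likelihood-ratio test if the models are nested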
Hi all,
Quick optim question:
The paper Roman and I are replicating uses discrete-time event-history logistic
regression. We've been able to get the right point estimates and standard
errors thus far using Zelig's logit commands, but when we program our own logit
function, we run into some problems. Specifically, the Zelig and numeric MLE
estimates are nearly identical when we specify the model with all 16 covariates
but one. Adding the final covariate radically shrinks all parameter estimates.
The final covariate is on a greater scale (thousands) than the others (tenths
to tens). If we divide it by 1000 and then run the regression, we get the right
estimates for the other covariates and (unsurprisingly) an estimate 1000 times
greater than the correct value for the final covariate. My guess is that this
has something to do with how optim is performing optimization. Does anyone have
any suggestions?
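One thing we're planning to try, in case it matters for the answer: optim's
parscale control is meant to handle parameters that live on very different
scales. A sketch, assuming our own log-likelihood function logit.ll and data
X and y (names are ours, and this is untested):

p <- ncol(X)
scales <- apply(abs(X), 2, max)    # rough magnitude of each column
opt <- optim(par = rep(0, p), fn = logit.ll, y = y, X = X,
             method = "BFGS", hessian = TRUE,
             control = list(fnscale = -1,           # maximize
                            parscale = 1 / scales)) # big column -> small coef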
Many thanks,
chris
Hi all,
I'm trying to read the source code for some functions that are
included in the package "nlme". Specifically, I'd like to see the
source code for "lme" and "nlme". However, instead of a long bunch of
code (as I get for example from typing in "lm" or "survreg"), I just
get the short bits of code I've pasted in below.
Can anybody tell me how I can see the source code for these functions?
I don't really get the idea behind "UseMethod(X)"--it just seems to
hide what's really going on.
I've even looked through the various files associated with R on my
computer's hard drive, but I can't figure out where to find the
functions.
Thanks in advance for any assistance,
Malcolm
> lme
function (fixed, data = sys.frame(sys.parent()), random,
    correlation = NULL, weights = NULL, subset,
    method = c("REML", "ML"), na.action = na.fail,
    control = list(), contrasts = NULL, keep.data = TRUE)
UseMethod("lme")
<environment: namespace:nlme>
> nlme
function (model, data = sys.frame(sys.parent()), fixed, random = fixed,
groups, start, correlation = NULL, weights = NULL, subset,
method = c("ML", "REML"), na.action = na.fail, naPattern,
control = list(), verbose = FALSE)
{
UseMethod("nlme")
}
<environment: namespace:nlme>
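(lme is an S3 generic: UseMethod("lme") dispatches to a class-specific
method, here lme.formula, and the method is where the actual code lives. A
sketch of how to pull it up:)

> methods("lme")               # list the methods behind the generic
> getAnywhere("lme.formula")   # print a method even if it isn't exported
> nlme:::lme.formula           # or reach into the namespace directly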
Last thing: we get these warning messages when we run negbin in Zelig:
1: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = control$trace > 2) :
  iteration limit reached
2: In theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace = control$trace > 2) :
  iteration limit reached
Does that mean there are not enough observations for the number of
explanatory variables? We don't get it when we use fewer explanatory
variables.
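A note on where that warning comes from: it is theta.ml (the routine that
estimates the negative binomial dispersion parameter theta) hitting its
iteration cap, not a hard error about sample size. It often means the theta
estimate has not converged, sometimes because theta is drifting toward very
large values (i.e., little overdispersion). The cap can be raised when
fitting directly with MASS; a sketch with placeholder names (dat, x1, x2 are
ours):

library(MASS)
fit <- glm.nb(y ~ x1 + x2, data = dat,
              control = glm.control(maxit = 100))  # default maxit is 25
fit$theta      # the estimated dispersion parameter
fit$SE.theta   # and its standard error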
Hi everyone!
JeeHye and I are working on a paper that uses a negative binomial model.
The dependent variable is the number of uses of force in a given period of
time (it might have been every three months or something like that...). So
this seems like it might be pretty NOT iid, and that is why they use the
neg bin.
So we ran this in R using Zelig with robust clustered sandwich standard
errors, and their key explanatory variable (the percent in Congress from the
same political party as the president during the period of interest) pretty
much did not come out significant. So we thought, okay, let's try Stata. We
did, and there was an option to cluster on a particular variable...they
clustered on "president". We got their results: their explanatory variable
of interest came out significant. So why was this? Why would you cluster
your standard errors on something? When is it okay to do this? When is it
not? To be honest, what does clustering your standard errors on something
even mean?
Also, how would we go about checking for overdispersion? We want to try the
Poisson model and then compare it to the neg bin, to see if there even is
overdispersion (which there must be). But how do you check for it? And are
there other ways of controlling for dependence between observations in event
count models? With other regression types there are splines and time dummy
variables; could you do something similar with an event count model?
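On the overdispersion check, here is one textbook-style comparison we found,
sketched with placeholder names (y, x1, x2, dat are ours) and offered as a
starting point rather than a definitive recipe:

library(MASS)
pois <- glm(y ~ x1 + x2, family = poisson, data = dat)
nb   <- glm.nb(y ~ x1 + x2, data = dat)
# Pearson dispersion statistic: roughly 1 under the Poisson assumption,
# well above 1 suggests overdispersion
sum(residuals(pois, type = "pearson")^2) / pois$df.residual
# Likelihood-ratio test of Poisson against neg bin; theta sits on the
# boundary of the parameter space under the null, so halve the p-value
lr <- as.numeric(2 * (logLik(nb) - logLik(pois)))
pchisq(lr, df = 1, lower.tail = FALSE) / 2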
Sorry about all the questions...I guess it would just be helpful to hear
back from some of you regarding whatever it is that you know about a subset
of them, all of them, or none of them...whatever comments you have.
Hope everyone is having a great break!
sparsha
Hi all,
There have been requests for help on possible problems with the replication
projects. Because of this, I will hold office hours this week at my usual
time, Wed 2-4. Feel free to come by. A couple of notes:
1) You should continue to seek help from the listserv, even for questions
related to your replications, as it is likely that others are having
similar problems.
2) If you're dealing with models that you are unfamiliar with, you should
always start by reading up on them in other references or textbooks.
3) If you run into coding problems, such as when using Zelig or other R
packages, always try debugging yourself first. It is likely that you made
an error somewhere in the process.
Let us know if there are other questions.
Dear list....
We are seeking to replicate an ordered probit model. Among the independent
variables are Democratic ID, Republican ID, and Independent ID, as well as
North Central, South, West, and Northeast. All of these are dummy variables.
Below the table it states, "The reference variable for party identification
is 'Independent' and for region it is 'North East'." In running our
regression we have not specified which variable is the reference variable.
Should we do so and, if so, do you know how?
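One possibility, sketched in case it helps: with separate dummy variables,
the reference category is simply whichever dummy is left out of the
regression, so omitting the Independent and North East dummies should match
the table. Alternatively, if the dummies are collapsed into factors,
relevel() sets the baseline. A sketch with made-up names (dat, party,
region, vote are ours):

dat$party  <- relevel(factor(dat$party),  ref = "Independent")
dat$region <- relevel(factor(dat$region), ref = "North East")
library(MASS)
fit <- polr(vote ~ party + region, data = dat, method = "probit")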
Also, we can get close to the estimates, with the correct significance on
the different variables, but we can't quite get there. Is it possible to
give some indication of how close is close enough?
Thanks
Kyle & Sam
I was trying to use that xtable thing on Zelig outputs, specifically for
2.1, but it doesn't work for Zelig. Is there something similar that turns
Zelig summaries into sweet LaTeX code so we don't have to do it manually?
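One untested workaround, assuming the Zelig summary stores a coefficient
matrix the way a glm summary does (z.out is a made-up name for the zelig()
fit):

library(xtable)
s <- summary(z.out)
xtable(coef(s))   # hand the coefficient matrix straight to xtable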
Pardon the interruption. I meant for this question to go to the list.
Thanks!
---------- Forwarded message ----------
From: Lauren Peritz <lauren.peritz at gmail.com>
Date: Wed, Mar 18, 2009 at 7:31 PM
Subject: Fwd: [gov2001-l] PS6 1.3
To: Patrick Lam <plam at fas.harvard.edu>
Hi Patrick,
Is there a good source for commonly used link functions? For the
t-distribution in question 1.3, I've found the pre-programmed R function.
I've also found some generalized formulae employing the gamma function.
However, I'm still confused, despite your hints below. Any other advice?
Thanks,
Lauren
# "t-function" log-likelihood
tfunct.ll <- function(par, X, y){
#par=parameters
#X=matrix of covariates with intercept
#y=dependent variable
# how do we link parameters to distribution of y? delta <- (mu - m0) *
sqrt(n)/sigma?
delta <- par[1:ncol(X)]
out<-sum(y*log(pt(X,3,delta))+(1-y)*log(pt(X,3,delta)))
return(out)
}
---------- Forwarded message ----------
From: Patrick Lam <plam at fas.harvard.edu>
Date: Sun, Mar 15, 2009 at 1:45 AM
Subject: Re: [gov2001-l] PS6 1.3
To: gov2001-l at lists.fas.harvard.edu
Hi Olena,
I'd go back and perhaps look at my section notes. Ask yourself the
following questions:
What are the parameters I'm trying to find here?
How do they relate to x and y?
How do I link my parameters to the distribution of y?
Also, if you managed to do the scobit problem correctly, then this isn't
all that much different from that problem.
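For what it's worth, one reading of those hints, offered strictly as a
sketch and not the official answer: treat the t CDF itself as the link, so
that pi_i = Pr(y_i = 1) = pt(X_i %*% beta, df = 3), with beta the only
parameters to estimate:

t.ll <- function(par, X, y) {
  beta <- par                   # one coefficient per column of X
  p <- pt(X %*% beta, df = 3)   # systematic component through the t CDF
  sum(y * log(p) + (1 - y) * log(1 - p))
}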
On Sat, Mar 14, 2009 at 9:11 PM, Olena Ageyeva <eageeva at hotmail.com> wrote:
> Has anyone figured out what the formula is?
> I tried to implement it exactly as it says in the PS, at least as I
> understood it, and I really doubt that it's right.
>
> This is what, from my understanding, it should look like:
>
> ll.t <- function(par, x, y)
> {
>   delta <- par
>   out <- sum(y*log(pt(x, 3, delta)) + (1-y)*log(pt(x, 3, delta)))
>   return(out)
> }
>
> I have tried x%*%beta instead of x as well and have gotten weird results
> which did not look right to me.
>
> What am I doing wrong?
> I have no study partner and am really tight on time. Could anyone help
> please?
> Thanks!
> Sincerely,
> Olena Ageyeva
--
Patrick Lam
Department of Government and Institute for Quantitative Social Science,
Harvard University
http://www.people.fas.harvard.edu/~plam
> One thing jumps out...you seem to be missing a fourth "0" in your
> optim() line. It should read:
> optim.out <- optim(par = c(0,0,0,0), scobit.ll, y = y, X = X,
> method="BFGS",control=list(fnscale = -1), hessian = T)
In addition, I think you may have both some missing and extra negative
signs in this part of your log-lik function:
outA <- sum(-y*log((1+exp(-mu))^a))
outB <- sum((1-y)*log(1-(1+exp(-mu))^a))
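For reference, if the intended systematic component is the scobit form
Pr(y_i = 1) = (1 + exp(-mu_i))^(-a), one sign-safe way to write those two
pieces (a sketch in our notation, not necessarily the problem set's):

p    <- (1 + exp(-mu))^(-a)       # scobit probability that y = 1
outA <- sum(y * log(p))
outB <- sum((1 - y) * log(1 - p))
ll   <- outA + outB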