Hi all,
Problem set 7 has been graded. A couple things:
1. The Zero-Inflated Poisson model has two stochastic components. The
first is the choice of data-generating process, either process 1 or
process 2. This choice follows a Bernoulli distribution. The second
component is conditional on that choice: Y is either zero with certainty
or distributed Poisson. This problem was almost universally missed. Most
of you got the second component but not the first.
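To make the two components concrete, here is a minimal simulation sketch. The names pi.zero and lambda and their values are made up for illustration, and I am assuming the Bernoulli picks the "always zero" process with probability pi.zero; your notation may label the processes differently.

```r
set.seed(1)
n <- 1000
pi.zero <- 0.3  # hypothetical probability of the always-zero process
lambda <- 2     # hypothetical Poisson mean

# Stochastic component 1: a Bernoulli draw picks the process for each unit
always.zero <- rbinom(n, size = 1, prob = pi.zero)

# Stochastic component 2: conditional on the process, Y is 0 or Poisson
y <- ifelse(always.zero == 1, 0, rpois(n, lambda))
```

Note that a zero in y can come from either process, which is why the first component cannot be left out.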
2. When drawing your parameters from the multivariate normal, you need to
draw all of them together at once. Some of you drew betas and gammas
separately from different multivariate normals, using subsets of the
variance-covariance matrix. This is wrong, as you are discarding the
covariances between parameters. In the same way you don't draw each beta
separately, you shouldn't draw any parameters separately regardless of what
Greek letter they are.
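A sketch of what a joint draw could look like, using mvrnorm() from the MASS package. The objects par.hat and vcov.hat are made-up stand-ins for your estimates and their variance-covariance matrix, and the counts of betas and gammas are arbitrary.

```r
library(MASS)  # provides mvrnorm()

set.seed(2)
k.beta <- 3   # hypothetical number of betas
k.gamma <- 2  # hypothetical number of gammas

# Made-up estimates and a made-up positive-definite covariance matrix
par.hat <- c(0.5, -1.0, 0.2, 0.1, 0.8)
A <- matrix(rnorm(25), nrow = 5)
vcov.hat <- crossprod(A) / 10

# One joint draw of ALL parameters, so beta-gamma covariances are kept
draws <- mvrnorm(n = 1000, mu = par.hat, Sigma = vcov.hat)

# Split into betas and gammas only AFTER drawing
sim.betas <- draws[, 1:k.beta]
sim.gammas <- draws[, (k.beta + 1):(k.beta + k.gamma)]
```

Each row of draws is one simulated parameter vector, so row i of sim.betas belongs with row i of sim.gammas.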
3. In many of your problem sets, there is an excessive use of for loops.
This is a very, very bad coding habit. In general, unless you are
iterating some process, you should not use a for loop because it is the most
inefficient way to do things. I notice that many of you use for loops for
simple vector operations. For example, if you have a vector x, and you want
to multiply every element by 2, you should do:
new.x <- x * 2
and NOT
new.x <- c()
for (i in 1:length(x)) {
  new.x[i] <- 2 * x[i]
}
This may seem to be a trivial example, but in almost all your problem sets,
there is something exactly like this, especially in log-likelihood
functions. This whole problem set could have (and should have) been done
without a single for loop, so I encourage you all to look at the answer key
and code. In future methods classes and your future work, you need to be
mindful of this and NOT use for loops. Code that takes seconds to run may
end up taking hours because of excessive for loops. I cannot stress this
point enough.
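As an illustration of the same idea inside a log-likelihood, here is a sketch using a plain Poisson regression with a log link. This is not the problem set model. The function names ll.pois and ll.pois.loop and the simulated data are invented for this example.

```r
set.seed(3)
n <- 500
X <- cbind(1, rnorm(n))           # simulated design matrix with intercept
true.par <- c(0.5, 0.3)           # made-up coefficients
y <- rpois(n, exp(X %*% true.par))

# Vectorized: one matrix product, one call to dpois(), one sum
ll.pois <- function(par, X, y) {
  lambda <- as.vector(exp(X %*% par))
  sum(dpois(y, lambda, log = TRUE))
}

# Loop version, shown only for comparison
ll.pois.loop <- function(par, X, y) {
  ll <- 0
  for (i in seq_along(y)) {
    ll <- ll + dpois(y[i], exp(sum(X[i, ] * par)), log = TRUE)
  }
  ll
}
```

Both functions return the same value, but the vectorized one avoids n separate calls to dpois(), and that cost adds up when optim() evaluates the function many times.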
4. Finally, remember not to hardcode anything in your functions, especially
when indicating which elements of your parameter vector are betas and
gammas. For example, don't do this in your log-likelihood functions:
betas <- par[1:6]
gammas <- par[7:8]
Instead do
betas <- par[1:ncol(X)]
gammas <- par[(ncol(X) + 1):(ncol(X) + ncol(Z))]
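Putting this indexing into a full function, a sketch of a ZIP log-likelihood might look like the following. I am assuming a log link for the Poisson mean and a logit link for the probability of the always-zero process; check this against the answer key, which may parameterize the model differently. The name ll.zip and the simulated data are made up.

```r
ll.zip <- function(par, X, Z, y) {
  # Index by the dimensions of X and Z, never by hardcoded positions
  betas <- par[1:ncol(X)]
  gammas <- par[(ncol(X) + 1):(ncol(X) + ncol(Z))]

  lambda <- as.vector(exp(X %*% betas))       # Poisson mean (assumed log link)
  pi.zero <- as.vector(plogis(Z %*% gammas))  # P(always zero) (assumed logit)

  # A zero can come from either process; a positive count only from the Poisson
  lik <- ifelse(y == 0,
                pi.zero + (1 - pi.zero) * dpois(0, lambda),
                (1 - pi.zero) * dpois(y, lambda))
  sum(log(lik))
}

# Usage on simulated data; works unchanged if columns are added to X or Z
set.seed(4)
n <- 200
X <- cbind(1, rnorm(n), rnorm(n))
Z <- cbind(1, rnorm(n))
y <- ifelse(rbinom(n, 1, 0.3) == 1, 0, rpois(n, 2))
start.val <- rep(0, ncol(X) + ncol(Z))
ll.start <- ll.zip(start.val, X, Z, y)
```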
--
Patrick Lam
Department of Government and Institute for Quantitative Social Science,
Harvard University
http://www.people.fas.harvard.edu/~plam