You need to increase the memory available to R. In your .Renviron file (if you
don't have one, create this file in your home directory), add these lines:
R_LIBS=~/.R/library
R_NSIZE=5000k
R_VSIZE=100M
Change the memory specification as necessary. You have to restart R
every time you change this file. NOTE: The first line specifies the local
directory into which additional packages from CRAN are installed.
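After restarting, you can verify that the settings took effect (gc() should
report the limits when they are set):
> Sys.getenv(c("R_LIBS", "R_NSIZE", "R_VSIZE"))
> gc()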
Kosuke
On Sun, 6 Apr 2003, Ryan Thomas Moore wrote:
>
> Any idea what this Error is? The data set is only 36000 by 40!
>
> Thanks,
> Ryan
>
> > pro.1 <- glm(COL1 ~ SIZE + RTNQ1BUY + AGE + FRTN6 + AVGVOL + NASVOL +
> + STDRET + MINMAX + S.P + D.P + SLDY + SGI + GMG + R.D + LSY + CHGEPS +
> + ACCRUAL + CAPX,
> + family = binomial(link = "probit"), data = est.samp2)
> Error in rep(data, t1) : cannot allocate vector of length 1264086916
>
> --
> ------------------------------------------
> Ryan T. Moore ~ Government & Social Policy
> Ph.D. Candidate ~ Harvard University
>
Please turn in the replication/comment assignment tomorrow. Your memo
should include, at least, the following:
1. Indicate whether you were able to replicate the results. If you were
not able to replicate, you should include the results you got and
indicate where the problems might be.
2. Make suggestions about the possible extensions. Be as concrete as
possible.
Please bring two copies of your memo; one for the student and the other
for me.
Good luck,
Kosuke
---------- Forwarded message ----------
Date: Sun, 6 Apr 2003 13:20:14 -0400 (EDT)
From: Lanhee Joseph Chen <lchen@fas.harvard.edu>
To: Kosuke Imai <kimai@fas.harvard.edu>
Subject: Question about replication memo assignment
Kosuke:
I had a quick question about the replication assignment due tomorrow -- do
we actually need to replicate the paper's tables and furnish those in our
memo, or are we just replicating for our own edification and simply
commenting on the paper and providing helpful suggestions in the memo?
Thanks,
Lanhee
--------------------------------------------------------------------
Lanhee J. Chen 18 Banks Street #309
Doctoral Candidate Cambridge, MA 02138
Department of Government (617) 492-9555
Harvard University lchen@fas.harvard.edu
--------------------------------------------------------------------
On Sat, 5 Apr 2003, Phillip Y. Lipscy wrote:
> Hi Kosuke/Gary,
>
> I have a question about first differences in our replication. Hiscox uses
> Clarify to calculate the first differences for several variables and gets the
> "effects estimated for change in each variable from minimum (0) to maximum (1)
> values for equations including only that variable and bill dummies."
>
> So two questions:
>
> 1. Does it make sense to vary the variables from 0 to 1 in a given period, when,
> for example, the empirical maximum value for the variable never exceeds 0.3?
> Would it be better to vary it from 0 to 0.1 or something? In this case, 1 is
> not even a "maximum" because in other periods it takes on values like "1.3"
> (i.e., there's no theoretical reason why 1 would be privileged).
good question. and you have the right intuition. the answer depends on
whether the counterfactual is reasonable. if you find that inflation
influences pres'l approval, you wouldn't ask (in the u.s.) what approval
would be if inflation were 200%! what's right in this case is a matter of
theory, not data, but there are some things that can be said. have a look
at my paper on counterfactuals (when can history be our guide) with
Langche Zeng at my web site under Preprints.
> 2. Is it OK to leave all other explanatory variables out of the equation when
> calculating first differences? Or is it better to hold them to their
> mean/median, or does it make any difference?
do you want to control for these explanatory vars? if so then you must do
it explicitly by including them and holding them constant (which is the
same thing as controlling for them).
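for instance, a minimal sketch in r (made-up names y, x1, x2, dat), varying x1
from its observed min to its max while holding x2 at its mean:

library(MASS)
fit <- glm(y ~ x1 + x2, family = binomial(link = "probit"), data = dat)
draws <- mvrnorm(1000, coef(fit), vcov(fit))   # simulate the betas
x.lo <- c(1, min(dat$x1), mean(dat$x2))        # intercept, x1 at min, x2 at mean
x.hi <- c(1, max(dat$x1), mean(dat$x2))        # intercept, x1 at max, x2 at mean
fd <- pnorm(draws %*% x.hi) - pnorm(draws %*% x.lo)   # simulated first differences
mean(fd); quantile(fd, c(0.025, 0.975))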
Gary
>
> I'm specifically talking about Table 2 (p. 9), footnote b (I'll attach the
> article in case you want to look at it).
>
> Thanks!
> Phillip.
>
>
>
> -------------------------------------------------
> Phillip Y. Lipscy
> Perkins Hall Room #129
> 35 Oxford Street
> Cambridge, MA 02138
> (617)493-4893 DORM
> (617)851-8220 CELL
> lipscy@fas.harvard.edu
> http://www.people.fas.harvard.edu/~lipscy/
>
> First Year Student, Ph.D. Program
> Harvard University, FAS, Department of Government
> -------------------------------------------------
Dear All,
Quick question about using Amelia, Gary et al.'s data imputation program.
When we type our filename in the input section, the program and the window in
which it was operating both disappear. We suspect this is due either to
our file type (space-delimited ASCII .dat, with NAs labeled "NA") or to
our dataset's size (n=29,233; 343 variables). We were curious: does anyone
know what we might be doing wrong?
Many thanks.
Best,
Dan
Hi all,
In the article I'm replicating, the data were analyzed in Stata using
"generalized least squares corrected for first order autocorrelation
using panel specific process". What does this mean, and how is it done in R?
The author also includes "a set of 26 dummy variables, one for each year
covered by the data (1971-1997) *less one* to mitigate autocorrelation". So
which dummy should not be included when correcting for autocorrelation? The
one for 1997?
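The closest thing I've found in R is gls() in the nlme package with an AR(1)
correlation structure, though as far as I can tell it fits a single common rho
rather than a panel-specific one. A sketch with made-up names (y, x1, year,
country, panel.dat):

library(nlme)
fit <- gls(y ~ x1 + factor(year),
           correlation = corAR1(form = ~ year | country),
           data = panel.dat)

Is this close enough, or is there a better way?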
Thanks,
Asif
Hi Fei and everyone else who might have had this problem,
I'm trying to open Stata files in R. I loaded library(foreign), and I get
this:
> xmatrix<-read.dta("~/.vnc/xmatrix.dta")
Error in read.dta("~/.vnc/xmatrix.dta") :
Not a Stata version 5-7/SE .dta file
Any suggestions? Other people have worked with this file, so I assume
it's convertible somehow...
On Tue, 1 Apr 2003 Fei_Yu@ksgphd.harvard.edu wrote:
> Please find attached the datasets for the paper. I will send out the
> replication report separately.
>
> We got the data from the following World Bank website:
>
> http://www.worldbank.org/lsms/country/china/chnhome.html
>
> The datasets attached to this email were compiled by us, after
> merging/modifying the original survey data.
>
> As mentioned in the report, we were not able to replicate the results due
> to technical difficulties. Your advice on possible ways to improve
> replication is most welcome. I have written to the authors asking for
> advice, but haven't heard from them yet.
>
> Cheers,
>
> Fei
>
A couple of things...
1. Today's section materials are now available on the course website.
2. The replication/comments assignment is due in class on Monday. Please
bring two copies of your memo; one for the student and the other for me.
The purpose
of this assignment is to help your friend. Try to find problems in the
analysis and suggest possible extensions.
3. Assignment 7 is due April 16. Hence, there is no section next week.
4. If you want to talk to me about the project, e-mail me to make an
appointment. Remember that you only have about one month to finish the
project.
Good luck!
Kosuke
> From what I understand, the difference between the two graphs we are
> supposed to draw in question (a) is PRESINC. Also, each graph will have
> two curves: one for low probability of DWIN, the other for high
> probability. Assuming this is correct (?), two questions remain:
That's right.
> To separate estimates in the case of incumbency and non-incumbency,
> should we subset the data, so that PRESINC equals 1 for one subset, and
> 0 or -1 for the other? Following this, should we run a separate glm probit
> regression for each subset? If so, there will be no coefficients for
> PRESINC and PRESINC*JULYECQ2 in the first subset, since PRESINC always
> equals 1 there...
No... You can run one regression and then set the values of the explanatory
variables according to each scenario.
> Should we simulate multiple parameters with mvrnorm() as in HW5, and
> then calculate multiple pi-hat values (to be plotted on the x-axis in
> this case)?
Yes, once you do that, you can draw a density plot, which is a smoothed
version of a histogram (see my solution set to HW5 for the code).
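For example (a sketch; here fit is your fitted glm object and x.scen is a
vector of covariate values set according to one scenario, e.g. PRESINC = 1):

library(MASS)
draws <- mvrnorm(1000, coef(fit), vcov(fit))  # simulate the parameters
pi.hat <- pnorm(draws %*% x.scen)             # pi-hats under this scenario
plot(density(pi.hat))                         # smoothed histogram of pi.hat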
Kosuke
I'm getting a straight line for part (b) when I plot the mean probability
versus the ADAACA variable value. This is the code I use:
adaacaProb <- c()
i <- 1
for (value in min(data$ADAACA):max(data$ADAACA)) {
  explan <- as.matrix(c(1, max(data$JULYECQ2), 0, value, 0))
  prob <- pnorm(par.draws %*% explan)  # (sorting before taking the mean has no effect)
  adaacaProb[i] <- mean(prob)
  i <- i + 1
}
Could anyone tell me what I could be doing wrong or what the problem might
be? Thanks!
~mee-jung