Hi Joseph,
To the extent possible, you should try to write your code to minimize
the strain you put on R's memory. Here are some general suggestions:
With large datasets specifically, there are a few things you can do:
1. Increase the memory allotted to R:
http://gking.harvard.edu/zelig/docs/How_do_I2.html (it looks like
you've already done this)
2. Selectively read in only the columns/variables you need (using
read.table or scan) -- see the sketch after this list
3. Try your luck with one of the packages people have written for
large data sets -- I haven't had much luck with them, but you're welcome
to try. filehash and biglm are two of the packages I've heard about. See
http://yusung.blogspot.com/2007/09/dealing-with-large-data-set-in-r.html
http://n4.nabble.com/Large-data-sets-with-R-binding-to-hadoop-available-td8…
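For (2), here is a minimal sketch of reading only selected columns via
read.table's colClasses argument. The file name and the column positions kept
below are made up for illustration; setting a column's class to "NULL" tells
read.table to skip that column entirely.

# Peek at the header alone to learn how many columns the file has
header <- read.table("bigdata.csv", header = TRUE, sep = ",", nrows = 1)

# Keep, say, columns 2 and 5 only: "NULL" drops a column,
# NA lets read.table guess the type of the columns you keep
classes <- rep("NULL", ncol(header))
classes[c(2, 5)] <- NA
dat <- read.table("bigdata.csv", header = TRUE, sep = ",",
                  colClasses = classes)

Supplying nrows (even a mild over-estimate of the row count) also helps memory
use, since read.table otherwise has to grow its buffers as it reads.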
But your point is well taken -- we'll try our best to pair up people
who have large data to work with. That might provide some fruitful
exchange of ideas, and it would also pair people who are already
reasonably well equipped to handle lots of data.
Maya
On Tue, Mar 23, 2010 at 2:12 AM, Gavinlertvatana, Poj
<pgavinlertvatana at hbs.edu> wrote:
Hi,
I have questions about memory & runtime:
1. For the replication study, do we have to worry about the computing resources of
the reviewer? I'm asking because some of my datasets won't load on my laptop, and I'm
using a computing server (20 GB of memory allocated to R) to do the computation. One of my
datasets is c. 700 MB.
2. Is there a more efficient way of running a regression with fixed effects?
I'm trying to replicate Stata code that uses 'areg' with 'absorb(x)', which looks like
it runs the regression with x as a factor, i.e., as fixed-effect dummy variables. Stata
runs this REALLY quickly (minutes), while my code in R runs really slowly (many more
minutes).
My code looks something like:
x1.factor <- factor(x1)
lm(y ~ x1.factor + x2 + x3 + ...)
The paper only cares about coefficient estimates for x2, and doesn't care about
the coefficients for x1.factor. x1 has about 120 categories. I was wondering if anyone knew
of a more efficient way to run this, since (1) x1 has lots of categories, and (2) I don't
care (yet) about the coefficients for x1.factor.
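For reference, here is a rough sketch of the within (demeaning) transformation
that absorb() effectively applies for a single fixed effect. The data frame and
variable names are illustrative, not from my actual code. By the
Frisch-Waugh-Lovell theorem the slope estimates for x2 and x3 match the
dummy-variable regression, but lm()'s reported standard errors ignore the ~120
absorbed degrees of freedom, so they would need a correction.

# Subtract the group (x1-level) mean from each variable
demean <- function(v, g) v - ave(v, g)
y.w  <- demean(dat$y,  dat$x1)
x2.w <- demean(dat$x2, dat$x1)
x3.w <- demean(dat$x3, dat$x1)

# Regress on the demeaned variables -- no 120-level factor to expand
fit <- lm(y.w ~ x2.w + x3.w)
coef(fit)["x2.w"]   # the coefficient of interest on x2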
Thanks!
Best regards,
Joseph
Joseph Poj Gavinlertvatana
Doctoral student, Marketing
Harvard Business School
203 Wyss Hall, Soldiers Field, Boston, MA 02163
Ph  617.230.5907
Fx  617.496.4397
Txt/Vm 617.910.0563
Em pgavinlertvatana at hbs.edu
_______________________________________________
gov2001-l mailing list
gov2001-l at lists.fas.harvard.edu
http://lists.fas.harvard.edu/mailman/listinfo/gov2001-l