[gov2001-l] Performance - Gov2001

27 Mar 2008

Hi all,

Did anyone encounter any issues with performance? We
ran into a running-time bottleneck and are stuck in
it.

Our dataset contains about 1mln entries. And iterating
through it with a few manipulations takes over
24 hours. :(

Does anyone have any general pointers about how to make
R code more efficient? For example, we gather that
doing things like dat[dat[['DATE']] == myDate,] are very
expensive operations. Is this true?

It's no surprise that R, just like MATLAB, exposes the
tradeoff of concise code and performance, but we need
to get this replication done somehow. Perhaps doing some
initial data-filtering tasks in C++ is a viable solution? :)
How do people usually deal with the problem of "too much
data"?

Thank you!!

-Alexei and Ben