Dear James,
I am now implementing Mamadou's suggestions. I previously tried this on a 64 GB PC, where I was able to watch how much RAM was being consumed (in the Windows settings); I saw the same error when R was using 50 GB of RAM. I will implement your suggestions and contact the HPC lab to check whether R is really configured properly on Linux.
Many thanks for your suggestions,
Best,
Ömer

_________________________________
Ömer Faruk Örsün
PhD Candidate
Department of International Relations
Koç University
CAS 289
_________________________________


On Thu, Feb 7, 2013 at 8:50 PM, Honaker, James <jhonaker@iq.harvard.edu> wrote:
Dear Ömer,

I'd second some of Matt's points.  The 95 variables you point to is not too extreme, but when you turn on "intercs" you are creating very many more variables.  Exactly how many more depends on the number of unique values, v, in your cross-sectional variable "cs" and the degree/order, k, of your spline or polynomials of time.  You will be adding v*k variables.  If you have 100 countries (for example) and a 5th-order spline, you have added 500 variables (to the 95 you started with).  In very large settings, I would build up slowly from simple models that computationally work in your environment, and then increase the complexity of the model to see how large you can get before it fails.  If it fails right off the bat in the simplest settings, that might be a pointer to a problem elsewhere (a common slip I've fallen into is too many unique values in the "cs" variable, like a country code that includes the year).
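
To get a rough sense of that size before running anything, a quick back-of-the-envelope check in R (a sketch only; it assumes your data frame is called "mydata" and your "cs" variable is "dyadid", as in your original call, and you should set k to however many time terms per unit your settings imply):

v <- length(unique(mydata$dyadid))   # number of unique cross-sectional units
k <- 3                               # time terms per unit; adjust to your polynomial/spline order
ncol(mydata) + v * k                 # rough width of the matrix Amelia has to carry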

As for large-memory machines, forgive me if my comments are too obvious, but one pragmatic tip is to see if there is any way you can get load monitoring of your R process, and check how much memory your job has before it fails, perhaps with something as simple as the Linux "top" command.  You can do a little of this within R using gc().  In my experience in some high-performance settings, you have to bother the admins to adjust your privileges before you can actually get the amount of memory that is theoretically available, once the cluster has been burned by some user with a forgotten job that never terminates and a memory leak.  Also, if they don't run R commonly, it might not be configured to take advantage of the server's capabilities.
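
For example (a minimal sketch; gc() only reports R's own usage, and the process id is just there so you can point "top" at the right job):

gc(verbose = TRUE)   # run the garbage collector and print how much memory R currently holds
Sys.getpid()         # R's process id, to watch from a shell with: top -p <pid>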

James.


________________________________________
From: amelia-bounces@lists.gking.harvard.edu [amelia-bounces@lists.gking.harvard.edu] On Behalf Of OMER FARUK Orsun [oorsun@ku.edu.tr]
Sent: Thursday, February 07, 2013 1:04 PM
To: Ndiaye, Mamadou
Cc: amelia@lists.gking.harvard.edu
Subject: Re: [amelia] Error of " resulting vector exceeds vector length limit in 'AnswerType'"

Dear Ndiaye,
Thanks a lot for your suggestion.
Best,
Ömer

_________________________________
Ömer Faruk Örsün
PhD Candidate
Department of International Relations
Koç University
CAS 289
_________________________________


On Thu, Feb 7, 2013 at 7:41 PM, Ndiaye, Mamadou <MNdiaye@publichealthmdc.com> wrote:
To work around the memory limitations impeding R, I found the package SOAR very useful:
http://cran.r-project.org/web/packages/SOAR/vignettes/SOAR.pdf
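
For example (a minimal sketch; it assumes a large object such as "mydata" is already in the workspace):

library(SOAR)
Store(mydata)   # moves the object to a disk cache and frees it from RAM; it is lazily reloaded on next use
Objects()       # lists what is currently held in the cache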

Thank you
M. Ndiaye

From: amelia-bounces@lists.gking.harvard.edu [mailto:amelia-bounces@lists.gking.harvard.edu] On Behalf Of Matt Blackwell
Sent: Thursday, February 07, 2013 11:24 AM
To: OMER FARUK Orsun
Cc: amelia@lists.gking.harvard.edu
Subject: Re: [amelia] Error of " resulting vector exceeds vector length limit in 'AnswerType'"

Hi Ömer,

First, note that you may not have enough observations to get good imputations with that many variables. Amelia might have poor properties in that case. You can save a lot of hassle here by not interacting the polynomials of time with the cross-section (setting "intercs = FALSE").
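
For example, reusing the call from your original message with only "intercs" changed (a sketch):

a.out <- amelia(mydata, m = 10, p2s = 2, tolerance = 0.005,
                empri = .1 * nrow(mydata), ts = "year", cs = "dyadid",
                polytime = 2, intercs = FALSE)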

I imagine you have 500 GB of hard disk space, not RAM, but either way, this is probably related to the maximum vector length that R can handle, which is currently 2^31 - 1 elements. Obviously that is *very* large, but if you want to go beyond it you would have to use R 3.0.0 (still under development), which will allow longer vectors on certain machines. For more information, see this help file in R:

?"Memory-limits"

If you are on Windows, you might be able to increase the amount of memory dedicated to the R process.
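
For example (a sketch; memory.limit() works only on Windows, sizes are in megabytes, and the 64000 is just an illustration):

memory.limit()               # current cap on R's memory allocation, in MB
memory.limit(size = 64000)   # request a larger cap (here roughly 64 GB), if the machine has it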

Hope that helps!

Cheers,
matt.

~~~~~~~~~~~
Matthew Blackwell
Assistant Professor of Political Science
University of Rochester
url: http://www.mattblackwell.org

On Thu, Feb 7, 2013 at 12:10 PM, OMER FARUK Orsun <oorsun@ku.edu.tr> wrote:
Hi Matt,
Many thanks for your response. The missingness in my data is severe; as a result, I might need to include all available data. Is there another way to avoid memory-related errors, given that I have a computer with 500 GB of RAM?
Best Regards,
Ömer

_________________________________
Ömer Faruk Örsün
PhD Candidate
Department of International Relations
Koç University
CAS 289
_________________________________

On Thu, Feb 7, 2013 at 4:54 PM, Matt Blackwell <m.blackwell@rochester.edu> wrote:
Hi Ömer,

It seems as though you are running into memory issues with R itself. Note that using "intercs = TRUE" and "polytime = 2" will add 3*K variables to the data, where K is the number of dyads in the data. Given your description of the data, that could be an extremely large data set. You might want to run Amelia on a smaller subset of the data to see how the imputations go and then tentatively test out smaller imputation models.
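
For example (a rough sketch; the 50 dyads and m = 2 are arbitrary choices just to test the machinery, and "dyadid" comes from your call):

sub_ids <- sample(unique(mydata$dyadid), 50)    # a manageable handful of dyads
mysub   <- mydata[mydata$dyadid %in% sub_ids, ]
a.test  <- amelia(mysub, m = 2, p2s = 2, ts = "year", cs = "dyadid",
                  polytime = 2, intercs = TRUE)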

Hope that helps!

Cheers,
matt.

~~~~~~~~~~~
Matthew Blackwell
Assistant Professor of Political Science
University of Rochester
url: http://www.mattblackwell.org

On Thu, Feb 7, 2013 at 7:24 AM, OMER FARUK Orsun <oorsun@ku.edu.tr> wrote:

Dear Listers,

I am using Amelia II (version 1.6.4) on a 500 GB machine. My data consist of directed dyads, and my imputation model has 94 variables and 493,853 observations. I use the following command:



library(Amelia)
library(foreign)

mydata <- read.dta("data.dta")    # Stata file: 94 variables, 493,853 directed-dyad observations
set.seed(1234)

a.out <- amelia(mydata, m = 10, p2s = 2, tolerance = 0.005,
                empri = .1 * nrow(mydata), ts = "year", cs = "dyadid",
                polytime = 2, intercs = TRUE)



After 7 hours, I receive the following message:



amelia starting
beginning prep functions
Error in cbind(deparse.level, ...) :
  resulting vector exceeds vector length limit in 'AnswerType'

I've already searched the Amelia II and R archives, but I was not able to locate a solution.

I would deeply appreciate any help!

Best Regards,

Ömer

_________________________________
Ömer Faruk Örsün
PhD Candidate
Department of International Relations
Koç University
CAS 289
_________________________________

--
Amelia mailing list served by HUIT
[Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
More info about Amelia: http://gking.harvard.edu/amelia