Hi Akthem, 

I believe this error is due to the machine or R running out of RAM. You could check whether the code runs without the parallel arguments (perhaps setting m = 1 to test a single imputation), since the parallel back ends sometimes don't handle large data sets well. If you still get an error there, then it may be that some of the internal copying of the data.frame is maxing out your RAM (we do try to minimize this). Let us know if that works for you.
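
For reference, here is a minimal sketch of that serial test, reusing the data and variable names from your call below (it just drops the parallel/ncpus/collect arguments and sets m = 1):

library(Amelia)

## same lag/lead variables as in your original call
vars <- c("C0", "C1", "C5", "C6", "C16", "C17", "C18", "C19")

## serial test: a single imputation, no snow cluster
test_serial <- amelia(Query1[1:2e6, ], m = 1, p2s = 2, cs = NULL, ts = "TIME",
                      idvars = c("D78", "D82", "D83"),
                      lags = vars, leads = vars)

If that completes without error, the problem is likely in how the snow workers handle copies of the data rather than in the imputation itself.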

Cheers,
Matt


~~~~~~~~~~~
Matthew Blackwell
Assistant Professor of Government
Harvard University

On Wed, Jan 10, 2018 at 8:52 AM, Akthem Rehab <akthem@gmail.com> wrote:

Hi All,

I am using Amelia to impute a time-series data set generated from sensors in an industrial setting. I am imputing 8 variables (I picked only continuous variables for imputation) across ~40M readings (one reading per second).

Here is my Amelia code:

Test <- amelia(Query1[1:2e6, ], m = 3, p2s = 2, cs = NULL, ts = "TIME",
               incheck = TRUE, parallel = "snow", ncpus = 3, collect = TRUE,
               idvars = c("D78", "D82", "D83"),
               lags  = c("C0", "C1", "C5", "C6", "C16", "C17", "C18", "C19"),
               leads = c("C0", "C1", "C5", "C6", "C16", "C17", "C18", "C19"))

The code runs fine as long as the number of readings does not exceed ~1.2M. Beyond that, I receive the following error:

Error in unserialize(node$con) : error reading from connection

Some investigation suggests this has to do with the parallel workers. I noticed that the memory per worker climbs to ~4GB and then drops back down just before the error is generated.

I am running Windows Server 2016 with the Oracle Distribution of R v3.3.0. Amelia is version 1.7.4.

I tried to troubleshoot with Oracle Community support before finding out that the issue also occurs when the data is a plain data.frame rather than an ORE.Frame.

Here is the link to the troubleshooting thread: https://community.oracle.com/thread/4109587

I appreciate your support.

Regards,
Akthem