Re: [amelia] Data Imputation with Amelia on large dataset: Taking very long time

30 Jan 2016

Hi Mithilesh,

My guess is that you might be asking too much of the data here. You are
including a separate quadratic function of time for each cross-sectional
unit in the data (polytime = 2, intercs=TRUE) and this might be problematic
if some of the characteristics of the cross-sectional unit are constant
within unit. Can you try running Amelia with intercs = FALSE and see whether
(a) things speed up and (b) the error message disappears?
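
One way to look for the kind of problem described above, before rerunning anything, is to check which variables never vary, either overall or within a cross-sectional unit. The following is only a rough base-R sketch on invented toy data: the column names are borrowed from the thread, while the values and the helper `n_distinct` are made up for illustration.

```r
# Toy data standing in for `dt`; column names follow the thread, values are invented.
toy <- data.frame(
  destinationcountry = c("DE", "DE", "FR", "FR", "FR"),
  cartype            = c("suv", "sedan", "suv", "suv", NA),
  language           = c("en", "en", "en", NA, "en"),
  stringsAsFactors   = FALSE
)

# Hypothetical helper: number of distinct non-missing values in a vector.
n_distinct <- function(x) length(unique(x[!is.na(x)]))

# Columns with fewer than 2 observed categories in the whole data set.
overall <- sapply(toy, n_distinct)
single_level <- names(overall)[overall < 2]

# For each remaining variable, the cross-sectional units in which it never varies.
vars <- setdiff(names(toy), "destinationcountry")
constant_within <- lapply(toy[vars], function(x) {
  per_unit <- tapply(x, toy$destinationcountry, n_distinct)
  names(per_unit)[per_unit < 2]
})

print(single_level)
print(constant_within)
```

Variables that show up in either result are natural suspects for the "2 or more levels" error, though this check alone does not prove they are the cause.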

Also, what version of Amelia are you using? There was a bug with that error
message in previous versions, but it should be fixed in 1.7.4.
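
For reference, the installed version can be checked from within R. This is a small sketch; `has_min_version` is a hypothetical helper written for this example, not part of Amelia, and it returns NA when the package is not installed.

```r
# Hypothetical helper: TRUE if the installed package is at least `minimum`,
# FALSE if it is older, NA if the package is not installed.
has_min_version <- function(pkg, minimum) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(NA)
  utils::packageVersion(pkg) >= minimum
}

# 1.7.4 is the version mentioned in this thread.
print(has_min_version("Amelia", "1.7.4"))
```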

Cheers,
Matt

On Sat, Jan 30, 2016 at 12:53 PM, Mithilesh Kumar <mithileshk.in(a)gmail.com>
wrote:

...
  Hi Matt,

 After running with the ridge prior for 4 hours, I am getting the following error:

 Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
   contrasts can be applied only to factors with 2 or more levels
 In addition: There were 19 warnings (use warnings() to see them)

 I am using the following code:

 dt.out <- amelia(
   x = dt, m = 3,
   idvars = "device_unique_id",
   ts = "pickupdate", cs = "destinationcountry",
   priors = NULL, lags = NULL, empri = 0.01 * nrow(dt),
   polytime = 2, intercs = TRUE, p2s = 2, incheck = TRUE, ords = NULL,
   noms = c("cartype", "AirportTransaction", "status", "browser.x",
            "interactionchannel", "paymentmethod", "segmentname",
            "ip_address", "geo_country", "geo_region", "operating_system",
            "browser.y", "language", "creative_freq", "creative_rec",
            "user_group_id", "is_remarketing", "post_click_conv",
            "post_view_conv", "advertiser_frequency", "advertiser_recency",
            "latitude", "longitude", "device_model_id"))

 Regards,

 On Sat, Jan 30, 2016 at 9:35 AM, Matt Blackwell <
 mblackwell(a)gov.harvard.edu> wrote:

  Hi Mithilesh,

 It's not so much a limitation on the number of observations, but you are
 asking a lot of Amelia here. If there are 28 categorical variables, each
 with more than 10 categories (and you have marked them as such), then you are
 adding roughly 280 variables to the imputation model, which is quite a few. But
 that shouldn't be too bad, given the size of your data. The extremely high
 missingness rate seems the more likely culprit. You might try using the
 ridge prior (the "empri" argument in the amelia function). See section 4.7.1 of
 the vignette for more information about this setting:

 https://cran.r-project.org/web/packages/Amelia/vignettes/amelia.pdf
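
To make the "roughly 280 variables" arithmetic concrete, one could count the observed categories of each variable marked as nominal. A minimal sketch on invented toy data follows; the column names and values are assumptions, and exactly how many columns Amelia adds per category is not something this snippet verifies.

```r
# Toy stand-in for the real data; values are invented.
toy <- data.frame(
  cartype = c("suv", "sedan", "van", NA),
  browser = c("chrome", "firefox", "chrome", "safari"),
  status  = c("ok", "ok", "cancelled", NA),
  stringsAsFactors = FALSE
)
noms <- c("cartype", "browser", "status")

# Observed (non-missing) categories per nominal variable, and their total.
levels_per_var <- sapply(toy[noms], function(x) length(unique(x[!is.na(x)])))
total_categories <- sum(levels_per_var)

print(levels_per_var)
print(total_categories)
```

A large total relative to the amount of observed data is a hint that the imputation model is being asked to estimate a lot.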


 Cheers,
 Matt

 ~~~~~~~~~~~
 Matthew Blackwell
 Assistant Professor of Government
 Harvard University
 url: http://www.mattblackwell.org


 On Fri, Jan 29, 2016 at 10:54 PM, Mithilesh Kumar <
 mithileshk.in(a)gmail.com> wrote:

 I have 761,592 observations of 31 variables on users' behaviour towards online
 ads. Of the 31 variables, 28 are categorical, and many of them have more
 than 10 categories. I am using Amelia for missing data imputation.

 It's taking a very long time. Are there faster ways to do it? What are
 Amelia's limits on the number of observations?

 Is there any R package that performs better on large datasets for missing
 data imputation?

 I checked for complete cases: there are only 172, which is
 a tiny fraction of the total dataset.
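
A check like the one described above can be done in base R with complete.cases() and is.na(). The sketch below uses an invented toy data frame rather than the real data, so the numbers it prints are illustrative only.

```r
# Toy data with some missing values; invented for illustration.
toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, NA),
  c = c(TRUE, TRUE, FALSE, TRUE)
)

# Rows with no missing values, as a count and as a share of all rows.
n_complete <- sum(complete.cases(toy))
share_complete <- n_complete / nrow(toy)

# Fraction of missing values per variable.
missing_by_var <- colMeans(is.na(toy))

print(n_complete)
print(share_complete)
print(missing_by_var)
```

Looking at the per-variable fractions can show whether the missingness is spread evenly or concentrated in a few columns.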

 --
 Mithilesh Kumar

 --
 Amelia mailing list served by HUIT
 [Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
 More info about Amelia: http://gking.harvard.edu/amelia
 Amelia mailing list
 Amelia(a)lists.gking.harvard.edu

 To unsubscribe from this list or get other information:

 https://lists.gking.harvard.edu/mailman/listinfo/amelia

 --
 Mithilesh Kumar
