Hi Matt,

After running with the ridge prior for 4 hours, I am getting the following error:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :  contrasts can be applied only to factors with 2 or more levels
In addition: There were 19 warnings (use warnings() to see them)
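
For context, R raises this `contrasts<-` error when a variable treated as a factor has fewer than two distinct observed values, possibly only within a subset of the data. One plausible first check is to look for such columns before calling amelia. This is a minimal sketch in base R, assuming `dt` is a data frame; the helper name `find_degenerate` and the toy data are hypothetical and not part of Amelia:

```r
# Hypothetical helper: return the names of columns in `vars` that have
# fewer than two distinct non-missing values.
find_degenerate <- function(df, vars) {
  n_distinct <- vapply(
    vars,
    function(v) length(unique(df[[v]][!is.na(df[[v]])])),
    integer(1)
  )
  vars[n_distinct < 2]
}

# Toy data standing in for `dt` (illustrative values only)
toy <- data.frame(
  cartype = c("suv", "sedan", NA, "suv"),
  status  = c("ok", "ok", NA, "ok"),
  stringsAsFactors = FALSE
)

degenerate <- find_degenerate(toy, c("cartype", "status"))
print(degenerate)
```

On the real data, the same helper could be run on the vector passed as `noms`, and separately within each level of the `cs` variable, since a column can be constant inside one cross-section even if it varies overall.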

I am using the following code:

dt.out <- amelia(
  x = dt, m = 3, idvars = "device_unique_id",
  ts = "pickupdate", cs = "destinationcountry",
  priors = NULL, lags = NULL, empri = 0.01 * nrow(dt), polytime = 2,
  intercs = TRUE, p2s = 2, incheck = TRUE, ords = NULL,
  noms = c("cartype", "AirportTransaction", "status", "browser.x",
           "interactionchannel", "paymentmethod", "segmentname",
           "ip_address", "geo_country", "geo_region", "operating_system",
           "browser.y", "language", "creative_freq", "creative_rec",
           "user_group_id", "is_remarketing", "post_click_conv",
           "post_view_conv", "advertiser_frequency", "advertiser_recency",
           "latitude", "longitude", "device_model_id")
)

Regards,

On Sat, Jan 30, 2016 at 9:35 AM, Matt Blackwell <mblackwell@gov.harvard.edu> wrote:
Hi Mithilesh, 

It's not so much a limitation on the number of observations, but you are asking a lot of Amelia here. If there are 28 categorical variables, each with more than 10 categories (and you have marked them as such), then you are adding roughly 280 variables to the imputation model, which is quite a few. But that shouldn't be too bad, given the size of your data. The extremely high missingness rate seems the more likely culprit. You might try using the ridge prior (the "empri" argument of the amelia function). See section 4.7.1 of the vignette for more information about this setting:

https://cran.r-project.org/web/packages/Amelia/vignettes/amelia.pdf
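
To gauge how much the nominal variables enlarge the imputation model, one can count the distinct categories per variable. This is a rough sketch in base R; `count_categories` and the toy data frame are hypothetical, and the total is only an approximation of the number of columns the model gains:

```r
# Hypothetical helper: number of distinct non-missing categories
# for each variable named in `noms`.
count_categories <- function(df, noms) {
  vapply(
    noms,
    function(v) length(unique(df[[v]][!is.na(df[[v]])])),
    integer(1)
  )
}

# Toy data (illustrative values only)
toy <- data.frame(
  browser  = c("chrome", "firefox", "safari", NA),
  language = c("en", "de", "en", "fr"),
  stringsAsFactors = FALSE
)

cats <- count_categories(toy, c("browser", "language"))
print(cats)

# Rough size of the expansion caused by the nominal variables
total_categories <- sum(cats)
print(total_categories)
```

Variables with very many categories (for example, identifiers or addresses) dominate this total, so they are the first candidates to drop from the nominal list or to collapse into coarser groups.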

Cheers,
Matt

~~~~~~~~~~~
Matthew Blackwell
Assistant Professor of Government
Harvard University

On Fri, Jan 29, 2016 at 10:54 PM, Mithilesh Kumar <mithileshk.in@gmail.com> wrote:

I have 761,592 observations of 31 variables on users' behaviour towards online ads. Of the 31 variables, 28 are categorical, and many of them have more than 10 categories. I am using Amelia for missing data imputation.

It is taking a very long time. Are there ways to make it faster? What are Amelia's limits on the number of observations?

Is there any R package that performs better on large datasets for missing data imputation?

I checked for complete cases: there are only 172, which is a tiny fraction of the total dataset.
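
For reference, the complete-case count and the per-variable missingness rate can be computed in base R roughly as follows. This is a sketch on toy data; on the real data, `toy` would be replaced by `dt`:

```r
# Toy data (illustrative values only)
toy <- data.frame(
  a = c(1, NA, 3, 4),
  b = c("x", "y", NA, "z"),
  stringsAsFactors = FALSE
)

# Number of rows with no missing values in any column
n_complete <- sum(complete.cases(toy))
print(n_complete)

# Share of missing values in each column
miss_rate <- colMeans(is.na(toy))
print(miss_rate)
```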


--
Mithilesh Kumar




--
Amelia mailing list served by HUIT
[Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
More info about Amelia: http://gking.harvard.edu/amelia

--
Mithilesh Kumar