Hi Mark, a better practice would be to put the transformed variable into
Amelia, get the best imputations you can, do your analysis, and then
transform the results to your quantity of interest. Clarify- or Zelig-style
analyses might help with that. Best of luck with your research,
Gary
--
*Gary King* - Albert J. Weatherhead III University Professor - Director,
IQSS <http://iq.harvard.edu/> - Harvard University
GaryKing.org - King(a)Harvard.edu - @KingGary <https://twitter.com/kinggary> -
617-500-7570 - Assistant <king-assist(a)iq.harvard.edu>: 617-495-9271
On Sat, Sep 15, 2018 at 4:33 PM Mark Seeto <markseeto(a)gmail.com> wrote:
> Dear Amelia group,
>
> Suppose my data set has a variable v that I want to include as a
> predictor variable in a regression model. Suppose that some
> transformation of v, for example, sqrt(v) or log(50 - v), looks more
> normally distributed than v does. However, to keep the interpretation
> of the model simpler, I want to include v itself as a predictor
> variable, not a transformation of v.
>
> What I had been doing previously was to use the "sqrts" or "logs"
> argument of amelia(), and then use v (not the transformed v) in the
> model. Or if a different transformation was required, I would create
> the transformed variable then impute (with v as an idvar) then
> back-transform, and use the back-transformed v in the model.
>
> Is this considered poor practice because I was using the transformed v
> for imputation but using v itself in the regression model? If it is,
> would I be better off simply imputing without using any transformation
> of v, assuming that v is the variable I want to include in the
> regression model?
>
> Thanks for any advice, and thanks to the Amelia team for all their work.
>
> Mark
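A minimal sketch of the workflow Gary describes, assuming a data frame d with an outcome y, the predictor v, and a covariate x (all names hypothetical): impute with v modeled on a transformed scale, fit the analysis model with v itself to each completed data set, and combine the results.

    library(Amelia)

    # Impute with v on the square-root scale; amelia() back-transforms, so
    # the completed data sets contain v on its original scale.
    a.out <- amelia(d, m = 5, sqrts = "v")

    # Fit the analysis model with untransformed v to each completed data set.
    fits <- lapply(a.out$imputations, function(imp) lm(y ~ v + x, data = imp))

    # Combine estimates across imputations (Rubin's rules) via mi.meld().
    b  <- sapply(fits, coef)
    se <- sapply(fits, function(f) sqrt(diag(vcov(f))))
    combined <- mi.meld(q = t(b), se = t(se))

From the combined estimates, quantities of interest can then be simulated in the Clarify or Zelig style Gary mentions.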
Hello,
I'm using Amelia II to impute missing data in a longitudinal setting. I'm
running into similar warnings that others have noticed regarding a variable
being perfectly collinear with another variable. I have 65 variables and 5
are deemed to be perfectly collinear. The most logical thing to do is to
remove the 5 variables and continue with the imputation process, which I do
and the model converges fine. However, I'm wondering if it makes sense to
add these 5 variables back in *after* the imputation process (these 5
variables contain no missing data). I realize that it is ideal to have all
the variables included in the original imputation model to best estimate
the missing values. However, at first glance, it doesn't seem harmful to
add back in variables that are collinear. Adding back in collinear
features might seem weird, but I'll be analyzing the data with penalized
regression and would like to keep all of the original data in the model.
I'd appreciate any feedback!
Thanks!
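A minimal sketch of the add-back workflow described above, assuming a data frame dat and hypothetical names for the five complete, collinear columns:

    library(Amelia)

    collinear <- c("x1", "x2", "x3", "x4", "x5")  # hypothetical names

    # Impute without the collinear columns.
    a.out <- amelia(dat[, setdiff(names(dat), collinear)], m = 5)

    # The dropped columns are fully observed, so binding them back onto each
    # completed data set changes no imputed value; row order is preserved.
    a.out$imputations <- lapply(a.out$imputations,
                                function(imp) cbind(imp, dat[, collinear]))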
Hello. I have an additional question on data transformation. I was thinking of first applying the EMB algorithm to my data and then, after having filled in all my missing values, transforming my data to the natural logarithm (ln). However, I'm no longer sure this is the most consistent way to proceed, because I read the following in "AMELIA II: A Program for Missing Data" (Honaker, King, and Blackwell; 2012): "Any variable that will be in the analysis model should also be in the imputation model. THIS INCLUDES ANY TRANSFORMATIONS (...)."
So, could you please advise me on the most consistent way to proceed: either (i) fill in all missing values using the EMB algorithm and only after that log-transform my data; or (ii) log-transform my data and then use the EMB algorithm to fill in all missing values?
I'm looking forward to hearing from you. Many thanks for your help. Michel
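Per the manual passage quoted above, the transformation belongs in the imputation model, which corresponds to option (ii); amelia()'s logs argument handles this directly. A minimal sketch with hypothetical variable names:

    library(Amelia)

    # Variables listed in logs are imputed on the natural-log scale and
    # returned on their original scale in the completed data sets.
    a.out <- amelia(dat, m = 5, logs = c("gdp", "trade"))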
Hello. I'm trying to load a database from an R file (extension .RData) into Amelia II using AmeliaView. However, when I choose the R file that contains my data, Amelia II seems to stop running: no database is loaded (Amelia's screen doesn't change at all), even after waiting some time to see whether the data would eventually load, which has never happened so far. Could you please give me some guidance on what I am doing wrong?
Just to give you some context about my database: it is a cross-sectional time series consisting of the daily closing position of each stock market of the G-20 countries from 2003 to 2017, totaling around 80,000 data points (i.e., around 4,000 values per country), of which about 5% are missing values.
Many thanks. Best wishes, Michel
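One hedged debugging step while AmeliaView appears to hang: open the .RData file in a plain R session first to confirm what it contains, and, as a fallback, call amelia() from the console (file and column names hypothetical):

    library(Amelia)

    objs <- load("G20_markets.RData")  # returns the names of the restored objects
    print(objs)
    str(get(objs[1]))                  # confirm the object is a data.frame

    # Fallback: skip the GUI entirely.
    a.out <- amelia(get(objs[1]), m = 5, ts = "date", cs = "country")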
Hi Akthem,
I believe this error is due to the computer or R running out of RAM. You
could try to see if the code runs without the parallel argument (probably
setting m = 1 to test one imputation). Sometimes parallel doesn't handle
large data sets well. If you get an error message there, then it might be
the case that some of the internal copying of the data.frame is causing RAM
to max out (we do try to minimize this). Let us know if that works for you.
Cheers,
Matt
~~~~~~~~~~~
Matthew Blackwell
Assistant Professor of Government
Harvard University
url: http://www.mattblackwell.org
On Wed, Jan 10, 2018 at 8:52 AM, Akthem Rehab <akthem(a)gmail.com> wrote:
> Hi All,
>
> I am using Amelia to impute a time-series data set generated from sensors
> in an industrial setting. I am doing that for 8 variables (I picked only
> continuous variables for imputation) and ~40M readings (one reading per
> second).
>
> Here is my Amelia code:
>
> Test <- amelia(Query1[1:2e6, ], m = 3, p2s = 2, cs = NULL, ts = "TIME",
>                incheck = TRUE, parallel = "snow", ncpus = 3, collect = TRUE,
>                idvars = c("D78", "D82", "D83"),
>                lags = c("C0", "C1", "C5", "C6", "C16", "C17", "C18", "C19"),
>                leads = c("C0", "C1", "C5", "C6", "C16", "C17", "C18", "C19"))
>
> The code runs fine as long as the number of readings does not exceed
> ~1.2M. After that I receive the following error:
>
> Error in unserialize(node$con) : error reading from connection
>
> Some investigation shows that this has to do with the parallel workers. I
> noticed that the memory per worker does not exceed ~4GB and then goes back
> down before generating the error.
>
> I am running Windows Server 2016 with the Oracle Distribution of R v3.3.0.
> Amelia is version 1.7.4.
>
> I tried to troubleshoot with Oracle Community support before finding out
> that the issue also occurs when the data is a data.frame and not an
> ore.frame.
>
> Here is the link to the troubleshooting thread:
> https://community.oracle.com/thread/4109587
>
> I appreciate your support.
>
> Regards,
> Akthem
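A minimal sketch of the test Matt suggests, reusing the arguments from Akthem's post with the cluster turned off:

    library(Amelia)

    # One imputation without the snow cluster, to separate parallel-worker
    # failures from a genuine out-of-memory error in Amelia itself.
    test <- amelia(Query1[1:2e6, ], m = 1, p2s = 2, ts = "TIME",
                   idvars = c("D78", "D82", "D83"),
                   lags  = c("C0", "C1", "C5", "C6", "C16", "C17", "C18", "C19"),
                   leads = c("C0", "C1", "C5", "C6", "C16", "C17", "C18", "C19"),
                   parallel = "no")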
Hi,
I am using Amelia to simulate missing vote values for French elections
(before performing multiparty electoral data analysis using Clarify).
I need to make sure that after the simulation the sum of the votes for the
different parties (vFN + vPC + vPS + vUMP + vVerts) is below one (these are
the main parties and some much smaller parties are not included).
I thought the priors I generated (see below) would do the trick, but they
do not seem to work: in the imputed data I generate now, the sum of the
votes is often above 1, which makes no sense (and then causes problems in
Clarify with the logistic transformation).
Any idea of how I could handle that?
Many thanks in advance,
Best,
Julia
Here is my code:
library(readstata13)  # provides read.dta13()
library(Amelia)

database <- read.dta13("rall.dta")

prior <- matrix(NA, nrow = nrow(database), ncol = 5)
for (i in 1:nrow(database)) {
  v3  <- database$vFN[i]
  v5  <- database$vPC[i]
  v7  <- database$vPS[i]
  v9  <- database$vUMP[i]
  v11 <- database$vVerts[i]
  # Observation-level prior for column 3 (vFN): bounded between 0 and one
  # minus the sum of the other parties' shares. The parentheses matter here:
  # 1 - v5 + v7 + v9 + v11 would compute (1 - v5) + v7 + v9 + v11 instead.
  prior[i, ] <- c(i, 3, 0, 1 - (v5 + v7 + v9 + v11), 0.999999)
}
prior <- prior[!is.na(prior[, 4]), ]

a.out <- amelia(database, m = 5, ts = "year", cs = "district", priors = prior,
                lgstc = c("vFN", "vPC", "vPS", "vUMP", "vVerts"),
                bounds = rbind(c(4, 0, Inf), c(6, 0, Inf), c(8, 0, Inf),
                               c(10, 0, Inf), c(12, 0, Inf)))

write.amelia(obj = a.out, file.stem = "R19932012/outdata", format = "dta")
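A quick hedged check on the output of the code above: for each completed data set, compute the largest sum of the five party shares, so violations of the intended constraint show up immediately.

    shares <- c("vFN", "vPC", "vPS", "vUMP", "vVerts")
    # Any value above 1 flags rows where the imputed shares break the constraint.
    sapply(a.out$imputations, function(imp)
      max(rowSums(imp[, shares]), na.rm = TRUE))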
Hi, I’m new to using Amelia. I’m trying to impute missing data in a time-series cross-sectional data set, but I'm having trouble running amelia() the way I think I should. I would greatly appreciate some guidance.
I created a data.frame() that has 8 time points each for 260 participants and a single score column for which I’m trying to impute some missing data. The data frame has 2080 (i.e., 8*260) rows by 3 columns (“month”, “ID”, “score”).
With this, I tried to run the following command:
```
a.out <- amelia(data, ts="month", cs="ID", polytime=2, intercs=TRUE, p2s=2)
```
It reported the following (I terminated the run partway through after receiving errors):
amelia starting
beginning prep functions
Variables used: score time.1 time.2 time.3 ... time.119 time.120 time.... <truncated>
running bootstrap
-- Imputation 1 --
setting up EM chain indicies
1(300713)! 2
error: inv_sympd(): matrix seems singular
(216)! 3
error: inv_sympd(): matrix seems singular
(208)!
Warning message:
In amelia.prep(x = x, m = m, idvars = idvars, empri = empri, ts = ts, :
You have a small number of observations, relative to the number of variables in the imputation model. Consider removing some variables, or reducing the order of time polynomials to reduce the number of parameters.
I don’t understand the error. I also don’t understand how amelia() determined the `time.x` variables; I know it has something to do with my number of participants, but I don’t understand how. The warning message suggests I have too many variables because of this. When I tried the “freetrade” dataset, it used far fewer `time.x` variables (i.e., 26) even though there were only 19 time points in the data set, and it didn’t have problems.
Could someone explain the error, or what the problem might be, and what I should do to correct it?
Also, when using time-series data, do I use amelia() differently depending on whether the time variable represents chronological time (e.g., January, February, March, …) or time since onset (e.g., one month since birth, two months since birth, etc.)?
Please advise.
Best regards,
Lawrence
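A hedged note on the likely cause: with intercs = TRUE, amelia() lets the time polynomial vary across cross-sections, so the number of time.x terms grows with the 260 IDs; with only 2080 rows and a single score column, the EM steps can hit singular matrices, which matches the inv_sympd() errors above. Two lower-dimensional variants, sketched:

    library(Amelia)

    # Variant 1: keep the quadratic time trend but drop the per-ID interactions.
    a.out <- amelia(data, ts = "month", cs = "ID", polytime = 2,
                    intercs = FALSE, p2s = 2)

    # Variant 2: keep per-ID trends but lower the polynomial order, as the
    # warning message suggests.
    a.out <- amelia(data, ts = "month", cs = "ID", polytime = 1,
                    intercs = TRUE, p2s = 2)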
Hi, my name is Laura and I'm new to using Amelia. I want to use Amelia to
complete precipitation and temperature data, but I cannot find references
to studies that demonstrate such an application.
Does anyone know of studies or examples of use cases of the Amelia program
for precipitation and temperature data?
I would appreciate your help!
Best regards,
Laura Cabezas