Amelia May 2002

amelia@lists.gking.harvard.edu

3 participants
9 discussions

by Gary King

On Thu, 30 May 2002, Jack Newton wrote:

> Dr. King,
>
> I attempted to use your Amelia program on a data matrix with all
> variables being nominal (and set as such in the "Other Options" screen).
> Upon trying to analyze the data, Amelia exited with the following error:
>
> C:\GAUSS\SRC\SORTMC.SRC(89) : error G0057 : Procedure stack overflow -
> expression too complex
> Currently active call: SORTMC [89]
>
> I've attached the log file as well as the data matrix I was trying to
> analyze.
>
> Any help would be greatly appreciated.
>
> Cheers,
> Jack Newton

You have around 3000 observations, a very large fraction of which is missing. Although you have only 8 columns of data, declaring them all nominal means that Amelia will create a very large number of variables from these (one less than the number of categories for each variable). This is probably more than can be handled by a big margin. Maybe you can find some other (continuous) variables to add to your dataset?

Note also the Amelia Listserv, which I'm ccing and you can subscribe to at my web page.

Best of luck.

Gary King

: Gary King, King(a)Harvard.Edu     http://GKing.Harvard.Edu :
: Center for Basic Research         Direct    (617) 495-2027 :
: in the Social Sciences            Assistant (617) 495-9271 :
: 34 Kirkland Street, Rm. 2         HU-MIT DC (617) 495-4734 :
: Harvard U, Cambridge, MA 02138    eFax      (928) 832-7022 :

-
amelia mailing list served by Harvard-MIT Data Center
List Address: amelia(a)latte.harvard.edu
Subscribe/Unsubscribe: http://lists.hmdc.harvard.edu/?info=amelia
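To make the "one less than the number of categories" point concrete, here is a rough sketch of how quickly the variable count grows. The category counts are hypothetical (Jack's actual data are not shown in the thread), and the parameter count assumes a multivariate normal imputation model with p means and p(p+1)/2 covariance terms; it is an illustration, not a description of Amelia's internals.

```python
def expanded_column_count(categories_per_variable):
    """Columns created when a nominal variable with k categories
    is replaced by k - 1 indicator variables."""
    return sum(k - 1 for k in categories_per_variable)


def normal_model_parameter_count(p):
    """Free parameters of a p-variate normal: p means plus
    p * (p + 1) / 2 variances and covariances."""
    return p + p * (p + 1) // 2


# Hypothetical category counts for 8 nominal variables (not from the thread).
example_counts = [12, 9, 15, 7, 20, 5, 11, 8]

columns = expanded_column_count(example_counts)
print("columns after expansion:", columns)
print("parameters with 8 continuous variables:", normal_model_parameter_count(8))
print("parameters after expansion:", normal_model_parameter_count(columns))
```

With counts like these, 8 declared variables become dozens of columns, and the number of parameters to estimate grows roughly with the square of that.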

21 years, 10 months

Re: Amelia/Clarify

by Gary King

On Wed, 29 May 2002, Paul Warwick wrote:

> I'm now able to generate the univariate means, thanks to some help from
> Mike Tomz.
>
> I do have a larger question, though. Amelia doesn't like my data sets.
> They consist of elite evaluations of party positions etc. in various
> countries, so that each file has a small number of cases (between 10 and
> 18) and a large number of variables. (Incidentally, in the file I
> experimented on, more variables were complete (no missing data) than Amelia
> would allow me to specify!). Generally speaking, the files are about
> 55-70% valid data. Setting a prior as high as 1000 still failed,
> apparently due to too much missing data. I can get a solution using SPSS's
> MVA routine but, persuaded as always that EMis is better than EM, I'm
> wondering if there are any other tricks I should try before giving up on
> Amelia?
>
> I'm not sure who to address this to, so I'm sending it to you with a copy
> to Mike.
>
> I'm planning to attend the summer meth. meeting in Seattle (just down the
> road for me), so I imagine I'll see you there.
>
> Regards,
> Paul

Paul, the best place to address questions like this is the Amelia Listserv <amelia(a)latte.harvard.edu>; we all get copies.

The problem you're having is not specific to EMis. It is fundamental for any method of imputation. With so few observations, you couldn't really include many variables at all. This is true for EMis and essentially all other approaches.

Since you seem to have a number of these small data sets, I would suggest that you consider stacking them up. You might have to include dummy variables (fixed effects) for the small data sets, but perhaps you could get away without them. It's the same problem of a small n that affects other research too; finding a way to borrow strength statistically from other related data can greatly increase statistical power. Anyway, it is worth a shot!

Best of luck and see you in Seattle,
Gary
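The stacking suggestion can be sketched as follows. This is only an illustration under assumptions: it presumes the small data sets share the same variables, uses `None` for missing values, and the names `stack_with_fixed_effects` and the `fe_` prefix are made up for the example. The stacked rows would then be passed to the imputation program as one data set.

```python
def stack_with_fixed_effects(datasets):
    """Stack several small data sets into one list of rows.

    datasets: dict mapping a data set name to a list of row dicts.
    Adds one 0/1 dummy column per data set except the first (the baseline),
    so the dummies are not perfectly collinear with a constant.
    """
    names = sorted(datasets)
    non_baseline = names[1:]
    stacked = []
    for name in names:
        for row in datasets[name]:
            new_row = dict(row)  # copy so the input rows are left untouched
            for other in non_baseline:
                new_row["fe_" + other] = 1 if name == other else 0
            stacked.append(new_row)
    return stacked


# Hypothetical country files with two shared variables; None marks missing data.
country_files = {
    "canada": [{"left_right": 3.0, "eu_position": None},
               {"left_right": 6.5, "eu_position": 4.0}],
    "norway": [{"left_right": None, "eu_position": 2.0}],
    "spain": [{"left_right": 5.0, "eu_position": 7.0}],
}

for row in stack_with_fixed_effects(country_files):
    print(row)
```

Dropping the dummy columns corresponds to the "get away without them" case in the reply above.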

21 years, 10 months

Data Consideration for Amelia

by Matthew Vile

Dr. King,

I'm sorry to keep pestering you with these questions; unfortunately, there is no one in my department with the kind of mathematical training necessary to completely decipher the EMis algorithm ;-). I think I understand what you prescribe in King et al. (2001), but I'd like to confirm it. (Please note, I have not tried any of the following. I'm using an old PII with 128mb of RAM, and every attempt to run Amelia is a major chore: having to shut down all other background routines, etc. So, this time I'm asking first ...)

Presume I have a variable that counts the number of mentions (in an open-ended context) of a particular subject (in this case, crime as the "biggest problem"). This seems to me to be most closely related to an event-count type variable, implying that I should take the square root (or some partial power) to stabilize the variance, etc. However, conceptually this variable is measuring salience, which must be bounded at zero, implying I should be applying some form of non-linear (logistic, perhaps) transformation. Empirically, because of the significant positive skew of the variable, Amelia returns a large number of negative imputed values.

What can be done about this situation (aside from springing for Gauss, which my department won't do)? I can see three options:

1. impute from the raw data, either leaving the continuous values or truncating any negative imputations to zero.
2. impute from transformed data using one or the other transformation
3. impute from the transformed data using both transformations

Which seems best to you? If I do apply a transformation before imputation, do I "un-transform" the data after? (In the case of the square root, I'd say no, because any negative imputations produced by Amelia would have their signs reversed...)

Again, thank you for your time

--
Matthew "ElectricBlooz" Vile
UNO Survey Research Center
http://www.swd.org/mardi/blooz.htm
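The mechanics of two of the options listed above (truncating at zero, and a square-root transform with back-transformation) can be sketched as below. This is only an illustration of the question being asked, not a recommendation from the thread; the function names and the example values are hypothetical, and `None` stands for a missing count.

```python
import math


def truncate_at_zero(imputed_values):
    """Option 1 variant: replace negative imputations with zero."""
    return [max(0.0, v) for v in imputed_values]


def sqrt_transform(counts):
    """Square-root transform applied before imputation; missing stays missing."""
    return [None if c is None else math.sqrt(c) for c in counts]


def back_transform(root_scale_values):
    """Undo the square root by squaring. A negative imputation on the
    root scale becomes positive, which is the sign reversal raised above."""
    return [v * v for v in root_scale_values]


raw_counts = [0, 4, None, 9]            # hypothetical mention counts
print(sqrt_transform(raw_counts))

imputed_raw_scale = [-0.4, 1.2]         # hypothetical imputations on the raw scale
print(truncate_at_zero(imputed_raw_scale))

imputed_root_scale = [-0.5, 2.0]        # hypothetical imputations on the root scale
print(back_transform(imputed_root_scale))
```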

21 years, 11 months

re: program hang

by Matthew Vile

Thank you to those who responded. Shortly after I posted, I attempted running Amelia on the dataset I described again, but with only the most minimal options (i.e. ASCII output, no assigning new variable names). I also increased the max_workspace option to 1 mb less than my total "available" RAM (from Task Manager) + the total size of my virtual memory. This time the program did execute, albeit very slowly. <grin>

As near as I can tell, if you monitor Amelia as closely as I was, there will be times when it appears to the system as if it had hung, even though it is still processing. The lesson: never give up!

--
Matthew "ElectricBlooz" Vile
UNO Survey Research Center
http://www.swd.org/mardi/blooz.htm

21 years, 11 months

program hang

by Matthew Vile

As a new user of Amelia, I have encountered a couple of difficulties; if any of you have seen the same type of problems and know how to fix them, I'd appreciate any help you can give.

System: PII @ 300mhz
RAM: 128mb (@60mb available - from Task Manager)
OS: Windows NT 4.0 (build 1381, Service Pack 6)
Dataset: 1555 obs. 32 vars. (14 fully observed, 2 nom (w/ missing) w/ @6 cats. each); most variables <200 missing, 2 variables @ 50% missing
Setup: ridge prior = 2, AMnds = 5, AMsn = 10
Method: EMis (Amelia for Windows)

Initially, I received the workspace error message, so I have set the max_workspace variable in the gsrun.cfg to 120mb (which should force the use of virtual memory). Now, the program appears to be running until it reaches stage 3. Once the program window displays the stage 3 message it ceases to update, and the Task Manager reports that the "gauss" and "gauss-dos" applications are "not responding." Is this normal behavior (i.e. should I just wait), or is the program actually hanging at this phase? If it is, does anyone have any suggestions?

On a related note: the program only allows for 32 characters of information in the "fully specified" global. Is there any way that this can be increased? As it currently stands, I have to leave out one of my fully specified variables because I run out of space on the line.

Thanks in advance for your help.

--
Matthew "ElectricBlooz" Vile
UNO Survey Research Center
http://www.swd.org/mardi/blooz.htm

21 years, 11 months

Re: Amelia Errors

by Gary King

My guess is that the problem is your data transfer software and not Amelia. I haven't known Amelia to produce data sets without anything in them. It's possible of course, but I haven't seen it. I'm CCing the listserv in case some of my colleagues or others have a suggestion.

Gary King

On Wed, 8 May 2002, Bob Fitzgerald wrote:

> Dear Prof. King and Colleagues:
>
> I ran Amelia (Windows version 2.0, 7/15/2001) on a dataset consisting of
> 16,722 observations and 34 variables. Amelia was run on the following
> platform: 1 gigahertz Pentium III CPU, 512 Megs Ram, Windows XP Professional.
>
> The following issues/questions arose (I have included the log file for your
> reference):
>
> 1) The program ran through 16 iterations and appeared to conclude
> successfully, generating 5 imputed data sets. I had initially tried to pass
> Amelia a version 7 Stata data file, but the program failed to load the
> Stata data (it opened another DOS window and sat there until I cancelled
> the operation). I translated the file from Stata to Gauss format using
> DBMSCOPY. The Gauss data file was successfully loaded and Amelia calculated
> the correct Ns, means, etc.
>
> 2) I selected the Gauss file output option, and each of the 5 resulting
> data files was byte equal to the size of the input data file. However, the
> resulting data files are not populated, or if there are entries, they are
> non-numeric. (I opened each in DBMSCOPY using the "View Data" option.)
>
> 3) I ran Amelia with the following parameters: AMempri=3.
> AMFully: I entered as many as Amelia would accept (which was far fewer
> than the number of fully populated fields in the data file--Can you
> specify the hard coded limit or perhaps allow for entering a larger number
> in a later version?)
> I changed the configuration file to set WORKSPACE=256.0, and
> CACHE_SIZE=128.
>
> It seems peculiar that Amelia continued to generate the datasets if there
> were substantial estimation problems in convergence or other procedures,
> and further, that garbage datasets exactly equal in size to the original
> Gauss input file could be produced. Any insights or assistance you can
> provide would be most helpful. I would be happy to send one of the
> resulting files to you for diagnostic purposes. Zipped each file is about 1
> meg.
>
> Please let me know if I can provide any further information that would be
> helpful in determining if there are problems in the current version of Amelia.
>
> Many thanks.
>
> Bob
>
> Robert Fitzgerald
> Senior Research Associate
> MPR Associates, Inc.
> 2150 Shattuck Avenue Suite 800
> Berkeley, CA 94704
> (510) 849-4942

21 years, 11 months

Re: missing data and Amelia

by Gary King

On Wed, 8 May 2002, Matthew Vile wrote:

> Dr. King,
>
> I hope this email reaches you at a convenient time. I am a student
> working on my dissertation in public opinion and I have been considering
> using your program Amelia to solve some missing data problems.
>
> My question is - generally speaking, what is the maximum "missingness"
> that Amelia can reasonably handle. I have read your APSR article (King
> et al 2001), and I see that your Monte Carlo sims relied on a dataset in
> which @ 5% of the data were missing. In your opinion, could Amelia
> impute to a variable in which 50% were missing, 66%? Under what
> conditions would you feel comfortable doing this?
>
> Specific example - the variable measuring fear of assault was only asked
> to approximately 700 individuals out of 1800. Presuming I have
> indicators of this variable (I do have some) would you feel comfortable
> imputing to the missing 1100 cases?
>
> I appreciate your consideration and any advice you might have.

The answer depends on both the pattern and the level of missingness, but Amelia should work without a problem on the application you describe. The more data are missing, however, the more model-dependent your results will be. Of course, that would be true whether you use Amelia or any other method.

Best of luck,
Gary King
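For reference, the level of missingness in the example above works out to 1100 of 1800 cases, or about 61%, which sits between the 50% and 66% figures in the question. A minimal sketch for computing this per variable, assuming missing values are coded as `None` (the function name is made up for the illustration):

```python
def missing_fraction(values):
    """Share of entries that are missing (coded as None)."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(1 for v in values if v is None) / len(values)


# Stand-in for the fear-of-assault item: about 700 asked, 1100 not asked.
fear_of_assault = [1] * 700 + [None] * 1100
print(round(missing_fraction(fear_of_assault), 3))
```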

21 years, 11 months

RE: [amelia] altering memory allocation in Amelia

by Gary King

On Fri, 3 May 2002, McElroy, Brendan wrote:

> Dear Gary,
>
> Thanks for your response to the above request. I think I understand what
> you mean by estimating theta: take a random sample of the dataset, run EMis
> and take the average vector of coefficients and variance-covariance matrix
> from the five imputed datasets and input them back into Amelia using the
> AMmupr and AMsigpr options. I'm not sure what to do next, since I still
> can't load the full dataset into Amelia. Can you help?

I'd take a random sample and put that into Amelia to get estimates of theta. Then, at least using the Gauss version, you would be able to use that estimated theta matrix to produce the imputations on (all sequential) subsets of the data. We should automate this, but I don't think it's been done yet.

> I've also got a second request. I can't seem to find anything on the STATA
> combining commands 'mi', either on STATA's net resources or on the
> statalist. Apparently there was a zip file on your website containing the
> ado files at some stage. If you still have them, can you send them on to
> me, or can you let me know where to go to get them?

I'd have a look at Clarify, also at my web page. Clarify is Amelia-ready.

Gary

> Thanks again.
>
> Brendan McElroy
> HRB Research Fellow
> Departments of Economics and General Practice
> Aras na Laoi
> University College Cork
> Western Road
> Cork
> Ireland
> Tel: +353 21 490 3522
>
> -----Original Message-----
> From: Gary King [mailto:king@harvard.edu]
> Sent: 27 April 2002 23:02
> To: McElroy, Brendan
> Cc: 'amelia(a)latte.harvard.edu'
> Subject: Re: [amelia] altering memory allocation in Amelia
>
> I don't know if it's ever been tested with that many observations.
> I think a reasonable procedure would be to take a random sample just for
> estimating theta and then doing the imputation for each observation.
> That would be computationally efficient and wouldn't lose very much
> efficiency. I'd have to look to see whether it's possible to do this with
> the present version of Amelia...
>
> Gary
>
> On Fri, 26 Apr 2002, McElroy, Brendan wrote:
>
> > I'm new to Amelia and I'm having problems with memory size. I have a STATA
> > dataset with nine variables and 400,751 observations weighing in at 7.8Mb.
> > I can load three of the variables - cost, age and sex - into Amelia (for
> > windows) but the program crashes when I try to run it. There are only 29
> > missing records on the age variable and the other two are fully coded. The
> > program crashes immediately when I try to load the full dataset. One of the
> > variables - disability - has 58,162 missing records and this is what I
> > really want Amelia to help with. I guess I've two related questions: Is my
> > dataset so big that multiple imputation will take too long and I should
> > revert to something like listwise deletion or least squares imputation,
> > both of which STATA can handle easily? If it's not too big, how do I alter
> > the memory space allocated by the program?
> >
> > Yours sincerely,
> >
> > Brendan McElroy
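The bookkeeping behind the procedure discussed in this thread (a random sample for estimating theta, then imputation over sequential subsets of the full data) might be sketched as follows. Only the row selection is shown; the estimation and the imputation themselves happen in Amelia and are not reproduced here. The sample size of 5,000 and chunk size of 50,000 are arbitrary assumptions, and the function names are made up for the illustration.

```python
import random


def sample_row_indices(n_rows, sample_size, seed=0):
    """Indices of a simple random sample (without replacement),
    to be used for estimating theta on a manageable subset."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_rows), sample_size))


def sequential_chunks(n_rows, chunk_size):
    """(start, stop) index pairs that cover all rows in order, each chunk
    small enough to load; imputations are produced chunk by chunk."""
    return [(start, min(start + chunk_size, n_rows))
            for start in range(0, n_rows, chunk_size)]


n_rows = 400_751  # number of observations reported in the thread

estimation_rows = sample_row_indices(n_rows, sample_size=5_000)
print(len(estimation_rows), "rows selected for estimating theta")

for start, stop in sequential_chunks(n_rows, chunk_size=50_000):
    print("impute rows", start, "to", stop - 1)
```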

21 years, 11 months

Re: Amelia Query

by Gary King

No, you ought to be able to do both.

Gary

On Tue, 30 Apr 2002, bob wrote:

> Thanks.
>
> Once I have identified country as fully observed (AMfully), should I NOT then declare it as nominal (AMnoms)? Does that tell Amelia to attempt to impute it?
>
> ----- Original Message -----
> From: Gary King <king(a)harvard.edu>
> To: bob <bob(a)idasact.org.za>
> Cc: Bob at UCT <rmattes(a)cssr.uct.ac.za>; Amelia Listserv <amelia(a)latte.harvard.edu>
> Sent: Tuesday, April 30, 2002 3:37 PM
> Subject: Re: Amelia Query
>
> > It might be that you have identified a fully observed variable as something to be imputed (you could try it without it), but I'm not positive. I'm forwarding this to the new Amelia listserv. Maybe one of my colleagues or someone else can help?
> >
> > Gary
> >
> > On Tue, 30 Apr 2002, bob wrote:
> >
> > > Gary,
> > >
> > > Thanks for your earlier reply re: Amelia. I think we have cleared up the working space problem.
> > >
> > > Now another challenge confronts us.
> > >
> > > During stage 1 of 4 (EM), the program bombs after iteration 34, and gives the message, "sweep: elements of m cannot be zero. Exec stopped in line 79."
> > >
> > > I have run this twice to make sure I had checked the correct global commands for the correct variables. The program was using the conditional model. 2 variables are fully observed, and 1 is nominal. The 12 category "Country" variable is both fully observed and nominal.
> > >
> > > Does the error message have something to do with this?
> > >
> > > Presently, there are 37 variables, and just over 20,000 observations.
> > >
> > > This is not time series data, but should both Country, as well as Rural/Urban, be identified as cross sections?
> > >
> > > Any help you can give would be greatly appreciated.
> > >
> > > Regards,
> > >
> > > Bob Mattes
> > >
> > > ----- Original Message -----
> > > From: Gary King <king(a)harvard.edu>
> > > To: mike <mike(a)idasact.org.za>
> > > Cc: <ajoseph(a)fas.harvard.edu>; <tercer(a)latte.harvard.edu>; <kscheve(a)latte.harvard.edu>; Robert Mattes <bob(a)idasact.org.za>
> > > Sent: Thursday, March 28, 2002 3:05 PM
> > > Subject: Re: your mail
> > >
> > > > have a look at this: http://gking.harvard.edu/amelia/node55.html
> > > > I think it will answer your question.
> > > > Gary King
> > > >
> > > > On Thu, 28 Mar 2002, mike wrote:
> > > >
> > > > > Dear Gary King,
> > > > >
> > > > > This follows up on the message I left on your voice-mail earlier today.
> > > > >
> > > > > I'm a political scientist from Michigan State University who, with Robert Mattes of the University of Cape Town, co-directs the Afrobarometer. The Afrobarometer is a large-scale, cross-national survey research project on public attitudes to democracy and markets in 12 African countries.
> > > > >
> > > > > We are currently analyzing a fairly large data set (150 variables x 21,000 cases) from Round 1 of the Afrobarometer. It contains quite a bit of missing data, both randomly distributed and country specific. On the advice in the AMELIA manual, we have cut down on missingness by rescaling "don't knows". We have also reduced the core variables for analysis to 37 before trying to implement AMELIA.
> > > > >
> > > > > Our basic problem is that AMELIA will not run when we include all 21,000 cases. It keeps giving us an error message that says "insufficient workspace memory". We can get AMELIA to generate 5 imputed data sets for a sub-sample of 2000 cases. It takes about half an hour. And we can get AMELIA to complete the iterations in Step 1 on 5,000 cases. But the program bombs in Step 2 when it tries to impute covariances (after about 45 minutes), again yielding the same error message. Attempts at 10,000 and 21,000 will not even start running.
> > > > >
> > > > > We have here a Pentium 4 computer with 1 GHz hard drive and 256 MB free RAM and 20 Gbyte disk.
> > > > >
> > > > > Our questions are as follows:
> > > > >
> > > > > * Is there a limit to the number of cases that AMELIA can handle? Will it work on a data set of 37 variables by 21,000 cases?
> > > > >
> > > > > * Does the error message refer to AMELIA's workspace or the computer's workspace? In other words, does the problem lie in the capacity of the hardware or the software?
> > > > >
> > > > > * If we include a 12-value multinomial variable for "country", does this take us over AMELIA's limit of 40 variables? We have tried to run the program both with and without this variable, with the same problematic result.
> > > > >
> > > > > * For your information, we have specified one nominal variable (country, when included) and two fully observed variables (one of which is country, when included).
> > > > >
> > > > > We will make one more effort to call you today before Cape Town closes down for Easter. In case you want to call, you can reach us at 011 27 83 234 0333. I have copied this message to your associates at Harvard in case you aren't in this week. If we have not spoken today, would you be so kind as to reply by e-mail, with a copy to bob(a)idasact.org.za.
> > > > >
> > > > > With many thanks for your time. We look forward to using your valuable program.
> > > > >
> > > > > Yours,
> > > > >
> > > > > Mike Bratton.

21 years, 11 months
