I added a new real number variable to my dataset, but when I try to impute
it along with the rest of the dataset, I receive the following error:
Error in if (sum(non.vary == 0)) { :
argument is not interpretable as logical
It is a real number variable (I checked it with is.real()), and it doesn't
seem to differ from the other real number variables in the dataset.
I've updated to the newest version of Amelia (Version 1.2-17, built:
2010-05-10). I noticed from looking through the mailing list archives that
this error was present in 1.2-14, and was fixed (by Matt Blackwell) in
1.2-15. It seems to be back, though perhaps this is somehow my fault. Any
help would be appreciated.
Thank you,
Gregory
Dear fellow Amelia users,
For my PhD project I am currently working on a pooled cross-sectional time-series dataset which contains many missing values. Since I'm new to Amelia, I have a couple of questions on the use of the program. I also tried to find the answers to my questions in the archive, but some issues are still unclear to me and I hope some of you could help me out. My questions are as follows:
1) My dataset on political parties spans the 1984-2006 period and I only have observations for 1984, 1988, 1992, 1996, 1999, 2002 and 2006, as data is only gathered around elections. Altogether this entails that I have far more missing values than observations. Would you still recommend using Amelia with such an enormous degree of missingness?
All my other questions deal with possible violations of the criterion that the imputation model should include at least as much information as will be used in the analysis model. This is stressed in the Amelia manual (page 10) and journal article.
2) My analysis model contains interaction effects and Euclidean distance measures, and I understand that I also have to add these to the imputation model. However, the consequence of this approach is that I end up with imputed interaction effects and Euclidean distance measures that don't make any sense, as Amelia does not know how these variables are constructed. For example: in my analysis model, the interaction effect C is meant to be A multiplied by B, but the Amelia algorithm will replace missing values of C by something different from A*B. Since this fundamentally alters the goal of my analysis, I wonder whether it is also allowed to transform the data AFTER running Amelia. In the example above, this would mean including only A and B in my imputation model and computing the interaction effect C myself after running Amelia. Is this a good procedure, or would it bias my results?
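To make the second question concrete, this is the post-imputation transformation I have in mind (the data frame df and the columns A, B, C are hypothetical names):

```r
library(Amelia)

# Impute using A and B only; the interaction C is NOT passed to amelia().
out <- amelia(df, m = 5)

# Rebuild C = A * B inside each of the m completed datasets afterwards.
imps <- lapply(out$imputations, transform, C = A * B)
```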
3) In my analysis model I focus on lagged or first-differenced effects of my X variables on Y. Do I also have to make these transformations before I run the imputation model, or can I lag/take first differences of my variables after running Amelia? The latter would be much more practical for me, because I always have regular 3-4 year intervals of missingness between two data points in my time series, which means that I will be unable to take first differences of any variable before running the imputation model (as this would generate a variable that is always missing).
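In code, the post-Amelia differencing of the third question would be roughly the following (party, year and X are hypothetical column names, and out is the amelia() output):

```r
# Take first differences of X within each party, separately for every
# one of the m completed datasets returned by amelia().
diffed <- lapply(out$imputations, function(d) {
  d <- d[order(d$party, d$year), ]
  d$dX <- ave(d$X, d$party, FUN = function(v) c(NA, diff(v)))
  d
})
```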
4) My final question is whether it is allowed to add new data to the analysis model after the data has been imputed by Amelia. I want to do this because I would like to merge parties with their supporters on the basis of left-right positions. However, in order to know the left-right positions of the parties for every year, I first have to impute the missing data on the parties' left-right positions. After running these imputations I have all the information I need to merge the parties with another dataset that contains information on their supporters. Can I do this, or would I again violate the assumption that the analysis model must contain the same information as the imputation model?
I hope my questions make sense (I'm sorry in case they don't) and hope that some of you have some advice. Your help is very much appreciated. Let me know if anything is unclear, so I can clarify it.
Best regards,
Marc
Hi,
I have a data set that is based on observations of vehicles by lane.
For example, each truck that passes the detector will be counted, and
its characteristics recorded (length, weight etc). By summing up the
counts into higher time periods, say an hour, I can use Amelia to
impute missing counts of vehicles. (Statisticians, look the other way:
I tell Amelia that the time series varies by time of day (the ts
variable runs from 0 to 24) and insert day of week as the cs
(cross-section) variable (0 through 6). While that may be a
non-standard perversion of the input parameters, it seems to work
pretty well.) I have other data for the missing periods from other
detectors, so I think it makes sense to try to use Amelia rather than
simply estimating a time series model for the missing counts.
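In code, the setup I just described looks roughly like this (the data frame and column names are made up for illustration):

```r
library(Amelia)

# Hypothetical hourly count data: "hr" is hour of day (the ts variable)
# and "dow" is day of week (the cs variable), as described above.
out <- amelia(counts.df, m = 5, ts = "hr", cs = "dow")
```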
Now that I can impute counts I want to impute missing characteristics.
For example in an hour of good observation, every truck will have a
length recorded. When the detector is kaput for some reason, I want
to impute the missing average lengths along with the missing truck
counts.
The problem is that sometimes there are no observations (a true count
of zero) for a period, and so the expected length for the period is a
"true" NA, rather than just a missing variable. This is quite common;
while the trucks are *usually* in the right hand lanes, they are
sometimes detected in the middle lanes. The middle lane detectors
therefore *usually* have a count of zero and indeterminate characteristics.
My question is how to proceed using Amelia. My naive strategy would
be to run Amelia once to impute the counts, and then run Amelia again
for each imputation (5 times), for the characteristics of the vehicles
(as a non-time dependent imputation) *only* for the non-zero periods
and lanes, and then use Zelig to compute average lengths. Does this
make sense, or have I crossed the line from imputation to imagination?
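Sketched in code, the naive strategy would be something like this (all names are hypothetical, and the characteristics step is shown with a single imputation per completed dataset):

```r
library(Amelia)

# Step 1: impute the hourly counts.
cnt.out <- amelia(counts.df, m = 5, ts = "hr", cs = "dow")

# Step 2: for each completed dataset, impute vehicle characteristics
# only for periods/lanes with a non-zero count, with no time index.
char.imps <- lapply(cnt.out$imputations, function(d) {
  obs <- d[d$count > 0, ]   # true-zero periods carry no characteristics
  amelia(obs, m = 1, idvars = c("hr", "dow"))$imputations[[1]]
})
```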
My other thought would be to aggregate up to daily periods and make it
so there should never be zero counts, but I'd really like to preserve
the hourly variation in the data.
One other note: I've coded my data by observation time (with multiple
lanes of data). I could also code it as one record per lane per
observation time, which would allow me to drop zero count lanes. I
just can't see how this would help.
Any advice would be appreciated.
Regards,
James Marca
Hi,
I have a dataset where my variable of interest is the fisheries
production (a continuous variable). This dataset contains information,
in general, from 2005 to 2007 by month and county, which characterizes
a time-series-cross-section data. What I need to do is to impute the
values for 2008 for every month and county, based on past values and
trends. There are some values for some counties only at the beginning
of 2008 (mainly for the first four months), all the rest is missing.
Since the sample design is fixed (i.e. every month all counties were
visited to collect information), I created rows for these unavailable
counties and months in 2008 (based on previously available information)
and filled the fisheries production values I wanted to impute with NA.
Then I used Amelia II to impute the values as follows:
out <- amelia(data.na, ts = "TIME", cs = "COUNTY", polytime = 2,
logs = "PROD", p2s = 2, m = 15,
lags = "PROD", leads = "PROD",
empri = 0.1 * nrow(data.na), intercs = TRUE)
where data.na is my dataset, TIME is continuous from the first to the
last available information ordered according to year and month, COUNTY
are the counties, and PROD is the variable of interest. I used logs=
because the data is highly skewed. I also used lags= and leads=, and a
ridge prior (empri=) due to the high rate of missingness.
Now, my aim here is not to make any further data analysis. The
objective of the imputation is to have an estimated production for
each county and month, only with the purpose of information, since
there was no data collection for the imputed period. That said, my
question is: to get this estimated production, could I just take the
mean of the m = 15 imputed values? If not, what would be the best
approach to get this result?
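In other words, would something as simple as this be defensible?

```r
# Cell-wise mean of PROD across the m = 15 completed datasets,
# assuming rows line up across the elements of out$imputations.
prod.mat  <- sapply(out$imputations, function(d) d$PROD)
prod.mean <- rowMeans(prod.mat)
```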
Thanks in advance,
---
Fernando Mayer
e-mail: fernandomayer [@] gmail.com
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
More info about Amelia: http://gking.harvard.edu/amelia
>
> Hi all,
>
> I'm working with a panel data set which contains many missing values for
> demographic variables (e.g. mortality rates, life expectancy, etc.). Some of
> them will be modeled as outcome variables. As is well known, they are not
> missing at random, mostly concentrated among poor and non-democratic
> countries. Thus I am assuming I have to model the missingness process, or at
> least provide some information, such as that the mean of the missing data
> should be lower or higher for a particular set of countries. I have seen that
> I can provide priors for the imputation procedure in your software, but I am
> not sure whether this is an explicit model of the non-ignorable process. Any
> suggestions?
>
> Congratulations on your amazing software!
>
> Help and advice really appreciated,
>
> Antonio.
>
Hi out there!
I hear Amelia is amazing. I've always been a Stata user, but I'm happy to convert if someone can help me with this problem I have.
I am trying to perform multiple imputation on a panel dataset where all variables have some missing values (save the unique ID numbers and the time variable). The missing rate varies from 1% to 10%.
I would like to ask what options Amelia offers me to move forward; Stata is a dead end, I think. I understand the issues with this level of missingness. Listwise deletion is not an option because 1) the missingness is very likely correlated with other observed variables (e.g., income), so the missingness is MAR at best, and 2) it would make the sample size too small to be useful. Stata doesn't offer pairwise deletion, so I'd have to code this up myself. Plus, according to a Stata listserv thread, pairwise deletion generates worse biases than listwise deletion (Allison 2002).
So here's my situation: I have a rich panel dataset from a developing country that could yield some interesting policy results. It is an unfortunate consequence of working with data from a developing country that the data has missing values. I've tried the mi functions in Stata using mvn (the multivariate normal estimation option), and I get error messages like the ones copied below. I've read in Stata's MI manual that doing univariate estimations for multiple imputations is incorrect procedure if the results are not used in independent analyses. I understand this, but it may be my only option.
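If it helps, I imagine the Amelia call for my data would look something like the following (variable names invented):

```r
library(Amelia)

# Hypothetical panel: "id" and "wave" are the fully observed identifier
# and time variables; everything else has 1-10% missingness and gets imputed.
a.out <- amelia(panel.df, m = 5, cs = "id", ts = "wave")
```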
Can Amelia help me?
Thanks,
jennie
ERROR MESSAGES
1)
Iteration 0: variance-covariance matrix (Sigma) is not positive definite
posterior distribution is not proper
2)
Iteration 0: imputed data contain missing values
This may occur when imputation variables are used as independent variables, when independent variables contain missing values, or when
variance-covariance matrix becomes not positive definite. You can specify option force if you wish to proceed anyway.
3)
Iteration 0: variance-covariance matrix (Sigma) is not positive definite
EM did not converge
Hi, I was looking at the source code to try to figure out the
difference between splinetime and polytime, and I *think* I came
across a bug. I found "polytime" inside of amcheck.r in a section
dealing with "splinetime". Here is my diff patch:
diff --git a/R/amcheck.r b/R/amcheck.r
index 246fa75..981ef9b 100644
--- a/R/amcheck.r
+++ b/R/amcheck.r
@@ -472,7 +472,7 @@ amcheck <- function(x,m=5,p2s=1,frontend=FALSE,idvars=NULL,logs=NULL,
if (!identical(splinetime,NULL)) {
#Error code: 54
#Spline of time are longer than one integer
- if (length(polytime) > 1) {
+ if (length(splinetime) > 1) {
error.code<-54
error.mess<-paste("The spline of time setting is greater than one integer.")
return(list(code=error.code,mess=error.mess))
@@ -497,7 +497,7 @@ amcheck <- function(x,m=5,p2s=1,frontend=FALSE,idvars=NULL,logs=NULL,
error.mess<-paste("You have set splines of time without setting the time series variable.")
return(list(code=error.code,mess=error.mess))
}
- if (all(!intercs,identical(polytime,0))) {
+ if (all(!intercs,identical(splinetime,0))) {
warning(paste("You've set the spline of time to zero with no interaction with \n",
"the cross-sectional variable. This has no effect on the imputation."))
}
Hope that is signal rather than noise.
Regards,
James Marca