Yes, that would work, but you probably shouldn't discard the individual imputations afterward, since the variation in a cell's value across the imputations reflects the uncertainty of your (averaged) point estimate. Some cells will likely have larger variances than others, so there is real information in those data. One way to retain some of that information is to summarize each point with a mean and a variance.
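
A minimal sketch of that per-cell summary in R follows. It is an illustration under assumptions, not output from the analysis discussed here: the `imputations` list is simulated as a stand-in for `out$imputations` (a list of m completed data frames with the same row order), and the names `prod_mat`, `summary_df`, `PROD_mean`, and `PROD_var` are hypothetical.

```r
# Stand-in for out$imputations: a list of m completed data frames with
# identical row order. Values are simulated purely for illustration.
set.seed(1)
m <- 15
base <- expand.grid(TIME = 1:3, COUNTY = c("A", "B"))
imputations <- lapply(seq_len(m), function(i) {
  transform(base, PROD = rlnorm(nrow(base), meanlog = 2, sdlog = 0.3))
})

# Collect PROD from each imputation into an n x m matrix
# (one row per county-month cell, one column per imputation).
prod_mat <- sapply(imputations, function(d) d$PROD)

# Summarize each cell by its mean and variance across the m imputations.
summary_df <- data.frame(
  base,
  PROD_mean = rowMeans(prod_mat),
  PROD_var  = apply(prod_mat, 1, var)
)
print(summary_df)
```

With real Amelia output, `imputations` would be replaced by `out$imputations`. Cells that were observed rather than imputed keep the same value in every imputation, so their variance across imputations is zero.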

Gary
--
Gary King - Albert J. Weatherhead III University Professor - Director, IQSS - Harvard University
GKing.Harvard.edu - King@Harvard.edu - @kinggary - 617-500-7570 - Asst 495-9271 - Fax 812-8581



On Fri, Sep 17, 2010 at 1:14 AM, Fernando Mayer <fernandomayer@gmail.com> wrote:
Hi,

I have a dataset where my variable of interest is fisheries
production (a continuous variable). The dataset contains information,
in general, from 2005 to 2007 by month and county, which makes it
time-series cross-section data. What I need to do is impute the
values for 2008 for every month and county, based on past values and
trends. Some counties have values only at the beginning
of 2008 (mainly for the first four months); all the rest are missing.

Since the sample design is fixed (i.e., every month all counties were
visited to collect information), I created rows for the unavailable
counties and months in 2008 (based on previously available information)
and filled the fisheries production I wanted to impute with NA. Then I
used Amelia II to impute the values as follows:

out <- amelia(data.na, ts = "TIME", cs = "COUNTY", polytime = 2,
                   logs = "PROD", p2s = 2, m = 15,
                   lags = "PROD", leads = "PROD",
                   empri = 0.1 * nrow(data.na), intercs = TRUE)

where data.na is my dataset, TIME is a continuous index running from the
first to the last available observation, ordered by year and month,
COUNTY identifies the county, and PROD is the variable of interest. I
used logs= because the data are highly skewed. I also used lags= and
leads=, and a ridge prior (empri=) due to the high rate of missingness.

Now, my aim here is not to do any further data analysis. The
objective of the imputation is to have an estimated production for
each county and month, for informational purposes only, since
there was no data collection for the imputed period. That said, my
question is: to get this estimated production, could I just take the
mean of these m = 15 imputed values? If not, what would be the best
approach to get this result?

Thanks in advance,

---
Fernando Mayer
e-mail: fernandomayer [@] gmail.com
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
More info about Amelia: http://gking.harvard.edu/amelia