On Fri, Sep 17, 2010 at 3:13 PM, Fernando Mayer <fernandomayer@gmail.com> wrote:

Dear Dr. King,

Thank you very much for your explanation. If I understand you
correctly, I should compute the mean for each month and county across
the available m = 15 datasets, and estimate the variance of each of
these point estimates according to equation (3) of King et al. (2001) [1]
(as proposed by Rubin), since this equation considers both the within-
and across-imputation variances. Would this suffice?
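
For concreteness, this is how I read Rubin's combining rule. The
notation is my own assumption and is not copied from the paper: q_j is
the estimate from imputed dataset j, and SE(q_j) is its standard error.

```latex
\bar{q} = \frac{1}{m}\sum_{j=1}^{m} q_j,
\qquad
\widehat{\mathrm{Var}}(\bar{q})
  = \frac{1}{m}\sum_{j=1}^{m} \mathrm{SE}(q_j)^2
  + \left(1 + \frac{1}{m}\right)\frac{1}{m-1}\sum_{j=1}^{m}\left(q_j - \bar{q}\right)^2
```

The first term is the within-imputation variance and the second is the
between-imputation variance, here with m = 15.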

Kind regards,

---
Fernando Mayer
e-mail: fernandomayer [@] gmail.com
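
P.S. Here is a minimal sketch in R of the per-cell summary (a mean and
a variance for each county and month). It is illustrative only: the
list `imputations` is simulated as a hypothetical stand-in for the
imputed datasets (with a real fit I would expect to use something like
out$imputations), and the county labels and TIME values are made up.
I am also assuming that only the between-imputation term applies,
since a single imputed value has no standard error of its own.

```r
# Hypothetical stand-in for the m imputed datasets: a list of data frames
# that share the same county-month cells and differ only in imputed PROD.
set.seed(1)
m <- 15
cells <- expand.grid(TIME = 37:48, COUNTY = c("A", "B", "C"))
imputations <- lapply(seq_len(m), function(j) {
  transform(cells, PROD = rlnorm(nrow(cells), meanlog = 3, sdlog = 0.5))
})

# Collect PROD from every imputation into a cells-by-m matrix.
prod_mat <- sapply(imputations, function(d) d$PROD)

# Per-cell point estimate: mean across the m imputations.
prod_mean <- rowMeans(prod_mat)

# Per-cell between-imputation variance, inflated by (1 + 1/m).
# No within-imputation term is included (assumption stated above).
prod_var <- apply(prod_mat, 1, var) * (1 + 1 / m)

summary_2008 <- data.frame(cells, PROD_MEAN = prod_mean, PROD_VAR = prod_var)
head(summary_2008)
```

Keeping PROD_VAR next to PROD_MEAN retains some of the information
about which cells are more uncertain than others.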

On Fri, Sep 17, 2010 at 8:24 AM, Gary King <king@harvard.edu> wrote:
> yes, that would work, but you probably shouldn't discard the imputations
> after that since the variation for a cell value across the imputations
> reflects the uncertainty of your (averaged) point estimate. Some will
> likely have larger variances than others and so there is real information in
> those data. One thing you could do to recoup some of the information is to
> summarize each point with a mean and a variance.
> Gary
> --
> Gary King - Albert J. Weatherhead III University Professor - Director, IQSS
> - Harvard University
> GKing.Harvard.edu - King@Harvard.edu - @kinggary - 617-500-7570 - Asst
> 495-9271 - Fax 812-8581
>
>
> On Fri, Sep 17, 2010 at 1:14 AM, Fernando Mayer <fernandomayer@gmail.com>
> wrote:
>>
>> Hi,
>>
>> I have a dataset where my variable of interest is fisheries
>> production (a continuous variable). The dataset contains information,
>> in general, from 2005 to 2007 by month and county, which makes it
>> time-series cross-section data. What I need to do is impute the
>> values for 2008 for every month and county, based on past values and
>> trends. There are values for some counties only at the beginning
>> of 2008 (mainly for the first four months); all the rest are missing.
>>
>> Since the sample design is fixed (i.e. every month all counties were
>> visited to collect information), I created rows for these unavailable
>> counties and months for 2008 (based on previously available
>> information), and filled the fisheries production I wanted to impute
>> with NA. Then I used Amelia II to impute the values as follows:
>>
>> out <- amelia(data.na, ts = "TIME", cs = "COUNTY", polytime = 2,
>>               logs = "PROD", p2s = 2, m = 15,
>>               lags = "PROD", leads = "PROD",
>>               empri = 0.1 * nrow(data.na), intercs = TRUE)
>>
>> where data.na is my dataset, TIME runs continuously from the first to
>> the last available observation, ordered by year and month, COUNTY
>> identifies the counties, and PROD is the variable of interest. I used
>> logs= because the data are highly skewed. I also used lags= and leads=,
>> and a ridge prior (empri=) due to the high rate of missingness.
>>
>> Now, my aim here is not to do any further data analysis. The
>> objective of the imputation is to have an estimated production for
>> each county and month, for informational purposes only, since
>> there was no data collection for the imputed period. That said, my
>> question is: to obtain this estimated production, could I just take
>> the mean of these m = 15 imputed values? If not, what would be the
>> best approach to get this result?
>>
>> Thanks in advance,
>>
>> ---
>> Fernando Mayer
>> e-mail: fernandomayer [@] gmail.com
>> -
>> Amelia mailing list served by Harvard-MIT Data Center
>> [Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
>> More info about Amelia: http://gking.harvard.edu/amelia
>
>