Thanks for your responses, Matt and Cyrus. I guess I didn't express
my thoughts clearly. Say we have variables A and B and C, missing
40%, 40%, and 10% of the cases respectively. All three variables are
to be included in the regression model. Our objective is to keep
original (unimputed) data as much as we can, and only impute when we
have to. We think 10% is tolerable, but want to impute on A and B.
So there are three things we are considering doing:
1) impute on A and B, then merge imputed A and B back into the
original data set. My question: can we now include the imputed A and
B and unimputed C together in a regression model and use the rules to
combine the results (since there will be five or ten sets of A's and
B's)?
2) Another way to achieve this objective is to include C in the Amelia
model as an ID variable so that C won't be imputed. But as I
understand, C then SHOULD NOT be included in the regression model
either. Is this right, Matt? If so, this route won't work for us.
3) Suppose X is an additive index of A and B. Now with imputed A
and B, we have all cases of X with valid values. But if we feel
uncomfortable with imputing values to cases where BOTH A and B are
missing (say 15% of the cases), can we then convert these cases back
to missing and use them along with other imputed cases? Matt
cautioned against introducing missingness back into imputed data.
Could you briefly explain why, Matt?
Sorry about the lengthy email. I am new to Amelia. Many thanks to
both of you for answering our questions. Your efforts are deeply
appreciated.
Best,
Shanruo
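For reference, a minimal sketch of the combination rules asked about in 1), assuming they are the standard multiple-imputation rules (Rubin's rules) applied to a single coefficient. The combined estimate is the mean of the per-dataset estimates. Its variance is the average squared standard error plus the between-dataset variance inflated by (1 + 1/m). The function name combine_rubin is illustrative and is not part of Amelia.

```python
from math import sqrt
from statistics import mean, variance


def combine_rubin(estimates, std_errors):
    """Combine one coefficient across m imputed datasets (Rubin's rules).

    estimates  -- the coefficient estimated on each imputed dataset
    std_errors -- the matching standard errors
    Returns (combined estimate, combined standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two datasets and matching lengths")

    q_bar = mean(estimates)                       # combined point estimate
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                 # sample variance across datasets
    total = within + (1 + 1 / m) * between        # total variance
    return q_bar, sqrt(total)


# Hypothetical numbers: a coefficient estimated on five imputed datasets.
est, se = combine_rubin([0.42, 0.47, 0.39, 0.45, 0.44],
                        [0.10, 0.11, 0.10, 0.12, 0.10])
print(f"combined estimate {est:.3f}, combined s.e. {se:.3f}")
```

The same regression would be run once per imputed dataset, with the function applied to each coefficient in turn.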
On Apr 1, 2010, at 8:18 AM, Matt Blackwell wrote:
This is correct. Amelia will only fill in missing cells with imputed
values. Your observed data is always preserved in the imputed
datasets, even when it is used in the imputation.
Cheers,
matt.
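The behaviour described above can be illustrated with a small sketch. It is not Amelia's code. The helper names fill_missing and observed_preserved are hypothetical, and missing cells are represented by None. The first helper fills only the missing cells from a set of imputed draws. The second checks that a completed dataset still agrees with the original wherever the original was observed.

```python
def fill_missing(observed, draws):
    """Return a copy of `observed` with only the None cells replaced by `draws`."""
    return [
        [draw if obs is None else obs for obs, draw in zip(obs_row, draw_row)]
        for obs_row, draw_row in zip(observed, draws)
    ]


def observed_preserved(original, completed):
    """True if every observed (non-None) cell is unchanged in `completed`."""
    return all(
        obs is None or obs == comp
        for obs_row, comp_row in zip(original, completed)
        for obs, comp in zip(obs_row, comp_row)
    )


# Toy data: two rows, two variables, one missing cell in each row.
original = [[1.0, None], [None, 4.0]]
draws = [[9.0, 2.5], [3.5, 9.0]]   # made-up imputation draws for every cell

completed = fill_missing(original, draws)
print(completed)
print(observed_preserved(original, completed))
```

A check like observed_preserved can be run on each imputed dataset to confirm that the observed values were left alone.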
On Thu, Apr 1, 2010 at 11:12 AM, Cyrus Samii <cds81(a)columbia.edu>
wrote:
> On this, Matt, correct me if I am wrong, but Amelia won't overwrite
> existing data with imputed values, even though it will fully
> incorporate variables with no missingness into the model (to get the
> covariances, etc). So if I follow what you are saying, Shanruo, what
> you are proposing is what is pretty much always done. Or maybe I am
> misunderstanding...
>
> Cyrus
>
> On Wed, Mar 31, 2010 at 5:52 PM, Matt Blackwell
> <blackwel(a)fas.harvard.edu> wrote:
>> Hi Shanruo,
>>
>> I'm not exactly sure what you mean. If there are certain variables in
>> your data that you do not want to impute, you can always set them to
>> an ID variable in the Amelia options. But you should only do this for
>> variables that you do not want to include in the analysis or
>> imputation, such as unit-specific identifiers. In most cases you do
>> not want to re-introduce missingness into your data by removing
>> imputed values.
>>
>> In any event, you can always use the combination rules for getting
>> your estimates.
>>
>> Cheers,
>> matt.
>>
>> On Tue, Mar 30, 2010 at 12:00 PM, Shanruo Ning Zhang
>> <nizhang(a)calpoly.edu> wrote:
>>> Dear Amelia authors,
>>> In an effort to reduce the number of imputed values used in our
>>> analysis, my co-author and I are contemplating merging some imputed
>>> variables back into the original (un-imputed) data set and using
>>> imputed and un-imputed variables together. Is this a feasible plan?
>>> If so, can I still use the functions on pages 5-6 of the Amelia
>>> User's Guide (Amelia II: A Program for Missing Data) to combine the
>>> results?
>>> Thanks very much.
>>> Best,
>>> Shanruo Ning Zhang
>>> Assistant Professor
>>> California Polytechnic State University
>>> San Luis Obispo, CA 93401
>>>
>>> On Mar 17, 2010, at 6:19 PM, Matt Blackwell wrote:
>>>
>>>> Hi Sivan,
>>>>
>>>> Typically, the length of the chain indicates how much information is
>>>> in the missing part of your data. So, lower chain lengths are
>>>> generally better and 4-6 are fairly short chains. Chains in large,
>>>> complicated datasets can easily reach into the hundreds or
>>>> thousands.
>>>>
>>>> Cheers,
>>>> matt.
>>>>
>>>> On Wed, Mar 17, 2010 at 9:08 PM, Sivan Rotenberg
>>>> <sivanrotenberg(a)gmail.com> wrote:
>>>>>
>>>>> Hi Matt,
>>>>>
>>>>> Thank you very much! I have another question. Is there an optimum
>>>>> number of chains per imputation? I ran about 20 sets, and when I did
>>>>> it for my data I got chain lengths between 4 and 6. Is that OK? Does
>>>>> the chain length provide any additional information?
>>>>>
>>>>> Thank you,
>>>>> Sivan
>>>>>
>>>>> On Wed, Mar 17, 2010 at 8:59 PM, Matt Blackwell
>>>>> <blackwel(a)fas.harvard.edu>
>>>>> wrote:
>>>>>>
>>>>>> Hi Sivan,
>>>>>>
>>>>>> I think the problem is that you have most likely set the
>>>>>> polynomials of time to interact with the cross-section.
>>>>>> Unfortunately this adds many parameters to the model (2 for each
>>>>>> cross-section unit). If you uncheck this box (in the TSCS menu),
>>>>>> then it should run.
>>>>>>
>>>>>> Cheers,
>>>>>> matt.
>>>>>>
>>>>>> On Wed, Mar 17, 2010 at 8:30 PM, Sivan Rotenberg
>>>>>> <sivanrotenberg(a)gmail.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am trying to impute my data file, which has participants over three
>>>>>>> days of sampling and two predictors. I set up my data in SPSS like
>>>>>>> this:
>>>>>>>
>>>>>>> ID     Time  predictor1  predictor2  measure 1  measure 2  measure 3  measure 4
>>>>>>> 101.0  1.0   10.0        8:58        1.097      -0.492      9.680     -0.095
>>>>>>> 101.0  2.0   10.0        9:00        2.399       0.299     13.202     -0.801
>>>>>>> 101.0  3.0   10.0        8:15        1.636      -0.316     20.445     -0.931
>>>>>>> 102.0  1.0   27.0        7:07        3.903       0.473     31.106     -0.916
>>>>>>> 102.0  2.0   27.0        7:21        3.797       0.473     31.994     -0.966
>>>>>>> 102.0  3.0   27.0        7:34        2.829       0.227     29.497     -0.958
>>>>>>>
>>>>>>> I tried using the time series cross sectional option using time as ts
>>>>>>> and ID as cs. I set the polynomials of time to 1 (although I'm really
>>>>>>> not sure about that) and used an EM prior of 4. At the bottom of my
>>>>>>> AmeliaView it says I have 381 observations and 9 variables, which
>>>>>>> does not exceed the p(p+3)/2 formula, yet I am still getting error
>>>>>>> code 34:
>>>>>>>
>>>>>>> Amelia Error Code: 34
>>>>>>> The number of observations in too low to estimate the number of
>>>>>>> parameters. You can either remove some variables, reduce
>>>>>>> the order of the time polynomial, or increase the empirical
>>>>>>> prior.
>>>>>>>
>>>>>>> You have recieved an error. You can close this window and
>>>>>>> reset
>>>>>>> various options to correct the error.
>>>>>>>
>>>>>>> I'm not sure what I'm doing wrong! Any help would be greatly
>>>>>>> appreciated!
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Sivan
>>>>>>>
>>>>>
>>>>>
>>>> -
>>>> Amelia mailing list served by Harvard-MIT Data Center
>>>> [Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
>>>> More info about Amelia: http://gking.harvard.edu/amelia
>>>
>>>
>>>
>>
>
>
>
> --
> Cyrus Samii
> Political Science
> Columbia University
> cds81(a)columbia.edu
>
> Burundi Survey: www.columbia.edu/~cds81/burundisurvey/
> ISERP Statistical Consulting: iserp.columbia.edu/statistical-consulting
> Evidence in peacebuilding initiative: peacebuildingsurveys.org
>
>