Matt, Nick
Amelia, however, will not give the same answer for
each individual in
the organization. Let's say, for example, the variable you want to
impute is the organization's revenue. Amelia will impute a different
value of the revenue for each individual observation in the
organization. One way to deal with this might be to take the mean
imputed revenue in each organization for each imputed dataset and use
that as the imputed revenue for that organization in that imputed
dataset. There may be smarter ways of handling this sort of logical
constraint, but that might work to get you started.
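The within-organization averaging idea can be sketched roughly as follows. This is not Amelia itself (which is R software); it is a hypothetical post-processing step in Python/pandas, with made-up column names (`org`, `revenue`) and toy data:

```python
# Sketch: make an individual-level imputed variable constant within
# organizations by replacing it with the organization mean, separately
# in each imputed dataset. All names and data here are hypothetical.
import pandas as pd

def enforce_org_constant(imputed_datasets, group_col="org", var_col="revenue"):
    """For each imputed dataset, replace each individual's imputed value
    with the mean imputed value of their organization."""
    out = []
    for df in imputed_datasets:
        df = df.copy()
        df[var_col] = df.groupby(group_col)[var_col].transform("mean")
        out.append(df)
    return out

# Two toy "imputed datasets": same individuals, different imputed revenues.
imp1 = pd.DataFrame({"org": ["A", "A", "B"], "revenue": [10.0, 14.0, 7.0]})
imp2 = pd.DataFrame({"org": ["A", "A", "B"], "revenue": [12.0, 16.0, 9.0]})
fixed = enforce_org_constant([imp1, imp2])
```

Note that the averaging is done within each imputed dataset separately, so the organization-level value still varies across imputations (here, A gets 12.0 in the first dataset and 14.0 in the second).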
Fortunately, this is not what Nick seems to have, but since you brought
it up: my $0.02 on the topic...
I have been doing a lot of thinking about this. The problem with what
you propose above is that (I think) it has a tendency to converge to
the expected value at level 2, decreasing the variance across
imputations. This leads to understated standard errors in the
analysis, just like single-imputation procedures that impute the
expected value.
I had a good chat about this with Joe Schafer two summers ago at a
conference and he suggested the following:
If you have a multi-level dataset (such as what Nick describes here),
first aggregate up to the second level by averaging all your variables
that vary within clusters. Make sure to include the variables that do
not vary across clusters. Run multiple imputation on this single-level
dataset. Produce 10 level 1 datasets (or however many imputations you
deemed necessary in the previous step) by attaching the newly imputed
level 2 values. Then impute each of the 10 level 1 datasets once...
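The two-step procedure might look something like the sketch below. The `toy_impute` function is only a stand-in for a real multiple-imputation engine such as Amelia, and all variable names (`country`, `income`, `gdp`) and data are hypothetical:

```python
# Sketch of the two-step procedure: aggregate to level 2, multiply
# impute there, then impute level 1 once per completed level 2 dataset.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def toy_impute(df, cols):
    """Placeholder imputer: fill NaNs with the column mean plus noise.
    A real analysis would use a proper MI model here, not this."""
    df = df.copy()
    for c in cols:
        miss = df[c].isna()
        s = df[c].std(ddof=0)
        if not np.isfinite(s) or s == 0:
            s = 1.0
        df.loc[miss, c] = rng.normal(df[c].mean(), s, miss.sum())
    return df

def two_step_mi(level1, cluster="country", l1_vars=("income",),
                l2_vars=("gdp",), m=10):
    # Step 1: aggregate up to level 2; cluster-level variables are
    # constant within clusters, so averaging just recovers them
    # (or NaN where the whole cluster is missing).
    level2 = level1.groupby(cluster, as_index=False)[
        list(l1_vars) + list(l2_vars)].mean()
    completed = []
    for _ in range(m):
        # Step 2: impute the single-level (level 2) dataset.
        l2_imp = toy_impute(level2, l2_vars)
        # Step 3: attach the completed level 2 values back to level 1
        # and impute the individual-level variables once per dataset.
        merged = level1.drop(columns=list(l2_vars)).merge(
            l2_imp[[cluster] + list(l2_vars)], on=cluster)
        completed.append(toy_impute(merged, l1_vars))
    return completed

level1 = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "income":  [3.0, np.nan, 4.0, 4.5, 5.0, np.nan],  # individual level
    "gdp":     [1.0, 1.0, 2.0, 2.0, np.nan, np.nan],  # cluster level, C missing
})
datasets = two_step_mi(level1)
```

Each pass through the loop produces one completed level 1 dataset, so the level 2 values differ across the 10 datasets rather than collapsing to a single expected value.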
There is a problem with this too. If you have data like I did, with a
lot of systematic missing data at level 1 (in my case it was economic,
corruption, and democracy indicators for countries), the first step of
aggregating to level 2 will ignore the missing data while averaging,
producing very biased level 2 aggregates.
There might be a way to overcome this problem by producing projections
of what the missing values would have been and correcting the
averages, but how to implement this is well beyond me... Still, I
think this might be the right track for dealing with level 2
missingness through a multi-step imputation procedure that first
imputes at level 2 and then at level 1. The real question is how, when
you aggregate up to level 2 in the first step, to correct for the
missing data at level 1 beyond simple averaging, which has to assume
MCAR.
Some estimation is needed to come up with good projected level 2
aggregates. Alternatively, an iterative algorithm could also work:
aggregate up by averaging, impute level 1, delete the imputed level 2
values, start over by averaging up... Once these averages do not
change anymore, you have good level 2 imputations.
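The iterative idea could be written as a loop like the one below. This is purely a sketch, not a validated algorithm: the "level 1 imputation" step is a deterministic stand-in that fills each missing value with its cluster's current mean, where a real version would use a proper imputation model with other covariates. Names and data are hypothetical:

```python
# Sketch: average up, impute level 1 from the current cluster means,
# delete the level 2 quantities, re-average, and stop when the cluster
# means stabilize.
import numpy as np
import pandas as pd

def iterate_cluster_means(level1, cluster="country", var="gdp",
                          tol=1e-8, max_iter=100):
    df = level1.copy()
    miss = df[var].isna()
    # Start from the naive averages that ignore the missing level 1 data.
    means = df.groupby(cluster)[var].transform("mean")
    prev = None
    for _ in range(max_iter):
        # "Impute level 1": deterministic stand-in for a real model.
        df.loc[miss, var] = means[miss]
        # "Delete the level 2 values, start over by averaging up."
        means = df.groupby(cluster)[var].transform("mean")
        cur = means.to_numpy()
        if prev is not None and np.max(np.abs(cur - prev)) < tol:
            break
        prev = cur.copy()
    return df.groupby(cluster)[var].mean()

level1 = pd.DataFrame({"country": ["A", "A", "B", "B"],
                       "gdp": [1.0, np.nan, 2.0, 4.0]})
agg = iterate_cluster_means(level1)
```

With this deterministic stand-in the fixed point is just mean imputation, which illustrates the worry below: the iteration can converge to something that ignores the missing-data uncertainty unless the level 1 step injects proper imputation variability.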
But then again, this might converge to something that does not
consider the missing data uncertainty...
Whoever solves the level 2 missing-data issue will certainly get a lot
of cites. At this point the only software I know of that even attempts
to address this is Mplus, and even there you pretty much have to have
very little missing data and say multiple prayers for convergence...
Rereading this email, I am not sure how clear it is. I don't think I
was as articulate as I could have been if I weren't so tired :) If
this sparked any interest and you need clarification on something,
please ask...
Cheers
L
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive:
http://lists.gking.harvard.edu/?info=amelia