Matt, Nick
Amelia, however, will not give the same answer for
each individual in
the organization. Let's say, for example, the variable you want to
impute is the organization's revenue. Amelia will impute a different
value of the revenue for each individual observation in the
organization. One way to deal with this might be to take the mean
imputed revenue in each organization for each imputed dataset and use
that as the imputed revenue for that organization in that imputed
dataset. There may be smarter ways of handling this sort of logical
constraint, but that might work to get you started.
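The within-organization averaging idea can be sketched roughly as follows. This is not Amelia itself (which is R software); it is a hypothetical post-processing step in Python/pandas, with made-up column names (`org`, `revenue`) and toy data:

```python
# Sketch: make an individual-level imputed variable constant within
# organizations by replacing it with the organization mean, separately
# in each imputed dataset. All names and data here are hypothetical.
import pandas as pd

def enforce_org_constant(imputed_datasets, group_col="org", var_col="revenue"):
    """For each imputed dataset, replace each individual's imputed value
    with the mean imputed value of their organization."""
    out = []
    for df in imputed_datasets:
        df = df.copy()
        df[var_col] = df.groupby(group_col)[var_col].transform("mean")
        out.append(df)
    return out

# Two toy "imputed datasets": same individuals, different imputed revenues.
imp1 = pd.DataFrame({"org": ["A", "A", "B"], "revenue": [10.0, 14.0, 7.0]})
imp2 = pd.DataFrame({"org": ["A", "A", "B"], "revenue": [12.0, 16.0, 9.0]})
fixed = enforce_org_constant([imp1, imp2])
```

Note that the averaging is done within each imputed dataset separately, so the organization-level value still varies across imputations (here, A gets 12.0 in the first dataset and 14.0 in the second).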
Fortunately, this is not what Nick seems to have, but since you brought
it up: my $0.02 on the topic...
I have been doing a lot of thinking about this. The problem with what
you propose above is that (I think) it has a tendency to converge to
the expected value at level 2, decreasing the variance across
imputations. This leads to understated standard errors in the
analysis, just like single-imputation procedures that impute the
expected value.
I had a good chat about this with Joe Schafer two summers ago at a
conference and he suggested the following:
If you have a multi-level dataset (such as what Nick describes here),
first aggregate up to the second level by averaging all your variables
that vary within clusters. Make sure to include the variables that do
not vary across clusters. Run multiple imputation on this single-level
dataset. Produce 10 level 1 datasets (or however many imputations you
deemed necessary in the previous step) by attaching the newly imputed
level 2 values. Then impute each of the 10 level 1 datasets once...
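The two-step procedure might look something like the sketch below. The `toy_impute` function is only a stand-in for a real multiple-imputation engine such as Amelia, and all variable names (`country`, `income`, `gdp`) and data are hypothetical:

```python
# Sketch of the two-step procedure: aggregate to level 2, multiply
# impute there, then impute level 1 once per completed level 2 dataset.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def toy_impute(df, cols):
    """Placeholder imputer: fill NaNs with the column mean plus noise.
    A real analysis would use a proper MI model here, not this."""
    df = df.copy()
    for c in cols:
        miss = df[c].isna()
        s = df[c].std(ddof=0)
        if not np.isfinite(s) or s == 0:
            s = 1.0
        df.loc[miss, c] = rng.normal(df[c].mean(), s, miss.sum())
    return df

def two_step_mi(level1, cluster="country", l1_vars=("income",),
                l2_vars=("gdp",), m=10):
    # Step 1: aggregate up to level 2; cluster-level variables are
    # constant within clusters, so averaging just recovers them
    # (or NaN where the whole cluster is missing).
    level2 = level1.groupby(cluster, as_index=False)[
        list(l1_vars) + list(l2_vars)].mean()
    completed = []
    for _ in range(m):
        # Step 2: impute the single-level (level 2) dataset.
        l2_imp = toy_impute(level2, l2_vars)
        # Step 3: attach the completed level 2 values back to level 1
        # and impute the individual-level variables once per dataset.
        merged = level1.drop(columns=list(l2_vars)).merge(
            l2_imp[[cluster] + list(l2_vars)], on=cluster)
        completed.append(toy_impute(merged, l1_vars))
    return completed

level1 = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C"],
    "income":  [3.0, np.nan, 4.0, 4.5, 5.0, np.nan],  # individual level
    "gdp":     [1.0, 1.0, 2.0, 2.0, np.nan, np.nan],  # cluster level, C missing
})
datasets = two_step_mi(level1)
```

Each pass through the loop produces one completed level 1 dataset, so the level 2 values differ across the 10 datasets rather than collapsing to a single expected value.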
There is a problem with this too. If you have data like I did, with a
lot of systematic missing data at level 1 (in my case it was economic,
corruption, and democracy indicators for countries), the first step of
aggregating to level 2 will ignore the missing data while averaging,
producing very biased level 2 aggregates.
There might be a way to overcome this problem by producing projections
of what the missing values would have been and correcting the
averages, but how to implement this is well beyond me... Still, I
think this might be the right track for dealing with level 2
missingness through a multi-step imputation procedure that first
imputes at level 2 and then at level 1. The real question is how, when
you aggregate up to level 2 in the first step, to correct for the
missing data at level 1 beyond simple averaging, which has to assume
MCAR.
Some estimation is needed to come up with good projected level 2
aggregates. Alternatively, an iterative algorithm could also work:
aggregate up by averaging, impute level 1, delete the imputed level 2
values, start over by averaging up... Once these averages do not
change anymore, you have good level 2 imputations.
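The iterative idea could be written as a loop like the one below. This is purely a sketch, not a validated algorithm: the "level 1 imputation" step is a deterministic stand-in that fills each missing value with its cluster's current mean, where a real version would use a proper imputation model with other covariates. Names and data are hypothetical:

```python
# Sketch: average up, impute level 1 from the current cluster means,
# delete the level 2 quantities, re-average, and stop when the cluster
# means stabilize.
import numpy as np
import pandas as pd

def iterate_cluster_means(level1, cluster="country", var="gdp",
                          tol=1e-8, max_iter=100):
    df = level1.copy()
    miss = df[var].isna()
    # Start from the naive averages that ignore the missing level 1 data.
    means = df.groupby(cluster)[var].transform("mean")
    prev = None
    for _ in range(max_iter):
        # "Impute level 1": deterministic stand-in for a real model.
        df.loc[miss, var] = means[miss]
        # "Delete the level 2 values, start over by averaging up."
        means = df.groupby(cluster)[var].transform("mean")
        cur = means.to_numpy()
        if prev is not None and np.max(np.abs(cur - prev)) < tol:
            break
        prev = cur.copy()
    return df.groupby(cluster)[var].mean()

level1 = pd.DataFrame({"country": ["A", "A", "B", "B"],
                       "gdp": [1.0, np.nan, 2.0, 4.0]})
agg = iterate_cluster_means(level1)
```

With this deterministic stand-in the fixed point is just mean imputation, which illustrates the worry below: the iteration can converge to something that ignores the missing-data uncertainty unless the level 1 step injects proper imputation variability.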
But then again, this might converge to something that does not
consider the missing data uncertainty...
Whoever solves the level 2 missing-data issue will certainly get a lot
of cites. At this point the only software I know of that even attempts
to address this is Mplus, and even there you pretty much have to have
very little missing data and say multiple prayers for convergence...
Rereading this email, I am not sure how clear it is. I don't think I
was as articulate as I could have been if I weren't so tired :) If
this sparked any interest and you need clarification on something,
please ask...
Cheers
L
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive:
http://lists.gking.harvard.edu/?info=amelia