Thank you very much for your answer.

Pablo Mitnik

Gary King wrote:
below...

On Sun, 12 Nov 2006, Pablo Mitnik wrote:

  
Dear Amelia-listers,

I have the following, probably atypical, missing data problem. My 
variable of interest is "hourly earnings," which I calculate by dividing 
"annual earnings" by "annual hours worked." The data set I am using (the 
March supplement of the Current Population Survey) has the first 
variable over 1960-2005. It also has the variables "total weeks worked" 
and "usual hours of work per week" in each year over 1960-2005, so 
"annual hours worked" can be obtained by multiplying these two 
variables. The problem is that while from 1977 on these variables are 
measured in integers (i.e., as the exact number of weeks and hours 
worked, in integers), over 1960-1976 they are measured in interval 
scales. Hence, for 1960-1976 the data may be considered as "partially 
missing." I am thinking of using Amelia II to do multiple imputation of 
these partially missing values, using the data for 1977-1978 as the 
basis for the imputation. These are my questions:

(a) Prima facie, is this a sensible idea?

If the answer to (a) is yes:

(b) Would it be better to impute each of the two variables ("total weeks 
worked" and "usual hours of work") separately, and then multiply them to 
obtain "annual hours worked," or to directly impute the latter variable? 
(My intuition is that the first method should be preferred, but I do not 
have a clear reason to give as to why this would be the case.)
    

It's usually better to impute the component variables of an index 
separately and combine them after imputation.
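
To illustrate "impute separately, combine afterwards", here is a minimal sketch in Python. It assumes the imputation step has already produced several completed datasets; the field names (annual_earnings, weeks_worked, usual_hours) and the two toy datasets are hypothetical and are not taken from the CPS or from Amelia's output format.

```python
# Hypothetical sketch: derive the index inside each completed dataset,
# after the component variables have been imputed separately.

def hourly_earnings(record):
    """Hourly earnings for one completed record, or None if undefined."""
    annual_hours = record["weeks_worked"] * record["usual_hours"]
    if annual_hours <= 0:
        return None  # no hours worked, so the ratio is undefined
    return record["annual_earnings"] / annual_hours


def add_derived(imputed_datasets):
    """Compute the derived variable separately within each imputed dataset."""
    return [[hourly_earnings(r) for r in dataset] for dataset in imputed_datasets]


# Two toy completed datasets for the same person (illustrative values only).
imputed_datasets = [
    [{"annual_earnings": 20000.0, "weeks_worked": 50, "usual_hours": 40}],
    [{"annual_earnings": 20000.0, "weeks_worked": 48, "usual_hours": 40}],
]
hourly = add_derived(imputed_datasets)
```

The derived variable is computed once per completed dataset, so the variation across imputations carries through to hourly earnings when the analyses are later combined.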

  
(c)  Should I treat the observations in each interval of the variable I am
imputing as a separate imputation problem, in which the boundaries of the
interval provide the boundaries for the imputed values? (This would assure
that each person gets imputed a value that is fully consistent with his or her
value as measured by the interval scale, but it would require me to do as many
imputations as intervals; moreover, for some intervals there may be
insufficient observations in 1977-78.)  Or should I impute all values at once,
and include the variable measured in an interval scale "on the right"? (The
problem with this is that the imputed values may be inconsistent with the values
measured in the interval scale.)
    

Yes, what you would ideally want here is an imputation method that is 
conditional on what you know about each variable (the paper by me and 
Jonathan Wand at my web site analyzes a dependent variable exactly like 
this in a single model, so that's one example; others are age coarsening 
models).  Amelia doesn't have these components and so it won't use that 
information, but you might be able to approximate it to some degree by 
rounding values to within the known intervals.  If you have a good model, 
this might work reasonably well.  I'd look closely at the results, though, 
to make sure.
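
One possible reading of "rounding values to within the known intervals" is sketched below in Python: each imputed value is moved to the nearest point inside the interval the respondent actually reported, then rounded to an integer. The function names, the interval bounds, and the sample values are all hypothetical; this is an after-the-fact approximation, not something Amelia does itself.

```python
# Hypothetical sketch: force imputed values into each person's reported interval.

def clamp_to_interval(imputed, lower, upper):
    """Return the point in [lower, upper] nearest to the imputed value."""
    return max(lower, min(upper, imputed))


def constrain(imputed_values, intervals):
    """Clamp each imputed value to its own interval, then round to an integer."""
    return [
        round(clamp_to_interval(value, lower, upper))
        for value, (lower, upper) in zip(imputed_values, intervals)
    ]


# Illustrative imputed "weeks worked" and the interval each person reported.
imputed_weeks = [12.4, 30.2, 51.7]
reported_intervals = [(14, 26), (27, 39), (50, 52)]
constrained = constrain(imputed_weeks, reported_intervals)
```

Clamping moves every out-of-range draw onto an interval boundary, so values can pile up at the edges. Comparing the constrained distribution with the exact values observed from 1977 on is one way to check the results.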


Gary King

  
I would greatly appreciate any help.

All best,

Pablo Mitnik


    
-
Amelia mailing list served by Harvard-MIT Data Center
[Un]Subscribe/View Archive: http://lists.gking.harvard.edu/?info=amelia
  

-- 
Pablo A. Mitnik
University of Wisconsin-Madison (http://www.wisc.edu/)
Department of Sociology (http://www.ssc.wisc.edu/soc/)
Center on Wisconsin Strategy (http://www.cows.org/)
1180 Observatory Drive
Room 7114A
Madison, WI 53706
TEL (608) 2621839
E-mail: pmitnik@ssc.wisc.edu 