RE:AMELIA - Amelia

4 Dec 2002

I'd take the square root or log-plus-one to transform your data to a more 
normal scale before Amelia and then untransform after.
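
A minimal sketch of that transform-and-back step, in Python with NumPy. The variable names and values are made up for illustration, and the mean fill is only a stand-in for the imputation step; Amelia itself is not called here.

```python
import numpy as np

# Hypothetical monthly expenditures (non-negative, right-skewed); NaN marks missing.
spend = np.array([0.0, 120.0, np.nan, 3500.0, 45.0])

# Transform to a more normal scale before imputation.
spend_log = np.log1p(spend)    # log(1 + x); NaN stays NaN
spend_sqrt = np.sqrt(spend)    # the square-root alternative

# Stand-in for the imputation step (assumption: a simple mean fill,
# used only so the round trip can be shown end to end).
imputed_log = np.where(np.isnan(spend_log), np.nanmean(spend_log), spend_log)

# Untransform back to the original scale afterwards.
imputed = np.expm1(imputed_log)    # inverse of log1p
```

The log-plus-one form is convenient when a variable contains zeros, since log(0) is undefined while log(1 + 0) is 0.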

Otherwise, the time-series-cross-sectional aspect will be the most 
difficult.  I would use Amelia's special features for these types of data, 
but then you should carefully check the imputations, as it seems you have 
been doing, to make sure they're reasonable.  Amelia should work in these 
situations a good deal better than SAS because of these features, but it 
will not necessarily fix all the problems.
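
One simple way to check imputations is to flag imputed cells that fall outside a plausible range. The sketch below assumes Python with NumPy; the function name, the example values, and the bounds are all hypothetical and would need to come from knowledge of the actual variables.

```python
import numpy as np

def flag_implausible(values, was_missing, lower, upper):
    """Return indices of imputed cells that fall outside [lower, upper]."""
    values = np.asarray(values, dtype=float)
    was_missing = np.asarray(was_missing, dtype=bool)
    out_of_range = (values < lower) | (values > upper)
    # Only cells that were originally missing count as imputed.
    return np.flatnonzero(was_missing & out_of_range)

# Hypothetical expenditure series after imputation.
values = np.array([120.0, -15.0, 300.0, 9000.0, 45.0])
was_missing = np.array([False, True, False, True, True])

# Assumed bounds: expenditures cannot be negative, and 5000 is a made-up ceiling.
bad = flag_implausible(values, was_missing, lower=0.0, upper=5000.0)
print(bad)
```

Flagged cells are candidates for a closer look, not automatically wrong; plotting imputed against observed values over time for each cross-section is a natural complement.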

Best of luck with your research,
Gary King

     : Gary King, King(a)Harvard.Edu    http://GKing.Harvard.Edu :
     : Center for Basic Research      Direct    (617) 495-2027 :
     :   in the Social Sciences       Assistant (617) 495-9271 :
     : 34 Kirkland Street, Rm. 2      HU-MIT DC (617) 495-4734 :
     : Harvard U, Cambridge, MA 02138    eFax   (928) 832-7022 :

On Mon, 2 Dec 2002, Akunda, Eric wrote:

...
  Dear Prof King,

 I have a data set that has missing values for some variables in the
 following way. I would like to describe the missing data to determine
 whether using Amelia will be appropriate (I have read the article that
 accompanies Amelia).

 My data are for the pharmaceutical industry. I have # of prescriptions
 sold as my response (which I convert to market shares). I then have as
 covariates the # of minutes the salespeople detail to physicians, the
 counts of these details, the detailing expenditures (in dollars), the
 price per prescription in dollars, the expenditure on direct-to-consumer
 advertising, the events and meetings expenditures (in dollars), and
 journal advertising. The latter three variables are incomplete because
 the firm collecting the data (from whom we purchased the same) did not
 collect the data at the time, and seems to have begun to collect these
 variables on enquiries from its clients. The data are time series
 (monthly) and cross-sectional (by brand), and because the brands enter
 the market sequentially, they form an unbalanced panel. I
 intend to use Amelia (the time-series cross-sectional function) to
 impute the missing values. However, my question is, would this be
 appropriate, based on the fact that the data are not really missing at
 random? Would this necessitate coming up with a specification unique to
 the data as you have advised in the paper? Eliminating observations in
 this dataset is unthinkable because the observations are very few (about
 500 for the pooled brands including those with missing values on the
 three covariates). I have tried to impute the data using PROC MI in SAS
 (based on the book by Paul Allison), but some of the imputed values were
 poor. I have the feeling that the importance resampling coupled with
 specifying a range for the missing values would improve the imputations
 (especially so because the specification for the panel data is clearer
 compared to the PROC MI specification based on "my" understanding of
 both). What would your advice be? Thanks very much in advance for taking
 your precious time to provide an answer. I will update you on my
 experiences with using Amelia for this dataset.

 Kindest regards,
 Sincerely,
 Eric Akunda.
 PhD student, Marketing.
 University of North Carolina at Chapel Hill.