I'd take the square root or log-plus-one to transform your data to a more
normal scale before running Amelia, and then untransform the imputed
values afterward.
Otherwise, the time-series cross-sectional aspect will be the most
difficult. I would use Amelia's special features for these types of data,
but then you should carefully check the imputations, as it seems you have
been doing, to make sure they're reasonable. Amelia should work in these
situations a good deal better than SAS because of these features, but it
will not necessarily fix all the problems.
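As a rough illustration of the transform-and-untransform step mentioned
above, here is a minimal sketch in plain Python. It is not Amelia code;
the function names and the sample figures are hypothetical, and clamping
negative back-transformed values to zero is an assumption made here, not
something Amelia does for you.

```python
import math

def to_log_scale(values):
    """Apply log(1 + x) to observed values; missing entries (None) stay None."""
    return [None if v is None else math.log1p(v) for v in values]

def from_log_scale(values):
    """Invert log(1 + x) with exp(x) - 1.

    Assumption: a value imputed below zero on the log scale would map to a
    negative dollar amount, so the result is clamped at 0.
    """
    return [None if v is None else max(0.0, math.expm1(v)) for v in values]

# Hypothetical monthly expenditures in dollars; None marks a missing month.
spend = [0.0, 1200.0, None, 560.0]

transformed = to_log_scale(spend)
# ... the imputation would be run on `transformed` at this point ...
restored = from_log_scale(transformed)
```

The square-root version follows the same pattern, with math.sqrt on the
way in and squaring on the way out.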
Best of luck with your research,
Gary King
: Gary King, King(a)Harvard.Edu http://GKing.Harvard.Edu :
: Center for Basic Research Direct (617) 495-2027 :
: in the Social Sciences Assistant (617) 495-9271 :
: 34 Kirkland Street, Rm. 2 HU-MIT DC (617) 495-4734 :
: Harvard U, Cambridge, MA 02138 eFax (928) 832-7022 :
On Mon, 2 Dec 2002, Akunda, Eric wrote:
Dear Prof King,
I have a data set that has missing values for some variables in the
following way. I would like to describe the missing data to determine
whether using Amelia will be appropriate (I have read the article that
accompanies Amelia).
My data are for the pharmaceutical industry. I have # of prescriptions
sold as my response (which I convert to market shares). I then have as
covariates the # of minutes the salespeople detail to physicians, the
counts of these details, the detailing expenditures (in dollars), the
price per prescription in dollars, the expenditure on direct-to-consumer
advertising, the events and meetings expenditures (in dollars), and
journal advertising. The latter three variables are incomplete because
the firm collecting the data (from whom we purchased the same) did not
collect the data at the time, and seems to have begun collecting these
variables in response to enquiries from its clients. The data are time
series (monthly) and cross-sectional (by brand), and because the brands
enter the market sequentially, they form an unbalanced panel. I
intend to use Amelia (the time-series cross-sectional function) to
impute the missing values. However, my question is, would this be
appropriate, based on the fact that the data are not really missing at
random? Would this necessitate coming up with a specification unique to
the data as you have advised in the paper? Eliminating observations in
this dataset is unthinkable because the observations are very few (about
500 for the pooled brands including those with missing values on the
three covariates). I have tried to impute the data using PROC MI in SAS
(based on the book by Paul Allison), but some of the imputed values were
poor. I have the feeling that the importance resampling coupled with
specifying a range for the missing values would improve the imputations
(especially so because the specification for the panel data is clearer
compared to the PROC MI specification based on "my" understanding of
both). What would your advice be? Thanks very much in advance for taking
your precious time to provide an answer. I will update you on my
experiences with using Amelia for this dataset.
Kindest regards,
Sincerely,
Eric Akunda.
PhD student, Marketing.
University of North Carolina at Chapel Hill.