Hi Joseph,
The interpretation of the coefficient for X on its own is a little tricky
here because of the interaction term. X not being significant when the X*Y
interaction is included only means that X has no statistically distinguishable
impact on the mean function when Y = 0. It might be that X still has a
significant impact on the outcome when Y = 2 (you could test this by
recentering the Y variable, using I(Y - 2) for example).
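To see what recentering does, here is a minimal sketch in Python with numpy on simulated data (the variable names, effect sizes, and sample size are all made up for illustration): the coefficient on X changes when Y is recentered, because it now measures the effect of X at Y = 2 rather than Y = 0, but the fit of the model is identical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=n)
Y = rng.normal(loc=2, size=n)  # moderator, roughly centered at 2
# X matters only through the interaction in this fake data-generating process
out = 1.0 + 0.0 * X + 0.5 * Y + 0.8 * X * Y + rng.normal(size=n)

def ols_fit(cols):
    """Stack predictors with an intercept; return (coefficients, R^2)."""
    A = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(A, out, rcond=None)
    resid = out - A @ beta
    tss = (out - out.mean()) @ (out - out.mean())
    return beta, 1 - (resid @ resid) / tss

# Original parameterization: the X coefficient is the effect of X at Y = 0
b1, r2_1 = ols_fit([X, Y, X * Y])

# Recentered: the X coefficient is now the effect of X at Y = 2
Yc = Y - 2
b2, r2_2 = ols_fit([X, Yc, X * Yc])

print("X coef at Y=0:", b1[1], " X coef at Y=2:", b2[1])
print("R^2 unchanged:", r2_1, r2_2)
```

Algebraically the recentered X coefficient equals the original X coefficient plus 2 times the interaction coefficient, so a null effect at Y = 0 can still be a large effect at Y = 2.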
Your question about whether to include the interaction term is a really good
one. I think a couple of principles are operative: if there is a good
theoretical reason for keeping it then definitely keep it, and if there
isn't you might consider leaving it in anyway in order to keep your model as
flexible as possible (what's one degree of freedom lost especially if you
have a lot of data?). If we were operating purely in the world of
predictive inference (i.e. no theory to go on and none being tested, just
trying to predict as accurately as possible) then you could run a likelihood
ratio test. If the R^2 hasn't budged with the inclusion of the interaction
term, though, it's a good bet that the likelihood ratio test won't suggest
including the extra interaction term anyway. Part of the problem with
hypothesis tests on parameters is that very tiny effects on the outcome will
pass hypothesis tests if you have a lot of data, so if you suspect that is
what is happening here then you might feel compelled to drop the interaction
if you have some normative commitment to parsimony. Ultimately, these
questions of what to keep and what to drop must be founded on some kind of
hunch about the tradeoff between bias (dropping parameters might decrease
it) and variance (adding parameters might add to it). For people interested
in predictive inference, lots of approaches have been developed to automate
these decisions. See
http://www-stat.stanford.edu/~tibs/ElemStatLearn/ for
a good and free reference.
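For concreteness, here is a sketch of the likelihood ratio test for nested Gaussian linear models, again in Python with simulated data (the effect sizes are invented; the 3.841 cutoff is the 5% critical value of chi-squared with 1 degree of freedom, one for the single extra parameter):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=n)
Y = rng.normal(size=n)
# Deliberately tiny interaction effect in the fake data
out = 1.0 + 0.6 * X + 0.4 * Y + 0.05 * X * Y + rng.normal(size=n)

def rss(cols):
    """Residual sum of squares from an OLS fit with an intercept."""
    A = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(A, out, rcond=None)
    r = out - A @ beta
    return r @ r

rss_restricted = rss([X, Y])        # model without the interaction
rss_full = rss([X, Y, X * Y])       # model with it

# LR statistic for nested Gaussian linear models: n * log(RSS_r / RSS_f)
lr = n * np.log(rss_restricted / rss_full)
print("LR statistic:", lr, " reject restricted model:", lr > 3.841)
```

With a tiny true effect, whether the test rejects depends mostly on n, which is exactly the large-sample problem described above.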
I guess my practical advice would be that if it suits your fancy and it's
significant, then leave it in.
Iain
On Wed, Apr 7, 2010 at 3:36 PM, Gavinlertvatana, Poj <
pgavinlertvatana at hbs.edu> wrote:
Hi all,
I'm testing two models in linear regression, and I get this situation:
- When I add variable X as a covariate, it is significant.
- When I add the X*Y interaction, X*Y is significant but X
becomes insignificant.
The fit (R-squared) is equal for both models (c. 0.9).
How would I choose one over another? The former is more parsimonious, but
the second is just as valid, isn't it? I want to find arguments to choose
model 2 over model 1, but can't really find a justification.
Best regards,
Joseph
Joseph Poj Gavinlertvatana
Doctoral student, Marketing
Harvard Business School
Wyss Hall, Soldiers Field, Boston, MA 02163
Ph 617.230.5907
Fx 617.496.4397
Txt/Vm 617.910.0563
Em pgavinlertvatana at
hbs.edu
_______________________________________________
gov2001-l mailing list
gov2001-l at
lists.fas.harvard.edu
http://lists.fas.harvard.edu/mailman/listinfo/gov2001-l