  • Why do the coefficients of the linear term and the quadratic term become insignificant after dropping the interaction term?

    Hi all,

    I used cross-sectional data and Poisson regression. I did stepwise regression, but found that the coefficients of the linear term and the quadratic term become insignificant after dropping the interaction term. How does this happen, and how do I solve this problem?

    Thanks,
    David

    Code:
    poisson income iv1t iv2 cv1 cv2 cv3 cv4 i.indcode, vce(robust)
    poisson income iv1t iv2 c.iv2#c.iv2 cv1 cv2 cv3 cv4 i.indcode, vce(robust)
    poisson income c.iv1t##c.iv2 c.iv1t#c.iv1t c.iv2#c.iv2 cv1 cv2 cv3 cv4 i.indcode, vce(robust)
    Results:
    [Attached: three screenshots of the regression output for the three models above]

  • #2
    First, in a quadratic model, the statistical significance of the linear and quadratic terms should never be looked at alone. At best the joint significance of both coefficients is meaningful; either one by itself is not.
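
    In Stata, that joint test is one line after fitting the quadratic model; a minimal sketch, using the variable names from your posted commands:

    Code:
    * joint Wald test of the linear and quadratic iv2 terms
    test iv2 c.iv2#c.iv2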

    But in your final model, which also includes an interaction between iv1t and the linear term for iv2, you will notice that the coefficient of iv2 has barely changed at all. The standard error has gone up a little bit. You have gone from slightly significant to almost significant. Nothing to write home about. Not even worth looking at, really. Nothing much has changed numerically. The quadratic iv2 coefficient, however, has changed appreciably.

    Now, all of that said, none of it matters. You are comparing apples to walnuts here. In the presence of the interaction iv1t#iv2, neither the iv2 term nor the iv2^2 term means the same thing that it does in the model without the interaction term. There is no reason, actually, to expect them to be the same, or even similar. With the interaction term, the iv2 and iv2#iv2 coefficients give you the quadratic representation of the iv2 effect conditional on iv1 = 0. If 0 is an important value of iv1, then perhaps this is of some interest. If, as is often the case, 0 isn't even within the range of observed values of iv1, then it's just a huge red herring. Remember that by putting the interaction with iv1 in, you are no longer modeling a single quadratic effect of iv2. You are modeling an effect of iv2 that is a different quadratic for each value of iv1. And so it really only makes sense to talk about the effects of iv2 at specific, well-chosen values of iv1. To see what is going on in your model, it is best to use the -margins- command, followed by -marginsplot-.
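
    For instance, a minimal sketch of that approach; the at() values below are placeholders to be replaced by well-chosen values actually observed in your data:

    Code:
    * predicted income as a function of iv2, at several representative values of iv1t
    margins, at(iv1t=(10 20 30) iv2=(0(10)100)) predict(n)
    marginsplot, xdimension(iv2)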

    In quadratic models, it is really pointless to directly interpret the coefficients of the linear and quadratic terms by themselves. The quadratic coefficient's sign tells you if you are dealing with an upright or upside down U relationship, and its magnitude tells you how wide or flat the parabola is. The linear coefficient in its own right has no meaning whatsoever. (Well, it is the slope of the parabola at the point where iv2 = 0--which may or may not be a useful number depending on your context.) The real use of the linear coefficient is to calculate -linear coeff/(2*quadratic coeff). This gives the value of iv2 where the parabola reaches its peak or nadir. It is the location of the axis of symmetry of the parabola, and your model means rather different things if that value falls squarely inside the range of observed values of iv2 (in which case you really do have a U), or beyond that range, in which case you have a slightly curvilinear relationship, but no real U. When you put an interaction term with iv1 in there, you now have different parabolas for each value of iv1. The linear coefficient for the parabola at a given value of iv1 is _b[iv2] + _b[iv1#iv2]*iv1. So, as iv1 changes, that linear coefficient changes with it (in a linear way) and therefore the center of the parabola moves as well. (Things would be more complicated still if you also interacted iv1 with iv2#iv2.)
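
    As a sketch of that calculation after your interaction model, for a hypothetical value iv1t = 20 (substitute values that actually occur in your data):

    Code:
    * iv2 location of the parabola's vertex, conditional on iv1t = 20
    nlcom -(_b[iv2] + _b[c.iv1t#c.iv2]*20) / (2*_b[c.iv2#c.iv2])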

    Comment


    • #3
      Originally posted by Clyde Schechter View Post
      First, in a quadratic model, the statistical significance of the linear and quadratic terms should never be looked at alone. At best the joint significance of both coefficients is meaningful; either one by itself is not. [...]
      Hi Clyde,

      Thanks for your explanation. I've attached the graph below. In my models, I am modeling an effect of iv1 for each value of iv2, and talking about the effects of iv1 at specific, well-chosen values of iv2. I know the models are not the same. And since I have to do the piecewise regression, does it still make sense in this case that the coefficients of the linear term and the quadratic term become insignificant after dropping the interaction term?

      Thanks,
      David

      [Attached: graph of the model's predictions]

      Comment


      • #4
        Those predictions don't look right. What is the unit of your income variable? If it is something like euros/dollars/pounds per year, then a prediction of 1.50e+10 (= 15,000,000,000) is just unrealistic. Quadratic models can easily lead to such extreme predictions. Are there many observations in your data with a large iv1t and iv2 = 0? I would start with just a scatter plot of income versus iv1t and get a feel for what is realistic. I would then try different ways of including non-linearity, e.g. also adding the interaction between iv1t and the squared term, or splines (help mkspline), to check what is going on.
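
        A minimal sketch of those checks, using the variable names from your commands; the spline knot count is just an arbitrary starting point:

        Code:
        * eyeball the raw relationship first
        scatter income iv1t
        * restricted cubic spline in iv2 as an alternative to the quadratic
        mkspline iv2_s = iv2, cubic nknots(3)
        poisson income iv1t iv2_s* cv1 cv2 cv3 cv4 i.indcode, vce(robust)
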
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------

        Comment


        • #5
          Originally posted by Maarten Buis View Post
          Those predictions don't look right. [...]
          Hi Maarten,

          Thanks for your suggestion. The income is a firm's income in yuan (not euros), and the maximum in the data is 4.90e+09. I am trying piecewise regression by including non-linearity, i.e. adding the squared term of iv2, and thinking that if it is significant, I'll then add the interaction term with iv1t. But unfortunately, adding only the squared term of iv2 makes the coefficients of iv2 and its squared term insignificant. It would be much clearer to report only the result of the best model, but I want to present it as a piecewise regression. In that case, do you think these piecewise regressions make sense if I keep the interaction term with iv1t in the final model even though the squared term of iv2 is not significant on its own?

          Thanks,
          David
          Last edited by David Lu; 24 May 2016, 02:59.

          Comment


          • #6
            Clyde already explained that you don't look at the significance of a variable and that variable squared in isolation.

            What do you mean by "piecewise regression": are you estimating separate models for different groups? It does not surprise me that the significance is different in subsets of your data, if only because that way you reduce the sample size, but also because apparently you expect the results to be different across groups (why else would you want to estimate separate models?).
            ---------------------------------
            Maarten L. Buis
            University of Konstanz
            Department of history and sociology
            box 40
            78457 Konstanz
            Germany
            http://www.maartenbuis.nl
            ---------------------------------

            Comment


            • #7
              Originally posted by Maarten Buis View Post
              Clyde already explained that you don't look at the significance of a variable and that variable squared in isolation. [...]
              Dear Maarten,

              What I mean by piecewise regression is regression as in the following example: in model 1, add only the control variables; in model 2, iv1; in model 3, iv1 and iv1sq; in model 4, iv1, iv1sq, and iv1#iv2. Since it's done step by step, it's called a stepwise or piecewise model. It doesn't actually reduce the sample size, and it doesn't compare results across groups. The purpose of using stepwise regression is to see whether the R-squared/explanatory power increases significantly when adding variables, as sketched below.
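
              A sketch of that sequence with stored estimates and likelihood-ratio tests (lrtest needs the models fit without vce(robust)):

              Code:
              * hierarchical model building: test whether each addition improves fit
              quietly poisson income cv1 cv2 cv3 cv4 i.indcode
              estimates store m1
              quietly poisson income iv1t cv1 cv2 cv3 cv4 i.indcode
              estimates store m2
              quietly poisson income c.iv1t##c.iv1t cv1 cv2 cv3 cv4 i.indcode
              estimates store m3
              lrtest m1 m2    // does adding iv1t improve fit?
              lrtest m2 m3    // does adding its square improve fit?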

              Based on your suggestion, I checked my data and found that there are not many observations with a large iv1t and iv2 = 0. Also, I started with just a scatter plot of income versus iv1t to get a feel for what is realistic (attached below). I then tried different ways of including non-linearity, e.g. also adding the interaction term with iv1t and the squared term, and splines (help mkspline), to check what is going on; the results seem to explode even more. In that case, how do you think I should deal with the extreme predictions?

              Thanks,
              David
              [Attached: two images, including the scatter plot of income versus iv1t (Graph-scatter-test.png)]

              Last edited by David Lu; 30 May 2016, 01:51.

              Comment


              • #8
                You have at least two obvious outliers, so I would check for influential observations. For example, if you estimate your model with glm instead of poisson, you estimate the exact same model, but you can directly predict the Cook's distance after glm (see the predict section of the glm postestimation documentation). How to then deal with them is an art; I would be very reluctant to remove them from your analysis, as they do seem to be legitimate observations. Instead, my first attempt would be to try to find some factors that explain those extreme values and incorporate those in your model.
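
                A minimal sketch of that check, using the model from your post (note that -predict, cooksd- needs the plain maximum-likelihood fit, without vce(robust)):

                Code:
                * the same model via glm, then Cook's distance for each observation
                glm income c.iv1t##c.iv2 c.iv2#c.iv2 cv1 cv2 cv3 cv4 i.indcode, family(poisson) link(log)
                predict cd, cooksd
                gsort -cd
                list cd income iv1t iv2 in 1/5    // the most influential observations
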
                ---------------------------------
                Maarten L. Buis
                University of Konstanz
                Department of history and sociology
                box 40
                78457 Konstanz
                Germany
                http://www.maartenbuis.nl
                ---------------------------------

                Comment


                • #9
                  Originally posted by Maarten Buis View Post
                  You have at least two obvious outliers, so I would check for influential observations. [...]
                  Hi Maarten,

                  Thank you for your further suggestion. I used glm (command attached) to estimate the exact same model, but I cannot directly predict the Cook's distance after glm, because it is not allowed with the vce(robust) option (error attached). Is there some alternative for predicting the Cook's distance of a model with robust standard errors?

                  Thanks,
                  David

                  Code:
                  . glm income c.iv1t##c.iv2 c.iv2#c.iv2 cv1 cv2 cv3 cv4 i.indcode, family(poisson) link(log) vce(robust)
                  . predict d1, cooksd
                  standardized not allowed after robust estimation
                  r(198);

                  Comment


                  • #10
                    http://blog.stata.com/2014/05/08/usi...ential-points/
                    ---------------------------------
                    Maarten L. Buis
                    University of Konstanz
                    Department of history and sociology
                    box 40
                    78457 Konstanz
                    Germany
                    http://www.maartenbuis.nl
                    ---------------------------------

                    Comment


                    • #11
                      Hi Maarten,

                      Thank you for the helpful post. Now I can obtain the dfbeta values (attached). I tried to find some factors that explain those extreme values and incorporated them in the model; the predictions explode less, but are still extreme. Even worse, adding more variables makes the coefficients less significant. So how should I deal with these results?

                      Thanks,
                      David

                      [Attached: screenshot of the Cook's distance/jackknife dfbeta results (cooksd-jacknife.png)]

                      Comment


                      • #12
                        That is the art of model building. This is something that you need to do, as you know most about the situation, the data, the way the data were collected, the research question, the aim of your study, etc. Just document what you tried, and be honest and open to the possibility that your fewer than 300 observations just don't contain the information necessary to reliably answer the question you want to answer.
                        ---------------------------------
                        Maarten L. Buis
                        University of Konstanz
                        Department of history and sociology
                        box 40
                        78457 Konstanz
                        Germany
                        http://www.maartenbuis.nl
                        ---------------------------------

                        Comment


                        • #13
                          Originally posted by Maarten Buis View Post
                          You have at least two obvious outliers, so I would check for influential observations. [...]
                          Dear Maarten,

                          I remember that last time you suggested trying to find some factors that explain those extreme values and incorporating those in the model. Just a bold guess: does it make sense to first tag those extreme values, create a dummy variable, and include it in the model to control for the effect of the outliers?

                          Thanks,
                          David

                          Comment


                          • #14
                            David:
                            First, I would check whether those "weird" values are due to a trivial error in data entry.
                            If that is not the case, it would probably be wiser to present the results of your regression with and without the extreme observations (sketched below).
                            Another option is -rreg-, but it seems to be less well regarded than it was in the past.
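
                            A sketch of that side-by-side comparison, assuming a Cook's distance variable d1 from a non-robust -glm- fit as discussed above, and a conventional 4/N cutoff (a rule of thumb, not a hard rule):

                            Code:
                            * flag high-influence observations and compare the two fits
                            generate byte influential = d1 > 4/_N if !missing(d1)
                            quietly poisson income c.iv1t##c.iv2 c.iv2#c.iv2 cv1 cv2 cv3 cv4 i.indcode, vce(robust)
                            estimates store full
                            quietly poisson income c.iv1t##c.iv2 c.iv2#c.iv2 cv1 cv2 cv3 cv4 i.indcode if !influential, vce(robust)
                            estimates store trimmed
                            estimates table full trimmed, b se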

                            Kind regards,
                            Carlo
                            (Stata 19.0)

                            Comment
