Determining whether to include a squared term and maximization

Eliza Lara

Join Date: Feb 2020

Posts: 1
#1

23 Feb 2020, 12:55

Say I have

$$\text{coffee} = \beta_{0} + \beta_{1}\text{educ} + \beta_{2}\text{imptax} + \beta_{3}\ln(\text{income}) + \beta_{4}\text{age} + \beta_{5}\text{age}^{2} + u$$

What test should I run to determine whether to include the squared term in my model? And if I do include it what test should I run to determine maximization?
Andrew Musau

Join Date: Oct 2014

Posts: 10298
#2

24 Feb 2020, 14:07

Say I have

$$\text{coffee} = \beta_{0} + \beta_{1}\text{educ} + \beta_{2}\text{imptax} + \beta_{3}\ln(\text{income}) + \beta_{4}\text{age} + \beta_{5}\text{age}^{2} + u$$
What test should I run to determine whether to include the squared term in my model?

I assume that age is an important covariate in your model and that the question is whether adding a quadratic term is justified relative to the linear term alone. The test is simply whether the coefficient on the quadratic term equals 0, i.e., whether that coefficient is statistically significant in the regression output. If it is significant, include the quadratic term; if it is not, drop it.
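As a minimal sketch of that test, here it is run on the lbw example dataset that appears further down in this post. The dataset and covariates are only for illustration and are not the original poster's coffee model. After regress, the t-statistic on c.age#c.age in the output is the test itself; test reports the equivalent Wald test explicitly.

Code:

* illustrative data only, not the coffee model from the question
webuse lbw, clear
* c.age##c.age enters both age and age squared
regress bwt smoke i.race c.age##c.age
* Wald test of H0: coefficient on age squared = 0
test c.age#c.age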

And if I do include it what test should I run to determine maximization?

I assume that you are asking how to find the maximum. Going back to basic calculus, you take the first derivative with respect to age and set it to 0 (necessary condition). To determine whether that point is a maximum or a minimum, you check the sign of the second derivative (sufficient condition): negative at a maximum, positive at a minimum.

$$\frac{\partial}{\partial \text{age}}\left(\beta_{4}\text{age} +\beta_{5}\text{age}^{2}\right)= 0\Rightarrow \text{age}= -\frac{\beta_{4}}{2\beta_{5}}.$$
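Applied to this specification, the second-order check comes down to the sign of the quadratic coefficient alone. Using the same symbols as the expression above:

$$\frac{\partial^{2}}{\partial \text{age}^{2}}\left(\beta_{4}\text{age} +\beta_{5}\text{age}^{2}\right)= 2\beta_{5}, \qquad \beta_{5}<0 \Rightarrow \text{maximum}, \qquad \beta_{5}>0 \Rightarrow \text{minimum}.$$

So the turning point at $-\beta_{4}/(2\beta_{5})$ is a maximum only if the estimated coefficient on the squared term is negative.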

Here is an example where we have a U relationship between age and birth weight (so a minimum as opposed to a maximum).

Code:

webuse lbw, clear
regress bwt smoke i.race c.age##c.age
margins, expression(-_b[age]/(2*_b[c.age#c.age]))

Res.:

Code:

. margins, expression(-_b[age]/(2*_b[c.age#c.age]))
Warning: expression() does not contain predict() or xb().
Warning: prediction constant over observations.

Predictive margins                              Number of obs     =        189
Model VCE    : OLS

Expression   : -_b[age]/(2*_b[c.age#c.age])

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   24.77376   1.616256    15.33   0.000     21.60596    27.94157
------------------------------------------------------------------------------

So here the minimum is at age = 24.77 years. Visually:

Code:

margins, at(age = (10(5)30))
marginsplot, scheme(s1color)

Last edited by Andrew Musau; 24 Feb 2020, 14:11.
Clyde Schechter

Join Date: Apr 2014

Posts: 30192
#3

24 Feb 2020, 15:56

To Andrew Musau's excellent advice I would just add one point. If the test of the quadratic coefficient leads you to include it, but you then find that the value of age which maximizes (or minimizes as the case may be) the predicted value is far outside the range of realistic or interesting values of the age variable in your real-world context, then I would consider omitting the quadratic term anyway. If there is no real turning point in or near the range of important values of age, then the quadratic term represents only a minor curvilinear tweak to the linear term that probably has no real-world importance. This would particularly be the case if the sample size is very large, so that a trivial effect can easily have a very small p-value.
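A rough way to carry out that check, sketched on the same lbw example used in #2 (again only an illustration, not the coffee data): estimate the turning point with its confidence interval and compare it with the observed range of age. The label turning_point is just a name chosen here.

Code:

webuse lbw, clear
regress bwt smoke i.race c.age##c.age
* turning point and delta-method confidence interval
nlcom (turning_point: -_b[age]/(2*_b[c.age#c.age]))
* observed range of age, to compare with the turning point
summarize age

If the estimated turning point, or most of its confidence interval, lies well outside the minimum and maximum reported by summarize, the quadratic term is probably doing little of practical importance.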