  • Determining whether to include a squared term and maximization

    Say I have coffee = beta0 + beta1educ + beta2imptax + beta3ln(income) + beta4age + beta5agesq + u
    What test should I run to determine whether to include the squared term in my model? And if I do include it what test should I run to determine maximization?

  • #2
    Say I have coffee = beta0 + beta1educ + beta2imptax + beta3ln(income) + beta4age + beta5agesq + u
    What test should I run to determine whether to include the squared term in my model?
    I assume that age is an important covariate in your model and that the question is whether adding a quadratic term is justified relative to the linear term alone. The test is then simply whether the coefficient on the quadratic term (beta5) equals 0, i.e., whether that coefficient is statistically significant. If it is significant, include the quadratic term; if it is not, drop it.
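
    As a sketch of what that test looks like in Stata, here with the lbw example dataset standing in for the coffee data (an assumption, since the original data are not available):

    Code:
    webuse lbw, clear
    regress bwt smoke i.race c.age##c.age
    * Wald test of H0: the coefficient on age squared is 0
    test c.age#c.age

    With a single restriction, the F statistic reported by test is the square of the t statistic on c.age#c.age in the regression table, so the two give the same p-value.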

    And if I do include it what test should I run to determine maximization?
    I assume that you are asking how to find the maximum. From basic calculus: take the first derivative with respect to age and set it to 0 (the necessary condition). To determine whether that point is a maximum or a minimum, check the sign of the second derivative (the sufficient condition).

    $$\frac{\partial}{\partial \text{age}}\left(\beta_{4}\text{age} +\beta_{5}\text{age}^{2}\right)= 0\Rightarrow \text{age}= -\frac{\beta_{4}}{2\beta_{5}}.$$
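
    Written out for the same age terms, holding the other regressors fixed, the two derivatives are (a restatement of the conditions described above, not an additional result):

    $$\frac{\partial}{\partial \text{age}}\left(\beta_{4}\text{age}+\beta_{5}\text{age}^{2}\right)=\beta_{4}+2\beta_{5}\text{age}, \qquad \frac{\partial^{2}}{\partial \text{age}^{2}}\left(\beta_{4}\text{age}+\beta_{5}\text{age}^{2}\right)=2\beta_{5}.$$

    So the turning point is a maximum if \(\beta_{5} < 0\) and a minimum if \(\beta_{5} > 0\).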

    Here is an example with a U-shaped relationship between age and birth weight (so the turning point is a minimum rather than a maximum).

    Code:
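    * Load the low-birth-weight example data shipped with Stata
    webuse lbw, clear
    * Birth weight on smoking, race, and a quadratic in age
    regress bwt smoke i.race c.age##c.age
    * Turning point -b_age/(2*b_agesq), with a delta-method standard error
    margins, expression(-_b[age]/(2*_b[c.age#c.age]))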
    Results:

    Code:
    . margins, expression(-_b[age]/(2*_b[c.age#c.age]))
    Warning: expression() does not contain predict() or xb().
    Warning: prediction constant over observations.
    
    Predictive margins                              Number of obs     =        189
    Model VCE    : OLS
    
    Expression   : -_b[age]/(2*_b[c.age#c.age])
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _cons |   24.77376   1.616256    15.33   0.000     21.60596    27.94157
    ------------------------------------------------------------------------------
    So the estimated minimum is at age = 24.77 years (95% confidence interval 21.61 to 27.94). Visually:

    Code:
    margins, at(age = (10(5)30))
    marginsplot, scheme(s1color)
    [Image: Graph.png, the marginsplot produced by the code above]

    Last edited by Andrew Musau; 24 Feb 2020, 14:11.

    • #3
      To Andrew Musau's excellent advice I would just add one point. If the test of the quadratic coefficient leads you to include it, but you then find that the value of age which maximizes (or minimizes as the case may be) the predicted value is far outside the range of realistic or interesting values of the age variable in your real-world context, then I would consider omitting the quadratic term anyway. If there is no real turning point in or near the range of important values of age, then the quadratic term represents only a minor curvilinear tweak to the linear term that probably has no real-world importance. This would particularly be the case if the sample size is very large, so that a trivial effect can easily have a very small p-value.
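
      One way to check this in practice is to put the estimated turning point next to the observed range of age. A minimal sketch, reusing the lbw example from #2 rather than the original coffee data; the label turning_point is only an illustrative name:

      Code:
      webuse lbw, clear
      regress bwt smoke i.race c.age##c.age
      * Turning point with a delta-method confidence interval
      nlcom (turning_point: -_b[age]/(2*_b[c.age#c.age]))
      * Observed range of age, for comparison with the estimate above
      summarize age

      If the point estimate (or most of its confidence interval) lies outside the minimum and maximum reported by summarize, the curvature is unlikely to matter over the ages actually observed.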
