
  • Cubic age variable in linear regression

    Hi all,

    Upon visual inspection of my data, I noticed that my continuous variable, csh_sh, both increased and decreased with age. Using a bar chart comparing mean values across age categories, I observed what looked like a cubic pattern.

    I therefore used the following commands, which confirmed this:

    Code:
    * Cubic in age via factor-variable notation
    regress csh_sh c.age##c.age##c.age
    * Predicted csh_sh at ages 18, 25, ..., 88
    margins, at(age = (18(7)90))
    marginsplot
    [Image: cubic age graph 2.0.jpg, the marginsplot produced by the commands above]

    I now have my regression model as csh_sh = b0 + b1*age + b2*age^2 + b3*age^3, excluding control variables for simplicity. The results from OLS estimation are as follows:

    Code:
    regress csh_sh c.age##c.age##c.age
    -----------------------------------------------------------------------------------
            cashshare |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ------------------+----------------------------------------------------------------
                  age |  -.0256759   .0049533    -5.18   0.000     -.035386   -.0159658
                      |
          c.age#c.age |   .0005023   .0001001     5.02   0.000     .0003062    .0006985
                      |
    c.age#c.age#c.age |  -3.11e-06   6.42e-07    -4.84   0.000    -4.37e-06   -1.85e-06
                      |
                _cons |   .6778854   .0771562     8.79   0.000     .5266348    .8291361
    -----------------------------------------------------------------------------------
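Because the three estimates only make sense together, one way to read them is through the slope they imply. The following is a worked sketch that uses only the rounded estimates in the table above (control variables omitted, as in the post), with a standing for age:

```latex
% Fitted model (controls omitted), with a = age:
%   \hat{y} = b_0 + b_1 a + b_2 a^2 + b_3 a^3
\frac{d\hat{y}}{da} = b_1 + 2b_2 a + 3b_3 a^2
  \approx -0.0256759 + 0.0010046\,a - 0.00000933\,a^2

% Example: slope at a = 30
\left.\frac{d\hat{y}}{da}\right|_{a=30}
  \approx -0.0256759 + 0.0301380 - 0.0083970 \approx -0.0039

% Turning points (slope = 0)
a^{*} = \frac{-2b_2 \pm \sqrt{4b_2^2 - 12\,b_1 b_3}}{6\,b_3}
  \approx 41.7 \text{ and } 65.9
```

At age 30, for example, one more year of age goes with a fitted csh_sh about 0.004 lower. The second derivative, 2b_2 + 6b_3 a, is positive near 42 and negative near 66, so the fitted cubic has a local minimum near 42 and a local maximum near 66. Because b_3 is printed to only three significant figures and the square root involves a near-cancellation, these turning points could shift by a few tenths of a year with unrounded estimates.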


    With regard to interpreting the three age coefficients, can anyone provide recommendations? I know that interpreting these coefficients is a lot more complex than interpreting a linear relationship.

    When I include my control variables, none of the three coefficients is statistically significant any longer. In that case, I would argue that no significant relationship is present. However, it would be useful to understand what the size of the coefficients actually means or implies.

    Any advice/recommendation on this would be really appreciated. Thanks!
    Last edited by sladmin; 11 May 2020, 08:02. Reason: anonymize original poster

  • #2
    The interpretation is as likely to be substantive as statistical. I'd note the unsurprising widening of the confidence interval around age 88, where you may have relatively few people in the sample.

    Despite your P-values, the usual advice would be to be very cautious about overfitting and to consider carefully other choices, such as splines and fractional polynomials.
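As a hedged illustration of those alternatives, the sketch below fits the cubic, a restricted cubic spline (`mkspline` with the `cubic` option) and a fractional polynomial (`fp`) to the same data, and compares the first two by AIC/BIC. The data are hypothetical stand-ins simulated from the coefficients in #1, purely for illustration; the seed, sample size, noise level and the names `agesp1`-`agesp3` are arbitrary choices.

```stata
* Hypothetical stand-in data simulated from the cubic in #1; replace with your own data
clear
set seed 2020
set obs 2000
generate age = runiformint(18, 90)
generate csh_sh = .6778854 - .0256759*age + .0005023*age^2 - 3.11e-06*age^3 + rnormal(0, .05)

* (1) The cubic, for reference
regress csh_sh c.age##c.age##c.age
estimates store cubic

* (2) Restricted cubic spline: 4 knots at default percentiles -> agesp1, agesp2, agesp3
mkspline agesp = age, cubic nknots(4)
regress csh_sh agesp*
estimates store spline

* Compare information criteria (lower AIC/BIC preferred)
estimates stats cubic spline

* (3) Fractional polynomial: fp searches powers of age and reports the best-fitting model
fp <age>: regress csh_sh <age>
```

Information criteria are only one input; with real data it is worth plotting the fitted curves from each model side by side before preferring one.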

    I once scoffed a little at quadratics as at best empirical until Marcello Pagano gently reminded me that Newton showed that projectiles in simplified circumstances are expected to describe parabolas, as is or should be familiar to anyone fooling around with a water hose and/or studying mechanics in physics or applied mathematics. But I still worry about cubics without an independent rationale.

    The three coefficients when you fit a cubic have very different units and dimensions and are highly dependent anyway. It's their combined effect that you have to judge, and the graph you have shown is the first and most important step in my view.
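One way to judge the combined effect directly is to ask Stata for the slope of the fitted curve rather than reading the coefficients one at a time. A minimal sketch, again on hypothetical simulated stand-in data (replace the first block with your own data); `tp_young` and `tp_old` are illustrative names, and their labels assume b3 < 0 as in #1:

```stata
* Hypothetical stand-in data simulated from the cubic in #1; replace with your own data
clear
set seed 2020
set obs 2000
generate age = runiformint(18, 90)
generate csh_sh = .6778854 - .0256759*age + .0005023*age^2 - 3.11e-06*age^3 + rnormal(0, .05)

regress csh_sh c.age##c.age##c.age

* Slope of the fitted curve, b1 + 2*b2*age + 3*b3*age^2, at ages 18, 25, ..., 88.
* With factor-variable notation, dydx(age) differentiates all three age terms together.
margins, dydx(age) at(age = (18(7)90))
marginsplot, yline(0)

* Ages where the slope is zero, with delta-method standard errors.
* When b3 < 0, the "+" root is the younger turning point.
local disc sqrt(4*_b[c.age#c.age]^2 - 12*_b[age]*_b[c.age#c.age#c.age])
nlcom (tp_young: (-2*_b[c.age#c.age] + `disc') / (6*_b[c.age#c.age#c.age])) (tp_old: (-2*_b[c.age#c.age] - `disc') / (6*_b[c.age#c.age#c.age]))
```

The turning-point expressions are the quadratic formula applied to the slope, so their standard errors come from the delta method and can be wide where the fitted curve is flat; they are best read alongside the marginsplot.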
    Last edited by Nick Cox; 23 Apr 2020, 02:21.



    • #3
      Hi Nick,

      Thanks very much for your advice. I ran the regression including the control variables. When I used marginsplot again, the curve appeared to be a negative quadratic (an inverted U), as below. I am now considering including an age-squared term only, with no cubic term.

      Q) When assessing the relationship before specifying the model, should one rely more on the marginsplot from a regression that already includes the control variables?

      I would assume that including control variables means the linear predictions for age will be more accurate (assuming the control variables explain variation in the dependent variable).
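On what the controlled marginsplot represents: after a linear regression with controls, `margins, at(age = ...)` by default sets every observation's age to each listed value, keeps each observation's own control values, and averages the predictions. Without age-by-control interactions, the controls shift the level of the curve but not its shape, which comes from the age coefficients estimated with the controls included. A sketch, assuming hypothetical controls `income` and `female` and simulated data with made-up parameters:

```stata
* Hypothetical stand-in data with two made-up controls (income, female);
* replace with your own data and control variables
clear
set seed 2020
set obs 2000
generate age = runiformint(18, 90)
generate female = runiform() < .5
generate income = rnormal(50, 15) + .3*age
generate csh_sh = .2 + .012*age - .00012*age^2 - .002*income + .03*female + rnormal(0, .05)

* Quadratic in age, adjusting for the controls
regress csh_sh c.age##c.age c.income i.female

* Average predicted csh_sh at each age, with the controls kept at their observed values
margins, at(age = (18(7)90))
marginsplot

* Turning point of the quadratic, -b1/(2*b2), with a delta-method standard error
nlcom (turn: -_b[age] / (2*_b[c.age#c.age]))
```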

      Many thanks again for your help.
      [Image: age marginsplot negative quadratic.jpg, the marginsplot after adding the control variables]
