  • Margins: atmeans and asobserved are same for linear regression?

    Hello all,

    I have a possibly very basic question, but I am struggling to understand the margins options. For context, I am trying to get the average adjusted predictions using margins after running an OLS linear regression model with a number of control variables.

    Here is a simple reproducible example that shows my question, where I am trying to predict systolic blood pressure by race, and comparing the approaches of asobserved and atmeans.

    Code:
    webuse nhanes2f, clear
    reg bpsystol age i.sex i.race i.health
    
    
    ------------------------------------------------------------------------------
        bpsystol | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             age |   .6227996   .0125339    49.69   0.000     .5982308    .6473684
                 |
             sex |
         Female  |  -4.159605   .4001854   -10.39   0.000    -4.944046   -3.375164
                 |
            race |
          Black  |   3.818875   .6592626     5.79   0.000     2.526592    5.111157
          Other  |   .2950134   1.450936     0.20   0.839    -2.549102    3.139129
                 |
          health |
           Fair  |   1.130953   .9025727     1.25   0.210    -.6382642    2.900171
        Average  |  -1.066945   .8524102    -1.25   0.211    -2.737834    .6039445
           Good  |  -2.475439   .8802208    -2.81   0.005    -4.200843   -.7500361
      Excellent  |  -3.522989   .8990483    -3.92   0.000    -5.285298    -1.76068
                 |
           _cons |   104.6021   1.096111    95.43   0.000     102.4535    106.7507
    ------------------------------------------------------------------------------
    
    
    margins race
    
    Predictive margins                                      Number of obs = 10,335
    Model VCE: OLS
    
    Expression: Linear prediction, predict()
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            race |
          White  |   130.4806   .2134168   611.39   0.000     130.0622    130.8989
          Black  |   134.2994   .6220484   215.90   0.000     133.0801    135.5188
          Other  |   130.7756   1.434931    91.14   0.000     127.9628    133.5883
    ------------------------------------------------------------------------------
    
    margins race, atmeans
    
    Adjusted predictions                                    Number of obs = 10,335
    Model VCE: OLS
    
    Expression: Linear prediction, predict()
    At: age      = 47.56584 (mean)
        1.sex    = .4749879 (mean)
        2.sex    = .5250121 (mean)
        1.race   = .8755685 (mean)
        2.race   = .1050798 (mean)
        3.race   = .0193517 (mean)
        1.health =  .070537 (mean)
        2.health = .1615868 (mean)
        3.health = .2842767 (mean)
        4.health = .2507015 (mean)
        5.health = .2328979 (mean)
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
            race |
          White  |   130.4806   .2134168   611.39   0.000     130.0622    130.8989
          Black  |   134.2994   .6220484   215.90   0.000     133.0801    135.5188
          Other  |   130.7756   1.434931    91.14   0.000     127.9628    133.5883
    ------------------------------------------------------------------------------

    Having read Richard Williams's documentation on the differences between the two approaches, I had expected the results to differ. Instead, they are exactly the same. I feel I am not understanding something, perhaps something fairly obvious...

    Hoping that anyone can help explain, thank you so much in advance!

  • #2
    Yes, in a purely linear model with no higher-order polynomial or interaction terms, -margins- and -margins, atmeans- will produce identical results. The reason is simple. -margins- averages the predicted values of the outcome variable over the observed joint distribution of the model variables. In this kind of model, those predicted values are simple linear combinations of the predictor variables.
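To make that averaging concrete, here is a minimal sketch that reproduces the Black row of -margins race- by hand, using the model from #1. The variable name xb_black is made up for illustration and does not appear in the original posts.

```stata
* Sketch: reproduce the "Black" row of -margins race- by hand.
webuse nhanes2f, clear
regress bpsystol age i.sex i.race i.health

preserve
* Treat every observation as if race == 2 (Black), leaving all other
* covariates at their observed values.
replace race = 2
* xb_black is a hypothetical variable name holding the linear prediction.
predict double xb_black, xb
* Average the predictions over the estimation sample.
summarize xb_black if e(sample)
restore
```

The mean reported by -summarize- should match the Black margin in the output in #1 (134.2994). Repeating the steps with race set to 1 or 3 should give the other two rows.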

    Now, when you use -margins, atmeans-, the means of the predictors are calculated first, and then the linear combination defined by the regression is applied to those means.
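For the model in this thread, the two computations can be written side by side. The notation is an assumption introduced here, not something from the original posts: $b_0$ is the constant, $b_r$ the coefficient for race level $r$ (zero for the base level), $x_i$ the remaining regressors for observation $i$ (age plus the sex and health indicators), $b$ their coefficients, and $\bar{x}$ the vector of their sample means.

```latex
% asobserved: fix race at level r, predict for every observation, then average
\frac{1}{n}\sum_{i=1}^{n}\left(b_0 + b_r + x_i^{\top} b\right)
  = b_0 + b_r + \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^{\top} b
  = b_0 + b_r + \bar{x}^{\top} b

% atmeans: fix race at level r, plug in the means, then predict once
b_0 + b_r + \bar{x}^{\top} b
```

Both routes end at the same expression. The means of the indicator variables are simply sample proportions, which is why the At: legend in #1 lists values such as 2.sex = .5250121.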

    In general, it is not true that f(mean(X)) = mean(f(X)) for arbitrary functions f. But that equation does hold when f is a purely linear function, as is the case here, where f is the regression's linear combination of the predictors. It is only for these very simple linear models that this will happen. If the model itself is non-linear, or if it contains non-linear terms in the variables (polynomials or interactions), then the results will generally differ.
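A quick way to see the contrast is to refit with a non-linear model on the same data. The sketch below is an illustrative assumption rather than part of the original exchange: it swaps the outcome for the binary variable diabetes from nhanes2f and uses -logit-, so the default prediction is a probability, which is a non-linear function of the covariates.

```stata
* Sketch: same covariates, but a non-linear (logit) model.
webuse nhanes2f, clear
logit diabetes age i.sex i.race i.health

* Average of the predicted probabilities, other covariates as observed.
margins race

* Predicted probability evaluated at the covariate means.
margins race, atmeans
```

Here the two -margins- tables should no longer agree, because the probability evaluated at the mean covariate values is not the mean of the individual probabilities.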
