
  • Stata dropping standard errors in regression

    Hi everyone,

    I am trying to run a simple regression on this test data to understand better how to read the coefficients in a multiple regression with two or more binary variables:

    #      Sex  Race y
    # 1   Male White 1
    # 2 Female White 3
    # 3   Male Black 5
    # 4 Female Black 7

    In this case the model is y = B0 + B1*Race + B2*Sex. I have coded males as 0 and females as 1, and white as 0 and black as 1 (a minimal data-entry sketch follows at the end of this post). When I run the regression I get the following results:

    . reg y Race Sex

          Source |       SS           df       MS      Number of obs   =         4
    -------------+----------------------------------   F(2, 1)         =         .
           Model |          20         2          10   Prob > F        =         .
        Residual |           0         1           0   R-squared       =    1.0000
    -------------+----------------------------------   Adj R-squared   =    1.0000
           Total |          20         3  6.66666667   Root MSE        =         0

    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            Race |          4          .       .       .            .           .
             Sex |          2          .       .       .            .           .
           _cons |          1          .       .       .            .           .
    ------------------------------------------------------------------------------

    Stata drops the standard errors and just gives me the coefficients, and I cannot figure out why. When I run the same regression in R, I get the following results, with no standard errors dropped:

    # Coefficients:
    #             Estimate Std. Error  t value Pr(>|t|)    
    # (Intercept)        1   3.85e-16 2.60e+15  2.4e-16 ***
    # SexFemale          2   4.44e-16 4.50e+15  < 2e-16 ***
    # RaceBlack          4   4.44e-16 9.01e+15  < 2e-16 ***
    # ...
    # Warning message:
    # In summary.lm(lm(y ~ Sex + Race, d)) :
    #   essentially perfect fit: summary may be unreliable

    I understand that in this case my coefficients should be read as differences between the groups, as noted here: https://stats.stackexchange.com/ques...ical-variables However, I am confused about Stata dropping the standard errors. Could somebody please help me understand why it does this? Thanks in advance.
    Last edited by Mike Jim; 02 Aug 2018, 10:29.
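
    For reference, here is a minimal sketch of one way this test data could be entered in Stata (the input statement and value labels below are just one possible setup; only the 0/1 coding is as described above):

    * the four test observations, coded female = 1, black = 1
    clear
    input Sex Race y
    0 0 1
    1 0 3
    0 1 5
    1 1 7
    end

    label define sexlbl  0 "Male"  1 "Female"
    label define racelbl 0 "White" 1 "Black"
    label values Sex  sexlbl
    label values Race racelbl

    reg y Race Sex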

  • #2
    The data in your model fit the equation y = 1 + 2*Sex + 4*Race exactly, given your coding of female = 1 and black = 1. The residuals are all exactly zero. Consequently the residual variance is zero, and so are the standard errors of all the model coefficients. For whatever reason, Stata chooses simply not to bother calculating and showing them. You will see this happen in Stata with any linear model that has R-squared = 1.
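
    To spell out the arithmetic (these are just the standard OLS formulas, nothing Stata-specific):

        \widehat{\operatorname{Var}}(\hat\beta) = \hat\sigma^2 (X'X)^{-1},
        \qquad
        \hat\sigma^2 = \frac{1}{n - k} \sum_{i=1}^{n} \hat e_i^{\,2}

    With every residual \hat e_i = 0, you get \hat\sigma^2 = 0, so every standard error is exactly zero. That is also why your output shows Root MSE = 0 and a missing F statistic: the F ratio divides the model mean square by a residual mean square of zero.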

    Note, by the way, that the standard errors shown in your R output are all very close to zero, but they are not exactly zero. So they are, in fact, wrong.

    • #3
      Thank you so much Clyde! I was puzzled by this difference between R and Stata but now it is all clear. Thank you!

      • #4
        To add to Clyde's comment: if you're going to play with regression, use a reasonable number of observations. It seldom if ever makes sense to run a regression when N is about equal to the number of parameters.
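
        For instance, here is a quick sketch (simulated data, not from this thread) of the same two-indicator design with 40 observations and added noise, which yields ordinary nonzero standard errors:

        * same 0/1 design, but n = 40 and random noise, so the fit is not perfect
        clear
        set obs 40
        set seed 12345
        gen Sex  = mod(_n, 2)              // alternates 0 and 1
        gen Race = _n > 20                 // first half 0, second half 1
        gen y    = 1 + 2*Sex + 4*Race + rnormal(0, 1)

        reg y Race Sex

        With 40 observations the residual mean square is no longer zero, so Stata reports standard errors, t statistics, confidence intervals, and an F test.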
