
  • Regression Results with "If" Statement

    I am trying to run a simple regression for a school assignment. The question asks me to run a regression and restrict my estimation to var == 1.

    Initially, I wrote the following code:

    Code:
    keep if var == 1
    reg y x

    I then ran my regressions.

    But my classmate answered the question differently. Rather than dropping observations where var != 1, he wrote

    Code:
    reg y x if var == 1

    We get different results for our regressions.

    Why do we get different results?

  • #2
    It is the general policy here not to provide assistance with school assignments. Your post is a borderline case: it appears you have actually done the assignment and are trying to go a bit further and understand it better. So I'll respond, but I caution you that, in general, you should not post school assignments here.

    There is no reason I can see why this should have happened, and I cannot replicate the phenomenon with any examples I have tried. I suggest that you post some example data, perhaps the very data set used in the assignment, along with all the commands your friend and you used in setting up the problem (including those that preceded what you show in #1) and the output you each received. (Be sure to read the Forum FAQ, especially #12, so you use the proper approaches to showing this information.)

    My bottom line is that if you both started from the exact same data and used the commands you show, you should have gotten the same results. Most likely, there was some discrepancy in the starting data. Or, alternatively, perhaps the commands actually used are not quite as you show them. But if you can show me an example to the contrary, I'll try to troubleshoot it.
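
    To check this claim directly, the sketch below runs both approaches on the same data and compares the results. It assumes the auto data shipped with Stata (-sysuse auto-) in place of the assignment's data, with -foreign == 1- standing in for -var == 1-; the matrix and scalar names (b_if, b_keep, N_if, N_keep) are hypothetical, chosen only for illustration. -preserve- and -restore- bring back the dropped observations afterwards.

    ```stata
    * Sketch: -regress ... if- versus -keep if- followed by -regress-
    * Assumes the auto data shipped with Stata; b_if, b_keep, N_if, N_keep are illustrative names
    sysuse auto, clear

    * Approach 1: restrict the estimation sample with an if condition
    regress price mpg if foreign == 1
    matrix b_if = e(b)
    scalar N_if = e(N)

    * Approach 2: drop the other observations, then regress
    preserve
    keep if foreign == 1
    regress price mpg
    matrix b_keep = e(b)
    scalar N_keep = e(N)
    restore

    * Largest relative difference between the two coefficient vectors
    display mreldif(b_if, b_keep)
    * Number of observations used by each regression
    display "N: " N_if " and " N_keep
    ```

    If the relative difference is 0 and both regressions use the same number of observations, the two approaches agree; a discrepancy on real data would point to a difference in the starting data or in the commands actually run.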

    Finally, a pointer on Stata terminology: the -if var == 1- construct is not called an if statement. It is an if condition. Stata also has a different construct that is called an if statement or if command, but it does something different and its syntax is different. It is not uncommon for new users to confuse these, and even experienced users sometimes mistakenly use one where the other is called for. One way to keep your usage of these constructs correct is to consistently refer to them by their correct names, so as to emphasize the difference in your mind.
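
    As a minimal illustration of the distinction, again assuming the auto data shipped with Stata: the if condition restricts which observations a command uses, whereas the if command evaluates its expression once and, when the expression names a variable, uses that variable's value in the first observation.

    ```stata
    * Sketch: if condition versus if command, using the auto data shipped with Stata
    sysuse auto, clear

    * if condition: -summarize- uses only the observations where foreign == 1
    summarize price if foreign == 1

    * if command: the expression is evaluated once, not observation by observation.
    * A bare variable name refers to the first observation, so this tests foreign[1] only
    if foreign == 1 {
        display "Observation 1 is a foreign car"
    }
    else {
        display "Observation 1 is a domestic car"
    }
    ```

    A common mistake is to write something like -if foreign == 1 regress price mpg- expecting it to use only foreign cars; depending on the first observation, that runs -regress- on all observations or not at all.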



    • #3
      Andrew:
      welcome to this forum.
      Let me say first that I fully share Clyde's helpful explanation. The only case that springs to my mind where the two approaches can appear to give different results involves fitted values.
      In the following toy example, I start with a copy of the -auto.dta- file that comes with Stata:
      Code:
      . use "C:\Users\user\Desktop\AUTO.dta"
      (1978 Automobile Data)
      
      . keep if foreign==1
      (52 observations deleted)
      
      . regress price mpg
      
            Source |       SS           df       MS      Number of obs   =        22
      -------------+----------------------------------   F(1, 20)        =     13.25
             Model |  57534941.7         1  57534941.7   Prob > F        =    0.0016
          Residual |  86828271.1        20  4341413.55   R-squared       =    0.3985
      -------------+----------------------------------   Adj R-squared   =    0.3685
             Total |   144363213        21   6874438.7   Root MSE        =    2083.6
      
      ------------------------------------------------------------------------------
             price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               mpg |  -250.3668   68.77435    -3.64   0.002    -393.8276    -106.906
             _cons |   12586.95   1760.689     7.15   0.000     8914.217    16259.68
      ------------------------------------------------------------------------------
      
      . predict fitted_1, xb
      
      . sum fitted_1
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
          fitted_1 |         22    6384.682    1655.222   2321.911   9081.815

      Since this dataset now contains only 22 observations, there cannot be more than 22 fitted values.

      However, if we use the built-in Stata dataset, restrict -regress- with an -if- clause, and forget to include the same -if- clause when we ask for -predict-:
      Code:
      . sysuse auto.dta
      (1978 Automobile Data)
      
      . regress price mpg if foreign==1
      
            Source |       SS           df       MS      Number of obs   =        22
      -------------+----------------------------------   F(1, 20)        =     13.25
             Model |  57534941.7         1  57534941.7   Prob > F        =    0.0016
          Residual |  86828271.1        20  4341413.55   R-squared       =    0.3985
      -------------+----------------------------------   Adj R-squared   =    0.3685
             Total |   144363213        21   6874438.7   Root MSE        =    2083.6
      
      ------------------------------------------------------------------------------
             price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               mpg |  -250.3668   68.77435    -3.64   0.002    -393.8276    -106.906
             _cons |   12586.95   1760.689     7.15   0.000     8914.217    16259.68
      ------------------------------------------------------------------------------
      
      . predict fitted_2, xb
      
      . sum fitted_2
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
          fitted_2 |         74    7254.814    1448.498   2321.911   9582.549
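
      Stata returns 74 fitted values, regardless of the -if- clause used in -regress-, because -predict- computes fitted values for every observation in memory unless it is told otherwise.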

      Obviously, the same -if- clause should have been invoked with -predict-, or at least with any later command that uses the fitted values, such as -summarize-:
      Code:
      . sum fitted_2 if foreign==1
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
          fitted_2 |         22    6384.682    1655.222   2321.911   9081.815

      This returns, as expected, the very same -summarize- results for the fitted values as the first approach.
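
      A common way to avoid this pitfall is to restrict -predict- to the estimation sample with the -e(sample)- function, which marks the observations used by the last estimation command. A minimal sketch, again using the built-in auto data; the variable name fitted_3 is hypothetical:

      ```stata
      * Sketch: create fitted values only for the estimation sample
      sysuse auto, clear
      regress price mpg if foreign == 1

      * e(sample) is 1 for observations used by the last estimation command, 0 otherwise
      predict fitted_3 if e(sample), xb

      * fitted_3 is missing for the domestic cars, so only the foreign cars are summarized
      summarize fitted_3
      ```

      This leaves fitted_3 missing outside the estimation sample, so later commands cannot silently include the domestic cars.
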
      Kind regards,
      Carlo
      (Stata 19.0)
