
  • How does Stata pick the base for multiple regression?

    Hello! And happy (soon) new year!

    I am attempting to run a multiple regression with control variables, and I was wondering what my base represents. Does Stata pick the first variable in the regression command as the base? For example:

    Code:
    regress dependentvar explanatoryvar controlvar1 controlvar2
    Would this mean that the explanatory variable is picked as the base? I know that you can pick a base for a categorical variable, but my explanatory variable is continuous.

    Best wishes

    Cassie

  • #2
    Cassie:
    your query is not entirely clear to me.
    What you call the base is probably the constant (intercept) of your OLS regression.
    In the following toy example, the constant is the fitted value of -price- when -mpg-=0:
    Code:
    . use "C:\Program Files\Stata17\ado\base\a\auto.dta"
    (1978 automobile data)
    
    . regress price mpg
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(1, 72)        =     20.26
           Model |   139449474         1   139449474   Prob > F        =    0.0000
        Residual |   495615923        72  6883554.48   R-squared       =    0.2196
    -------------+----------------------------------   Adj R-squared   =    0.2087
           Total |   635065396        73  8699525.97   Root MSE        =    2623.7
    
    ------------------------------------------------------------------------------
           price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
           _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
    ------------------------------------------------------------------------------
    
    . predict fitted, xb
    
    . list make mpg fitted in 1
    
         +------------------------------+
         | make          mpg     fitted |
         |------------------------------|
      1. | AMC Concord    22   5997.385 |
         +------------------------------+
    
    . di 11253.06 + (-238.8943*22)
    5997.3854
    
    .
    If a hypothetical car had mpg=0, the fitted value would boil down to -_cons- only.
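
    On the "base" part of the question: Stata picks a base (reference) level only for factor variables entered with the i. prefix. A continuous predictor has no base, and the order of the variables after -regress- does not change the coefficients. A minimal sketch with the same auto data, assuming -rep78- as an illustrative categorical control (it is not part of the original question):

    ```stata
    sysuse auto, clear

    * Continuous predictor only: no base level is involved
    regress price mpg

    * rep78 entered as a factor variable: by default Stata omits
    * its lowest level (rep78==1) as the base category
    regress price mpg i.rep78

    * Same model, with level 3 chosen as the base via the ib#. operator
    regress price mpg ib3.rep78
    ```

    In the last two models the -rep78- coefficients are contrasts against the base level, while the -mpg- coefficient is still a slope per unit of -mpg-.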

    PS: All the best to you too for the coming 2022.
    Kind regards,
    Carlo
    (StataNow 18.5)



    • #3
      Hello again Carlo!

      I'm sorry if this seems unclear. I'm not really sure how to use dataex so I hope you don't mind if I attach a screenshot of my results.

      In this equation, political trust is measured on a numerical Likert scale, which I am treating as continuous (as instructed by my tutor). I want to control for age and gender when testing my hypothesis, "The more political trust British people have, the less they prioritise the environment". I don't want to offend anyone with my hypothesis; this is purely for an assessment I am doing, and I am not trying to make a political statement.

      I'm unsure how to interpret _cons. I know that, without the control variables, _cons is the predicted value when political trust = 0. However, I'm worried this may change once the control variables are added?

      Apologies if this is a simple question.

      [Attached image: Screenshot 2021-12-28 at 19.06.42.png]



      • #4
        Cassie:
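        as per your results, your summary statement is only partly correct, because it does not take into account that your main predictor is now adjusted for -age- and -gender-. With the controls in the model, -_cons- is the fitted value when every predictor is zero (or at its base level, for a factor variable).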
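        To see how -_cons- behaves once controls are added, here is a minimal sketch with the auto data. The variables are stand-ins, not your data: -mpg- plays the role of political trust, while -weight- and -foreign- play the role of -age- and -gender-.

        ```stata
        sysuse auto, clear

        * Stand-in model (assumption): main predictor plus two controls
        regress price mpg weight foreign

        * _cons is the fitted value when mpg, weight and foreign are all 0;
        * -margins- at those values reproduces it
        margins, at(mpg=0 weight=0 foreign=0)
        display _b[_cons]

        * Centering the continuous predictors leaves the slopes unchanged and
        * moves the intercept to a more meaningful point: the fitted price of
        * a domestic car (foreign==0) with average mpg and average weight
        summarize mpg, meanonly
        generate c_mpg = mpg - r(mean)
        summarize weight, meanonly
        generate c_weight = weight - r(mean)
        regress price c_mpg c_weight foreign
        ```

        A car with zero -mpg- and zero -weight- does not exist, which is why the uncentered intercept is often hard to interpret; the same is likely true for a respondent with age 0.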
        That said:
        1) based on the existing literature, I would discuss with your tutor whether treating a Likert scale as a continuous predictor makes sense;
        2) check whether you have included all the necessary predictors and interactions on the right-hand side of your regression equation;
        3) I would also check whether adding linear and squared terms for -age- (centered around its mean) makes sense (turning point).
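        Point 3 could be sketched as follows. This is only an illustration under stated assumptions: it uses the -nlsw88- example dataset shipped with Stata, with -wage- as a stand-in outcome, because the original data were not posted; -c_age- and -mean_age- are names made up for this example.

        ```stata
        sysuse nlsw88, clear

        * Center age at its sample mean
        summarize age, meanonly
        scalar mean_age = r(mean)
        generate c_age = age - scalar(mean_age)

        * Linear and squared terms via factor-variable notation
        regress wage c.c_age##c.c_age

        * Turning point of the fitted curve, -b1/(2*b2), shifted back to the
        * original age scale; it is only meaningful if it falls inside the
        * observed range of age
        display -_b[c_age]/(2*_b[c.c_age#c.c_age]) + scalar(mean_age)
        ```

        With centered age, -_cons- refers to a respondent of average age (with the other predictors at zero) rather than to age 0.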
        Kind regards,
        Carlo
        (StataNow 18.5)

