How does Stata pick the base for multiple regression?

Cassie Wright

Join Date: Dec 2021

Posts: 44
#1


28 Dec 2021, 12:50

Hello! And happy (soon) new year!

I am attempting to run a multiple regression with control variables, and I was wondering what my base represents. Does Stata pick the first variable in the regression command as the base? For example:

Code:

regress dependentvar explanatoryvar controlvar1 controlvar2

Would this mean that the explanatory variable is picked as the base? I know that you can pick a base for a categorical variable, but my explanatory variable is continuous.

Best wishes

Cassie
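A minimal sketch of how bases work in Stata, using the `auto` dataset shipped with Stata as a stand-in (`rep78` plays the role of a categorical variable; none of this comes from the original posts). A base (reference) level only exists for factor variables entered with the `i.` prefix, and by default it is the lowest level. Continuous regressors are never "picked as the base", and the order of variables after `regress` does not change the estimates.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Continuous regressors have no base level, and their order in the
* varlist does not change the estimates: these two fits are identical
regress price mpg weight
regress price weight mpg

* Factor-variable notation (i.): by default Stata uses the lowest
* level as the base, here rep78 == 1
regress price mpg i.rep78

* Pick the base explicitly: level 3 ...
regress price mpg ib3.rep78

* ... or the most frequent level
regress price mpg ib(freq).rep78
```

The base level is reported with a coefficient of 0 in the output; every other level's coefficient is a difference from it.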

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17609

28 Dec 2021, 12:56

Cassie:
your query sounds unclear to me.
What you define as base is probably the constant (intercept) of your OLS.
In the following toy example, the constant is the fitted value of -price- when -mpg- = 0:

Code:

. sysuse auto, clear
(1978 automobile data)

. regress price mpg

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(1, 72)        =     20.26
       Model |   139449474         1   139449474   Prob > F        =    0.0000
    Residual |   495615923        72  6883554.48   R-squared       =    0.2196
-------------+----------------------------------   Adj R-squared   =    0.2087
       Total |   635065396        73  8699525.97   Root MSE        =    2623.7

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
       _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
------------------------------------------------------------------------------

. predict fitted, xb

. list make mpg fitted in 1

     +------------------------------+
     | make          mpg     fitted |
     |------------------------------|
  1. | AMC Concord    22   5997.385 |
     +------------------------------+

. di 11253.06 + (-238.8943*22)
5997.3854

.

If a hypothetical car had mpg=0, the fitted value would boil down to -_cons- only.

PS: All the best to you too for the coming 2022.

Kind regards,
Carlo
(StataNow 18.5)


Cassie Wright

Join Date: Dec 2021

Posts: 44
#3

28 Dec 2021, 13:11


Hello again Carlo!

I'm sorry if this seems unclear. I'm not really sure how to use -dataex-, so I hope you don't mind that I attach a screenshot of my results.

In this equation, political trust is measured on a numerical Likert scale, which I am treating as continuous (as instructed by my tutor). I want to control for age and gender when testing my hypothesis that "the more political trust British people have, the less they prioritise the environment". I don't want to offend anyone with my hypothesis; this is purely for an assessment I am doing, and I am not trying to make a political statement here.

I'm unsure how to interpret _cons. I know that without the control variables, _cons is the fitted value when political trust = 0. However, I'm worried that this interpretation may change once the control variables are added.

Apologies if this is a simple question.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17609
#4

28 Dec 2021, 14:07

Cassie:
as per your results, your summary statement is only partly correct, as it does not take into account that the coefficient of your main predictor is now adjusted for -age- and -gender-.
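A minimal sketch of what adding controls does to `_cons`, again using the `auto` data because the thread's own variables are only in a screenshot. As an assumption for illustration, `mpg` stands in for political trust, `weight` for age and `foreign` for gender. With controls, `_cons` is the fitted value when every continuous regressor is 0 and every factor variable is at its base level, so it is no longer just "political trust = 0".

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Stand-ins (assumption): mpg = political trust, weight = age, foreign = gender
regress price mpg weight i.foreign

* _cons is the fitted price when mpg = 0, weight = 0 and foreign is at its
* base level (0 = Domestic), not simply "when the main predictor is 0"
display _b[_cons]

* Same quantity from margins, evaluated with every regressor at 0
margins, at(mpg=0 weight=0 foreign=0)
```

Each slope is read holding the other regressors fixed; the intercept is only substantively meaningful when 0 is a sensible value for every regressor (an age of 0 is not), which is one reason to center variables such as age.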
That said:
1) I would discuss with your tutor whether, based on the existing literature, treating a Likert scale as a continuous predictor makes sense;
2) check whether you included all the necessary predictors and interactions in the right-hand side of your regression equation;
3) if relevant, I would check whether including linear and squared terms for -age- (centered on its mean) makes sense, i.e. whether the effect of age has a turning point.

Kind regards,
Carlo
(StataNow 18.5)
