
  • looping regressions

    Hi all,

    I looked at Clyde's response in an earlier post on looping regressions, but I can't get it to work properly.

    I am not sure where I am going wrong with it.

    Consider a dependent variable -y- and three independent variables -x1, x2, x3-.

    I want to run:
    Code:
    reg y x1
    reg y x1 x2
    reg y x1 x2 x3

    The code below doesn't work, because each regression runs with x1 only, x2 only, or x3 only.
    Code:
    local yvar y
    local xvar x1 x2 x3
    foreach y of local yvar {
        foreach p of local xvar {
            reg `y' `p'
        }
    }
    I got around this by replacing the inner loop with -forvalues- and declaring locals like:

    Code:
    local xv1 x1
    local xv2 x1 x2
    local xv3 x1 x2 x3
    However, what if I have 20 variables and want to run the regressions cumulatively, each adding one variable to the previous specification (i.e., run -reg y x1-, then -reg y x1 x2-, and so on up to x20)?

    Please advise and help!

    Thanks.

  • #2
    Suppose I have one dependent variable and five independent variables:

    Code:
    clear
    set obs 1000
    
    gen y = runiform()
    forv i = 1/5 {
        gen x`i' = runiform()
    }
    Seems like you want a loop like this:

    Code:
    foreach var in x1 x2 x3 x4 x5 {
        local indvars = "`indvars' `var'"
        display "Model: `indvars'"
        reg y `indvars'
    }
    Code:
    Model:  x1
    
          Source |       SS           df       MS      Number of obs   =     1,000
    -------------+----------------------------------   F(1, 998)       =      0.00
           Model |  .000053419         1  .000053419   Prob > F        =    0.9794
        Residual |  79.8543064       998  .080014335   R-squared       =    0.0000
    -------------+----------------------------------   Adj R-squared   =   -0.0010
           Total |  79.8543598       999  .079934294   Root MSE        =    .28287
    
    ------------------------------------------------------------------------------
               y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
              x1 |   .0008324   .0322175     0.03   0.979    -.0623894    .0640543
           _cons |   .5131572   .0181225    28.32   0.000     .4775947    .5487197
    ------------------------------------------------------------------------------
    Model:  x1 x2
    
          Source |       SS           df       MS      Number of obs   =     1,000
    -------------+----------------------------------   F(2, 997)       =      1.10
           Model |  .175361123         2  .087680561   Prob > F        =    0.3342
        Residual |  79.6789987       997  .079918755   R-squared       =    0.0022
    -------------+----------------------------------   Adj R-squared   =    0.0002
           Total |  79.8543598       999  .079934294   Root MSE        =     .2827
    
    ------------------------------------------------------------------------------
               y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
              x1 |  -.0003049   .0322074    -0.01   0.992     -.063507    .0628972
              x2 |  -.0464566   .0313669    -1.48   0.139    -.1080094    .0150961
           _cons |   .5375307   .0244715    21.97   0.000     .4895092    .5855522
    ------------------------------------------------------------------------------
    Model:  x1 x2 x3
    
          Source |       SS           df       MS      Number of obs   =     1,000
    -------------+----------------------------------   F(3, 996)       =      0.74
           Model |  .176702018         3  .058900673   Prob > F        =    0.5305
        Residual |  79.6776578       996  .079997648   R-squared       =    0.0022
    -------------+----------------------------------   Adj R-squared   =   -0.0008
           Total |  79.8543598       999  .079934294   Root MSE        =    .28284
    
    ------------------------------------------------------------------------------
               y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
              x1 |  -.0003865   .0322295    -0.01   0.990     -.063632     .062859
              x2 |  -.0463146   .0314015    -1.47   0.141    -.1079354    .0153061
              x3 |    .003944   .0304637     0.13   0.897    -.0558363    .0637244
           _cons |   .5354937   .0291031    18.40   0.000     .4783832    .5926042
    ------------------------------------------------------------------------------
    Model:  x1 x2 x3 x4
    
          Source |       SS           df       MS      Number of obs   =     1,000
    -------------+----------------------------------   F(4, 995)       =      0.56
           Model |  .179383054         4  .044845763   Prob > F        =    0.6917
        Residual |  79.6749768       995  .080075354   R-squared       =    0.0022
    -------------+----------------------------------   Adj R-squared   =   -0.0018
           Total |  79.8543598       999  .079934294   Root MSE        =    .28298
    
    ------------------------------------------------------------------------------
               y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
              x1 |  -.0005174   .0322531    -0.02   0.987    -.0638093    .0627744
              x2 |  -.0462129   .0314217    -1.47   0.142    -.1078733    .0154475
              x3 |   .0039301   .0304786     0.13   0.897    -.0558795    .0637397
              x4 |  -.0057144   .0312297    -0.18   0.855    -.0669981    .0555693
           _cons |   .5384561   .0333157    16.16   0.000     .4730791    .6038332
    ------------------------------------------------------------------------------
    Model:  x1 x2 x3 x4 x5
    
          Source |       SS           df       MS      Number of obs   =     1,000
    -------------+----------------------------------   F(5, 994)       =      0.70
           Model |  .278965475         5  .055793095   Prob > F        =    0.6258
        Residual |  79.5753944       994  .080055729   R-squared       =    0.0035
    -------------+----------------------------------   Adj R-squared   =   -0.0015
           Total |  79.8543598       999  .079934294   Root MSE        =    .28294
    
    ------------------------------------------------------------------------------
               y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
              x1 |  -.0009052    .032251    -0.03   0.978     -.064193    .0623826
              x2 |  -.0462817   .0314179    -1.47   0.141    -.1079348    .0153713
              x3 |   .0047963   .0304847     0.16   0.875    -.0550255    .0646181
              x4 |  -.0057489   .0312259    -0.18   0.854    -.0670252    .0555274
              x5 |   .0345345   .0309641     1.12   0.265     -.026228    .0952971
           _cons |   .5211928   .0367321    14.19   0.000     .4491114    .5932742
    ------------------------------------------------------------------------------

    Comment


    • #3
      Daniel, this is very helpful. What if I had two dependent variables -y1, y2-, and I wanted to run the cumulative regressions from x1 to x3 for both?

      Would I need a nested loop? I am failing to follow the code suggested by you and Clyde for the two-dependent-variable case.

      Originally posted by Daniel Schaefer View Post
      Suppose I have one dependent variable and five independent variables: [...] Seems like you want a loop like this: [...]

      Comment


      • #4
        For exactly two dependent variables I would probably just use the same loop twice.

        Code:
        foreach var in x1 x2 x3 x4 x5 {
            local indvars = "`indvars' `var'"
            display "Model: `indvars'"
            reg y1 `indvars'
        }
        
        foreach var in x1 x2 x3 x4 x5 {
            local indvars = "`indvars' `var'"
            display "Model: `indvars'"
            reg y2 `indvars'
        }
        For an arbitrary number of dependent variables, you'd want a nested loop.

        Code:
        foreach dep in y1 y2{
            local indvars = ""
            foreach var in x1 x2 x3 x4 x5 {
                local indvars = "`indvars' `var'"
                display "Model: `dep' `indvars'"
                reg `dep' `indvars'
            }
        }

        Comment


        • #5
          My first thought was to use -nestreg- rather than modifying a local macro on each loop. Here is how it would work for the scenario you described in #3.

          Code:
          clear
          set obs 1000
          gen y1 = runiform()
          gen y2 = runiform()
          forv i = 1/3 {
              gen x`i' = runiform()
          }
          
          forvalues i = 1/2 {
            nestreg: regress y`i' (x1) (x2) (x3)    
          }
          Output:
          Code:
          . forvalues i = 1/2 {
            2.   nestreg: regress y`i' (x1) (x2) (x3)  
            3. }
          
          Block 1: x1
          
                Source |       SS           df       MS      Number of obs   =     1,000
          -------------+----------------------------------   F(1, 998)       =      1.05
                 Model |  .086985453         1  .086985453   Prob > F        =    0.3047
              Residual |   82.323753       998   .08248873   R-squared       =    0.0011
          -------------+----------------------------------   Adj R-squared   =    0.0001
                 Total |  82.4107384       999  .082493232   Root MSE        =    .28721
          
          ------------------------------------------------------------------------------
                    y1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                    x1 |  -.0328288    .031969    -1.03   0.305    -.0955631    .0299054
                 _cons |   .5091189    .018474    27.56   0.000     .4728665    .5453712
          ------------------------------------------------------------------------------
          
          Block 2: x2
          
                Source |       SS           df       MS      Number of obs   =     1,000
          -------------+----------------------------------   F(2, 997)       =      0.53
                 Model |  .087046704         2  .043523352   Prob > F        =    0.5905
              Residual |  82.3236917       997  .082571406   R-squared       =    0.0011
          -------------+----------------------------------   Adj R-squared   =   -0.0009
                 Total |  82.4107384       999  .082493232   Root MSE        =    .28735
          
          ------------------------------------------------------------------------------
                    y1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                    x1 |  -.0327839   .0320276    -1.02   0.306    -.0956332    .0300654
                    x2 |  -.0008527   .0313063    -0.03   0.978    -.0622864    .0605811
                 _cons |   .5095165   .0235528    21.63   0.000     .4632977    .5557352
          ------------------------------------------------------------------------------
          
          Block 3: x3
          
                Source |       SS           df       MS      Number of obs   =     1,000
          -------------+----------------------------------   F(3, 996)       =      1.40
                 Model |  .345565665         3  .115188555   Prob > F        =    0.2419
              Residual |  82.0651728       996  .082394752   R-squared       =    0.0042
          -------------+----------------------------------   Adj R-squared   =    0.0012
                 Total |  82.4107384       999  .082493232   Root MSE        =    .28704
          
          ------------------------------------------------------------------------------
                    y1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                    x1 |   -.034607   .0320099    -1.08   0.280    -.0974216    .0282076
                    x2 |  -.0005217   .0312733    -0.02   0.987    -.0618909    .0608475
                    x3 |  -.0560831   .0316618    -1.77   0.077    -.1182146    .0060484
                 _cons |   .5389422   .0288013    18.71   0.000     .4824239    .5954605
          ------------------------------------------------------------------------------
          
          
            +-------------------------------------------------------------+
            |       |          Block  Residual                     Change |
            | Block |       F     df        df   Pr > F       R2    in R2 |
            |-------+-----------------------------------------------------|
            |     1 |    1.05      1       998   0.3047   0.0011          |
            |     2 |    0.00      1       997   0.9783   0.0011   0.0000 |
            |     3 |    3.14      1       996   0.0768   0.0042   0.0031 |
            +-------------------------------------------------------------+
          
          Block 1: x1
          
                Source |       SS           df       MS      Number of obs   =     1,000
          -------------+----------------------------------   F(1, 998)       =      0.22
                 Model |  .019210197         1  .019210197   Prob > F        =    0.6384
              Residual |  86.7356242       998  .086909443   R-squared       =    0.0002
          -------------+----------------------------------   Adj R-squared   =   -0.0008
                 Total |  86.7548344       999  .086841676   Root MSE        =     .2948
          
          ------------------------------------------------------------------------------
                    y2 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                    x1 |  -.0154276   .0328145    -0.47   0.638    -.0798209    .0489657
                 _cons |   .5101567   .0189626    26.90   0.000     .4729456    .5473677
          ------------------------------------------------------------------------------
          
          Block 2: x2
          
                Source |       SS           df       MS      Number of obs   =     1,000
          -------------+----------------------------------   F(2, 997)       =      1.33
                 Model |  .231617627         2  .115808814   Prob > F        =    0.2638
              Residual |  86.5232168       997  .086783567   R-squared       =    0.0027
          -------------+----------------------------------   Adj R-squared   =    0.0007
                 Total |  86.7548344       999  .086841676   Root MSE        =    .29459
          
          ------------------------------------------------------------------------------
                    y2 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                    x1 |  -.0180751   .0328344    -0.55   0.582    -.0825075    .0463573
                    x2 |   .0502113   .0320949     1.56   0.118    -.0127699    .1131926
                 _cons |   .4867433   .0241461    20.16   0.000     .4393603    .5341262
          ------------------------------------------------------------------------------
          
          Block 3: x3
          
                Source |       SS           df       MS      Number of obs   =     1,000
          -------------+----------------------------------   F(3, 996)       =      1.27
                 Model |  .329407656         3  .109802552   Prob > F        =    0.2849
              Residual |  86.4254268       996  .086772517   R-squared       =    0.0038
          -------------+----------------------------------   Adj R-squared   =    0.0008
                 Total |  86.7548344       999  .086841676   Root MSE        =    .29457
          
          ------------------------------------------------------------------------------
                    y2 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                    x1 |  -.0191964   .0328493    -0.58   0.559    -.0836581    .0452653
                    x2 |   .0504149   .0320934     1.57   0.117    -.0125636    .1133933
                    x3 |  -.0344932    .032492    -1.06   0.289    -.0982539    .0292675
                 _cons |   .5048412   .0295566    17.08   0.000     .4468409    .5628415
          ------------------------------------------------------------------------------
          
          
            +-------------------------------------------------------------+
            |       |          Block  Residual                     Change |
            | Block |       F     df        df   Pr > F       R2    in R2 |
            |-------+-----------------------------------------------------|
            |     1 |    0.22      1       998   0.6384   0.0002          |
            |     2 |    2.45      1       997   0.1180   0.0027   0.0024 |
            |     3 |    1.13      1       996   0.2887   0.0038   0.0011 |
            +-------------------------------------------------------------+
          --
          Bruce Weaver
          Email: [email protected]
          Version: Stata/MP 18.5 (Windows)

          Comment


          • #6
            Code:
            foreach y in y1 y2 {
                local indvars

                foreach var in x1 x2 x3 x4 x5 {
                    local indvars `indvars' `var'
                    display "Model: `y' `indvars'"
                    reg `y' `indvars'
                }
            }

            Comment


            • #7
              Many thanks to Bruce, Daniel, and Nick.

              Nick -- if I declared the local dependent and independent variable lists before the loop, would this code work?

              Code:
              local yvar y1 y2
              local xvar x1 x2 x3
              foreach y in local yvar {
                  local indvars
                  foreach var in local xvar {
                      local indvars `indvars' `var'
                      display "Model: `y' `indvars'"
                      reg `y' `indvars'
                  }
              }
              Originally posted by Nick Cox View Post
              Code:
              foreach y in y1 y2 {
                  local indvars

                  foreach var in x1 x2 x3 x4 x5 {
                      local indvars `indvars' `var'
                      display "Model: `y' `indvars'"
                      reg `y' `indvars'
                  }
              }

              Comment


              • #8
                A problem with #4 is that the local macro indvars needs to be cleared before first use. This is done in the second block of code but not the first.

                 #7 No. That won't work. -in local yvar- is legal until you get inside the loop, when you find that Stata doesn't understand you. Read the help for -foreach- to see why. The keywords -in- and -of- have utterly different meanings.
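                 For reference, a minimal sketch of the code in #7 rewritten with the -of local- syntax (the local names yvar and xvar are taken from #7; this is an illustration, not tested on your data):

                 Code:
                 local yvar y1 y2
                 local xvar x1 x2 x3

                 foreach y of local yvar {
                     local indvars
                     foreach var of local xvar {
                         local indvars `indvars' `var'
                         display "Model: `y' `indvars'"
                         reg `y' `indvars'
                     }
                 }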

                Comment


                • #9
                  Hi Nick, thanks for the response!

                   I will read the help for -foreach-.

                   But a quick question about your code:

                   1. What does -local indvars- in the first loop do?
                   2. What does -local indvars `indvars' `var'- do in the second loop?

                   I want to understand at which step the local macro indvars is cleared.

                  Thanks!

                  Originally posted by Nick Cox View Post
                  A problem with #4 is that the local macro indvars needs to be cleared before first use. This is done in the second block of code but not the first.

                  #7 No. That won't work. in local yvar is legal until you get inside the loop when you find that Stata doesn't understand you. Read the help for foreach to see why. The keywords in and of have utterly different meaning.

                  Comment


                  • #10
                    The questions may be quick, but the answers won't be.

                    #8 already addresses your first question.

                    I've written various tutorial reviews in this territory that discuss the main ideas in some detail.

                    https://journals.sagepub.com/doi/pdf...36867X20976340

                    https://journals.sagepub.com/doi/pdf...6867X211063415



                    Comment


                    • #11
                      #5 is a great answer. I was not aware of -nestreg-, but it looks well suited to this problem. Seems worth it just for the cross-model comparisons at the end. I can imagine looping doesn't scale well once you have many models that all have to be interpreted. Those summary statistics at the end seem really useful for this kind of thing.

                      Nick, thanks for the more idiomatic code in #6 and for pointing out the issue with the first block in #4. Good catch, as usual.

                      Comment


                      • #12
                        Hi Nick, the first article really helped me to understand the local and loop structure in Stata better.

                        So, quick questions with my own answers after reading your co-authored first article: can you please correct me if there is a misunderstanding?

                        1. What does -local indvars- in the first loop do?
                        2. What does -local indvars `indvars' `var'- do in the second loop?

                        1) you are declaring a local variable called "indvars" so that this is "live" in the inner loop. So, after each inner loop's completion, you are clearing this local indvars, because the final indvars from the last step of the inner loop will look like:

                        -`x1 x2 x3'-

                        2) The second line -local indvars `indvars' `var'- you are adding a list of string (independent variables) for the loop local variable.

                        Am I understanding this correctly?

                        Originally posted by Nick Cox View Post
                        The questions may be quick, but the answers won't be.

                        #8 already addresses your first question.

                        I've written various tutorial reviews in this territory that discuss the main ideas in some detail.

                        https://journals.sagepub.com/doi/pdf...36867X20976340

                        https://journals.sagepub.com/doi/pdf...6867X211063415


                        Comment


                        • #13
                          More or less. It's more subtle in either case.


                          Code:
                          local indvars
                          can be interpreted in two ways:

                          1. Setting the local to an empty string.

                          2. Abolishing the local macro altogether.

                          Stata doesn't really distinguish between the two cases. In the rest of life, if I have a box or bag with nothing in it, it's still a box or bag, which just happens to be empty. But to Stata an empty local macro doesn't exist....
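                          A minimal sketch of that point (the |...| bars just make the empty substitution visible):

                          Code:
                          local indvars            // clears the macro; to Stata it now doesn't exist
                          display `"|`indvars'|"'  // the reference substitutes an empty string: displays ||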

                          If I go

                          Code:
                          local beasts `frog' toad
                          and the local macro frog doesn't exist, that's perfectly legal. Stata just substitutes an empty string or, otherwise put, ignores the reference.


                          Code:
                          local indvars `indvars' `var'
                          is just adding one item at a time -- whatever is in local macro var -- to what is already in local macro indvars. (As said, that might be nothing, if the latter does not yet exist.)

                          The wording

                          adding a list of string (independent variables) for the loop local variable
                          doesn't capture this exactly. It does capture what happens as this is done repeatedly in a loop.
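                          A short trace makes the repeated accumulation concrete (again, the |...| bars just delimit the macro's contents on each pass):

                          Code:
                          local indvars
                          foreach var in x1 x2 x3 {
                              local indvars `indvars' `var'
                              display `"|`indvars'|"'
                          }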

                          Also, Stata developers and programmers (should) never use the term local variable. In Stata a variable is (in other terms) a column or field in a dataset, and nothing else. (In Mata, matters are different.) Macros in Stata are not properly called variables. Naturally, they resemble in many ways variables in other languages, and people with more knowledge and experience of other languages are tempted to call them variables, but that's private or personal temptation, not correct as a matter of Stata usage. Call this pedantic if you like, but pedants who pay attention to detail get to write good Stata code more easily and effectively, and the others will get into messes more often.

                          More at https://www.stata.com/statalist/arch.../msg01258.html

                          Comment


                          • #14
                            Originally posted by Daniel Schaefer View Post
                            #5 is a great answer. I was not aware of -nestreg-, but it looks well suited to this problem. Seems worth it just for the cross-model comparisons at the end. I can imagine looping doesn't scale well once you have many models that all have to be interpreted. Those summary statistics at the end seem really useful for this kind of thing.
                            Thanks Daniel. Note too that -nestreg- works with other types of models. Here's an example of using it with -logit-. The default here will give Wald tests comparing one block to the next, but I prefer likelihood ratio tests, so I added the lrtable option.

                            Code:
                            * Example of -nestreg- with -logit-
                            * First, generate two dichotomous outcomes to illustrate
                            generate byte event1 = y1 > .75
                            generate byte event2 = y2 < .25
                            foreach event in event1 event2 {
                              nestreg, lrtable: logit `event' (x1) (x2) (x3)    
                            }
                            Output:
                            Code:
                            . foreach event in event1 event2 {
                              2.   nestreg, lrtable: logit `event' (x1) (x2) (x3)        
                              3. }
                            
                            Block 1: x1
                            
                            Iteration 0:  Log likelihood = -551.07993  
                            Iteration 1:  Log likelihood = -551.03233  
                            Iteration 2:  Log likelihood = -551.03233  
                            
                            Logistic regression                                     Number of obs =  1,000
                                                                                    LR chi2(1)    =   0.10
                                                                                    Prob > chi2   = 0.7577
                            Log likelihood = -551.03233                             Pseudo R2     = 0.0001
                            
                            ------------------------------------------------------------------------------
                                  event1 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                      x1 |  -.0804124   .2606281    -0.31   0.758    -.5912341    .4304092
                                   _cons |  -1.112351   .1498459    -7.42   0.000    -1.406043   -.8186581
                            ------------------------------------------------------------------------------
                            
                            Block 2: x2
                            
                            Iteration 0:  Log likelihood = -551.07993  
                            Iteration 1:  Log likelihood = -550.42783  
                            Iteration 2:  Log likelihood = -550.42752  
                            Iteration 3:  Log likelihood = -550.42752  
                            
                            Logistic regression                                     Number of obs =  1,000
                                                                                    LR chi2(2)    =   1.30
                                                                                    Prob > chi2   = 0.5208
                            Log likelihood = -550.42752                             Pseudo R2     = 0.0012
                            
                            ------------------------------------------------------------------------------
                                  event1 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                      x1 |  -.0953021   .2611743    -0.36   0.715    -.6071942    .4165901
                                      x2 |   .2806637   .2553362     1.10   0.272    -.2197861    .7811136
                                   _cons |  -1.244902   .1933907    -6.44   0.000    -1.623941   -.8658632
                            ------------------------------------------------------------------------------
                            
                            Block 3: x3
                            
                            Iteration 0:  Log likelihood = -551.07993  
                            Iteration 1:  Log likelihood = -550.39182  
                            Iteration 2:  Log likelihood = -550.39146  
                            Iteration 3:  Log likelihood = -550.39146  
                            
                            Logistic regression                                     Number of obs =  1,000
                                                                                    LR chi2(3)    =   1.38
                                                                                    Prob > chi2   = 0.7110
                            Log likelihood = -550.39146                             Pseudo R2     = 0.0012
                            
                            ------------------------------------------------------------------------------
                                  event1 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                      x1 |  -.0976551   .2613331    -0.37   0.709    -.6098585    .4145482
                                      x2 |    .281291   .2554339     1.10   0.271    -.2193503    .7819323
                                      x3 |  -.0694632   .2586687    -0.27   0.788    -.5764446    .4375181
                                   _cons |  -1.208622   .2356945    -5.13   0.000    -1.670574   -.7466688
                            ------------------------------------------------------------------------------
                            
                            
                              +----------------------------------------------------------------+
                              | Block |        LL       LR     df  Pr > LR       AIC       BIC |
                              |-------+--------------------------------------------------------|
                              |     1 | -551.0323     0.10      1   0.7577  1106.065   1115.88 |
                              |     2 | -550.4275     1.21      1   0.2714  1106.855  1121.578 |
                              |     3 | -550.3915     0.07      1   0.7883  1108.783  1128.414 |
                              +----------------------------------------------------------------+
                            
                            Block 1: x1
                            
                            Iteration 0:  Log likelihood = -573.05692  
                            Iteration 1:  Log likelihood = -573.04726  
                            Iteration 2:  Log likelihood = -573.04726  
                            
                            Logistic regression                                     Number of obs =  1,000
                                                                                    LR chi2(1)    =   0.02
                                                                                    Prob > chi2   = 0.8895
                            Log likelihood = -573.04726                             Pseudo R2     = 0.0000
                            
                            ------------------------------------------------------------------------------
                                  event2 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                      x1 |  -.0352701   .2537572    -0.14   0.889     -.532625    .4620849
                                   _cons |  -1.028244   .1463384    -7.03   0.000    -1.315062   -.7414262
                            ------------------------------------------------------------------------------
                            
                            Block 2: x2
                            
                            Iteration 0:  Log likelihood = -573.05692  
                            Iteration 1:  Log likelihood =  -572.0825  
                            Iteration 2:  Log likelihood = -572.08193  
                            Iteration 3:  Log likelihood = -572.08193  
                            
                            Logistic regression                                     Number of obs =  1,000
                                                                                    LR chi2(2)    =   1.95
                                                                                    Prob > chi2   = 0.3772
                            Log likelihood = -572.08193                             Pseudo R2     = 0.0017
                            
                            ------------------------------------------------------------------------------
                                  event2 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                      x1 |  -.0170504    .254303    -0.07   0.947    -.5154751    .4813743
                                      x2 |  -.3457227   .2491851    -1.39   0.165    -.8341166    .1426711
                                   _cons |  -.8694459   .1848183    -4.70   0.000    -1.231683   -.5072087
                            ------------------------------------------------------------------------------
                            
                            Block 3: x3
                            
                            Iteration 0:  Log likelihood = -573.05692  
                            Iteration 1:  Log likelihood = -571.96243  
                            Iteration 2:  Log likelihood = -571.96171  
                            Iteration 3:  Log likelihood = -571.96171  
                            
                            Logistic regression                                     Number of obs =  1,000
                                                                                    LR chi2(3)    =   2.19
                                                                                    Prob > chi2   = 0.5338
                            Log likelihood = -571.96171                             Pseudo R2     = 0.0019
                            
                            ------------------------------------------------------------------------------
                                  event2 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                      x1 |   -.013244   .2544481    -0.05   0.958    -.5119531    .4854652
                                      x2 |  -.3461677   .2490922    -1.39   0.165    -.8343794     .142044
                                      x3 |   .1234019   .2517322     0.49   0.624    -.3699841    .6167879
                                   _cons |  -.9345228   .2278585    -4.10   0.000    -1.381117   -.4879284
                            ------------------------------------------------------------------------------
                            
                            
                              +----------------------------------------------------------------+
                              | Block |        LL       LR     df  Pr > LR       AIC       BIC |
                              |-------+--------------------------------------------------------|
                              |     1 | -573.0473     0.02      1   0.8895  1150.095   1159.91 |
                              |     2 | -572.0819     1.93      1   0.1647  1150.164  1164.887 |
                              |     3 | -571.9617     0.24      1   0.6239  1151.923  1171.554 |
                              +----------------------------------------------------------------+
                            --
                            Bruce Weaver
                            Email: [email protected]
                            Version: Stata/MP 18.5 (Windows)

                            Comment


                            • #15
                              Hi Bruce,

                              I now see how useful -nestreg- is for my original situation, which involves multiple y variables and multiple x variables.

                              Does this work with -reg-, -logit-, -probit-, -reghdfe-, or any estimation command?

                              Thanks!

                              Originally posted by Bruce Weaver View Post

                              Thanks Daniel. Note too that -nestreg- works with other types of models. Here's an example of using it with -logit-. The default here will give Wald tests comparing one block to the next, but I prefer likelihood ratio tests, so added the lrtable option.


                              Comment
