
  • looping regressions

    Hi all,

    I looked at Clyde's response in an earlier post on looping regressions, but I can't get it to work properly.

    I am not sure where I am going wrong with it.

    Consider a dependent variable -y- and three independent variables -x1, x2, x3-.

    I want to run:
    Code:
    reg y x1
    reg y x1 x2
    reg y x1 x2 x3

    The code below doesn't work, because each regression runs with x1 only, x2 only, or x3 only.
    Code:
    local yvar y
    local xvar x1 x2 x3
    foreach y of local yvar {
        foreach p of local xvar {
            reg `y' `p'
        }
    }
    I got around this by replacing the inner loop with -forvalues- and declaring locals like:

    Code:
    local xv1 x1
    local xv2 x1 x2
    local xv3 x1 x2 x3
    However, what if I have 20 variables and want to run the regressions cumulatively, each adding one variable to the previous specification (i.e., run -reg y x1-, then -reg y x1 x2-, and so on up to x20)?

    Please advise and help!

    Thanks.

  • #2
    Suppose I have one dependent variable and five independent variables:

    Code:
    clear
    set obs 1000
    
    gen y = runiform()
    forv i = 1/5 {
        gen x`i' = runiform()
    }
    Seems like you want a loop like this:

    Code:
    foreach var in x1 x2 x3 x4 x5 {
        local indvars = "`indvars' `var'"
        display "Model: `indvars'"
        reg y `indvars'
    }
    Code:
    Model:  x1
    
          Source |       SS           df       MS      Number of obs   =     1,000
    -------------+----------------------------------   F(1, 998)       =      0.00
           Model |  .000053419         1  .000053419   Prob > F        =    0.9794
        Residual |  79.8543064       998  .080014335   R-squared       =    0.0000
    -------------+----------------------------------   Adj R-squared   =   -0.0010
           Total |  79.8543598       999  .079934294   Root MSE        =    .28287
    
    ------------------------------------------------------------------------------
               y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
              x1 |   .0008324   .0322175     0.03   0.979    -.0623894    .0640543
           _cons |   .5131572   .0181225    28.32   0.000     .4775947    .5487197
    ------------------------------------------------------------------------------
    Model:  x1 x2
    
          Source |       SS           df       MS      Number of obs   =     1,000
    -------------+----------------------------------   F(2, 997)       =      1.10
           Model |  .175361123         2  .087680561   Prob > F        =    0.3342
        Residual |  79.6789987       997  .079918755   R-squared       =    0.0022
    -------------+----------------------------------   Adj R-squared   =    0.0002
           Total |  79.8543598       999  .079934294   Root MSE        =     .2827
    
    ------------------------------------------------------------------------------
               y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
              x1 |  -.0003049   .0322074    -0.01   0.992     -.063507    .0628972
              x2 |  -.0464566   .0313669    -1.48   0.139    -.1080094    .0150961
           _cons |   .5375307   .0244715    21.97   0.000     .4895092    .5855522
    ------------------------------------------------------------------------------
    Model:  x1 x2 x3
    
          Source |       SS           df       MS      Number of obs   =     1,000
    -------------+----------------------------------   F(3, 996)       =      0.74
           Model |  .176702018         3  .058900673   Prob > F        =    0.5305
        Residual |  79.6776578       996  .079997648   R-squared       =    0.0022
    -------------+----------------------------------   Adj R-squared   =   -0.0008
           Total |  79.8543598       999  .079934294   Root MSE        =    .28284
    
    ------------------------------------------------------------------------------
               y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
              x1 |  -.0003865   .0322295    -0.01   0.990     -.063632     .062859
              x2 |  -.0463146   .0314015    -1.47   0.141    -.1079354    .0153061
              x3 |    .003944   .0304637     0.13   0.897    -.0558363    .0637244
           _cons |   .5354937   .0291031    18.40   0.000     .4783832    .5926042
    ------------------------------------------------------------------------------
    Model:  x1 x2 x3 x4
    
          Source |       SS           df       MS      Number of obs   =     1,000
    -------------+----------------------------------   F(4, 995)       =      0.56
           Model |  .179383054         4  .044845763   Prob > F        =    0.6917
        Residual |  79.6749768       995  .080075354   R-squared       =    0.0022
    -------------+----------------------------------   Adj R-squared   =   -0.0018
           Total |  79.8543598       999  .079934294   Root MSE        =    .28298
    
    ------------------------------------------------------------------------------
               y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
              x1 |  -.0005174   .0322531    -0.02   0.987    -.0638093    .0627744
              x2 |  -.0462129   .0314217    -1.47   0.142    -.1078733    .0154475
              x3 |   .0039301   .0304786     0.13   0.897    -.0558795    .0637397
              x4 |  -.0057144   .0312297    -0.18   0.855    -.0669981    .0555693
           _cons |   .5384561   .0333157    16.16   0.000     .4730791    .6038332
    ------------------------------------------------------------------------------
    Model:  x1 x2 x3 x4 x5
    
          Source |       SS           df       MS      Number of obs   =     1,000
    -------------+----------------------------------   F(5, 994)       =      0.70
           Model |  .278965475         5  .055793095   Prob > F        =    0.6258
        Residual |  79.5753944       994  .080055729   R-squared       =    0.0035
    -------------+----------------------------------   Adj R-squared   =   -0.0015
           Total |  79.8543598       999  .079934294   Root MSE        =    .28294
    
    ------------------------------------------------------------------------------
               y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
              x1 |  -.0009052    .032251    -0.03   0.978     -.064193    .0623826
              x2 |  -.0462817   .0314179    -1.47   0.141    -.1079348    .0153713
              x3 |   .0047963   .0304847     0.16   0.875    -.0550255    .0646181
              x4 |  -.0057489   .0312259    -0.18   0.854    -.0670252    .0555274
              x5 |   .0345345   .0309641     1.12   0.265     -.026228    .0952971
           _cons |   .5211928   .0367321    14.19   0.000     .4491114    .5932742
    ------------------------------------------------------------------------------

    Comment


    • #3
      Daniel, this is very helpful. What if I had two dependent variables -y1, y2-, and I wanted to run the cumulative regressions from x1 to x3 for both?

      Would I need a nested loop? I am failing to follow the code suggested by you and Clyde for the two-dependent-variable case.

      Originally posted by Daniel Schaefer View Post
      Suppose I have one dependent variable and five independent variables: [...] Seems like you want a loop like this: [...]

      Comment


      • #4
        For exactly two dependent variables I would probably just use the same loop twice.

        Code:
        foreach var in x1 x2 x3 x4 x5 {
            local indvars = "`indvars' `var'"
            display "Model: `indvars'"
            reg y1 `indvars'
        }
        
        foreach var in x1 x2 x3 x4 x5 {
            local indvars = "`indvars' `var'"
            display "Model: `indvars'"
            reg y2 `indvars'
        }
        For an arbitrary number of dependent variables, you'd want a nested loop.

        Code:
        foreach dep in y1 y2{
            local indvars = ""
            foreach var in x1 x2 x3 x4 x5 {
                local indvars = "`indvars' `var'"
                display "Model: `dep' `indvars'"
                reg `dep' `indvars'
            }
        }

        Comment


        • #5
          My first thought was to use -nestreg- rather than modifying a local macro on each loop. Here is how it would work for the scenario you described in #3.

          Code:
          clear
          set obs 1000
          gen y1 = runiform()
          gen y2 = runiform()
          forv i = 1/3 {
              gen x`i' = runiform()
          }
          
          forvalues i = 1/2 {
            nestreg: regress y`i' (x1) (x2) (x3)    
          }
          Output:
          Code:
          . forvalues i = 1/2 {
            2.   nestreg: regress y`i' (x1) (x2) (x3)  
            3. }
          
          Block 1: x1
          
                Source |       SS           df       MS      Number of obs   =     1,000
          -------------+----------------------------------   F(1, 998)       =      1.05
                 Model |  .086985453         1  .086985453   Prob > F        =    0.3047
              Residual |   82.323753       998   .08248873   R-squared       =    0.0011
          -------------+----------------------------------   Adj R-squared   =    0.0001
                 Total |  82.4107384       999  .082493232   Root MSE        =    .28721
          
          ------------------------------------------------------------------------------
                    y1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                    x1 |  -.0328288    .031969    -1.03   0.305    -.0955631    .0299054
                 _cons |   .5091189    .018474    27.56   0.000     .4728665    .5453712
          ------------------------------------------------------------------------------
          
          Block 2: x2
          
                Source |       SS           df       MS      Number of obs   =     1,000
          -------------+----------------------------------   F(2, 997)       =      0.53
                 Model |  .087046704         2  .043523352   Prob > F        =    0.5905
              Residual |  82.3236917       997  .082571406   R-squared       =    0.0011
          -------------+----------------------------------   Adj R-squared   =   -0.0009
                 Total |  82.4107384       999  .082493232   Root MSE        =    .28735
          
          ------------------------------------------------------------------------------
                    y1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                    x1 |  -.0327839   .0320276    -1.02   0.306    -.0956332    .0300654
                    x2 |  -.0008527   .0313063    -0.03   0.978    -.0622864    .0605811
                 _cons |   .5095165   .0235528    21.63   0.000     .4632977    .5557352
          ------------------------------------------------------------------------------
          
          Block 3: x3
          
                Source |       SS           df       MS      Number of obs   =     1,000
          -------------+----------------------------------   F(3, 996)       =      1.40
                 Model |  .345565665         3  .115188555   Prob > F        =    0.2419
              Residual |  82.0651728       996  .082394752   R-squared       =    0.0042
          -------------+----------------------------------   Adj R-squared   =    0.0012
                 Total |  82.4107384       999  .082493232   Root MSE        =    .28704
          
          ------------------------------------------------------------------------------
                    y1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                    x1 |   -.034607   .0320099    -1.08   0.280    -.0974216    .0282076
                    x2 |  -.0005217   .0312733    -0.02   0.987    -.0618909    .0608475
                    x3 |  -.0560831   .0316618    -1.77   0.077    -.1182146    .0060484
                 _cons |   .5389422   .0288013    18.71   0.000     .4824239    .5954605
          ------------------------------------------------------------------------------
          
          
            +-------------------------------------------------------------+
            |       |          Block  Residual                     Change |
            | Block |       F     df        df   Pr > F       R2    in R2 |
            |-------+-----------------------------------------------------|
            |     1 |    1.05      1       998   0.3047   0.0011          |
            |     2 |    0.00      1       997   0.9783   0.0011   0.0000 |
            |     3 |    3.14      1       996   0.0768   0.0042   0.0031 |
            +-------------------------------------------------------------+
          
          Block 1: x1
          
                Source |       SS           df       MS      Number of obs   =     1,000
          -------------+----------------------------------   F(1, 998)       =      0.22
                 Model |  .019210197         1  .019210197   Prob > F        =    0.6384
              Residual |  86.7356242       998  .086909443   R-squared       =    0.0002
          -------------+----------------------------------   Adj R-squared   =   -0.0008
                 Total |  86.7548344       999  .086841676   Root MSE        =     .2948
          
          ------------------------------------------------------------------------------
                    y2 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                    x1 |  -.0154276   .0328145    -0.47   0.638    -.0798209    .0489657
                 _cons |   .5101567   .0189626    26.90   0.000     .4729456    .5473677
          ------------------------------------------------------------------------------
          
          Block 2: x2
          
                Source |       SS           df       MS      Number of obs   =     1,000
          -------------+----------------------------------   F(2, 997)       =      1.33
                 Model |  .231617627         2  .115808814   Prob > F        =    0.2638
              Residual |  86.5232168       997  .086783567   R-squared       =    0.0027
          -------------+----------------------------------   Adj R-squared   =    0.0007
                 Total |  86.7548344       999  .086841676   Root MSE        =    .29459
          
          ------------------------------------------------------------------------------
                    y2 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                    x1 |  -.0180751   .0328344    -0.55   0.582    -.0825075    .0463573
                    x2 |   .0502113   .0320949     1.56   0.118    -.0127699    .1131926
                 _cons |   .4867433   .0241461    20.16   0.000     .4393603    .5341262
          ------------------------------------------------------------------------------
          
          Block 3: x3
          
                Source |       SS           df       MS      Number of obs   =     1,000
          -------------+----------------------------------   F(3, 996)       =      1.27
                 Model |  .329407656         3  .109802552   Prob > F        =    0.2849
              Residual |  86.4254268       996  .086772517   R-squared       =    0.0038
          -------------+----------------------------------   Adj R-squared   =    0.0008
                 Total |  86.7548344       999  .086841676   Root MSE        =    .29457
          
          ------------------------------------------------------------------------------
                    y2 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                    x1 |  -.0191964   .0328493    -0.58   0.559    -.0836581    .0452653
                    x2 |   .0504149   .0320934     1.57   0.117    -.0125636    .1133933
                    x3 |  -.0344932    .032492    -1.06   0.289    -.0982539    .0292675
                 _cons |   .5048412   .0295566    17.08   0.000     .4468409    .5628415
          ------------------------------------------------------------------------------
          
          
            +-------------------------------------------------------------+
            |       |          Block  Residual                     Change |
            | Block |       F     df        df   Pr > F       R2    in R2 |
            |-------+-----------------------------------------------------|
            |     1 |    0.22      1       998   0.6384   0.0002          |
            |     2 |    2.45      1       997   0.1180   0.0027   0.0024 |
            |     3 |    1.13      1       996   0.2887   0.0038   0.0011 |
            +-------------------------------------------------------------+
          --
          Bruce Weaver
          Email: [email protected]
          Version: Stata/MP 18.5 (Windows)

          Comment


          • #6
            Code:
            foreach y in y1 y2 {
                local indvars

                foreach var in x1 x2 x3 x4 x5 {
                    local indvars `indvars' `var'
                    display "Model: `y' `indvars'"
                    reg `y' `indvars'
                }
            }

            Comment


            • #7
              Many thanks to Bruce, Daniel, and Nick.

              Nick -- if I declared the local dependent and independent variable lists before the loop, would this code work?

              Code:
              local yvar y1 y2
              local xvar x1 x2 x3
              foreach y in local yvar {
                  local indvars
                  foreach var in local xvar {
                      local indvars `indvars' `var'
                      display "Model: `y' `indvars'"
                      reg `y' `indvars'
                  }
              }
              Originally posted by Nick Cox View Post
              Code:
              foreach y in y1 y2 {
                  local indvars

                  foreach var in x1 x2 x3 x4 x5 {
                      local indvars `indvars' `var'
                      display "Model: `y' `indvars'"
                      reg `y' `indvars'
                  }
              }

              Comment


              • #8
                A problem with #4 is that the local macro indvars needs to be cleared before first use. This is done in the second block of code but not the first.

                 #7 No. That won't work. -in local yvar- is legal until you get inside the loop, when you find that Stata doesn't understand you. Read the help for -foreach- to see why. The keywords -in- and -of- have utterly different meanings.
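                 For reference, a minimal sketch of the code in #7 rewritten with the -of local- syntax (the local names yvar and xvar are taken from #7; this is an illustration, not tested on your data):

                 Code:
                 local yvar y1 y2
                 local xvar x1 x2 x3

                 foreach y of local yvar {
                     local indvars
                     foreach var of local xvar {
                         local indvars `indvars' `var'
                         display "Model: `y' `indvars'"
                         reg `y' `indvars'
                     }
                 }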

                Comment


                • #9
                  Hi Nick, thanks for the response!

                   I will read the help for -foreach-.

                   But a quick question about your code:

                   1. What does -local indvars- in the first loop do?
                   2. What does -local indvars `indvars' `var'- do in the second loop?

                   I want to understand at which step the local macro indvars is cleared.

                  Thanks!

                  Originally posted by Nick Cox View Post
                  A problem with #4 is that the local macro indvars needs to be cleared before first use. This is done in the second block of code but not the first.

                  #7 No. That won't work. in local yvar is legal until you get inside the loop when you find that Stata doesn't understand you. Read the help for foreach to see why. The keywords in and of have utterly different meaning.

                  Comment


                  • #10
                    The questions may be quick, but the answers won't be.

                    #8 already addresses your first question.

                    I've written various tutorial reviews in this territory that discuss the main ideas in some detail.

                    https://journals.sagepub.com/doi/pdf...36867X20976340

                    https://journals.sagepub.com/doi/pdf...6867X211063415



                    Comment


                    • #11
                      #5 is a great answer. I was not aware of -nestreg-, but it looks well suited to this problem. Seems worth it just for the cross-model comparisons at the end. I can imagine looping doesn't scale well once you have many models that all have to be interpreted. Those summary statistics at the end seem really useful for this kind of thing.

                      Nick, thanks for the more idiomatic code in #6 and for pointing out the issue with the first block in #4. Good catch, as usual.

                      Comment


                      • #12
                        Hi Nick, the first article really helped me to understand the local and loop structure in Stata better.

                        So, quick questions with my own answers after reading your co-authored first article: can you please correct me if there is a misunderstanding?

                        1. What does -local indvars- in the first loop do?
                        2. What does -local indvars `indvars' `var'- do in the second loop?

                        1) you are declaring a local variable called "indvars" so that this is "live" in the inner loop. So, after each inner loop's completion, you are clearing this local indvars, because the final indvars from the last step of the inner loop will look like:

                        -`x1 x2 x3'-

                        2) The second line -local indvars `indvars' `var'- you are adding a list of string (independent variables) for the loop local variable.

                        Am I understanding this correctly?

                        Originally posted by Nick Cox View Post
                        The questions may be quick, but the answers won't be.

                        #8 already addresses your first question.

                        I've written various tutorial reviews in this territory that discuss the main ideas in some detail.

                        https://journals.sagepub.com/doi/pdf...36867X20976340

                        https://journals.sagepub.com/doi/pdf...6867X211063415


                        Comment


                        • #13
                          More or less. It's more subtle in either case.


                          Code:
                          local indvars
                          can be interpreted in two ways:

                          1. Setting the local to an empty string.

                          2. Abolishing the local macro altogether.

                          Stata doesn't really distinguish between the two cases. In the rest of life, if I have a box or bag with nothing in it, it's still a box or bag, which just happens to be empty. But to Stata an empty local macro doesn't exist....
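                          A minimal sketch of that point (the |...| bars just make the empty substitution visible):

                          Code:
                          local indvars            // clears the macro; to Stata it now doesn't exist
                          display `"|`indvars'|"'  // the reference substitutes an empty string: displays ||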

                          If I go

                          Code:
                          local beasts `frog' toad
                          and the local macro frog doesn't exist, that's perfectly legal. Stata just substitutes an empty string or, otherwise put, ignores the reference.


                          Code:
                          local indvars `indvars' `var'
                          is just adding one item at a time -- whatever is in local macro var -- to what is already in local macro indvars. (As said, that might be nothing, if the latter does not yet exist.)

                          The wording

                          adding a list of string (independent variables) for the loop local variable
                          doesn't capture this exactly. It does capture what happens as this is done repeatedly in a loop.
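                          A short trace makes the repeated accumulation concrete (again, the |...| bars just delimit the macro's contents on each pass):

                          Code:
                          local indvars
                          foreach var in x1 x2 x3 {
                              local indvars `indvars' `var'
                              display `"|`indvars'|"'
                          }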

                          Also, Stata developers and programmers (should) never use the term local variable. In Stata a variable is (in other terms) a column or field in a dataset, and nothing else. (In Mata, matters are different.) Macros in Stata are not properly called variables. Naturally, they resemble in many ways variables in other languages, and people with more knowledge and experience of other languages are tempted to call them variables, but that's private or personal temptation, not correct as a matter of Stata usage. Call this pedantic if you like, but pedants who pay attention to detail get to write good Stata code more easily and effectively, and the others will get into messes more often.

                          More at https://www.stata.com/statalist/arch.../msg01258.html

                          Comment


                          • #14
                            Originally posted by Daniel Schaefer View Post
                            #5 is a great answer. I was not aware of -nestreg-, but it looks well suited to this problem. Seems worth it just for the cross-model comparisons at the end. I can imagine looping doesn't scale well once you have many models that all have to be interpreted. Those summary statistics at the end seem really useful for this kind of thing.
                            Thanks Daniel. Note too that -nestreg- works with other types of models. Here's an example of using it with -logit-. The default here will give Wald tests comparing one block to the next, but I prefer likelihood ratio tests, so I added the lrtable option.

                            Code:
                            * Example of -nestreg- with -logit-
                            * First, generate two dichotomous outcomes to illustrate
                            generate byte event1 = y1 > .75
                            generate byte event2 = y2 < .25
                            foreach event in event1 event2 {
                              nestreg, lrtable: logit `event' (x1) (x2) (x3)    
                            }
                            Output:
                            Code:
                            . foreach event in event1 event2 {
                              2.   nestreg, lrtable: logit `event' (x1) (x2) (x3)        
                              3. }
                            
                            Block 1: x1
                            
                            Iteration 0:  Log likelihood = -551.07993  
                            Iteration 1:  Log likelihood = -551.03233  
                            Iteration 2:  Log likelihood = -551.03233  
                            
                            Logistic regression                                     Number of obs =  1,000
                                                                                    LR chi2(1)    =   0.10
                                                                                    Prob > chi2   = 0.7577
                            Log likelihood = -551.03233                             Pseudo R2     = 0.0001
                            
                            ------------------------------------------------------------------------------
                                  event1 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                      x1 |  -.0804124   .2606281    -0.31   0.758    -.5912341    .4304092
                                   _cons |  -1.112351   .1498459    -7.42   0.000    -1.406043   -.8186581
                            ------------------------------------------------------------------------------
                            
                            Block 2: x2
                            
                            Iteration 0:  Log likelihood = -551.07993  
                            Iteration 1:  Log likelihood = -550.42783  
                            Iteration 2:  Log likelihood = -550.42752  
                            Iteration 3:  Log likelihood = -550.42752  
                            
                            Logistic regression                                     Number of obs =  1,000
                                                                                    LR chi2(2)    =   1.30
                                                                                    Prob > chi2   = 0.5208
                            Log likelihood = -550.42752                             Pseudo R2     = 0.0012
                            
                            ------------------------------------------------------------------------------
                                  event1 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                      x1 |  -.0953021   .2611743    -0.36   0.715    -.6071942    .4165901
                                      x2 |   .2806637   .2553362     1.10   0.272    -.2197861    .7811136
                                   _cons |  -1.244902   .1933907    -6.44   0.000    -1.623941   -.8658632
                            ------------------------------------------------------------------------------
                            
                            Block 3: x3
                            
                            Iteration 0:  Log likelihood = -551.07993  
                            Iteration 1:  Log likelihood = -550.39182  
                            Iteration 2:  Log likelihood = -550.39146  
                            Iteration 3:  Log likelihood = -550.39146  
                            
                            Logistic regression                                     Number of obs =  1,000
                                                                                    LR chi2(3)    =   1.38
                                                                                    Prob > chi2   = 0.7110
                            Log likelihood = -550.39146                             Pseudo R2     = 0.0012
                            
                            ------------------------------------------------------------------------------
                                  event1 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                      x1 |  -.0976551   .2613331    -0.37   0.709    -.6098585    .4145482
                                      x2 |    .281291   .2554339     1.10   0.271    -.2193503    .7819323
                                      x3 |  -.0694632   .2586687    -0.27   0.788    -.5764446    .4375181
                                   _cons |  -1.208622   .2356945    -5.13   0.000    -1.670574   -.7466688
                            ------------------------------------------------------------------------------
                            
                            
                              +----------------------------------------------------------------+
                              | Block |        LL       LR     df  Pr > LR       AIC       BIC |
                              |-------+--------------------------------------------------------|
                              |     1 | -551.0323     0.10      1   0.7577  1106.065   1115.88 |
                              |     2 | -550.4275     1.21      1   0.2714  1106.855  1121.578 |
                              |     3 | -550.3915     0.07      1   0.7883  1108.783  1128.414 |
                              +----------------------------------------------------------------+
                            
                            Block 1: x1
                            
                            Iteration 0:  Log likelihood = -573.05692  
                            Iteration 1:  Log likelihood = -573.04726  
                            Iteration 2:  Log likelihood = -573.04726  
                            
                            Logistic regression                                     Number of obs =  1,000
                                                                                    LR chi2(1)    =   0.02
                                                                                    Prob > chi2   = 0.8895
                            Log likelihood = -573.04726                             Pseudo R2     = 0.0000
                            
                            ------------------------------------------------------------------------------
                                  event2 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                      x1 |  -.0352701   .2537572    -0.14   0.889     -.532625    .4620849
                                   _cons |  -1.028244   .1463384    -7.03   0.000    -1.315062   -.7414262
                            ------------------------------------------------------------------------------
                            
                            Block 2: x2
                            
                            Iteration 0:  Log likelihood = -573.05692  
                            Iteration 1:  Log likelihood =  -572.0825  
                            Iteration 2:  Log likelihood = -572.08193  
                            Iteration 3:  Log likelihood = -572.08193  
                            
                            Logistic regression                                     Number of obs =  1,000
                                                                                    LR chi2(2)    =   1.95
                                                                                    Prob > chi2   = 0.3772
                            Log likelihood = -572.08193                             Pseudo R2     = 0.0017
                            
                            ------------------------------------------------------------------------------
                                  event2 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                      x1 |  -.0170504    .254303    -0.07   0.947    -.5154751    .4813743
                                      x2 |  -.3457227   .2491851    -1.39   0.165    -.8341166    .1426711
                                   _cons |  -.8694459   .1848183    -4.70   0.000    -1.231683   -.5072087
                            ------------------------------------------------------------------------------
                            
                            Block 3: x3
                            
                            Iteration 0:  Log likelihood = -573.05692  
                            Iteration 1:  Log likelihood = -571.96243  
                            Iteration 2:  Log likelihood = -571.96171  
                            Iteration 3:  Log likelihood = -571.96171  
                            
                            Logistic regression                                     Number of obs =  1,000
                                                                                    LR chi2(3)    =   2.19
                                                                                    Prob > chi2   = 0.5338
                            Log likelihood = -571.96171                             Pseudo R2     = 0.0019
                            
                            ------------------------------------------------------------------------------
                                  event2 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                      x1 |   -.013244   .2544481    -0.05   0.958    -.5119531    .4854652
                                      x2 |  -.3461677   .2490922    -1.39   0.165    -.8343794     .142044
                                      x3 |   .1234019   .2517322     0.49   0.624    -.3699841    .6167879
                                   _cons |  -.9345228   .2278585    -4.10   0.000    -1.381117   -.4879284
                            ------------------------------------------------------------------------------
                            
                            
                              +----------------------------------------------------------------+
                              | Block |        LL       LR     df  Pr > LR       AIC       BIC |
                              |-------+--------------------------------------------------------|
                              |     1 | -573.0473     0.02      1   0.8895  1150.095   1159.91 |
                              |     2 | -572.0819     1.93      1   0.1647  1150.164  1164.887 |
                              |     3 | -571.9617     0.24      1   0.6239  1151.923  1171.554 |
                              +----------------------------------------------------------------+
                            --
                            Bruce Weaver
                            Email: [email protected]
                            Version: Stata/MP 18.5 (Windows)

                            Comment


                            • #15
                              Hi Bruce,

                              I now see how useful -nestreg- is for my original situation, which involves multiple y variables and multiple x variables.

                              Does this work with -reg-, -logit-, -probit-, -reghdfe-, or any estimation command?

                              Thanks!

                              Originally posted by Bruce Weaver View Post

                              Thanks Daniel. Note too that -nestreg- works with other types of models. Here's an example of using it with -logit-. The default here will give Wald tests comparing one block to the next, but I prefer likelihood ratio tests, so added the lrtable option.


                              Comment
