Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • cross sectional by each year and each industry

    Hi all! I am new to Stata, but I need to run quite a complicated regression. Please help me.

    I have an unbalanced panel dataset of roughly 30,000 companies in over 20 years and each firm is part of an industry. I have 48 industry and create dummy variable that contains numbers from 1 to 48 to explain the industry.

    I want to run regression for each industry in each year. Since I have 48 industry and 20 years, so if I do it manually, I will need to do it roughly (48x20) 960times which is very time consuming. I am looking for help to run it automatically & continuously by using stata.

    The regression model is linear and have 3 independent variables: Y=c+b1x1+b2x2+b3x3+e

    I only need to save the residuals of these models in one series if possible. And i would like to run the regression only in industry-year with at least 20 observations to avoid poor fit.

    Im using Stata 15.

    So, is there any way to run it continuously & automatically? Because Im afraid that I have to run it manually 960 times and save the residuals one by one. I am looking forward to details explanation for it since I am new to Stata. Your help is very much appreciated!

  • #2
    This will do it quickly and easily:

    Code:
    capture program drop one_regression
    program define one_regression
        capture regress y x1 x2 x3
        if c(rc)!= 0 {
            gen int error_code = c(rc)
        }
        else if e(N) >= 20 {
            predict residual, resid
        }
        else {
            gen int error_code = 2001
        }
        exit
    end
    
    runby one_regression, by(industry year) status
    To use this you must install the -runby- program, written by Robert Picard and me, available from SSC.

    Explanation: The program one_regression will perform the regression analysis on a single industry-year combination. If there are at least 20 observations in the estimation sample, the residuals are calculated and stored in a new variable called residual. Otherwise, or if there is some other problem encountered during the regression, then the error code encountered is stored in a new variable called error_code. The error code 2000 refers to no observations at all, and 2001 refers to insufficient observations. Any other error code is an unanticipated problem and if -tab error_code- at the end shows any others, it behooves you to look up those error codes and find out what went wrong with those industry-years.

    Added:

    With a fake data set of about the size you describe, on my setup (Stata 15.1 MP2 on a mid-range computer) this code takes about 10 seconds to run all 960 regressions.

    A couple of comments. You refer to saving the residuals in a "series." Stata data sets don't have series. They have observations and variables. I believe you want a new variable to contain the residuals. That is what the code above well give you.

    You also refer to using only samples of at least 20 observations to avoid "poor fit." Actually the problem is quite the opposite. With small samples you are likely to overfit your data, producing a regression that models the noise as much as any signal, and is therefore not reproducible. By contrast, with a large sample, if the linear model you have proposed is not a correct specification of the real world data generation process, you may well find that the model fits poorly.

    You should not even think about coding 960 regressions manually. Even if you have all the time in the world to do it, the probability of coding all 960 with no errors is close to zero. This sort of repetitive task must always be automated.
    Last edited by Clyde Schechter; 20 Dec 2018, 23:56.

    Comment


    • #3
      Thank you very very much!! You save my day.

      Comment


      • #4
        Here is another way to do it
        Code:
        ssc install asreg
        bys industry year: asreg y x1 x2 x3, fit min(20)
        Regards
        --------------------------------------------------
        Attaullah Shah, PhD.
        Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
        FinTechProfessor.com
        https://asdocx.com
        Check out my asdoc program, which sends outputs to MS Word.
        For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

        Comment


        • #5
          This should work too:

          Code:
          ssc install rangestat
          rangestat (reg) y x1 x2 x3, by(industry) interval(year 0 0)
          gen residual = y - b_x1 * x1 - b_x2 * x2 - b_x3 * x3 if reg_nobs >= 20
          Last edited by Nick Cox; 21 Dec 2018, 04:06.

          Comment


          • #6
            Thank you for your replies! Very grateful to have the forum in here

            Comment


            • #7
              Dear Statalist experts,
              I have issues with the codes provided. Stata will not save the residuals and I receive a completely erased dataset.

              I would like to run the following regression by each industry (sic) code and fiscal year (fyear), clustering the SE by firm (cik).
              Code:
              regress TA deltaRev PPE OI, r cluster(cik)
              Data:

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input float(TA deltaRev PPE OI) double fyear long cik int sic
                 5.938          .   .6464909   .14989895 2004   20 3823
                 3.226  .07233756   .5972169    .1917293 2005   20 3823
                 6.116  .20700587   .4640628    .1777639 2006   20 3823
                 5.727  .29084608    .359753    .2034076 2007   20 3823
                  .885  .20728125  .35117126   .22027737 2008   20 3823
                 10.03 -.25580212    .357317   .17579173 2009   20 3823
                35.485          .  .34852985    .0843642 2004 1750 3720

              This is the code I run:

              Code:
              capture program drop one_regression
              program define one_regression
              capture regress TA deltaRev PPE OI, r cluster(cik)
                  if c(rc)!= 0 {
                      gen int error_code = c(rc)
                  }
                  else if e(N) >= 1 {
                      predict residual, resid
                  }
                  else {
                      gen int error_code = 2001
                  }
                  exit
              end
              
              runby one_regression, by(sic fyear)


              Code:
              --------------------------------------
              Number of by-groups    =         4,925
              by-groups with errors  =         4,925
              by-groups with no data =             0
              Observations processed =       102,421
              Observations saved     =             0
              --------------------------------------
              As you can see, there are 0 saved observations. All dataset is cleared from memory as a result of running the code.


              And for this one
              Code:
              rangestat (reg) TA deltaRev PPE OI, by(sic) interval(fyear 2010 2017)
              gen residual = y - b_x1 * x1 - b_x2 * x2 - b_x3 * x3 
              no result for all obs: reg TA deltaRev PPE OI
              
              varlist required
              r(100);
              Last edited by Ingrid Lambert; 29 Jun 2020, 11:50.

              Comment


              • #8
                Dear Statalist experts,
                I require your help with a loop.

                I would like to run the following regression:

                Code:
                regress LnTA LndeltaRev PPE LnOI, r cluster(cik)
                
                predict AbnormalAccruals, residuals

                by fiscal year (fyear) and 2-digit SIC code (sic), clustering the standard errors at the firm level (cik),
                meaning I want the loop to do something like this, and then store residuals:

                Code:
                regress LnTA LndeltaRev PPE LnOI, r cluster(cik) for sic==10 & year==2010
                regress LnTA LndeltaRev PPE LnOI, r cluster(cik) for sic==11 & year==2010
                ..
                regress LnTA LndeltaRev PPE LnOI, r cluster(cik) for sic==10 & year==2017
                regress LnTA LndeltaRev PPE LnOI, r cluster(cik) for sic==11 & year==2017

                The SIC codes in my dataset have 3-4 numbers, so to achieve "2-digit SIC code", taking only two first digits, I will create a new variable:
                Code:
                tostring sic, gen(ssic) format(%04.0f)
                gen sic1 = substr(ssic, 1, 2)
                encode ssic, gen(sic1)



                I leave a reproducible sample below.

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input float(LnTA LndeltaRev PPE LnOI) long cik int sic str4 ssic long sic1
                 3.6576984 -1.3917106   .3831964 -2.1608682 1750 3720 "3720" 30
                  3.276918 -1.9947414   .3436112 -2.2884407 1750 3720 "3720" 30
                  4.681205 -3.1388006   .3654359 -2.1650372 1750 3720 "3720" 30
                  4.203199          .   .3808138 -2.1508079 1750 3720 "3720" 30
                 3.9740584          .   .4577558  -2.895932 1750 3720 "3720" 30
                end
                label values sic1 sic1
                label def sic1 2 "01", modify
                label def sic1 21 "28", modify
                label def sic1 22 "29", modify
                Last edited by Ingrid Lambert; 25 Aug 2020, 02:49.

                Comment


                • #9
                  Originally posted by Ingrid Lambert View Post
                  Dear Statalist experts,
                  I require your help with a loop.

                  I would like to run the following regression:

                  Code:
                  regress LnTA LndeltaRev PPE LnOI, r cluster(cik)
                  
                  predict AbnormalAccruals, residuals

                  by fiscal year (fyear) and 2-digit SIC code (sic), clustering the standard errors at the firm level (cik),
                  meaning I want the loop to do something like this, and then store residuals:

                  If I use this code, the program does not store any results whatsoever:
                  Code:
                  gen res = .
                  capture program drop myregression
                  program define myregression
                      regress LnTA LndeltaRev PPE LnOI, r cluster(cik)
                  
                      foreach v of varlist LndeltaRev PPE LnOI {
                          gen b_`v' = _b[`v']
                          gen se_`v' = _se[`v']
                  predict ehat1, res
                  replace res = ehat1 if e(sample)   }
                     gen r2 = e(r2)
                     gen n_obs = e(N)
                    
                     exit
                  end
                  
                  runby myregression, by(sic1 fyear)
                  It works if I remove two ehat1 lines from the loop. But I need to obtain and store those residuals.

                  Comment


                  • #10
                    Originally posted by Ingrid Lambert View Post
                    Update: solved

                    Code:
                    capture program drop myregression
                    program define myregression
                        regress LnTA LndeltaRev PPE LnOI, r cluster(cik)
                    
                        foreach v of varlist LndeltaRev PPE LnOI {
                            gen b_`v' = _b[`v']
                            gen se_`v' = _se[`v']
                      }
                       gen r2 = e(r2)
                       gen n_obs = e(N)
                      predict residu  if e(sample), resid
                       exit
                    end
                    
                    runby myregression, by(sic1 fyear)
                    
                    label var residu "Abnormal Accruals"
                    
                    regress residu LnPopul $controls2 i.sic i.fyear, r cluster(cik)

                    Comment


                    • #11
                      Thank you for closing the thread by sharing the solution you found. This will likely help others who encounter a similar problem in the future.

                      Comment


                      • #12
                        Originally posted by Attaullah Shah View Post
                        Here is another way to do it
                        Code:
                        ssc install asreg
                        bys industry year: asreg y x1 x2 x3, fit min(20)
                        Dear Prof. Attaullah Shah,

                        My regression model 1 is: reported operating revenue (y) = expected operating revenue (x) + unexpected revenue (residual). Then, my regression model 2 is: unexpected revenue (y) = x1+x2. It means that I need to use the residual in model 1 and define it as y in model2. When I use "bys industry year: asreg y x, fit" command, I then cannot use "predict unexpected revenue, residual" command. So, may I know how to define the residual in model as the dependent variable y in model 2? I am really looking forward to your reply.

                        Best wishes,
                        Freya

                        Comment


                        • #13
                          Originally posted by Clyde Schechter View Post
                          Thank you for closing the thread by sharing the solution you found. This will likely help others who encounter a similar problem in the future.
                          Dear Prof. Clyde Schechter,

                          I have a panel data with 3,961 observations, and now I need to regress my panel data for cross-sectionally for each industry-year. There are 123 industries and 8 year peirods, and when I use the command:

                          capture program drop one_regression
                          program define one_regression
                          capture regress OR_at_1 AT_1 OR_at_2 MTB_1 AR_1_at_2 AR_at_1
                          if c(rc)!= 0 {
                          gen int error_code = c(rc)
                          }
                          else if e(N) >= 7 {
                          predict residual, resid
                          }
                          else {
                          gen int error_code = 2001
                          }
                          exit
                          end

                          runby one_regression, by(sic fyear) status


                          Then I will get 123*8=984 numbers of beta1, beta2. But I want to output the table like below, namely I need to output the aggregate coefficient (beta1and beta2). Could you please provide me a command which can help me achieve it?

                          Variables OR_at_1
                          AT_1 -0.484***
                          OR_at_2 0.033
                          MTB_1 0.093***
                          AR_1_at_2
                          AR_at_1

                          Best Wishes,
                          Freya

                          Comment


                          • #14
                            I'm sorry, but I don't understand what you are asking. There is no mention of beta1 and beta2 in your post, nor even anywhere in this thread. What are they? Nor do I understand in what sense you want to aggregate the coefficients. Please clarify.

                            Comment


                            • #15
                              @Freya Yu
                              asreg reports the residuals by creating a new variable in the dataset with the name _residuals when you use the option fit. You do not need to use the predict command, nor it is supported by asreg. See this example from the help file of asreg.

                              Code:
                                 webuse grunfeld, clear
                               
                                 bys company: asreg invest mvalue kstock, fit
                              
                              
                              . list company year invest mvalue kstock _residuals in 1/10
                              
                                   +--------------------------------------------------------+
                                   | company   year   invest   mvalue   kstock   _residuals |
                                   |--------------------------------------------------------|
                                1. |       5   1935    39.68    157.7    183.2      23.5255 |
                                2. |       2   1935    209.9   1362.4     53.8    70.008197 |
                                3. |       7   1935    24.43      138    100.2     10.12914 |
                                4. |       4   1935    40.29    417.5     10.5   -2.8379427 |
                                5. |       1   1935    317.6   3078.5      2.8    1.7099041 |
                                   |--------------------------------------------------------|
                                6. |      10   1935     2.54    70.91      4.5   -5.0751803 |
                                7. |       8   1935    12.93    191.5      1.8   -7.0507822 |
                                8. |       6   1935    20.36      197      6.5   -.17514458 |
                                9. |       3   1935     33.1   1170.6     97.8   -87.044934 |
                               10. |       9   1935    26.63    290.6      162   -3.1887568 |
                                   +--------------------------------------------------------+
                              Last edited by Attaullah Shah; 12 Jul 2023, 08:36.
                              Regards
                              --------------------------------------------------
                              Attaullah Shah, PhD.
                              Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
                              FinTechProfessor.com
                              https://asdocx.com
                              Check out my asdoc program, which sends outputs to MS Word.
                              For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

                              Comment

                              Working...
                              X