  • Cross-sectional regression for each ind and year

    Dear All,

    I need to estimate a model using a cross-sectional regression for each industry and year over the period 2009 to 2014. The data set is a panel.
    I used the following code:

    bysort year industry: gen nobs = _N
    forval y = 2009(1)2014 {
    forval i = 1(1)50 {
    di "year = `y' and industry = `i'"
    reg part1 part2 part3 part4 part5 if industry== `i' & year==`y' & nobs>10, noconstant
    predict r if industry== `i' & year==`y' & nobs>10, resid
    }
    }

    However, every time I run this regression I get an error (although I have few missing observations):
    no observations
    r(2000);

    I would really appreciate any advice regarding the code.
    Many thanks,
    Nour

  • #2
    Even if you don't have a lot of missing data, since an observation is omitted from a regression whenever any of its variables has a missing value, you can depopulate the estimation sample of a regression very quickly with missing values that are scattered over multiple observations. One approach is to -capture- the regression, verify that any error code generated came from the no observations (or insufficient observations) problem and move on.

    There is another problem with the code you show. The variable r is created on the first iteration of the loops in the -predict- command. On the next iteration, however, since r already exists, Stata will halt with an error message.
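
    For reference, a minimal sketch of what patching the original loop along these two lines might look like. It is untested on real data; the variable name resid_all is a hypothetical choice, and the year and industry ranges are taken from the original post.

    ```stata
    * Collect residuals in one variable created once, before the loops
    generate double resid_all = .          // hypothetical name
    forvalues y = 2009/2014 {
        forvalues i = 1/50 {
            capture regress part1 part2 part3 part4 part5 ///
                if industry == `i' & year == `y', noconstant
            if _rc == 0 {
                * Regression ran: predict into a fresh temporary variable each time
                tempvar e
                predict double `e' if e(sample), residuals
                quietly replace resid_all = `e' if e(sample)
                drop `e'
            }
            else if !inlist(_rc, 2000, 2001) {
                * Anything other than no/insufficient observations is unexpected
                error _rc
            }
        }
    }
    ```

    Cells that fail with r(2000) or r(2001) are simply skipped, so resid_all stays missing there.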

    All in all, rather than fixing these problems, it is easier to take a different approach, based on the -runby- command. It was written by Robert Picard and me and is available from SSC.

    Code:
    capture program drop one_regression
    program define one_regression
        if _N > 10 {
            capture noisily reg part1 part2 part3 part4 part5, noconstant
            if c(rc) == 0 { // REGRESSION WENT OK
                predict r, resid
            }
            else if inlist(c(rc), 2000, 2001) { // NO OR INSUFFICIENT OBSERVATIONS
                gen r = .
            }
            else { // THERE WAS AN UNEXPECTED PROBLEM
                gen comment = "Unexpected error `c(rc)'"
            }
        }
        exit
    end
    
    runby one_regression, by(year industry) status
    This code will create your variable r whenever there are more than ten observations for a year and industry and the missing values are such that the regression can still run (which means at least 5 observations with no missing values). If there are 10 or fewer observations for the year and industry in the first place, no regression is attempted. If the regression is attempted but fails for insufficient, or no, observations, r is just set to missing and the program moves on to the next cross-section. If some other error occurs in the regression command, the program creates a variable called comment in the results data set that gives the error code encountered by the regression. (If no unexpected errors are encountered, there will be no variable named comment in the results data set.)
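
    After -runby- finishes, a quick check along the following lines may help confirm what happened. This is only a sketch, and it assumes the program above ran and left the variables r and (possibly) comment in memory.

    ```stata
    * Did any cross-section hit an unexpected error?
    capture confirm variable comment
    if _rc == 0 {
        tabulate comment                // one row per distinct error code
    }
    else {
        display "no unexpected errors were recorded"
    }

    * How many observations ended up without a value of r?
    count if missing(r)
    ```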

    Note: As you did not provide any example data, this code is untested and may contain typos or other errors.

    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.



    • #3
      I believe your problem is that you expect that
      Code:
      reg part1 part2 part3 part4 part5 if industry== `i' & year==`y' & nobs>10, noconstant
      will skip over the regression if nobs<=10. But that's not true. What your syntax says is to select observations for which
      Code:
      industry== `i' & year==`y' & nobs>10
      for input to the regression. For any combination of industry and year with 10 or fewer observations, nobs will be 10 or less, nobs>10 will be false, the if condition will be false, so precisely zero observations will be selected for the regression, which then reports "no observations".
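
      The same behavior can be reproduced on the auto data that ships with Stata. This is a small illustration only; the threshold of 30 and the regressors are arbitrary choices, not part of the original problem.

      ```stata
      * Illustration on the auto data shipped with Stata (not the poster's data)
      sysuse auto, clear
      bysort foreign: generate nobs = _N      // 52 domestic cars, 22 foreign cars

      * The if qualifier selects observations; it does not skip the command.
      * No foreign car has nobs > 30, so the estimation sample is empty.
      capture noisily regress price mpg weight if foreign == 1 & nobs > 30
      display "return code: " _rc

      * Guarding the command itself is what actually skips the regression
      quietly count if foreign == 1
      if r(N) > 30 {
          regress price mpg weight if foreign == 1
      }
      else {
          display "skipped: only " r(N) " observations"
      }
      ```

      The first regression stops with "no observations", while the guarded version never calls -regress- for the small group.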

      With that in mind, my suggested rewrite of your code (before I saw Clyde's, which is more general) was
      Code:
      generate r = .
      forval y = 2009(1)2014 {
      forval i = 1(1)50 {
          count if industry== `i' & year==`y'
          local nobs = r(N)
          di "year = `y' and industry = `i' - `nobs' observations"
          if `nobs'>10 {
              reg part1 part2 part3 part4 part5 if industry== `i' & year==`y', noconstant
              predict temp if industry== `i' & year==`y', resid
              replace r = temp if industry== `i' & year==`y'
              drop temp
          }
      }
      }



      • #4
        Besides the useful suggestions by Clyde and William, I shall present one additional alternative. This is one of the circumstances where asreg (which can be downloaded from SSC) can be extremely useful, as most of the options needed in these cross-sectional regressions are built into asreg. Also, if the dataset is huge, asreg can do the calculations really fast. In the following example, I generate 10000 firm ids, 100 years of data, and 200 industries. I then create a dummy dataset of dependent and independent variables.

        Code:
        *Install asreg
        ssc install asreg
        
                         *Create dummy data
        *-------------------------------------------------------
        clear
        set obs 10000
        gen id = _n
        expand 100
        bys id : gen year = _n + 2000
        
        * Assign firms to 200 industries
        gen industry = mod(id, 200)
        
        * generate the independent variables
        gen X1 = uniform()
        gen X2 = uniform()
        gen X3 = uniform()
        gen X4 = uniform()
        gen X5 = uniform()
        
        * generate the dependent variable
        gen Y = X1+X2+X3+X4+X5+uniform()
        
        *-------------------- End of data creation --------------------
        
        * Estimate the cross-sectional regressions by year and industry without constants
        * and requiring a minimum number of 10 observations
        
        bys year industry: asreg Y X1 X2 X3 X4 X5, noconstant min(10) fit
        
        
        * The output
        sort id year
        
        . list _* in 1/10
        
             +--------------------------------------------------------------------------------------------------------------------+
             | _Nobs         _R2      _adjR2       _b_X1       _b_X2       _b_X3       _b_X4       _b_X5     _fitted   _residuals |
             |--------------------------------------------------------------------------------------------------------------------|
          1. |    50   .99414453   .99349392    1.117555   1.0120311   1.1659788   1.2520295   1.4295816   3.0761385    .13705089 |
          2. |    50   .99133815   .99037572   .99674607   1.1379237   1.0330171   1.3135693   1.4085448   2.7777534    .15558878 |
          3. |    50   .99080417   .98978241   1.1132044   1.0794346   1.2143631   1.2363202   1.2605224   3.5254449   -.27433341 |
          4. |    50   .99186627   .99096252   1.0131388   1.3196789   1.3350055   1.3306511   .97870559    3.352102   -.32389564 |
          5. |    50   .99215726   .99128584   .79051572   1.2737956   1.4860206   1.3597887   1.1724052   2.7097915    .17741647 |
             |--------------------------------------------------------------------------------------------------------------------|
          6. |    50    .9925616   .99173511    1.310976   .93526185   1.4995484   1.1305786   1.1415273   3.4689234    .00873105 |
          7. |    50   .99226795   .99140884   1.0690777   1.0773158   1.3332567   1.1422246   1.4142997   2.3266319    .46817844 |
          8. |    50   .99446031   .99384479   1.1892227   1.3108992   1.1456367     1.02304   1.3413501   2.3013637   -.23703334 |
          9. |    50   .99198082    .9910898   1.3537537    .9732253   1.1805684    1.282457   1.0703961   1.9032206    -.1425427 |
         10. |    50   .99230707    .9914523   .97337861   1.3049502   1.1527862   1.1849222   1.3373371   3.3733523    .06372241 |
             +--------------------------------------------------------------------------------------------------------------------+
        On Stata 15.1, SE, the calculations took 2 seconds.

        Please note: The fit option generates two variables. The first variable, _residuals, is equivalent to what predict with the residuals option produces after OLS. The second variable, _fitted, holds the fitted values.
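
        One way to check this equivalence for a single cross-section is sketched below. It assumes the dummy data and the asreg call above have already been run; check_res and gap are hypothetical variable names, and the chosen cell (year 2001, industry 1) is arbitrary.

        ```stata
        * Re-estimate one industry-year cell with regress and compare residuals
        regress Y X1 X2 X3 X4 X5 if year == 2001 & industry == 1, noconstant
        predict double check_res if e(sample), residuals   // hypothetical name
        generate double gap = abs(_residuals - check_res) if e(sample)
        summarize gap       // differences should be negligible if the two agree
        ```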

        Regards
        --------------------------------------------------
        Attaullah Shah, PhD.
        Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
        FinTechProfessor.com
        https://asdocx.com
        Check out my asdoc program, which sends outputs to MS Word.
        For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.



        • #5
          Here's yet another way to do it. We do more work before looping in order to simplify the loop. Not tested.

          Code:
          generate r = .
          bysort year industry : gen N = sum(!missing(industry, year, part1, part2, part3, part4, part5)) 
          by year industry : gen OK = N[_N] > 10 
          egen g = group(year industry) if OK 
          su g, meanonly 
          
          forval i = 1/`r(max)' {
              reg part1 part2 part3 part4 part5 if g == `i', noconstant
              predict temp if g == `i', resid
              replace r = temp if g == `i' 
              drop temp
          }

