Cross-sectional regression for each ind and year

Nour Kh

Join Date: Dec 2018

Posts: 4
#1

16 Dec 2018, 15:54

Dear All,

I need to estimate a model using a cross-sectional regression for each industry and year over the period 2009 to 2014. The data are panel data.
I used the following code:

bysort year industry: gen nobs = _N
forval y = 2009(1)2014 {
forval i = 1(1)50 {
di "year = `y' and industry = `i'"
reg part1 part2 part3 part4 part5 if industry== `i' & year==`y' & nobs>10, noconstant
predict r if industry== `i' & year==`y' & nobs>10, resid
}
}

However, every time I run this regression I get an error (although I have few missing observations):
no observations
r(2000);

I would really appreciate any advice regarding the code.
Many thanks,
Nour
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

16 Dec 2018, 18:39

Even if you don't have a lot of missing data, since an observation is omitted from a regression whenever any of its variables has a missing value, you can depopulate the estimation sample of a regression very quickly with missing values that are scattered over multiple observations. One approach is to -capture- the regression, verify that any error code generated came from the no observations (or insufficient observations) problem and move on.

There is another problem with the code you show. The variable r is created on the first iteration of the loops in the -predict- command. On the next iteration, however, since r already exists, Stata will halt with an error message.
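For reference, a minimal sketch of what fixing both problems inside the original loop might look like. It is not from the original post: it runs on Stata's bundled auto data, with rep78 and foreign standing in for year and industry (an assumption made only so the example is self-contained), and it leaves out the more-than-10-observations rule.

```stata
* Sketch only: rep78 and foreign are stand-ins for year and industry
sysuse auto, clear
generate double r = .        // create the residual variable once, before the loops
forvalues y = 1/5 {
    forvalues i = 0/1 {
        * -capture- lets the loop continue when a cell has no usable observations
        capture regress price mpg weight if rep78 == `y' & foreign == `i', noconstant
        if _rc == 0 {
            * regression ran: copy this cell's residuals into r via a scratch variable
            predict double temp if e(sample), residuals
            replace r = temp if e(sample)
            drop temp
        }
        else if !inlist(_rc, 2000, 2001) {
            * anything other than no/insufficient observations is unexpected: stop
            display as error "rep78 = `y', foreign = `i': unexpected error " _rc
            exit _rc
        }
    }
}
```

Cells where the regression could not run simply keep r missing.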

All in all, rather than fixing these problems, it is easier to take a different approach, based on the -runby- command. It was written by Robert Picard and me and is available from SSC.

Code:

capture program drop one_regression
program define one_regression
    if _N > 10 {
        capture noisily reg part1 part2 part3 part4 part5, noconstant
        if c(rc) == 0 { // REGRESSION WENT OK
            predict r, resid // RESIDUALS, AS IN THE ORIGINAL POST
        }
        else if inlist(c(rc), 2000, 2001) { // NO OR INSUFFICIENT OBSERVATIONS
            gen r = .
        }
        else { // THERE WAS AN UNEXPECTED PROBLEM
            gen comment = "Unexpected error `c(rc)'"
        }
    }
    exit
end

runby one_regression, by(year industry) status

This code will create your variable r whenever there are more than ten observations for a year and industry and the missing values are such that the regression can still run (which means at least 5 observations with no missing values). If there are 10 or fewer observations for the year and industry in the first place, no regression is attempted. If the regression is attempted but fails for insufficient, or no, observations, r is just set to missing value and the program will move on to the next cross-section. If some other error occurred in the regression command, Stata creates a variable called comment in the results data set that gives the error code encountered by the regression. (If no unexpected errors are encountered, there will be no variable named comment in the results data set.)

Note: As you did not provide any example data, this code is untested and may contain typos or other errors.

In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

16 Dec 2018, 18:52

I believe your problem is that you expect that

Code:

reg part1 part2 part3 part4 part5 if industry== `i' & year==`y' & nobs>10, noconstant

will skip over the regression if nobs<=10. But that's not true. What your syntax says is to select observations for which

Code:

industry== `i' & year==`y' & nobs>10

for input to the regression. For any combination of industry and year with 10 or fewer observations, nobs will be 10 or less, nobs>10 will be false, the if condition will be false, so precisely zero observations will be selected for the regression, which then reports "no observations".
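A small sketch of that behavior on Stata's bundled auto data (the dataset, variables and threshold are stand-ins chosen for illustration, not the poster's data):

```stata
sysuse auto, clear
bysort foreign: generate nobs = _N     // 52 domestic and 22 foreign cars
* No group has more than 100 observations, so the -if- qualifier selects zero rows.
* The regression is not skipped; it is run on an empty sample and fails.
capture noisily regress price mpg weight if foreign == 1 & nobs > 100
display _rc                            // 2000, the "no observations" return code
```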

With that in mind, my suggested rewrite of your code (before I saw Clyde's, which is more general) was

Code:

generate r = .
forval y = 2009(1)2014 {
    forval i = 1(1)50 {
        count if industry== `i' & year==`y'
        local nobs = r(N)
        di "year = `y' and industry = `i' - `nobs' observations"
        if `nobs'>10 {
            reg part1 part2 part3 part4 part5 if industry== `i' & year==`y', noconstant
            predict temp if industry== `i' & year==`y', resid
            replace r = temp if industry== `i' & year==`y'
            drop temp
        }
    }
}

Attaullah Shah

Join Date: Aug 2014
Posts: 1669

17 Dec 2018, 08:23

Besides the useful suggestions by Clyde and William, I shall present one additional alternative. This is one of the circumstances where asreg (can be downloaded from SSC) can be extremely useful, as the majority of the options needed in these cross-sectional regressions are built into asreg. Also, if the dataset is huge, asreg can do the calculations really fast. In the following example, I generate 10000 firm ids, 100 years of data, and 200 industries, then create a dummy dataset of dependent and independent variables.

Code:

*Install asreg
ssc install asreg

                 *Create dummy data
*-------------------------------------------------------
clear
set obs 10000
gen id = _n
expand 100
bys id : gen year = _n + 2000

* Assign firms to 200 industries
gen industry = mod(id, 200)

* generate the independent variables
gen X1 = uniform()
gen X2 = uniform()
gen X3 = uniform()
gen X4 = uniform()
gen X5 = uniform()

* generate the dependent variable
gen Y = X1+X2+X3+X4+X5+uniform()

*-------------------- End of data creation --------------------

* Estimate the cross-sectional regressions by year and industry without constants
* and requiring a minimum number of 10 observations

bys year industry: asreg Y X1 X2 X3 X4 X5, noconstant min(10) fit


* The output
sort id year

. list _* in 1/10

     +--------------------------------------------------------------------------------------------------------------------+
     | _Nobs         _R2      _adjR2       _b_X1       _b_X2       _b_X3       _b_X4       _b_X5     _fitted   _residuals |
     |--------------------------------------------------------------------------------------------------------------------|
  1. |    50   .99414453   .99349392    1.117555   1.0120311   1.1659788   1.2520295   1.4295816   3.0761385    .13705089 |
  2. |    50   .99133815   .99037572   .99674607   1.1379237   1.0330171   1.3135693   1.4085448   2.7777534    .15558878 |
  3. |    50   .99080417   .98978241   1.1132044   1.0794346   1.2143631   1.2363202   1.2605224   3.5254449   -.27433341 |
  4. |    50   .99186627   .99096252   1.0131388   1.3196789   1.3350055   1.3306511   .97870559    3.352102   -.32389564 |
  5. |    50   .99215726   .99128584   .79051572   1.2737956   1.4860206   1.3597887   1.1724052   2.7097915    .17741647 |
     |--------------------------------------------------------------------------------------------------------------------|
  6. |    50    .9925616   .99173511    1.310976   .93526185   1.4995484   1.1305786   1.1415273   3.4689234    .00873105 |
  7. |    50   .99226795   .99140884   1.0690777   1.0773158   1.3332567   1.1422246   1.4142997   2.3266319    .46817844 |
  8. |    50   .99446031   .99384479   1.1892227   1.3108992   1.1456367     1.02304   1.3413501   2.3013637   -.23703334 |
  9. |    50   .99198082    .9910898   1.3537537    .9732253   1.1805684    1.282457   1.0703961   1.9032206    -.1425427 |
 10. |    50   .99230707    .9914523   .97337861   1.3049502   1.1527862   1.1849222   1.3373371   3.3733523    .06372241 |
     +--------------------------------------------------------------------------------------------------------------------+

On Stata 15.1, SE, the calculations took 2 seconds.

Please note: Option fit generates two variables. The first variable is _residuals, which is equivalent to -predict _residuals, res- after OLS. The second variable, _fitted, reports the fitted values.

Regards
--------------------------------------------------
Attaullah Shah, PhD.
Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
FinTechProfessor.com
https://asdocx.com
Check out my asdoc program, which sends outputs to MS Word.
For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.


Nick Cox

Join Date: Mar 2014
Posts: 35698

18 Dec 2018, 01:31

Here's yet another way to do it. We do more work before looping in order to simplify the loop. Not tested.

Code:

generate r = .
bysort year industry : gen N = sum(!missing(industry, year, part1, part2, part3, part4, part5)) 
by year industry : gen OK = N[_N] > 10 
egen g = group(year industry) if OK 
su g, meanonly 

forval i = 1/`r(max)' {
    reg part1 part2 part3 part4 part5 if g == `i', noconstant
    predict temp if g == `i', resid
    replace r = temp if g == `i'
    drop temp
}
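
The counting idiom used before the loop can be checked in isolation. The sketch below is an illustration on Stata's bundled auto data, not part of the original post: foreign stands in for the year and industry grouping, and the threshold of 25 is arbitrary, chosen so that one group passes and the other does not.

```stata
sysuse auto, clear
* Running count of rows with no missing values in the listed variables
bysort foreign : generate N = sum(!missing(price, rep78))
* The last running value in each group is that group's total of complete rows
by foreign : generate OK = N[_N] > 25
* Only groups that pass get an identifier; the rest are left missing
egen g = group(foreign) if OK
tabulate foreign OK
```

Because g is missing for groups that fail the check, a loop over 1/`r(max)' after -su g, meanonly- never visits them.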
