Individual Cross sectional regressions by firm and obtaining residuals... Please help me...

Paul Bahk

Join Date: Jun 2020

Posts: 9
#1

10 Jun 2020, 20:18

Hello Everyone!
I am totally new to Stata, and I have a brief question about running individual cross-sectional regressions and obtaining residuals.

My data set has only a single year, so it is a cross-sectional data set (1300 unique firms).

My regression equation is : reg executivepayavg size roa lev ret mb for mown option salesgrowth netsales

I would like to run a cross-sectional regression for EACH FIRM and obtain the error term (residuals), so that is 1300 cross-sectional regressions.

I tried using Bysort firmcode: reg executivepayavg size roa lev ret mb for mown option salesgrowth netsales but it didn't work. firmcode is the unique firm-specific code in my data set.

Can anyone help me out, please? Thank you so much!
Wouter Wakker

Join Date: Nov 2018

Posts: 621
#2

11 Jun 2020, 00:40

If your data are at the firm level you can't run a regression for each firm as that would only give you one observation per regression.

To obtain residuals after a regression you can use predict with the residual option.

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17673

11 Jun 2020, 00:58

Paul:
as Wouter wisely replied, running one regression per firm does not make sense when each firm contributes a single observation.
Instead, if you really have cross-sectional data (i.e., one observation per firm), you can run a single regression and obtain the residuals along the following lines:

Code:

. sysuse auto.dta
(1978 Automobile Data)

. reg price mpg

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(1, 72)        =     20.26
       Model |   139449474         1   139449474   Prob > F        =    0.0000
    Residual |   495615923        72  6883554.48   R-squared       =    0.2196
-------------+----------------------------------   Adj R-squared   =    0.2087
       Total |   635065396        73  8699525.97   Root MSE        =    2623.7

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -238.8943   53.07669    -4.50   0.000    -344.7008   -133.0879
       _cons |   11253.06   1170.813     9.61   0.000     8919.088    13587.03
------------------------------------------------------------------------------

. predict residual, res

. list make residual if _n<=5

     +---------------------------+
     | make             residual |
     |---------------------------|
  1. | AMC Concord     -1898.385 |
  2. | AMC Pacer       -2442.857 |
  3. | AMC Spirit      -2198.385 |
  4. | Buick Century   -1659.174 |
  5. | Buick Electra    157.3545 |
     +---------------------------+

.

Kind regards,
Carlo
(Stata 19.0)


Paul Bahk

Join Date: Jun 2020
Posts: 9

11 Jun 2020, 02:12

Thank you so much for your reply, Carlo!
You and Wouter are absolutely correct. I made a mistake: it is not firm-level cross-sectional data with one year of observations. It is panel data. Sorry for my mistake.

So, if I am using panel data and would like to run a time-series regression for each INDIVIDUAL FIRM and obtain residuals, would the following code do that?

Bysort firmcode: reg executivepayavg size roa lev ret mb for mown option salesgrowth netsales
predict residual, res

Or is bysort not necessary in this case?

Also, can you explain what the code "list make residual if _n<=5" means?

Thank you so much!


Wouter Wakker

Join Date: Nov 2018

Posts: 621
#5

11 Jun 2020, 02:50

Code:

bysort firmcode: reg executivepayavg size roa lev ret mb for mown option salesgrowth netsales
predict residual, res

This would indeed run a regression for each firm, but it will not give you the right residuals, because predict runs only once. Done this way, predict takes the parameter estimates from the last regression (so only one firm) and computes residuals for all firms from that one firm's estimates (out-of-sample prediction).

To get the right residuals you could do this (replace y and x with your actual variables):

Code:

levelsof firmcode, local(firms)
gen residuals = .
foreach firm of local firms {
    reg y x if firmcode == `firm'
    predict resid if firmcode == `firm', r
    replace residuals = resid if firmcode == `firm'
    drop resid
}
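
A minimal sketch of the difference between the two approaches, under stated assumptions: it uses the Grunfeld panel loaded with webuse grunfeld (which needs internet access) as a stand-in for the real data, with company playing the role of firmcode, invest the role of y, and mvalue and kstock the role of x. The names naive, resid_tmp and differs are illustrative only.

```stata
* Stand-in panel (assumption): company ~ firmcode, invest ~ y, mvalue and kstock ~ x
webuse grunfeld, clear

* Pitfall: after by:, e() holds only the last firm's estimates,
* so a single predict applies that one firm's coefficients to every firm
quietly bysort company: regress invest mvalue kstock
predict double naive, residuals

* Firm-by-firm residuals: predict inside the loop, restricted to the estimation sample
levelsof company, local(firms)
gen double residuals = .
foreach firm of local firms {
    quietly regress invest mvalue kstock if company == `firm'
    predict double resid_tmp if e(sample), residuals
    quietly replace residuals = resid_tmp if e(sample)
    drop resid_tmp
}

* Flag observations where the single predict and the loop disagree
gen byte differs = reldif(naive, residuals) > 1e-8
tabulate company differs
```

In this sketch the two residual columns should coincide for the last company in sort order, since that is the regression whose estimates the single predict reuses. Note that levelsof returns bare values, so the comparison firmcode == `firm' as written assumes a numeric firm identifier.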

Last edited by Wouter Wakker; 11 Jun 2020, 02:55.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#6

11 Jun 2020, 03:16

Paul:
Wouter has already helpfully replied to your main questions.
Hence I would only explain that:

Code:

list make residual if _n<=5

asks Stata to show -make- and -residual- for the first 5 observations.
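
As a small illustration of how that condition selects rows, the sketch below reruns the auto.dta example from earlier in the thread. In Stata, _n is the number of the current observation, so _n<=5 is true only for rows 1 through 5; the in 1/5 range is shown as an equivalent way to pick the same rows.

```stata
sysuse auto, clear
quietly regress price mpg
predict double residual, residuals

* _n is the running observation number, so this keeps rows 1 to 5
list make residual if _n <= 5

* Same rows selected with an in range instead of an if condition
list make residual in 1/5

* Count how many observations satisfy the condition
count if _n <= 5
```

Because _n depends on the current sort order, the rows shown change if the data are sorted differently before listing.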

Last edited by Carlo Lazzaro; 11 Jun 2020, 03:19.

Kind regards,
Carlo
(Stata 19.0)
Paul Bahk

Join Date: Jun 2020

Posts: 9
#7

11 Jun 2020, 07:53

Thank you so much for your reply, Wouter!
May I ask just one last question, please?

1. In the reg y x part, the code doesn't run when I include clustered standard errors, e.g. reg y x, vce (cluster firmcode) == `firm'. Is this expected?

2. I first tried the code with a very rough version of my regression equation: reg executive-compensation size
I confirmed that the code works, and then included my original regression equation, which worked fine in my previous analysis.
However, when I include my original equation in the code you've provided, every single variable is omitted due to collinearity. Does this mean that there was something wrong with my x variables in the first place?

Thank you!
Wouter Wakker

Join Date: Nov 2018

Posts: 621
#8

11 Jun 2020, 08:16

1. You are already running regressions on one firm at a time, so there is no point in clustering on firm.

2. It is hard to say what is going on as I cannot see your data or your regression results. One possible explanation is that your x-variables have between variation but no within variation. In that case pooled OLS will run without problems, but running the regression for one firm at a time is not possible. If this is not the case, I would advise you to provide an example of your data and the results of your regressions. Please do read the FAQ if you have not already done so; it provides detailed information on how to share example data and Stata output.
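
One way to check for the situation described in point 2 is sketched below. It is only an illustration under stated assumptions: the Grunfeld panel from webuse grunfeld (internet access required) stands in for the real data, company stands in for firmcode, and mvalue and kstock stand in for the x-variables. The variable names n_obs, sd_* and firm_tag are made up for the example.

```stata
* Stand-in panel (assumption): company ~ firmcode, mvalue and kstock ~ x-variables
webuse grunfeld, clear

* Observations available to each firm-level regression
bysort company: gen n_obs = _N

* Within-firm standard deviation of each regressor;
* 0 or missing means that regressor has no within variation for that firm
foreach x of varlist mvalue kstock {
    bysort company: egen double sd_`x' = sd(`x')
    count if sd_`x' == 0 | missing(sd_`x')
}

* One row per firm for a quick look
egen byte firm_tag = tag(company)
list company n_obs sd_mvalue sd_kstock if firm_tag
```

The observation count is worth a look as well: a regression with a constant and k regressors needs at least k+1 observations per firm to estimate every coefficient, so firms with fewer rows than that will have variables omitted whatever the data look like.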
Paul Bahk

Join Date: Jun 2020

Posts: 9
#9

11 Jun 2020, 19:22

Thank you, Carlo!
My last question: when I run the regression and use the code you provided ("predict residuals"), the residuals are automatically generated in my data set.
However, there is a residual for EVERY firm-year. For example, if I run 1300 individual firm regressions for the 2011-2012 period, I want 1300 residuals, but the code gives me 2600 residuals (1300 firms x 2 years = 2600).

Is there a way I can get one residual for each firm, that is, a single residual per firm covering the whole 2011-2012 period?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#10

12 Jun 2020, 00:40

Paul:
I do not think that what you're after can be accomplished, as the idiosyncratic error is both time and panel unit dependent.

Kind regards,
Carlo
(Stata 19.0)