Cross-sectional regression

lal mohan kumar

Join Date: May 2019
Posts: 265

Cross-sectional regression

19 Mar 2020, 23:58

Dear all,
I would like to get the residuals of a cross-sectional regression for every industry-year combination. In particular, I would like to run a regression of cash flow on assets and sales for each industry-year combination. After getting the coefficients, including the constant, I want to plug them in to get the predicted cash flow and use the difference between actual and predicted cash flow (the residuals). My data and the commands I ran are below.

Code:

input str1 firm float (cashflow assets sales) int year str1 industry
"a" 100 500 300 1991 1
"a" 125 550 410 1992 1
"a" 129 550 350 1993 1
"a" 118 450 216 1994 1
"a" 96 600 175 1995 1
"b" 350 1500 600 1991 1
"b" 560 1675 850 1992 1
"b" 730 1300 755 1993 1
"b" 900 1800 1065 1994 1
"b" 1050 2000 1800 1995 1
"c"  60 120 155 1991 2
"c"  -10  120 180 1992 2
"c"  50 160 168 1993 2
"c"  200 150 260 1994 2
"c"  -60 140 200 1995 2
"d" 155  230 200 1991 2
"d" 255 398 400 1992 2
"d" 179 398 268 1993 2
"d" 196 423 318 1994 2
"d" 165 300 215 1995 2
end

Code:

encode firm, gen(id)
* set up the panel
xtset id year
* save the data
save "C:\Users\vishnu\Desktop\demo.dta"
* run the regression for each year-industry combination
statsby, by(year industry): regress cashflow assets sales
* statsby replaces the data in memory with the coefficients, so merge them back onto the original data
merge 1:m industry year using "C:\Users\vishnu\Desktop\demo.dta"
* plug in the coefficients to get predicted values
gen predicted_cashflow = _b_cons + (_b_assets*assets) + (_b_sales*sales)
* calculate the residual
gen residual_cashflow = cashflow - predicted_cashflow

Question 1: What is a simple procedure to get the residuals for each industry-year combination in a setup like the one above? My code is very inefficient.
Question 2: How can I add a restriction on the minimum number of observations required to run the regression (say, only industry-year combinations with at least 3 observations)?

If my question is vague or ambiguous, please let me know.
I look forward to the forum's help with this issue.
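
A possible shortcut for the merge step in the code above, offered as a minimal sketch rather than a definitive answer: `statsby` accepts a `saving()` option, which writes the coefficients to a file and leaves the data in memory untouched, so they can be merged back with a single `merge m:1`. The tempfile name `coefs` and the `double` option are choices made for this illustration and are not part of the original code.

Code:

```stata
clear
input str1 firm float(cashflow assets sales) int year str1 industry
"a" 100 500 300 1991 1
"a" 125 550 410 1992 1
"a" 129 550 350 1993 1
"a" 118 450 216 1994 1
"a" 96 600 175 1995 1
"b" 350 1500 600 1991 1
"b" 560 1675 850 1992 1
"b" 730 1300 755 1993 1
"b" 900 1800 1065 1994 1
"b" 1050 2000 1800 1995 1
"c" 60 120 155 1991 2
"c" -10 120 180 1992 2
"c" 50 160 168 1993 2
"c" 200 150 260 1994 2
"c" -60 140 200 1995 2
"d" 155 230 200 1991 2
"d" 255 398 400 1992 2
"d" 179 398 268 1993 2
"d" 196 423 318 1994 2
"d" 165 300 215 1995 2
end

* write one row of coefficients per industry-year cell to a temporary file;
* with saving(), the data in memory are not replaced
tempfile coefs
statsby _b, by(industry year) saving(`coefs') double: regress cashflow assets sales

* attach each cell's coefficients to every firm in that cell
merge m:1 industry year using `coefs', nogenerate

* predicted value and residual from the cell-specific coefficients
gen double predicted_cashflow = _b_cons + _b_assets*assets + _b_sales*sales
gen double residual_cashflow = cashflow - predicted_cashflow
```

This keeps the original approach (coefficients first, residuals computed by hand) and only removes the need to save and reload the full data set.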

Last edited by lal mohan kumar; 20 Mar 2020, 00:12.


lal mohan kumar

Join Date: May 2019

Posts: 265
#2

20 Mar 2020, 10:03

Dear all,
I am extremely sorry to post my question again. My post did not elicit any response, and I really don't know what went wrong with it (I posted both an example data set and the code I ran). Let me explain my question further.
I would like to get the residuals of a cross-sectional regression for every industry-year combination. In particular, I would like to run a regression of cash flow on assets and sales for each industry-year combination. For instance, suppose that in 2005 I have 10 companies belonging to 2 industries, say 5 companies in mining and 5 in pharmaceuticals. For 2005, I need to run one regression per industry (mining and pharmaceuticals), so the companies within each industry share a common set of coefficients.
After getting the coefficients, including the constant, I want to plug them in to get the predicted cash flow and use the difference between actual and predicted cash flow (the residuals). I posted sample data and the code I ran in my previous post (above). I am not posting them again, to avoid repeating them verbatim.

Question 1: What is a simple procedure to get the residuals for each industry-year combination in a setup like the one above?
Question 2: How can I add a restriction on the minimum number of observations required to run the regression (say, only industry-year combinations with at least 3 observations)?

Once again, sorry for posting the same question. If my question is still vague or ambiguous, please let me know.
I look forward to the forum's help with this issue.

Last edited by lal mohan kumar; 20 Mar 2020, 10:14.

FernandoRios

Join Date: Apr 2014
Posts: 2429

20 Mar 2020, 10:14

Hi lal mohan kumar,
I'm not sure this is the most efficient code, but perhaps you can adapt it for your purposes:

Code:

webuse nlswork, clear
marksample touse
markout `touse' ind_code occ_code ln_wage grade union tenure hours
drop if `touse'!=1

levelsof ind_code , local(ind)
levelsof occ_code , local(occ)
gen glob_res=.
foreach i of local ind {
    foreach o of local occ {
        qui:count if ind_code==`i' & occ_code==`o'
        if r(N)>100 {
            capture drop res
            reg ln_wage grade union tenure hours if ind_code==`i' & occ_code==`o'
            predict res, res
            sum res if ind_code==`i' & occ_code==`o'
            replace glob_res=res if ind_code==`i' & occ_code==`o'
        }
    }
}

HTH
Fernando


lal mohan kumar

Join Date: May 2019
Posts: 265

20 Mar 2020, 11:48

Thanks, Fernando! I tried the code you suggested on my sample data set, but the mean of my residuals shows a bizarre value.
Let me replicate what I have done.

Code:

 input str1 firm float (cashflow assets sales) int year float industry "a" 100 500 300 1991 1 "a" 125 550 410 1992 1 "a" 129 550 350 1993 1 "a" 118 450 216 1994 1 "a" 96 600 175 1995 1 "b" 350 1500 600 1991 1 "b" 560 1675 850 1992 1 "b" 730 1300 755 1993 1 "b" 900 1800 1065 1994 1 "b" 1050 2000 1800 1995 1 "c"  60 120 155 1991 2 "c"  -10  120 180 1992 2 "c"  50 160 168 1993 2 "c"  200 150 260 1994 2 "c"  -60 140 200 1995 2 "d" 155  230 200 1991 2 "d" 255 398 400 1992 2 "d" 179 398 268 1993 2 "d" 196 423 318 1994 2 "d" 165 300 215 1995 2 end

In the above data set, I have declared industry as "float". Then I ran the following commands:

Code:

 marksample touse markout `touse' cashflow assets sales year industry drop if `touse'!=1 levelsof year , local(year) levelsof industry, local(industry) gen glob_res=. foreach i of local year { foreach o of local industry { qui:count if year==`i' & industry==`o' if r(N)>0 { capture drop res reg cashflow assets sales if year==`i' & industry==`o' predict res, res sum res if year==`i' & industry==`o' replace glob_res=res if year==`i' & industry==`o' } } }

In the code above, I presume that

Code:

if r(N)>0

means that the minimum number of observations should be greater than 0. I ran the above commands, and the descriptive statistics of the residual show a mean of -415.38. I think this value is incorrect, since the expected mean of the residuals should be 0. Have I made a mistake? Where did I go wrong?
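
On the `r(N)` condition, a small sketch may help. `count` stores the number of observations that satisfy its `if` condition in `r(N)`, so `if r(N)>0` only requires at least one observation in the cell; a minimum of 3 would be written `r(N)>=3`. The example below uses the `auto` data set that ships with Stata as a stand-in, and the local macro `minobs` is a name chosen for illustration.

Code:

```stata
sysuse auto, clear

* hypothetical threshold: the smallest cell size for which the regression is run
local minobs 3

* count leaves the number of matching observations in r(N)
quietly count if foreign == 1
display "observations in this cell: " r(N)

* run the regression only when the cell is large enough
if r(N) >= `minobs' {
    regress price mpg weight if foreign == 1
}
else {
    display "cell skipped: fewer than `minobs' observations"
}
```

In the loop above, the same change amounts to replacing `if r(N)>0` with `if r(N)>=3`.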

Last edited by lal mohan kumar; 20 Mar 2020, 11:52.


FernandoRios

Join Date: Apr 2014

Posts: 2429
#5

20 Mar 2020, 12:04

Your code got mixed up, so I can't really copy it to try replicating it.
But a question: in my code I have

sum res if year==`i' & industry==`o'

When you do that (from running the code), is the mean zero?

lal mohan kumar

Join Date: May 2019
Posts: 265

21 Mar 2020, 00:42

Dear Fernando
Extremely sorry for the late reply. I was doing the calculations.

sum res if year==`i' & industry==`o'
When you do that (from running the code), is the mean zero?

Yes. The mean of "res" differs from 0 only when I summarize it over the whole data set with the command

Code:

summarize res

Let me replicate what I have done so far. This time I am pasting the sample data and codes I ran in the plain format so that they don't get mixed up.

firm	cashflow	assets	sales	year	industry
a	100	500	300	1991	1
a	125	550	410	1992	1
a	129	550	350	1993	1
a	118	450	216	1994	1
a	96	600	175	1995	1
b	350	1500	600	1991	1
b	560	1675	850	1992	1
b	730	1300	755	1993	1
b	900	1800	1065	1994	1
b	1050	2000	1800	1995	1
c	60	120	155	1991	2
c	-10	120	180	1992	2
c	50	160	168	1993	2
c	200	150	260	1994	2
c	-60	140	200	1995	2
d	155	230	200	1991	2
d	255	398	400	1992	2
d	179	398	268	1993	2
d	196	423	318	1994	2
d	165	300	215	1995	2

marksample touse
markout `touse' cashflow assets sales year industry
drop if `touse'!=1
levelsof year , local(year)
levelsof industry, local(industry)
gen glob_res=.
foreach i of local year {
    foreach o of local industry {
        qui:count if year==`i' & industry==`o'
        if r(N)>0 {
            capture drop res
            reg cashflow assets sales if year==`i' & industry==`o'
            predict res, res
            sum res if year==`i' & industry==`o'
            replace glob_res=res if year==`i' & industry==`o'
        }
    }
}

After running this code my table is as follows

firm	cashflow	assets	sales	year	industry	glob_res	res
a	100	500	300	1991	1	2.84E-14	-346.25
a	125	550	410	1992	1	8.53E-14	-391.563
a	129	550	350	1993	1	5.68E-14	-387.563
a	118	450	216	1994	1	-5.68E-14	-257.938
a	96	600	175	1995	1	1.28E-13	-490.875
b	350	1500	600	1991	1	0	-1502.5
b	560	1675	850	1992	1	0	-1538.59
b	730	1300	755	1993	1	-1.14E-13	-841.25
b	900	1800	1065	1994	1	-1.14E-13	-1374.38
b	1050	2000	1800	1995	1	-2.27E-13	-1505.63
c	60	120	155	1991	2	-1.42E-14	148.125
c	-10	120	180	1992	2	0	78.125
c	50	160	168	1993	2	-1.42E-14	81.875
c	200	150	260	1994	2	0	245.9375
c	-60	140	200	1995	2	5.68E-14	5.68E-14
d	155	230	200	1991	2	0	88.4375
d	255	398	400	1992	2	-2.84E-14	-47.8125
d	179	398	268	1993	2	0	-123.813
d	196	423	318	1994	2	0	-141.969
d	165	300	215	1995	2	0	0

Questions
1. Are the residuals in the column headed "res" correct? In my subsequent analysis, I use these residuals as my dependent variable. Can I therefore say that firm "a" in industry 1 has a residual of -346.25 for the year 1991?
2. Why is the average of the column "res" not approximately equal to 0?
3. My question regarding industry-year combinations was about requiring a minimum number of observations. Does the condition if r(N)>0 imply that the number of observations should be greater than 0?
4. What is the difference between glob_res and res?

I know I have asked a lot, and I am sorry for that. I am forced to ask since I couldn't answer these questions myself.
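
A note on the four questions, as a hedged sketch rather than a definitive answer. In the loop above, `predict res, res` has no `if` condition, so each pass computes residuals for all 20 observations from that pass's regression, and `capture drop res` discards the previous pass. What remains in "res" at the end is therefore the last regression (year 1995, industry 2) applied to every firm. This matches the table, where only the two 1995 rows of industry 2 have "res" close to 0. "glob_res" is the variable that collects each cell's own residuals. It is essentially 0 everywhere here because every industry-year cell in the sample has only two firms, so each regression fits its cell exactly. The version below restricts `predict` to the estimation sample with `if e(sample)` and skips small cells; the names `cell`, `n_cell`, `resid`, `tmp_res` and `minobs` are chosen for illustration.

Code:

```stata
clear
input str1 firm float(cashflow assets sales) int year float industry
"a" 100 500 300 1991 1
"a" 125 550 410 1992 1
"a" 129 550 350 1993 1
"a" 118 450 216 1994 1
"a" 96 600 175 1995 1
"b" 350 1500 600 1991 1
"b" 560 1675 850 1992 1
"b" 730 1300 755 1993 1
"b" 900 1800 1065 1994 1
"b" 1050 2000 1800 1995 1
"c" 60 120 155 1991 2
"c" -10 120 180 1992 2
"c" 50 160 168 1993 2
"c" 200 150 260 1994 2
"c" -60 140 200 1995 2
"d" 155 230 200 1991 2
"d" 255 398 400 1992 2
"d" 179 398 268 1993 2
"d" 196 423 318 1994 2
"d" 165 300 215 1995 2
end

* hypothetical threshold: minimum number of firms per industry-year cell
local minobs 3

* one identifier per industry-year cell, plus the size of each cell
egen long cell = group(industry year)
bysort cell: gen long n_cell = _N

* collect cell-specific residuals; cells below the threshold stay missing
gen double resid = .
quietly levelsof cell if n_cell >= `minobs', local(cells)
foreach c of local cells {
    quietly regress cashflow assets sales if cell == `c'
    * e(sample) limits the residuals to the observations used in this fit
    quietly predict double tmp_res if e(sample), residuals
    quietly replace resid = tmp_res if e(sample)
    drop tmp_res
}
summarize resid
```

With the posted sample, no cell reaches three firms, so `resid` stays missing for every observation; setting `minobs` to 2 should give values close to those in "glob_res". On real data with larger cells, the residuals in `resid` should average to zero within each cell, since every regression includes a constant.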
