  • Cross-sectional regression

    Dear all,
    I would like to get the residuals from a cross-sectional regression run separately for every industry-year combination. In particular, I would like to regress cash flow on assets and sales for each industry-year combination. After getting the coefficients, including the constant, I want to plug them in to get the predicted cash flow and then take the difference between actual and predicted cash flow (the residual). Below are my data and the commands I ran.

    Code:
    input str1 firm float (cashflow assets sales) int year str1 industry
    "a" 100 500 300 1991 1
    "a" 125 550 410 1992 1
    "a" 129 550 350 1993 1
    "a" 118 450 216 1994 1
    "a" 96 600 175 1995 1
    "b" 350 1500 600 1991 1
    "b" 560 1675 850 1992 1
    "b" 730 1300 755 1993 1
    "b" 900 1800 1065 1994 1
    "b" 1050 2000 1800 1995 1
    "c"  60 120 155 1991 2
    "c"  -10  120 180 1992 2
    "c"  50 160 168 1993 2
    "c"  200 150 260 1994 2
    "c"  -60 140 200 1995 2
    "d" 155  230 200 1991 2
    "d" 255 398 400 1992 2
    "d" 179 398 268 1993 2
    "d" 196 423 318 1994 2
    "d" 165 300 215 1995 2
    end
    Code:
    encode firm,gen(id)
    *for setting the panel      
    xtset id year
    *for saving the data
    save "C:\Users\vishnu\Desktop\demo.dta"
    *running the regression as year-industry combination
    statsby, by( year industry ) : regress cashflow assets sales                
    *since the above regression replaces the old data with coefficients , I need to merge the results of regression with old data
    merge 1:m industry year using "C:\Users\vishnu\Desktop\demo.dta"
    * plugging coefficients to get predicted values              
    gen predicted_cashflow=_b_cons+(_b_assets*assets)+(_b_sales*sales)
    *calculating the residual        
    gen residual_cashflow=cashflow-predicted_cashflow
    Question 1: What is a simple procedure to get the residuals for each industry-year combination in a setup like the one above? My code is very inefficient.
    Question 2: How can I impose an additional restriction on the minimum number of observations required to run the regression (say, only industry-year combinations with at least 3 observations)?

    If my question is vague or ambiguous, please let me know.
    I look forward to the forum's help with this issue.

    Last edited by lal mohan kumar; 20 Mar 2020, 00:12.
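
A minimal sketch of one way to streamline the statsby approach above, assuming the example data (cashflow, assets, sales, year, industry) are in memory. The file name coefs and the variable names cf_hat and cf_resid are hypothetical, chosen here for illustration. Using statsby's saving() option keeps the original data in memory, so no prior save is needed and the coefficients can be merged back in with merge m:1.

```stata
* Assumption: the example data are already in memory.
* coefs is a hypothetical file name, written to the current working directory.
statsby _b, by(industry year) saving(coefs, replace): regress cashflow assets sales

* Attach each cell's coefficients to its firm-level rows (many rows per cell).
merge m:1 industry year using coefs, nogenerate

* Predicted value and residual from the cell-specific coefficients.
gen double cf_hat   = _b_cons + _b_assets*assets + _b_sales*sales
gen double cf_resid = cashflow - cf_hat
```

This sketch does not impose a minimum number of observations per cell; that restriction is easier to express in a loop that counts each cell before regressing.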

  • #2
    Dear all,
    I am extremely sorry to post my question again. My post did not elicit any response, and I really don't know what went wrong with it (I posted both an example data set and the code I ran). Let me explain my question further.
    I would like to get the residuals from a cross-sectional regression for every industry-year combination. In particular, I would like to regress cash flow on assets and sales for each industry-year combination. For instance, in the year 2005, I have 10 companies belonging to 2 industries, say 5 companies in mining and 5 in pharmaceuticals. For 2005, I need to run one regression for each of these 2 industries (mining and pharmaceutical) across the 10 companies. Since there are only 2 industries, the companies within each industry will share a common beta.
    After getting the coefficients, including the constant, I want to plug them in to get the predicted cash flow and then take the difference between actual and predicted cash flow (the residual). I posted sample data and the code I ran in my previous post (above). I am not posting them again to avoid repeating them verbatim.


    Question 1: What is a simple procedure to get the residuals for each industry-year combination in a setup like the one above?
    Question 2: How can I impose an additional restriction on the minimum number of observations required to run the regression (say, only industry-year combinations with at least 3 observations)?

    Once again, sorry for posting the same question. If my question is still vague or ambiguous, please let me know.
    I look forward to the forum's help with this issue.
    Last edited by lal mohan kumar; 20 Mar 2020, 10:14.



    • #3
      Hi lal mohan kumar
      I'm not sure this is the most efficient approach, but perhaps you can adapt the code below for your purposes:
      Code:
      webuse nlswork, clear
      marksample touse
      markout `touse' ind_code occ_code ln_wage grade union tenure hours
      drop if `touse'!=1
      
      levelsof ind_code , local(ind)
      levelsof occ_code , local(occ)
      gen glob_res=.
      foreach i of local ind {
          foreach o of local occ {
              qui:count if ind_code==`i' & occ_code==`o'
              if r(N)>100 {
                  capture drop res
                  reg ln_wage grade union tenure hours if ind_code==`i' & occ_code==`o'
                  predict res, res
                  sum res if ind_code==`i' & occ_code==`o'
                  replace glob_res=res if ind_code==`i' & occ_code==`o'
              }
          }
      }
      HTH
      Fernando
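
The loop above could be adapted to the industry-year data roughly as follows. This is only a sketch under stated assumptions: the example data from post #1 are in memory, the names cell, complete, and resid are hypothetical, and the threshold of 3 is the example value from Question 2. It uses a single loop over numbered industry-year cells instead of two nested loops, and restricts predict to the estimation sample.

```stata
* Assumption: cashflow, assets, sales, industry, year are in memory.
* Number the industry-year combinations 1, 2, ...
egen cell = group(industry year)

* Flag rows with no missing values in the regression variables.
gen byte complete = !missing(cashflow, assets, sales)

gen double resid = .
quietly summarize cell, meanonly
local ncells = r(max)

forvalues g = 1/`ncells' {
    * Count usable observations in this cell before regressing.
    quietly count if cell == `g' & complete
    if r(N) >= 3 {
        quietly regress cashflow assets sales if cell == `g'
        tempvar r
        * Residuals only for the rows actually used in this regression.
        quietly predict double `r' if e(sample), residuals
        quietly replace resid = `r' if e(sample)
        drop `r'
    }
}
```

With a constant and two regressors, a cell with exactly 3 observations is fitted exactly, so its residuals are zero up to rounding; at least 4 observations are needed for informative residuals. In the posted example every industry-year cell has only 2 firms, so with this threshold resid stays missing throughout.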



      • #4
        Thanks, Fernando! I tried the code you suggested on my sample data set, but the mean of my residuals shows bizarre values.
        Let me replicate what I have done.

        Code:
        input str1 firm float (cashflow assets sales) int year float industry
        "a" 100 500 300 1991 1
        "a" 125 550 410 1992 1
        "a" 129 550 350 1993 1
        "a" 118 450 216 1994 1
        "a" 96 600 175 1995 1
        "b" 350 1500 600 1991 1
        "b" 560 1675 850 1992 1
        "b" 730 1300 755 1993 1
        "b" 900 1800 1065 1994 1
        "b" 1050 2000 1800 1995 1
        "c"  60 120 155 1991 2
        "c"  -10  120 180 1992 2
        "c"  50 160 168 1993 2
        "c"  200 150 260 1994 2
        "c"  -60 140 200 1995 2
        "d" 155  230 200 1991 2
        "d" 255 398 400 1992 2
        "d" 179 398 268 1993 2
        "d" 196 423 318 1994 2
        "d" 165 300 215 1995 2
        end
        In the above data set, I have declared industry as "float". Then I ran the following commands:
        Code:
        marksample touse
        markout `touse' cashflow assets sales year industry
        drop if `touse'!=1
        levelsof year , local(year)
        levelsof industry, local(industry)
        gen glob_res=.
        foreach i of local year {
            foreach o of local industry {
                qui:count if year==`i' & industry==`o'
                if r(N)>0 {
                    capture drop res
                    reg cashflow assets sales if year==`i' & industry==`o'
                    predict res, res
                    sum res if year==`i' & industry==`o'
                    replace glob_res=res if year==`i' & industry==`o'
                }
            }
        }
        In the above code, I presume that
        Code:
        if r(N)>0 
        means that the number of observations must be greater than 0. I ran the above commands, and the descriptive statistics of the residual show a mean of -415.38. I think this value is incorrect, since the expected mean of the residuals should be 0. Have I made a mistake? Where did I go wrong?
        Last edited by lal mohan kumar; 20 Mar 2020, 11:52.
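
One likely source of the nonzero mean is that predict, when called without an if qualifier, fills in values for every observation in memory and not only for those used in the most recent regression. The sketch below illustrates this on the auto data set shipped with Stata (not the data from this thread); res_all and res_in are hypothetical variable names.

```stata
sysuse auto, clear

* Fit the model on a subsample only (foreign cars).
regress price weight if foreign == 1

* Without a qualifier, residuals are computed for all 74 cars,
* including the domestic cars that were not in the regression.
predict double res_all, residuals

* With e(sample), residuals are computed only for the estimation sample.
predict double res_in if e(sample), residuals

* The mean of res_in is zero up to rounding; the mean of res_all need not be.
summarize res_all res_in
```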



        • #5
          Your code got mixed up, so I can't really copy it to try replicating it.
          But a question: in my code I have

          sum res if year==`i' & industry==`o'

          When you do that (from running the code), is the mean zero?



          • #6
            Dear Fernando
            Extremely sorry for the late reply. I was doing the calculations.
            "sum res if year==`i' & industry==`o'. When you do that (from running the code), is the mean zero?"
            Yes. The mean of "res" is nonzero only when I use the command
            Code:
            summarize res
            Let me replicate what I have done so far. This time I am pasting the sample data and codes I ran in the plain format so that they don't get mixed up.
            firm cashflow assets sales year industry
            a 100 500 300 1991 1
            a 125 550 410 1992 1
            a 129 550 350 1993 1
            a 118 450 216 1994 1
            a 96 600 175 1995 1
            b 350 1500 600 1991 1
            b 560 1675 850 1992 1
            b 730 1300 755 1993 1
            b 900 1800 1065 1994 1
            b 1050 2000 1800 1995 1
            c 60 120 155 1991 2
            c -10 120 180 1992 2
            c 50 160 168 1993 2
            c 200 150 260 1994 2
            c -60 140 200 1995 2
            d 155 230 200 1991 2
            d 255 398 400 1992 2
            d 179 398 268 1993 2
            d 196 423 318 1994 2
            d 165 300 215 1995 2
            marksample touse
            markout `touse' cashflow assets sales year industry
            drop if `touse'!=1
            levelsof year , local(year)
            levelsof industry, local(industry)
            gen glob_res=.
            foreach i of local year {
                foreach o of local industry {
                    qui:count if year==`i' & industry==`o'
                    if r(N)>0 {
                        capture drop res
                        reg cashflow assets sales if year==`i' & industry==`o'
                        predict res, res
                        sum res if year==`i' & industry==`o'
                        replace glob_res=res if year==`i' & industry==`o'
                    }
                }
            }

            After running this code, my table is as follows:
            firm cashflow assets sales year industry glob_res res
            a 100 500 300 1991 1 2.84E-14 -346.25
            a 125 550 410 1992 1 8.53E-14 -391.563
            a 129 550 350 1993 1 5.68E-14 -387.563
            a 118 450 216 1994 1 -5.68E-14 -257.938
            a 96 600 175 1995 1 1.28E-13 -490.875
            b 350 1500 600 1991 1 0 -1502.5
            b 560 1675 850 1992 1 0 -1538.59
            b 730 1300 755 1993 1 -1.14E-13 -841.25
            b 900 1800 1065 1994 1 -1.14E-13 -1374.38
            b 1050 2000 1800 1995 1 -2.27E-13 -1505.63
            c 60 120 155 1991 2 -1.42E-14 148.125
            c -10 120 180 1992 2 0 78.125
            c 50 160 168 1993 2 -1.42E-14 81.875
            c 200 150 260 1994 2 0 245.9375
            c -60 140 200 1995 2 5.68E-14 5.68E-14
            d 155 230 200 1991 2 0 88.4375
            d 255 398 400 1992 2 -2.84E-14 -47.8125
            d 179 398 268 1993 2 0 -123.813
            d 196 423 318 1994 2 0 -141.969
            d 165 300 215 1995 2 0 0

            Questions
            1. Are my residuals in the column headed "res" correct? In my subsequent analysis, I use these residuals as my dependent variable. So can I say that in 1991, firm "a" in industry 1 has a residual of -346.25?
            2. Why is the average of column "res" not approximately equal to 0?
            3. My question about industry-year combinations concerned a minimum number of observations. Does the code if r(N)>0 imply that the number of observations must be greater than 0?
            4. What is the difference between glob_res and res?

            I know I have asked a lot, and I am sorry for that. I am forced to ask since I couldn't answer these questions myself.
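
A possible reading of the table above, offered as a sketch and not a definitive diagnosis. In the loop, predict res, res has no if qualifier, so after each regression res is filled for every row using that regression's coefficients; what survives at the end appears to come from the last cell fitted (1995, industry 2). As a check, firms c and d in 1995 imply a slope on assets of 225/160 = 1.40625 and an intercept of -256.875 (with sales apparently dropped as collinear), which gives 100 - 446.25 = -346.25 for firm a in 1991, matching the res column. That value is therefore not firm a's own industry-year residual, and the overall mean of res has no reason to be zero. glob_res keeps only the within-cell residuals, and these are zero up to rounding because each cell has just 2 observations, so the fit is exact. The condition r(N)>0 only requires at least one observation; r(N)>=3 would be the analogue for a minimum of 3. The snippet below checks the cell sizes, assuming the posted data are in memory; cell_size is a hypothetical variable name.

```stata
* Assumption: the posted example data are in memory.
* Count the observations in each industry-year cell.
bysort industry year: gen cell_size = _N

* Show how many rows fall in cells of each size.
tabulate cell_size
```

In the posted example every industry-year cell contains 2 firms, so no cell has enough observations to produce informative residuals from a regression with a constant and two regressors.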

