How to get the R2 from the regression

Issa Almaharmeh

Join Date: May 2015

Posts: 72
#16

19 May 2015, 10:42

Thank you for your help. What I need is to calculate the R2 for each firm in each year. To calculate the weight variable, I need to create a new variable that represents the total market value of each industry in each week. To do so I wrote this code: by industry_id week , sort: egen industry_mv = total( weeklymv).
Is this code OK, or should I use another one?
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30100

#17

19 May 2015, 11:15

Your code

Code:

by industry_id week , sort: egen industry_mv = total( weeklymv)

calculates the total market value of all firms in a specific industry in a specific week. Do I understand that you then want to follow this up with

Code:

gen weight = weeklymv/industry_mv
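
To make the weight construction concrete, here is a minimal sketch on made-up numbers (the six observations below are invented purely for illustration; only the variable names come from this thread). It checks that the weights sum to 1 within each industry-week.

```stata
* Sketch only: toy data, not from the original poster's data set.
clear
input long(firm_id industry_id week) float weeklymv
1 1 1 100
2 1 1 300
3 2 1  50
1 1 2 150
2 1 2 350
3 2 2  50
end

* Total market value of each industry in each week
by industry_id week, sort: egen industry_mv = total(weeklymv)

* Each firm's share of its industry's market value that week
gen weight = weeklymv/industry_mv

* Sanity check: the shares add up to 1 within every industry-week
by industry_id week: egen weight_sum = total(weight)
assert abs(weight_sum - 1) < 1e-6

list, sepby(industry_id week)
```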

Since you want to calculate the R2 for each firm in each year, you will probably want to use -statsby- to drive your iterated regressions. So rather than the code shown under "NOW DO YOUR REGRESSION" in #15 it will be something more like this:

Code:

gen int year = yofd(date) // CREATE YEAR VARIABLE
statsby e(r2_o), by(firm_id year) saving(r2_by_firm_by_year, replace): ///
    xtreg weekly_return weekly_market_return L1.weekly_market_return ///
    weighted_mean_industry_return L1.weighted_mean_industry_return

// MAYBE DO OTHER THINGS HERE

// WHEN YOU WANT TO FINALIZE THE CALCULATION OF SYNCHRONIZATION
use r2_by_firm_by_year, clear
rename _stat_1 r2
gen synch = log(r2/(1-r2)) // STATA NAMES ARE CASE-SENSITIVE: r2, NOT R2
// ...
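
For anyone who wants to see the whole pipeline run end to end, here is a self-contained sketch on simulated data. The firm count, the number of weeks, and the return-generating numbers are all assumptions made for illustration, and it uses plain -regress- with a single predictor rather than the lagged -xtreg- specification above.

```stata
* Sketch only: simulated returns for 2 hypothetical firms over 2 years.
clear
set seed 12345
set obs 160
gen byte firm_id = 1 + (_n > 80)
bysort firm_id: gen t = _n                 // 80 weekly observations per firm
gen int year = 2014 + (t > 40)             // 40 weeks in each of two years
gen weekly_market_return = rnormal(0, .02)
gen weekly_return = .8*weekly_market_return + rnormal(0, .02)

* One regression per firm-year; keep only the R-squared from each
statsby r2 = e(r2), by(firm_id year) clear: ///
    regress weekly_return weekly_market_return

* Logistic transformation of R-squared
gen synch = log(r2/(1 - r2))
list
```

Naming the statistic inside -statsby- (r2 = e(r2)) avoids the later -rename _stat_1 r2- step.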

Comment

Issa Almaharmeh

Join Date: May 2015

Posts: 72
#18

20 May 2015, 06:01

Hi Clyde,
statsby e(r2_o), by(firm_id year) saving(r2_by_firm_by_year, replace): ///
    xtreg weekly_return weekly_market_return L1.weekly_market_return ///
    weighted_mean_industry_return L1.weighted_mean_industry_return
I need your help again. I calculated everything; however, when I tried to run the above code to calculate R2, an error appeared on the screen: no observations
an error occurred when statsby executed regress. What do you think causes this error?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#19

20 May 2015, 09:09

Well, it means that for some combination of firm_id and year in your data set, there were no observations that had non-missing values for all of the variables in the regression. You will have to go through your data to find out which firm_ids and years (there may be more than one such combination) are responsible, and why. I'd do something like this:

Code:

by firm_id year, sort: egen obs_count = total(!missing(weekly_return, weekly_market_return, ///
    L1.weekly_market_return, weighted_mean_industry_return, L1.weighted_mean_industry_return))
tab firm_id year if obs_count == 0
Comment
Issa Almaharmeh

Join Date: May 2015

Posts: 72
#20

21 May 2015, 03:28

Hi Clyde, it's me again.
I tried to solve the problem, but it still exists. However, when I run the regression using just the following code, it works perfectly. Could you help me with this, please?

statsby e(r2), by(firm_id year industry_id firmname) saving("C:\Users\lza023\Desktop\clyde\r22 .dta", replace): regress weekly_return weekly_market_return weighted_mean_industry_return

I think that if we create two new variables with lagged values (one for the weekly market return and one for the weekly industry return), we can sidestep this problem.

Last edited by Issa Almaharmeh; 21 May 2015, 03:32.
Comment
Issa Almaharmeh

Join Date: May 2015

Posts: 72
#21

26 May 2015, 09:18

Hi,
Could you help me with this?
How can I create two new variables that represent last week's market return and last week's industry return? I tried the code below, but it is incorrect, because it fills the new variable with the value from the previous observation even when that observation is not from the immediately preceding week.
gen Lag.weekly_market_return = weekly_market_return[_n-1]
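
One possible way around this, sketched under the assumption that the data contain an integer week counter (the variable week and the five observations below are hypothetical): declare the panel structure with -xtset- and use the L1. operator, which returns missing when the immediately preceding week is absent instead of borrowing an older value. Note also that a Stata variable name cannot contain a period, so the new variable needs a name like lag_weekly_market_return.

```stata
* Sketch only: toy panel with a gap (firm 1 has no week 3).
clear
input long firm_id float(week weekly_market_return)
1 1 .010
1 2 .020
1 4 .040
2 1 .015
2 2 .025
end

* Tell Stata which variable identifies firms and which one is time
xtset firm_id week

* L1. looks up the value exactly one week earlier within the same firm
gen lag_weekly_market_return = L1.weekly_market_return

* Week 4 of firm 1 has no week 3 before it, so its lag is missing
list, sepby(firm_id)
```

The same approach would apply to the industry return. Once the data are -xtset-, L1.varname can also be typed directly in the regression command without creating new variables.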
Comment

Katharina Maier

Join Date: May 2019
Posts: 29

#22

05 Aug 2019, 13:01

Hello,

I would like to calculate for the variable LLR and LLP one R-squared per year (a yearly R-squared) for the period 1995-2018.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(year id LLR LLP)
2007  1  .011630812    .018261097
2017  2  .005206013   .0009052709
2018  2  .005082208   .0004451569
2015  2  .004713984             0
2016  2  .005010014   .0005392081
2014  2  .007704217  .00026062978
2014  3           0             0
2015  3           0             0
2011  4  .010910933    .002482072
2010  4  .014617286      .0030025
2012  4  .011325285  .00013020562
2008  4  .010616484    .003585204
2009  4  .016729187    .016583314
2007  4  .007800257    .001282144
2006  4  .007627713    .000923452
2010  5  .017166357             0
2011  6  .021382164     .01136998
2008  6  .008355642    .003057135
2012  6   .01443208    .004534268
2016  6  .007447635    .001197504
2006  6  .007908663     .00720697
2017  6  .007080396             0
2007  6  .008464329   .0009984137
2005  6  .007931727    .007931727
2013  6  .016162274    .005791808
2014  6  .012107523   .0006828538
end

As a first step I tried the collapse command.

Code:

collapse (mean) Z_score NPA LLR LLP, by(year)

Do you have any advice on how to calculate the R-squared per year now?

Thank you very much!

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#23

05 Aug 2019, 13:37

I think you've started down the wrong direction here. If you collapse the data -by(year)- you are left with only one observation per year, so you cannot regress anything separately each year.

If I understand what you want, your need is to do a separate regression of LLR and LLP on each year's data, and you want to keep the resulting values of R² in your data set. The easiest way to do that is:

Code:

rangestat (reg) LLR LLP, by(year) interval(LLR . .)

To do this, you need to install the -rangestat- command, written by Robert Picard, Nick Cox, and Roberto Ferrer, available from SSC.

In addition to saving the R² in new variable reg_r2, it will also save the number of observations in the regression, and the constant and LLP coefficients with their standard errors. If you really don't care about anything but the R², you can always just drop the others.

Applying this to your data you will get mostly missing values, because most of your years have only one or two associated observations in the example data, so no regression can be done for them. Presumably that will not be a problem in your real data set.
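
As a quick check of what -rangestat (reg)- returns, here is a minimal sketch that uses Stata's shipped auto data in place of the LLR/LLP data (an assumption made only so that every group has enough observations; foreign plays the role of year). It assumes -rangestat- can be installed from SSC and that the R² is stored in reg_r2 as described above.

```stata
* Sketch only: auto.dta stands in for the real data set.
capture which rangestat
if _rc ssc install rangestat

sysuse auto, clear

* One regression of price on mpg per group; results are stored in every row
rangestat (reg) price mpg, by(foreign) interval(price . .)

* The stored R-squared should match a plain regression on the same group
regress price mpg if foreign == 0
assert reldif(reg_r2, e(r2)) < 1e-8 if foreign == 0

tabstat reg_r2 reg_nobs, by(foreign)
```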
Comment

Katharina Maier

Join Date: May 2019
Posts: 29

#24

06 Aug 2019, 03:35

Thank you very much for the answer.

My model has four proxies: Z_score, NPA, LLR and LLP. I would like to have the R2 from regressing one proxy on the remaining proxies. If I run the command for the variables Z_score and NPA (sorry for not being consistent with the variables in my example):

Code:

rangestat (reg) Z_score NPA LLR LLP, by(year) interval(Z_score . .)
rangestat (reg) Z_score NPA LLR LLP, by(year) interval(NPA . .)

I'm getting the same R2 results for both proxies - but I would like to have different R2 per proxy per year, as they are different.
The var r2_Z_y and r2_NPA_y are calculated with your recommended code and provide the same R2 for both proxies.

I calculated the var r2Z and r2NPA with the following command:

Code:

rangestat (reg) NPA LLR LLP, by(year) interval(Z_score . .)
rangestat (reg) Z_score LLR LLP, by(year) interval(NPA . .)

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(year id) double(r2_Z_y r2_NPA_y r2Z r2NPA)
2007  1  .04292201416867168  .04292201416867155  .24540329410061493  .012139066731453374
2018  2 .027077205487478023  .02707720548747762 .054693615906753534  .005906411071376012
2014  2  .08566086394481906  .08566086394481863   .1668113368144799   .02088940270461034
2016  2  .04419999596636129 .044199995966361295  .08320280306320736  .002741493673789138
2015  2  .06083513561600463  .06083513561600479  .09546694490679361 .0011404215331245503
2017  2 .009036756095033894 .009036756095033934  .06295331827651317 .0001311581364094393
2015  3  .06083513561600463  .06083513561600479  .09546694490679361 .0011404215331245503
2014  3  .08566086394481906  .08566086394481863   .1668113368144799   .02088940270461034
2012  4  .23470561712117832   .2347056171211783   .3006329192901721    .1158844102105314
2009  4  .44657116404401265   .4465711640440104  .42396466949624134   .33783959978501665
2008  4  .21507077328341231  .21507077328341367   .3730466284194536   .15567357030252635
2011  4  .34858644725672916   .3485864472567291  .37157247935137044   .19935044857404371
2007  4  .04292201416867168  .04292201416867155  .24540329410061493  .012139066731453374
2010  4  .41871550471256525   .4187155047125658   .3986605814540165   .29012966330277556
2006  4 .017296440215681108  .01729644021568104  .11861571059604607 .0035124780274363495
2010  5  .41871550471256525   .4187155047125658   .3986605814540165   .29012966330277556
2017  6 .009036756095033894 .009036756095033934  .06295331827651317 .0001311581364094393
2013  6  .13652960537420314  .13652960537420275   .2479562858147592  .037229684886455235
2007  6  .04292201416867168  .04292201416867155  .24540329410061493  .012139066731453374
2009  6  .44657116404401265   .4465711640440104  .42396466949624134   .33783959978501665
2010  6  .41871550471256525   .4187155047125658   .3986605814540165   .29012966330277556
2016  6  .04419999596636129 .044199995966361295  .08320280306320736  .002741493673789138
2014  6  .08566086394481906  .08566086394481863   .1668113368144799   .02088940270461034
2006  6 .017296440215681108  .01729644021568104  .11861571059604607 .0035124780274363495
2018  6 .027077205487478023  .02707720548747762 .054693615906753534  .005906411071376012
2011  6  .34858644725672916   .3485864472567291  .37157247935137044   .19935044857404371
end

I would like to explain one risk proxy with the remaining risk proxies.

I am just not sure about my code, because the r2NPA values are so low and I do know that the NPA R2 values are higher.

Thank you!

Last edited by Katharina Maier; 06 Aug 2019, 03:37.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#25

06 Aug 2019, 18:21

Part of the difficulty is that you do not understand my code; the other part is that I do not understand what you want.

The -rangestat- command takes some getting used to, particularly how the -interval()- option works. I can see why you would think that changing which variable you give as the first argument in -interval()- would change the results: in most situations it would. But in this case it doesn't, because the -interval()- option's second and third arguments are both missing values. This is the, admittedly unintuitive, way to tell -rangestat- to ignore the -interval()- option. -interval(variable . .)- literally means include observations where variable takes on any value whatsoever--i.e. it does nothing. When either the second or third argument is non-missing, then -interval(variable low high)- tells -rangestat- to use, for each observation, all and only the observations whose value of variable lies within the bounds set by low and high (when these are given as numbers, they are offsets added to the current observation's value of variable). In that situation, changing the variable in the first argument would mean doing the calculations on different observations.

It might seem more natural to just omit the -interval()- option altogether when no selection is intended, but the syntax of -rangestat- does not permit that. So you can see now that both of your -rangestat- commands calculate exactly the same thing: they just regress Z_score on the rest of the variables, and save the regression results.
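
The point about -interval(variable . .)- can be checked directly. The following is only a sketch on Stata's shipped auto data (chosen as a stand-in, not the poster's data), and it assumes -rangestat- names its regression output reg_*, b_* and se_*. It runs the same regression twice with different first arguments in -interval()- and confirms that the R² agrees up to rounding.

```stata
* Sketch only: auto.dta stands in for the real data; foreign plays the role of year.
capture which rangestat
if _rc ssc install rangestat

sysuse auto, clear

* First run: price as the (ignored) interval variable
rangestat (reg) price mpg weight, by(foreign) interval(price . .)
rename reg_r2 r2_key_price
drop reg_* b_* se_*          // clear the other outputs so the second run can recreate them

* Second run: mpg as the (ignored) interval variable
rangestat (reg) price mpg weight, by(foreign) interval(mpg . .)

* Same regression, same observations, hence the same R-squared
assert reldif(r2_key_price, reg_r2) < 1e-8
```

Tiny differences in the last decimal places, like those visible in r2_Z_y and r2_NPA_y above, are floating-point rounding rather than different regressions.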

Now, here's what I don't understand about what you want. First, I don't understand the term proxies as you are using it. I suspect this is some kind of financial jargon--not my discipline. From what I see statistically they are just variables in regressions. More important, it seems that what you are looking for, in each of your -rangestat- commands, is something that is particularly focused on Z_score in one case and on NPA in the other. But I have no clue what statistics you are actually looking for with relation to those variables. Z_score is actually the outcome variable in both regressions, so there really aren't any statistics to be gathered about Z_score itself: it will not have any coefficients, and I really have no clue what you are looking for here. And there is no r2NPA: there is an overall R² for the regression. So I suspect you want something rather different. Perhaps you want to regress Z_score against just NPA, without LLR and LLP, and get the R² from that regression. Or maybe you want to do the regression of Z_score against NPA, LLR and LLP, and then do it again and get the difference in R² between those two regressions? Or maybe something else I haven't thought of?
Comment

Katharina Maier

Join Date: May 2019
Posts: 29

#26

07 Aug 2019, 13:45

Thank you for the answer and sorry for being so unclear - I am a bit in a rush due to my thesis' deadline ..

Z_score, NPA, LLR and LLP are the variables of my model. May I ask you to have a look at the attached paper, page 5, Figure 2 b? I need the R-squared of each risk proxy per year to make the graph.
Here is my data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(year id LLR LLP NPA Z_score)
2007  1  .011630812    .018261097   .015062087 -1.5716723
2014  2  .007704217  .00026062978    .01757166 -1.6909508
2015  2  .004713984             0    .02558387  -1.714116
2016  2  .005010014   .0005392081    .01931906 -1.9513158
2017  2  .005206013   .0009052709    .01653177  -1.886552
2018  2  .005082208   .0004451569   .012838008 -1.8675916
2014  3           0             0            0  -3.228743
2015  3           0             0            0  -3.197962
2006  4  .007627713    .000923452  .0026934016 -1.8396757
2007  4  .007800257    .001282144  .0011002329 -1.9295822
2008  4  .010616484    .003585204   .013777885  -1.963218
2009  4  .016729187    .016583314    .02593907 -1.9244528
2010  4  .014617286      .0030025   .017214797 -1.9180492
2011  4  .010910933    .002482072   .018298596  -1.586883
2012  4  .011325285  .00013020562    .01927564  -1.660184
2010  5  .017166357             0    .02333609 -1.6759908
2005  6  .007931727    .007931727            0 -2.2908888
2006  6  .007908663     .00720697            0 -1.8879156
2007  6  .008464329   .0009984137    .02762278 -1.6747932
2008  6  .008355642    .003057135   .019206095  -1.449307
2009  6   .02964397     .03421423    .09687161  -.8901772
2010  6  .029940894    .035797488     .1661957  -.1181851
2011  6  .021382164     .01136998    .17405207  -.9856699
2012  6   .01443208    .004534268     .1530683 -1.1408491
2013  6  .016162274    .005791808    .15281764 -1.3834363
2014  6  .012107523   .0006828538    .13720109 -1.4684038
end

Just as an example, for an earlier calculation of the full-period R2 I did this:

Code:

xtreg Z_score NPA LLR LLP i.year, fe vce(cluster id)
xtreg NPA Z_score LLR LLP i.year, fe vce(cluster id)
xtreg LLR  NPA Z_score LLP i.year, fe vce(cluster id)
xtreg LLP LLR  NPA Z_score i.year, fe vce(cluster id)

So I am running four regressions, and each regression has a different dependent variable.

This time I need the R2 per year with Z_score as the dependent variable (and NPA, LLR, LLP as explanatory variables);
then I need the R2 per year with NPA as the dependent variable (and Z_score, LLR, LLP as explanatory variables),
etc.

The goal is to make a graph with the R2 values on the y-axis and the years (1995-2018) on the x-axis.

Thank you for your help - and I hope this time it was clearer

Attached Files

Bank Risk Proxies and the Crisis of 2007_09.pdf (448.6 KB, 1 view)

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#27

07 Aug 2019, 14:07

I think I understand what you are looking for now. I think the following code will do it:

Code:

local vbles LLR LLP NPA Z_score
levelsof year, local(years)
foreach v of varlist `vbles' {
    local outcome `v'
    local ind_vars: subinstr local vbles "`v'" ""
    gen r2_`v' = .
    foreach y of local years {
        capture regress `outcome' `ind_vars' if year == `y'
        if c(rc) == 0 { // SUCCESSFUL REGRESSION
            replace r2_`v' = e(r2) if year == `y'
        }
        else if !inlist(c(rc), 2000, 2001) { // UNEXPECTED REGRESSION ERROR
            display as error "Unanticipated error when year == `y'"
            exit c(rc)
        }
    }
}

Notes.

1. This code does not work with your example data because in your example data no year has enough observations to support a regression with three predictors. Presumably this will not occur with your real data. But if the code doesn't work, when you post back, be sure to include a new set of example data that contains a minimum of 6 observations per year.

2. I have written the code anticipating that although most years will have enough data to support the regressions, some won't. When years with too few observations for the regressions are encountered, they will be skipped over and you will not be given any warnings or error messages: you will be able to recognize them because all the r2 variables will have missing values for those years. If, however, any other error arises during the attempt to do regression, you will get an error message and execution will terminate so that you can identify and fix the problem, and then start over.
Comment
Kye Lippold

Join Date: Jun 2019

Posts: 67
#28

07 Aug 2019, 15:30

I agree with Clyde's approach--that will get you the R-squared for the regressions of each indicator on the other indicators by year, which is what I understand your guiding paper to be doing. The final step to make a graph would be

Code:

sort year id
by year: gen graph = _n==1
line r2_* year if graph

In other words, although the r2 values are being saved for every id in each year, you will only need to plot one point per year.
Comment
Katharina Maier

Join Date: May 2019

Posts: 29
#29

08 Aug 2019, 02:31

Thank you so much, Clyde and Kye. It worked perfectly! I can't express how thankful I am.
Throughout my thesis, working with Stata for the first time, I have learned so much from the Statalist members.
I really appreciate your support!

Last edited by Katharina Maier; 08 Aug 2019, 02:36.
Comment

Ayub UOM

Join Date: Feb 2018
Posts: 83

#30

13 Sep 2019, 03:22

Hello Clyde Schechter,
I have read the above discussion and tried to calculate my variable SYNCHi (a measure of annual synchronicity for firm i). In estimating our model we require that daily return data be available for at least 200 trading days in each fiscal year.
But I am confused about #6: how can I generate my date variable?
I want to calculate R2 for each firm in each year, using daily data. Although I have split my date variable (trddt) into 3 parts (month, day, year), I am still confused about how to generate year as in #6 above.

Code:

 gen year = yofd(date)

This is the format of my data set. trddt is a string variable, which I have split into 3 parts: date1 is the month, date2 is the day, and date3 is the year.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long firm_id str10 trddt float(firm_return market_return) byte(date1 date2) int date3
2 "7/25/2006"  -.006623  .012842  7 25 2006
2 "9/20/2006"   .047026   .00358  9 20 2006
2 "7/12/2006"   .047782 -.000361  7 12 2006
2 "3/7/2006"   -.035316 -.029588  3  7 2006
2 "6/30/2006"         0  -.00171  6 30 2006
2 "3/6/2006"   -.012844 -.002697  3  6 2006
2 "6/5/2006"    .028369  .017466  6  5 2006
2 "12/8/2006"   -.04906  -.03576 12  8 2006
2 "1/16/2006"  -.029213  -.01354  1 16 2006
2 "3/29/2006"   .016897   .00541  3 29 2006
2 "3/1/2006"    .005495  .004738  3  1 2006
2 "1/18/2006"   .022523  .020927  1 18 2006
2 "4/20/2006"  -.023495 -.004301  4 20 2006
2 "8/8/2006"        .02  .032161  8  8 2006
2 "4/13/2006"  -.032984 -.025997  4 13 2006
2 "3/22/2006"   .036395  .011777  3 22 2006
2 "6/19/2006"   .001795  .016827  6 19 2006
2 "5/29/2006"         0  .032838  5 29 2006
2 "6/28/2006"         0 -.000707  6 28 2006
2 "4/12/2006"  -.030523 -.005528  4 12 2006
2 "10/31/2006"   .02375  .009524 10 31 2006
2 "2/24/2006"   .010811   .00931  2 24 2006
2 "12/20/2006"    .0349  .014965 12 20 2006
2 "9/8/2006"    .010432    .0015  9  8 2006
2 "6/7/2006"   -.007156  -.07001  6  7 2006
2 "5/10/2006"  -.027417  .018669  5 10 2006
2 "8/11/2006"   .001709  .004782  8 11 2006
2 "9/7/2006"   -.008863 -.015603  9  7 2006
2 "7/26/2006"  -.011667 -.000868  7 26 2006
2 "12/6/2006"  -.015748 -.013642 12  6 2006
2 "3/24/2006"  -.010017 -.003826  3 24 2006
2 "2/21/2006"   .012456  .019256  2 21 2006
2 "6/13/2006"  -.024436  .004039  6 13 2006
2 "1/17/2006"   .027778   .00371  1 17 2006
2 "12/13/2006" -.002317  .003021 12 13 2006
2 "5/18/2006"   .016447  .005524  5 18 2006
2 "10/12/2006" -.022069  -.00773 10 12 2006
2 "4/7/2006"   -.020086  .007383  4  7 2006
2 "9/22/2006"  -.003906 -.007956  9 22 2006
2 "12/12/2006"   .01251  .002495 12 12 2006
2 "9/11/2006"   .041298  .006474  9 11 2006
2 "10/17/2006"  .012658 -.001869 10 17 2006
2 "12/11/2006"  .099742  .040086 12 11 2006
2 "12/1/2006"   .010017  .008993 12  1 2006
2 "11/29/2006"  .003687   .01667 11 29 2006
2 "8/15/2006"   .049236  .019047  8 15 2006
2 "11/1/2006"  -.007326  .005581 11  1 2006
2 "8/9/2006"     .00713  -.00176  8  9 2006
2 "1/12/2006"   .004338  .017924  1 12 2006
2 "3/20/2006"   .045788  .013699  3 20 2006
end

When I run the command below, I get just one R2 value per firm, not one per firm per year.
So do I need to merge this value into my main dataset using the (merge m:1 year code using.......) command or not? Or am I making some mistake? I need a per-firm, per-year value of R2 so that I can calculate my DV.

Code:

gen year = yofd( date2 )
statsby e(r2), by(firm_id year) saving(regression_resultsa, replace): regress firm_return market_return
use regression_resultsa, clear
rename _stat_1 r2
gen funny_statistic = log(r2/(1-r2))

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long firm_id float(year r2 funny_statistic)
 2 1960   .4400587  -.2409238
 4 1960   .3260496   -.726107
 5 1960   .4442286 -.22401793
 6 1960   .4421526  -.2324303
 7 1960  .24893935 -1.1042771
 8 1960   .3595278  -.5774141
 9 1960   .4568741  -.1729334
10 1960   .2068534  -1.343998
11 1960   .2573787 -1.0596377
12 1960   .4286664 -.28729412
14 1960   .4194842   -.324891
16 1960   .5378363  .15163514
17 1960 .006158102  -5.083809
18 1960   .3026383  -.8347657
19 1960   .4388093 -.24599595
20 1960  .04612292  -3.029225
21 1960   .5403796  .16187085
22 1960   .5623184  .25057647
23 1960   .4775431 -.08988801
24 1960     .35695 -.58862656
25 1960   .3736776  -.5164719
26 1960   .4445098 -.22287884
27 1960   .5481623   .1932483
28 1960   .3831651  -.4761356
29 1960   .5094335  .03773851
30 1960   .1908997 -1.4441748
31 1960   .4573157  -.1711536
32 1960   .5858448   .3468142
33 1960  .43767145 -.25061777
34 1960  .19768927 -1.4007995
35 1960  .22116867  -1.258869
36 1960   .4440883   -.224586
37 1960  .47868785 -.08530027
38 1960  .16475293 -1.6232806
39 1960    .554401  .21846873
40 1960   .4472794  -.2116692
42 1960   .4545721 -.18221404
43 1960   .4725165 -.11004477
45 1960   .4447936  -.2217294
46 1960  .39308295  -.4343715
48 1960   .4174713  -.3331627
49 1960   .4001679  -.4047656
50 1960  .52243215  .08978887
55 1960    .511385  .04554797
56 1960  .30922085  -.8037644
58 1960     .34058  -.6607108
59 1960  .56699234  .26959038
60 1960  .52493787   .0998343
61 1960   .3612084 -.57012314
62 1960   .4895909 -.04164248
end

thank you in advance for your time

Last edited by Ayub UOM; 13 Sep 2019, 03:46.
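
A note on the output above: year equals 1960 in every row, which is what -yofd()- returns when it is given a day of the month. -yofd()- expects a Stata daily date (days since 01jan1960), so values 1 to 31 all fall in January 1960 and every firm ends up with a single "year". A minimal sketch of one possible fix, using four rows copied from the example data; the new variable name date is an assumption:

```stata
* Sketch only: four rows taken from the -dataex- example above.
clear
input long firm_id str10 trddt float(firm_return market_return)
2 "7/25/2006" -.006623  .012842
2 "9/20/2006"  .047026   .00358
2 "3/7/2006"  -.035316 -.029588
2 "12/8/2006"  -.04906  -.03576
end

* Convert the month/day/year string into a Stata daily date
gen date = daily(trddt, "MDY")
format date %td

* Now -yofd()- receives a real daily date and returns the calendar year
gen int year = yofd(date)
list firm_id trddt date year
```

Since date3 already holds the year, gen int year = date3 should give the same result. With a correct year variable, the -statsby ..., by(firm_id year)- command would return one R2 per firm per year.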

Comment