How to get the R2 from the regression

Issa Almaharmeh

Join Date: May 2015

Posts: 72
#1

How to get the R2 from the regression

14 May 2015, 04:57

Dear All I try to run a regression for each firm with weekly data and I need to use the resulted R2 from the regression for each firms to get the yearly R2 and transform it to log(R2/1-R2). I'm quite new Stata learner. So please any suggestion.
The model is as follow:
Weekly firm return i,t = a + B weekly Market return+ e
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

14 May 2015, 08:43

You don't say how you want to store these statistics once you get them. But here's a start. Let me assume you have variables firm_id, date (weekly), firm_return, and market_return.

Code:

xtset firm_id date statsby e(r2), by(firm_id): regress firm_return market_return preserve use regression_results, clear rename _eq2_stat1 r2 gen funny_statistic = log(r2/(1-r2))

This code will replace the data in memory by a new data set containing the firm_id, and its correspondingR2, and log(R2/(1-R2)). If you want to keep your existing data in memory, then look at the -saving()- option of -statsby-. Also, depending on what else you want to do, it might be a good idea to save _b, also. Anyway, -statsby- is the simplest way to get what you need here. Do read the corresponding section of the manual for more details so you can customize this code.
Comment
Issa Almaharmeh

Join Date: May 2015

Posts: 72
#3

14 May 2015, 10:32

Sorry Clyde I want to store the yearly average of R2 for each firm. Then transform it by log(r2/1/r2) which will be my dependent variable.
the next thing when I try to xtset the time variable and firm_id this message appear repeated time values within panel. is this will create a problem or no?
Comment
Issa Almaharmeh

Join Date: May 2015

Posts: 72
#4

14 May 2015, 10:37

I just checked my time variable and it's format as follow: %tdnn/dd/CCYY is that OK?
Comment
Issa Almaharmeh

Join Date: May 2015

Posts: 72
#5

14 May 2015, 10:52

I have run the code however its seems to me that it calculate the pooled R2 for each firm in all the years. I'm sorry if it wasn't clear from the beginning, but I need the r2 value for each firm in each year.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#6

14 May 2015, 14:18

the next thing when I try to xtset the time variable and firm_id this message appear repeated time values within panel. is this will create a problem or no?

Well, it will keep you from -xtset-ting the data, but actually that command wasn't really necessary and still isn't. By itself, it isn't necessarily a problem. But you originally described your data as being "weekly," which would normally be understood to mean one observation per week per firm. If you really have weekly data, then there shouldn't be repeated time values within a panel. So perhaps your data are not what you think they are and you need to investigate or correct them before doing anything else. In particular, it is possible that the observations having the same firm_id and date are duplicates of each other and you need to remove all but one from each set of duplicates. Or it may be that your data set is "contaminated" with extraneous data that crept in some time during its creation. You need to find out. At the end of the day, if the data are correct and there properly are multiple observations with the same firm_id and date, that message is no obstacle to proceeding.

I just checked my time variable and it's format as follow: %tdnn/dd/CCYY is that OK?

As far as getting the R2's is concerned, it doesn't matter what the format is. The format only affects how Stata shows the values of the variable in the output of certain commands (e.g. -list-, -browse-). Now, that format will result in sensible looking displays if your variable itself is a Stata internal daily date. In the code I write below, that needs to be true. So I would -list- or -browse- my date variable to make sure that what you see makes sense as dates in the time period you are analyzing.

I have run the code however its seems to me that it calculate the pooled R2 for each firm in all the years. I'm sorry if it wasn't clear from the beginning, but I need the r2 value for each firm in each year.

Yes, the code I wrote earlier was intended to calculate the pooled R2 for each firm in all the years, which is what I understood you to want. I apparently didn't notice the word "yearly." Sorry about that. So, to do it for each firm in each year is just a small modification of the earlier code:

Code:

gen year = yofd(date) statsby e(r2), by(firm_id year) saving(regression_results, replace): regress firm_return market_return use regression_results, clear rename _eq2_stat1 r2 gen funny_statistic = log(r2/(1-r2))

So the key changes are, -xtset- is gone: it was never actually needed anyway. The first line of the new code calculates the year for each observation. The -statsby- command now has firm_id and year in its -by()- option. I added the -saving(regression_results, replace)- option: with this, -statsby- will not overwrite the memory in data but will save it as a separate data set, which you can then do whatever you like with. -preserve- is gone: that probably shouldn't have been there in the first place, but if you need it you can put it back in.

If you are going to be doing a lot of work with financial data, it is well worth your while to invest time in not just learning, but mastering, the -statsby- command, as well as -rolling-. These seem to come up a great deal in that line of work.
1 like
Comment
Issa Almaharmeh

Join Date: May 2015

Posts: 72
#7

14 May 2015, 21:31

Dear Clyde
Thank a lot for your help I will try the code tomorrow morning.
Thanks again for your cooperation.
Comment
Issa Almaharmeh

Join Date: May 2015

Posts: 72
#8

16 May 2015, 04:00

Hi Clyde I tried the code and it work perfectly. Thank you for this. Another issue may by more complicated than this one is that the firms are located in different industries so the firms with same industry have the same industry ID is there any way to calculate the annual standard deviation of the weekly return for all the firms inside the same industry and print the results near each firm in each year. In addition how can I know the number of firms inside each industry and print it near each firm in each industry. So my data will look like this
firm _id / industry_id / number of firms in the industry / standard deviation of industry weekly return.
these data will be for each firm in each year.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#9

16 May 2015, 10:19

So, I assume you have a variable called weekly_return and you have weekly observations on each firm. Then you can do this:

Code:

// SD OVER A YEAR OF WEEKLY RETURNS gen year = yofd(date) egen sd_weekly_return = sd(weekly_return), by(industry_id year) // NUMBER OF FIRMS PER INDUSTRY by industry_id year firm_id (date), sort: gen firm_count = 1 if _n == 1 by industry_id year: replace firm_count = sum(firm_count) by industry_id year: replace firm_count = firm_count[_N]

Note: It is unclear to me whether the number of firms in an industry can change from year to year in your data. The above code assumes it can.

This will generate variables in your data set. I don't know what you have in mind when you refer to printing.

If you are going to be using Stata with any frequency and seriousness, you should master the material in the manual sections on -by- and -egen-. They are indispensable data management tools.
Comment
Issa Almaharmeh

Join Date: May 2015

Posts: 72
#10

18 May 2015, 03:44

Thank you Clyde for your help. your comments and codes are very helpful.
I just wonder if there is any code to drop any firm off my sample\year if the firm's weekly return equal zero for 7 weeks or more during any year. In other words for each firm to be included in the sample\ year it should be traded at least 45 weeks during the year, so if the weekly firm's return is equal zero for any 7 weeks during the year that firms should be doped out of the sample in that year.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#11

18 May 2015, 08:47

Code:

egen zero_return_weeks = total(weekly_return == 0), by(firm year) drop if zero_return_weeks >= 7

This assumes that 7 weeks during any year means 7 weeks during any calendar year, not 7 weeks during any 365 day interval.
Comment
Issa Almaharmeh

Join Date: May 2015

Posts: 72
#12

18 May 2015, 10:17

Hi

I tried the first part of the code however it show an error
firm ambiguous abbreviation
(error in option by())
r(111);
I have a column represent the firms weekly return, and some firms have a weekly return equal zero.
Comment
Issa Almaharmeh

Join Date: May 2015

Posts: 72
#13

18 May 2015, 10:41

sorry I think I sort it, the end of the first part of the code need a slight adjustment as follows : by(firm_id year). thank you very much for your cooperation and priceless help. you cannot imagine how you make me very happy, really I cannot find any words to thank you.
Comment
Issa Almaharmeh

Join Date: May 2015

Posts: 72
#14

18 May 2015, 22:47

Hi
thank you again for your help. I have another model which need a complex code. The model is as follow:
weekly firm i return = a+ B1 weekly market return+B2 last week market return+B3 industry return+B4 last week industry return+e.

I have the weekly market return and the weekly firm return. I need to calculate the weekly industry return which is the weighted average return for all the firm with the same industry excluding firm i return ( the weight is based on market value, and I have the weekly market value for each firm) . Another condition on the regression is that any firm that traded less than 45 weeks(meaning if it has 0 return for more 7 weeks) should be excluded from the regression. I need to use the resulted R2 from the previous regression and transform it to what we called stock price synchronicity which is the part of stock return that cannot be explained by market return and industry return. So SYNCH for firm i in year t = Log(R2/1-R2).
Any suggestions please
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#15

19 May 2015, 09:21

So let's say that you have a variable, industry_id that denotes the industry, and firm_id identifying firms. You have a variable, called date, which is a weekly date. You have a variable called weight that you are using to weight your industry average. Your weekly return variable is called weekly_return

Code:

xtset firm_id date // CALCULATE WEIGHTED MEAN EXCLUDING INDEX FIRM by industry_id week, sort: egen weighted_total_industry_return = total(weekly_return*weight) by industry_id week: egen sum_of_weights = total(weight) // BY DIVIDING WEIGHTED TOTAL BY SUM OF WEIGHTS // AFTER SUBTRACTING OUT THE FIRM'S CURRENT WEEKLY RETURN AND WEIGHT gen weighted_mean_industry_return= /// (total_industry_return - weekly_return*weight)/(sum_of_weights - weight) // EXCLUDE FIRMS WITH 0 RETURN FOR 7 OR MORE WEEKS by firm_id, sort: egen weeks_of_zero_return = total(weekly_return == 0) drop if weeks_of_zero_return > 7 // NOW DO YOUR REGRESSION xtreg weekly_return weekly_market_return L1.weekly_market_return /// weighted_mean_industry_return L1.weighted_mean_industry_return local synch = log(e(r2_o)/(1-e(r2_o))) display as text "SYNCH = " `synch'

Note: I could not tell from your explanation whether you intended to do this as a single regression over all industries and firms (which is what the code above does) or whether you want to do this separately for each industry. If the latter, you can wrap the regression in -statsby-.

I also could not tell what kind of regression you want to do. You have panel data and you need to use an appropriate model that accounts for the within-firm dependence of observations. What I wrote above is a random effects regression. But you may prefer fixed-effects. And, in either case, you might also want to use the vce(cluster firm_id) option. But I will leave those decisions to you, as I don't know this subject matter well enough to offer an informed opinion.

Also, your explanation did not say which R2 statistic you are interested in: with panel regressions there are at least three. There is the within-cluster R2, the between cluster R2, and the overall R2. The above code uses the overall R2. You may want to change that to one of the others depending on what this is all about. See the help for -xtreg- for how to access those from e().
Comment

Announcement

How to get the R2 from the regression

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment