Estimating discretionary accruals using the modified Jones (1991) - Differences in regression

Martim Pinto

Join Date: Dec 2015

Posts: 13
#1


22 Jan 2016, 17:27

Hello Statalist members

I'm trying to estimate the following model (modified Jones):

where:
TA = Total accruals
delta Rev = change in sales (Sales_t - Sales_t-1)
delta AR = change in receivables (Receivables_t - Receivables_t-1)
PPE = Property, plant and equipment
A = Total assets, lagged one year (every term is scaled by it)
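The equation image from the original post did not survive. Assuming it showed the usual modified Jones specification, which is consistent with the definitions above and with the regressors x1, x2 and x3 in the code below, it would read roughly as follows (a reconstruction, not the original image; DA is a label introduced here for discretionary accruals, i.e. -disc_accruals-). Note that Stata's -regress- also estimates a constant unless -noconstant- is specified, so the posted regression includes an intercept that this form does not have.

```latex
\frac{TA_{it}}{A_{i,t-1}} = \alpha_1 \frac{1}{A_{i,t-1}}
  + \alpha_2 \frac{\Delta Rev_{it} - \Delta AR_{it}}{A_{i,t-1}}
  + \alpha_3 \frac{PPE_{it}}{A_{i,t-1}} + \varepsilon_{it}

DA_{it} = \frac{TA_{it}}{A_{i,t-1}}
  - \left( \hat{\alpha}_1 \frac{1}{A_{i,t-1}}
  + \hat{\alpha}_2 \frac{\Delta Rev_{it} - \Delta AR_{it}}{A_{i,t-1}}
  + \hat{\alpha}_3 \frac{PPE_{it}}{A_{i,t-1}} \right)
```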

I got the code below from the thread http://www.statalist.org/forums/foru...ied-jones-1991.
I'm running it at the one-digit SIC level instead of the two-digit level, because the two-digit version would take even longer (this one is already slow).
Before running it, I:
adjusted all variables to 2014 prices using the CPI;
winsorized all variables at 1%.
Then I ran the code below. The final variable, -disc_accruals-, should give me what I want (discretionary accruals), but I don't think the values I obtain are what they should be.
Should I apply the CPI adjustment and the winsorizing only to the final variable (-disc_accruals-), or does that make no difference?

Code:

clear
use Wn2014WSvarsLAGGED
sort sedol year
by sedol: gen delta_rev = Net_Sales - Net_Sales_1
by sedol: gen delta_ar = Net_Receivables - Net_Receivables_1
gen SIC1 = real(substr(sic,1,1))
encode sedol, generate(numsedol)
xtset numsedol year
gen disc_accruals = .
gen TAccruals = (NIncome_Before_Extra_Items - Net_CashFlow_Oper_Acti)/Total_Assets_1
gen x1 = 1/Total_Assets_1
gen x2 = (delta_rev - delta_ar)/Total_Assets_1
gen x3 = ppegross/Total_Assets_1
forvalues j = 1/`=_N' {
    capture noisily {
        reg TAccruals x1 x2 x3 if SIC1 == SIC1[`j'] & year == year[`j'] & _n != `j'
        if e(N) >= 10 {
            replace disc_accruals = TAccruals - (_b[x1] * x1 + _b[x2] * x2 + _b[x3] * x3) in `j'
        }
    }
}

When I asked a colleague about my problem, he gave me the code below.
All I can say is that if I was getting strange values with the code above, this one gives me even stranger values. The code above skips the calculation when there are fewer than 10 estimation observations in the group (one-digit SIC code by year). The code below has no such restriction, so I think it doesn't do what it is supposed to (something is missing).

Code:

use Wn2014WSvarsLAGGED
sort sedol year
by sedol: gen delta_rev = Net_Sales - Net_Sales_1
by sedol: gen delta_ar = Net_Receivables - Net_Receivables_1
by sedol: gen TAccruals = (NIncome_Before_Extra_Items - Net_CashFlow_Oper_Acti)/Total_Assets_1
* note: delta_rec is not created above (delta_ar is)
by sedol: gen C = (delta_rev - delta_rec)/Total_Assets_1
by sedol: gen D = ppegross/Total_Assets_1
sort sedol year
*Regression
by sedol: reg TAccruals C D
*yhat corresponds to the total accruals estimated by the regression
predict yhat, xb
*discretionary accruals are the difference between the total accruals reported by the enterprise and the total accruals estimated by the regression
gen disc_accruals = TAccruals - yhat

Sorry for the long post; I tried to be as clear as possible. If more information is needed about anything, please feel free to ask. I would really appreciate any help, since I am completely lost (this is for my master's thesis in finance).

Best regards
Clyde Schechter

Join Date: Apr 2014

Posts: 29957
#2

22 Jan 2016, 18:14

First, a disclaimer: I have no idea what this model means or is about. I'm just looking at the two sets of code and comparing them.

There are a number of differences between them.

1. The regression command in your colleague's code does not contain a term for 1/Total_Assets. It contains only the other two regressors. So it's an entirely different model, and appears to differ from the equation at the top of your post.

2. His regress command is carried out once for each sedol (which is not the same thing as SIC1, is it?) and includes all observations for that sedol, in all years. Your regress command is carried out once for each observation in the data set and uses only those observations with the same SIC1 and year as that observation, but also excludes that observation itself. So it's a very different estimation set.

3. His calculation of disc_accruals is different from yours (and, of the two, I believe yours is more likely to be correct). The problem is that whereas you calculate _b[x1] * x1 + _b[x2] * x2 + _b[x3] * x3 separately for each regression, he first runs all of the regressions with his -by sedol: reg...- command and then runs -predict- just once. The problem with that is that predict is applying only the coefficients from the very last regression to all of the data, regardless of which regression they participated in. It is highly unlikely that his approach is correct--it really doesn't make sense unless there is something very special about the last value of sedol, such that its regression coefficients should be used for all the others.

So those are three very major differences between the two approaches shown.
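To illustrate point 3, here is a minimal sketch in which -predict- runs immediately after each regression, so every observation is scored with the coefficients from its own SIC1-year cell. It is the simpler in-sample variant: unlike the code in #1 it does not leave out the observation being scored, and it keeps the constant that -regress- fits by default. The variable -cell- and the simulated data are hypothetical, for illustration only.

```stata
* Simulated data for illustration only; with real data, start instead from
* a dataset that already contains TAccruals, x1, x2, x3, SIC1 and year
clear
set seed 12345
set obs 600
gen SIC1 = 1 + mod(_n, 3)
gen year = 2010 + mod(floor(_n/3), 4)
gen x1 = 1/(100 + 900*runiform())
gen x2 = rnormal(0, 0.1)
gen x3 = runiform()
gen TAccruals = 0.01 + 2*x1 + 0.1*x2 - 0.05*x3 + rnormal(0, 0.05)

* Hypothetical identifier for each SIC1-year estimation cell
egen cell = group(SIC1 year)
gen double disc_accruals = .

summarize cell, meanonly
local ncells = r(max)
forvalues g = 1/`ncells' {
    * Estimate the model within one cell only
    capture regress TAccruals x1 x2 x3 if cell == `g'
    * Skip cells that fail to estimate or have fewer than 10 observations
    if _rc == 0 & e(N) >= 10 {
        * Score this cell now, before the next -regress- replaces e(b)
        predict double yhat_cell if cell == `g', xb
        replace disc_accruals = TAccruals - yhat_cell if cell == `g'
        drop yhat_cell
    }
}
```

Because each cell is scored before the next regression runs, no cell is ever scored with another cell's coefficients, which is exactly what goes wrong with a single -predict- after -by sedol: reg-.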

Reading your code, it looks to me as if it does reflect the equation you are aiming to model. I can criticize several things about the code, but none of them seem to lead to errors:

1. -by sedol:- in front of your -gen- commands accomplishes nothing and can be omitted as nothing in the -gen- commands makes reference to the blocks defined by sedol. For this reason, the -sort sedol year- command is also superfluous.

2. The -encode sedol- command and the subsequent -xtset- command are not needed because you don't use any -xt- commands subsequently (unless they are elsewhere in your code and not shown here.)
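With those two remarks applied, the leave-one-out loop from #1 could be trimmed as sketched below (simulated data again, purely for illustration). One further detail may be worth checking: -regress- fits a constant (_b[_cons]) unless -noconstant- is given, but the fitted value in #1 leaves it out, so -disc_accruals- there also contains the estimated intercept. This sketch includes _b[_cons]; if the intended model has no intercept, adding -noconstant- to -regress- and dropping _b[_cons] would be the alternative.

```stata
* Simulated data for illustration only (replace with your own dataset)
clear
set seed 2016
set obs 240
gen SIC1 = 1 + mod(_n, 2)
gen year = 2012 + mod(floor(_n/2), 3)
gen x1 = 1/(100 + 900*runiform())
gen x2 = rnormal(0, 0.1)
gen x3 = runiform()
gen TAccruals = 0.01 + 2*x1 + 0.1*x2 - 0.05*x3 + rnormal(0, 0.05)

* No -by sedol:-, -sort-, -encode- or -xtset- needed for this step
gen double disc_accruals = .
forvalues j = 1/`=_N' {
    * Same SIC1 and year as observation j, excluding j itself
    capture regress TAccruals x1 x2 x3 if SIC1 == SIC1[`j'] & year == year[`j'] & _n != `j'
    if _rc == 0 & e(N) >= 10 {
        * Include the intercept that -regress- estimated
        quietly replace disc_accruals = TAccruals - (_b[_cons] + _b[x1]*x1 + _b[x2]*x2 + _b[x3]*x3) in `j'
    }
}
```

The data must not be re-sorted inside the loop, because SIC1[`j'] and year[`j'] refer to observation numbers.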

I can't comment on the right place for winsorizing in this problem. I never use it myself, and have never heard a convincing reason why it is ever appropriate--but I do realize that it is commonly used in some fields, so I won't pursue that issue here.
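This sketch does not settle where to winsorize either. For reference only, two-sided winsorizing of a single variable at 1% (assumed here to mean capping at the 1st and 99th percentiles) can be done with built-in commands; the variable names and the simulated data are hypothetical.

```stata
* Simulated variable for illustration only (hypothetical)
clear
set seed 2016
set obs 1000
gen double disc_accruals = rnormal(0, 0.1)

* 1st and 99th percentiles of the original variable
_pctile disc_accruals, percentiles(1 99)
scalar lo = r(r1)
scalar hi = r(r2)

* Copy, then cap values outside [lo, hi]; missing values stay missing
gen double disc_accruals_w = disc_accruals
replace disc_accruals_w = scalar(lo) if disc_accruals_w < scalar(lo)
replace disc_accruals_w = scalar(hi) if disc_accruals_w > scalar(hi) & !missing(disc_accruals_w)
```

The !missing() condition matters because Stata treats missing values as larger than any number.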
Martim Pinto

Join Date: Dec 2015

Posts: 13
#3

22 Jan 2016, 19:25

Clyde Schechter, thank you so much!
I really appreciate your help and effort. I admire such wisdom; you post more than four posts a day in this forum. You are a genius, kudos to you.
Your comment led me to another problem:

For example, take this data:

Country  Enterprise  SIC  Year  Assets
A        1           5    2000  10
         1           5          15
A        1           5    2002  20
         2           6          30
B        2           6    2001  50
         2           6          60

Where:
Enterprise is a unique number identifying each enterprise. A given Enterprise always has the same SIC (Standard Industrial Classification) and the same Country.

I would like to know whether it is possible to fill in the missing Year values, so that each one becomes the year before the next known year (or the year after the previous one), within each Enterprise (when a new Enterprise begins, the sequence starts over). For information, I don't have a minimum Year, but I do have a maximum (2014).
If it were necessary to delete all data before 2000, that would be fine; then it would only be necessary to repeat the years 2000 to 2014 throughout the dataset.

I would also like to fill in Country, given that each Enterprise always belongs to one specific country. How could I do that?

Sorry if this is off topic; it just came to mind after Clyde Schechter's solution to my problem.

Thanks once again
Best regards
Clyde Schechter

Join Date: Apr 2014

Posts: 29957
#4

22 Jan 2016, 20:59

So to fill in the years, if I understand the situation correctly, you want:

Code:

gen long obs_no = _n // CAPTURE CURRENT SORT ORDER OF THE DATA

// REPLACE MISSING YEAR BY PREVIOUS ENTRY PLUS 1
by Enterprise (obs_no), sort: replace Year = Year[_n-1] + 1 if missing(Year) & _n > 1

// NOW REPLACE REMAINING MISSING YEARS BY SUBSEQUENT YEAR - 1
gsort Enterprise -obs_no
by Enterprise: replace Year = Year[_n-1] - 1 if missing(Year) & _n > 1

Note: I'm not entirely sure I understood your description of the situation with years, so if the pattern of missing Years is not the way it appears in the example you showed, this may not work. If that happens, please repost some more example data showing a case where this code gets it wrong.

And to fill in Country from the observations where it appears:

Code:

by Enterprise (Country), sort: replace Country = Country[_N]

Note: This assumes that Country is a string variable, so that missing sorts first and non-missing last. If Country is actually a numeric variable with value labels, then replace Country[_N] by Country[1].

By the way, for future reference, the best way to post example data is by using the -dataex- command. If you do not have it already, you can get it by running -ssc install dataex-. Read -help dataex- for how to use it to create code that will replicate selected variables and observations from your data set, and then you post that code on the forum. It makes it very easy for someone else to replicate your data and try out code on it. The problem with posting tables like the ones you did in #3 is that they do not copy-paste correctly into the Stata data editor, so it is extra work to create the data to try out the code.
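As a quick check, here is the example table from #3 written as -input- code (similar to what -dataex- generates), followed by the Year and Country code from this post. It is a sketch for trying the code out; the storage types (str1, byte, int) are assumptions about the real data.

```stata
* Example data from #3; "" and . mark the missing Country and Year cells
clear
input str1 Country byte Enterprise byte SIC int Year int Assets
"A" 1 5 2000 10
""  1 5 .    15
"A" 1 5 2002 20
""  2 6 .    30
"B" 2 6 2001 50
""  2 6 .    60
end

* Fill Year: forward from the previous entry, then backward from the next one
gen long obs_no = _n
by Enterprise (obs_no), sort: replace Year = Year[_n-1] + 1 if missing(Year) & _n > 1
gsort Enterprise -obs_no
by Enterprise: replace Year = Year[_n-1] - 1 if missing(Year) & _n > 1

* Fill Country (string variable, so "" sorts first within Enterprise)
by Enterprise (Country), sort: replace Country = Country[_N]

* Restore the original order and inspect the result
sort obs_no
list, noobs sepby(Enterprise)
```

Both enterprises should end up with Years 2000, 2001 and 2002, and Country A or B filled in on every row.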

Last edited by Clyde Schechter; 22 Jan 2016, 21:03.
Martim Pinto

Join Date: Dec 2015

Posts: 13
#5

25 Jan 2016, 08:08

Clyde Schechter, I'm sorry for the late reply; I was running some tests on my data. I can't thank you enough for your help: it did exactly what I intended. Thank you also for explaining each step, which helped me understand what is going on.
Thank you very, very, very much.
Big thumbs up for you.

It would be nice to have some kind of donation system: knowledge in this field is rare, and it would be a way to keep the people who have it helping others.

Best regards
Martim Ruben Pinto
