a complicated dummy variable capturing values pertaining to years that are not the year identifier

Mike Kraft

#1

17 Jun 2015, 12:07

Dear All

I have data on firms that were required by a regulator to restate (i.e. correct) financial reports that had previously been misstated.

The data show that a firm may disclose to the public, say in 2005 (called disclosure_yr in my dataset), that it has restated reports from previous years. This does not necessarily mean that the 2005 report itself was misstated; in my example, 2005 is only the year in which the restatement was disclosed. The data also show which years the misstatements belong to. In my example these could be the years 2001 to 2003: 2001 is called begin_yr in my dataset and 2003 is end_yr, and all years from 2001 to 2003 were misstated. A variable called res equals 1 when the firm discloses (in disclosure_yr) that it has restated previously issued reports (covering begin_yr to end_yr), and 0 otherwise. Note that the value 1 is assigned to the disclosure year.

The data look like:

Firm_id  disclosure_yr  begin_yr  end_yr  res
2178     2006           2005      2005    0
2491     2005           2002      2005    1
2491     2007           2004      2006    1
3116     2002           2000      2001    1
......   ......         ......    ......  0

In the data above, firm 3116 disclosed to the public in 2002 that its reports for 2000 and 2001 had been misstated.

I am struggling to create a dummy variable that:
equals 1 for the misstated years (i.e. the years from begin_yr to end_yr, which are flagged by res=1 in a later disclosure year), and 0 otherwise.
In other words, I want to assign a value of 1 only to the years that were misreported, not to the year in which the misstatement was disclosed. Note that res=1 in the data corresponds to disclosure_yr.
Then I want to sort my data by Firm_id and disclosure_yr so that it is ready to merge with another dataset.

I know the direct way of creating a dummy:

gen dummy=0
replace dummy=1 if res==1

However, this will not achieve my goal, because res (1/0) is assigned by disclosure year rather than by misstated year, and it is the misstated years that I want to flag.

Any smart ideas?

William Lisowski


17 Jun 2015, 16:22

Is this at least a start on what you need?

Code:

clear
input Firm_id    disclosure_yr    begin_yr    end_yr    res
2178    2006    2005    2005    0
2491    2005    2002    2005    1
2491    2007    2004    2006    1
3116    2002    2000    2001    1
end
expand end_yr-begin_yr+1
sort Firm_id disclosure_yr
by Firm_id disclosure_yr: generate res_yr = begin_yr+_n-1 if res
list, clean noobs

Code:

    Firm_id   disclo~r   begin_yr   end_yr   res   res_yr  
       2178       2006       2005     2005     0        .  
       2491       2005       2002     2005     1     2002  
       2491       2005       2002     2005     1     2003  
       2491       2005       2002     2005     1     2004  
       2491       2005       2002     2005     1     2005  
       2491       2007       2004     2006     1     2004  
       2491       2007       2004     2006     1     2005  
       2491       2007       2004     2006     1     2006  
       3116       2002       2000     2001     1     2000  
       3116       2002       2000     2001     1     2001
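A minimal sketch of the remaining step toward the dummy asked for in #1, assuming the goal is one row per firm and misstated year: it reruns the code above, keeps only restated years, and collapses overlapping disclosures (firm 2491 restated 2004 and 2005 twice) into a single firm-year row. The variable names year and misstated, and the file firm_panel.dta mentioned in the comments, are illustrative assumptions, not part of the original data.

```stata
clear
input Firm_id disclosure_yr begin_yr end_yr res
2178 2006 2005 2005 0
2491 2005 2002 2005 1
2491 2007 2004 2006 1
3116 2002 2000 2001 1
end

* One row per year covered by each disclosure (as in the code above)
expand end_yr - begin_yr + 1
sort Firm_id disclosure_yr
by Firm_id disclosure_yr: generate res_yr = begin_yr + _n - 1 if res

* Keep only restated years and reduce to one row per firm and year;
* overlapping disclosures would otherwise repeat a year (2491: 2004, 2005)
keep if res == 1
keep Firm_id res_yr
rename res_yr year
duplicates drop
generate byte misstated = 1
sort Firm_id year
list, clean noobs

* Hypothetical next step (firm_panel.dta is a placeholder assumed to hold
* one row per Firm_id and year):
*   merge 1:1 Firm_id year using firm_panel
*   replace misstated = 0 if missing(misstated)
```

Dropping the duplicate firm-years is what makes a 1:1 merge on firm and year possible; if you also need to know which disclosure revealed each year, keep disclosure_yr and skip the duplicates drop, at the cost of no longer having unique firm-years.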


Mike Kraft

#3

18 Jun 2015, 09:50

Dear William,
Many thanks for your reply.
I ran the code, but it appears to generate res_yr values for all years between begin_yr and end_yr except end_yr (i.e. the last year is not included).
I want all years from begin_yr to end_yr, including both the first and the last year.
Do you know how to fix that?

Thanks
William Lisowski

#4

18 Jun 2015, 10:14

In the example I posted, the code generated all years between begin_yr and end_yr including end_yr. How do your code and data differ from the code and data I posted? Is it possible you omitted the "+1" at the end of the expand command? Can you create and post a reproducible example like mine that fails?
Mike Kraft

#5

18 Jun 2015, 10:36

I attached the data, and my code is below:

use AA_Res2000-2014AccFra.dta,clear
compress
format file_date %d
format res_begin_date %d
format res_end_date %d
gen file_yr=year(file_date) // this is the year when restatement is disclosed to the public
gen yrbegin=year( res_begin_date) // this is the first year restated
gen yrend=year( res_end_date) // this is the last year restated
rename company_fkey cik

destring cik,replace

duplicates tag cik file_yr ,generate(newvariable2)
drop if newvariable2>0
drop newvariable2

expand yrend-yrbegin+1

sort cik file_yr
by cik file_yr: generate res_yr = yrbegin+_n-1 if res_accounting

xtset cik file_yr

Now sort by cik and res_yr:

sort cik res_yr

and you will see clearly that the end year is not included.

I look forward to hearing from you and the other participants.
Attached file: AA_Res2000-2014AccFra.dta (226.3 KB)

William Lisowski


18 Jun 2015, 12:15

Your problem is that, for example, cik 2491 has had two filings with restatements, and your sort leaves the rows for the two filings intermingled, apparently causing you to overlook the observation with the end year you are looking for. If you replace the xtset in the code you supplied (which fails, and causes the do-file to stop) with the following code, you'll see results like those shown, and will see clearly that the end year is incorporated correctly.

Code:

capture xtset cik file_yr

sort cik file_date res_yr
list cik file_date res_yr yrbegin yrend res_yr, noobs sepby(cik file_date)

Code:

  +---------------------------------------------------------+
  |     cik   file_date   res_yr   yrbegin   yrend   res_yr |
  |---------------------------------------------------------|
  |    2178   08mar2006        .      2005    2005        . |
  |---------------------------------------------------------|
  |    2491   03nov2005     2002      2002    2005     2002 |
  |    2491   03nov2005     2003      2002    2005     2003 |
  |    2491   03nov2005     2004      2002    2005     2004 |
  |    2491   03nov2005     2005      2002    2005     2005 |
  |---------------------------------------------------------|
  |    2491   01nov2007     2004      2004    2006     2004 |
  |    2491   01nov2007     2005      2004    2006     2005 |
  |    2491   01nov2007     2006      2004    2006     2006 |
  |---------------------------------------------------------|
  |    3116   02apr2002     2000      2000    2001     2000 |
  |    3116   02apr2002     2001      2000    2001     2001 |
  |---------------------------------------------------------|
  |    3116   31dec2003     2003      2003    2003     2003 |
  |---------------------------------------------------------|
  |    3116   20may2005        .      2005    2005        . |
  |---------------------------------------------------------|
  |    3116   07aug2012     2011      2011    2012     2011 |
  |    3116   07aug2012     2012      2011    2012     2012 |
  |---------------------------------------------------------|
  |    3116   01mar2013     2011      2011    2011     2011 |
  |---------------------------------------------------------|


Mike Kraft

#7

18 Jun 2015, 13:45

If I run this code:

use AA_Res2000-2014AccFra.dta,clear
compress
format file_date %d
format res_begin_date %d
format res_end_date %d
gen file_yr=year(file_date) // this is the year when restatement is disclosed to the public
gen yrbegin=year( res_begin_date) // this is the first year restated
gen yrend=year( res_end_date) // this is the last year restated
rename company_fkey cik

destring cik,replace

expand yrend-yrbegin+1

sort cik file_yr
by cik file_yr: generate res_yr = yrbegin+_n-1 if res_accounting | res_fraud // this will generate a res_yr for firms that have res_acc or res_fraud

capture xtset cik file_yr

sort cik file_date res_yr
drop if res_yr<2000 // I restrict the sample to firms that started filing in 2000 (I got earlier res_yr values because firms filing in 2000 can have misreported years before 2000)

drop if res_yr==. // drops restatements that are neither accounting nor fraud related (e.g. clerical errors), which I do not include here

Then:
browse if cik==2178

The data show that this firm has two restatements filed in 2003, one covering 2001-2002 and another covering 2002 only.
However, res_yr takes the values 2001, 2002 and 2004. I do not understand where 2004 comes from.
William Lisowski

#8

18 Jun 2015, 14:27

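Your problem is that you assumed I would understand enough accounting to infer that a firm might make multiple disclosures in a single year, and your original example showed only disclosure years, not the disclosure dates you subsequently revealed, which might have suggested that possibility. With two disclosures in the same year, by cik file_yr: numbers the rows of both disclosures together, so _n runs from 1 to 3 for cik 2178, and a row from the 2002-only disclosure can end up with 2002+3-1 = 2004. If you now change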

Code:

sort cik file_yr
by cik file_yr: generate res_yr = yrbegin+_n-1 if res_accounting | res_fraud

to

Code:

sort cik file_date
by cik file_date: generate res_yr = yrbegin+_n-1 if res_accounting | res_fraud

things should work as you expect, with the warning that this code will fail if a firm has multiple disclosures on a single date.
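One way to avoid the multiple-disclosures-on-one-date caveat, sketched under the assumption that each row of the original file is one disclosure: number the rows before expand and count years within that number instead of within cik file_date. The toy data only mirror the cik 2178 case described in #7 (two 2003 filings covering 2001-2002 and 2002 only); disc_id is a hypothetical helper variable, not something in the original file.

```stata
* Toy data mirroring the case in #7 (illustrative values only)
clear
input cik file_yr yrbegin yrend res_accounting res_fraud
2178 2003 2001 2002 1 0
2178 2003 2002 2002 1 0
end

* Give every original disclosure its own identifier BEFORE expanding,
* so the year counter restarts for each disclosure even when a firm
* files several on the same date or in the same year
generate long disc_id = _n
expand yrend - yrbegin + 1

* Count years within each disclosure rather than within cik file_date
sort disc_id
by disc_id: generate res_yr = yrbegin + _n - 1 if res_accounting | res_fraud

sort cik file_yr disc_id res_yr
list, sepby(disc_id) noobs
```

Because expand copies disc_id into every new row, _n restarts at 1 for each disclosure, so res_yr always stays within yrbegin to yrend. In the do-file from #7, the generate line would go anywhere before the expand line.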

You're welcome.
Mike Kraft

#9

18 Jun 2015, 14:50

William,
You are brilliant (now in green).

Your code works perfectly. Thanks for that!

If I spot something unclear, I will be back. Having said that, it seems to be working very well!

Thanks for your time! Much appreciated.
Yi Zi

#10

20 Mar 2022, 19:13

Hi, if I want to examine subsequent restatements over a three-year period (through the end of 2020), how should I adapt the code above? Thank you so much!