Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating LRETR (long-run cash effective tax rate)

    Hi Statalisters,

    I have another question that is bothering me for quite some time and i would love to finish that puzzle in my head. I would like to calculate the variable LRETR (long-run cash effective tax rate) and im not getting the results that im hoping for.

    I have searched the forum and got similar posts, but they do not seem that different to my approach:
    https://www.statalist.org/forums/forum/general-stata-discussion/general/1600054-long-term-effective-tax-rate-over-5-years

    https://www.statalist.org/forums/for...ctive-tax-rate


    The variable is from the Paper of An, Zhang 2013 and is defined as follows:

    "LRETR is the ratio of the sum of the cash tax paid during the last 3 years to the sum of the difference between pretax income and special items during the last 3 years"

    What i do is the following:

    Code:
    rangestat (sum) txpd, interval(fyear -2 0) by(gvkey)            
    label var txpd_sum "Sum of the cash tax paid during the last 3 years"
    
    bys gvkey fyear: gen diff_pi_spi = pi-spi    
    label var diff_pi_spi "Difference between pretax income and special items" 
    
    
    rangestat (sum) diff_pi_spi, interval(fyear -2 0) by(gvkey)                
    label var diff_pi_spi_sum "Sum of the difference between pretax income and special items during the last 3 years"
    
    bys gvkey fyear: gen LRETR = txpd_sum / diff_pi_spi_sum
    label var LRETR "long-run cash effective tax rate LRETR"
    The variables i used should be the right ones, because the authors copied the variable from another paper, where the original authors exactly say which compustat items they used (and my dataset is from compustat)

    ->The problem i have is, that the results (see stats and regression) are not consistent with the literature and I know that this is normal to a certain degree, but =0 is a bit extreme and also the high SD makes me wondering.

    I would be very happy if you could enlighten me. I have probably overlooked or misinterpreted something!

    I also know that i have the same problem with the DTURN variable, but thats a problem for another time and maybe a seperate post

    if there is anything else I can provide please let me know.


    Here is a sample from my compustat annual dataset:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long gvkey int(fyear year) double(txpd pi spi)
    185319 2011 2012   3.597       .97     -.997
     18582 2018 2018  -1.045     3.277         0
    110801 2001 2002    .046    -4.668    -3.925
      8768 1999 1999  14.742    59.714    59.734
    163987 2010 2010    .891     -1.34     -1.34
     28236 2013 2013     .73   -10.332     -.564
     62285 1999 1999  15.812    38.132     38.17
    148255 2001 2001      15    -161.7     -46.6
    160642 2006 2006    .092   -24.877    -4.179
    133304 2008 2008    .245    -5.459    -2.914
    100699 2011 2012 375.743 -2893.204 -1964.008
    177446 2012 2012    .132     3.016     -.212
     30231 2003 2004    .312    20.457     -.163
      6529 2010 2010    29.3    45.392   -58.885
    105089 2015 2015 432.509   200.174   -49.142
      2829 2017 2017    .349        .4      .021
    147307 2014 2014       0    -9.084    -5.114
    108804 2017 2017  27.914  -125.914    -20.28
      3607 2010 2010       6      -550      -520
     65911 2009 2009     .23   -16.482   -14.354
    142909 2002 2002   1.764    -6.537    -1.282
    175055 2018 2018  12.397   -26.273   -24.458
    133432 2009 2009   9.899    -3.641    -1.298
     10622 2001 2002  416.09 -2813.887 -1571.083
      7601 2002 2002   2.173     -1.47         0
     65570 2008 2008    .352     16.79    -3.654
     12852 2007 2007    .054   -12.207       -12
    170527 2016 2016    73.4     -30.1    -102.7
      3946 2017 2017    78.2    -175.7    -142.6
     23777 2000 2000   1.756   -11.762    -3.649
     13354 2005 2005  31.639   -18.096     4.971
    160661 2007 2007  13.762   -15.444   -36.344
      9812 2009 2009    .236    38.445    41.843
     12877 1999 1999    .263    -2.445     -2.44
    119034 2005 2005  55.881     9.811         0
     65459 2008 2008  16.999  -941.942  -906.271
    164755 2010 2010  11.048    80.792    -3.839
    160949 2007 2007  30.321   -64.261  -118.217
     62694 2004 2004    .001    -1.077      -.61
     63938 2008 2009   1.267    10.549         0
    157755 2011 2011  -2.994     5.572     -1.89
     60931 2000 2000   5.027   -65.254         0
    117781 2012 2012   7.103   -50.621   -14.121
    175342 2012 2012    .493    -89.81   -38.641
     27816 2002 2002  12.474   -34.187   -45.817
     63724 1999 1999    27.6    -5.795    -5.094
     10658 2005 2005    .354    -2.756         0
    126718 2017 2017   2.131     -2.14    -4.187
    118401 2010 2010   5.872   -24.014   -10.913
    106687 2009 2009    .039      -.61     -.163
     13163 2007 2008    .243     8.093      .113
     15050 2004 2005    .951    -6.334         0
     15274 2000 2000       .  -156.727  -190.882
     27845 2016 2016  3692.7     -2832   -2541.8
    218399 2016 2017  93.073   334.848   -63.118
     64853 2009 2010  20.219    84.834    -3.547
     64630 2006 2006   9.875  -184.482    -36.35
     14145 2003 2003    .546     -.636         0
    112413 2008 2008   5.057   -60.039   -58.062
     12904 2001 2002   -.445   -22.334   -15.067
     10443 2002 2002      27        28         7
    125074 2018 2018   6.415    14.798      .442
     27490 2018 2018  86.605     30.57   225.982
     64103 2010 2010   3.988    11.443    -4.735
     26560 2016 2016   440.7     549.4      52.7
     63800 2013 2013       8       -48       -38
      3366 2000 2000    .819     1.045         0
     12658 2007 2007    .909      .555         0
     19113 1998 1999 115.207     -4.38         0
      9548 2000 2000     133       299       157
     27925 2008 2008     5.5  -195.514   -157.26
     13645 2003 2003   -.861      .972         0
     31005 2018 2018  22.356  -208.283  -224.671
     12708 2006 2006   2.416     -.942    -3.155
     65275 2012 2012  12.823    11.752       -.9
     27490 2015 2015  90.163    -4.408     -.616
     25972 2002 2002   5.914     9.885     1.913
     31099 2011 2011   -.247     1.053         0
    174390 2007 2007     .92    -2.702    -1.415
     66059 2006 2006    .248     -8.95    -1.435
    178796 2009 2009   1.721    -43.06         0
     24171 2000 2001     4.7  -167.421      -110
    101973 2017 2017    .586     3.184         0
     18672 2013 2013   3.036    54.287    57.366
      6809 1999 1999    .669      .131      .163
    100590 2018 2018 718.996  3759.844         .
    185194 2016 2016    .548   -16.882     -.554
     27794 2017 2017   8.494    -4.664      .553
      2061 2005 2005       0      8.07    -1.619
    176238 2011 2011   -.852   -86.768        .3
     11162 2006 2006    .876   -11.328    -2.021
    145083 2000 2000    .612     2.321         0
     26892 2016 2016       0    -6.668         0
      2581 2005 2005    .005      .455         0
     65899 2007 2007  36.615    55.987    73.409
     17934 2012 2012   1.626    34.767    -4.221
    105442 2005 2005   1.014     4.855         0
     13003 2003 2003  19.067    27.866    -4.765
     62396 2011 2011   4.235    14.333    11.171
     10053 2002 2002      16         9        14
    end
    My Statistics:
    Click image for larger version

Name:	stat.jpg
Views:	1
Size:	11.3 KB
ID:	1739026


    Stats from the original Paper:
    Click image for larger version

Name:	StatKim2011a.JPG
Views:	1
Size:	22.0 KB
ID:	1739027






    My REG (you see LRETR is 0)
    Click image for larger version

Name:	T7.JPG
Views:	1
Size:	171.9 KB
ID:	1739025



    The results from the Paper that i try to replicate:
    Click image for larger version

Name:	T7AnZhang.JPG
Views:	1
Size:	101.8 KB
ID:	1739024




  • #2
    I don't know if the example data you give is representative of your larger data set. If so, it exhibits some pathology that may underlie your questionable results. If not, you will need to post back with a better example. The pathology is that while you are taking sums over three year periods, in the data you show, there is not even a single gvkey for which 3 years worth of data are present. You are actually just looking at 1 year's worth of data in most situations. There are even a couple of gvkeys where you have no data at all because in the only year where the gvkey is instantiated, the value of txpd or spi is missing.

    If this is not what's going on in your real data set, then please post back with a more representative example. I would do something like this:
    Code:
    sort gvkey year
    dataex
    That way you will show extended data on some gvkeys over time. Then we can see if, even with a fully representative data example, perhaps there are enough missing years, or years with missing values for txpd or spi, that you still are not really calculating three-year sums.

    Comment


    • #3
      Originally posted by Clyde Schechter View Post
      I don't know if the example data you give is representative of your larger data set. If so, it exhibits some pathology that may underlie your questionable results. If not, you will need to post back with a better example. The pathology is that while you are taking sums over three year periods, in the data you show, there is not even a single gvkey for which 3 years worth of data are present. You are actually just looking at 1 year's worth of data in most situations. There are even a couple of gvkeys where you have no data at all because in the only year where the gvkey is instantiated, the value of txpd or spi is missing.

      If this is not what's going on in your real data set, then please post back with a more representative example. I would do something like this:
      Code:
      sort gvkey year
      dataex
      That way you will show extended data on some gvkeys over time. Then we can see if, even with a fully representative data example, perhaps there are enough missing years, or years with missing values for txpd or spi, that you still are not really calculating three-year sums.
      Hi Clyde,
      thanks for your quick reply. My data example is a bit stupidly chosen, please excuse me for not thinking that the small section is unusable.
      I have sorted by gvkey and fyear and now set the dataex count to 500.
      I hope that is enough, if not i will be happy to provide more example data.
      And you are right, there are quite some missing values in my dataset. Ive looked it up:
      My full sample is an annual dataset of 142.000 observations for the US from 2000 to 2017.
      The variables have the following missing value numbers in the compustat annual sample:
      txpd = 36.570
      txpd_sum = 0
      pi = 814
      spi = 2.310
      diff_pi_spi = 2.311
      diff_pi_spi_sum = 0
      LRETR = 1.011

      When i later merge this dataset with my other Datasets and make the final regression and statistic (as i posted above), than there are the following missing values:
      Whole Dataset: 52.000 obs
      txpd = 7,703
      txpd_sum = 0
      pi = 21
      spi = 611
      diff_pi_spi = 611
      diff_pi_spi_sum = 0
      LRETR = 55

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input long gvkey int(fyear year) double(txpd pi spi)
      1004 1998 1999      4.4    59.786         0
      1004 1999 2000     11.3    49.526         0
      1004 2000 2001      3.2     20.22      -5.4
      1004 2001 2002     1.96   -98.229       -86
      1004 2002 2003     3.46    -19.49     -5.36
      1004 2003 2004      .74     1.707    -2.033
      1004 2004 2005     .591    21.639     2.162
      1004 2005 2006    1.303    45.496    -3.893
      1004 2006 2007    1.948    87.421      .633
      1004 2007 2008   11.412   116.088     -.205
      1004 2008 2009   29.106   119.909     4.183
      1004 2009 2010   30.149    64.188    -4.302
      1004 2010 2011    9.812   108.503    -1.536
      1004 2011 2012   11.418    93.509   -13.864
      1004 2012 2013     24.1      82.2     -21.1
      1004 2013 2014     17.3     105.3         0
      1004 2014 2015    105.6       -83     -48.4
      1004 2015 2016     35.7      59.3       -.4
      1004 2016 2017        .      74.3       2.6
      1004 2017 2018       17      77.2         0
      1010 1999 1999     27.9     126.1         0
      1010 2000 2000     17.3     119.9         0
      1010 2001 2001     -3.1       161         0
      1010 2002 2002      2.1     125.5      56.2
      1010 2003 2003      3.2     382.9     361.5
      1013 1999 1999    7.563   160.383  -148.977
      1013 2000 2000    218.5    1460.4     925.1
      1013 2001 2001    -24.1   -1920.7   -1645.5
      1013 2002 2002   -259.4    -882.2    -561.6
      1013 2003 2003   -142.7     -82.1     -43.5
      1013 2004 2004      1.2      33.2      -8.8
      1013 2005 2005        9      92.7     -10.2
      1013 2006 2006      5.4      56.5     -22.4
      1013 2007 2007     12.9     116.6      28.2
      1013 2008 2008      3.5     -38.2     -29.2
      1013 2009 2009      4.5    -468.8    -452.1
      1013 2010 2010      6.7      85.6      -3.6
      1019 1999 1999    1.338     2.053         0
      1019 2000 2000    1.651     3.217         0
      1019 2001 2001    1.508     2.507         0
      1019 2007 2007     .861     2.649      .028
      1019 2008 2008    2.075     4.369         .
      1019 2009 2009    1.614      3.79     -.008
      1019 2010 2010     .952      1.76         0
      1019 2011 2011     .326     2.325         0
      1019 2012 2012    1.543     3.902         0
      1019 2013 2013    1.415     2.804         0
      1019 2014 2014     .999      2.59         0
      1019 2015 2015    1.004     1.019         0
      1019 2016 2016     .481      .673         0
      1019 2017 2017     .603      .389         0
      1019 2018 2018     .492     1.714         .
      1021 1999 1999     .003     -2.34      -.75
      1021 2000 2000     .028     -.803       -.3
      1021 2001 2001      .02    -1.725     -.906
      1021 2002 2002     .016      .098      .035
      1021 2003 2003      .02     -.207         0
      1021 2004 2004     .014     1.297         0
      1021 2005 2005     .025     1.229     -.325
      1021 2006 2006      .01     1.002         0
      1021 2007 2007     .183    -4.473     -3.76
      1021 2008 2008     .183    -8.931    -5.616
      1034 1999 1999   11.766    57.628    -1.175
      1034 2000 2000    19.11    75.684    -5.647
      1034 2001 2001   20.845   -35.061   -84.692
      1034 2002 2002    3.116  -156.282  -219.278
      1034 2003 2003    2.935    19.251   -32.927
      1034 2004 2004   11.564  -253.598   -282.81
      1034 2005 2005   20.293    43.778    -9.173
      1034 2006 2006   64.439    92.509   -19.856
      1034 2007 2007   -1.939     9.351     6.228
      1036 1999 1999    2.224   136.061   -23.698
      1036 2000 2000   37.096    95.322   -56.853
      1037 1998 1999     .027    -3.452         0
      1037 1999 2000        0    -1.121         0
      1037 2000 2001        0     1.176         0
      1037 2001 2002     .325      .484     -.154
      1038 1998 1999    3.267   -26.516         0
      1038 1999 2000   -9.531   -81.247       -12
      1038 2000 2001   -6.583  -135.826         0
      1038 2001 2002   -3.579   -10.868    -3.754
      1038 2002 2003   -9.757   -19.802         0
      1038 2003 2004     3.88     4.117   -28.927
      1043 1999 1999        .     -.126         0
      1045 1999 1999       71      1006         .
      1045 2000 2000       49      1287        98
      1045 2001 2001      -28     -2756      -565
      1045 2002 2002     -646     -3860      -708
      1045 2003 2003     -575     -1308       202
      1045 2004 2004        3      -761       135
      1045 2005 2005        7      -861      -195
      1045 2006 2006        1       231      -102
      1045 2007 2007        7       504        74
      1045 2008 2008      -14     -2071      -884
      1045 2009 2009        6     -1752      -259
      1045 2010 2010      -32      -506         .
      1045 2011 2011        1     -1979      -886
      1045 2012 2012        6     -2445     -2595
      1045 2013 2013       12     -2180     -3402
      1045 2014 2014        7      3212      -956
      1045 2015 2015       20      4616     -1674
      1045 2016 2016       16      4299      -772
      1045 2017 2017       20      3084      -734
      1045 2018 2018       18      1884      -802
      1050 1999 1999     .504      .296         .
      1050 2000 2000     .254    -1.032         0
      1050 2001 2001     .673     -.141      -1.3
      1050 2002 2002     .335     -.339         .
      1050 2003 2003    -.183    -1.034      .213
      1050 2004 2004     -.33    -1.176         0
      1050 2005 2005      .03      .212     -.173
      1050 2006 2006      .35     4.862         0
      1050 2007 2007    5.244    10.666      -.74
      1050 2008 2008    -.429     7.636         0
      1050 2009 2009    2.433   -17.911    -17.11
      1050 2010 2010    -.247     3.676      .396
      1050 2011 2011    4.793    11.723         0
      1050 2012 2012    2.637    15.363     -.141
      1050 2013 2013    2.237     6.455   -12.024
      1050 2014 2014    8.665    16.214    -2.669
      1050 2015 2015     5.08    -3.096   -23.702
      1050 2016 2016    6.415   -32.964   -62.878
      1050 2017 2017    3.048     1.409    -3.953
      1050 2018 2018   10.205     2.497     -4.06
      1056 1999 1999      3.3    16.907      -3.5
      1056 2000 2000        2      20.6     -.107
      1056 2001 2001     .227    32.672      -2.5
      1056 2002 2002      1.1   -13.731   -21.176
      1056 2003 2003      1.4    12.883      -1.1
      1056 2004 2004   12.498    32.921     -4.22
      1056 2005 2005   13.991    33.021    -6.098
      1056 2006 2006     25.7    45.317    -3.214
      1072 1998 1999   26.084    60.742         0
      1072 1999 2000   44.785   232.126       5.4
      1072 2000 2001   210.27    841.26         0
      1072 2001 2002  103.748    -7.006   -60.146
      1072 2002 2003   26.231   -10.438        -3
      1072 2003 2004   14.375  -131.671  -114.933
      1072 2004 2005   25.755    88.044       2.8
      1072 2005 2006   31.774    119.62         0
      1072 2006 2007   47.346   217.566    -1.537
      1072 2007 2008   58.167   203.365    -2.811
      1072 2008 2009   51.169    97.139   -32.776
      1072 2009 2010   31.181   176.357     1.613
      1072 2010 2011   81.505   334.259    -8.575
      1072 2011 2012   91.709   179.905      -100
      1072 2012 2013    43.12  -113.322   -266.25
      1072 2013 2014   27.514   163.357         0
      1072 2014 2015   56.389   218.599         0
      1072 2015 2016   22.919   132.152   -45.318
      1072 2016 2017   55.642   174.949    -36.89
      1072 2017 2018   66.354   192.313      -2.7
      1078 1999 1999  882.957  3404.888    -184.2
      1078 2000 2000 1085.083  3824.407   138.507
      1078 2001 2001  984.079  1900.148   -1643.9
      1078 2002 2002 1032.287  3691.413    -621.7
      1078 2003 2003  897.354  3745.417   -842.24
      1078 2004 2004  675.729    4136.6  -566.006
      1078 2005 2005  746.504   4628.92  -473.131
      1078 2006 2006 1281.711   2284.37     -2742
      1078 2007 2007  951.648  4478.648      -945
      1078 2008 2008      772  5863.286  -626.008
      1078 2009 2009      635  7200.774    449.13
      1078 2010 2010      810  5721.834   -2055.2
      1078 2011 2011 1781.602  5208.642   -3435.5
      1078 2012 2012 1366.581  6274.614 -3155.973
      1078 2013 2013     1039      2534       -45
      1078 2014 2014      448      2531      -346
      1078 2015 2015      631      3200      -224
      1078 2016 2016      620      1433     -2067
      1078 2017 2017      570      2256     -1043
      1078 2018 2018      740      2873      -832
      1081 1999 1999        .      -374      -305
      1081 2000 2000       71       574       -22
      1081 2001 2001       27       455         .
      1081 2002 2002       27      -221       -10
      1081 2003 2003      -10        21      -155
      1081 2004 2004       15      -261      -371
      1081 2005 2005      -40      -595      -440
      1081 2006 2006       -4         6       223
      1082 1998 1999     .119     -.964         0
      1082 1999 2000    -.251     3.864         0
      1082 2000 2001      .25      .621         0
      1082 2001 2002     .203    -1.525         0
      1082 2002 2003    -.352    -2.759         0
      1082 2003 2004    -.186    -7.078         0
      1082 2004 2005     .058    -2.437     -.218
      1082 2005 2006     .142     -.331         0
      1082 2006 2007    -.021    -1.536     -.155
      1082 2007 2008        0     -1.96     1.459
      1082 2008 2009        0    -7.155    -1.874
      1082 2009 2010        0    -4.351         0
      1082 2010 2011     .002    -2.967         0
      1084 1999 1999        .     -3.34         0
      1084 2000 2000        .     -8.92         0
      1084 2001 2001        0    -3.285     -.262
      1084 2002 2002        0     -.063         0
      1084 2003 2003        0     -.179      .003
      1084 2004 2004        0      .147      .331
      1084 2005 2005        0      .083       .25
      1084 2006 2006        0     -.126      .078
      1084 2007 2007        0     2.062     2.741
      1084 2008 2008        0     -.794         0
      1084 2009 2009        0     -.727         0
      1084 2010 2010        0     -.596         0
      1084 2011 2011        0    -1.695         0
      1084 2012 2012        0    -1.664         0
      1084 2013 2013        0    -4.128         0
      1084 2014 2014        0     -.983     -.007
      1084 2015 2015        0    -4.037    -2.313
      1084 2016 2016        0    -1.133     -.246
      1084 2017 2017        0    -2.747     -.005
      1084 2018 2018        0     1.754         0
      1094 1999 1999    3.841      9.75         0
      1094 2000 2000     3.92    10.231         0
      1094 2001 2001    3.642     6.786     -.994
      1094 2002 2002    1.405     7.378         0
      1094 2003 2003    4.234    13.366         0
      1094 2004 2004    2.875    17.434         0
      1094 2005 2005     .195    12.802     -.619
      1094 2006 2006     .797    13.319     -.658
      1094 2007 2007    3.822    15.493         0
      1094 2008 2008    3.127    22.314         0
      1094 2009 2009    6.505    12.759       .55
      1094 2010 2010    4.666    10.203    -4.661
      1094 2011 2011    8.307    16.962     -1.06
      1094 2012 2012    9.402     24.74    -1.645
      1094 2013 2013   11.054     34.55    -3.244
      1094 2014 2014   14.645    44.674    -3.244
      1094 2015 2015   25.459    53.865     2.506
      1094 2016 2016   16.076    53.854     1.074
      1094 2017 2017    7.912    17.361   -13.872
      1094 2018 2018    5.272  -292.822  -257.825
      1095 1999 1999      .67    13.826    27.874
      1095 2000 2000   70.669   254.891   277.828
      1095 2001 2001   35.824   -61.237    -6.344
      1099 1999 1999     .801     4.475         0
      1099 2000 2000    1.605     1.587    -2.869
      1104 1999 1999     .046     -.183         0
      1104 2000 2000      .07     1.096         0
      1104 2001 2001     .011     1.712     -.398
      1104 2002 2002     .025      .097     -.555
      1104 2003 2003     .851     2.342     -.175
      1104 2004 2004    1.391      5.34     -.085
      1104 2005 2005    1.716     4.766      -1.5
      1104 2006 2006     2.18     6.349         0
      1104 2007 2007    2.819     6.303         0
      1104 2008 2008      1.7     6.676      .265
      1104 2009 2009    1.334     3.453       .46
      1104 2010 2010     .465     2.911        .1
      1104 2011 2011    1.293     4.026     -.125
      1104 2012 2012     .832     4.997         0
      1104 2013 2013    1.821     5.494         0
      1104 2014 2014    1.422     6.803       -.3
      1104 2015 2015    2.412     6.614       -.4
      1104 2016 2016     1.78     7.497         0
      1104 2017 2017     1.57     6.493         0
      1104 2018 2018     .888     5.531         0
      1107 1999 1999    2.755    94.395         0
      1107 2000 2000    1.095    80.519         .
      1107 2001 2001     .807    76.487         .
      1107 2002 2002     .374    92.492         0
      1109 1999 1999      .01      -.43         0
      1109 2000 2000        0      .199         0
      1109 2001 2001     .066     -.618         0
      1109 2002 2002        0    -1.325         0
      1109 2003 2003        0     -.524         0
      1109 2004 2004        0     -.082         0
      1109 2005 2005        0     -.021         0
      1109 2006 2006        0    -2.715         0
      1109 2007 2007        0     1.255     3.234
      1109 2008 2008     .026    -3.908         0
      1111 1998 1999    2.814    24.215       -.6
      1111 1999 2000    6.333   -38.736     -70.2
      1111 2000 2001    6.753    32.544         0
      1111 2001 2002    3.041     83.12         0
      1111 2002 2003    5.491   103.407        -2
      1111 2003 2004   10.463   115.992         0
      1111 2004 2005   12.178   197.663         0
      1111 2005 2006    4.698    45.856         0
      1111 2006 2007    3.677   109.825         0
      1111 2007 2008   48.393   530.868         0
      1115 1999 1999     .971   -43.665   -35.115
      1115 2000 2000        .    13.741    -10.34
      1117 1999 1999        0    -2.294         0
      1117 2000 2000        0    -1.162      .984
      1117 2001 2001        .      .122         0
      1117 2002 2002        .    -3.631    -1.601
      1117 2003 2003        0      .881      -.14
      1117 2004 2004     .013      1.66         0
      1117 2005 2005     .045     5.527         0
      1117 2006 2006     .182     5.925         0
      1117 2007 2007     .142     2.817     -.047
      1117 2008 2008     .038    -2.912         0
      1117 2009 2009      .05     3.454         0
      1117 2010 2010     .011     -.653         0
      1117 2011 2011        0     -.766         0
      1117 2012 2012     .092     2.881         0
      1117 2013 2013        0     1.657         0
      1117 2014 2014        0     2.523         0
      1117 2015 2015     .025     1.386         0
      1117 2016 2016      .05     4.272         0
      1117 2017 2017        0    -4.785    -5.457
      1117 2018 2018        0     -.472         0
      1121 1999 1999     .895      9.07         0
      1121 2000 2000    2.814    13.922         0
      1121 2001 2001     .322    -7.062         0
      1121 2002 2002   -2.314     2.216         0
      1121 2003 2003    1.353     9.482         0
      1121 2004 2004    2.651     13.92         0
      1121 2005 2005    8.655    25.352     6.281
      1121 2006 2006    4.941    15.773         0
      1121 2007 2007    9.134    25.514         0
      1121 2008 2008    3.768   -10.558         0
      1121 2009 2009    -.848     6.501         0
      1121 2010 2010     .532    12.701         0
      1121 2011 2011    5.597    35.762         0
      1121 2012 2012    12.65    44.229         0
      1121 2013 2013    9.949     34.55         0
      1121 2014 2014    8.169      9.78         0
      1121 2015 2015    6.197    -2.045         0
      1121 2016 2016    2.589     5.204    -1.732
      1121 2017 2017     .459    -3.427    -6.005
      1121 2018 2018     .787     3.454       .61
      1154 1999 1999     .015     -.635         0
      1154 2000 2000        0      .337         0
      1154 2001 2001     .008      .423         0
      1154 2002 2002     .788     2.839         0
      1161 1999 1999   15.466    78.414   393.829
      1161 2000 2000   46.009  1262.938   345.899
      1161 2001 2001    68.22   -75.044  -118.175
      1161 2002 2002  -14.853 -1258.426  -376.125
      1161 2003 2003   -7.309  -316.315    13.893
      1161 2004 2004    33.55    78.331   -51.456
      1161 2005 2005   39.875     33.69  -125.681
      1161 2006 2006       17      -115      -452
      1161 2007 2007       26     -3321     -1657
      1161 2008 2008       11     -2313      -948
      1161 2009 2009       14       408      1399
      1161 2010 2010       12       509       657
      1161 2011 2011        9       491       150
      1161 2012 2012        9     -1217     -1087
      1161 2013 2013        9       -74        17
      1161 2014 2014        7      -398      -365
      1161 2015 2015        3      -646      -164
      1161 2016 2016       20      -458      -252
      1161 2017 2017       20        62        -9
      1161 2018 2018       -8       328       -57
      1164 1999 1999      106      7101       -54
      1164 2000 2000       28      4904      -433
      1164 2001 2001      148      2432      -913
      1164 2002 2002     -471     -9190     -8016
      1164 2003 2003       28     23019     22109
      1164 2004 2004       97     -3508     -3401
      1165 1999 1999        .    14.382    18.753
      1165 2000 2000        .    -2.681         0
      1165 2001 2001        .    -2.416      .206
      1165 2002 2002        .    -2.537         0
      1165 2003 2003        .    -1.126         0
      1165 2004 2004        .     -.955         0
      1165 2005 2005        .    -1.119         0
      1165 2006 2006        .    -2.014         0
      1166 1999 1999    -.509    30.965    -3.881
      1166 2000 2000     5.99   178.538         0
      1166 2001 2001   23.531    21.524         0
      1166 2002 2002     .008   -15.871         0
      1166 2003 2003    2.606     2.975     -8.16
      1166 2004 2004    4.905   108.605         0
      1166 2005 2005   16.689    12.609   -51.525
      1166 2006 2006    7.777   163.195         0
      1166 2007 2007   20.925   197.968   -14.675
      1166 2008 2008   11.745    95.836     -.706
      1166 2009 2009   30.304   -91.053   -53.668
      1166 2010 2010   12.939    378.78   -19.651
      1166 2011 2011     51.8   457.759   127.157
      1166 2012 2012   69.128    87.991    -4.088
      1166 2013 2013    8.947  1461.154  1346.658
      1166 2014 2014   14.266   187.417     -.097
      1166 2015 2015    9.952   164.978   -19.449
      1166 2016 2016    7.835   145.364    -3.305
      1166 2017 2017    4.653    549.45   341.564
      1166 2018 2018   22.279   197.574     -.204
      1173 1998 1999    1.057      .581         0
      1173 1999 2000     .139      .482         0
      1173 2000 2001     .165      .753      .131
      1173 2001 2002     .417      1.71         0
      1173 2002 2003     .613      .984       .15
      1173 2003 2004     .054      .077    -1.813
      1173 2004 2005     .302      .447      .464
      1173 2005 2006        0     2.209         0
      1173 2006 2007        0      .809     -.168
      1173 2007 2008      .05    -5.322    -1.519
      1173 2008 2009        0    -5.294      .736
      1173 2009 2010        0     4.089      .288
      1173 2010 2011     .067      1.08      .003
      1173 2011 2012        0      .828     -.025
      1173 2012 2013     .053     3.442     1.994
      1183 1998 1999  -52.079   -16.009    -9.585
      1183 1999 2000     .105   -17.342   -15.974
      1183 2000 2001     .697   -50.139    -9.097
      1183 2001 2002     .293    -5.488     -.549
      1183 2002 2003        0    -2.489         0
      1183 2003 2004     .064    -3.363         0
      1183 2004 2005     .067    -3.134         0
      1183 2005 2006     .045     -.445      1.96
      1183 2006 2007     .055    -7.601    -3.851
      1183 2007 2008     .017   -11.822    -8.033
      1186 1999 1999        .   -21.357         0
      1186 2000 2000        .    -6.982         0
      1186 2001 2001        .    -8.094         0
      1186 2002 2002        .     4.023         0
      1186 2003 2003      -.6    -7.462     4.499
      1186 2004 2004      -.2    44.984         0
      1186 2005 2005     -6.2    34.217         0
      1186 2006 2006      1.4   260.643     7.361
      1186 2007 2007     22.1   159.278      -2.3
      1186 2008 2008    3.802    95.991         0
      1186 2009 2009    8.792   108.038         0
      1186 2010 2010   25.199   435.203    57.526
      1186 2011 2011  110.889  -778.628      -6.9
      1186 2012 2012   56.962   435.141         0
      1186 2013 2013   56.478  -370.682  -200.764
      1186 2014 2014   51.302   189.138     -16.7
      1186 2015 2015   81.112    82.628         0
      1186 2016 2016  105.184   268.461         .
      1186 2017 2017  127.915   342.381         .
      1186 2018 2018  106.568  -259.052      -289
      1189 1999 1999        .    33.377     -12.5
      1193 1999 1999    2.586     5.968      9.83
      1193 2000 2000    2.727    13.016     5.103
      1193 2001 2001      .29   -13.676         0
      1193 2002 2002      .97   -16.895         .
      1209 1999 1999    148.3       669      13.9
      1209 2000 2000     92.7     118.1    -652.7
      1209 2001 2001     62.6       737      -7.6
      1209 2002 2002    136.5     784.5      27.6
      1209 2003 2003     79.1     565.4    -148.9
      1209 2004 2004    107.8     851.4       6.6
      1209 2005 2005    135.2     997.7       5.4
      1209 2006 2006    278.5    1049.3      44.3
      1209 2007 2007    248.7    1376.3      17.8
      1209 2008 2008    237.2    1478.8       -45
      1209 2009 2009    124.5     836.6    -332.4
      1209 2010 2010      192      1394    -101.1
      1209 2011 2011    162.5      1661     -48.5
      1209 2012 2012    255.7    1312.5    -262.7
      1209 2013 2013    325.5    1350.4    -249.1
      1209 2014 2014    160.6    1354.5    -328.3
      1209 2015 2015    392.9    1733.5    -201.5
      1209 2016 2016    440.8    2132.2     -99.4
      1209 2017 2017   1348.8    1416.1      -411
      1209 2018 2018      372      2015     -48.1
      1210 1998 1999    1.471      .906         0
      1210 1999 2000    -.597      .628         0
      1210 2000 2001    1.092     1.945         0
      1210 2001 2002     1.04     2.109         0
      1210 2002 2003     .275      .644     -.161
      1210 2003 2004     .515     3.526         0
      1210 2004 2005    1.412     3.446         0
      1210 2005 2006     .473     3.081         0
      1210 2006 2007    1.219     3.902         0
      1210 2007 2008    2.148     5.176       .11
      1210 2008 2009    2.892     6.992      .357
      1210 2009 2010    1.687     5.556      .009
      1210 2010 2011     .887     3.569         0
      1210 2011 2012    1.088     2.096         0
      1210 2012 2013     .675      2.73         0
      1210 2013 2014     .755     2.659         0
      1210 2014 2015     .737     3.415         .
      1210 2015 2016    2.827     6.153      -.26
      1210 2016 2017    1.093    -4.219    -1.634
      1210 2017 2018    1.659     2.652     1.787
      1213 1999 1999   53.628   147.388     4.603
      1213 2000 2000   10.604    23.226         0
      1213 2001 2001  -22.307   -28.904      21.5
      1213 2002 2002    25.18    26.971    -4.629
      1230 1999 1999     35.1     220.7       3.6
      1230 2000 2000      3.6     -15.7       -24
      1230 2001 2001    -18.4     -63.5      71.2
      1230 2002 2002    -22.8    -101.8        .5
      1230 2003 2003      -.4        29      74.5
      1230 2004 2004    -39.8     -20.6     -82.6
      1230 2005 2005      1.5     137.2     -14.7
      1230 2006 2006      9.5     -87.8    -214.3
      1230 2007 2007      3.5     201.6     -14.1
      1230 2008 2008      -.6    -213.2     -41.8
      1230 2009 2009     -8.8     202.9     -44.6
      1230 2010 2010       .4     405.9     -13.2
      1230 2011 2011      7.5     393.7     -45.1
      1230 2012 2012       78       514         0
      1230 2013 2013      149       816       192
      1230 2014 2014      326       975        30
      1230 2015 2015      349      1312       -32
      1230 2016 2016      459      1345      -117
      1230 2017 2017      177      1207      -118
      1230 2018 2018        0       585      -132
      1234 1999 1999     .186     2.869         0
      1234 2000 2000     .004     3.586         0
      1234 2001 2001    1.217     6.065      .428
      1234 2002 2002     -.34     5.468         0
      end
      Last edited by Max he; 06 Jan 2024, 10:04.

      Comment


      • #4
        Thank you for the better data example.

        Even in this one, which I think we can agree is probably reasonably representative of the data sample as a whole, of your 500 observations, you have two (gvkeys 1045 and 1050 in 1999) where no value of LRETR can be calculated at all, and, only 373 have a value that is actually calculated on a full 3 years of data. Moreover, you can see that the mean value of LRETR in the complete (i.e. based on a full three years) data is pretty far off from the overall mean:

        Code:
        . summ LRETR
        
            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
               LRETR |        498     .252393    1.271926  -11.66442   11.34189
        
        . summ LRETR if txpd_count == 3 & diff_pi_spi_count == 3
        
            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
               LRETR |        373    .1761634    .7846606  -11.66442   4.983912
        (And, if you are curious the difference in the means between the complete and incomplete ones is statistically significant with p = 0.021 by t-test. If you prefer a non-parametric test, p = 0.0353 by -ranksum-.)

        I think you should scrutinize the article you are trying to replicate to see if perhaps their analyses were restricted to cases with complete data. Perhaps it wasn't even an issue for them--could it be that they had access to a data source that was entirely complete?

        This incompleteness of the data looks rather serious to me. If we look at the mean values of LRETR we get by the number of year's worth of data that are available to them:
        Code:
        . gen data_count = min(txpd_count, diff_pi_spi_count)
        
        . tabstat LRETR, by(data_count) statistics(count mean)
        
        Summary for variables: LRETR
        Group variable: data_count
        
        data_count |         N      Mean
        -----------+--------------------
                 0 |        17         0
                 1 |        50  .6017905
                 2 |        58  .5154008
                 3 |       373  .1761634
        -----------+--------------------
             Total |       498   .252393
        --------------------------------
        we see that there is a very strong trend towards lower values when more data is available, with the exception of the cases based on 0 data. Those are, by the way, also especially problematic. -rangestat- returns 0 as the result for a summing over an empty set. This is mathematically correct, but in applications such as this one probably leads to trouble. If there is no data for three years running for some gvkey, almost certainly you would be better off -replace-ing that 0 calculation with a missing value. And given the strong trend in the results based on the number of data points, I think it is questionable whether there is even any validity at all to values of LRETR that are derived from less than 3 years worth of data.

        Comment


        • #5
          Originally posted by Clyde Schechter View Post
          Thank you for the better data example.

          Even in this one, which I think we can agree is probably reasonably representative of the data sample as a whole, of your 500 observations, you have two (gvkeys 1045 and 1050 in 1999) where no value of LRETR can be calculated at all, and, only 373 have a value that is actually calculated on a full 3 years of data. Moreover, you can see that the mean value of LRETR in the complete (i.e. based on a full three years) data is pretty far off from the overall mean:

          Code:
          . summ LRETR
          
          Variable | Obs Mean Std. dev. Min Max
          -------------+---------------------------------------------------------
          LRETR | 498 .252393 1.271926 -11.66442 11.34189
          
          . summ LRETR if txpd_count == 3 & diff_pi_spi_count == 3
          
          Variable | Obs Mean Std. dev. Min Max
          -------------+---------------------------------------------------------
          LRETR | 373 .1761634 .7846606 -11.66442 4.983912
          (And, if you are curious the difference in the means between the complete and incomplete ones is statistically significant with p = 0.021 by t-test. If you prefer a non-parametric test, p = 0.0353 by -ranksum-.)

          I think you should scrutinize the article you are trying to replicate to see if perhaps their analyses were restricted to cases with complete data. Perhaps it wasn't even an issue for them--could it be that they had access to a data source that was entirely complete?

          This incompleteness of the data looks rather serious to me. If we look at the mean values of LRETR we get by the number of year's worth of data that are available to them:
          Code:
          . gen data_count = min(txpd_count, diff_pi_spi_count)
          
          . tabstat LRETR, by(data_count) statistics(count mean)
          
          Summary for variables: LRETR
          Group variable: data_count
          
          data_count | N Mean
          -----------+--------------------
          0 | 17 0
          1 | 50 .6017905
          2 | 58 .5154008
          3 | 373 .1761634
          -----------+--------------------
          Total | 498 .252393
          --------------------------------
          we see that there is a very strong trend towards lower values when more data is available, with the exception of the cases based on 0 data. Those are, by the way, also especially problematic. -rangestat- returns 0 as the result for a summing over an empty set. This is mathematically correct, but in applications such as this one probably leads to trouble. If there is no data for three years running for some gvkey, almost certainly you would be better off -replace-ing that 0 calculation with a missing value. And given the strong trend in the results based on the number of data points, I think it is questionable whether there is even any validity at all to values of LRETR that are derived from less than 3 years worth of data.


          what a detailed analysis of my problem, thank you Clyde!
          But I still have problems understanding how I can get the whole thing under control.
          In my study (An,Zhang 2013) there are no further hints how they could have formed this variable and also in the sample description there is no hint (I treated my sample the same way as the authors treated theirs).
          At least as far as they wrote it...

          I had written about the authors earlier, who do this in a similar way (Kim et al 2011a). They have a different definition of the variable and if I understood your objection correctly, that would make the whole thing even worse, because they define LRETR as follows:
          the sum of income tax paid (#317) over the previous five years divided by the sum of a firm's pre-tax income (Compustat item #170) less special items (Compustat item #17). We winsorize the values at zero and one.

          My question, however, is what does winsorize to 0 and 1 mean?
          From random study (In The Accounting Review), the authors define this as follows: Consistent with Dyreng et al. (2008), we constrain all four effective tax rate measures to fall within the [0,1] interval to ensure a valid economic interpretation related to tax avoidance.

          Does this help to get a grip on my problem? And what exactly does that mean? does it mean that everything within 0 and 1 is allowed (from 0.01 to 0.99) or does it mean from -0.99 to +0.99?

          The study by Dyreng et al. (2008) is apparently the "inventor" of the variable and they used these exact words, but do not specify if negative or positive values are allowed. Perhaps it is science jargon and you know what they mean by it?!

          In the study that An Zhang 2013 refers to with the LRETR variable (Kim et al 2011a) , the footnote states that (as you have already correctly surmised) "only three consecutive years of non-missing data for these items" and by that they mean the variable LRETR

          To conclude, is there a way to implement both of these? I don't want to make my data set unnecessarily smaller than it already is, but maybe one of the two restrictions is not as invasive as the other one.



          Comment


          • #6
            I can't really help you with this aspect of it. I know nothing about finance and accounting. I never heard of LRETR before. In analyzing your problem above, I simply relied on the general rule that if there is nothing obviously wrong with the code, there is a good chance that the problem is with bad data, so I looked for that, and found it. I am gratified to see that at least one of the sources you now cite makes the point that the calculation needs to be done with three years of non-missing data. To do that you will either have to find a better data set with fewer incomplete observations, or accept that your sample size is going to be appreciably smaller than you wished.

            Winsorising at 0, 1 does not sound wise to me. You will be tampering with over 20% of your data if you do that. Winsorising is often used to reduce the influence of outliers on subsequent calculations. But when we're talking about 20% of the data, we are no longer really in the realm of just outliers. If there is some more principled reason why values of LRETR > 1 should be replaced by 1, and those < 0 replaced by 0, then fine. But without a principled explanation of the reason for doing that, it looks wrong to me. That is a matter of my statistical judgment, though, and if there are substantive considerations in finance and accounting that apply, those could well overrule the statistical generalities.

            Comment


            • #7
              Originally posted by Clyde Schechter View Post
              I can't really help you with this aspect of it. I know nothing about finance and accounting. I never heard of LRETR before. In analyzing your problem above, I simply relied on the general rule that if there is nothing obviously wrong with the code, there is a good chance that the problem is with bad data, so I looked for that, and found it. I am gratified to see that at least one of the sources you now cite makes the point that the calculation needs to be done with three years of non-missing data. To do that you will either have to find a better data set with fewer incomplete observations, or accept that your sample size is going to be appreciably smaller than you wished.

              Winsorising at 0, 1 does not sound wise to me. You will be tampering with over 20% of your data if you do that. Winsorising is often used to reduce the influence of outliers on subsequent calculations. But when we're talking about 20% of the data, we are no longer really in the realm of just outliers. If there is some more principled reason why values of LRETR > 1 should be replaced by 1, and those < 0 replaced by 0, then fine. But without a principled explanation of the reason for doing that, it looks wrong to me. That is a matter of my statistical judgment, though, and if there are substantive considerations in finance and accounting that apply, those could well overrule the statistical generalities.
              I can only agree with you Clyde, but to quote my supervisor: If I find a journal that does this (and its not a triple D journal), I'm fine as a master's student.

              would you write the code the same way after the explanations above?
              Code:
              replace LRETR = 1 if LRETR > 1
              replace LRETR = 0 if LRETR < 0
              And what about the values below -1 ... is there a lower bound too or do they mean by "winsor at 0 and 1" only replace the values as i did?! or do i have to to it as this as well:

              Code:
              replace LRETR = -1 if LRETR < -1
              And do you have a code in mind that allows me to count 3 consecutive years of LRETR and sort out the observations that do not match the criteria?

              I'm at least curious to try it out, it can't get any worse than a regression variable with =0

              Comment


              • #8
                If I find a journal that does this (and its not a triple D journal), I'm fine as a master's student.
                Well, that may well be the operating standard. But the truth is there is plenty of crap published even in top journals in every field. Peer review is better than no review, but only slightly so. That said, as a master's student you are probably in no position to argue with your supervisor. It's easier for me at my stage of my career to take stands. So tuck this experience away in some corner of your memory, and later in life when you are in a more independent position, remember it.

                And what about the values below -1...
                Well, any value below -1 is already below 0, and so is already replaced by the earlier code.

                [quote]would you write the code the same way after the explanations above?
                Code:
                replace LRETR = 1 if LRETR > 1
                replace LRETR = 0 if LRETR < 0

                [/code]
                Well, I probably wouldn't take this approach at all, but if I were to, this code is close. The only problem is with the first comment. Remember that in Stata, missing values are treated as larger than any real number. So a missing value of LRETR will satisfy LRETR > 1 and will be replaced by 1. But that's not part of Winsorising--you want to keep missing values missing. The most transparent way to fix this is to change the first command to -replace LRETR = 1 if LRETR > 1 & !missing(LRETR)-.

                And do you have a code in mind that allows me to count 3 consecutive years of LRETR and sort out the observations that do not match the criteria?
                Code:
                rangestat (sum) txpd (count) txpd, interval(fyear -2 0) by(gvkey)            
                label var txpd_sum "Sum of the cash tax paid during the last 3 years"
                
                bys gvkey fyear: gen diff_pi_spi = pi-spi    
                label var diff_pi_spi "Difference between pretax income and special items"
                
                
                rangestat (sum) diff_pi_spi (count) diff_pi_spi, interval(fyear -2 0) by(gvkey)                
                label var diff_pi_spi_sum "Sum of the difference between pretax income and special items during the last 3 years"
                
                bys gvkey fyear: gen LRETR = txpd_sum / diff_pi_spi_sum if txpd_count == 3 ///
                    & diff_pi_spi_count == 3
                label var LRETR "long-run cash effective tax rate LRETR"
                By the way, as an aside, in -bys gvkey fyear: gen diff_pi_spi = pi-spi-, the -bys gvkey fyear:- part is unnecessary. It does no harm, except perhaps slowing down the computation to an imperceptable or barely perceptible degree. But it isn't needed for that calculation.


                Comment


                • #9
                  I can agree with you once again. I am also absolutely happy if I can achieve similar results to the studies I try to follow.
                  I have implemented our tips and although this seems to us to be a massively invasive intervention in the data (not because I am losing many observations, but because I am changing many observations), I am achieving meaningful interpretative results in my regression and the summary statistics now also fit very roughly.
                  So I can sleep peacefully with this results, thank you very much again Clyde for your really fantastic help!!!

                  Comment

                  Working...
                  X