  • Reproducing regression results: Unexpected coefficients

    Hello everybody!

    I am currently writing my master's thesis in accounting, and part of that is replicating the analysis in a research paper by Francis et al. (2005) (download here, in case you are interested). I am not entirely sure whether this will turn out to be a question about Stata or about statistics, but here we go:

    I tried recreating their procedure 1:1 (as far as possible). Right now I use data from the same time period as the paper and the variables that I calculated appear to be pretty close to theirs (judging by the means, medians and other quantiles they list). The last step is plugging them into a regression about which they provide the following info:

    "Our analyses are based on annual regressions […] for the period t = 1970-2001: […] To control for cross-sectional correlations, we assess the significance of the 32 annual regression results using the time-series standard errors of the estimated coefficients (Fama-MacBeth, 1973)." (p. 308)
    So I went ahead and tried:
    Code:
    xtset firm_j period_t, yearly
    asreg CostDebt Leverage Size ROA IntCov sigmaNIBE AQ_deciles, fmb
    Which gave me the following results:
    Code:
    Fama-MacBeth (1973) Two-Step procedure           Number of obs     =     86368
                                                     Num. time periods =        32
                                                     F(  6,    31)     =    314.49
                                                     Prob > F          =    0.0000
                                                     avg. R-squared    =    0.0664
                                                     Adj. R-squared    =    0.0641
    ------------------------------------------------------------------------------
                 |            Fama-MacBeth
        CostDebt | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
        Leverage |  -.0565528   .0018785   -30.10   0.000    -.0603841   -.0527216
            Size |  -.0013385   .0003525    -3.80   0.001    -.0020574   -.0006195
             ROA |  -.0386261   .0043771    -8.82   0.000    -.0475532    -.029699
          IntCov |   .0000469    .000015     3.12   0.004     .0000163    .0000775
       sigmaNIBE |   .0456804    .007419     6.16   0.000     .0305494    .0608115
      AQ_deciles |   .0023378   .0001652    14.15   0.000     .0020009    .0026747
           _cons |   .1135654   .0036623    31.01   0.000      .106096    .1210347
    ------------------------------------------------------------------------------
    Which are pretty different from the results reported in the paper (p. 309):
    Code:
        CostDebt | Coefficient     t        
    -----------------------------------
        Leverage |    -2.5       -9.76  
            Size |    -0.01      -0.55   
             ROA |    -1.65      -5.02   
          IntCov |    -0.00      -5.24   
       sigmaNIBE |     5.44      12.35   
      AQ_deciles |     0.14      13.36   
    ------------------------------------
    The period of observation is the same as the paper's, all variables are calculated strictly following the paper, and a quick check showed they are distributed similarly (from what I can tell). So I am very surprised by how different some of the coefficients turned out. I also tried other regression methods/commands (plain old reg, xtreg and xtfmb) and they all give me pretty much the same results.

    So I'm wondering: Am I doing something wrong on the Stata side of things? Am I using the right commands? Do the authors mean some completely different procedure? Is there anything else I could look into?

    I'm using Stata 17 on Windows 10.

    Every hint will be greatly appreciated.

    Thanks for your attention, have a good one!

    Immo

  • #2
    Did you require at least 20 obs per year for an industry and winsorize at the 1% tails? Is the distribution of the DV similar to theirs?


    • #3
      Immo:
      On pp. 311-12, the authors explain the approach they followed in creating the variables that enter their regression.
      Did you follow their very same steps?
      Kind regards,
      Carlo
      (StataNow 18.5)


      • #4
        George Ford: The 20-observation minimum applies to the first regression, which is used to calculate AQ, an independent variable in the regression I am asking about. They don't report any results from that first regression, but the results I get for AQ are close enough for me. I winsorized all variables at the 1% tails after calculating them, except for AQ. I did not winsorize AQ because the regression uses its decile ranks rather than the raw values (p. 308), so I figured it wouldn't make a difference. See the table below, which compares my results with those reported in the paper.
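
        For reference, the preparation described above can be sketched as follows. This is a minimal sketch rather than my exact .do file: ROA_w and AQ_dec_check are names made up for illustration, ROA stands in for any of the winsorized variables, and forming the decile ranks separately within each year is an assumption about the paper's procedure.

```stata
* Sketch only: ROA_w and AQ_dec_check are hypothetical names.
* Winsorize ROA at the pooled 1st and 99th percentiles.
quietly _pctile ROA, percentiles(1 99)
tempname lo hi
scalar `lo' = r(r1)
scalar `hi' = r(r2)
generate double ROA_w = ROA
replace ROA_w = `lo' if ROA < `lo'
replace ROA_w = `hi' if ROA > `hi' & !missing(ROA)

* Decile ranks (1-10) of raw AQ, formed within each year (assumption).
generate byte AQ_dec_check = .
levelsof period_t, local(years)
foreach y of local years {
    tempvar d
    xtile `d' = AQ if period_t == `y', nquantiles(10)
    replace AQ_dec_check = `d' if period_t == `y'
    drop `d'
}
```

        Winsorizing at the 1% tails only moves the most extreme values, which already sit in the lowest and highest deciles, which is why I did not expect it to matter for the ranks.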

        Carlo Lazzaro: I followed all the steps carefully, and whenever something wasn't 100% clear, I followed the approaches of related papers. See the table below to compare my results to those in the paper.

        A little bit of background: The sample consists of US firms that have all the data needed to calculate the ominous AQ variable (accruals quality). It covers 1970-2001 and has ca. 96000 observations (compared to ca. 91000 reported for the same period in the paper). Following the paper, I drop all observations that don't have all variables required for the regression. After that I'm still left with 86000 observations (compared to 76000 reported in the paper). The table below compares the values I calculated to those reported by Francis et al. (2005), p. 307:

        Variable               |        Mean         |        10%          |        25%          |       Median        |        75%          |        90%
                               |     Paper        Me |     Paper        Me |     Paper        Me |     Paper        Me |     Paper        Me |     Paper        Me
        -----------------------+---------------------+---------------------+---------------------+---------------------+---------------------+--------------------
        AQ                     |    0.0442  .0498852 |    0.0107  .0110657 |    0.0179  .0190715 |    0.0313  .0338772 |    0.0558  .0619193 |    0.0943  .1098362
        Market value equity    |    1206.6  1214.252 |       4.7  4.033469 |      14.3   13.0305 |      64.2  59.43625 |     374.8    356.82 |    1702.1  1663.572
        Assets                 |    1283.5   1320.24 |       8.5     6.079 |      25.6    22.155 |     102.0     97.45 |     511.3   529.923 |    2333.6  2436.755
        Sales                  |    1240.1  1236.908 |       8.9     6.054 |      30.7    26.674 |     127.6   124.499 |     575.2   583.826 |    2297.8  2284.962
        ROA                    |     0.003  .0025771 |    -0.101 -.1221669 |     0.005  .0024974 |     0.042   .045052 |     0.076  .0847903 |     0.114   .134294
        Market to book ratio   |      2.02  2.128535 |      0.44  .4407023 |      0.77  .7760555 |      1.32  1.336096 |      2.29  2.358652 |      4.07   4.31252
        Cost Debt              |     0.099  .1075387 |     0.059  .0583497 |     0.074  .0745965 |     0.092  .0933688 |     0.114  .1178158 |     0.144  .1550633
        Leverage               |     0.276  .2791926 |     0.010  .0082177 |     0.109  .1087576 |     0.248  .2528269 |     0.381  .3844805 |     0.520  .5292025
        sigmaNIBE              |     0.065  .0804085 |     0.011  .0106844 |     0.020  .0204683 |     0.038   .040088 |     0.077  .0860647 |     0.151  .1884758
        Earnings-price ratio   |     0.089  .1260663 |     0.026  .0258647 |     0.047  .0487951 |     0.073  .0783496 |     0.114  .1259355 |     0.166  .1882353
        IndEP                  |     0.008   .010871 |    -0.045 -.0480226 |    -0.022 -.0223617 |     0.001  .0015346 |     0.027  .0299525 |     0.062  .0738549
        (the variables that are relevant for the regression are bold)
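
        The "Me" columns can be produced along these lines. This is a minimal sketch, assuming the variable names from the regression in #1 and a raw variable called AQ; restricting the statistics to the estimation sample is my choice here, and I cannot confirm that the paper's descriptives use the same subsample.

```stata
* Sketch: descriptives on exactly the observations the regression uses.
quietly regress CostDebt Leverage Size ROA IntCov sigmaNIBE AQ_deciles
tabstat CostDebt Leverage ROA sigmaNIBE AQ if e(sample), ///
    statistics(mean p10 p25 p50 p75 p90) columns(statistics) format(%9.4f)
```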

        So I think that most of the variables are at least somewhat close to those used by the authors. But maybe I'm underestimating the effects of the differences ...

        I found some papers that also repeated the analysis and I'll thoroughly check them for other evidence of what I'm doing wrong tomorrow.

        Once again, thank you for looking into this!

        Best regards,
        Immo
        Last edited by Immo Bock; 24 Mar 2022, 16:35.


        • #5
          Immo:
          my gut-feeling is that the difference in observations (paper vs. your dataset) can explain most of the small differences in results.
          In addition, papers might be unclear/incomplete about some methodological issues due to (say) word-count constraints.
          I would not be concerned about that and simply report on this issue in your dissertation/research report/article/working-paper/whatever else.
          Kind regards,
          Carlo
          (StataNow 18.5)


          • #6
            It's strange that the means are close (so not a scale problem) but the coefficients are very different. The correlations, obviously, are not the same, which makes me think of a coding error of some sort in creating the data. Maybe something is out of sync across the variables. Are you merging data or does all of it come from the same place?
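
            To make the scale point concrete: if the gap were only a matter of units in the dependent variable (say, percent versus decimal), every coefficient in the paper would be the same multiple of the corresponding coefficient in #1. A quick sketch using only the numbers posted in #1:

```stata
* Ratio of paper coefficient to replicated coefficient (values from #1).
* IntCov is left out: the paper reports it rounded to -0.00 and the sign differs.
display "Leverage   " %8.1f (-2.5  / -.0565528)
display "Size       " %8.1f (-0.01 / -.0013385)
display "ROA        " %8.1f (-1.65 / -.0386261)
display "sigmaNIBE  " %8.1f ( 5.44 /  .0456804)
display "AQ_deciles " %8.1f ( 0.14 /  .0023378)
```

            The ratios run from roughly 7 to roughly 119 rather than sharing one value, so a single rescaling of the dependent variable does not reconcile the two tables. The paper's Size coefficient is rounded to two decimals, so that ratio in particular is only approximate.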

            Are the coefficients from the first stage close?

            Might want to run the regression by year and compute the average of the coefficients. Perhaps asreg is doing something you don't want it to.
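
            A sketch of that by-year check, assuming the variable names from #1 and that period_t holds the fiscal year. It is not meant to replace asreg, only to cross-check it:

```stata
* Sketch: Fama-MacBeth by hand, to cross-check asreg ..., fmb.
preserve
* Step 1: one cross-sectional regression per year; keep the coefficients.
statsby _b, by(period_t) clear: ///
    regress CostDebt Leverage Size ROA IntCov sigmaNIBE AQ_deciles
* Step 2: average each coefficient over the years; the t-statistic
* uses the time-series standard error of the yearly coefficients.
foreach v of varlist _b_* {
    quietly ttest `v' == 0
    display "`v'" _col(20) %10.7f r(mu_1) _col(34) %8.2f r(t)
}
restore
```

            If the averages and t-statistics match the asreg table in #1, the command is not the problem and attention should go to how the variables were built.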

            Size variable is in logs in paper. s(NIBE) is scaled by average assets, which may not have been done for the table of means (and required 5 obs to construct).

            So, AQ is a prediction from another regression? That might be the source. Try leaving it out, using its raw form, or some other simple procedure to see if the scale of the coef becomes closer. Also, after you figure this out, you need to bootstrap second stage since you have a prediction as a regressor.

            I'd contact the authors of the paper and ask for the data (or a correlation matrix, code, ... whatever you can get). You need to get this figured out. You won't be comfortable until you do and it will make the paper difficult to publish.


            • #7
              The data I'm using in this regression is all from the same source (Compustat). I merge this data with stock market data from CRSP (CAPM betas based on monthly return data) for use in another regression. However, I just removed the whole merging part from my .do file and the results are still the same.

              I also played around with different combinations of independent variables and the coefficients still did not go anywhere near those in the paper. Leaving out or changing AQ also didn't have a noticeable effect. Then I tried doing yearly regressions like this (which also was my original approach):
              Code:
              forvalues i = 1970/2001 {
                  * one cross-sectional regression per fiscal year
                  capture noisily reg CostDebt Leverage Size ROA IntCov sigmaNIBE AQ_old_deciles if fyear == `i'
              }
              A quick look at the coefficients showed that they were in the same range as before. This, and the fact that xtfmb (which apparently is a specialized command for Fama-MacBeth regressions) gives exactly the same results as asreg with the fmb option, makes me think that the problem is probably not the regression.

              As for the variables:

              The size variable was calculated like this:
              Code:
              gen Size = log(at)
              (at is Assets - Total)

              s(NIBE) was calculated like this:
              Code:
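              * NIBE scaled by average total assets (L.at relies on the xtset from #1)
              gen NIBE_scaled = NIBE / ((at + L.at) / 2)
              * rolling standard deviation and observation count over periods t-10 to t, per firm
              rangestat (sd) sigmaNIBE = NIBE_scaled, interval(period_t -10 0) by(firm_j)
              rangestat (count) sigmaNIBE_count = NIBE_scaled, interval(period_t -10 0) by(firm_j)
              * require at least 5 observations in the window
              replace sigmaNIBE = . if sigmaNIBE_count < 5
              drop NIBE_scaled sigmaNIBE_count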
              I played around with a few versions of s(NIBE), but the results I got from this one were the closest to those in the paper.

              I already contacted the authors a while ago and they responded that they do not have any of the data or code anymore since they wrote the paper about 20 years ago.

              I am running out of ideas, but I'm sure I'll find the problem at some point.

              Once again, thank you for your time!
              Last edited by Immo Bock; 25 Mar 2022, 12:17.


              • #8
                You cannot assume that their results are correct and that the failure to reproduce them is yours. That is important.
