  • Financial statement comparability (rangerun after rangestat)?

    Dear All, In an earlier post (https://www.statalist.org/forums/for...fter-rangestat), I asked how to compute a measure of financial statement comparability. Thanks to Robert Picard, who offered helpful code for doing this (using runby and rangerun, both from SSC). However, because the dataset is large (all listed A shares in China over 1991-2018), the code ran for more than 10 hours (according to my friend) without finishing. I wonder whether Robert's code can be sped up somehow. Any suggestions are highly appreciated. The following is taken from #4 of the above link (by Robert):
    Code:
    clear all
    set seed 3123
    
    * demonstration dataset, 50 firms over 70 quarters in 10 industries
    set obs 50
    gen firmid = _n
    gen industry = runiformint(1,10)
    expand 70
    bysort firmid: gen qdate = yq(1999,4) + _n
    format %tq qdate
    gen returns = runiform()
    gen earnings = runiform()
    
    * pick a quarter to calculate measure, use quarters in 2 previous years
    gen q2use = quarter(dofq(qdate)) == 4
    gen qlow  = cond(q2use, qdate - 11, 1)
    gen qhigh = cond(q2use, qdate - 4, 0)
    format %tq qlow qhigh
    
    program get_CompAcct
        reg earnings returns
        predict pearn, xb
        reg earnings2 returns2
        gen pearn2 =  _b[returns2] * returns + _b[_cons]
        count if !mi(pearn,pearn2)
        
        gen CompAcct_nobs = r(N)
        gen CompAcct = -sum(abs(pearn-pearn2)) / CompAcct_nobs
        drop pearn pearn2
    end
    
    program pair_by_quarters
        tempfile hold
        save "`hold'"
        rename (firmid returns earnings) (firmid2 returns2 earnings2)
        joinby qdate using "`hold'"
        keep if firmid != firmid2
        sort firmid firmid2 qdate
        rangerun get_CompAcct, by(firmid firmid2) interval(qdate qlow qhigh)
    end
    runby pair_by_quarters, by(industry) verbose
    
    save "results.dta", replace
    
    sort industry qdate firmid firmid2
    
    * to install, type: ssc install listsome
    listsome industry qdate firmid firmid2 CompAcct_nobs CompAcct ///
        if q2use & !mi(CompAcct), sepby(qdate)
    Ho-Chuan (River) Huang
    Stata 17.0, MP(4)

  • #2
    It's a tall order asking people to improve on code written by Robert Picard! Unsurprisingly, I have not found any ways to materially speed this up.

    If you change program get_CompAcct as follows:
    Code:
    program get_CompAcct
        reg earnings returns
        predict pearn, xb
        reg earnings2 returns2
        gen pearn2 =  _b[returns2] * returns + _b[_cons]
       
        gen CompAcct_nobs = sum(!mi(pearn, pearn2))
        gen CompAcct = -sum(abs(pearn-pearn2)) / CompAcct_nobs
        drop pearn pearn2
    end
    on my setup it saves about 0.5 seconds on the demonstration data set you show, which is about a 1% improvement. But I couldn't come up with anything better than that. You can also perhaps shave another fraction of a percent off the run time by eliminating the -verbose- option from the -runby- command. That improvement will come at the price of not having any indication of what went wrong in any industry that didn't yield results (as in the example, where one industry has no observations).

    There are a few things you can consider that might get you results more quickly using the same code:

    1a. Get (or borrow, or rent in the cloud) a computer with a much faster processor and more RAM.

    1b. Split the data set into separate industries, and run them in parallel on separate computers. This doesn't reduce total computational effort but you get the results more quickly.

    2. Be patient. In my world, a 10 hour run would not be considered exceptionally long. I routinely do things that run for more than 24 hours, and have occasionally had calculations that took longer than a week to conclude. To make it easier to be patient, consider adding the -status- option to the -runby- command. That way you will get periodic progress reports, along with an estimate of the remaining time.
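
    For option 1b, a minimal sketch of the split, assuming an -industry- variable like the one in the demonstration data. The toy data and the file names industry_*.dta are hypothetical, used only to make the example runnable:

    ```stata
    * toy data: 50 firms spread evenly over 5 industries (illustrative only)
    clear
    set obs 50
    gen firmid = _n
    gen industry = mod(_n, 5) + 1

    * write one file per industry so each can be processed on a separate machine
    levelsof industry, local(inds)
    foreach i of local inds {
        preserve
        keep if industry == `i'
        save "industry_`i'.dta", replace
        restore
    }
    ```

    Each machine would then load one of these files and run the unchanged -runby- call on it (optionally with -status-), and the per-industry results could be combined afterward with -append-.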

    Finally, if this is an analysis that will be run recurrently with different data sets, it might be worth hiring somebody to program this in a compiled language, rather than doing it in Stata.

    Added note to any Stata developers who might be following this thread: one part of the code that undoubtedly is a major time sink is the place in program pair_by_quarters where the current data are copied into a tempfile. If it were possible to do something analogous to -joinby- with frames (in the same sense that -frlink- is analogous to -merge-), this could likely be sped up considerably by using a frame instead of a tempfile.



    • #3
      Dear Clyde, Thanks a lot, and I will follow your suggestions to see what I can do.

      PS: By the way, do you think that using -rangestat- (ssc install rangestat) to obtain the coefficients and residuals before going through the procedure would save a little time? I suspect that the -reg- command calculates many statistics that are not needed here.
      Last edited by River Huang; 13 Oct 2019, 17:46.
      Ho-Chuan (River) Huang
      Stata 17.0, MP(4)



      • #4
        I doubt using -rangestat- will speed things up much. You can try it on a smaller data set and time it both ways to see. But -rangestat- and -reg- do the same computations for calculating regression coefficients. They have somewhat different overhead for setup time, but that would, I think, be quite small compared to the time required for the regressions themselves.
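
        A minimal sketch of such a timing comparison, assuming both -rangestat- and -rangerun- are installed from SSC. The panel size, the 16-quarter window, and the helper program name reg_only are arbitrary choices for illustration:

        ```stata
        * small test panel: 20 firms, 40 quarters (sizes are arbitrary)
        clear all
        set seed 3123
        set obs 20
        gen firmid = _n
        expand 40
        bysort firmid: gen qdate = yq(1999,4) + _n
        format %tq qdate
        gen returns  = runiform()
        gen earnings = runiform()
        gen qlow  = qdate - 19
        gen qhigh = qdate - 4

        timer clear

        * approach 1: rolling regression with -rangestat-
        timer on 1
        rangestat (reg) earnings returns, by(firmid) interval(qdate qlow qhigh)
        timer off 1

        * approach 2: the same regression via -rangerun- calling -regress-
        program reg_only
            regress earnings returns
            gen b_ret = _b[returns]
        end
        timer on 2
        rangerun reg_only, by(firmid) interval(qdate qlow qhigh)
        timer off 2

        * elapsed seconds for timers 1 and 2
        timer list
        ```

        Timings from a small panel like this indicate only the relative overhead; they will not necessarily scale to the full data set.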



        • #5
          Dear Clyde, I see, and thanks again.

          Ho-Chuan (River) Huang
          Stata 17.0, MP(4)



          • #6
            Dear Clyde, Suppose that I only want regressions with exactly 16 (quarterly) observations, is it possible (and how) to skip the regressions with fewer observations (so that I can save time)?
            Ho-Chuan (River) Huang
            Stata 17.0, MP(4)



            • #7
              So you could recode the program as:
              Code:
              program get_CompAcct
                  reg earnings returns
                  predict pearn, xb
                  gen CompAcct_nobs = sum(!mi(pearn, pearn2))
                  if CompAcct_nobs[_N] == 16 {
                      drop CompAcct_nobs
                      reg earnings2 returns2
                      gen pearn2 =  _b[returns2] * returns + _b[_cons]
                      gen CompAcct = -sum(abs(pearn-pearn2)) / CompAcct_nobs
                      drop pearn pearn2
                  }
              end
              and this would skip the regressions when the number of observations in the estimation sample would be different from 16 (more, as well as less). Whether the time savings would be appreciable depends on how many firm pairs will turn out to have other than 16 observations to contribute to the regression. If there are a lot of those, you will save a lot of time. If only a few, it won't make a noticeable difference.



              • #8
                Dear Clyde, Thanks a lot, and I'll give it a try. One final question: for each firm-year, I want to estimate the following equation using the previous 16 quarters of data
                Code:
                reg earnings returns
                Is it correct to modify the above code to
                Code:
                gen qlow = cond(q2use, qdate - 19, 1)
                gen qhigh = cond(q2use, qdate - 4, 0)
                or
                Code:
                gen qlow = cond(q2use, qdate - 20, 1)
                gen qhigh = cond(q2use, qdate - 5, 0)
                or others. Thanks again.
                Ho-Chuan (River) Huang
                Stata 17.0, MP(4)



                • #9
                  Dear Clyde, I am a little confused by the code you offered in #7. Before the `if' condition, we need to
                  Code:
                  gen CompAcct_nobs = sum(!mi(pearn, pearn2))
                  However, pearn2 is only created inside the `if' block below.
                  Code:
                  if CompAcct_nobs[_N] == 16 {
                          drop CompAcct_nobs
                          reg earnings2 returns2
                          gen pearn2 =  _b[returns2] * returns + _b[_cons]
                          gen CompAcct = -sum(abs(pearn-pearn2)) / CompAcct_nobs
                          drop pearn pearn2
                      }
                  Am I wrong about this?
                  Ho-Chuan (River) Huang
                  Stata 17.0, MP(4)



                  • #10
                    Re #8: It depends on what you mean by the previous 16 quarters. Your second version excludes the current quarter and counts back from a year before the immediately preceding one. Your first version includes the current quarter and counts back from one year ago. Actually if you really mean the 16 quarters preceding the current one, you would set qlow to qdate-16 and qhigh to qdate-1. If you mean 16 quarters back, including the present one, it's qdate-15 and qdate.

                    Re #9. Sorry, yes you are right. It should be -gen CompAcct_nobs = sum(!mi(earnings2, returns2))-, as those are the variables used in the regression.



                    • #11
                      Dear Clyde, Thanks again. Do you think that the following code is OK?
                      Code:
                      program get_CompAcct
                          gen CompAcct_nobs = sum(!mi(earnings, returns, earnings2, returns2))
                          if CompAcct_nobs[_N] == 16 {
                              reg earnings returns
                              predict pearn, xb
                              reg earnings2 returns2
                              gen pearn2 =  _b[returns2] * returns + _b[_cons]
                              gen CompAcct = -sum(abs(pearn-pearn2)) / 16
                              drop pearn pearn2
                          }
                      end
                      Last edited by River Huang; 19 Oct 2019, 20:08.
                      Ho-Chuan (River) Huang
                      Stata 17.0, MP(4)



                      • #12
                        Yes, of course. I'm sorry. I guess I wasn't paying close enough attention. But you are absolutely right: it has to be based on observations that have all of the variables needed for both regressions.
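
                        For readers following the -[_N]- test in that program: in -generate-, sum() is a running sum, so only the last observation holds the full count of complete cases. A minimal illustration on made-up data (the variables x and y are hypothetical):

                        ```stata
                        * five observations, two of which have a missing value
                        clear
                        input x y
                        1 2
                        . 3
                        4 .
                        5 6
                        7 8
                        end

                        * running count of observations where both x and y are nonmissing
                        gen n_complete = sum(!mi(x, y))
                        list

                        * the last observation carries the total number of complete cases
                        display n_complete[_N]
                        ```

                        Here n_complete runs 1, 1, 1, 2, 3, so the value in the last observation is 3.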



                        • #13
                          Dear Clyde, Got it and many thanks. I am still confused by what you said
                          Code:
                          Your first version includes the current quarter and counts back from one year ago.
                          Actually if you really mean the 16 quarters preceding the current one,
                          you would set qlow to qdate-16 and qhigh to qdate-1.
                          If you mean 16 quarters back, including the present one, it's qdate-15 and qdate.
                          Let me make it clearer: My purpose is to obtain a measure of FSC (financial statement comparability) for each pair of firms and for each year, using the previous 16 quarters (excluding any quarter in the current year). Professor Robert Picard suggested the following setup to save time (avoiding unnecessary repetitions, I think) since, for each year, we only need to calculate the measure once.

                          Another question, though.

                          Robert suggested calculating the measure only at the fourth quarter of each year (and not repeating the procedure for the other three quarters). This saves a lot of time!
                          My problem is: What is the difference between the code
                          Code:
                          gen q2use = quarter(dofq(qdate)) == 4
                          gen qlow  = cond(q2use, qdate - 19, 1)
                          gen qhigh = cond(q2use, qdate - 4, 0)
                          and your suggested code
                          Code:
                          gen q2use = quarter(dofq(qdate)) == 4
                          gen qlow  = cond(q2use, qdate - 16, 1)
                          gen qhigh = cond(q2use, qdate - 1, 0)
                          In my case above, what would be your suggestion?




                          Ho-Chuan (River) Huang
                          Stata 17.0, MP(4)



                          • #14
                            The first block of code skips over a year and begins four quarters in the past and goes back through 19 quarters in the past. My suggested code starts one quarter in the past and goes back through 16 quarters in the past. Since you want to exclude any quarter in the present year, my suggested code would not be appropriate for you.
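
                            The two windows can be checked directly with Stata's quarterly date functions. A small sketch, taking 2010q4 as an arbitrary reference quarter:

                            ```stata
                            * reference quarter: 2010q4 (an arbitrary example)
                            local q = yq(2010, 4)

                            * window qdate-19 through qdate-4: ends in Q4 of the previous year
                            display %tq (`q' - 19) " to " %tq (`q' - 4)

                            * window qdate-16 through qdate-1: ends in Q3 of the current year
                            display %tq (`q' - 16) " to " %tq (`q' - 1)

                            * each window spans 16 quarters
                            display (`q' - 4) - (`q' - 19) + 1
                            display (`q' - 1) - (`q' - 16) + 1
                            ```

                            The first window runs from 2006q1 to 2009q4 and the second from 2006q4 to 2010q3, so only the first excludes every quarter of the current year.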



                            • #15
                              Dear Clyde, I see, and thanks a lot.

                              Ho-Chuan (River) Huang
                              Stata 17.0, MP(4)

