Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating moving median for industries

    Dear Statalist users:

    I am trying to replicate the industry-level returns on equity (ROE) as a moving median of past ROEs from all firms in the same industry (48 industries are classified in this case) using at least 5 years and up to 10 years of data, based on this paper by Gebhardt, Lee and Swaminathan (2001, p.143 - see full ref. below*).

    I've thought of using the mvsumm command, but since in any give year there are multiple firms in a given industry, I was unable to declare panel data in the first place due to repeated time values in the panel.

    Please could someone give me some advice on this? Many thanks in advance.

    Best,


    Chieh Lin

    * Gebhardt, W.R., C.M.C. Lee, and B. Swaminathan, 2001, Toward an implied cost of capital, Journal of Accounting Research, 39(1): 135-176.

  • #2
    Leaving aside the issue of calculating medians, what does a "moving" median even mean if you have multiple observations for the same time values?

    Comment


    • #3
      Dear Clyde,

      Thank you for the reply. The original paragraph in the paper reads "The industry target ROE is a moving median of past ROEs from all firms in the same industry. We exclude loss firms on the basis that the population of profitable firms better reflects long-term industry equilibrium rates of returns. We use at least five years, and up to ten years, of past data to compute this median."

      I have interpreted it as follows: the target ROE figure for the industry "Food" in year 2000, for instance, would be an overall median of ROEs from all firms that are part of the Food industry between 1991 and 2000. Likewise, the figure for year 2001 would be the median of ROEs of all firms pertaining to Food industry between 1992-2001, etc. Further, the latter part of the description seems to me I should also drop firms with less than 5 years of data within any 10-year window which would complicate things even more.

      I then got stuck upon realising, as you noted, there are multiple observations under any given industry title for the same time values.

      Do you think the description should have been interpreted differently? Or could this figure be calculated without using the mvsumm command?

      Thanks very much in advance!

      Best wishes,


      Chieh

      Comment


      • #4
        I haven't read the article and this is just a wild guess, but I cite you:

        The industry target ROE is a moving median of past ROEs from all firms in the same industry.
        So that seems like

        the target ROE figure for the industry "Food" in year 2000, for instance, would be an overall median of ROEs from all firms that are part of the Food industry between 1990 and 1999.

        Regarding code to do this, you should probably setup (formatted!) example data that people can work with. This is always a good idea.
        Last edited by Roberto Ferrer; 26 Mar 2015, 11:42.
        You should:

        1. Read the FAQ carefully.

        2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

        3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

        4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.

        Comment


        • #5
          Well, I can't think of anything really elegant. But let's assume you have numeric variable encoding industry, firm, year, and roe. And I assume that there is at most one observation for any firm in any given year. Then I think this will work:

          Code:
          levelsof industry, local(industries)
          levelsof year, local(years)
          
          gen moving_median_roe = .
          
          foreach ind of local industries {
              foreach y of local years {
                  // IDENTIFY 10 YEAR WINDOW FOR THIS INDUSTRY
                  gen byte to_use = (industry == `ind' & inrange(year, `=`y'-9', `y')
                  // IDENTIFY FIRMS WITH < 5 OBSERVATIONS IN THE WINDOW
                  by firm, sort: egen obs_count = total(to_use)
                  // AND EXCLUDE THEM FROM THE CALCULATIONS
                  replace to_use = 0 if obs_count < 5
                  summarize roe if to_use, detail
                  replace moving_median_roe = `r(p50)' if industry == `ind' & year == `y'
                  drop to_use obs_count
              }
          }
          The above does not, however, deal with this detail:
          We exclude loss firms on the basis that the population of profitable firms better reflects long-term industry equilibrium rates of returns
          I am not sure what "loss firm" means. Does that mean that total ROE for the firm over the 10 year window is negative, or does it mean that particular years in which a firm runs a loss are excluded, or does it mean that the firm is excluded from the calculation entirely if it ever has a negative roe? Or something else, possibly involving variables other than roe?

          Comment


          • #6
            Dear Clyde,

            Many thanks for the detailed answer! I just got around to using Stata and applied the suggested code. I ensured all year, firm, industry and roe are in numeric form, and that there are no firm-year duplicates. However, despite no error message has shown, it has not returned any moving_median_roe. Below is the outcome:

            encode ffind, gen(industry)

            destring gvkey, gen (firm)
            gvkey has all characters numeric; firm generated as long

            duplicates report firm year

            Duplicates in terms of firm year

            --------------------------------------
            copies | observations surplus
            ----------+---------------------------
            1 | 197873 0


            levelsof industry, local(industries)
            1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48

            levelsof year, local(years)
            1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015

            gen moving_median_roe = .

            foreach ind of local industries {
            foreach y of local years {
            // IDENTIFY 10 YEAR WINDOW FOR THIS INDUSTRY
            gen byte to_use = (industry == `ind' & inrange(year, `=`y'-9', `y')
            // IDENTIFY FIRMS WITH < 5 OBSERVATIONS IN THE WINDOW
            by firm, sort: egen obs_count = total(to_use)
            // AND EXCLUDE THEM FROM THE CALCULATIONS
            replace to_use = 0 if obs_count < 5
            summarize roe if to_use, detail
            replace moving_median_roe = `r(p50)' if industry == `ind' & year == `y'
            drop to_use obs_count
            }
            }

            end of do-file

            count if moving_median_roe != .
            0

            At first I thought the line of code on identifying 10-year window might have lacked a right-hand parenthesis, but it still doesn't work after I added it.

            Regarding the "loss firms" part, I believe it means exclusion of years in which any given firms make a loss - hence a negative roe. My current dataset has already excluded them.

            Please would you be able to give me some additional advice on this? Thank you very much again.

            Best wishes,


            Chieh Lin

            Comment


            • #7
              You were correct in noticing the missing right parenthesis. Sorry about that.

              But, there is more to it. Actually, I am skeptical that you are showing me what really happened, because I can't reproduce it. In fact, there is an error in the code that should have caused it to halt with an error message. It should not have completed the loop leaving all values of moving_median_roe missing.

              The error is that when it tries to calculate the median roe for an industry-year combination in which there are no usable observations (e.g. any industry in 1950 through 1953), to_use will always be 0, and so the -summarize- command will find no observations and will not create r(p50). So when the -replace- command is reached, it will look like -replace moving_median_roe = if industry == ....- Of course, with no expression after the equals sign, this is invalid syntax and the program will halt and say, in red, "if not found". (I know, because I tried it.)

              The fix is simple: the -summarize- and -replace- commands need to be guarded by an -if- statement to assure that they are only executed if there is at least one observation that met the inclusion criteria.

              Code:
              levelsof industry, local(industries)
              levelsof year, local(years)
              
              gen moving_median_roe = .
              
              foreach ind of local industries {
                  foreach y of local years {
                      // IDENTIFY 10 YEAR WINDOW FOR THIS INDUSTRY
                      gen byte to_use = (industry == `ind' & inrange(year, `=`y'-9', `y'))
                      // IDENTIFY FIRMS WITH < 5 OBSERVATIONS IN THE WINDOW
                      by firm, sort: egen obs_count = total(to_use)
                      // AND EXCLUDE THEM FROM THE CALCULATIONS
                      replace to_use = 0 if obs_count < 5
                      count if to_use
                      if `r(N)' > 0 {
                         summarize roe if to_use, detail
                         replace moving_median_roe = `r(p50)' if industry == `ind' & year == `y'
                      }
                      drop to_use obs_count
                  }
              }
              
              summ moving_median_roe
              I tested the above on some toy data, and it runs and produces results that look plausible. Give it a try; let me know if there are any other problems. Sorry I didn't see that problem the first time around.

              Comment


              • #8
                By the way, those -summarize- and -replace- commands will generate a lot of output that you may not really want to retain. (On the other hand, that also lets you know that the program is actually running and not just hung.) Anyway, if the output is too burdensome, you might want to consider putting the big looping part inside a -quietly- block.

                Comment


                • #9
                  Dear Clyde,

                  That's extremely helpful! I'm still fairly new to Stata, and was repeatedly copying and pasting the code into do-file before running it, which then gave the illusion that the code worked whilst in fact nothing happened, as referred to above. As soon as I ran the code directly from the command bar it worked.

                  Thanks very much again for your help. I really appreciate it.

                  Best wishes,


                  Chieh

                  Comment


                  • #10
                    Dear All,
                    I have a similar problem as Chieh. I want to compute average industry return over past 30 days. My data is as follow: date, ret (daily CRSP returns), industry (2-digit SIC). It's not a panel because I have more companies in same industry at a given date. I've tried Clyde's loop in post #7 (little bit modified, see attachment) and it doesn't work. Stata is not returning any error (in a second I have a massage: end of do-file), but the loop is not working. I would really appreciate any help from you. Thank you in advance
                    Attached Files

                    Comment


                    • #11
                      Hi Chieh and Clyde,

                      I need the exact same variables and also I am new to Stata.

                      I destringed and renamed the variables so I could copy the code mentioned above by Clyde in post#7. Unfortunately, the "if not found" error keeps popping up, although copying the entire code in the command bar. Does anyone has any suggestions or ideas?

                      Thanks in advance!
                      Mertin

                      Code:
                      levelsof industry, local(industries)
                      levelsof year, local(years)
                      
                      gen moving_median_roe = .
                      
                      foreach ind of local industries {
                          foreach y of local years {
                              gen byte to_use = (industry == `ind' & inrange(year, `=`y'-9', `y'))
                              by firm, sort: egen obs_count = total(to_use)
                              replace to_use = 0 if obs_count < 5
                              count if to_use
                              if `r(N)' > 0 {
                                 summarize roe if to_use, detail
                                 replace moving_median_roe = `r(p50)' if industry == `ind' & year == `y'
                              }
                              drop to_use obs_count
                          }
                      }
                      
                      summ moving_median_roe

                      Comment


                      • #12
                        Do not copy the code into the command bar. Run the code in a do file (type doedit and a do file editor will popup; copy, paste, and run from this). If you get an error, follow Roberto's instructions in #4 : "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"
                        Stata/MP 14.1 (64-bit x86-64)
                        Revision 19 May 2016
                        Win 8.1

                        Comment


                        • #13
                          Carole J. Wilson Clyde Schechter

                          ​Since this thread started, rangestat has been introduced by Robert Picard and friends. http://www.statalist.org/forums/foru...s-within-range

                          The rangestat equivalent of that code appears to be

                          Code:
                           
                          rangestat (median) roe (count) roe, interval(year -9 0) by(industry)
                          The count variable can be used afterwards to replace values based on too few years with missing.

                          Comment


                          • #14
                            Hello, i need help. I have 4 variables i.e.
                            VRegS VRegP VregPmake and VRegSmake.
                            Capital S stands for sample while P stands for population

                            VRegS=vehicle registration for sample

                            VRegP= vehicle registration for population

                            VregPmake=...make of a vehicle from the population

                            VRegSmake=...make of a vehicle from sample
                            Sample size is 100
                            Population is 1000

                            So, I have 100 observations VRegS, 1000 observations of VRegP, and 1000 observations VRegPmake.

                            And
                            100 missing values of VRegSmake

                            So, I want to replace VRegSmake
                            How?

                            Here is the catch:
                            The 100 makes for S can be found in the 1000 makes for P.

                            I can't use

                            . replace VRegSmake = VRegPmake if VRegS == VRegP

                            Because S is smaller than P and the values are not matched one to one, even if I sort, you getting the point?

                            So, what should I do?

                            Comment


                            • #15
                              Mwiinga: Please start a new thread. Your question has nothing to do with moving medians, so it is just lost here.

                              Comment

                              Working...
                              X