Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Hirfindahl-Hirschman calculation

    Hi Guys,

    For my research I need firm complexity as a control variable.

    The way in which well-known authors proxy for firm complexity is with the (revenue-based) Hirfindahl-Hirschman index.

    It is calculated as "the sum of the squares of each segment's sales as a percentage of the total firm sales".

    Now my dataset, retrieved from Compustat, looks like this (for two firms from my sample):

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 SegmentType double NetSales str28 CompanyName long CIKNumber str4 fyear
    "BUSSEG"   6.614 "CERES INC"                  767884 "2010"
    "BUSSEG"   6.616 "CERES INC"                  767884 "2011"
    "BUSSEG"   6.616 "CERES INC"                  767884 "2011"
    "BUSSEG"   6.616 "CERES INC"                  767884 "2011"
    "BUSSEG"   5.371 "CERES INC"                  767884 "2012"
    "BUSSEG"   5.371 "CERES INC"                  767884 "2012"
    "BUSSEG"   5.371 "CERES INC"                  767884 "2012"
    "BUSSEG"   5.243 "CERES INC"                  767884 "2013"
    "BUSSEG"   5.243 "CERES INC"                  767884 "2013"
    "BUSSEG"   5.243 "CERES INC"                  767884 "2013"
    "BUSSEG"   2.404 "CERES INC"                  767884 "2014"
    "BUSSEG"   2.404 "CERES INC"                  767884 "2014"
    "BUSSEG"    2.72 "CERES INC"                  767884 "2015"
    "BUSSEG"   2.539 "EAGLE PHARMACEUTICALS INC"  827871 "2012"
    "BUSSEG"   2.539 "EAGLE PHARMACEUTICALS INC"  827871 "2012"
    "BUSSEG"   2.539 "EAGLE PHARMACEUTICALS INC"  827871 "2012"
    "BUSSEG"  13.679 "EAGLE PHARMACEUTICALS INC"  827871 "2013"
    "BUSSEG"  13.679 "EAGLE PHARMACEUTICALS INC"  827871 "2013"
    "BUSSEG"  13.679 "EAGLE PHARMACEUTICALS INC"  827871 "2013"
    "BUSSEG"  19.099 "EAGLE PHARMACEUTICALS INC"  827871 "2014"
    "BUSSEG"  19.099 "EAGLE PHARMACEUTICALS INC"  827871 "2014"
    "BUSSEG"  19.099 "EAGLE PHARMACEUTICALS INC"  827871 "2014"
    "BUSSEG"  66.227 "EAGLE PHARMACEUTICALS INC"  827871 "2015"
    "BUSSEG"  66.227 "EAGLE PHARMACEUTICALS INC"  827871 "2015"
    "BUSSEG"  66.227 "EAGLE PHARMACEUTICALS INC"  827871 "2015"
    "BUSSEG" 189.482 "EAGLE PHARMACEUTICALS INC"  827871 "2016"
    "BUSSEG" 189.482 "EAGLE PHARMACEUTICALS INC"  827871 "2016"
    "BUSSEG" 189.482 "EAGLE PHARMACEUTICALS INC"  827871 "2016"
    "BUSSEG" 236.707 "EAGLE PHARMACEUTICALS INC"  827871 "2017"
    "BUSSEG" 236.707 "EAGLE PHARMACEUTICALS INC"  827871 "2017"
    "BUSSEG" 213.312 "EAGLE PHARMACEUTICALS INC"  827871 "2018"
    What I want to have is for every firm-year combination one row with the squared net sales from all segments that the firm has in that particular year, so that I can merge this together with the total sales later in order to compute the H-H index. Can anyone help me out with this?

    However, Compustat gives me this dataset, where there are different rows for each firm-year combination, but with the same amount of net sales... That is strange (or isn't it?).

    Best,
    Mixa

  • #2
    You have duplicates and ideally you should work out why that is and fix it before you do anything else. See

    Code:
    help duplicates
    There are many posts here on the Gini-Turing-Simpson-Hirschman-Herfindahl measure (the names aren't exhaustive of the history). I would search on Hirschman and/or Herfindahl (Hirfindahl is a typo).

    Comment


    • #3
      Hi Nick,

      thanks for your response. I read about the hhi index and I see that there is a command "hhi" to calculate the desired variable.

      However, I know that there are duplicates (in terms of Net Sales). Compustat gives me different rows with the same information for a segment... So are this really duplicates (and not different segments)?

      Nonetheless, if I drop the duplicates and just type hhi NetSales, is that enough?

      Comment


      • #4
        I see observations like

        Code:
        "BUSSEG"  13.679 "EAGLE PHARMACEUTICALS INC"  827871 "2013"
        "BUSSEG"  13.679 "EAGLE PHARMACEUTICALS INC"  827871 "2013"
        "BUSSEG"  13.679 "EAGLE PHARMACEUTICALS INC"  827871 "2013"
        and that is an example of duplicates for the same segment. Occasionally there are good reasons for them, but on the face of it they are a problem here. I can't say more about your dataset, as necessarily I don't know more about it than you do. Compustat is a familiar name but I know nothing about what it is or offers otherwise. I am not an economist, nor do I work with business or finance data. But on your information you should clean up with

        Code:
        duplicates drop * , force
        I don't know anything much about hhi (from SSC, as you are asked to explain: FAQ Advice #12). I didn't write it and don't remember ever using it. The help file implies that a by() option is compulsory but then gives an example in which it is not used. Conversely the code implies that by() is optional but a glance at the code suggests that the program will fail if it is not supplied.

        So, my impression is that the command is at best carelessly written and at worst likely to confuse considerably.

        There are plenty of other commands in this territory, but if this is the only thing you want it is best to write code directly. You can find the essence of this in two lines within the hhi code, one call to egen to calculate proportions and another call to calculate the sum of their squares.

        As for your data example all sales for each company and year are for the same segment the sum of squared probabilities is identically 1.

        Here is a dopey example for investment in the Grunfeld dataset. You can run this code as the dataset is bundled with your Stata.

        In your case you want, as I understand it, to do calculations separately for each company and year, but the code is still two lines. You just feed two variables to by().

        Code:
        . webuse grunfeld, clear
        
        . egen double p = pc(invest), by(year) prop
        
        . egen double hhi = total(p^2), by(year)
        
        . tabdisp year, c(hhi) format(%04.3f)
        
        ----------------------
             year |        hhi
        ----------+-----------
             1935 |      0.286
             1936 |      0.283
             1937 |      0.273
             1938 |      0.242
             1939 |      0.265
             1940 |      0.280
             1941 |      0.263
             1942 |      0.277
             1943 |      0.285
             1944 |      0.276
             1945 |      0.266
             1946 |      0.267
             1947 |      0.250
             1948 |      0.240
             1949 |      0.259
             1950 |      0.272
             1951 |      0.248
             1952 |      0.257
             1953 |      0.292
             1954 |      0.337
        ----------------------

        Comment


        • #5
          Hi Nick,

          Thanks again! However, I try to understand what you say but it is really hard I think...

          I have the following now. Please see the code below. I think I tackled the duplicates problem. Now the segment identifier distinguishes between different segments. Now what I need is for every firm, in every year, one value for the h-h index. And therefore, I should go from this dataset to a dataset with one row for every firm-year combination. Can you follow this? If you need more information, please ask!

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input byte SegmentIdentifier double NetSales str28 CompanyName long CIKNumber str4 fyear
           4   2891.1 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
           4        0 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
           4   2189.8 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
           5  214.653 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
           5  130.937 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
           5  263.067 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
           6    470.2 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
           6        0 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
           7  263.067 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
           7      464 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
           8   45.154 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
           8    683.6 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
           8   2189.8 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
           9   33.211 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
          99  169.218 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
          99  326.919 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
          99  120.023 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
           4   33.211 "CAESARS ENTERTAINMENT CORP"   858339 "2015"
           4   72.013 "CAESARS ENTERTAINMENT CORP"   858339 "2015"
           5  130.937 "CAESARS ENTERTAINMENT CORP"   858339 "2015"
           5  120.023 "CAESARS ENTERTAINMENT CORP"   858339 "2015"
           6        0 "CAESARS ENTERTAINMENT CORP"   858339 "2015"
           7   96.732 "CAESARS ENTERTAINMENT CORP"   858339 "2015"
           8   24.038 "CAESARS ENTERTAINMENT CORP"   858339 "2015"
           8   49.291 "CAESARS ENTERTAINMENT CORP"   858339 "2015"
           9    787.2 "CAESARS ENTERTAINMENT CORP"   858339 "2015"
          99   96.732 "CAESARS ENTERTAINMENT CORP"   858339 "2015"
          99   23.966 "CAESARS ENTERTAINMENT CORP"   858339 "2015"

          Comment


          • #6
            It's not hard to understand HHI. The formula is one line of algebra. The code is at most two lines. What is problematic is why you have observations like


            Code:
             
             99  169.218 "CAESARS ENTERTAINMENT CORP"   858339 "2014" 99  326.919 "CAESARS ENTERTAINMENT CORP"   858339 "2014" 99  120.023 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
            Perhaps you need to collapse first.

            Comment


            • #7
              Hi Nick,

              That is because these rows indicate different sorts of segments. They could either be business segments or geographic segments. For the HHI, it can be both so the example you are talking about is not about duplicates, just different kinds of segments for the same firm in the same year.

              Yeah I need to collapse indeed, I think I need to get to one row for one firm in one year, so that would mean I need to have the 'cumulative' sales numbers...
              Is there something you would recommend?

              Comment


              • #8
                It seems that you have another variable indicating geographical segment that you did not tell us about. Contrary to #7 it seems to me that you need one observation (not row, please) for each firm, year and business segment and that collapse (sum) is what your syntax will start with. But I can't be certain because no data example makes geographic segments clear to me.

                Comment


                • #9
                  Hi Mixami, I have used the Compustat Annual data, but not the segment data. But you need to get it down so that you have (a) 1 line per segment (and each segment has a unique number or id, and (b) that the total adds to 100% of revenue for the firm for the year. Then you can do the HHI like this:

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input float firm_id str9 company int year str20 segment long sales
                  1 "Apple"     2016 "Americas"              86613
                  1 "Apple"     2016 "Europe"                49952
                  1 "Apple"     2016 "Greater China"         48492
                  1 "Apple"     2016 "Japan"                 16928
                  1 "Apple"     2016 "Rest of Asia Pacific"  13654
                  1 "Apple"     2017 "Americas"              96600
                  1 "Apple"     2017 "Europe"                54938
                  1 "Apple"     2017 "Greater China"         44764
                  1 "Apple"     2017 "Japan"                 17733
                  1 "Apple"     2017 "Rest of Asia Pacific"  15199
                  1 "Apple"     2018 "Americas"             112093
                  1 "Apple"     2018 "Europe"                62420
                  1 "Apple"     2018 "Greater China"         51942
                  1 "Apple"     2018 "Japan"                 21733
                  1 "Apple"     2018 "Rest of Asia Pacific"  17407
                  2 "Test firm" 2014 "Segment 1"             10000
                  2 "Test firm" 2015 "Segment 1"              5000
                  2 "Test firm" 2015 "Segment 2"              5000
                  2 "Test firm" 2016 "Segment 1"              9000
                  2 "Test firm" 2016 "Segment 2"              1000
                  2 "Test firm" 2017 "Segment 1"              2500
                  2 "Test firm" 2017 "Segment 2"              2500
                  2 "Test firm" 2017 "Segment 3"              2500
                  2 "Test firm" 2017 "Segment 4"              2500
                  2 "Test firm" 2018 "Segment 1"              1000
                  2 "Test firm" 2018 "Segment 2"              2000
                  2 "Test firm" 2018 "Segment 3"              3000
                  2 "Test firm" 2018 "Segment 4"              4000
                  end
                  
                  format tot_sales  %9.0gc
                  list, sepby(company year) noobs
                  egen tot_sales = total(sales), by(firm_id year)
                  format tot_sales  %9.0gc
                  list, sepby(company year) abbrev(12) noobs
                  ssc install hhi  // in case you need to install it
                  hhi sales, by(firm_id year)  // creating using the hhi command
                  
                  * Creating hhi manually
                  gen sales_pc = sales / tot_sales
                  gen pc2 = sales_pc^2
                  egen hhi_man = total(pc2), by(firm_id year)
                  
                   +----------------------------------------------------------------------------------------------------------------------+
                    | firm_id     company   year                segment     sales   tot_sales   sales_pc   hhi_sales        pc2    hhi_man |
                    |----------------------------------------------------------------------------------------------------------------------|
                    |       1       Apple   2016               Americas    86,613     215,639   .4016574    .2757296   .1613287   .2757296 |
                    |       1       Apple   2016                 Europe    49,952     215,639   .2316464    .2757296   .0536601   .2757296 |
                    |       1       Apple   2016          Greater China    48,492     215,639   .2248758    .2757296   .0505691   .2757296 |
                    |       1       Apple   2016                  Japan    16,928     215,639   .0785016    .2757296   .0061625   .2757296 |
                    |       1       Apple   2016   Rest of Asia Pacific    13,654     215,639   .0633188    .2757296   .0040093   .2757296 |
                    |----------------------------------------------------------------------------------------------------------------------|
                    |       1       Apple   2017               Americas    96,600     229,234   .4214035    .2835306   .1775809   .2835306 |
                    |       1       Apple   2017                 Europe    54,938     229,234    .239659    .2835306   .0574365   .2835306 |
                    |       1       Apple   2017          Greater China    44,764     229,234   .1952764    .2835306   .0381329   .2835306 |
                    |       1       Apple   2017                  Japan    17,733     229,234   .0773576    .2835306   .0059842   .2835306 |
                    |       1       Apple   2017   Rest of Asia Pacific    15,199     229,234   .0663034    .2835306   .0043961   .2835306 |
                    |----------------------------------------------------------------------------------------------------------------------|
                    |       1       Apple   2018               Americas   112,093     265,595   .4220448    .2825942   .1781218   .2825942 |
                    |       1       Apple   2018                 Europe    62,420     265,595   .2350195    .2825942   .0552342   .2825942 |
                    |       1       Apple   2018          Greater China    51,942     265,595   .1955684    .2825942    .038247   .2825942 |
                    |       1       Apple   2018                  Japan    21,733     265,595   .0818276    .2825942   .0066958   .2825942 |
                    |       1       Apple   2018   Rest of Asia Pacific    17,407     265,595   .0655396    .2825942   .0042954   .2825942 |
                    |----------------------------------------------------------------------------------------------------------------------|
                    |       2   Test firm   2014              Segment 1    10,000      10,000          1           1          1          1 |
                    |----------------------------------------------------------------------------------------------------------------------|
                    |       2   Test firm   2015              Segment 1     5,000      10,000         .5          .5        .25         .5 |
                    |       2   Test firm   2015              Segment 2     5,000      10,000         .5          .5        .25         .5 |
                    |----------------------------------------------------------------------------------------------------------------------|
                    |       2   Test firm   2016              Segment 1     9,000      10,000         .9    .8199999   .8099999   .8199999 |
                    |       2   Test firm   2016              Segment 2     1,000      10,000         .1    .8199999        .01   .8199999 |
                    |----------------------------------------------------------------------------------------------------------------------|
                    |       2   Test firm   2017              Segment 1     2,500      10,000        .25         .25      .0625        .25 |
                    |       2   Test firm   2017              Segment 2     2,500      10,000        .25         .25      .0625        .25 |
                    |       2   Test firm   2017              Segment 3     2,500      10,000        .25         .25      .0625        .25 |
                    |       2   Test firm   2017              Segment 4     2,500      10,000        .25         .25      .0625        .25 |
                    |----------------------------------------------------------------------------------------------------------------------|
                    |       2   Test firm   2018              Segment 1     1,000      10,000         .1          .3        .01         .3 |
                    |       2   Test firm   2018              Segment 2     2,000      10,000         .2          .3        .04         .3 |
                    |       2   Test firm   2018              Segment 3     3,000      10,000         .3          .3        .09         .3 |
                    |       2   Test firm   2018              Segment 4     4,000      10,000         .4          .3        .16         .3 |
                    +----------------------------------------------------------------------------------------------------------------------+
                  Once you've gotten to this point, then I would collapse the data down so you have 1 line per firm-year (so that you can merge it into the data where you will run your actual regressions.)
                  Last edited by David Benson; 21 Nov 2019, 20:30.

                  Comment


                  • #10
                    If you want to create the HHI once your data is in wide format, you can do it like this. (I've taken this from the Statalist post here, although I made a mistake in obs #3 and #4 in that entry, so I have redone the calculations below:

                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input str3 firm int year byte(nat_count nat1 nat2 nat3 nat4 nat5 nat6)
                    "TBL" 2007 3 16 2 1 . . .
                    "TBL" 2008 4 16 2 1 1 . .
                    "TBL" 2009 4 17 3 1 1 . .
                    "TBL" 2010 3 14 2 1 . . .
                    "TCC" 2007 4  6 2 1 . 1 .
                    "TCC" 2008 4  4 2 1 . 1 .
                    "TCC" 2009 4  3 2 1 1 . .
                    "TCC" 2010 4  2 2 2 2 . .
                    "TCC" 2011 6  1 2 2 2 2 1
                    end
                    
                    egen total = rowtotal(nat1-nat6)
                    order total, after(nat_count)
                    
                    list, noobs abbrev(12)
                    // The below calculated a diversity index based on the variety of nationalities on a firm's 
                    // Board of Directors 
                    
                      +---------------------------------------------------------------------------+
                      | firm   year   nat_count   total   nat1   nat2   nat3   nat4   nat5   nat6 |
                      |---------------------------------------------------------------------------|
                      |  TBL   2007           3      19     16      2      1      .      .      . |
                      |  TBL   2008           4      20     16      2      1      1      .      . |
                      |  TBL   2009           4      22     17      3      1      1      .      . |
                      |  TBL   2010           3      17     14      2      1      .      .      . |
                      |  TCC   2007           4      10      6      2      1      .      1      . |
                      |---------------------------------------------------------------------------|
                      |  TCC   2008           4       8      4      2      1      .      1      . |
                      |  TCC   2009           4       7      3      2      1      1      .      . |
                      |  TCC   2010           4       8      2      2      2      2      .      . |
                      |  TCC   2011           6      10      1      2      2      2      2      1 |
                      +---------------------------------------------------------------------------+
                    
                    
                    * Creating the proportions of each nationality squared
                    forvalues i = 1/6 {
                    gen p`i' = (nat`i' / total )^2
                    }
                    
                    egen sum_squares = rowtotal(p1-p6)  // sum of squared percentages, (HHI)
                    
                    list firm year p1- p6 sum_squares, sepby(firm) noobs
                    
                     +------------------------------------------------------------------------------------+
                      | firm   year         p1         p2         p3         p4        p5    p6   sum_sq~s |
                      |------------------------------------------------------------------------------------|
                      |  TBL   2007   .7091413   .0110803   .0027701          .         .     .   .7229916 |
                      |  TBL   2008        .64        .01      .0025      .0025         .     .       .655 |
                      |  TBL   2009   .5971074    .018595   .0020661   .0020661         .     .   .6198347 |
                      |  TBL   2010   .6782007   .0138408   .0034602          .         .     .   .6955018 |
                      |------------------------------------------------------------------------------------|
                      |  TCC   2007        .36        .04        .01          .       .01     .        .42 |
                      |  TCC   2008        .25      .0625    .015625          .   .015625     .     .34375 |
                      |  TCC   2009   .1836735   .0816327   .0204082   .0204082         .     .   .3061224 |
                      |  TCC   2010      .0625      .0625      .0625      .0625         .     .        .25 |
                      |  TCC   2011        .01        .04        .04        .04       .04   .01        .18 |
                      +------------------------------------------------------------------------------------+

                    Comment

                    Working...
                    X