Hirfindahl-Hirschman calculation

Mixami Peroli

Join Date: Nov 2019
Posts: 7

Hirfindahl-Hirschman calculation

19 Nov 2019, 03:37

Hi Guys,

For my research I need firm complexity as a control variable.

The way in which well-known authors proxy for firm complexity is with the (revenue-based) Hirfindahl-Hirschman index.

It is calculated as "the sum of the squares of each segment's sales as a percentage of the total firm sales".

Now my dataset, retrieved from Compustat, looks like this (for two firms from my sample):

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 SegmentType double NetSales str28 CompanyName long CIKNumber str4 fyear
"BUSSEG"   6.614 "CERES INC"                  767884 "2010"
"BUSSEG"   6.616 "CERES INC"                  767884 "2011"
"BUSSEG"   6.616 "CERES INC"                  767884 "2011"
"BUSSEG"   6.616 "CERES INC"                  767884 "2011"
"BUSSEG"   5.371 "CERES INC"                  767884 "2012"
"BUSSEG"   5.371 "CERES INC"                  767884 "2012"
"BUSSEG"   5.371 "CERES INC"                  767884 "2012"
"BUSSEG"   5.243 "CERES INC"                  767884 "2013"
"BUSSEG"   5.243 "CERES INC"                  767884 "2013"
"BUSSEG"   5.243 "CERES INC"                  767884 "2013"
"BUSSEG"   2.404 "CERES INC"                  767884 "2014"
"BUSSEG"   2.404 "CERES INC"                  767884 "2014"
"BUSSEG"    2.72 "CERES INC"                  767884 "2015"
"BUSSEG"   2.539 "EAGLE PHARMACEUTICALS INC"  827871 "2012"
"BUSSEG"   2.539 "EAGLE PHARMACEUTICALS INC"  827871 "2012"
"BUSSEG"   2.539 "EAGLE PHARMACEUTICALS INC"  827871 "2012"
"BUSSEG"  13.679 "EAGLE PHARMACEUTICALS INC"  827871 "2013"
"BUSSEG"  13.679 "EAGLE PHARMACEUTICALS INC"  827871 "2013"
"BUSSEG"  13.679 "EAGLE PHARMACEUTICALS INC"  827871 "2013"
"BUSSEG"  19.099 "EAGLE PHARMACEUTICALS INC"  827871 "2014"
"BUSSEG"  19.099 "EAGLE PHARMACEUTICALS INC"  827871 "2014"
"BUSSEG"  19.099 "EAGLE PHARMACEUTICALS INC"  827871 "2014"
"BUSSEG"  66.227 "EAGLE PHARMACEUTICALS INC"  827871 "2015"
"BUSSEG"  66.227 "EAGLE PHARMACEUTICALS INC"  827871 "2015"
"BUSSEG"  66.227 "EAGLE PHARMACEUTICALS INC"  827871 "2015"
"BUSSEG" 189.482 "EAGLE PHARMACEUTICALS INC"  827871 "2016"
"BUSSEG" 189.482 "EAGLE PHARMACEUTICALS INC"  827871 "2016"
"BUSSEG" 189.482 "EAGLE PHARMACEUTICALS INC"  827871 "2016"
"BUSSEG" 236.707 "EAGLE PHARMACEUTICALS INC"  827871 "2017"
"BUSSEG" 236.707 "EAGLE PHARMACEUTICALS INC"  827871 "2017"
"BUSSEG" 213.312 "EAGLE PHARMACEUTICALS INC"  827871 "2018"

What I want to have is for every firm-year combination one row with the squared net sales from all segments that the firm has in that particular year, so that I can merge this together with the total sales later in order to compute the H-H index. Can anyone help me out with this?

However, Compustat gives me this dataset, where there are different rows for each firm-year combination, but with the same amount of net sales... That is strange (or isn't it?).

Best,
Mixa

Tags: None

Nick Cox

Join Date: Mar 2014

Posts: 35698
#2

19 Nov 2019, 03:55

You have duplicates and ideally you should work out why that is and fix it before you do anything else. See

Code:

help duplicates

There are many posts here on the Gini-Turing-Simpson-Hirschman-Herfindahl measure (the names aren't exhaustive of the history). I would search on Hirschman and/or Herfindahl (Hirfindahl is a typo).
Comment
Mixami Peroli

Join Date: Nov 2019

Posts: 7
#3

19 Nov 2019, 04:03

Hi Nick,

thanks for your response. I read about the hhi index and I see that there is a command "hhi" to calculate the desired variable.

However, I know that there are duplicates (in terms of Net Sales). Compustat gives me different rows with the same information for a segment... So are this really duplicates (and not different segments)?

Nonetheless, if I drop the duplicates and just type hhi NetSales, is that enough?
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#4

19 Nov 2019, 04:50

I see observations like

Code:

"BUSSEG" 13.679 "EAGLE PHARMACEUTICALS INC" 827871 "2013" "BUSSEG" 13.679 "EAGLE PHARMACEUTICALS INC" 827871 "2013" "BUSSEG" 13.679 "EAGLE PHARMACEUTICALS INC" 827871 "2013"

and that is an example of duplicates for the same segment. Occasionally there are good reasons for them, but on the face of it they are a problem here. I can't say more about your dataset, as necessarily I don't know more about it than you do. Compustat is a familiar name but I know nothing about what it is or offers otherwise. I am not an economist, nor do I work with business or finance data. But on your information you should clean up with

Code:

duplicates drop * , force

I don't know anything much about hhi (from SSC, as you are asked to explain: FAQ Advice #12). I didn't write it and don't remember ever using it. The help file implies that a by() option is compulsory but then gives an example in which it is not used. Conversely the code implies that by() is optional but a glance at the code suggests that the program will fail if it is not supplied.

So, my impression is that the command is at best carelessly written and at worst likely to confuse considerably.

There are plenty of other commands in this territory, but if this is the only thing you want it is best to write code directly. You can find the essence of this in two lines within the hhi code, one call to egen to calculate proportions and another call to calculate the sum of their squares.

As for your data example all sales for each company and year are for the same segment the sum of squared probabilities is identically 1.

Here is a dopey example for investment in the Grunfeld dataset. You can run this code as the dataset is bundled with your Stata.

In your case you want, as I understand it, to do calculations separately for each company and year, but the code is still two lines. You just feed two variables to by().

Code:

. webuse grunfeld, clear . egen double p = pc(invest), by(year) prop . egen double hhi = total(p^2), by(year) . tabdisp year, c(hhi) format(%04.3f) ---------------------- year | hhi ----------+----------- 1935 | 0.286 1936 | 0.283 1937 | 0.273 1938 | 0.242 1939 | 0.265 1940 | 0.280 1941 | 0.263 1942 | 0.277 1943 | 0.285 1944 | 0.276 1945 | 0.266 1946 | 0.267 1947 | 0.250 1948 | 0.240 1949 | 0.259 1950 | 0.272 1951 | 0.248 1952 | 0.257 1953 | 0.292 1954 | 0.337 ----------------------
Comment

Mixami Peroli

Join Date: Nov 2019
Posts: 7

19 Nov 2019, 06:34

Hi Nick,

Thanks again! However, I try to understand what you say but it is really hard I think...

I have the following now. Please see the code below. I think I tackled the duplicates problem. Now the segment identifier distinguishes between different segments. Now what I need is for every firm, in every year, one value for the h-h index. And therefore, I should go from this dataset to a dataset with one row for every firm-year combination. Can you follow this? If you need more information, please ask!

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte SegmentIdentifier double NetSales str28 CompanyName long CIKNumber str4 fyear
 4   2891.1 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
 4        0 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
 4   2189.8 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
 5  214.653 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
 5  130.937 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
 5  263.067 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
 6    470.2 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
 6        0 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
 7  263.067 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
 7      464 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
 8   45.154 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
 8    683.6 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
 8   2189.8 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
 9   33.211 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
99  169.218 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
99  326.919 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
99  120.023 "CAESARS ENTERTAINMENT CORP"   858339 "2014"
 4   33.211 "CAESARS ENTERTAINMENT CORP"   858339 "2015"
 4   72.013 "CAESARS ENTERTAINMENT CORP"   858339 "2015"
 5  130.937 "CAESARS ENTERTAINMENT CORP"   858339 "2015"
 5  120.023 "CAESARS ENTERTAINMENT CORP"   858339 "2015"
 6        0 "CAESARS ENTERTAINMENT CORP"   858339 "2015"
 7   96.732 "CAESARS ENTERTAINMENT CORP"   858339 "2015"
 8   24.038 "CAESARS ENTERTAINMENT CORP"   858339 "2015"
 8   49.291 "CAESARS ENTERTAINMENT CORP"   858339 "2015"
 9    787.2 "CAESARS ENTERTAINMENT CORP"   858339 "2015"
99   96.732 "CAESARS ENTERTAINMENT CORP"   858339 "2015"
99   23.966 "CAESARS ENTERTAINMENT CORP"   858339 "2015"

Comment

Nick Cox

Join Date: Mar 2014

Posts: 35698
#6

19 Nov 2019, 06:48

It's not hard to understand HHI. The formula is one line of algebra. The code is at most two lines. What is problematic is why you have observations like

Code:

99 169.218 "CAESARS ENTERTAINMENT CORP" 858339 "2014" 99 326.919 "CAESARS ENTERTAINMENT CORP" 858339 "2014" 99 120.023 "CAESARS ENTERTAINMENT CORP" 858339 "2014"

Perhaps you need to collapse first.
Comment
Mixami Peroli

Join Date: Nov 2019

Posts: 7
#7

19 Nov 2019, 07:00

Hi Nick,

That is because these rows indicate different sorts of segments. They could either be business segments or geographic segments. For the HHI, it can be both so the example you are talking about is not about duplicates, just different kinds of segments for the same firm in the same year.

Yeah I need to collapse indeed, I think I need to get to one row for one firm in one year, so that would mean I need to have the 'cumulative' sales numbers...
Is there something you would recommend?
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#8

21 Nov 2019, 02:52

It seems that you have another variable indicating geographical segment that you did not tell us about. Contrary to #7 it seems to me that you need one observation (not row, please) for each firm, year and business segment and that collapse (sum) is what your syntax will start with. But I can't be certain because no data example makes geographic segments clear to me.
Comment

David Benson

Join Date: Oct 2018
Posts: 489

21 Nov 2019, 19:27

Hi Mixami, I have used the Compustat Annual data, but not the segment data. But you need to get it down so that you have (a) 1 line per segment (and each segment has a unique number or id, and (b) that the total adds to 100% of revenue for the firm for the year. Then you can do the HHI like this:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float firm_id str9 company int year str20 segment long sales
1 "Apple"     2016 "Americas"              86613
1 "Apple"     2016 "Europe"                49952
1 "Apple"     2016 "Greater China"         48492
1 "Apple"     2016 "Japan"                 16928
1 "Apple"     2016 "Rest of Asia Pacific"  13654
1 "Apple"     2017 "Americas"              96600
1 "Apple"     2017 "Europe"                54938
1 "Apple"     2017 "Greater China"         44764
1 "Apple"     2017 "Japan"                 17733
1 "Apple"     2017 "Rest of Asia Pacific"  15199
1 "Apple"     2018 "Americas"             112093
1 "Apple"     2018 "Europe"                62420
1 "Apple"     2018 "Greater China"         51942
1 "Apple"     2018 "Japan"                 21733
1 "Apple"     2018 "Rest of Asia Pacific"  17407
2 "Test firm" 2014 "Segment 1"             10000
2 "Test firm" 2015 "Segment 1"              5000
2 "Test firm" 2015 "Segment 2"              5000
2 "Test firm" 2016 "Segment 1"              9000
2 "Test firm" 2016 "Segment 2"              1000
2 "Test firm" 2017 "Segment 1"              2500
2 "Test firm" 2017 "Segment 2"              2500
2 "Test firm" 2017 "Segment 3"              2500
2 "Test firm" 2017 "Segment 4"              2500
2 "Test firm" 2018 "Segment 1"              1000
2 "Test firm" 2018 "Segment 2"              2000
2 "Test firm" 2018 "Segment 3"              3000
2 "Test firm" 2018 "Segment 4"              4000
end

format tot_sales  %9.0gc
list, sepby(company year) noobs
egen tot_sales = total(sales), by(firm_id year)
format tot_sales  %9.0gc
list, sepby(company year) abbrev(12) noobs
ssc install hhi  // in case you need to install it
hhi sales, by(firm_id year)  // creating using the hhi command

* Creating hhi manually
gen sales_pc = sales / tot_sales
gen pc2 = sales_pc^2
egen hhi_man = total(pc2), by(firm_id year)

 +----------------------------------------------------------------------------------------------------------------------+
  | firm_id     company   year                segment     sales   tot_sales   sales_pc   hhi_sales        pc2    hhi_man |
  |----------------------------------------------------------------------------------------------------------------------|
  |       1       Apple   2016               Americas    86,613     215,639   .4016574    .2757296   .1613287   .2757296 |
  |       1       Apple   2016                 Europe    49,952     215,639   .2316464    .2757296   .0536601   .2757296 |
  |       1       Apple   2016          Greater China    48,492     215,639   .2248758    .2757296   .0505691   .2757296 |
  |       1       Apple   2016                  Japan    16,928     215,639   .0785016    .2757296   .0061625   .2757296 |
  |       1       Apple   2016   Rest of Asia Pacific    13,654     215,639   .0633188    .2757296   .0040093   .2757296 |
  |----------------------------------------------------------------------------------------------------------------------|
  |       1       Apple   2017               Americas    96,600     229,234   .4214035    .2835306   .1775809   .2835306 |
  |       1       Apple   2017                 Europe    54,938     229,234    .239659    .2835306   .0574365   .2835306 |
  |       1       Apple   2017          Greater China    44,764     229,234   .1952764    .2835306   .0381329   .2835306 |
  |       1       Apple   2017                  Japan    17,733     229,234   .0773576    .2835306   .0059842   .2835306 |
  |       1       Apple   2017   Rest of Asia Pacific    15,199     229,234   .0663034    .2835306   .0043961   .2835306 |
  |----------------------------------------------------------------------------------------------------------------------|
  |       1       Apple   2018               Americas   112,093     265,595   .4220448    .2825942   .1781218   .2825942 |
  |       1       Apple   2018                 Europe    62,420     265,595   .2350195    .2825942   .0552342   .2825942 |
  |       1       Apple   2018          Greater China    51,942     265,595   .1955684    .2825942    .038247   .2825942 |
  |       1       Apple   2018                  Japan    21,733     265,595   .0818276    .2825942   .0066958   .2825942 |
  |       1       Apple   2018   Rest of Asia Pacific    17,407     265,595   .0655396    .2825942   .0042954   .2825942 |
  |----------------------------------------------------------------------------------------------------------------------|
  |       2   Test firm   2014              Segment 1    10,000      10,000          1           1          1          1 |
  |----------------------------------------------------------------------------------------------------------------------|
  |       2   Test firm   2015              Segment 1     5,000      10,000         .5          .5        .25         .5 |
  |       2   Test firm   2015              Segment 2     5,000      10,000         .5          .5        .25         .5 |
  |----------------------------------------------------------------------------------------------------------------------|
  |       2   Test firm   2016              Segment 1     9,000      10,000         .9    .8199999   .8099999   .8199999 |
  |       2   Test firm   2016              Segment 2     1,000      10,000         .1    .8199999        .01   .8199999 |
  |----------------------------------------------------------------------------------------------------------------------|
  |       2   Test firm   2017              Segment 1     2,500      10,000        .25         .25      .0625        .25 |
  |       2   Test firm   2017              Segment 2     2,500      10,000        .25         .25      .0625        .25 |
  |       2   Test firm   2017              Segment 3     2,500      10,000        .25         .25      .0625        .25 |
  |       2   Test firm   2017              Segment 4     2,500      10,000        .25         .25      .0625        .25 |
  |----------------------------------------------------------------------------------------------------------------------|
  |       2   Test firm   2018              Segment 1     1,000      10,000         .1          .3        .01         .3 |
  |       2   Test firm   2018              Segment 2     2,000      10,000         .2          .3        .04         .3 |
  |       2   Test firm   2018              Segment 3     3,000      10,000         .3          .3        .09         .3 |
  |       2   Test firm   2018              Segment 4     4,000      10,000         .4          .3        .16         .3 |
  +----------------------------------------------------------------------------------------------------------------------+

Once you've gotten to this point, then I would collapse the data down so you have 1 line per firm-year (so that you can merge it into the data where you will run your actual regressions.)

Last edited by David Benson; 21 Nov 2019, 19:30.

Comment

David Benson

Join Date: Oct 2018
Posts: 489

#10

21 Nov 2019, 19:49

If you want to create the HHI once your data is in wide format, you can do it like this. (I've taken this from the Statalist post here, although I made a mistake in obs #3 and #4 in that entry, so I have redone the calculations below:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str3 firm int year byte(nat_count nat1 nat2 nat3 nat4 nat5 nat6)
"TBL" 2007 3 16 2 1 . . .
"TBL" 2008 4 16 2 1 1 . .
"TBL" 2009 4 17 3 1 1 . .
"TBL" 2010 3 14 2 1 . . .
"TCC" 2007 4  6 2 1 . 1 .
"TCC" 2008 4  4 2 1 . 1 .
"TCC" 2009 4  3 2 1 1 . .
"TCC" 2010 4  2 2 2 2 . .
"TCC" 2011 6  1 2 2 2 2 1
end

egen total = rowtotal(nat1-nat6)
order total, after(nat_count)

list, noobs abbrev(12)
// The below calculated a diversity index based on the variety of nationalities on a firm's 
// Board of Directors 

  +---------------------------------------------------------------------------+
  | firm   year   nat_count   total   nat1   nat2   nat3   nat4   nat5   nat6 |
  |---------------------------------------------------------------------------|
  |  TBL   2007           3      19     16      2      1      .      .      . |
  |  TBL   2008           4      20     16      2      1      1      .      . |
  |  TBL   2009           4      22     17      3      1      1      .      . |
  |  TBL   2010           3      17     14      2      1      .      .      . |
  |  TCC   2007           4      10      6      2      1      .      1      . |
  |---------------------------------------------------------------------------|
  |  TCC   2008           4       8      4      2      1      .      1      . |
  |  TCC   2009           4       7      3      2      1      1      .      . |
  |  TCC   2010           4       8      2      2      2      2      .      . |
  |  TCC   2011           6      10      1      2      2      2      2      1 |
  +---------------------------------------------------------------------------+


* Creating the proportions of each nationality squared
forvalues i = 1/6 {
gen p`i' = (nat`i' / total )^2
}

egen sum_squares = rowtotal(p1-p6)  // sum of squared percentages, (HHI)

list firm year p1- p6 sum_squares, sepby(firm) noobs

 +------------------------------------------------------------------------------------+
  | firm   year         p1         p2         p3         p4        p5    p6   sum_sq~s |
  |------------------------------------------------------------------------------------|
  |  TBL   2007   .7091413   .0110803   .0027701          .         .     .   .7229916 |
  |  TBL   2008        .64        .01      .0025      .0025         .     .       .655 |
  |  TBL   2009   .5971074    .018595   .0020661   .0020661         .     .   .6198347 |
  |  TBL   2010   .6782007   .0138408   .0034602          .         .     .   .6955018 |
  |------------------------------------------------------------------------------------|
  |  TCC   2007        .36        .04        .01          .       .01     .        .42 |
  |  TCC   2008        .25      .0625    .015625          .   .015625     .     .34375 |
  |  TCC   2009   .1836735   .0816327   .0204082   .0204082         .     .   .3061224 |
  |  TCC   2010      .0625      .0625      .0625      .0625         .     .        .25 |
  |  TCC   2011        .01        .04        .04        .04       .04   .01        .18 |
  +------------------------------------------------------------------------------------+

Announcement

Hirfindahl-Hirschman calculation

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment