Concentration measures for the other members of a set

Nick Cox



09 Aug 2018, 03:38

A question was asked and closed on Stack Overflow (SO) about HHIs. https://stackoverflow.com/questions/...lude-each-firm If you think Statalist is severe, try other sites. For once, I was not a voter to close.

Already at a loss? HHI will be recognised by some as connoting Herfindahl and Hirschman (or vice versa) and more importantly the idea of measuring concentration of anything divided into proportional shares. At its simplest the measure (index) is just the sum of those proportions squared. To think how this behaves, consider extreme cases. If everything is in one category, there is just one positive proportion 1 and the sum of squared proportions is 1. If amounts or counts (sales, people, birds, bees, whatever) are divided equally among k categories, then the measure is k * (1/k)^2 = 1/k which tends towards 0 for arbitrarily large k. From that you can see that the reciprocal of this measure has an interpretation as an equivalent number of equally common categories and that the complement (1 minus the measure) measures the opposite of concentration (diversity, or whatever else you want to call that).
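As a worked illustration of those extreme cases (the notation is mine: p_i is the proportion in category i, H the sum of squared proportions), with a final line that uses the sales 10, 20 and 50 of market A from the toy data in this post:

```latex
\[
H = \sum_{i=1}^{k} p_i^2, \qquad \sum_{i=1}^{k} p_i = 1
\]
% Everything in one category
\[
k = 1: \quad H = 1^2 = 1
\]
% k equally common categories
\[
p_i = \tfrac{1}{k}: \quad H = k \left(\tfrac{1}{k}\right)^2 = \tfrac{1}{k},
\qquad \tfrac{1}{H} = k, \qquad 1 - H = 1 - \tfrac{1}{k}
\]
% Market A: sales 10, 20, 50 out of a total of 80
\[
p = (0.125,\ 0.25,\ 0.625): \quad
H = 0.015625 + 0.0625 + 0.390625 = 0.46875,
\qquad \tfrac{1}{H} \approx 2.133, \qquad 1 - H = 0.53125
\]
```

So market A, with three firms, is about as concentrated as a market of roughly 2.1 equally sized firms.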

This idea goes back at least about a century and has independently been invented (discovered, if you will) in several sciences, including mainstream statistics, despite frequent and unfortunate disciplinary myopia leading many to presume that the idea was first thought up in one's own field and so should be named for people in that field who earlier made a fuss about it. (There is a regrettable note by Hirschman claiming priority over Herfindahl, regardless of the fact that he was far from first either. I.J. Good once wrote that any competent statistician would take 2 seconds to come up with the formula, which seems to me a little exaggerated the other way, although I am not a statistician.)

All that said, the question on SO was about calculating this measure (multiplied by 10000 for some bizarre reason) for sales for the other firms in the same market, i.e. excluding in turn each firm in the same market. Long-term readers of this forum are likely to recognise this kind of problem as a party piece for rangestat (SSC) written by Robert Picard and friends. Here is toy data, extending the SO example, some code and some results reproducing the hand calculations of the OP.

The check using entropyetc (SSC) is not much of a check insofar as I wrote that too, but it flags another discussion of this territory, and use of the name Simpson, which is pretty much standard in ecology (the same Simpson as is named in Simpson's paradox).

Code:

clear
input str1 market firm sales
A 1 10
A 2 20
A 3 50
B 1 5
B 2 15
B 4 80
end

mata mata clear

mata :
real scalar matchprob(real colvector p) {
    p = select(p, (p :< .))
    if (rows(p) == 0) return(.)
    p = p / sum(p)
    return(sum(p:^2))
}
end

egen id = group(market), label

rangestat (matchprob) sales, int(id 0 0)
rename matchprob1 standard

rangestat (matchprob) sales, int(id 0 0) excludeself
rename matchprob1 others

format standard others %6.5f

list, sepby(market)

     +-------------------------------------------------+
     | market   firm   sales   id   standard    others |
     |-------------------------------------------------|
  1. |      A      1      10    A    0.46875   0.59184 |
  2. |      A      2      20    A    0.46875   0.72222 |
  3. |      A      3      50    A    0.46875   0.55556 |
     |-------------------------------------------------|
  4. |      B      1       5    B    0.66500   0.73407 |
  5. |      B      2      15    B    0.66500   0.88927 |
  6. |      B      4      80    B    0.66500   0.62500 |
     +-------------------------------------------------+

entropyetc firm [w=sales] , by(market)
(analytic weights assumed)

----------------------------------------------------------------------
    Group |  Shannon H      exp(H)     Simpson   1/Simpson     dissim.
----------+-----------------------------------------------------------
        A |      0.900       2.460       0.469       2.133       0.375
        B |      0.613       1.846       0.665       1.504       0.550
----------------------------------------------------------------------

Last edited by Nick Cox; 09 Aug 2018, 03:43.

Dirk Enzmann


09 Aug 2018, 04:45

You can obtain the same H and Simpson measures by using divcat (available on SSC). Here Simpson appears as 1 - GV, where GV (generalized variance) is also known as the Blau Index (Blau, 1977); Simpson itself is the Hirschman-Herfindahl Index (HHI). The code is as follows:

Code:

clear
input str1 market firm sales
A 1 10
A 2 20
A 3 50
B 1 5
B 2 15
B 4 80
end

egen id = group(market), label

bys market: divcat firm [aw=sales], base(e)

The result will be

Code:

Measures of Diversity by market

-------------------------------------------------------------------------
                 | categs      GV     NGV       H      NH      RQ       n
-----------------+-------------------------------------------------------
               A |      3   0.531   0.797   0.900   0.819   0.828       3
               B |      3   0.335   0.503   0.613   0.558   0.598       3
-------------------------------------------------------------------------
Note: Entropy (H) is calculated using the logarithm to base e

Reference:

Blau, P. M. (1977). Inequality and Heterogeneity. New York: Free Press.

Last edited by Dirk Enzmann; 09 Aug 2018, 04:51.


Romalpa Akzo


09 Aug 2018, 06:25

Direct calculation also works well.

Code:

bys market: egen SumSales=total(sales)
bys market: egen SumSqrSale=total(sales^2)

gen HHI = SumSqrSale/(SumSales)^2
gen CHHI = (SumSqrSale-sales^2)/(SumSales-sales)^2
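The algebra behind the two gen statements can be written out as follows. The symbols are mine, not from the thread: s_j is the sales of firm j in a market, S the market total of sales and Q the market total of squared sales. The last line checks firm 1 in market A against the rangestat result in #1.

```latex
\[
S = \sum_{j} s_j, \qquad Q = \sum_{j} s_j^2, \qquad
H = \sum_{j} \left(\frac{s_j}{S}\right)^2 = \frac{Q}{S^2}
\]
% Excluding firm i: both totals lose firm i's contribution
\[
H_{-i} = \sum_{j \ne i} \left(\frac{s_j}{S - s_i}\right)^2
       = \frac{Q - s_i^2}{(S - s_i)^2}
\]
% Market A, firm 1: S = 80, Q = 100 + 400 + 2500 = 3000, s_1 = 10
\[
H_{-1} = \frac{3000 - 100}{(80 - 10)^2} = \frac{2900}{4900} \approx 0.59184
\]
```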


Nick Cox


09 Aug 2018, 07:33

Romalpa Akzo: Yes, excellent. Not robust to missings, but that is a detail.
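For what it is worth, here is a minimal sketch of one way the direct calculation might be made to tolerate missing sales. Everything beyond the original toy data is an assumption for illustration: firm 4 in market A with missing sales is hypothetical, added only to exercise the missing case, and the variable names own, sumsales, sumsq, hhi and chhi are mine. The sketch relies on egen's total() ignoring missing values, so only each firm's own contribution needs special handling.

```stata
clear
input str1 market firm sales
A 1 10
A 2 20
A 3 50
A 4 .
B 1 5
B 2 15
B 4 80
end

* own sales, with missing treated as zero (hypothetical helper variable)
gen double own = cond(missing(sales), 0, sales)

* market totals; total() ignores missing values
bysort market: egen double sumsales = total(sales)
bysort market: egen double sumsq = total(sales^2)

* measure for the whole market, and for the other firms in the market
gen double hhi = sumsq / sumsales^2
gen double chhi = (sumsq - own^2) / (sumsales - own)^2

format hhi chhi %6.5f
list market firm sales hhi chhi, sepby(market)
```

With this choice a firm with missing sales gets the measure calculated over all the other firms in its market. A market with only one firm with positive sales yields missing for that firm's chhi, as division by zero returns missing in Stata.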
