A question was asked and closed on Stack Overflow (SO) about HHIs. https://stackoverflow.com/questions/...lude-each-firm If you think Statalist is severe, try other sites. For once, I was not a voter to close.
Already at a loss? HHI will be recognised by some as connoting Herfindahl and Hirschman (or vice versa) and more importantly the idea of measuring concentration of anything divided into proportional shares. At its simplest the measure (index) is just the sum of those proportions squared. To think how this behaves, consider extreme cases. If everything is in one category, there is just one positive proportion 1 and the sum of squared proportions is 1. If amounts or counts (sales, people, birds, bees, whatever) are divided equally among k categories, then the measure is k * (1/k)^2 = 1/k which tends towards 0 for arbitrarily large k. From that you can see that the reciprocal of this measure has an interpretation as an equivalent number of equally common categories and that the complement (1 minus the measure) measures the opposite of concentration (diversity, or whatever else you want to call that).
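To see those limiting cases numerically, here is a minimal Mata sketch. The function name hhi() is mine, chosen only for illustration, and it assumes nonmissing, nonnegative amounts that are not all zero.

```stata
mata:
// sum of squared shares; amounts need not be scaled to sum to 1
real scalar hhi(real colvector x)
{
    return(sum((x / sum(x)):^2))
}

hhi(1)                  // everything in one category: 1
hhi(J(4, 1, 25))        // four equal categories: 1/4
1 / hhi(J(4, 1, 25))    // equivalent number of equally common categories: 4
1 - hhi(J(4, 1, 25))    // complement (diversity): 3/4
end
```

The reciprocal and the complement are one-liners given the measure itself, which is part of why the idea keeps being reinvented.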
This idea goes back at least about a century and has independently been invented (discovered, if you will) in several sciences, including mainstream statistics, despite frequent and unfortunate disciplinary myopia leading many to presume that the idea was first thought up in one's own field and so should be named for people in that field who earlier made a fuss about it. (There is a regrettable note by Hirschman claiming priority over Herfindahl, regardless of the fact that he was far from first either. I.J. Good once wrote that any competent statistician would take 2 seconds to come up with the formula, which seems to me a little exaggerated the other way, although I am not a statistician.)
All that said, the question on SO was about calculating this measure (multiplied by 10000 for some bizarre reason) for sales for the other firms in the same market, i.e. excluding in turn each firm in the same market. Long-term readers of this forum are likely to recognise this kind of problem as a party piece for rangestat (SSC) written by Robert Picard and friends. Here is toy data, extending the SO example, some code and some results reproducing the hand calculations of the OP.
The check using entropyetc (SSC) is not much of a check insofar as I wrote that too, but it flags another discussion of this territory, and use of the name Simpson, which is pretty much standard in ecology (the same Simpson as is named in Simpson's paradox).
Code:
clear
input str1 market firm sales
A 1 10
A 2 20
A 3 50
B 1 5
B 2 15
B 4 80
end

mata mata clear
mata :
real scalar matchprob(real colvector p) {
    p = select(p, (p :< .))
    if (rows(p) == 0) return(.)
    p = p / sum(p)
    return(sum(p:^2))
}
end

egen id = group(market), label

rangestat (matchprob) sales, int(id 0 0)
rename matchprob1 standard

rangestat (matchprob) sales, int(id 0 0) excludeself
rename matchprob1 others

format standard others %6.5f
list, sepby(market)

     +-------------------------------------------------+
     | market   firm   sales   id   standard    others |
     |-------------------------------------------------|
  1. |      A      1      10    A    0.46875   0.59184 |
  2. |      A      2      20    A    0.46875   0.72222 |
  3. |      A      3      50    A    0.46875   0.55556 |
     |-------------------------------------------------|
  4. |      B      1       5    B    0.66500   0.73407 |
  5. |      B      2      15    B    0.66500   0.88927 |
  6. |      B      4      80    B    0.66500   0.62500 |
     +-------------------------------------------------+

entropyetc firm [w=sales] , by(market)
(analytic weights assumed)

----------------------------------------------------------------------
    Group |  Shannon H      exp(H)     Simpson   1/Simpson     dissim.
----------+-----------------------------------------------------------
        A |      0.900       2.460       0.469       2.133       0.375
        B |      0.613       1.846       0.665       1.504       0.550
----------------------------------------------------------------------
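The leave-one-out values can also be had without rangestat, which may serve as a more independent cross-check. This is only a sketch, assuming sales are nonmissing and each market has at least two firms with positive sales. If T is total sales in a market and Q is the sum of squared sales, then the measure for the firms other than firm i is (Q - s_i^2) / (T - s_i)^2. The variable names tot, sumsq and others2 are mine.

```stata
clear
input str1 market firm sales
A 1 10
A 2 20
A 3 50
B 1 5
B 2 15
B 4 80
end

* market totals of sales and of squared sales
bysort market: egen double tot = total(sales)
by market: egen double sumsq = total(sales^2)

* measure for the other firms: remove firm i from numerator and denominator
gen double others2 = (sumsq - sales^2) / (tot - sales)^2

format others2 %6.5f
list market firm sales others2, sepby(market)
```

The values of others2 should agree with the others column in the listing above. The rangestat route remains the more general one, as it does not depend on the statistic having a convenient algebraic shortcut.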