Herfindahl-Hirschman Index calculation with 2 variables

Sanne Jansen

Join Date: Jun 2021

Posts: 4
#46

09 Jun 2021, 03:45

Well, I think it works now and that I have the right results. I have only one final question. All the results are less than 1, so 0.xxxx. When I read about the HHI, I find numbers higher than 1000. How is this possible? Can I just multiply my results by 1000, or does it not work that way?
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35412
#47

09 Jun 2021, 05:41

I think it was working before -- hhi was doing what the definitions imply. Otherwise, it comes down to your chosen units. A sum of squared probabilities can't exceed 1, if you wish to report that as a percent of possible it can't exceed 100 and if you wish to report it as a sum of squared percentages the upper limit is 10000. Where the factor 1000 comes from I can't say.
Comment
Sanne Jansen

Join Date: Jun 2021

Posts: 4
#48

09 Jun 2021, 05:48

I am sorry, I meant multiply by 10000. I would like to report it as a sum of squared percentages, so that means multiplying my results by 10000, right?
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35412
#49

09 Jun 2021, 06:15

That's correct if it's what you want.
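
As a minimal sketch of the change of units (the variable names share, hhi, hhi_10000 and check are made up for illustration), the same index can be computed from proportions and from percentages. With shares of 0.2, 0.3 and 0.5 the sum of squared proportions is 0.38, and 10000 times that is 3800, which is also 20^2 + 30^2 + 50^2.

```stata
* Illustrative data only: three shares that add to 1
clear
input float share
0.2
0.3
0.5
end

* sum of squared proportions: cannot exceed 1
egen hhi = total(share^2)

* the same index rescaled to a sum of squared percentages: cannot exceed 10000
generate hhi_10000 = 10000 * hhi

* computed directly from percentages, as a check on the rescaling
egen check = total((100 * share)^2)

list
```

The two versions carry the same information, so the choice is only a matter of which units you prefer to report.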
Comment
Anuradha Saikia

Join Date: Aug 2020

Posts: 153
#50

07 Feb 2022, 04:12

Hello mohina saxena. I wanted to ask: for your dataset, did you use net sales as they are, or did you square the original values of net sales before calculating the HHI?
Comment
Kleon Marenas

Join Date: Apr 2020

Posts: 19
#52

31 Oct 2023, 05:08

Dear All,

I need to calculate the HHI to see if there is political fragmentation. More specifically I want to calculate this:

However, in my dataset I have the data like this:

ID  panel_ID  YEAR  BoC  SEAT
 1         1  2015    4    27
 2         1  2016    4    27
 3         2  2015    5    41
 4         2  2016    5    41
 5         3  2015    7    41
 6         3  2016    7    41
 7         3  2017    8    41

BoC = board composition (the number of parties); SEAT = seats of the board

I am so confused.
To measure the Herfindahl index, should I use the following code?
HHI SEAT, by (BoC year)

Thank you
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35412

#53

31 Oct 2023, 06:27

year or YEAR (whichever it is) explains itself. The rest of your variables are not so clear to me.

Cleaning up your data example (thanks) as an example for dataex

-- please read and follow https://www.statalist.org/forums/help#stata in giving data examples --

does not help me: given BoC and YEAR, all subsets are single observations, so each observation has share 1 of its subset and the Gini-Friedman-Turing-Simpson-Hirschman-Herfindahl-Good-Blau index (*)(+) is identically 1 for each subset, regardless of details.

Code:

clear 
input ID panel_ID YEAR BoC SEAT
1 1 2015 4 27
2 1 2016 4 27
3 2 2015 5 41
4 2 2016 5 41
5 3 2015 7 41
6 3 2016 7 41
7 3 2017 8 41
end 

list, sepby(BoC YEAR)

     +-----------------------------------+
     | ID   panel_ID   YEAR   BoC   SEAT |
     |-----------------------------------|
  1. |  1          1   2015     4     27 |
     |-----------------------------------|
  2. |  2          1   2016     4     27 |
     |-----------------------------------|
  3. |  3          2   2015     5     41 |
     |-----------------------------------|
  4. |  4          2   2016     5     41 |
     |-----------------------------------|
  5. |  5          3   2015     7     41 |
     |-----------------------------------|
  6. |  6          3   2016     7     41 |
     |-----------------------------------|
  7. |  7          3   2017     8     41 |
     +-----------------------------------+
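
As a quick check of that claim, here is a sketch using only official commands. The variable names total_seat and index are made up for illustration. Every group defined by BoC and YEAR holds one observation, so each share is SEAT/SEAT = 1 and its square is 1.

```stata
clear
input ID panel_ID YEAR BoC SEAT
1 1 2015 4 27
2 1 2016 4 27
3 2 2015 5 41
4 2 2016 5 41
5 3 2015 7 41
6 3 2016 7 41
7 3 2017 8 41
end

* total within each BoC-YEAR group: here just the observation's own SEAT
bysort BoC YEAR : egen total_seat = total(SEAT)

* sum of squared shares within each group
by BoC YEAR : egen index = total((SEAT/total_seat)^2)

* stops with an error if any group has an index other than 1
assert index == 1
```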

Let's back up and I will guess what you want using an example.

In given elections in given years parties A, B, C had seats as follows:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float year str2 party float seats
2019 "A"  20
2019 "B"  30
2019 "C"  50
2023 "A"  25
2023 "B"  35
2023 "C"  40
end

All community-contributed commands in this territory, mine too, are just convenience wrappers. Each concise definition points to concise code using only official commands.

The total number of seats should be calculated first and then the index follows.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float year str2 party float seats
2019 "A"  20
2019 "B"  30
2019 "C"  50
2023 "A"  25
2023 "B"  35
2023 "C"  40
end

* total seats in each year
bysort year : egen total_seats = total(seats)

* sum of squared seat shares in each year
by year : egen wanted = total((seats/total_seats)^2)

list, sepby(year)

     +------------------------------------------+
     | year   party   seats   total_~s   wanted |
     |------------------------------------------|
  1. | 2019       A      20        100      .38 |
  2. | 2019       B      30        100      .38 |
  3. | 2019       C      50        100      .38 |
     |------------------------------------------|
  4. | 2023       A      25        100     .345 |
  5. | 2023       B      35        100     .345 |
  6. | 2023       C      40        100     .345 |
     +------------------------------------------+

I hope that helps. If it doesn't, I think you'll need to give a better data example and more explanation of what you want.

(*) The list of authors should not be assumed to be complete.
(+) Some people use 1 minus the quantity in #52, and Hirschman had a square root, but what the heck! It's the same idea in different flavours, so long as you're clear which way you like it.
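
The flavours mentioned in (+) are each one more line once the sum of squared shares exists. This is only a sketch on the made-up seats example. The names hhi, one_minus_hhi, root_hhi and first are invented for illustration, and which flavour you report is your choice.

```stata
clear
input float year str2 party float seats
2019 "A"  20
2019 "B"  30
2019 "C"  50
2023 "A"  25
2023 "B"  35
2023 "C"  40
end

* sum of squared seat shares in each year
bysort year : egen total_seats = total(seats)
by year : egen hhi = total((seats/total_seats)^2)

* complement: higher values mean more fragmentation
generate one_minus_hhi = 1 - hhi

* square-root flavour
generate root_hhi = sqrt(hhi)

* show one row per year
egen first = tag(year)
list year hhi one_minus_hhi root_hhi if first
```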

Comment

Hamid muili

Join Date: Aug 2020

Posts: 92
#54

31 Dec 2024, 03:54

Assuming I have a dataset with different energy sources for different countries over a certain period, can I still use entropyetc to generate an HHI for each country, similar to what was used in the article by Rubio-Varas and Muñoz-Delgado (2019), "The Energy Mix Concentration Index"? I have 7 variables for different European countries over several years.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35412
#55

31 Dec 2024, 04:59

#54 It is best please to give the precise formula you want to use. Bare name and date references are deprecated here and in any case similar is a weasel word unless you mean identical. You should be familiar with the request here to read and act on the FAQ Advice.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35412
#56

01 Jan 2025, 04:16

#55 seemed a fair request to me, but at any rate here is a fuller reference.

Rubio-Varas, Mar and Muñoz-Delgado, Beatriz. 2019.
The Energy Mix Concentration Index (EMCI): Methodological considerations for implementation.
MethodsX 6: 1228-1237
https://doi.org/10.1016/j.mex.2019.05.023
https://www.sciencedirect.com/scienc...15016119301438

So, it turns out that all we're talking about is a sum of squared probabilities, and the only difficulty evident is any tendency to create pointless new names for well-known existing devices.

In Stata terms whether entropyetc is easy to apply depends on your data layout.

That said, a point made in this thread -- see #37 -- and often elsewhere is that community-contributed commands here are just convenience wrappers as alternatives to direct calculations. The calculation is just

proportions

squared proportions

sum of squared proportions

which can always be taken step by step. In that spirit see now also

SJ-24-3 st0756 Stata tip 156: Concentration and diversity measures using egen
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
Q3/24 SJ 24(3):535--545 (no commands)
underlines how egen can be an invaluable workhorse for
concentration and diversity calculations

https://journals.sagepub.com/doi/pdf...6867X241276115
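
The three steps listed above can be sketched directly for one possible layout. Everything here is an assumption for illustration: a wide layout with one observation per country and year, one variable per energy source, made-up numbers, and made-up names (coal, oil, gas, nuclear, total_energy, hhi). Four sources are shown to keep it short; seven would just be a longer variable list.

```stata
* Made-up data: one row per country-year, one variable per source
clear
input str2 country int year float(coal oil gas nuclear)
"AA" 2000 40 30 20 10
"AA" 2001 25 25 25 25
"BB" 2000 70 20 10  0
end

* total across sources within each row
egen total_energy = rowtotal(coal oil gas nuclear)

* proportions, squared proportions, sum of squared proportions
generate hhi = 0
foreach v of varlist coal oil gas nuclear {
    replace hhi = hhi + (`v' / total_energy)^2
}

list country year hhi
```

Note that a missing value in any source would make hhi missing for that row in this sketch, although rowtotal() itself treats missings as zero, so decide first what missing should mean. If the data are instead in a long layout, with one observation per country, year and source, the bysort and egen code for seats earlier in the thread carries over with by country year.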

Last edited by Nick Cox; 01 Jan 2025, 04:22.
Comment
