  • Concentration Index (industries within one sector per country)

    Hello,

    I have a set of industries within one sector, and I would like to calculate the degree of industry concentration per country and per year: for instance, whether country A's manufacturing sector is highly concentrated in industries X or Z (identified by isic). My first attempt was, naturally, to calculate the Herfindahl index. Based on the definition, I used the following code:

    egen double totalvalueadded=sum(ValueAdded), by(country year)

    bysort country year: g double squared = (ValueAdded/totalvalueadded)^2 // a proxy for squared market shares

    egen double HHI=sum(squared), by(country year)

    Nevertheless, the plots that I am getting look very strange (which is why I am not showing them), so I am fairly sure that something is wrong in this calculation. My end goal is to identify which industries (isic) dominate the sector in each country and year and, on that basis, to explore industry characteristics by country, so that I can later check for patterns across regions over time. Thank you very much for the help!

    Code:
     * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(country year) byte isic long(Establishments Employment) double OutputINDSTAT4 float(val_per_worker lval_per_worker region totalemployment share_val) double ValueAdded float totalval
    4 1973 33 .     0 . . . 0 45486 . . 0
    4 1973 27 .     0 . . . 0 45486 . . 0
    4 1973 19 .     . . . . 0 45486 . . 0
    4 1973 26 .  1295 . . . 0 45486 . . 0
    4 1973 21 .     0 . . . 0 45486 . . 0
    4 1973 28 .     0 . . . 0 45486 . . 0
    4 1973 17 . 12555 . . . 0 45486 . . 0
    4 1973 35 .     . . . . 0 45486 . . 0
    4 1973 25 .   360 . . . 0 45486 . . 0
    4 1973 23 .     0 . . . 0 45486 . . 0
    4 1973 15 .  4321 . . . 0 45486 . . 0
    4 1973 24 .   259 . . . 0 45486 . . 0
    4 1973 20 .  1941 . . . 0 45486 . . 0
    4 1973 34 .     0 . . . 0 45486 . . 0
    4 1973 16 .     0 . . . 0 45486 . . 0
    4 1973 29 .     0 . . . 0 45486 . . 0
    4 1973 36 .     . . . . 0 45486 . . 0
    4 1973 31 .     0 . . . 0 45486 . . 0
    4 1973 32 .     . . . . 0 45486 . . 0
    4 1973 22 .  1287 . . . 0 45486 . . 0
    4 1973 18 .   725 . . . 0 45486 . . 0
    4 1974 33 .     0 . . . 0 56242 . . 0
    4 1974 28 .     0 . . . 0 56242 . . 0
    4 1974 34 .     0 . . . 0 56242 . . 0
    4 1974 16 .     0 . . . 0 56242 . . 0
    4 1974 22 .  1326 . . . 0 56242 . . 0
    4 1974 15 .  4845 . . . 0 56242 . . 0
    4 1974 20 .  2044 . . . 0 56242 . . 0
    4 1974 23 .     0 . . . 0 56242 . . 0
    4 1974 18 .   720 . . . 0 56242 . . 0
    4 1974 27 .     0 . . . 0 56242 . . 0
    4 1974 36 .     . . . . 0 56242 . . 0
    4 1974 35 .     . . . . 0 56242 . . 0
    4 1974 19 .     . . . . 0 56242 . . 0
    4 1974 21 .     0 . . . 0 56242 . . 0
    4 1974 32 .     . . . . 0 56242 . . 0
    4 1974 25 .   427 . . . 0 56242 . . 0
    4 1974 17 . 14243 . . . 0 56242 . . 0
    4 1974 29 .     0 . . . 0 56242 . . 0
    4 1974 24 .  3281 . . . 0 56242 . . 0
    4 1974 26 .  1235 . . . 0 56242 . . 0
    4 1974 31 .     0 . . . 0 56242 . . 0
    4 1975 21 .     0 . . . 0 66292 . . 0
    4 1975 22 .  1447 . . . 0 66292 . . 0
    4 1975 31 .     0 . . . 0 66292 . . 0
    4 1975 24 .  4046 . . . 0 66292 . . 0
    4 1975 19 .     . . . . 0 66292 . . 0
    4 1975 27 .     0 . . . 0 66292 . . 0
    4 1975 36 .     . . . . 0 66292 . . 0
    4 1975 25 .   507 . . . 0 66292 . . 0
    4 1975 17 . 17202 . . . 0 66292 . . 0
    4 1975 26 .  1646 . . . 0 66292 . . 0
    4 1975 34 .     0 . . . 0 66292 . . 0
    4 1975 28 .     0 . . . 0 66292 . . 0
    4 1975 33 .     0 . . . 0 66292 . . 0
    4 1975 32 .     . . . . 0 66292 . . 0
    4 1975 16 .     0 . . . 0 66292 . . 0
    4 1975 29 .     0 . . . 0 66292 . . 0
    4 1975 15 .  5103 . . . 0 66292 . . 0
    4 1975 18 .   874 . . . 0 66292 . . 0
    4 1975 23 .     0 . . . 0 66292 . . 0
    4 1975 20 .  2321 . . . 0 66292 . . 0
    4 1975 35 .     . . . . 0 66292 . . 0
    4 1976 36 .     . . . . 0 72158 . . 0
    4 1976 16 .     0 . . . 0 72158 . . 0
    4 1976 23 .     0 . . . 0 72158 . . 0
    4 1976 35 .     . . . . 0 72158 . . 0
    4 1976 19 .     . . . . 0 72158 . . 0
    4 1976 31 .     0 . . . 0 72158 . . 0
    4 1976 20 .  2317 . . . 0 72158 . . 0
    4 1976 28 .     0 . . . 0 72158 . . 0
    4 1976 24 .  4479 . . . 0 72158 . . 0
    4 1976 17 . 20520 . . . 0 72158 . . 0
    4 1976 15 .  4540 . . . 0 72158 . . 0
    4 1976 18 .   840 . . . 0 72158 . . 0
    4 1976 29 .     0 . . . 0 72158 . . 0
    4 1976 32 .     . . . . 0 72158 . . 0
    4 1976 22 .  1465 . . . 0 72158 . . 0
    4 1976 34 .     0 . . . 0 72158 . . 0
    4 1976 21 .     0 . . . 0 72158 . . 0
    4 1976 25 .   583 . . . 0 72158 . . 0
    4 1976 27 .     0 . . . 0 72158 . . 0
    4 1976 33 .     0 . . . 0 72158 . . 0
    4 1976 26 .  1335 . . . 0 72158 . . 0
    4 1977 18 .   920 . . . 0 77018 . . 0
    4 1977 16 .     0 . . . 0 77018 . . 0
    4 1977 29 .     0 . . . 0 77018 . . 0
    4 1977 21 .     0 . . . 0 77018 . . 0
    4 1977 25 .   623 . . . 0 77018 . . 0
    4 1977 20 .  2240 . . . 0 77018 . . 0
    4 1977 33 .     0 . . . 0 77018 . . 0
    4 1977 17 . 20540 . . . 0 77018 . . 0
    4 1977 19 .     . . . . 0 77018 . . 0
    4 1977 28 .     0 . . . 0 77018 . . 0
    4 1977 22 .  1747 . . . 0 77018 . . 0
    4 1977 35 .     . . . . 0 77018 . . 0
    4 1977 27 .     0 . . . 0 77018 . . 0
    4 1977 36 .     . . . . 0 77018 . . 0
    4 1977 31 .     0 . . . 0 77018 . . 0
    4 1977 24 .  4714 . . . 0 77018 . . 0
    end







  • #2
    Shouldn't HHI square the actual percentage (e.g. 76^2) and not the fraction (0.76^2)?

    Also, the question is hard to answer because i) the original HHI formula was not provided, making it hard to check the calculation, and ii) the "wrong" graphs were not shown, on the grounds that they look wrong. I'd disagree with that practice. It's important to see why a graph is thought to be wrong and to know what a "right" one would look like.



    • #3
      The original formula is HHI = s_1^2 + s_2^2 + s_3^2 + ... + s_n^2, where s_i is the share of industry i's value added in the total value added of a country in a year. You are correct, my apologies. But when I use the actual percentage instead of the fraction, I still get the same graph. I am attaching one example of why the graph is clearly wrong.


      egen double totalvalueadded=sum(ValueAdded), by(country year)

      bysort country year: g double squared = ((ValueAdded/totalvalueadded)*100)^2 // a proxy for squared market shares

      egen double HHI=sum(squared), by(country year)

      I am not sure where to add the industry part in my syntax.

      Attached Files



      • #4
        I agree with Ken Chui on the major point here. Having wild-looking results but not showing them to us does not make a question easier to answer.

        Whether you square probabilities so that Herfindahl or HHI falls in (0, 1] or percentages so that it falls in (0, 10000] is at most personal or tribal choice or convention; I wouldn't call one version right and another wrong any more than A calling the probability of heads 0.5 and B calling it 50% is a matter for complaint. I'd assert that the higher the level of analysis, the less likely calculations are to be phrased or reported as percents or equivalent.

        If there is only one firm in sight, its share is 1 or 100% and HHI is then 1 or 10000, depending on your preference.
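
        The two conventions differ only by a constant factor. As a short worked identity, writing s_i for the share of unit i as a proportion and n for the number of units (notation assumed here for illustration):

        ```latex
        \[
        \mathrm{HHI}_{\%}
          = \sum_{i=1}^{n} (100\, s_i)^2
          = 10^4 \sum_{i=1}^{n} s_i^2
          = 10^4 \,\mathrm{HHI},
        \qquad \text{with } \sum_{i=1}^{n} s_i = 1 .
        \]
        \[
        s_i = \tfrac{1}{n} \text{ for all } i
        \;\Longrightarrow\;
        \mathrm{HHI} = n \cdot \tfrac{1}{n^2} = \tfrac{1}{n} .
        \]
        ```

        So a graph of one version is just a rescaled graph of the other, and with n equal shares the index is 1/n (or 10000/n in percent terms).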

        Even if other firms lurk as possible but not active, their shares of 0, or 0 squared, make no difference to HHI in any given calculation.

        I prefer use of probabilities, if only because, but not only because, I generally want to calculate entropy or some sibling alongside. I do not recall anyone insisting on percents as input for entropy calculation and they would need to be scaled to probabilities anyway.

        As a small point but one I hope worth making, I advise strongly against citing egen, sum() as code because

        1. It went undocumented in Stata 9, a while ago. The documented name is now total().

        2. Despite clear explanations in the documentation, readers have often confused that egen function sum() for overall totals with the generic function sum() which returns running or cumulative sums, which is precisely why the name was changed in Stata 9.
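
        The difference is easy to see on a tiny dataset. The following is a minimal sketch with made-up values; the variable names id, grouptotal and running are illustrative only.

        ```stata
        * Made-up data for illustration only
        clear
        input str1 which float value
        "A" 10
        "A" 20
        "A" 30
        "B"  5
        "B"  5
        end

        * Record the original order so the running sum is reproducible
        gen long id = _n

        * egen's total(): one overall total per group, repeated on every row
        egen double grouptotal = total(value), by(which)

        * The generic function sum(): a running (cumulative) sum within each group
        bysort which (id): gen double running = sum(value)

        list, sepby(which)
        ```

        Within group A, grouptotal is 60 on every row, while running climbs through 10, 30 and 60.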

        What's implausible at first sight in the graph in #3 are the zero values, which are precisely what you get if all values entering a calculation are zero or missing. In such cases HHI is probably better regarded as missing or indeterminate. egen reports 0 as the total when all values are 0 or missing.

        A course I've never taken or given -- Debugging 101 -- might start with

        1. Check your code carefully by reading it over slowly. If you're lucky some silly error may leap out at you.

        2. If #1 fails, get specific and concrete. Check lines one by one using (very) simple data as examples. Check intermediate results.

        This toy example shows the nub of the matter. I advise keeping track of the number of values that are positive and not missing.


        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str1 which float value
        "A"   0
        "A"   0
        "B"   .
        "B"   .
        "C" 100
        "C"   0
        "D"  50
        "D"  50
        end 
        
        egen double total = total(value), by(which)
        egen nvals = total(value > 0 & value < .), by(which)
        egen double HHI = total((value/total)^2), by(which)
        
        l, sepby(which)
        
         
            +-------------------------------------+
             | which   value   total   nvals   HHI |
             |-------------------------------------|
          1. |     A       0       0       0     0 |
          2. |     A       0       0       0     0 |
             |-------------------------------------|
          3. |     B       .       0       0     0 |
          4. |     B       .       0       0     0 |
             |-------------------------------------|
          5. |     C     100     100       1     1 |
          6. |     C       0     100       1     1 |
             |-------------------------------------|
          7. |     D      50     100       2    .5 |
          8. |     D      50     100       2    .5 |
             +-------------------------------------+
        So HHI is zero given input values that are zeros or missings or both, at least with this code. (Other code in which one step was dividing partial total by larger total could produce missing as a result.) So that is cases A and B here.
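
        If a zero HHI for groups with no positive values is unwanted, one option is to set it to missing using the count of positive values. This is a minimal sketch reusing the toy data above; treating such groups as missing is a choice made here, not something egen does by default.

        ```stata
        clear
        input str1 which float value
        "A"   0
        "A"   0
        "B"   .
        "B"   .
        "C" 100
        "C"   0
        "D"  50
        "D"  50
        end

        egen double total = total(value), by(which)
        egen nvals = total(value > 0 & value < .), by(which)
        egen double HHI = total((value/total)^2), by(which)

        * No positive, nonmissing values in the group: regard HHI as indeterminate
        replace HHI = . if nvals == 0

        list, sepby(which)
        ```

        With this extra line, groups A and B show missing rather than 0, while C and D are unchanged.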

        In C there is one active firm, so the index is 1; in D there are two firms with equal shares, so the index is 0.5 (check: 0.5^2 + 0.5^2 = 0.5).

        If you want results to vary up to 10000, multiply at the end or create shares as percents.

        For separate industry calculations, given also country and year, you need either by(country industry year) as an option or by country industry year: as a prefix.
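
        For the aim of seeing which industries dominate within each country and year, one possibility is to keep the shares themselves alongside the country-year index and flag the largest. This is only a sketch on hypothetical data: the values are made up, and the variable names share, maxshare and top are assumptions for illustration.

        ```stata
        * Hypothetical data: one country, two years, three industries
        clear
        input int(country year) byte isic double ValueAdded
        1 2000 15 60
        1 2000 17 30
        1 2000 24 10
        1 2001 15  .
        1 2001 17  0
        1 2001 24  .
        end

        egen double total = total(ValueAdded), by(country year)
        egen nvals = total(ValueAdded > 0 & ValueAdded < .), by(country year)

        * Industry share of the country-year total; missing if nothing positive
        gen double share = ValueAdded / total if nvals > 0

        * Country-year HHI on the proportion scale, missing if indeterminate
        egen double HHI = total(share^2), by(country year)
        replace HHI = . if nvals == 0

        * Flag the industry with the largest share in each country-year
        egen double maxshare = max(share), by(country year)
        gen byte top = share == maxshare & share < .

        list, sepby(country year)
        ```

        In the year 2000 group the shares are 0.6, 0.3 and 0.1, so HHI is 0.36 + 0.09 + 0.01 = 0.46 and isic 15 is flagged; the year 2001 group has no positive values, so share and HHI are missing and nothing is flagged.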



        • #5

          I followed the syntax using my data, as shown below. Now that you have mentioned the distinction between sum() and total(), the idea is a little clearer to me. In my plots I exclude observations with an HHI equal to 0. Although this is not the plot I was expecting, it seems far more reasonable than the previous one. My initial goal is to see which countries have higher degrees of industrial concentration and how these evolve over time. Later, my goal is to see which industries are driving this concentration.

          egen double total = total(ValueAdded), by(country year)
          egen nvals = total(ValueAdded > 0 & ValueAdded < .), by(country year)
          egen double HHI = total((ValueAdded/total)^2), by(country year)
          twoway (line HHI year if country==410&HHI!=0)


          Doing the same plot as before but with this minor adjustment (HHI!=0), I get the following plot.
          Attached Files
          Last edited by Hugo Rocha; 21 Feb 2022, 09:47.



          • #6
            So, HHI is essentially stable at around 0.09. I'd use


            Code:
            twoway connected HHI year if country == 410 & HHI != 0
            to be open about where the gaps are.



            • #7
              Yes, will do so. Thank you very much! (These are the problems with unbalanced panels. You have gaps in places you do not expect.)
