Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • HHI Panel in long format

    Hi all,

    this is the first time I am writing in the statalist-forum and I would be very glad if somebody can help me.
    I didn't find any hints how to cope with my case in the forum, so here is my request:

    I do have a panel structure (long format) with companies like the following:

    ID year var1 var2 var 3 ..... var20
    1 2003 16% 12% 5% 0%
    1 2004 15% 11% 7% 1%
    1 2005 16% 14% 6% 3%
    2 2003 27% 19% 4% 3%
    2 2004 15% 11% 4% 3%
    2 2005 19% 9% 3% 2%

    What I'd like to compute is HHI -> Herfindahl-index for every Company (id) in each year.
    Actually, I didn't find any helping hints if I can use this long format or if I have to reshape it to the wide format.

    And if I use the command from stata-help-function in the existing format, stata produces a new hhi-variable for every var 1 to var20.
    sort id year
    hhi var1 var2 var3, by (id year)

    Maybe my question is to easy, but nevertheless I did not find any hints on how to cope with.
    So, I would be very pleased if somebody can help me.

    Thanks a lot and kind regards

    Fabian

  • #2
    Fabian:-
    welcome to the list.
    Typing -search Herfindahl-index- might give you some valuable entries.
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      It's at least moderately puzzling that O.C. Herfindahl gets so much credit for an index he didn't originate. (The priority claim of A.O. Hirschman is more than a little unfortunate, as he wasn't first there either.) The tendency of scholars to know little more than the literature in their own discipline is perhaps to blame here (that's unfair on Hirschman in general, as he was singularly wide-ranging).

      But leaving that all aside it seems that you want the sum of squared proportions (or perhaps its complement, or perhaps its reciprocal: if either of the latter, then the calculation is easy).

      We need more information, strictly. Examples such as yours (made consistent here)

      Code:
      ID year var1 var2 var3 var4
      1 2003 16% 12% 5% 0%
      1 2004 15% 11% 7% 1%
      1 2005 16% 14% 6% 3%
      2 2003 27% 19% 4% 3%
      2 2004 15% 11% 4% 3%
      2 2005 19% 9% 3% 2%
      leave us wondering whether the % signs (a) are part of string variables or (b) are just supplied as explanation, but not part of the data. Please do read up about using dataex (SSC) to show us real or realistic data as the FAQ and "+ New topic" text do urge.

      You allude to hhi (SSC) : please note how to refer to user-written commands. I don't find the help for that program at all easy to follow, so it seems easier to suggest code from first principles.

      Here is your example used as sandbox. In practice you have more variables; in practice they add up to 100%; but the principle should not be too unclear.

      As I understand it, the superstructure of identifier and year plays no part in the calculation. Also, you may need to insert factors of 100 if you are using some different convention about units.

      Code:
      clear
      input ID year var1 var2 var3 var4
      1 2003 16 12 5 0
      1 2004 15 11 7 1
      1 2005 16 14 6 3
      2 2003 27 19 4 3
      2 2004 15 11 4 3
      2 2005 19 9 3 2
      end
      
      egen total = rowtotal(var1-var4)
      
      forval j = 1/4 {
         gen psq`j' = (var`j'/total)^2
      }
      
      egen hhi = rowtotal(psq*)
      
      list
      So, egen is your friend. To get proportions (or percents) squared, you first need a row total. The last step is also a row total.

      By the way, titles such as "HHI" really narrow the likely readership to a small subset of applied economists, modulo accidents about who else recognises the in-group jargon. The way to make the thread really general would be to translate to something like "sum of squared proportions". Then more people would recognise that you are asking about a moderately simple calculation.

      EDIT

      Code:
      egen total = rowtotal(var1-var4)
      
      forval j = 1/4 {
         gen psq`j' = (var`j'/total)^2
      }
      
      egen hhi = rowtotal(psq*)
      could easily be rewritten as

      Code:
      egen total = rowtotal(var1-var4)
      
      gen hhi = 0
      
      forval j = 1/4 {
         replace hhi = hhi + (var`j'/total)^2
      }
      not to mention Mata solutions. (I am not rewriting the total as var1 + var2 + var3 + var4 as extending that to 20 or so variables is too tedious.)
      Last edited by Nick Cox; 19 Oct 2015, 10:29.

      Comment


      • #4
        Dear Carlo, dear Nick,

        thanks a lot for your quick and helpful answers.
        Indeed, help-function for hhi (SSC) is not that helpful, so I used the way to compute the concentration measure as Nick explained.

        You're totally right, Nick. My request was a bit fuzzy, but you transferred it in the right way. Thanks a lot for the hints regarding how to post requests in the forum, I will use them next time.
        And indeed egen is my friend, the forval -solution a real helpful one.

        So, thanks a lot for helping me out!

        Regards

        Fabian

        Comment


        • #5
          Hello all,

          I am working with a very similar kind of data structure, wherein I intend to compute HHI for net sales. Since net sales variable can be negative so as I insert the code:

          hhi netsales, by(year id)

          the stata shows error as:
          negative values in varlist

          As I cannot drop negative net sales values, how can I resolve this issue.

          Many thanks in advance for your help


          Last edited by mohina saxena; 09 Jan 2019, 12:46.

          Comment


          • #6
            #5 I can't see that the measure makes any sense at all unless for shares that are zero or positive. Why not just SD or variance?

            Comment


            • #7
              Dear Nick,

              Many thanks for your clarification. In regards to the intuition that you have given I have understood it the following way, please correct me if I am wrong; since HHI is calculated from summation of market share, so in that sense market share cannot be negative, it has to be zero or positive. Thus, my measure of net sales is conceptually a wrong measure right?

              Regards
              Mohina

              Comment


              • #8
                As before: I didn't write hhi (SSC), but I agree with its author that negative inputs make no obvious sense.

                Comment


                • #9
                  Thanks a lot for the much needed clarification.

                  Comment


                  • #10
                    Hi, I am also struggling to accurately construct a HHI Index. The paper I am following following contains sales data and use following equation to construct HHI Index:

                    Click image for larger version

Name:	Capture.PNG
Views:	1
Size:	7.1 KB
ID:	1545401


                    My data contains thousand of blank sales observations which has no data, which in other words mean that company did not exist at that time and there were no reported data available. My data looks like as follows:

                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input long firmid int Time long(Industry sales)
                    1 2003  3    .
                    1 2004  3    .
                    1 2005  3    .
                    1 2006  3    .
                    1 2007  3    .
                    1 2008  3    5
                    1 2009  3    6
                    1 2010  3    6
                    1 2011  3    6
                    1 2012  3   11
                    1 2013  3    .
                    1 2014  3    .
                    1 2015  3    .
                    1 2016  3    .
                    1 2017  3    .
                    1 2018  3    .
                    2 2003  7   11
                    2 2004  7   59
                    2 2005  7   95
                    2 2006  7  116
                    2 2007  7  158
                    2 2008  7  373
                    2 2009  7  224
                    2 2010  7  240
                    2 2011  7  291
                    2 2012  7    .
                    2 2013  7    .
                    2 2014  7    .
                    2 2015  7    .
                    2 2016  7    .
                    2 2017  7    .
                    2 2018  7    .
                    3 2003  4   41
                    3 2004  4   63
                    3 2005  4   72
                    3 2006  4   83
                    3 2007  4   85
                    3 2008  4   63
                    3 2009  4  129
                    3 2010  4  110
                    3 2011  4    .
                    3 2012  4    .
                    3 2013  4    .
                    3 2014  4    .
                    3 2015  4    .
                    3 2016  4    .
                    3 2017  4    .
                    3 2018  4    .
                    4 2003  8  414
                    4 2004  8  471
                    4 2005  8  316
                    4 2006  8  743
                    4 2007  8    .
                    4 2008  8 1381
                    4 2009  8 1775
                    4 2010  8 2063
                    4 2011  8 2309
                    4 2012  8 2403
                    4 2013  8 2469
                    4 2014  8 2620
                    4 2015  8    .
                    4 2016  8    .
                    4 2017  8    .
                    4 2018  8    .
                    5 2003  8  161
                    5 2004  8  160
                    5 2005  8  165
                    5 2006  8  161
                    5 2007  8    .
                    5 2008  8   90
                    5 2009  8   79
                    5 2010  8   94
                    5 2011  8  123
                    5 2012  8  113
                    5 2013  8  109
                    5 2014  8  118
                    5 2015  8    .
                    5 2016  8    .
                    5 2017  8    .
                    5 2018  8    .
                    6 2003  4  717
                    6 2004  4  796
                    6 2005  4  993
                    6 2006  4 1170
                    6 2007  4 1132
                    6 2008  4  934
                    6 2009  4 1535
                    6 2010  4 1929
                    6 2011  4 1977
                    6 2012  4 2141
                    6 2013  4 2242
                    6 2014  4 2310
                    6 2015  4 2302
                    6 2016  4 2628
                    6 2017  4    .
                    6 2018  4    .
                    7 2003 10 1849
                    7 2004 10 2355
                    7 2005 10 3046
                    7 2006 10 3992
                    end
                    label values firmid fid
                    label def fid 1 "0876711D LN Equity", modify
                    label def fid 2 "1218069D LN Equity", modify
                    label def fid 3 "1334987D LN Equity", modify
                    label def fid 4 "1561649D LN Equity", modify
                    label def fid 5 "1638414D LN Equity", modify
                    label def fid 6 "1655637D LN Equity", modify
                    label def fid 7 "3572335Q LN Equity", modify
                    label values Industry n_ind
                    label def n_ind 3 "Consumer Services", modify
                    label def n_ind 4 "Financials", modify
                    label def n_ind 7 "Oil & Gas", modify
                    label def n_ind 8 "Technology", modify
                    label def n_ind 10 "Utilities", modify

                    How can I construct HHI index based on above equation? Second whether hhi function considering blank cells as zero or simply ignoring those cells. Many thanks.

                    Attached Files

                    Comment


                    • #11
                      Thanks for the data example!

                      We can't comment on your code, as you don't show any.

                      The original question #1 is marginally relevant as the OP has, in my terms, a long-wide layout here, and regardless of what you call it, your layout (data structure, data format) differs from that in #1.

                      That said, HHI or its variants has been the subject of at least a hundred posts here on Statalist, and a wider search would have found this solution. I give here concise code and then show with a worked example how it is correct. I've assumed, contrary to your unsourced quotation, that you want calculations separately by year.

                      See especially #2 in https://www.statalist.org/forums/for...-of-a-variable from 29 March 2020.


                      Code:
                      . bysort Time Industry : egen p = pc(sales), prop
                      (38 missing values generated)
                      
                      . by Time Industry : egen HHI = total(p^2)
                      
                      . replace HHI = . if HHI == 0
                      (32 real changes made, 32 to missing)
                      
                      . tabdisp Time Industry, c(HHI) format(%4.3f)
                      
                      ---------------------------------------------------------------------------------------------------------
                                |                                           Industry                                          
                           Time | Consumer Services         Financials          Oil & Gas         Technology          Utilities
                      ----------+----------------------------------------------------------------------------------------------
                           2003 |                                0.898              1.000              0.597              1.000
                           2004 |                                0.864              1.000              0.621              1.000
                           2005 |                                0.874              1.000              0.549              1.000
                           2006 |                                0.876              1.000              0.707              1.000
                           2007 |                                0.870              1.000                                      
                           2008 |             1.000              0.882              1.000              0.885                  
                           2009 |             1.000              0.857              1.000              0.918                  
                           2010 |             1.000              0.898              1.000              0.917                  
                           2011 |             1.000              1.000              1.000              0.904                  
                           2012 |             1.000              1.000                                 0.914                  
                           2013 |                                1.000                                 0.919                  
                           2014 |                                1.000                                 0.918                  
                           2015 |                                1.000                                                        
                           2016 |                                1.000                                                        
                           2017 |                                                                                              
                           2018 |                                                                                              
                      ---------------------------------------------------------------------------------------------------------
                      
                      .
                      . list if Time == 2003 & Industry == 4
                      
                           +----------------------------------------------------------------------+
                           |             firmid   Time     Industry   sales          p        HHI |
                           |----------------------------------------------------------------------|
                        2. | 1655637D LN Equity   2003   Financials     717   .9459103   .8976719 |
                        3. | 1334987D LN Equity   2003   Financials      41   .0540897   .8976719 |
                           +----------------------------------------------------------------------+
                      
                      . di 717/(717 + 41)
                      .94591029
                      
                      . di 41/(717 + 41)
                      .05408971
                      
                      . di (717/(717 + 41))^2 + (41/(717 + 41))^2
                      .89767197
                      Note that with just 1 firm in each industry and year, HHI is necessarily 1. Conversely, total() in egen returns 0 as the sum of missing values, which we take as meaning missing.
                      Last edited by Nick Cox; 08 Apr 2020, 02:05.

                      Comment

                      Working...
                      X