  • calculation of Blau-index?

    Dear All, I found this question here (http://bbs.pinggu.org/thread-6634832-1-1.html). The data set is
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long stkcd int year float gender
    2 2015 1
    2 2015 1
    2 2015 1
    2 2015 1
    2 2015 1
    2 2015 1
    2 2015 1
    2 2015 1
    2 2015 1
    2 2015 1
    2 2015 0
    2 2015 1
    2 2015 1
    2 2015 1
    2 2014 0
    2 2014 1
    2 2014 1
    2 2014 1
    2 2014 1
    2 2014 1
    2 2014 1
    2 2014 1
    2 2014 1
    2 2014 1
    2 2014 1
    2 2014 1
    2 2014 0
    2 2014 1
    2 2014 1
    2 2014 1
    2 2014 1
    2 2014 1
    2 2014 1
    2 2014 1
    2 2014 1
    2 2013 1
    2 2013 1
    2 2013 1
    2 2013 1
    2 2013 1
    2 2013 0
    2 2013 0
    2 2013 1
    2 2013 1
    2 2013 1
    end
    For each company (stkcd) and year (year), I'd like to calculate the following index, $H=1-\sum_{i=1}^n P_i^n$ (Can't I use LaTeX here?).
    1. In this case, n=2, i.e., gender=1 (say male) or 0 (say, female).
    2. P_1 is the ratio of males to all persons, and P_2 is the ratio of females to all persons.
    3. In particular, H=1-P_1^2-P_2^2 in my case.
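    As a hand check of the formula in item 3, take the 2015 group in the data above: it has 14 persons, 13 coded 1 and one coded 0, so P_1 = 13/14 and P_2 = 1/14. A minimal sketch of the arithmetic with -display- (only counts read off the listing go in):

    ```stata
    * 2015 group: 13 of 14 persons have gender == 1
    display 1 - (13/14)^2 - (1/14)^2
    * the same quantity simplified to a single fraction
    display 26/196
    ```

    Both lines should show roughly 0.1327.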
    I suppose that the following code can do this (hopefully correctly).
    Code:
    bys stkcd year: egen tem1 = total(gender)
    bys stkcd year: egen tem2 = count(gender)
    gen P1 = tem1/tem2
    gen P2 = 1-P1
    gen P1sq = P1^2
    gen P2sq = P2^2
    gen H = 1 - P1sq - P2sq
    However, I wonder if there is more concise code for this situation? Thanks.
    Ho-Chuan (River) Huang
    Stata 17.0, MP(4)

  • #2
    Some algebra shows that this can be done more concisely as:

    Code:
    bys stkcd year: egen tem1 = total(gender)
    bys stkcd year: egen tem2 = count(gender)
    gen P1 = tem1/tem2
    gen H = 2*P1*(1-P1)
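    The algebra in question, written out with P_2 = 1 - P_1 as in the original post:

    ```latex
    H = 1 - P_1^2 - P_2^2
      = 1 - P_1^2 - (1 - P_1)^2
      = 1 - P_1^2 - 1 + 2P_1 - P_1^2
      = 2P_1 - 2P_1^2
      = 2P_1(1 - P_1)
    ```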
    Note: If you calculate it both ways, you will get results that are not exactly the same, due to rounding errors. But the differences appear in the 8th decimal place, so I think you can ignore them for any real-world purpose.

    Added: If you do all of the calculations generating them as doubles, the differences get shoved out to the 17th decimal place.
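
    A minimal sketch of the double-precision comparison, assuming the gender data from the original post are in memory. The names P1d, H_long and H_short are made up for this illustration, and -egen, mean()- stands in for the total/count pair because the mean of a 0/1 variable is the share of ones:

    ```stata
    * share of gender == 1 within each company-year, stored as a double
    bysort stkcd year: egen double P1d = mean(gender)
    * the index computed both ways, both as doubles
    gen double H_long  = 1 - P1d^2 - (1 - P1d)^2
    gen double H_short = 2*P1d*(1 - P1d)
    * the two versions should agree up to double-precision rounding
    assert reldif(H_long, H_short) < 1e-12
    ```

    If the -assert- passes silently, the two formulas agree to well beyond any precision that matters here.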
    Last edited by Clyde Schechter; 12 Sep 2018, 20:25.



    • #3
      Also, there is the user-written program -divcat-, available from SSC (type: ssc install divcat):

      Code:
      bysort stkcd year: divcat gender , gv gen_gv(H_new)
      list , sepby(year)
      Last edited by Carole J. Wilson; 12 Sep 2018, 20:56. Reason: Corrected to include stkcd in the bysort prefix
      Stata/MP 14.1 (64-bit x86-64)
      Revision 19 May 2016
      Win 8.1



      • #4
        Dear Clyde, Thanks for the helpful suggestion. Let's consider a more general case as follows.
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long stkcd int year float x
        2 2015 1
        2 2015 0
        2 2015 1
        2 2015 1
        2 2015 0
        2 2015 1
        2 2015 1
        2 2015 1
        2 2015 1
        2 2015 0
        2 2015 1
        2 2015 1
        2 2015 1
        2 2014 0
        2 2014 1
        2 2014 1
        2 2014 1
        2 2014 2
        2 2014 1
        2 2014 2
        2 2014 1
        2 2014 1
        2 2014 1
        2 2014 1
        2 2014 1
        2 2014 0
        2 2014 1
        2 2014 1
        2 2014 1
        2 2014 1
        2 2014 1
        2 2014 1
        2 2014 2
        2 2014 2
        2 2013 3
        2 2013 3
        2 2013 1
        2 2013 1
        2 2013 1
        2 2013 0
        2 2013 0
        2 2013 1
        2 2013 1
        2 2013 1
        end
        I'd like to calculate H, but there are different categories of `x' in different years (only one `stkcd' here). Any suggestions?
        Ho-Chuan (River) Huang
        Stata 17.0, MP(4)



        • #5
          Here's the code to do it by hand for any number of categories (including different numbers by year and/or stkcd). H_new is created by the command -divcat- referenced in #3; they are equivalent.

          Code:
          bysort stkcd year x: gen prop_cat=_N
          bysort stkcd year: replace prop_cat= prop_cat/_N
          
          levelsof x, local(xlev)
          foreach w of local xlev {
              bysort stkcd year: egen sq_prop_cat`w'=min(cond(x==`w', prop_cat^2, .))
              }
          egen sum_propcat=rowtotal(sq_*)
          gen H=1-sum_propcat
          
          bysort stkcd year: divcat x , gv gen_gv(H_new)
          list , sepby(year)
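
          A loop-free variant is also possible, sketched here on the assumption that x has no missing values (missing values would be treated as a category of their own). Each observation carries the share p_k of its own category, and the mean of those shares within a company-year is sum_k (n_k/N)*p_k, that is, the sum of squared shares. The names share and H_alt are made up for this illustration:

          ```stata
          * count of observations in each stkcd-year-x cell
          bysort stkcd year x: gen double share = _N
          * divide by the company-year size to get the category share p_k
          bysort stkcd year: replace share = share/_N
          * mean of p_k over observations = sum of p_k^2 over categories
          bysort stkcd year: egen double H_alt = mean(share)
          replace H_alt = 1 - H_alt
          ```

          For the 2013 group in #4 (counts 2, 6 and 2 out of 10) this should give 1 - (0.04 + 0.36 + 0.04) = 0.56.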
          Stata/MP 14.1 (64-bit x86-64)
          Revision 19 May 2016
          Win 8.1



          • #6
            Dear Carole, Many thanks for suggesting the -divcat- command. It works well.

            Ho-Chuan (River) Huang
            Stata 17.0, MP(4)



            • #7
              Code:
              bys stkcd year x: gen a = _N*(_n==1)
              bys stkcd year (x): egen b = total(x!=x[_n-1])
              bys stkcd year: egen Blau = total((_N^(b-1)-a^b)/(_N^b))
              Edit: If I understand correctly, the exponent is not 2 but the number of distinct values of x. In this regard, Carole's output and mine therefore differ.
              Last edited by Romalpa Akzo; 12 Sep 2018, 23:43.



              • #8
                I'm pretty sure the exponent is 2, if this is the usual diversity index related to Simpson's index.
                Stata/MP 14.1 (64-bit x86-64)
                Revision 19 May 2016
                Win 8.1



                • #9
                  Then River might need to recheck the calculation for this index.

                  If the exponent is 2, one line of my suggestion in #7 should be omitted.

                  Code:
                  bys stkcd year x: gen a = _N*(_n==1)
                  bys stkcd year: egen Blau = total((_N-a^2)/(_N^2))
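
                  To see why these two lines give one minus the sum of squared shares, write N for the company-year size (_N in the second line), n_k for the count in category k, and p_k = n_k/N; this notation is introduced here, not taken from the code. The variable a equals n_k on the first observation of each category and 0 elsewhere, so summing over the N observations j gives:

                  ```latex
                  \sum_{j=1}^{N} \frac{N - a_j^2}{N^2}
                    = \frac{N \cdot N}{N^2} - \sum_{k} \frac{n_k^2}{N^2}
                    = 1 - \sum_{k} p_k^2
                  ```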



                  • #10
                    Does no one ever try searching the forum? "Blau index" does suggest itself as a search term, with results:

                    https://www.statalist.org/forums/for...iversity-index

                    https://www.statalist.org/forums/for...index-in-stata

                    https://www.statalist.org/forums/for...the-blau-index

                    https://www.statalist.org/forums/for...mbers-of-a-set



                    • #11
                      Dear Romalpa, My bad. It seems that the original post had a typo. It should be P_i^2 rather than P_i^n. Sorry for this, and thank you for your suggestion.
                      Ho-Chuan (River) Huang
                      Stata 17.0, MP(4)



                      • #12
                        Dear Nick, Many thanks for the information.
                        Ho-Chuan (River) Huang
                        Stata 17.0, MP(4)
