
  • Calculating Simpson's diversity index in Stata

    Hello,

    I am relatively new to using formulas in Stata, and I want to calculate Simpson's Diversity Index (Simpson 1949) for each household (see data below). Is this possible? Any advice is greatly appreciated! The formula for Simpson's Index is



    Where:
    • n = number of individuals of each species
    • N = total number of individuals of all species
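    The formula image did not survive in this copy of the post. Assuming it was the usual count-based form, which matches the definitions above (with n_j the number of individuals of species j) and the n(n - 1) and N(N - 1) terms used in the code in #6, Simpson's index D and the diversity index 1 - D read:

```latex
D = \frac{\sum_j n_j (n_j - 1)}{N (N - 1)}, \qquad
1 - D = 1 - \frac{\sum_j n_j (n_j - 1)}{N (N - 1)}, \qquad
N = \sum_j n_j
```

    For large counts, 1 - D is close to 1 - \sum_j p_j^2 with p_j = n_j / N, the share-based form.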

    Code:
    clear
    input str14 HHID float Parcel_ID str20 Crop_name double(Quan_harvested Quan_sold)
    "1013000204" 21 "" . .
    "1021000108" 1 "Banana food" 48 0
    "1021000108" 1 "Cassava" 3 0
    "1021000113" 1 "Beans" 3 0
    "1021000113" 1 "Dodo" 2 0
    "1021000113" 1 "Dodo" 10 0
    "1021000113" 1 "Maize" 1 0
    "1021000113" 1 "Yam" 99999 .
    "1021000408" 1 "Banana food" 99999 .
    "1021000408" 1 "Beans" 2 0
    "1021000408" 1 "Maize" 1 0
    "1021000710" 1 "Banana food" 99999 .
    "1021000710" 1 "Beans" 3.5 0
    "1021000710" 1 "Maize" 60 0
    "1021000807" 1 "Coffee all" 7 7
    "1021000807" 1 "" . .
    "1021000807" 2 "Banana food" 180 80
    "1021000807" 2 "Banana food" 30 0
    "1021000807" 2 "Banana food" 90 90
    "1021000807" 2 "Cassava" 17 17
    "102100080803" 1 "Banana food" 20 0
    "102100080803" 1 "Coffee all" 0 .
    "102100080803" 2 "Beans" 1 0
    "102100080803" 2 "Maize" 1 1
    "102100080803" 2 "Groundnuts" 2 0
    "102100080803" 2 "Maize" 1 1
    "102100080803" 21 "Cassava" 2 0
    "102100080803" 21 "Maize" 2.5 1
    "102100080803" 21 "Sweet potatoes" 1 0
    "102100080803" 21 "" . .
    "102100110201" 1 "Beans" 1 0
    "102100110201" 1 "" . .
    "102100110201" 2 "Cocoa" 4 4
    "102100110201" 2 "Coffee all" 1 1
    "102100110201" 2 "Groundnuts" 3 0
    "102100110201" 3 "Groundnuts" 4 2
    "102100110201" 3 "Beans" 1 0
    "102100110201" 3 "Cassava" 99999 .
    "102100110201" 4 "" . .
    "1021001109" 1 "Sweet potatoes" 32 0
    "1021001109" 1 "Banana food" 12 0
    "1021001109" 1 "Cassava" 2 0
    "1021001109" 1 "Maize" .5 0
    "1021001109" 2 "Yam" 10 6
    "1021001109" 2 "Sugarcane" 120 0
    "1021001109" 3 "" . .
    "1021001304" 1 "Banana food" 4 0
    "1021001304" 1 "Beans" 4 0
    "1021001304" 1 "Beans" 10 0
    "1021001304" 1 "Cassava" 5 0
    "1021001304" 1 "Maize" 1 0
    "1021001910" 1 "Banana food" 99999 .
    "1021002501" 1 "" . .
    "1021002610" 1 "Banana food" 10 0
    "1021002610" 1 "Coffee all" 99999 .
    "1021002610" 1 "Sweet potatoes" 8 3
    "1021002610" 1 "Cassava" 99999 .
    "1021002610" 1 "Yam" 5 0
    "1021002610" 1 "Groundnuts" 1 0
    "1021002610" 1 "Maize" 1 0
    "1021002610" 21 "Sweet potatoes" 2 0
    "1021002611" 1 "Maize" 1 0
    "1021002611" 1 "Maize" 2 2
    "1021002810" 1 "Beans" .5 0
    "1021002810" 1 "Maize" 2 0
    "1021003309" 21 "" . .
    "1033000301" 1 "Banana food" 60 0
    "1033000301" 1 "Maize" 15 0
    "1033000302" 1 "Beans" 3 1
    "1033000302" 1 "Cassava" 0 .
    "1033000302" 1 "Banana food" 30 20
    "1033000302" 1 "Beans" 1 0
    "1033000302" 1 "Coffee all" 0 .
    "1033000302" 1 "Fallow" 99999 .
    "1033000303" 1 "Sweet potatoes" 0 .
    "1033000303" 1 "Beans" 0 .
    "1033000303" 1 "Cassava" 0 .
    "1033000303" 1 "Maize" 3 2
    "1033000303" 1 "Banana food" 50 15
    "1033000303" 2 "" . .
    "1033000303" . "Fallow" 99999 .
    "1033000304" 1 "Banana food" 40 0
    "1033000304" 1 "Coffee all" 0 .
    "1033000304" 1 "Banana beer" 30 0
    "1033000304" 21 "Sweet potatoes" 0 .
    "1033000304" 22 "Maize" 0 .
    "1033000304" 22 "Beans" 1 0
    "1033000304" 22 "Maize" 20 0
    "103300030403" 1 "Banana food" 5 0
    "103300030403" 1 "Coffee all" 2 2
    "103300030403" 2 "Banana food" 6 0
    "103300030403" 21 "Beans" 1 0
    "103300030403" 21 "Cassava" 0 .
    "103300030403" 21 "Maize" 7.5 7
    "1033000305" 1 "" . .
    "1033000305" 2 "" . .
    "1033000305" 3 "" . .
    "1033000305" 21 "" . .
    "1033000307" 1 "Banana beer" 30 0
    "1033000307" 2 "Banana beer" 10 0
    end
    ------------------ copy up to and including the previous line ------------------

    Listed 100 out of 15403 observations
    Use the count() option to list more


  • #2
    The question is ambiguous. Do you want a diversity measure for (1) kinds of food or (2) kinds of food weighted by amounts?

    The formula you give is only appropriate when counting individuals, and not when working with amounts.

    Please read and act on the FAQ Advice on giving references: https://www.statalist.org/forums/help#references



    • #3
      Hi Nick,

      Sorry about how the question was worded. I meant (1): the different kinds of food, not weighted by amount. Thanks for any advice!



      • #4
        Hi Nick,

        I am not sure what code to use to implement the formula in Stata. I am trying to measure the diversity of crop types produced by each household (identified by HHID) using the Simpson's Index formula above.


        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str14 HHID str20 Crop_name
        "1013000204"   ""              
        "1021000108"   "Banana food"   
        "1021000108"   "Cassava"       
        "1021000113"   "Beans"         
        "1021000113"   "Dodo"          
        "1021000113"   "Dodo"          
        "1021000113"   "Maize"         
        "1021000113"   "Yam"           
        "1021000408"   "Banana food"   
        "1021000408"   "Beans"         
        "1021000408"   "Maize"         
        "1021000710"   "Banana food"   
        "1021000710"   "Beans"         
        "1021000710"   "Maize"         
        "1021000807"   "Coffee all"    
        "1021000807"   ""              
        "1021000807"   "Banana food"   
        "1021000807"   "Banana food"   
        "1021000807"   "Banana food"   
        "1021000807"   "Cassava"       
        "102100080803" "Banana food"   
        "102100080803" "Coffee all"    
        "102100080803" "Beans"         
        "102100080803" "Maize"         
        "102100080803" "Groundnuts"    
        "102100080803" "Maize"         
        "102100080803" "Cassava"       
        "102100080803" "Maize"         
        "102100080803" "Sweet potatoes"
        "102100080803" ""              
        "102100110201" "Beans"         
        "102100110201" ""              
        "102100110201" "Cocoa"         
        "102100110201" "Coffee all"    
        "102100110201" "Groundnuts"    
        "102100110201" "Groundnuts"    
        "102100110201" "Beans"         
        "102100110201" "Cassava"       
        "102100110201" ""              
        "1021001109"   "Sweet potatoes"
        "1021001109"   "Banana food"   
        "1021001109"   "Cassava"       
        "1021001109"   "Maize"         
        "1021001109"   "Yam"           
        "1021001109"   "Sugarcane"     
        "1021001109"   ""              
        "1021001304"   "Banana food"   
        "1021001304"   "Beans"         
        "1021001304"   "Beans"         
        "1021001304"   "Cassava"       
        "1021001304"   "Maize"         
        "1021001910"   "Banana food"   
        "1021002501"   ""              
        "1021002610"   "Banana food"   
        "1021002610"   "Coffee all"    
        "1021002610"   "Sweet potatoes"
        "1021002610"   "Cassava"       
        "1021002610"   "Yam"           
        "1021002610"   "Groundnuts"    
        "1021002610"   "Maize"         
        "1021002610"   "Sweet potatoes"
        "1021002611"   "Maize"         
        "1021002611"   "Maize"         
        "1021002810"   "Beans"         
        "1021002810"   "Maize"         
        "1021003309"   ""              
        "1033000301"   "Banana food"   
        "1033000301"   "Maize"         
        "1033000302"   "Beans"         
        "1033000302"   "Cassava"       
        "1033000302"   "Banana food"   
        "1033000302"   "Beans"         
        "1033000302"   "Coffee all"    
        "1033000302"   "Fallow"        
        "1033000303"   "Sweet potatoes"
        "1033000303"   "Beans"         
        "1033000303"   "Cassava"       
        "1033000303"   "Maize"         
        "1033000303"   "Banana food"   
        "1033000303"   ""              
        "1033000303"   "Fallow"        
        "1033000304"   "Banana food"   
        "1033000304"   "Coffee all"    
        "1033000304"   "Banana beer"   
        "1033000304"   "Sweet potatoes"
        "1033000304"   "Maize"         
        "1033000304"   "Beans"         
        "1033000304"   "Maize"         
        "103300030403" "Banana food"   
        "103300030403" "Coffee all"    
        "103300030403" "Banana food"   
        "103300030403" "Beans"         
        "103300030403" "Cassava"       
        "103300030403" "Maize"         
        "1033000305"   ""              
        "1033000305"   ""              
        "1033000305"   ""              
        "1033000305"   ""              
        "1033000307"   "Banana beer"   
        "1033000307"   "Banana beer"   
        end
        ------------------ copy up to and including the previous line ------------------

        Listed 100 out of 15403 observations
        Use the count() option to list more
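        A minimal sketch of option (1) for data like these, assuming each non-blank Crop_name record counts as one "individual" (so n is the number of records of a crop in a household and N the household total), that blank crop names are dropped, and using the count form 1 - sum n(n - 1)/(N(N - 1)). The variable names n, N, sum_nn1 and simpson are illustrative choices, and the input block is a small subset of the example above so the sketch runs on its own.

```stata
* Sketch: Simpson's diversity 1 - sum n(n-1) / (N(N-1)) per household,
* counting each non-blank Crop_name record as one "individual".
* The input block is a subset of the -dataex- example; in practice
* skip it and run the rest on the full dataset.
clear
input str14 HHID str20 Crop_name
"1021000113" "Beans"
"1021000113" "Dodo"
"1021000113" "Dodo"
"1021000113" "Maize"
"1021000113" "Yam"
"1021000807" "Coffee all"
"1021000807" ""
"1021000807" "Banana food"
"1021000807" "Banana food"
"1021000807" "Banana food"
"1021000807" "Cassava"
"1021002611" "Maize"
"1021002611" "Maize"
end

drop if missing(Crop_name)                        // blank crop names carry no information
contract HHID Crop_name, freq(n)                  // n = records of each crop in the household
bysort HHID: egen double N = total(n)             // N = records of all crops in the household
bysort HHID: egen double sum_nn1 = total(n * (n - 1))
generate double simpson = 1 - sum_nn1 / (N * (N - 1))   // missing when N < 2
bysort HHID: keep if _n == 1                      // keep one row per household
list HHID N simpson, noobs
```

        Households with no named crop drop out entirely, and a household with a single record gets a missing value because N(N - 1) is zero.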



        • #5
          You could write your own routine, but a quick -search diversity- turns up the program -divcat- (ssc install divcat). Its help file indicates that the -gv- option calculates 1 minus the Simpson index. Does this give you what you want?

          Code:
          bysort HHID: divcat Crop_name, gv gen_gv(D)

          Stata/MP 14.1 (64-bit x86-64)
          Revision 19 May 2016
          Win 8.1



          • #6
            Not sure how you get that for each household. This may give you some technique.

            Code:
            clear 
            input str14 HHID float Parcel_ID str20 Crop_name double(Quan_harvested Quan_sold)
            "1013000204" 21 "" . .
            "1021000108" 1 "Banana food" 48 0
            "1021000108" 1 "Cassava" 3 0
            "1021000113" 1 "Beans" 3 0
            "1021000113" 1 "Dodo" 2 0
            "1021000113" 1 "Dodo" 10 0
            "1021000113" 1 "Maize" 1 0
            "1021000113" 1 "Yam" 99999 .
            "1021000408" 1 "Banana food" 99999 .
            "1021000408" 1 "Beans" 2 0
            "1021000408" 1 "Maize" 1 0
            "1021000710" 1 "Banana food" 99999 .
            "1021000710" 1 "Beans" 3.5 0
            "1021000710" 1 "Maize" 60 0
            "1021000807" 1 "Coffee all" 7 7
            "1021000807" 1 "" . .
            "1021000807" 2 "Banana food" 180 80
            "1021000807" 2 "Banana food" 30 0
            "1021000807" 2 "Banana food" 90 90
            "1021000807" 2 "Cassava" 17 17
            "102100080803" 1 "Banana food" 20 0
            "102100080803" 1 "Coffee all" 0 .
            "102100080803" 2 "Beans" 1 0
            "102100080803" 2 "Maize" 1 1
            "102100080803" 2 "Groundnuts" 2 0
            "102100080803" 2 "Maize" 1 1
            "102100080803" 21 "Cassava" 2 0
            "102100080803" 21 "Maize" 2.5 1
            "102100080803" 21 "Sweet potatoes" 1 0
            "102100080803" 21 "" . .
            "102100110201" 1 "Beans" 1 0
            "102100110201" 1 "" . .
            "102100110201" 2 "Cocoa" 4 4
            "102100110201" 2 "Coffee all" 1 1
            "102100110201" 2 "Groundnuts" 3 0
            "102100110201" 3 "Groundnuts" 4 2
            "102100110201" 3 "Beans" 1 0
            "102100110201" 3 "Cassava" 99999 .
            "102100110201" 4 "" . .
            "1021001109" 1 "Sweet potatoes" 32 0
            "1021001109" 1 "Banana food" 12 0
            "1021001109" 1 "Cassava" 2 0
            "1021001109" 1 "Maize" .5 0
            "1021001109" 2 "Yam" 10 6
            "1021001109" 2 "Sugarcane" 120 0
            "1021001109" 3 "" . .
            "1021001304" 1 "Banana food" 4 0
            "1021001304" 1 "Beans" 4 0
            "1021001304" 1 "Beans" 10 0
            "1021001304" 1 "Cassava" 5 0
            "1021001304" 1 "Maize" 1 0
            "1021001910" 1 "Banana food" 99999 .
            "1021002501" 1 "" . .
            "1021002610" 1 "Banana food" 10 0
            "1021002610" 1 "Coffee all" 99999 .
            "1021002610" 1 "Sweet potatoes" 8 3
            "1021002610" 1 "Cassava" 99999 .
            "1021002610" 1 "Yam" 5 0
            "1021002610" 1 "Groundnuts" 1 0
            "1021002610" 1 "Maize" 1 0
            "1021002610" 21 "Sweet potatoes" 2 0
            "1021002611" 1 "Maize" 1 0
            "1021002611" 1 "Maize" 2 2
            "1021002810" 1 "Beans" .5 0
            "1021002810" 1 "Maize" 2 0
            "1021003309" 21 "" . .
            "1033000301" 1 "Banana food" 60 0
            "1033000301" 1 "Maize" 15 0
            "1033000302" 1 "Beans" 3 1
            "1033000302" 1 "Cassava" 0 .
            "1033000302" 1 "Banana food" 30 20
            "1033000302" 1 "Beans" 1 0
            "1033000302" 1 "Coffee all" 0 .
            "1033000302" 1 "Fallow" 99999 .
            "1033000303" 1 "Sweet potatoes" 0 .
            "1033000303" 1 "Beans" 0 .
            "1033000303" 1 "Cassava" 0 .
            "1033000303" 1 "Maize" 3 2
            "1033000303" 1 "Banana food" 50 15
            "1033000303" 2 "" . .
            "1033000303" . "Fallow" 99999 .
            "1033000304" 1 "Banana food" 40 0
            "1033000304" 1 "Coffee all" 0 .
            "1033000304" 1 "Banana beer" 30 0
            "1033000304" 21 "Sweet potatoes" 0 .
            "1033000304" 22 "Maize" 0 .
            "1033000304" 22 "Beans" 1 0
            "1033000304" 22 "Maize" 20 0
            "103300030403" 1 "Banana food" 5 0
            "103300030403" 1 "Coffee all" 2 2
            "103300030403" 2 "Banana food" 6 0
            "103300030403" 21 "Beans" 1 0
            "103300030403" 21 "Cassava" 0 .
            "103300030403" 21 "Maize" 7.5 7
            "1033000305" 1 "" . .
            "1033000305" 2 "" . .
            "1033000305" 3 "" . .
            "1033000305" 21 "" . .
            "1033000307" 1 "Banana beer" 30 0
            "1033000307" 2 "Banana beer" 10 0
            end
            
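            egen HHtag = tag(HHID Crop_name) if Crop_name != ""   // tag one row per distinct crop within each household
            egen HHn = total(HHtag), by(HHID)                      // number of distinct crops in the household
            egen alltag = tag(Crop_name) if Crop_name != ""        // tag one row per distinct crop in the whole dataset
            egen allN = total(alltag)                              // number of distinct crops overall
            
            gen Simpson = 1 - HHn * (HHn - 1) / (allN * (allN - 1))   // 1 - n(n - 1)/(N(N - 1)) with n = HHn, N = allN
            
            tabdisp HHID , c(Simpson)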



            • #7
              Hi Carole and Nick,

              Thank you for the really helpful advice! I learned a lot from your posts. Apologies for my badly phrased questions. I realized that I had given the wrong formula in my previous post. I am trying to calculate the Simpson's Index for each household (HHID). Any advice is really appreciated! Thank you!

              The formula that I am trying to use is

              $$\text{Simpson's Index} = 1 - \sum_j s_j^2, \qquad s_j = \frac{a_{ij}}{A_i}$$

              where $s_j$ is the share of crop $j$ in the total area cultivated by household $i$, $a_{ij}$ is the area under the $j$th crop for the $i$th household, and $A_i$ is the total area household $i$ cultivates under all crops.
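              A minimal sketch of this area-based formula, assuming that rows repeating the same crop within a household should be summed into one area a_ij, and that rows with a blank crop name or a missing totalplot are dropped (both are judgment calls to confirm). The variable names a, A, s, sum_s2 and simpson are illustrative, and the input block is a few households copied from the example data in this post.

```stata
* Sketch: Simpson's index 1 - sum_j s_j^2 with s_j = a_ij / A_i, from crop
* areas (totalplot) by household (HHID).
* Assumptions: rows with a blank crop name or missing area are dropped, and
* repeated rows for the same crop in a household are summed into one area.
* The input block is a few households copied from the example data.
clear
input str14 HHID str20 cropname double totalplot
"1021000108" "Cassava"       0
"1021000108" "Banana food"  .5
"1021000113" "Maize"       .25
"1021000113" "Beans"       .25
"1021000113" "Dodo"          .
"1021000113" "Yam"         .25
"1021002611" "Maize"         1
"1033000301" "Maize"       1.5
"1033000301" "Banana food"  .5
end

drop if missing(cropname) | missing(totalplot)
collapse (sum) a = totalplot, by(HHID cropname)   // a_ij: area of crop j in household i
bysort HHID: egen double A = total(a)             // A_i: total area under all crops
generate double s = a / A                         // s_j: share of crop j
bysort HHID: egen double sum_s2 = total(s^2)      // sum of squared shares
generate double simpson = 1 - sum_s2 if A > 0     // left missing when A_i is zero
bysort HHID: keep if _n == 1                      // one row per household
list HHID A simpson, noobs
```

              The `if A > 0` condition matters: -egen total()- treats missing values as zero, so without it a household with zero total area would wrongly get an index of 1.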



              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str14 HHID str20 cropname double totalplot
              "1013000204"   ""                                  .
              "1021000108"   "Cassava"                           0
              "1021000108"   "Banana food"                      .5
              "1021000113"   "Maize"                           .25
              "1021000113"   "Beans"                           .25
              "1021000113"   "Dodo"                              .
              "1021000113"   "Yam"                             .25
              "1021000408"   "Banana food"      .12999999523162842
              "1021000408"   "Maize"             .3700000047683716
              "1021000408"   "Beans"             .3700000047683716
              "1021000710"   "Maize"                           .25
              "1021000710"   "Beans"                           .25
              "1021000710"   "Banana food"                     .25
              "1021000807"   "Coffee all"                        2
              "1021000807"   "Natural pastures"                  2
              "1021000807"   "Banana food"                       1
              "1021000807"   "Cassava"                         1.5
              "102100080803" "Banana food"                       2
              "102100080803" "Coffee all"                        2
              "102100080803" "Maize"                           .25
              "102100080803" "Beans"                           .25
              "102100080803" "Maize"                           .25
              "102100080803" "Groundnuts"                      .25
              "102100080803" "Maize"                             1
              "102100080803" "Cassava"                           1
              "102100080803" "Sweet potatoes"                  .25
              "102100080803" ""                                  .
              "102100110201" ""                                  .
              "102100110201" ""                                  .
              "102100110201" "Groundnuts"                       .5
              "102100110201" "Coffee all"                       .5
              "102100110201" "Cocoa"                            .5
              "102100110201" "Groundnuts"                       .5
              "102100110201" "Beans"                            .5
              "102100110201" "Cassava"                          .5
              "102100110201" ""                                  .
              "1021001109"   "Sweet potatoes"   .15000000596046448
              "1021001109"   "Cassava"                         .25
              "1021001109"   "Banana food"                     .25
              "1021001109"   "Maize"             .6000000238418579
              "1021001109"   "Yam"                            2.25
              "1021001109"   "Sugarcane"                       .25
              "1021001109"   ""                                  .
              "1021001304"   "Maize"            .05000000074505806
              "1021001304"   "Beans"            .05000000074505806
              "1021001304"   "Cassava"          .05000000074505806
              "1021001304"   "Banana food"      .05000000074505806
              "1021001910"   "Banana food"                     .25
              "1021002501"   ""                                  .
              "1021002610"   "Sweet potatoes"                  .25
              "1021002610"   "Banana food"                     .25
              "1021002610"   "Coffee all"                      .25
              "1021002610"   "Cassava"                          .5
              "1021002610"   "Yam"                              .5
              "1021002610"   "Maize"                           .25
              "1021002610"   "Groundnuts"                      .25
              "1021002610"   "Sweet potatoes"                   .5
              "1021002611"   "Maize"                             1
              "1021002810"   "Maize"            .20000000298023224
              "1021002810"   "Beans"            .20000000298023224
              "1021003309"   ""                                  .
              "1033000301"   "Maize"                           1.5
              "1033000301"   "Banana food"                      .5
              "1033000302"   "Beans"            .12999999523162842
              "1033000302"   "Cassava"          .12999999523162842
              "1033000302"   "Beans"                           .25
              "1033000302"   "Banana food"                     .25
              "1033000302"   "Coffee all"                      .25
              "1033000302"   ""                                  .
              "1033000303"   "Sweet potatoes"                  .25
              "1033000303"   "Beans"                            .5
              "1033000303"   "Cassava"                          .5
              "1033000303"   "Maize"                            .5
              "1033000303"   "Banana food"                       1
              "1033000303"   ""                                  .
              "1033000303"   ""                                  .
              "1033000304"   "Banana food"                      .5
              "1033000304"   "Coffee all"                       .5
              "1033000304"   "Banana beer"                     5.5
              "1033000304"   "Sweet potatoes"                   .5
              "1033000304"   "Maize"                            .5
              "1033000304"   "Maize"                           1.5
              "1033000304"   "Beans"                           1.5
              "103300030403" "Banana food"                       1
              "103300030403" "Coffee all"                        1
              "103300030403" "Banana food"                     .25
              "103300030403" "Maize"                             3
              "103300030403" "Beans"                             3
              "103300030403" "Cassava"                           3
              "1033000305"   ""                                  .
              "1033000305"   ""                                  .
              "1033000305"   ""                                  .
              "1033000305"   ""                                  .
              "1033000307"   "Banana beer"                     100
              "1033000307"   "Beans"                            .5
              "1033000307"   "Banana beer"                     1.5
              "1033000307"   "Cassava"                         .25
              "1033000307"   "Banana food"                    1.75
              "1033000307"   ""                                  .
              "1033000307"   ""                                  .
              end
              ------------------ copy up to and including the previous line ------------------

              Last edited by Mangji Zo; 28 Jul 2018, 16:44.



              • #8
                Can you calculate the SI by hand for one household? Ideally one with missing values (like household 1021000108).
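                As an illustration of such a hand calculation with the area formula from #7, using figures from the data in #7 and treating a zero area as a zero share: household 1021000108 has Cassava on 0 and Banana food on .5, and household 1033000301 has Maize on 1.5 and Banana food on .5.

```latex
\begin{aligned}
\text{HH } 1021000108:&\quad A_i = 0 + 0.5 = 0.5, \quad s = (0,\ 1), \quad \text{SI} = 1 - (0^2 + 1^2) = 0 \\
\text{HH } 1033000301:&\quad A_i = 1.5 + 0.5 = 2, \quad s = (0.75,\ 0.25), \quad \text{SI} = 1 - (0.5625 + 0.0625) = 0.375
\end{aligned}
```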


                • #9
                  See also entropyetc (SSC).



                  • #10
                    Since you only have one entry for each crop in each HHID, another way to do this is to make a separate variable for each crop's squared share, use egen by HHID to write those values across all observations in that HHID, and then compute the index with generate as 1 minus those squared shares (1 - x1^2 - x2^2 - ..., where x1, x2, ... are the crop shares).
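                    A minimal sketch of this approach, assuming (as above) at most one row per crop in each household: crops are numbered with -encode-, one variable sq1, sq2, ... holds each crop's squared share spread across the household with -egen-, and the index is built up as 1 - sq1 - sq2 - .... The names A, share, crop, sq`k' and SI are illustrative, and the input block is two households copied from the data in #7.

```stata
* Sketch of the wide approach: one variable per crop holding its squared
* share, spread across the household with egen, then 1 - sq1 - sq2 - ...
* Assumes at most one row per crop in each household.
* The input block is two households copied from the data in #7.
clear
input str14 HHID str20 cropname double totalplot
"1021000710" "Maize"       .25
"1021000710" "Beans"       .25
"1021000710" "Banana food" .25
"1033000301" "Maize"       1.5
"1033000301" "Banana food"  .5
end

bysort HHID: egen double A = total(totalplot)    // total area of the household
generate double share = totalplot / A            // this row's crop share
encode cropname, generate(crop)                  // crops numbered 1, 2, ..., K
summarize crop, meanonly
local K = r(max)

generate double SI = 1
forvalues k = 1/`K' {
    // squared share of crop k, written to every row of the household
    bysort HHID: egen double sq`k' = total(cond(crop == `k', share^2, 0))
    replace SI = SI - sq`k'
}
tabdisp HHID, cellvar(SI)
```

                    A crop a household does not grow contributes a zero square, so no missing-value handling is needed in the subtraction; with repeated rows per crop, the squares of the separate rows would be added instead of the square of their sum, which is why the one-entry assumption matters.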
