Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Computing an average based on others

    Dear Profs and colleagues,

    I am going to do: For every county (string variable), take the average level of diversity (blau_indexethnic) within the region (7 regions), and then I subtract the county’s level of diversity from this region average.(I mean for a given county, I take the average level of diversity within the region, and then I subtract the county’s level of diversity from this region average, means subtracting that country's diversity from the region level diversity)
    [code]
    year:2010-2019
    NPC_FIC:firm ID
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double(NPC_FIC year) str6 country float(region blau_indexethnic)
    501244941 2010 "AD" 1        0
    501339663 2010 "AD" 1        0
    501061164 2010 "AD" 2        0
    501077744 2010 "AD" 2       .03
    501101357 2010 "AD" 4       .07
    501043859 2010 "AD" 5       .01
    502231489 2010 "AN" 1      .75
    501037669 2010 "AO" 1        0
    503205787 2010 "AO" 2        0
    501036723 2010 "AO" 2        .07
    501347430 2010 "AO" 3        .04
    501033739 2010 "AO" 3       .6
    501075483 2010 "AO" 4       .08
    501361748 2010 "AO" 5       .05
    501237916 2010 "AO" 5       .012
    501973936 2010 "AO" 1       .5
    501227384 2010 "SP" 1        .34
    502490973 2010 "SP" 4        .79
    501344192 2010 "SP" 4        .3
    501305448 2010 "SP" 5        .4
    501829031 2010 "SP" 5        .16
    501227275 2010 "BR" 6       .5
    500226662 2010 "BR" 6        .7
    501050721 2010 "BR" 7        .9
    501822217 2010 "BR" 7 .6666666
    500721928 2010 "PT" 1        0
    501171904 2010 "PT" 1        0
    501244634 2010 "PT" 1        0
    501611851 2010 "PT" 1        0
    502222461 2010 "PT" 1        0
    
    end
    I use this command but it is wrong, because country PT which has zero blau_indexethnic must have zero regio-average as well, but it does not.
    Code:
    bysort year region country: egen region_avg = mean(blau_indexethnic)
    
    gen region_avg_excl = ( region_avg- blau_indexethnic)
    Any ideas are appreciated.

    Cheers,
    Paris
    Last edited by Paris Rira; 09 Jun 2024, 11:56.

  • #2
    I find your question very unclear but the following may help: https://www.stata.com/support/faqs/d...ng-properties/

    Comment


    • #3
      This would seem to be addressable by just calculating the desired index at different levels of aggregation, and then looking at differences. The differences between measures for quite different levels of aggregation are likely to be stark and unrevealing. Or so I guess.

      Comment


      • #4
        In the first stage: I would like to compute the average of blau_indexethnic for each country based on region, i.e. for region=1 the average of blau_indexethnic for country BR, for region=2 the average of blau_indexethnic for country BR, ...and for all 700 countries across 7 regions.

        Comment


        • #5
          I am lost here on your precise set-up. It seems that you already know how to calculate this Blau index at one scale, so doing that at another scale should be straightforward. I see no reason why deviations should have average zero in this case.

          Much depends on how you count but I don't think there are 700 countries in the world.

          Comment


          • #6
            I can re-express my question in this way (its Microdata I really can't test/check if the approach is accurate or not, almost 30 million obs)
            is this command correct? In order to compute the mean diversity index in each region (year) for each nation and subtract the country level diversity from region-level one. (diversity variable is blau_indexethnic)
            Code:
            bysort year nacio region : egen region_avg = mean(blau_indexethnic)
            gen region_avg_excl = (region_avg- blau_indexethnic)
            And about the number of counties its 208 in my data, surely 700 do not exist

            Comment


            • #7
              I doubt that anyone can help given this limited information and naturally you can't post the dataset here. Perhaps you can invent a tiny sandbox dataset with say 2 countries and 4 regions for 1 year and show us for invented data how it should work.

              Comment


              • #8
                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input double(year NPC_FIC) str6 nacio float(region blau_indexethnic region_avg2 region_avg_excl)
                2010 500000863 "BR" 3        0 .3111229   .3111229
                2010 500001440 "BR" 4        0 .3289521   .3289521
                2010 500002759 "BR" 4 .6666666 .3289521 -.33771455
                2010 500003392 "BR" 3 .4444444 .3111229 -.13332152
                2010 500013781 "BR" 1        0 .2588833   .2588833
                2010 500015674 "BR" 3        0 .3111229   .3111229
                2010 500016157 "BR" 3        0 .3111229   .3111229
                2010 500016805 "BR" 3        0 .3111229   .3111229
                2010 500019820 "BR" 4        0 .3289521   .3289521
                2010 500020452 "BR" 3        0 .3111229   .3111229
                2010 500021061 "BR" 2        0 .2156455   .2156455
                2010 500021109 "BR" 4        0 .3289521   .3289521
                2010 500021403 "BR" 3        0 .3111229   .3111229
                2010 500022341 "BR" 3        0 .3111229   .3111229
                2010 500023177 "BR" 3        0 .3111229   .3111229
                2010 500024286 "BR" 3        0 .3111229   .3111229
                2010 500024314 "BR" 3        0 .3111229   .3111229
                2010 500024981 "BR" 4        0 .3289521   .3289521
                2010 500025271 "BR" 3        0 .3111229   .3111229
                2010 500027162 "BR" 3        0 .3111229   .3111229
                2010 500028461 "BR" 3        0 .3111229   .3111229
                2010 500028697 "BR" 1        0 .2588833   .2588833
                2010 500030284 "BR" 3        0 .3111229   .3111229
                2010 500033280 "BR" 4        0 .3289521   .3289521
                2010 500033404 "BR" 1        0 .2588833   .2588833
                2010 500034469 "BR" 3        0 .3111229   .3111229
                2010 500036572 "BR" 4        0 .3289521   .3289521
                2010 500037855 "BR" 3       .5 .3111229  -.1888771
                2010 500040165 "BR" 1        0 .2588833   .2588833
                2010 500041634 "BR" 3        0 .3111229   .3111229
                2010 500043164 "BR" 2        0 .2156455   .2156455
                2010 500043663 "BR" 4        0 .3289521   .3289521
                2010 500044457 "BR" 2        0 .2156455   .2156455
                2010 500045017 "BR" 3        0 .3111229   .3111229
                2010 500045687 "BR" 3      .48 .3111229 -.16887707
                2010 500049313 "BR" 4        0 .3289521   .3289521
                2010 500049414 "BR" 4        0 .3289521   .3289521
                2010 500051182 "BR" 4        0 .3289521   .3289521
                2010 500051350 "BR" 3        0 .3111229   .3111229
                2010 500052753 "BR" 4        0 .3289521   .3289521
                2010 500053800 "BR" 3        0 .3111229   .3111229
                2010 500054648 "BR" 2        0 .2156455   .2156455
                2010 500054675 "BR" 4        0 .3289521   .3289521
                2010 500055376 "BR" 2        0 .2156455   .2156455
                2010 500056755 "BR" 4        0 .3289521   .3289521
                2010 500056920 "BR" 2     .375 .2156455  -.1593545
                2010 500058055 "BR" 4        0 .3289521   .3289521
                2010 500058835 "BR" 4        0 .3289521   .3289521
                2010 500060187 "BR" 3        0 .3111229   .3111229
                2010 500060629 "BR" 3 .4444444 .3111229 -.13332152
                2010 500062988 "BR" 2        0 .2156455   .2156455
                2010 500064224 "BR" 3        0 .3111229   .3111229
                2010 500065566 "BR" 3        0 .3111229   .3111229
                2010 500068796 "BR" 2        0 .2156455   .2156455
                2010 500068926 "BR" 3        0 .3111229   .3111229
                2010 500071103 "BR" 3        0 .3111229   .3111229
                2010 500071339 "BR" 2        0 .2156455   .2156455
                2010 500074120 "BR" 3        0 .3111229   .3111229
                2010 500075539 "BR" 3        0 .3111229   .3111229
                2010 500075937 "BR" 3        0 .3111229   .3111229
                2010 500076106 "BR" 4 .4444444 .3289521 -.11549234
                2010 500078670 "BR" 2        0 .2156455   .2156455
                2010 500078849 "BR" 3        0 .3111229   .3111229
                2010 500079222 "BR" 3        0 .3111229   .3111229
                2010 500079366 "BR" 3 .4444444 .3111229 -.13332152
                2010 500080282 "BR" 3        0 .3111229   .3111229
                2010 500080338 "BR" 3        0 .3111229   .3111229
                2010 500081241 "BR" 3        0 .3111229   .3111229
                2010 500085079 "BR" 3        0 .3111229   .3111229
                2010 500085727 "BR" 4        0 .3289521   .3289521
                2010 500085816 "BR" 4        0 .3289521   .3289521
                2010 500085970 "BR" 4        0 .3289521   .3289521
                2010 500087647 "BR" 3        0 .3111229   .3111229
                2010 500088508 "BR" 4        0 .3289521   .3289521
                2010 500088687 "BR" 2        0 .2156455   .2156455
                2010 500090557 "BR" 3        0 .3111229   .3111229
                2010 500091124 "BR" 3        0 .3111229   .3111229
                2010 500094837 "BR" 4        0 .3289521   .3289521
                2010 500102022 "BR" 2        0 .2156455   .2156455
                2010 500106300 "BR" 2        0 .2156455   .2156455
                2010 500107114 "BR" 3        0 .3111229   .3111229
                2010 500107240 "BR" 3        0 .3111229   .3111229
                2010 500107355 "BR" 1        0 .2588833   .2588833
                2010 500107545 "BR" 2 .4444444 .2156455  -.2287989
                end

                Comment


                • #9
                  That's part of my question answered, only. You have yet to show how it should work. You're presuming that people know precisely how the Blau index was calculated and how it should behave at different scales. Not true in my case in a way that can help you.

                  Others may be able to help more.

                  Comment


                  • #10
                    Prof Cox,
                    Blau index is a well-known index and I don't have questions in terms of computing it. For example for education diversity:
                    where pi is the proportion of the workforce in Education group.
                    Click image for larger version

Name:	blau.png
Views:	1
Size:	6.6 KB
ID:	1755743

                    Code:
                    by NPC_FIC year Edugroup, sort: gen this_edu = _N
                    by NPC_FIC year (Edugroup): gen everedu = _N
                    gen proportionedu = this_edu/everedu
                    collapse (first) proportionedu, by(NPC_FIC year Edugroup)
                    reshape wide proportionedu, i(NPC_FIC year) j(Edugroup) 
                    mvencode proportionedu*, mv(0)
                    
                    gen blau_indexedu = 1 - (proportionedu1^2 + proportionedu2^2 + proportionedu3^2 + proportionedu4^2)

                    Comment

                    Working...
                    X