  • Creating a Disparity Index in Stata

    Hello everyone,

    I have tried to keep my question as concise as possible but wanted to provide enough context to understand the issue I am having. Thank you in advance for your patience with the length of my post.

    I am constructing an index of disparity and an inclusion score that measure how different racial and ethnic groups are doing on various indicators, such as unemployment and educational attainment, relative to the overall population. To calculate my index of disparity, I am using an approach that the National Equity Atlas derived from Pearcy and Keppel's 2002 paper, "A Summary Measure of Health Disparity." I am having trouble reproducing the National Equity Atlas's values for the index of disparity, which they use to develop an inclusion score.



    Below is the National Equity Atlas's formula for their index of disparity (ID):
    [Image: Screen Shot 2022-04-22 at 7.32.16 PM.png]





    In the formula, r is the indicator value for each racial/ethnic group i, R is the value for the total population, and n is the number of racial/ethnic groups with valid data. The researchers calculated this index only for indicators where data were available for two or more racial/ethnic groups. This index averages the absolute value of the differences between each group and the overall population, and expresses it as a percentage of the overall population value.
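
    Written out from the description above, the index would look roughly like this. This is a reconstruction from the prose definitions of r, R, and n, so the exact layout of the Atlas's own formula is an assumption:

    ```latex
    ID = \frac{\dfrac{1}{n}\sum_{i=1}^{n}\left| r_i - R \right|}{R} \times 100
    ```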


    Here is their summary of how they calculate the inclusion score after creating the index:

    (1) With index of disparity values to capture inclusion for each indicator in place, they normalized the values by converting them into z-scores, which they computed relative to other counties in the United States.
    (2) They then reversed the signs of all z-scores for index values (so that higher values are indicative of equity rather than disparity).
    (3) Next, they normalized the resulting z-scores using min-max scaling, which expresses each z-score as a percentage of the range between the minimum and maximum score. The result is an inclusion score that ranges between zero and 100.
    (4) They then reset scores below one to one so that the final inclusion scores range from one to 100.
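
    The four steps above can be sketched with built-in commands only. This is a minimal sketch, not the Atlas's own code: it assumes inc2 already holds the index of disparity for every county in the reference set, and z_inc2 and incl are hypothetical variable names. It is meant to be run from a do-file, since it uses // comments.

    ```stata
    * Sketch only: inc2 is assumed to hold the index of disparity for all counties
    egen double z_inc2 = std(inc2)      // step 1: z-score over all observations in memory
    replace z_inc2 = -z_inc2            // step 2: reverse the sign of every z-score
    summarize z_inc2, meanonly          // leaves r(min) and r(max) for the scaling
    generate double incl = 100 * (z_inc2 - r(min)) / (r(max) - r(min))   // step 3: 0 to 100
    replace incl = 1 if incl < 1        // step 4: floor at 1; missing values are left alone
    ```

    Because steps 1 and 3 depend on the mean, standard deviation, minimum, and maximum of whatever observations are in memory, running this on a subset of counties would not be expected to reproduce scores that were computed relative to all U.S. counties.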

    I tried to replicate the index and inclusion score steps in Stata to understand how these measures were created. I use unemployment as an example indicator. My code is below. I am using Stata 15.1.

    Code:
    egen inc2 = total((abs(punemp_nhw + punemp_nhb + punemp_his + punemp_nhapi + punemp_nhaian-punemp_all))/5)
        replace inc2 = inc2/(punemp_all)
        replace inc2 = inc2*100
    
    *Check new variable
    tab inc2, m
    
    *Standardize variable
    zscore inc2, stub (std)
    tab stdinc2,m
    
    *reverse sign of z-score, and check new variable
    replace stdinc2 = -stdinc2 if stdinc2 > 0
    tab stdinc2, m
    sum stdinc2
    
    *min max scaling
    nscale stdinc2, gen(inc_new)
    
    *Check new variable
    tab inc_new, m
    sum inc_new
    
    //Reset scores from 0-100
    gen inc_final = inc_new * 100
    
    *Check new variable
    tab inc_final, m
    
    *Check the original inclusion score variable in the dataset by seeing what it is for a particular county
    tab is_econvit02 if geo_name_short =="Baldwin"
    
    *Check the inclusion score variable I created by seeing what it is for a particular county
    tab inc_final if geo_name_short =="Baldwin"
    sum inc_final
    
    //reset scores <1 to 1 so that we have a range from 1-100.
    replace inc_final = 1 if inc_final < 1
    
    *Check new variable
    tab inc_final, m
    sum inc_final

    And here is an extract of the National Equity Atlas dataset.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(punemp_all punemp_nhw punemp_nhb punemp_his punemp_nhapi punemp_nhaian punemp_nhoo is_econvit02) str50 geo_name_short str5 geo_code str2 geo_type str4 year
    .03328717046574591 .03101006336190831 .04544117647058824                  .                  .                  .                  .  87.37863808208324 "Baldwin"   "01003" "04" "2019"
    .06980448950036205 .05940369939205795  .0978940650925335                  .                  .                  .                  .  83.75185880428843 "Calhoun"   "01015" "04" "2019"
    .04804434862950416  .0445805440820396 .06168999481596682                  .                  .                  .                  .  89.76070044913462 "Etowah"    "01055" "04" "2019"
    .05427564092822516 .03180902823712719 .08522727272727272 .02897950734837508 .02147926783713112                  . .07069705979517674  71.70766869803604 "Jefferson" "01073" "04" "2019"
    .03502516079581431 .02725659229208925 .05389379364870207                  .                  .                  .                  .  77.31749841991041 "Lee"       "01081" "04" "2019"
    .04640576929438619 .03535998208487291 .07230124179990771 .01370458994996737 .00364109232769831                  .                  . 63.452168535884645 "Mobile"    "01097" "04" "2019"
    .02740894972181452 .02648805653097499 .03997523657911029 .01925854597977853                  .                  .                  .  84.52456052247526 "Shelby"    "01117" "04" "2019"
    .05140127184993449 .03963380316920345 .04713212973318422  .0563663282571912 .05200141693234148 .12216142270861834 .10030959752321982  72.52940299078824 "Anchorage" "02020" "04" "2019"
    .04571439874982691 .02114108506452366                  . .03698830409356725                  . .11790354931542688                  .  53.38316298221631 "Coconino"  "04005" "04" "2019"
    .03936291604991454 .03630944881030451 .06060764142478559 .04028756855798191 .03149732927459652 .07307010242119359 .04936922031494667  80.70651814821439 "Maricopa"  "04013" "04" "2019"
    end
    label var punemp_all "Percent unemployed  - all"
    label var punemp_nhw "Percent unemployed  - Non-Hispanic White"
    label var punemp_nhb "Percent unemployed  - Non-Hispanic Black"
    label var punemp_his "Percent unemployed  - Hispanic"
    label var punemp_nhapi "Percent unemployed  - Asian or Pacific Islander"
    label var punemp_nhaian "Percent unemployed  - American Indian/Alaska Native"
    label var punemp_nhoo "Percent unemployed  - Other race or multiracial"
    label var is_econvit02 "Inclusion score - unemployment"
    label var geo_name_short "County Name"
    label var geo_code "geo_code"
    label var geo_type "County FIPS code"
    label var year "Year of data collection"

    I was able to run the code, but the values I get do not match those of the original variable.

    For example, when I run the following code with the dataset's inclusion score variable, I get 87.38:

    Code:
    *Check the original inclusion score variable in the dataset by seeing what it is for a particular county
    tab is_econvit02 if geo_name_short =="Baldwin"
    When I run the following code with my own inclusion score variable, I get 90.29:

    Code:
    *Check the inclusion score variable I created by seeing what it is for a particular county
    tab inc_final if geo_name_short =="Baldwin"
    sum inc_final

    Could anyone please help me understand where I am going wrong with coding the index and inclusion score?

    Thanks very much for any and all advice!
    Last edited by Akosua Manu; 26 Apr 2022, 16:43.

  • #2
    I don't really understand the rationale for the inclusion score. But, regardless, in
    Code:
    *reverse sign of z-score, and check new variable
    replace stdinc2 = -stdinc2 if stdinc2 > 0
    the code is not doing what the comment says. The code only reverses the sign of the z-score if the original z-score is positive. If the desire is to reverse the sign of all z-scores, it should be just -replace stdinc2 = -stdinc2-, with no -if- clause.

    Added: I also don't think that
    Code:
    egen inc2 = total((abs(punemp_nhw + punemp_nhb + punemp_his + punemp_nhapi + punemp_nhaian-punemp_all))/5)
    replace inc2 = inc2/(punemp_all)
    replace inc2 = inc2*100
    correctly implements the formula for the disparity index. I think it should be
    Code:
    gen inc2 = 0
    foreach v of varlist punemp_nhw-punemp_nhoo {
        replace inc2 = inc2 + abs(`v'-punemp_all) if !missing(`v', punemp_all)
    }
    egen denom = rownonmiss(punemp_nhw-punemp_nhoo)
    replace inc2 = inc2*100/(denom*punemp_all)
    replace inc2 = . if missing(punemp_all)
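
    Some background on why the two versions differ: -egen, total()- sums an expression down the observations rather than across variables, the original abs() wraps the whole sum instead of each group's difference, and a single missing group value makes the whole sum missing. A hand check of the formula for Baldwin, where only two groups have data, is a quick way to confirm what the loop should produce. The local macro names below are only for illustration, and the values are copied from the -dataex- listing.

    ```stata
    * Hand check for Baldwin: n = 2 groups with valid data
    local R  .03328717046574591     // punemp_all
    local r1 .03101006336190831     // punemp_nhw
    local r2 .04544117647058824     // punemp_nhb
    display 100 * ((abs(`r1' - `R') + abs(`r2' - `R')) / 2) / `R'
    ```

    This should display roughly 21.7, that is, the two groups differ from the overall rate by about 21.7 percent of that rate on average.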
    Last edited by Clyde Schechter; 26 Apr 2022, 18:28.



    • #3
      Hello Clyde, thanks very much for your response. The inclusion score essentially provides information on racial disparities in a particular indicator, such as unemployment. One final part of the approach that I didn't mention involves taking the geometric mean of the inclusion scores for all of the indicators I'm interested in. I'll then take the geometric mean of that inclusion score and the score for indicator values for the entire population to get a single equity index that shows the level of population outcomes and racial disparities in those outcomes within an area.
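
      As a rough illustration of the geometric-mean step, here is a minimal sketch. The names score_a, score_b, and score_c are hypothetical inclusion scores, one per indicator, and are not variables in the dataset above. It assumes every score is at least 1, so the logs are defined.

      ```stata
      * Sketch only: score_a, score_b, score_c are hypothetical inclusion scores (1 to 100)
      generate double ln_a = ln(score_a)
      generate double ln_b = ln(score_b)
      generate double ln_c = ln(score_c)
      egen double mean_ln = rowmean(ln_a ln_b ln_c)   // mean of the non-missing logs
      generate double incl_geo = exp(mean_ln)         // geometric mean of the available scores
      drop ln_a ln_b ln_c mean_ln
      ```

      Note that -egen, rowmean()- ignores missing values, so a county with one missing score gets the geometric mean of the remaining ones. Whether that matches the Atlas's handling of missing indicators is not something this sketch can confirm.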

      Thanks also for catching the error in my code. However, I am unfortunately still not getting the same score as the original variable in the dataset with your suggestion. My inclusion score is 64.67 as compared to the dataset's inclusion score of 87.38.



      • #4
        Hello Clyde,

        I am just now seeing the edits that you shared to your post. I previously only made changes based on your original post. I will try the revised code that you've shared and follow up. Thanks again.



        • #5
          Hello Clyde,

          Thanks again for the code. Your approach worked much better. I got an inclusion score of 86.80 for my own variable, compared with the dataset's inclusion score of 87.38. When I ran a correlation between my inclusion score and the dataset's, the two were perfectly correlated. The means and standard deviations are also fairly close. I really appreciate your help.
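
          For anyone repeating the comparison, a minimal sketch of the checks described above follows. The variable diff is a hypothetical name. A perfect correlation alongside slightly different levels is what one would expect if the two scores differ only in the set of counties used for the z-scores and the min-max scaling, because both steps are linear transformations of the index (apart from the floor at 1).

          ```stata
          * Sketch only: compare the replicated score with the published one
          correlate inc_final is_econvit02
          summarize inc_final is_econvit02
          generate double diff = inc_final - is_econvit02   // diff is a hypothetical name
          summarize diff, detail
          ```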
