Hello everyone,
I have tried to keep my question as concise as possible but wanted to provide enough context to understand the issue I am having. Thank you in advance for your patience with the length of my post.
I am working on constructing an index of disparity and an inclusion score that measure how different racial and ethnic groups are doing, relative to the overall population, on various indicators such as unemployment and educational attainment. To calculate my index of disparity, I am using an approach that the National Equity Atlas derived from Pearcy and Keppel's 2002 paper, "A Summary Measure of Health Disparity." I am having a little trouble reproducing the National Equity Atlas' index of disparity values, which they use to develop an inclusion score.
Here below is the National Equity Atlas' formula for their index of disparity (ID):
$$\mathrm{ID} = \frac{\left(\sum_{i=1}^{n} \lvert r_i - R \rvert\right) / n}{R} \times 100$$
In the formula, r is the indicator value for each racial/ethnic group i, R is the value for the total population, and n is the number of racial/ethnic groups with valid data. The researchers calculated this index only for indicators where data were available for two or more racial/ethnic groups. This index averages the absolute value of the differences between each group and the overall population, and expresses it as a percentage of the overall population value.
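As a hedged illustration of the formula, the sketch below computes the index county by county in Stata. It assumes the punemp_* variables from the example dataset are in memory and that all six group variables enter the index; the names absdiff_*, sumabs, n_groups and id_unemp are hypothetical, and the source does not confirm which groups the National Equity Atlas actually includes.

```stata
* Sketch only: row-wise index of disparity for the unemployment indicator.
* Assumes punemp_all and the punemp_* group variables are in memory.
* The group list is an assumption; absdiff_*, sumabs, n_groups and
* id_unemp are hypothetical names.
local groups nhw nhb his nhapi nhaian nhoo

foreach g of local groups {
    * abs() of a missing value is missing, so groups without data stay missing
    generate double absdiff_`g' = abs(punemp_`g' - punemp_all)
}

* Sum of |r_i - R| over the groups with valid data (rowtotal skips missings)
egen double sumabs = rowtotal(absdiff_*)

* n = number of groups with valid data in each county
egen n_groups = rownonmiss(absdiff_*)

* Average absolute difference as a percentage of the overall value,
* computed only where at least two groups have valid data
generate double id_unemp = 100 * (sumabs / n_groups) / punemp_all if n_groups >= 2
```

For the Baldwin row of the example data, where only two groups have valid values, this works out to about 21.7.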
Here is their summary of how they calculate the inclusion score after creating the index:
(1) With index of disparity values to capture inclusion for each indicator in place, they normalized the values by converting them into z-scores, which they computed relative to other counties in the United States.
(2) They then reversed the signs of all z-scores for index values (so that higher values are indicative of equity rather than disparity).
(3) Next, they normalized the resulting z-scores using min-max scaling, which expresses each z-score as a percentage of the range between the minimum and maximum score. The result is an inclusion score that ranges between zero and 100.
(4) They then reset scores below one to one so that the final inclusion scores range from one to 100.
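The four steps above could be sketched in Stata roughly as follows. This is an illustration under stated assumptions, not the National Equity Atlas' actual code: id_unemp, z_id and incl are hypothetical names, the five index values are made up, and the z-score uses the sample standard deviation that summarize reports.

```stata
* Sketch only: steps (1)-(4) applied to an index of disparity variable.
* The values below are invented for illustration.
clear
input double id_unemp
21.7
35.2
12.4
48.9
27.3
end

* (1) z-score relative to all counties in memory
quietly summarize id_unemp
generate double z_id = (id_unemp - r(mean)) / r(sd)

* (2) reverse the sign for every county, so higher means more equitable
replace z_id = -z_id

* (3) min-max scaling onto the 0-100 range
quietly summarize z_id
generate double incl = 100 * (z_id - r(min)) / (r(max) - r(min))

* (4) reset scores below one to one (missing values are left untouched)
replace incl = 1 if incl < 1

list id_unemp z_id incl
```

Because step (1) standardizes relative to all counties in the United States, the mean, standard deviation, minimum and maximum would need to come from the full county file. Running the same steps on a small excerpt would not be expected to reproduce the published scores.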
I tried to replicate the index and inclusion score steps in Stata to understand how these measures were created. I use unemployment as an example indicator. Here below is my code. I am using Stata 15.1.
Code:
egen inc2 = total((abs(punemp_nhw + punemp_nhb + punemp_his + punemp_nhapi + punemp_nhaian-punemp_all))/5)
replace inc2 = inc2/(punemp_all)
replace inc2 = inc2*100
*Check new variable
tab inc2, m
*Standardize variable
zscore inc2, stub (std)
tab stdinc2,m
*reverse sign of z-score, and check new variable
replace stdinc2 = -stdinc2 if stdinc2 > 0
tab stdinc2, m
sum stdinc2
*min max scaling
nscale stdinc2, gen(inc_new)
*Check new variable
tab inc_new, m
sum inc_new
//Reset scores from 0-100
gen inc_final = inc_new * 100
*Check new variable
tab inc_final, m
*Check the original inclusion score variable in the dataset by seeing what it is for a particular county
tab is_econvit02 if geo_name_short =="Baldwin"
*Check the inclusion score variable I created by seeing what it is for a particular county
tab inc_final if geo_name_short =="Baldwin"
sum inc_final
//reset scores <1 to 1 so that we have a range from 1-100.
replace inc_final = 1 if inc_final < 1
*Check new variable
tab inc_final, m
sum inc_final
And here below is the National Equity Atlas dataset.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(punemp_all punemp_nhw punemp_nhb punemp_his punemp_nhapi punemp_nhaian punemp_nhoo is_econvit02) str50 geo_name_short str5 geo_code str2 geo_type str4 year
.03328717046574591 .03101006336190831 .04544117647058824 . . . . 87.37863808208324 "Baldwin" "01003" "04" "2019"
.06980448950036205 .05940369939205795 .0978940650925335 . . . . 83.75185880428843 "Calhoun" "01015" "04" "2019"
.04804434862950416 .0445805440820396 .06168999481596682 . . . . 89.76070044913462 "Etowah" "01055" "04" "2019"
.05427564092822516 .03180902823712719 .08522727272727272 .02897950734837508 .02147926783713112 . .07069705979517674 71.70766869803604 "Jefferson" "01073" "04" "2019"
.03502516079581431 .02725659229208925 .05389379364870207 . . . . 77.31749841991041 "Lee" "01081" "04" "2019"
.04640576929438619 .03535998208487291 .07230124179990771 .01370458994996737 .00364109232769831 . . 63.452168535884645 "Mobile" "01097" "04" "2019"
.02740894972181452 .02648805653097499 .03997523657911029 .01925854597977853 . . . 84.52456052247526 "Shelby" "01117" "04" "2019"
.05140127184993449 .03963380316920345 .04713212973318422 .0563663282571912 .05200141693234148 .12216142270861834 .10030959752321982 72.52940299078824 "Anchorage" "02020" "04" "2019"
.04571439874982691 .02114108506452366 . .03698830409356725 . .11790354931542688 . 53.38316298221631 "Coconino" "04005" "04" "2019"
.03936291604991454 .03630944881030451 .06060764142478559 .04028756855798191 .03149732927459652 .07307010242119359 .04936922031494667 80.70651814821439 "Maricopa" "04013" "04" "2019"
end
label var punemp_all "Percent unemployed - all"
label var punemp_nhw "Percent unemployed - Non-Hispanic White"
label var punemp_nhb "Percent unemployed - Non-Hispanic Black"
label var punemp_his "Percent unemployed - Hispanic"
label var punemp_nhapi "Percent unemployed - Asian or Pacific Islander"
label var punemp_nhaian "Percent unemployed - American Indian/Alaska Native"
label var punemp_nhoo "Percent unemployed - Other race or multiracial"
label var is_econvit02 "Inclusion score - unemployment"
label var geo_name_short "County Name"
label var geo_code "geo_code"
label var geo_type "County FIPS code"
label var year "Year of data collection"
I was able to run the code, but the values I get do not match those of the original variable.
For example, when I run the following code with the dataset's inclusion score variable, I get 87.38:
Code:
*Check the original inclusion score variable in the dataset by seeing what it is for a particular county
tab is_econvit02 if geo_name_short =="Baldwin"
When I run the following code with my own inclusion score variable, I get 90.29:
Code:
*Check the inclusion score variable I created by seeing what it is for a particular county
tab inc_final if geo_name_short =="Baldwin"
sum inc_final
Could anyone please help me understand where I am going wrong with coding the index and inclusion score?
Thanks very much for any and all advice!