

  • I am working with data on binary variables for 30 districts and 8 questions (q01 to q08). Here is the data I have used:

    district q01 q02 q03 q04 q05 q06 q07 q08 population
    District 1 0 1 0 0 0 1 0 0 892
    District 2 0 0 0 1 0 1 0 0 138
    District 3 0 1 0 1 0 0 0 1 923
    District 4 1 1 0 1 1 0 0 0 887
    District 5 0 1 1 0 1 1 0 0 514
    District 6 1 0 1 1 1 0 1 1 578
    District 7 0 0 0 1 1 0 1 0 393
    District 8 1 0 0 1 0 1 0 0 566
    District 9 0 0 0 0 1 0 0 1 514
    District 10 1 0 0 0 0 0 0 1 770
    District 11 0 0 0 0 1 1 1 1 207
    District 12 1 1 0 1 0 0 0 1 625
    District 13 0 1 0 0 0 1 1 0 550
    District 14 1 0 1 1 0 0 0 0 596
    District 15 1 1 0 0 1 0 0 1 250
    District 16 1 1 1 0 1 1 0 0 481
    District 17 1 1 1 0 1 0 1 0 553
    District 18 1 0 1 1 0 0 1 1 652
    District 19 1 0 0 0 0 1 0 0 503
    District 20 0 1 1 1 0 0 0 1 234
    District 21 1 1 0 1 0 1 0 1 883
    District 22 1 1 0 0 0 1 0 0 344
    District 23 1 1 0 0 1 0 0 0 238
    District 24 1 1 1 1 0 1 0 1 944
    District 25 1 0 1 1 1 0 1 0 940
    District 26 0 0 0 1 0 1 1 0 730
    District 27 0 0 0 0 1 1 0 0 890
    District 28 1 1 1 1 1 0 0 0 916
    District 29 0 0 0 1 0 0 1 1 959
    District 30 0 1 0 1 0 1 0 1 404

    Here is the code I have used:

    * Recode each question to 0/1 (1 stays 1, 2 becomes 0)
    recode q01 (1=1 "1") (2=0 "0"), generate(binary_q01)
    recode q02 (1=1 "1") (2=0 "0"), generate(binary_q02)
    recode q03 (1=1 "1") (2=0 "0"), generate(binary_q03)
    recode q04 (1=1 "1") (2=0 "0"), generate(binary_q04)
    recode q05 (1=1 "1") (2=0 "0"), generate(binary_q05)
    recode q06 (1=1 "1") (2=0 "0"), generate(binary_q06)
    recode q07 (1=1 "1") (2=0 "0"), generate(binary_q07)
    recode q08 (1=1 "1") (2=0 "0"), generate(binary_q08)
    * Share of the 8 questions coded 1 in each row
    egen total_sum = rowtotal(binary_q01 binary_q02 binary_q03 binary_q04 binary_q05 binary_q06 binary_q07 binary_q08)
    gen R_Sum = total_sum / 8
    * Keep population through the collapse; otherwise collapse drops it
    collapse (mean) R_Sum (first) population, by(district)
    gen aggregate_population = R_Sum * population
    * Normalize R_Sum (min-max)
    egen R_Sum_min = min(R_Sum)
    egen R_Sum_max = max(R_Sum)
    gen R_Sum_normalized = (R_Sum - R_Sum_min) / (R_Sum_max - R_Sum_min)
    * Normalize aggregate_population (min-max)
    egen aggregate_population_min = min(aggregate_population)
    egen aggregate_population_max = max(aggregate_population)
    gen aggregate_population_normalized = (aggregate_population - aggregate_population_min) / (aggregate_population_max - aggregate_population_min)
    list district R_Sum aggregate_population R_Sum_normalized aggregate_population_normalized
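
    The min-max step in the code above can be tried in isolation on a toy dataset. This is only a minimal sketch: the variable names x and x_norm and the three values are hypothetical and not part of the district data.

    ```stata
    * Toy example: min-max normalization of a hypothetical variable x
    clear
    input x
    10
    25
    40
    end
    * summarize, meanonly leaves the minimum and maximum in r(min) and r(max)
    summarize x, meanonly
    gen x_norm = (x - r(min)) / (r(max) - r(min))
    list x x_norm
    * By construction the smallest x maps to 0 and the largest to 1
    assert x_norm == 0 if x == 10
    assert x_norm == 1 if x == 40
    ```

    Using summarize here avoids creating separate min and max variables with egen; either route should give the same normalized values.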

    The code runs without errors, and I observe the following:
    • One district has a normalized aggregate population value of 1.
    • One district has a normalized aggregate population value of 0.
    • The remaining districts have values between 0 and 1 after normalization.
    I understand that normalization should result in values between 0 and 1, but here one district is exactly 1 and another is exactly 0. I would also like to confirm whether this is the correct approach for normalizing aggregate population values.
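
    For reference, the min-max formula used in the code above can be written out as below. The symbols x_i, x_min and x_max are notation introduced here for illustration (the value for district i, and the smallest and largest values across districts); they do not appear in the code.

    ```latex
    \[
    x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}}, \qquad
    x_{\min}' = \frac{x_{\min} - x_{\min}}{x_{\max} - x_{\min}} = 0, \qquad
    x_{\max}' = \frac{x_{\max} - x_{\min}}{x_{\max} - x_{\min}} = 1 .
    \]
    ```

    So, assuming x_max > x_min, the district with the smallest value maps to exactly 0 and the district with the largest value maps to exactly 1, which matches the output described above.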

    Any advice or improvements to my approach would be greatly appreciated.

    Thank you for your assistance.

  • #2
    The best (yet inconclusive) advice I can give you is to repost your query on the General forum, as this one (as per the FAQ) is for practising purposes only. Thanks.
    Kind regards,
    (StataNow 18.5)

