  • Cultural Distance index

    Dear Profs and Colleagues,

    I would like to compute two measures: the standardized Euclidean distance and the Kogut and Singh (1988) index. KSIij is the cultural distance between country i and country j. Iki and Ikj are the values of cultural dimension k for country i and country j, respectively. Vk is the variance of cultural dimension k (k indexes the 4 cultural dimensions).
    [Image: 2.png]

    Kogut and Singh (1988) index:
    [Image: 10.png]

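    For readers without the images: going by the symbol definitions above and the discussion in the replies below (a factor of 4 in the KSI denominator, a square root on the Euclidean measure), the two formulas can be reconstructed roughly as follows. This is a reconstruction rather than a copy of the images, and the symbol ED for the standardized Euclidean distance is an assumed label.

```latex
\[
ED_{ij} = \sqrt{\sum_{k=1}^{4} \frac{(I_{ki} - I_{kj})^{2}}{V_k}},
\qquad
KSI_{ij} = \frac{1}{4} \sum_{k=1}^{4} \frac{(I_{ki} - I_{kj})^{2}}{V_k}
\]
```
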
    First, I computed the variance of each cultural dimension:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int cultural_dimension1 byte cultural_dimension2 int(cultural_dimension3 cultural_dimension4)
      .  .   .   .
     80 38  53  68
     64 27  41  52
      .  .   .   .
      .  .   .   .
     49 46  56  86
     11 55  79  70
     38 90  61  51
      .  .   .   .
     80 20  55  60
     70 30  40  85
     69 38  49  76
     46 76  48  54
     43 67  67  61
     63 23  28  86
     80 20  66  30
     67 13  64  80
     35 15  21  86
      .  .   .   .
     57 58  57  74
     35 67  66  65
     18 74  16  23
      .  .   .   .
      .  .   .   .
     78  8  63  67
     40 60  30  60
      .  .   .   .
     57 51  42  86
      .  .   .   .
     33 63  26  59
     68 71  43  86
     35 89  66  35
      .  .   .   .
      .  .   .   .
     60 35  57 112
     95  6  37 101
     68 25  57  29
     73 33  40  80
     46 80  88  82
     78 14  46  48
     28 70  68  35
     13 54  47  81
     77 48  56  40
      .  .   .   .
     58 41  43  59
      .  .   .   .
     50 76  70  75
     45 39  68  13
      .  .   .   .
     54 46  95  92
      .  .   .   .
     60 18  39  85
     42 60  19  65
     40 60  50  70
     44 70   9  63
     70 46  53  68
      .  .   .   .
      .  .   .   .
      .  .   .   .
      .  .   .   .
     56 59  47  96
     81 30  69  82
    104 26  50  36
      .  .   .   .
     31 69   8  50
     22 79  58  49
     95 11  44  86
     64 16  42  87
     94 32  64  44
     55 14  50  70
     68 60  64  93
      .  .   .   .
     63 27  31 104
     90 30  42  90
     86 25  43  92
     93 39  36  95
      .  .   .   .
      .  .   .   .
     31 71   5  29
     74 20  48   8
     71 27  19  88
    104 52 110  51
     85 47  37  92
     64 20  34  64
     66 37  45  85
     47 16  58  55
     58 17  45  69
      .  .   .   .
      .  .   .   .
     40 91  62  46
     61 36  38 100
     81 12  73  76
     70 20  40  30
      .  .   .   .
      .  .   .   .
      .  .   .   .
      .  .   .   .
      .  .   .   .
      .  .   .   .
      .  .   .   .
    end
    Code:
    * Compute the mean of each cultural dimension
    egen mean_cultural_dimension1 = mean(cultural_dimension1)
    egen mean_cultural_dimension2 = mean(cultural_dimension2)
    egen mean_cultural_dimension3 = mean(cultural_dimension3)
    egen mean_cultural_dimension4 = mean(cultural_dimension4)
    
    * Compute the squared differences
    gen squared_diff_cultural_dimension1 = (cultural_dimension1 - mean_cultural_dimension1)^2
    gen squared_diff_cultural_dimension2 = (cultural_dimension2 - mean_cultural_dimension2)^2
    gen squared_diff_cultural_dimension3 = (cultural_dimension3 - mean_cultural_dimension3)^2
    gen squared_diff_cultural_dimension4 = (cultural_dimension4 - mean_cultural_dimension4)^2
    
    * Compute the variance of each cultural dimension
    egen var_cultural_dimension1 = mean(squared_diff_cultural_dimension1)
    egen var_cultural_dimension2 = mean(squared_diff_cultural_dimension2)
    egen var_cultural_dimension3 = mean(squared_diff_cultural_dimension3)
    egen var_cultural_dimension4 = mean(squared_diff_cultural_dimension4)
    Now I have the variances for the 4 dimensions:
    Code:
    input float(var_cultural_dimension1 var_cultural_dimension2 var_cultural_dimension3 var_cultural_dimension4)
    6.618873 14.25023 14.48565 62.29937
    6.618873 14.25023 14.48565 62.29937
    6.618873 14.25023 14.48565 62.29937
    .
    .
    .
    6.618873 14.25023 14.48565 62.29937
    end
    After that, I do not know how to compute the two formulas, assuming the variances are correct.

    Any ideas appreciated.
    Cheers,
    Paris

  • #2
    Your code for the variances looks correct. But I wouldn't do it this way: it's too complicated.
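
    As one illustration of a shorter route (a sketch, not necessarily the approach used further down), the variance of each dimension can be taken from the stored results of -summarize-. The toy data are three rows copied from the example in #1, and the var_* names are chosen only for this sketch. Note that r(Var) divides by N-1, whereas the mean of squared deviations in #1 divides by N, so the two differ slightly.

```stata
* Toy data: three complete rows copied from the example in #1
clear
input int cultural_dimension1 byte cultural_dimension2 int(cultural_dimension3 cultural_dimension4)
80 38 53 68
64 27 41 52
49 46 56 86
end

* One pass per dimension; r(Var) is the sample variance (N-1 divisor)
foreach v of varlist cultural_dimension1-cultural_dimension4 {
    quietly summarize `v'
    generate double var_`v' = r(Var)
}
```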

    I notice that there is no variable designating country in your data set. So I've made a new data set that includes a numeric country variable. (If you have a country variable in the full data set that is string, use -encode- to make it numeric.)
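
    A minimal sketch of the -encode- step, assuming a hypothetical string variable named country_str (that name and the three codes are only for illustration):

```stata
* Hypothetical string country variable, for illustration only
clear
input str6 country_str
"PT"
"ES"
"FR"
end

* -encode- assigns integer codes in alphabetical order and attaches value labels
encode country_str, generate(country)
```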

    I also notice that many of your observations contain only missing values. Those contribute nothing to the calculation, so the first step is to get rid of them before they clutter up the data set, which will grow quite large.

    Also, as the distance formulas are symmetric with respect to the country i and j indices, and both distances are 0 when i = j, I do the calculations only for pairs where i < j.

    Finally, the only difference between the KSI and the Euclidean distance (standardized) formula you show is that the KSI has a factor of four in the denominator that does not appear in the Euclidean distance. So I chose to calculate the KSI, and then just multiply by 4 to get the Euclidean distance.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int cultural_dimension1 byte cultural_dimension2 int(cultural_dimension3 cultural_dimension4) byte country
      .  .   .   .   1
     80 38  53  68   2
     64 27  41  52   3
      .  .   .   .   4
      .  .   .   .   5
     49 46  56  86   6
     11 55  79  70   7
     38 90  61  51   8
      .  .   .   .   9
     80 20  55  60  10
     70 30  40  85  11
     69 38  49  76  12
     46 76  48  54  13
     43 67  67  61  14
     63 23  28  86  15
     80 20  66  30  16
     67 13  64  80  17
     35 15  21  86  18
      .  .   .   .  19
     57 58  57  74  20
     35 67  66  65  21
     18 74  16  23  22
      .  .   .   .  23
      .  .   .   .  24
     78  8  63  67  25
     40 60  30  60  26
      .  .   .   .  27
     57 51  42  86  28
      .  .   .   .  29
     33 63  26  59  30
     68 71  43  86  31
     35 89  66  35  32
      .  .   .   .  33
      .  .   .   .  34
     60 35  57 112  35
     95  6  37 101  36
     68 25  57  29  37
     73 33  40  80  38
     46 80  88  82  39
     78 14  46  48  40
     28 70  68  35  41
     13 54  47  81  42
     77 48  56  40  43
      .  .   .   .  44
     58 41  43  59  45
      .  .   .   .  46
     50 76  70  75  47
     45 39  68  13  48
      .  .   .   .  49
     54 46  95  92  50
      .  .   .   .  51
     60 18  39  85  52
     42 60  19  65  53
     40 60  50  70  54
     44 70   9  63  55
     70 46  53  68  56
      .  .   .   .  57
      .  .   .   .  58
      .  .   .   .  59
      .  .   .   .  60
     56 59  47  96  61
     81 30  69  82  62
    104 26  50  36  63
      .  .   .   .  64
     31 69   8  50  65
     22 79  58  49  66
     95 11  44  86  67
     64 16  42  87  68
     94 32  64  44  69
     55 14  50  70  70
     68 60  64  93  71
      .  .   .   .  72
     63 27  31 104  73
     90 30  42  90  74
     86 25  43  92  75
     93 39  36  95  76
      .  .   .   .  77
      .  .   .   .  78
     31 71   5  29  79
     74 20  48   8  80
     71 27  19  88  81
    104 52 110  51  82
     85 47  37  92  83
     64 20  34  64  84
     66 37  45  85  85
     47 16  58  55  86
     58 17  45  69  87
      .  .   .   .  88
      .  .   .   .  89
     40 91  62  46  90
     61 36  38 100  91
     81 12  73  76  92
     70 20  40  30  93
      .  .   .   .  94
      .  .   .   .  95
      .  .   .   .  96
      .  .   .   .  97
      .  .   .   .  98
      .  .   .   .  99
      .  .   .   . 100
    end
    
    egen mcount = rowmiss(cultural_dimension*)
    drop if mcount > 0
    
    reshape long cultural_dimension, i(country) j(k)
    by k, sort: egen Vk = sd(cultural_dimension)
    replace Vk = Vk^2
    tempfile copy
    save `copy'
    
    rangejoin country 1 . using `copy', by(k)
    gen delta = cultural_dimension - cultural_dimension_U
    by country country_U (k), sort: egen ksi = total((delta^2)/(4*Vk))
    collapse (first) ksi, by(country country_U)
    gen euclidean = 4*ksi
    rename (country country_U) (country_i country_j)
    -rangejoin- is written by Robert Picard and is available from SSC. To use it you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.
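
    Both packages can be installed from SSC in the usual way (an internet connection is needed):

```stata
* Install -rangestat- first, since -rangejoin- depends on it
ssc install rangestat
ssc install rangejoin
```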
    Last edited by Clyde Schechter; 12 Feb 2024, 17:05.

    • #3
      Thank you Prof Clyde.
      There is a country variable; sorry, I overlooked it.
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str6 nacio
      "AD"
      "AE"
      "AF"
      "AL"
      "AM"
      "AR"
      "AT"
      "AU"
      "AZ"
      "BD"
      "BG"
      "BR"
      "CA"
      "CH"
      "CL"
      "CN"
      "CO"
      "CR"
      "CY"
      "CZ"
      "DE"
      "DK"
      "DM"
      "DZ"
      "EC"
      "EE"
      "EG"
      "ES"
      "ET"
      "FI"
      "FR"
      "GB"
      "GE"
      "GH"
      "GR"
      "GT"
      "HK"
      "HR"
      "HU"
      "ID"
      "IE"
      "IL"
      "IN"
      "IQ"
      "IR"
      "IS"
      "IT"
      "JM"
      "JO"
      "JP"
      "KG"
      "KP"
      "LT"
      "LU"
      "LV"
      "MA"
      "MD"
      "ME"
      "MK"
      "ML"
      "MT"
      "MX"
      "MY"
      "NG"
      "NO"
      "NZ"
      "PA"
      "PE"
      "PH"
      "PK"
      "PL"
      "PR"
      "PT"
      "RO"
      "RS"
      "RU"
      "RW"
      "SA"
      "SE"
      "SG"
      "SI"
      "SK"
      "SR"
      "TH"
      "TR"
      "TT"
      "TW"
      "UA"
      "UG"
      "US"
      "UY"
      "VE"
      "VN"
      "ZM"
      "ZW"
      "AD"
      "AD"
      "AD"
      "AD"
      "AD"
      end
      Also, I need to compute the cultural distance between one specific country, PT, and each of the other countries. Moreover, I believe the two formulas differ in more than the 4 in the denominator: the Euclidean distance (standardized) has a square root, while the KSI does not.

      • #4
        You are correct about the radical. Sorry. That just requires you to change the last line to:
        Code:
        gen euclidean = sqrt(4*ksi)
        The fact that you only have to compare PT to the other countries, and not all countries with each other, makes it simpler. You really should have said that in #1!
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input int cultural_dimension1 byte cultural_dimension2 int(cultural_dimension3 cultural_dimension4) str6 nacio
          .  .   .   . "AD"
         80 38  53  68 "AE"
         64 27  41  52 "AF"
          .  .   .   . "AL"
          .  .   .   . "AM"
         49 46  56  86 "AR"
         11 55  79  70 "AT"
         38 90  61  51 "AU"
          .  .   .   . "AZ"
         80 20  55  60 "BD"
         70 30  40  85 "BG"
         69 38  49  76 "BR"
         46 76  48  54 "CA"
         43 67  67  61 "CH"
         63 23  28  86 "CL"
         80 20  66  30 "CN"
         67 13  64  80 "CO"
         35 15  21  86 "CR"
          .  .   .   . "CY"
         57 58  57  74 "CZ"
         35 67  66  65 "DE"
         18 74  16  23 "DK"
          .  .   .   . "DM"
          .  .   .   . "DZ"
         78  8  63  67 "EC"
         40 60  30  60 "EE"
          .  .   .   . "EG"
         57 51  42  86 "ES"
          .  .   .   . "ET"
         33 63  26  59 "FI"
         68 71  43  86 "FR"
         35 89  66  35 "GB"
          .  .   .   . "GE"
          .  .   .   . "GH"
         60 35  57 112 "GR"
         95  6  37 101 "GT"
         68 25  57  29 "HK"
         73 33  40  80 "HR"
         46 80  88  82 "HU"
         78 14  46  48 "ID"
         28 70  68  35 "IE"
         13 54  47  81 "IL"
         77 48  56  40 "IN"
          .  .   .   . "IQ"
         58 41  43  59 "IR"
          .  .   .   . "IS"
         50 76  70  75 "IT"
         45 39  68  13 "JM"
          .  .   .   . "JO"
         54 46  95  92 "JP"
          .  .   .   . "KG"
         60 18  39  85 "KP"
         42 60  19  65 "LT"
         40 60  50  70 "LU"
         44 70   9  63 "LV"
         70 46  53  68 "MA"
          .  .   .   . "MD"
          .  .   .   . "ME"
          .  .   .   . "MK"
          .  .   .   . "ML"
         56 59  47  96 "MT"
         81 30  69  82 "MX"
        104 26  50  36 "MY"
          .  .   .   . "NG"
         31 69   8  50 "NO"
         22 79  58  49 "NZ"
         95 11  44  86 "PA"
         64 16  42  87 "PE"
         94 32  64  44 "PH"
         55 14  50  70 "PK"
         68 60  64  93 "PL"
          .  .   .   . "PR"
         63 27  31 104 "PT"
         90 30  42  90 "RO"
         86 25  43  92 "RS"
         93 39  36  95 "RU"
          .  .   .   . "RW"
          .  .   .   . "SA"
         31 71   5  29 "SE"
         74 20  48   8 "SG"
         71 27  19  88 "SI"
        104 52 110  51 "SK"
         85 47  37  92 "SR"
         64 20  34  64 "TH"
         66 37  45  85 "TR"
         47 16  58  55 "TT"
         58 17  45  69 "TW"
          .  .   .   . "UA"
          .  .   .   . "UG"
         40 91  62  46 "US"
         61 36  38 100 "UY"
         81 12  73  76 "VE"
         70 20  40  30 "VN"
          .  .   .   . "ZM"
          .  .   .   . "ZW"
          .  .   .   . "AD"
          .  .   .   . "AD"
          .  .   .   . "AD"
          .  .   .   . "AD"
          .  .   .   . "AD"
        end
        
        
        egen mcount = rowmiss(cultural_dimension*)
        drop if mcount > 0
        
        reshape long cultural_dimension, i(nacio) j(k)
        by k, sort: egen Vk = sd(cultural_dimension)
        replace Vk = Vk^2
        
        by k: egen cultural_dimension_U = max(cond(nacio == "PT", cultural_dimension, .))
        
        gen delta = cultural_dimension - cultural_dimension_U
        by nacio (k), sort: egen ksi = total((delta^2)/(4*Vk))
        collapse (first) ksi, by(nacio)
        gen euclidean = sqrt(4*ksi)
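
        If reshaping is inconvenient, the same PT-versus-others calculation can also be sketched with the data left in wide form. This is an alternative sketch, not the code above: it loops over the four dimensions and accumulates the KSI term by term. It uses three rows copied from the example as toy data, so the variances (and therefore the results) are not comparable with those from the full data, and it assumes every row is complete.

```stata
* Toy data: three complete rows copied from the example above
clear
input int cultural_dimension1 byte cultural_dimension2 int(cultural_dimension3 cultural_dimension4) str6 nacio
63 27 31 104 "PT"
57 51 42  86 "ES"
68 71 43  86 "FR"
end

generate double ksi = 0
forvalues k = 1/4 {
    * Sample variance of dimension k (same N-1 divisor as sd()^2)
    quietly summarize cultural_dimension`k'
    local Vk = r(Var)
    * Value of dimension k for the reference country, PT
    quietly summarize cultural_dimension`k' if nacio == "PT", meanonly
    local ref = r(mean)
    * Add this dimension's contribution to the index
    quietly replace ksi = ksi + (cultural_dimension`k' - `ref')^2 / (4 * `Vk')
}
generate double euclidean = sqrt(4 * ksi)
list nacio ksi euclidean
```

        The row for PT itself should come out as 0 on both measures, which is a convenient check that the reference values were picked up correctly.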

        • #5
          Thank you very much, Professor Clyde.
          Your assistance is consistently excellent, and I greatly appreciate it. The solution worked perfectly, as usual.
          Code:
          egen mcount = rowmiss(cultural_dimension*)
          drop if mcount > 0
          reshape long cultural_dimension, i(nacio) j(k)
          by k, sort: egen Vk = sd(cultural_dimension)
          replace Vk = Vk^2
          
          by k: egen cultural_dimension_U = max(cond(nacio == "PT", cultural_dimension, .))
          
          gen delta = cultural_dimension - cultural_dimension_U
          by nacio (k), sort: egen ksi = total((delta^2)/(4*Vk))
          collapse (first) ksi, by(nacio)
          gen euclidean = sqrt(4*ksi)
          sum ksi euclidean
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
                   ksi |         68    1.934141    1.588269          0   6.841477
             euclidean |         68     2.52845    1.167715          0   5.231244
          Cheers,
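
          A quick consistency check on the summary above: because euclidean is generated as sqrt(4*ksi), it equals 2*sqrt(ksi) row by row, so the minima and maxima of the two variables should correspond (the means need not, since the square root is not linear). Using the reported maximum of ksi:

```latex
\[
2\sqrt{6.841477} \approx 5.2312
\]
```

          This agrees with the reported maximum of euclidean, 5.231244, and the minimum of 0 on both measures is the row for PT itself.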
