Cultural Distance index

Paris Rira

Join Date: Dec 2022
Posts: 384

12 Feb 2024, 15:24

Dear Profs and Colleagues,

I am going to compute two measures: the Euclidean distance (standardized) and the Kogut and Singh (1988) index. KSI_ij is the cultural distance between country i and country j. I_ki and I_kj are the values of cultural dimension k for country i and country j, respectively. V_k is the variance of cultural dimension k. (There are 4 cultural dimensions, so k runs from 1 to 4.)

[Image: 2.png, formula for the Euclidean distance (standardized)]

Kogut and Singh (1988) index:

[Image: 10.png, formula for the Kogut and Singh (1988) index]
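
For reference, the two formulas can be reconstructed from the definitions given in the text of this thread. This is a reconstruction, not a transcription of the attached images: the symbol ED_ij for the Euclidean distance is an assumption, and the placement of the factor 4 and of the square root follows what is said about them later in the discussion.

```latex
ED_{ij} = \sqrt{\sum_{k=1}^{4} \frac{(I_{ki} - I_{kj})^2}{V_k}}
\qquad
KSI_{ij} = \frac{1}{4} \sum_{k=1}^{4} \frac{(I_{ki} - I_{kj})^2}{V_k}
```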

First, I computed the variance of each cultural dimension:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int cultural_dimension1 byte cultural_dimension2 int(cultural_dimension3 cultural_dimension4)
  .  .   .   .
 80 38  53  68
 64 27  41  52
  .  .   .   .
  .  .   .   .
 49 46  56  86
 11 55  79  70
 38 90  61  51
  .  .   .   .
 80 20  55  60
 70 30  40  85
 69 38  49  76
 46 76  48  54
 43 67  67  61
 63 23  28  86
 80 20  66  30
 67 13  64  80
 35 15  21  86
  .  .   .   .
 57 58  57  74
 35 67  66  65
 18 74  16  23
  .  .   .   .
  .  .   .   .
 78  8  63  67
 40 60  30  60
  .  .   .   .
 57 51  42  86
  .  .   .   .
 33 63  26  59
 68 71  43  86
 35 89  66  35
  .  .   .   .
  .  .   .   .
 60 35  57 112
 95  6  37 101
 68 25  57  29
 73 33  40  80
 46 80  88  82
 78 14  46  48
 28 70  68  35
 13 54  47  81
 77 48  56  40
  .  .   .   .
 58 41  43  59
  .  .   .   .
 50 76  70  75
 45 39  68  13
  .  .   .   .
 54 46  95  92
  .  .   .   .
 60 18  39  85
 42 60  19  65
 40 60  50  70
 44 70   9  63
 70 46  53  68
  .  .   .   .
  .  .   .   .
  .  .   .   .
  .  .   .   .
 56 59  47  96
 81 30  69  82
104 26  50  36
  .  .   .   .
 31 69   8  50
 22 79  58  49
 95 11  44  86
 64 16  42  87
 94 32  64  44
 55 14  50  70
 68 60  64  93
  .  .   .   .
 63 27  31 104
 90 30  42  90
 86 25  43  92
 93 39  36  95
  .  .   .   .
  .  .   .   .
 31 71   5  29
 74 20  48   8
 71 27  19  88
104 52 110  51
 85 47  37  92
 64 20  34  64
 66 37  45  85
 47 16  58  55
 58 17  45  69
  .  .   .   .
  .  .   .   .
 40 91  62  46
 61 36  38 100
 81 12  73  76
 70 20  40  30
  .  .   .   .
  .  .   .   .
  .  .   .   .
  .  .   .   .
  .  .   .   .
  .  .   .   .
  .  .   .   .
end

Code:

* Compute the mean of each cultural dimension
egen mean_cultural_dimension1 = mean(cultural_dimension1)
egen mean_cultural_dimension2 = mean(cultural_dimension2)
egen mean_cultural_dimension3 = mean(cultural_dimension3)
egen mean_cultural_dimension4 = mean(cultural_dimension4)

* Compute the squared differences
gen squared_diff_cultural_dimension1 = (cultural_dimension1 - mean_cultural_dimension1)^2
gen squared_diff_cultural_dimension2 = (cultural_dimension2 - mean_cultural_dimension2)^2
gen squared_diff_cultural_dimension3 = (cultural_dimension3 - mean_cultural_dimension3)^2
gen squared_diff_cultural_dimension4 = (cultural_dimension4 - mean_cultural_dimension4)^2

* Compute the variance of each cultural dimension
egen var_cultural_dimension1 = mean(squared_diff_cultural_dimension1)
egen var_cultural_dimension2 = mean(squared_diff_cultural_dimension2)
egen var_cultural_dimension3 = mean(squared_diff_cultural_dimension3)
egen var_cultural_dimension4 = mean(squared_diff_cultural_dimension4)

Now I have the variances for the 4 dimensions:

Code:

input float(var_cultural_dimension1 var_cultural_dimension2 var_cultural_dimension3 var_cultural_dimension4)
6.618873 14.25023 14.48565 62.29937
6.618873 14.25023 14.48565 62.29937
6.618873 14.25023 14.48565 62.29937
.
.
.
6.618873 14.25023 14.48565 62.29937
end

Assuming the variances are correct, I don't know what to do next to compute those two formulas.

Any ideas appreciated.
Cheers,
Paris

Clyde Schechter

Join Date: Apr 2014
Posts: 30095

12 Feb 2024, 16:03

Your code for the variances looks correct. But I wouldn't do it this way: it's too complicated.
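
As a minimal sketch of a shorter route (not part of the original reply): -summarize- leaves the sample variance in r(Var). The variable names x1 and x2 are hypothetical, and the four rows are the first four complete rows of the example data, dimensions 1 and 2 only. Note that r(Var) divides by N-1, whereas the mean of squared deviations computed in #1 divides by N, so the two differ by the factor (N-1)/N.

```stata
clear
input x1 x2
80 38
64 27
49 46
11 55
end

foreach v of varlist x1 x2 {
    quietly summarize `v'
    * sample variance (divisor N-1), as returned by -summarize-
    scalar svar_`v' = r(Var)
    * population variance (divisor N), matching the mean of squared deviations
    scalar pvar_`v' = r(Var) * (r(N) - 1) / r(N)
    display "`v': sample variance = " svar_`v' ", population variance = " pvar_`v'
}
```

Which divisor is appropriate for V_k is a substantive choice; the sketch only shows how the two quantities relate.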

I notice that there is no variable designating country in your data set. So I've made a new data set that includes a numeric country variable. (If you have a country variable in the full data set that is string, use -encode- to make it numeric.)
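
A minimal sketch of the -encode- step mentioned above, assuming a string country variable called nacio (the name used later in this thread) and a new numeric variable called country; the three-row data set is purely illustrative.

```stata
clear
input str2 nacio
"PT"
"ES"
"FR"
end

* -encode- assigns integer codes in alphabetical order of the strings
* and attaches the original strings as value labels
encode nacio, generate(country)
list nacio country, nolabel
```

With these three rows, ES, FR, and PT receive the codes 1, 2, and 3, respectively.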

I also notice that many of your observations contain only missing values. Those contribute nothing to the calculation, so the first step is to get rid of them before they clutter up the data set, which will grow quite large.

Also, as the distance formulas are symmetric with respect to the country indices i and j, and both distances are 0 when i = j, I do the calculations only for pairs where i < j.

Finally, the only difference between the KSI and the Euclidean distance (standardized) formula you show is that the KSI has a factor of four in the denominator that does not appear in the Euclidean distance. So I chose to calculate the KSI, and then just multiply by 4 to get the Euclidean distance.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int cultural_dimension1 byte cultural_dimension2 int(cultural_dimension3 cultural_dimension4) byte country
  .  .   .   .   1
 80 38  53  68   2
 64 27  41  52   3
  .  .   .   .   4
  .  .   .   .   5
 49 46  56  86   6
 11 55  79  70   7
 38 90  61  51   8
  .  .   .   .   9
 80 20  55  60  10
 70 30  40  85  11
 69 38  49  76  12
 46 76  48  54  13
 43 67  67  61  14
 63 23  28  86  15
 80 20  66  30  16
 67 13  64  80  17
 35 15  21  86  18
  .  .   .   .  19
 57 58  57  74  20
 35 67  66  65  21
 18 74  16  23  22
  .  .   .   .  23
  .  .   .   .  24
 78  8  63  67  25
 40 60  30  60  26
  .  .   .   .  27
 57 51  42  86  28
  .  .   .   .  29
 33 63  26  59  30
 68 71  43  86  31
 35 89  66  35  32
  .  .   .   .  33
  .  .   .   .  34
 60 35  57 112  35
 95  6  37 101  36
 68 25  57  29  37
 73 33  40  80  38
 46 80  88  82  39
 78 14  46  48  40
 28 70  68  35  41
 13 54  47  81  42
 77 48  56  40  43
  .  .   .   .  44
 58 41  43  59  45
  .  .   .   .  46
 50 76  70  75  47
 45 39  68  13  48
  .  .   .   .  49
 54 46  95  92  50
  .  .   .   .  51
 60 18  39  85  52
 42 60  19  65  53
 40 60  50  70  54
 44 70   9  63  55
 70 46  53  68  56
  .  .   .   .  57
  .  .   .   .  58
  .  .   .   .  59
  .  .   .   .  60
 56 59  47  96  61
 81 30  69  82  62
104 26  50  36  63
  .  .   .   .  64
 31 69   8  50  65
 22 79  58  49  66
 95 11  44  86  67
 64 16  42  87  68
 94 32  64  44  69
 55 14  50  70  70
 68 60  64  93  71
  .  .   .   .  72
 63 27  31 104  73
 90 30  42  90  74
 86 25  43  92  75
 93 39  36  95  76
  .  .   .   .  77
  .  .   .   .  78
 31 71   5  29  79
 74 20  48   8  80
 71 27  19  88  81
104 52 110  51  82
 85 47  37  92  83
 64 20  34  64  84
 66 37  45  85  85
 47 16  58  55  86
 58 17  45  69  87
  .  .   .   .  88
  .  .   .   .  89
 40 91  62  46  90
 61 36  38 100  91
 81 12  73  76  92
 70 20  40  30  93
  .  .   .   .  94
  .  .   .   .  95
  .  .   .   .  96
  .  .   .   .  97
  .  .   .   .  98
  .  .   .   .  99
  .  .   .   . 100
end

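* Drop countries with any missing cultural dimension
egen mcount = rowmiss(cultural_dimension*)
drop if mcount > 0

* One observation per country and dimension k; Vk is the variance of dimension k
reshape long cultural_dimension, i(country) j(k)
by k, sort: egen Vk = sd(cultural_dimension)
replace Vk = Vk^2
tempfile copy
save `copy'

* Within each dimension k, pair every country i with every country j > i
rangejoin country 1 . using `copy', by(k)
gen delta = cultural_dimension - cultural_dimension_U
* Sum the scaled squared differences over k for each country pair
by country country_U (k), sort: egen ksi = total((delta^2)/(4*Vk))
collapse (first) ksi, by(country country_U)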
gen euclidean = 4*ksi
rename (country country_U) (country_i country_j)

-rangejoin- is written by Robert Picard and is available from SSC. To use it you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.
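
Assuming a working internet connection, both packages can be installed from SSC as follows:

```stata
ssc install rangestat
ssc install rangejoin
```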

Last edited by Clyde Schechter; 12 Feb 2024, 16:05.

Paris Rira

Join Date: Dec 2022
Posts: 384

12 Feb 2024, 16:22

Thank you, Prof. Clyde.
There is a country variable; sorry, I overlooked it.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str6 nacio
"AD"
"AE"
"AF"
"AL"
"AM"
"AR"
"AT"
"AU"
"AZ"
"BD"
"BG"
"BR"
"CA"
"CH"
"CL"
"CN"
"CO"
"CR"
"CY"
"CZ"
"DE"
"DK"
"DM"
"DZ"
"EC"
"EE"
"EG"
"ES"
"ET"
"FI"
"FR"
"GB"
"GE"
"GH"
"GR"
"GT"
"HK"
"HR"
"HU"
"ID"
"IE"
"IL"
"IN"
"IQ"
"IR"
"IS"
"IT"
"JM"
"JO"
"JP"
"KG"
"KP"
"LT"
"LU"
"LV"
"MA"
"MD"
"ME"
"MK"
"ML"
"MT"
"MX"
"MY"
"NG"
"NO"
"NZ"
"PA"
"PE"
"PH"
"PK"
"PL"
"PR"
"PT"
"RO"
"RS"
"RU"
"RW"
"SA"
"SE"
"SG"
"SI"
"SK"
"SR"
"TH"
"TR"
"TT"
"TW"
"UA"
"UG"
"US"
"UY"
"VE"
"VN"
"ZM"
"ZW"
"AD"
"AD"
"AD"
"AD"
"AD"
end

Also, I only need to compute the cultural distance between one specific country, PT, and each of the other countries. Moreover, I believe the difference between these two formulas is not only the factor of 4 in the denominator: the Euclidean distance (standardized) has a radical, while the KSI does not.

Clyde Schechter

Join Date: Apr 2014
Posts: 30095

12 Feb 2024, 17:52

You are correct about the radical. Sorry. That just requires you to change the last line to:

Code:

gen euclidean = sqrt(4*ksi)
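
The reasoning behind that line, written out under the definitions given in #1 (ED_ij is an assumed symbol for the standardized Euclidean distance): the sum under the radical is exactly four times the KSI.

```latex
ED_{ij}
  = \sqrt{\sum_{k=1}^{4} \frac{(I_{ki} - I_{kj})^2}{V_k}}
  = \sqrt{4 \cdot \frac{1}{4} \sum_{k=1}^{4} \frac{(I_{ki} - I_{kj})^2}{V_k}}
  = \sqrt{4 \, KSI_{ij}}
  = 2 \sqrt{KSI_{ij}}
```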

The fact that you only have to compare PT to the other countries, and not all countries with each other, makes it simpler. You really should have said that in #1!

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int cultural_dimension1 byte cultural_dimension2 int(cultural_dimension3 cultural_dimension4) str6 nacio
  .  .   .   . "AD"
 80 38  53  68 "AE"
 64 27  41  52 "AF"
  .  .   .   . "AL"
  .  .   .   . "AM"
 49 46  56  86 "AR"
 11 55  79  70 "AT"
 38 90  61  51 "AU"
  .  .   .   . "AZ"
 80 20  55  60 "BD"
 70 30  40  85 "BG"
 69 38  49  76 "BR"
 46 76  48  54 "CA"
 43 67  67  61 "CH"
 63 23  28  86 "CL"
 80 20  66  30 "CN"
 67 13  64  80 "CO"
 35 15  21  86 "CR"
  .  .   .   . "CY"
 57 58  57  74 "CZ"
 35 67  66  65 "DE"
 18 74  16  23 "DK"
  .  .   .   . "DM"
  .  .   .   . "DZ"
 78  8  63  67 "EC"
 40 60  30  60 "EE"
  .  .   .   . "EG"
 57 51  42  86 "ES"
  .  .   .   . "ET"
 33 63  26  59 "FI"
 68 71  43  86 "FR"
 35 89  66  35 "GB"
  .  .   .   . "GE"
  .  .   .   . "GH"
 60 35  57 112 "GR"
 95  6  37 101 "GT"
 68 25  57  29 "HK"
 73 33  40  80 "HR"
 46 80  88  82 "HU"
 78 14  46  48 "ID"
 28 70  68  35 "IE"
 13 54  47  81 "IL"
 77 48  56  40 "IN"
  .  .   .   . "IQ"
 58 41  43  59 "IR"
  .  .   .   . "IS"
 50 76  70  75 "IT"
 45 39  68  13 "JM"
  .  .   .   . "JO"
 54 46  95  92 "JP"
  .  .   .   . "KG"
 60 18  39  85 "KP"
 42 60  19  65 "LT"
 40 60  50  70 "LU"
 44 70   9  63 "LV"
 70 46  53  68 "MA"
  .  .   .   . "MD"
  .  .   .   . "ME"
  .  .   .   . "MK"
  .  .   .   . "ML"
 56 59  47  96 "MT"
 81 30  69  82 "MX"
104 26  50  36 "MY"
  .  .   .   . "NG"
 31 69   8  50 "NO"
 22 79  58  49 "NZ"
 95 11  44  86 "PA"
 64 16  42  87 "PE"
 94 32  64  44 "PH"
 55 14  50  70 "PK"
 68 60  64  93 "PL"
  .  .   .   . "PR"
 63 27  31 104 "PT"
 90 30  42  90 "RO"
 86 25  43  92 "RS"
 93 39  36  95 "RU"
  .  .   .   . "RW"
  .  .   .   . "SA"
 31 71   5  29 "SE"
 74 20  48   8 "SG"
 71 27  19  88 "SI"
104 52 110  51 "SK"
 85 47  37  92 "SR"
 64 20  34  64 "TH"
 66 37  45  85 "TR"
 47 16  58  55 "TT"
 58 17  45  69 "TW"
  .  .   .   . "UA"
  .  .   .   . "UG"
 40 91  62  46 "US"
 61 36  38 100 "UY"
 81 12  73  76 "VE"
 70 20  40  30 "VN"
  .  .   .   . "ZM"
  .  .   .   . "ZW"
  .  .   .   . "AD"
  .  .   .   . "AD"
  .  .   .   . "AD"
  .  .   .   . "AD"
  .  .   .   . "AD"
end


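* Drop countries with any missing cultural dimension
egen mcount = rowmiss(cultural_dimension*)
drop if mcount > 0

* One observation per country and dimension k; Vk is the variance of dimension k
reshape long cultural_dimension, i(nacio) j(k)
by k, sort: egen Vk = sd(cultural_dimension)
replace Vk = Vk^2

* Copy PT's value of dimension k to every observation with that k
by k: egen cultural_dimension_U = max(cond(nacio == "PT", cultural_dimension, .))

* Distance of each country from PT (PT itself gets 0)
gen delta = cultural_dimension - cultural_dimension_U
by nacio (k), sort: egen ksi = total((delta^2)/(4*Vk))
collapse (first) ksi, by(nacio)
gen euclidean = sqrt(4*ksi)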
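
The key step in the code above is the egen max(cond(...)) line, which spreads the reference country's value to every observation in the same group. A minimal sketch of that idiom in isolation, using the dimension 1 and 2 values of PT, ES, and FR from the example data; the variable names score and ref are hypothetical.

```stata
clear
input str2 nacio byte k score
"PT" 1 63
"PT" 2 27
"ES" 1 57
"ES" 2 51
"FR" 1 68
"FR" 2 71
end

* cond() yields score for PT and missing otherwise; max() ignores the
* missing values, so ref holds PT's score for every observation with that k
bysort k: egen ref = max(cond(nacio == "PT", score, .))
gen delta = score - ref
list, sepby(k)
```

This works because exactly one observation per value of k belongs to PT; with duplicate PT rows, max() would silently pick the larger value.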

Paris Rira

Join Date: Dec 2022
Posts: 384

13 Feb 2024, 03:46

Thank you very much, Professor Clyde.
Your assistance is consistently excellent, and I greatly appreciate it. The solution worked perfectly, as usual.

Code:

egen mcount = rowmiss(cultural_dimension*)
drop if mcount > 0
reshape long cultural_dimension, i(nacio) j(k)
by k, sort: egen Vk = sd(cultural_dimension)
replace Vk = Vk^2

by k: egen cultural_dimension_U = max(cond(nacio == "PT", cultural_dimension, .))

gen delta = cultural_dimension - cultural_dimension_U
by nacio (k), sort: egen ksi = total((delta^2)/(4*Vk))
collapse (first) ksi, by(nacio)
gen euclidean = sqrt(4*ksi)
sum ksi euclidean

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         ksi |         68    1.934141    1.588269          0   6.841477
   euclidean |         68     2.52845    1.167715          0   5.231244
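
A quick arithmetic check on the table above, using only the reported figures: because euclidean is generated as sqrt(4*ksi) and the square root is increasing, the maximum of euclidean should equal the square root of 4 times the maximum of ksi. (The means need not satisfy this relation, since the square root is nonlinear.)

```stata
* 6.841477 is the reported maximum of ksi
display sqrt(4*6.841477)
```

The result agrees with the reported maximum of euclidean, 5.231244, up to rounding of the displayed figures.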

Cheers,
