Sankey plot - adding labels

Noemi Seng

Join Date: Jan 2024

Posts: 90
#1

17 Oct 2024, 03:10

Dear community,

Does anyone have experience creating Sankey plots in Stata? I created a Sankey plot for FDI data with the 10 largest country pairs, following this guide by Fernando Rios-Avila, which was extremely helpful. However, I would like to make one addition: in the middle of each connecting line, show the name of the sector in which the most FDI occurs between the origin and host country. Does anyone have an idea how I could add this to the plot?

I installed sankey, palettes, colrspace, and schemepack from SSC, and before executing my code I ran the two code files (sankey_plot and sankey_i) by Fernando Rios-Avila.

My code is:

use gravity_sectorlevel, clear
collapse (sum) total_fdi_stock_dest = TotalassetsthUSD, by( iso3_d_encode iso3_d country_d iso3_o iso3_o_encode country_o year country_pair)
collapse (sum) total_fdi_stock_dest, by( iso3_d_encode iso3_d country_d )
keep if iso3_d == "AUS" | iso3_d == "BEL" | iso3_d == "CHN" | iso3_d == "DEU" | iso3_d == "FRA" | iso3_d == "GBR" | iso3_d == "HKG" | iso3_d == "IRL" | iso3_d == "LUX" | iso3_d == "NLD" | iso3_d == "USA"
gen fdi_dest_rs = total_fdi_stock_dest / 1000000
format fdi_dest_rs %9.2f
save dest_countries.dta, replace

use gravity_sectorlevel, clear
collapse (sum) total_fdi_stock_origin = TotalassetsthUSD, by( iso3_d_encode iso3_d country_d iso3_o iso3_o_encode country_o year country_pair)
collapse (sum) total_fdi_stock_origin, by( iso3_o_encode iso3_o country_o )
keep if iso3_o == "USA" | iso3_o == "JPN" | iso3_o == "GBR" | iso3_o == "FRA" | iso3_o == "ESP" | iso3_o == "DEU" | iso3_o == "CHE" | iso3_o == "CAN"
gen fdi_origin_rs = total_fdi_stock_origin / 1000000
format fdi_origin_rs %9.2f
save origin_countries.dta, replace

use gravity_sectorlevel, clear
collapse (sum) total_fdi_stock = TotalassetsthUSD, by( iso3_d_encode iso3_d country_d iso3_o iso3_o_encode country_o year country_pair)
collapse (sum) total_fdi_stock, by(country_pair iso3_d_encode iso3_d country_d iso3_o_encode iso3_o country_o)
sort total_fdi_stock
gen rank = _n
keep in -20/-1
list country_pair total_fdi_stock in 1/10
merge m:1 iso3_d using dest_countries.dta
drop _merge
merge m:1 iso3_o using origin_countries.dta
drop _merge

gen fdi_origin_str = string(fdi_origin_rs, "%9.2f")
gen fdi_dest_str = string(fdi_dest_rs, "%9.2f")

egen label0 = concat(iso3_o fdi_origin_str), p(" ")
egen label1 = concat(iso3_d fdi_dest_str), p(" ")

set scheme stcolor

gen x0 = 1
gen x1 = 2
sankey_plot x0 iso3_o_encode x1 iso3_d_encode, ///
width0(total_fdi_stock) extra adjust ///
colorpalette(viridis, opacity(40)) gap(0.1) noline labcolor(black) ///
label0(label0) label1(label1) ///
xlabel(1 "Origin" 2 "Host", nogrid) xsize(5) ysize(5.5)

sankey_plot x0 iso3_o x1 iso3_d, width0(total_fdi_stock) extra adjust colorpalette(viridis, opacity(40)) gap(0.1) noline labcolor(black) label0(label0) xlabel(1 "Origin" 2 "Host", nogrid) xsize(5) ysize(5.5) // title("Top 20 country pairs by total assets (average over time)")

My plot looks like this:

What I want is something like this, where the sector is displayed (taken from https://www.usitc.gov/publications/3...d_nov_2023.pdf):

I would appreciate any help!

Best
Noemi
FernandoRios

Join Date: Apr 2014

Posts: 2409
#2

17 Oct 2024, 07:30

Hi Noemi
For something like that you need more layers.
Specifically, your x0 and x1 will need to go from 1 to 2 (this is what you have) and also from 2 to 3. Layer 2 would be your middle group.
Hope this helps
F
Noemi Seng

Join Date: Jan 2024

Posts: 90
#3

17 Oct 2024, 09:01

Dear FernandoRios

Thank you so much for your response. I think my data does not have this structure. So is there no way I could just manually add a label to the connecting segment, stating the sector name?

Best
Noemi
FernandoRios

Join Date: Apr 2014

Posts: 2409
#4

17 Oct 2024, 11:24

You may need to restructure it.
I cannot say more than that without seeing the data.

Noemi Seng

Join Date: Jan 2024
Posts: 90

18 Oct 2024, 08:49

This is an example of my data:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str6 country_pair str3 iso3_o str14 country_o str3 iso3_d str14 country_d str3(iso3_d_encode iso3_o_encode) double total_fdi_stock str12 sector
"FRAGBR" "FRA" "France"         "GBR" "United Kingdom" "GBR" "FRA"  6150296.337366313 "Finance"     
"USAAUS" "USA" "United States"  "AUS" "Australia"      "AUS" "USA"  6230860.105399132 "Finance"     
"GBRAUS" "GBR" "United Kingdom" "AUS" "Australia"      "AUS" "GBR"  6446273.537326217 "Management"  
"FRABEL" "FRA" "France"         "BEL" "Belgium"        "BEL" "FRA"  6954542.277267169 "FInance"     
"USADEU" "USA" "United States"  "DEU" "Germany"        "DEU" "USA"   7208773.22272037 "Information "
"CHEUSA" "CHE" "Switzerland"    "USA" "United States"  "USA" "CHE"   7390831.17666626 "Finance"     
"USACHN" "USA" "United States"  "CHN" "China"          "CHN" "USA"  7771158.493024005 "Real Estate"
"DEUGBR" "DEU" "Germany"        "GBR" "United Kingdom" "GBR" "DEU"  8468188.065097764 "Finance"     
"GBRNLD" "GBR" "United Kingdom" "NLD" "Netherlands"    "NLD" "GBR"  9012846.412998468 "FInance"     
"GBRFRA" "GBR" "United Kingdom" "FRA" "France"         "FRA" "GBR"   9598786.51034689 "Finance"     
"JPNUSA" "JPN" "Japan"          "USA" "United States"  "USA" "JPN"  10150002.13390249 "Management"  
"JPNGBR" "JPN" "Japan"          "GBR" "United Kingdom" "GBR" "JPN" 10435803.109379198 "Finance"     
"USAIRL" "USA" "United States"  "IRL" "Ireland"        "IRL" "USA" 11598436.177555203 "Finance"     
"ESPGBR" "ESP" "Spain"          "GBR" "United Kingdom" "GBR" "ESP" 13133246.855458409 "Information "
"GBRUSA" "GBR" "United Kingdom" "USA" "United States"  "USA" "GBR" 15515183.413017288 "Finance"     
"CANUSA" "CAN" "Canada"         "USA" "United States"  "USA" "CAN" 16677937.239602685 "Finance"     
"GBRHKG" "GBR" "United Kingdom" "HKG" "Hong Kong"      "HKG" "GBR"  16703170.70928955 "Finance"     
"USALUX" "USA" "United States"  "LUX" "Luxembourg"     "LUX" "USA"  20181799.81582506 "Management"  
"USANLD" "USA" "United States"  "NLD" "Netherlands"    "NLD" "USA"   20239658.5090217 "Real Estate"
"USAGBR" "USA" "United States"  "GBR" "United Kingdom" "GBR" "USA"  74039827.05130184 "Finance"     
end

Unfortunately, I have not managed to restructure it in a way similar to the job market example in your guide. Maybe you have an idea after seeing the data? Please let me know if you need more information about the data.

Best
Noemi
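
One way to read the "more layers" suggestion in #2 with this example data is to stack each country pair twice: once as an origin -> sector segment (x0 = 1, x1 = 2) and once as a sector -> host segment (x0 = 2, x1 = 3). The sketch below is untested and rests on several assumptions: that the -dataex- variables above are in memory, that sankey_plot accepts string node variables (as in the second call in #1) and links the stacked segments by node name, and that the options carried over from #1 behave the same with three layers. The names segment, node0 and node1 are made up for illustration. Note also that this puts the sector in a middle column that pools all pairs sharing a sector, which is not the same as a label on each individual link.

Code:

```stata
* Sketch only: assumes the -dataex- example from this thread is in memory
* and that sankey_plot links stacked (1 -> 2) and (2 -> 3) rows by node name.

* Tidy the sector names ("FInance", "Information ") so equal sectors match
replace sector = strproper(strtrim(sector))

* Each country pair becomes two rows: origin -> sector and sector -> host
expand 2
bysort country_pair: gen byte segment = _n

gen x0 = segment        // 1 for origin -> sector, 2 for sector -> host
gen x1 = segment + 1    // 2 for origin -> sector, 3 for sector -> host

* Hypothetical node variables holding the start and end of each segment
gen str12 node0 = cond(segment == 1, iso3_o, sector)
gen str12 node1 = cond(segment == 1, sector, iso3_d)

* Options copied from the call in #1; only the variables and xlabel() changed
sankey_plot x0 node0 x1 node1, width0(total_fdi_stock) extra adjust ///
    colorpalette(viridis, opacity(40)) gap(0.1) noline labcolor(black) ///
    xlabel(1 "Origin" 2 "Sector" 3 "Host", nogrid) xsize(5) ysize(5.5)
```

Because both rows of a pair keep the same total_fdi_stock, the flow into each sector node equals the flow out of it, which is what a three-layer Sankey needs.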
