Building on Jared Greathouse's advice, here's a fairly simple way to verify that there is a one-to-one correspondence in both directions between country_encoded and country_string:
This will be somewhat time-consuming in a data set of this size because sorting is involved. But it's important to be sure that the data is right; we should not be in a rush to get the wrong results.
Code:
by country_encoded (country_string), sort: assert country_string[1] == country_string[_N]
by country_string (country_encoded), sort: assert country_encoded[1] == country_encoded[_N]
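If either assert fails, Stata halts with an error but does not list the offending observations. Here is a minimal sketch of one way to find them, assuming the same two variables; bad_code and bad_string are hypothetical names I made up for illustration, not anything in the original data:

```stata
* Flag codes that are paired with more than one string
bysort country_encoded (country_string): gen byte bad_code = country_string[1] != country_string[_N]

* Flag strings that are paired with more than one code
bysort country_string (country_encoded): gen byte bad_string = country_encoded[1] != country_encoded[_N]

* Show the observations involved in any mismatch
list country_encoded country_string if bad_code | bad_string
```

In a large data set the listing may be long, so it may be worth restricting it further once you see the pattern of the problem.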