  • Unifying country names when merging data

    I have data from different sources, and the sources use different names for the same country, such as Hong Kong and Hong Kong, China. I merged the data, but the two names now appear as separate countries with different IDs. I want to combine them under one ID, unify the name as Hong Kong, and merge all the variables together under this name. Thanks for helping.

  • #2
    Your best bet is to employ a standardized coding scheme. Take a look at the -kountry- command from SSC, which may be what you're looking for.


    • #3
      A good place to start, one that will capture a lot of the common variations, is Rafal Raciborski's kountry; see ssc desc kountry.
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------


      • #4
        Thanks so much Mr. Ali and Mr. Maarten for the reply.

        Can I use it after I have already merged the data, or should it be used before merging?

        Thanks


        • #5
          Should be used before merging. In both datasets, use
          Code:
          kountry country_var, from(other) stuck marker
          This will generate an iso3n code variable which can then be used as the key when you merge. It will also generate a marker variable identifying successful standardizations. Before merging, you should check if any countries failed to standardize using
          Code:
          tab if marker==0


          • #6
            OK, thanks Mr. Ali. If I find a country that failed to standardize (marker==0), what should I do in that case? Thanks


            • #7
              If only a few countries failed to standardize, you may find it easiest to recode them manually by creating temporary four-digit codes for those countries, replacing the missing _ISO3N_ in both datasets:

              Code:
              replace _ISO3N_ = <new four digit code> if country == "nonstandardized"
              If there are many non-standardized countries, you can create a new dataset containing one observation for each of them and a variable called _ISO3N_. Then assign each of those countries a unique temporary four-digit code (four digits, so there is no clash with the preexisting three-digit codes):

              Code:
              replace _ISO3N_ = _n+999
              Merge this dataset with each of your master datasets using the update option, then merge both master datasets using _ISO3N_ as the key.
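
              The many-countries route above could be sketched roughly as follows. This is only a sketch under assumptions: data1.dta, data2.dta, the variable country, and the intermediate file names are hypothetical. It also assumes that kountry leaves _ISO3N_ missing for names it cannot standardize, and that each leftover name is spelled identically in both datasets (otherwise harmonize those spellings by hand first).

              Code:
              * Hypothetical names throughout: data1, data2, country, lookup
              foreach f in data1 data2 {
                  use `f', clear
                  kountry country, from(other) stuck
                  save `f'_std, replace
                  * keep only the names that did not receive a code
                  keep if missing(_ISO3N_)
                  keep country
                  save `f'_failed, replace
              }
              * one observation per leftover name, with a temporary four-digit code
              use data1_failed, clear
              append using data2_failed
              duplicates drop
              generate _ISO3N_ = _n + 999
              save lookup, replace
              * fill in the missing codes in each standardized dataset
              foreach f in data1 data2 {
                  use `f'_std, clear
                  merge m:1 country using lookup, update keep(1 3 4 5) nogenerate
                  save `f'_std, replace
              }
              * final merge; 1:1 assumes one observation per country in each file
              use data1_std, clear
              merge 1:1 _ISO3N_ using data2_std

              The keep(1 3 4 5) option drops lookup entries that do not occur in the dataset at hand, so no empty observations are added. For panel data the final merge would need a fuller key, such as _ISO3N_ plus a year variable.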


              • #8
                Thanks Mr. Ali. That was really beneficial.


                • #9
                  Hi;
                  I am getting the following error after running the tab code:

                  tab if marker == 0
                  varlist required
                  r(100);


                  • #10
                    Yes, of course you are. Your command asks Stata to create a table, but you don't specify what variable(s) you want it to tabulate. That is what you need to fill in between -tab- and -if-.
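
                    For instance, assuming the original country-name variable is called country_var as in #5 (a placeholder name), listing the names that failed to standardize might look like the following. Stata variable names are case-sensitive, so it is worth checking with describe exactly what the marker variable created by kountry is called before using it in the if condition.

                    Code:
                    * country_var is a placeholder; substitute your own variable name
                    describe
                    tab country_var if marker == 0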
