Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging two datasets without unique primary variable values.

    Hello,

    I am trying to merge the following two datasets. I want to merge by using continent and year as primary keys, but in the first dataset, there are multiple countries under the same continent, and in the second dataset, there are multiple U.S. states as well as multiple products (given by product_code variable) under the same continent-year duo. Ideally, I want to link all countries under a given continent in the first dataset with all U.S. states x products in the second dataset and inherit the corresponding variables like export_share (from the first dataset) and product_code and dollars (from the second dataset) using the first dataset as the master.

    This felt like the only circumstance to ever use merge m:m, but oops, I was wrong. Any help would be great, thanks so much!

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str30 partner int year double export_share str19 continent
    "Aruba"                      1997   .03 "americas"           
    "Afghanistan"                1997     0 "sw central asia"    
    "Angola"                     1997   .04 "africa"             
    "Anguila"                    1997     0 "americas"           
    "Albania"                    1997     0 "europe"             
    "Andorra"                    1997     0 "europe"             
    "Netherlands Antilles"       1997   .07 "americas"           
    "United Arab Emirates"       1997   .38 "sw central asia"    
    "Argentina"                  1997   .84 "americas"           
    "Armenia"                    1997   .01 "europe"             
    "Antigua and Barbuda"        1997   .01 "americas"           
    "Australia"                  1997  1.75 "se asia and oceania"
    "Austria"                    1997    .3 "europe"             
    "Azerbaijan"                 1997   .01 "europe"             
    "Burundi"                    1997     0 "africa"             
    "Benin"                      1997   .01 "africa"             
    "Burkina Faso"               1997     0 "africa"             
    "Bangladesh"                 1997   .04 "sw central asia"    
    "Bulgaria"                   1997   .02 "europe"             
    "Bahrain"                    1997   .06 "sw central asia"    
    "Bahamas, The"               1997   .12 "americas"           
    "Bosnia and Herzegovina"     1997   .01 "europe"             
    "Belarus"                    1997   .01 "europe"             
    "Belgium-Luxembourg"         1997  2.06 "europe"             
    "Belize"                     1997   .02 "americas"           
    "Bermuda"                    1997   .05 "americas"           
    "Bolivia"                    1997   .04 "americas"           
    "Brazil"                     1997  2.31 "americas"           
    "Barbados"                   1997   .04 "americas"           
    "Brunei"                     1997   .03 "se asia and oceania"
    "Bhutan"                     1997     0 "sw central asia"    
    "Central African Republic"   1997     0 "africa"             
    "Canada"                     1997 21.83 "americas"           
    "Cocos (Keeling) Islands"    1997     0 "se asia and oceania"
    "Switzerland"                1997  1.21 "europe"             
    "Chile"                      1997   .64 "americas"           
    "China"                      1997  1.86 "east asia"          
    "Cote d'Ivoire"              1997   .02 "africa"             
    "Cameroon"                   1997   .02 "africa"             
    "Congo, Rep."                1997   .01 "africa"             
    "Cook Islands"               1997     0 "se asia and oceania"
    "Colombia"                   1997   .76 "americas"           
    "Comoros"                    1997     0 "africa"             
    "Cape Verde"                 1997     0 "africa"             
    "Costa Rica"                 1997   .29 "americas"           
    "Christmas Island"           1997     0 "se asia and oceania"
    "Cayman Islands"             1997   .04 "americas"           
    "Cyprus"                     1997   .04 "europe"             
    "Czech Republic"             1997   .09 "europe"             
    "Germany"                    1997  3.56 "europe"             
    "Dominica"                   1997   .01 "americas"           
    "Denmark"                    1997   .26 "europe"             
    "Dominican Republic"         1997   .57 "americas"           
    "Algeria"                    1997    .1 "africa"             
    "Ecuador"                    1997   .22 "americas"           
    "Egypt, Arab Rep."           1997   .56 "africa"             
    "Eritrea"                    1997     0 "africa"             
    "Western Sahara"             1997     0 "africa"             
    "Spain"                      1997   .81 "europe"             
    "Estonia"                    1997   .01 "europe"             
    "Ethiopia(excludes Eritrea)" 1997   .02 "africa"             
    "Finland"                    1997   .25 "europe"             
    "Fiji"                       1997     0 "se asia and oceania"
    "Falkland Island"            1997     0 "americas"           
    "France"                     1997  2.41 "europe"             
    "Faeroe Islands"             1997     0 "europe"             
    "Micronesia, Fed. Sts."      1997     0 "se asia and oceania"
    "Gabon"                      1997   .01 "africa"             
    "United Kingdom"             1997   5.3 "europe"             
    "Georgia"                    1997   .02 "europe"             
    "Ghana"                      1997   .05 "africa"             
    "Gibraltar"                  1997     0 "europe"             
    "Guinea"                     1997   .01 "africa"             
    "Gambia, The"                1997     0 "africa"             
    "Guinea-Bissau"              1997     0 "africa"             
    "Equatorial Guinea"          1997   .01 "africa"             
    "Greece"                     1997   .14 "europe"             
    "Grenada"                    1997   .01 "americas"           
    "Greenland"                  1997     0 "americas"           
    "Guatemala"                  1997   .25 "americas"           
    "Guyana"                     1997   .02 "americas"           
    "Hong Kong, China"           1997   2.2 "east asia"          
    "Honduras"                   1997   .29 "americas"           
    "Croatia"                    1997   .02 "europe"             
    "Haiti"                      1997   .07 "americas"           
    "Hungary"                    1997   .07 "europe"             
    "Indonesia"                  1997   .66 "europe"             
    "India"                      1997   .53 "sw central asia"    
    "Ireland"                    1997   .68 "europe"             
    "Iran, Islamic Rep."         1997     0 "sw central asia"    
    "Iraq"                       1997   .01 "sw central asia"    
    "Iceland"                    1997   .03 "europe"             
    "Israel"                     1997   .87 "sw central asia"    
    "Italy"                      1997  1.31 "europe"             
    "Jamaica"                    1997   .21 "americas"           
    "Jordan"                     1997   .06 "sw central asia"    
    "Japan"                      1997  9.55 "east asia"          
    "Kazakhstan"                 1997   .04 "sw central asia"    
    "Kenya"                      1997   .03 "africa"             
    "Kyrgyz Republic"            1997     0 "sw central asia"    
    end
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str17 origin str21 continent int year str2 product_code float dollars
    "Alabama" "africa"   1997 "01"          0
    "Alabama" "africa"   2002 "01"      .1338
    "Alabama" "africa"   2007 "01"      .0093
    "Alabama" "africa"   2012 "01"      4.911
    "Alabama" "africa"   1997 "02"     2.2506
    "Alabama" "africa"   2002 "02"      .1364
    "Alabama" "africa"   2007 "02"          0
    "Alabama" "africa"   2012 "02"     1.2008
    "Alabama" "africa"   1997 "03"     1.0534
    "Alabama" "africa"   2002 "03"     1.4096
    "Alabama" "africa"   2007 "03"     1.1097
    "Alabama" "africa"   2012 "03"    25.7242
    "Alabama" "africa"   1997 "04"      2.693
    "Alabama" "africa"   2002 "04"     9.9684
    "Alabama" "africa"   2007 "04"      .1472
    "Alabama" "africa"   2012 "04"      .7842
    "Alabama" "africa"   1997 "05"      .3762
    "Alabama" "africa"   2002 "05"       .537
    "Alabama" "africa"   2007 "05"       .124
    "Alabama" "africa"   2012 "05"    94.8851
    "Alabama" "africa"   1997 "06"      .1661
    "Alabama" "africa"   2002 "06"      .1017
    "Alabama" "africa"   2007 "06"      .0346
    "Alabama" "africa"   2012 "06"      .1575
    "Alabama" "africa"   1997 "07"      .4944
    "Alabama" "africa"   2002 "07"     1.1195
    "Alabama" "africa"   2007 "07"     1.7168
    "Alabama" "africa"   2012 "07"      .7506
    "Alabama" "africa"   1997 "08"       .012
    "Alabama" "africa"   2002 "08"      .0175
    "Alabama" "africa"   2007 "08"      .0114
    "Alabama" "africa"   2012 "08"          0
    "Alabama" "africa"   1997 "09"      .6969
    "Alabama" "africa"   2002 "09"      .0343
    "Alabama" "africa"   2007 "09"      .0002
    "Alabama" "africa"   2012 "09"      .0066
    "Alabama" "americas" 1997 "01"      .0579
    "Alabama" "americas" 2002 "01"      .1761
    "Alabama" "americas" 2007 "01"      .4266
    "Alabama" "americas" 2012 "01"      10.21
    "Alabama" "americas" 1997 "02"      .6794
    "Alabama" "americas" 2002 "02"     1.1111
    "Alabama" "americas" 2007 "02"      .0199
    "Alabama" "americas" 2012 "02"      .5037
    "Alabama" "americas" 1997 "03"    17.2595
    "Alabama" "americas" 2002 "03"     8.8181
    "Alabama" "americas" 2007 "03"    22.9147
    "Alabama" "americas" 2012 "03"    14.1758
    "Alabama" "americas" 1997 "04"      3.129
    "Alabama" "americas" 2002 "04"     4.9623
    "Alabama" "americas" 2007 "04"     2.7964
    "Alabama" "americas" 2012 "04"     3.6339
    "Alabama" "americas" 1997 "05"     1.8991
    "Alabama" "americas" 2002 "05"     3.2578
    "Alabama" "americas" 2007 "05"  17.087101
    "Alabama" "americas" 2012 "05"    65.4292
    "Alabama" "americas" 1997 "06"     3.4051
    "Alabama" "americas" 2002 "06"      .7909
    "Alabama" "americas" 2007 "06"      .3818
    "Alabama" "americas" 2012 "06"     1.0203
    "Alabama" "americas" 1997 "07"     2.9975
    "Alabama" "americas" 2002 "07"     3.6664
    "Alabama" "americas" 2007 "07"     2.9118
    "Alabama" "americas" 2012 "07"     4.5321
    "Alabama" "americas" 1997 "08"      .0961
    "Alabama" "americas" 2002 "08"      .1048
    "Alabama" "americas" 2007 "08"      .8568
    "Alabama" "americas" 2012 "08"      .0064
    "Alabama" "americas" 1997 "09"     6.5483
    "Alabama" "americas" 2002 "09"     2.9866
    "Alabama" "americas" 2007 "09"     2.1862
    "Alabama" "americas" 2012 "09"     2.1106
    "Alabama" "canada"   1997 "01"     7.4898
    "Alabama" "canada"   2002 "01"    13.2828
    "Alabama" "canada"   2007 "01"     5.7015
    "Alabama" "canada"   2012 "01"     1.6135
    "Alabama" "canada"   1997 "02"      .0934
    "Alabama" "canada"   2002 "02"       .026
    "Alabama" "canada"   2007 "02"      .0591
    "Alabama" "canada"   2012 "02"          0
    "Alabama" "canada"   1997 "03"    39.3433
    "Alabama" "canada"   2002 "03"    44.9927
    "Alabama" "canada"   2007 "03"     3.8102
    "Alabama" "canada"   2012 "03"     2.2815
    "Alabama" "canada"   1997 "04"      .9966
    "Alabama" "canada"   2002 "04"      .5412
    "Alabama" "canada"   2007 "04"     4.9669
    "Alabama" "canada"   2012 "04"    15.8382
    "Alabama" "canada"   1997 "05"    58.9206
    "Alabama" "canada"   2002 "05"  65.829704
    "Alabama" "canada"   2007 "05"    48.2476
    "Alabama" "canada"   2012 "05"    27.6214
    "Alabama" "canada"   1997 "06"     1.6863
    "Alabama" "canada"   2002 "06" .017099999
    "Alabama" "canada"   2007 "06"      .0924
    "Alabama" "canada"   2012 "06"      .0425
    "Alabama" "canada"   1997 "07"     3.8584
    "Alabama" "canada"   2002 "07"     7.3508
    "Alabama" "canada"   2007 "07"     7.2861
    "Alabama" "canada"   2012 "07"     6.6236
    end

  • #2
    No, this is definitely not the place for -merge m:m-. Look, I've been using Stata on an almost daily basis since 1994. I have only once encountered a situation where -merge m:m- would produce correct results. Even then, there was a better way to do it. So just forget -merge m:m- exists. It was a mistake on StataCorp's part to ever introduce it. But for reasons of backward compatibility, they feel stuck with maintaining it.

    Now, as for what you should do, this is not a job for -merge- at all.
    I want to link all countries under a given continent in the first dataset with all U.S. states x products in the second dataset and inherit the corresponding variables like export_share (from the first dataset) and product_code and dollars (from the second dataset) using the first dataset as the master.
    That is a textbook description of exactly what -joinby- does.
    Code:
    use dataset1, clear
    joinby continent year using dataset2

    Comment


    • #3
      Clyde Schechter, thank you so much! I didn't know the existence of the joinby command. It did the job. Thank you!

      Comment


      • #4
        I didn't know the existence of the joinby command.
        It doesn't at all surprise me to learn that. -joinby- is very useful and, at least in my workflow, comes up often. Yet for some reason, it languishes in obscurity. While I think most people who teach introductory Stata give short shrift to data management in general, the -joinby- command, despite its utility, is especially neglected. I don't understand why, but, rest assured, you are not at all alone in not knowing about this important, underrated command.

        Comment

        Working...
        X