Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • drop duplicated pairs.

    Dear All, I find this question here (https://bbs.pinggu.org/forum.php?mod...=1#pid57044671). The data set is
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str3(origin dest) float(index_bilateral year)
    "CHN" "BGR"  25.245697 2010
    "CHN" "RUS" -.26786003 2004
    "CHN" "IND"   15.62117 2008
    "CHN" "IND"   6.619596 2006
    "CHN" "CYP"   65.14904 2003
    "CHN" "PRT"  28.602924 2011
    "CHN" "BGR"  16.159151 2013
    "CHN" "EST"   43.11666 2012
    "CHN" "MLT"   47.30183 2008
    "CHN" "LVA"   51.74584 2005
    "CHN" "IDN"  16.247213 2014
    "CHN" "CZE"  34.753216 2008
    "CHN" "HUN"   37.37414 2000
    "CHN" "KOR"  -18.88718 2001
    "CHN" "CZE"  17.311008 2013
    "CHN" "RUS"   5.484401 2005
    "CHN" "EST"   42.96254 2006
    "CHN" "TUR"   50.51025 2014
    "CHN" "LTU"   48.40796 2006
    "CHN" "LTU"   50.13712 2008
    "CHN" "CZE"  24.503586 2011
    "CHN" "KOR"   -26.1602 2013
    "CHN" "SVK"  -4.946331 2013
    "CHN" "IDN"  -9.215469 2001
    "CHN" "MEX"   58.31996 2004
    "CHN" "HUN"   42.35766 2003
    "CHN" "KOR"   -27.2578 2004
    "CHN" "POL"   35.22688 2012
    "CHN" "CYP"   52.51056 2012
    "CHN" "RUS"  -23.19418 2000
    "CHN" "CYP"   57.22584 2004
    "CHN" "SVK"  35.143818 2001
    "CHN" "BGR"   48.62837 2000
    "CHN" "RUS"  23.311113 2010
    "CHN" "TUR"   40.50524 2010
    "CHN" "SVK"  24.040916 2006
    "CHN" "RUS"  25.840374 2008
    "CHN" "MEX"   57.34701 2005
    "CHN" "SVK"  4.5704594 2010
    "CHN" "CYP"   63.87872 2009
    "CHN" "LVA"    41.2198 2004
    "CHN" "KOR"  -23.98084 2012
    "CHN" "MLT"   26.56485 2002
    "CHN" "BEL"   4.807569 2003
    "CHN" "TUR"   51.87334 2008
    "CHN" "GRC"   48.62494 2013
    "CHN" "IDN" -.51416165 2012
    "CHN" "CZE"  14.584755 2014
    "CHN" "HUN"  34.845596 2002
    "CHN" "POL"  32.281773 2001
    "CHN" "CZE"   26.69204 2010
    "CHN" "MLT"   60.83664 2014
    "CHN" "PRT"   43.49453 2007
    "CHN" "IDN"    7.19298 2008
    "CHN" "GRC"   55.29412 2010
    "CHN" "EST"   58.36507 2002
    "CHN" "HUN"   27.52004 2006
    "CHN" "SVN"   39.25492 2007
    "CHN" "SVN"  32.444096 2010
    "CHN" "SVK"  -4.403139 2014
    "CHN" "SVK"  4.2479577 2003
    "CHN" "LTU"   48.02407 2007
    "CHN" "KOR" -26.778866 2010
    "CHN" "BRA" -16.774914 2001
    "CHN" "PRT"  22.120983 2014
    "CHN" "POL"   34.62191 2002
    "CHN" "CYP"   68.21292 2002
    "CHN" "CZE"  24.281513 2002
    "CHN" "BRA" -24.257875 2002
    "CHN" "HUN"  -.3747167 2010
    "CHN" "HUN"   26.74776 2008
    "CHN" "BGR"   68.55055 2006
    "CHN" "MEX"   53.34143 2014
    "CHN" "BGR"   44.87311 2003
    "CHN" "HRV"   6.029083 2002
    "CHN" "LTU"  21.262976 2001
    "CHN" "KOR" -24.096216 2014
    "CHN" "KOR"  -6.164859 2008
    "CHN" "GRC"   64.14749 2007
    "CHN" "SVK"  25.626314 2005
    "CHN" "LVA"   60.10111 2007
    "CHN" "TUR"   54.39251 2006
    "CHN" "BGR"   58.43769 2001
    "CHN" "IDN"  -9.743893 2002
    "CHN" "POL"   40.53851 2009
    "CHN" "BGR"   61.54499 2008
    "CHN" "LVA"  30.358263 2001
    "CHN" "HUN"  -1.405918 2012
    "CHN" "IDN"  -2.970981 2007
    "CHN" "HUN"   5.198897 2014
    "CHN" "CZE"  28.565166 2006
    "CHN" "LTU"   21.54196 2002
    "CHN" "EST"   68.44804 2001
    "CHN" "BEL"   3.225987 2013
    "CHN" "SVN"  34.447018 2011
    "CHN" "RUS"  11.041714 2006
    "CHN" "LTU"  33.574306 2009
    "CHN" "MEX"    62.5219 2006
    "CHN" "HRV" -12.727107 2000
    "CHN" "SVN"   27.55047 2001
    "BEL" "CHN"   .8011382 2012
    "BEL" "CHN"   -13.3949 2004
    "BEL" "CHN" -15.197672 2000
    "BEL" "CHN"  -4.807569 2003
    "BEL" "CHN"  -22.55604 2006
    "BEL" "CHN"  -3.225987 2013
    "BEL" "CHN"  -11.47142 2009
    "BEL" "CHN"   .9251509 2002
    "BEL" "CHN"  -9.647018 2001
    "BEL" "CHN"  -4.850701 2014
    "BEL" "CHN"  -7.969518 2010
    "BEL" "CHN" -21.128284 2007
    "BEL" "CHN" -18.861422 2005
    "BEL" "CHN"  -5.933108 2011
    "BEL" "CHN"  -25.02423 2008
    "BGR" "CHN"  -58.43769 2001
    "BGR" "CHN"  -46.04298 2009
    "BGR" "CHN" -25.245697 2010
    "BGR" "CHN"  -56.58496 2005
    "BGR" "CHN"  -68.55055 2006
    "BGR" "CHN"  -46.10373 2004
    "BGR" "CHN" -16.159151 2013
    "BGR" "CHN"  -61.54499 2008
    "BGR" "CHN"  -48.62837 2000
    "BGR" "CHN"  -22.00906 2011
    "BGR" "CHN"  -47.20781 2002
    "BGR" "CHN" -17.001524 2014
    "BGR" "CHN"  -59.63894 2007
    "BGR" "CHN"  -21.91966 2012
    "BGR" "CHN"  -44.87311 2003
    "BRA" "CHN"  24.257875 2002
    "BRA" "CHN"  13.162673 2011
    "BRA" "CHN"  15.624968 2000
    "BRA" "CHN"  28.500885 2003
    "BRA" "CHN"   5.904176 2010
    "BRA" "CHN"  16.774914 2001
    "BRA" "CHN"  18.910135 2005
    "BRA" "CHN"   6.474211 2007
    "BRA" "CHN"   10.03562 2006
    "BRA" "CHN"   9.583493 2012
    "BRA" "CHN" -1.0021992 2008
    "BRA" "CHN"   4.442032 2013
    "BRA" "CHN"  21.865746 2004
    "BRA" "CHN"  11.803807 2009
    "BRA" "CHN" -2.3279502 2014
    "CYP" "CHN"  -68.21292 2002
    "CYP" "CHN"  -73.69014 2007
    "CYP" "CHN"  -66.83552 2006
    "CYP" "CHN"  -66.31237 2000
    "CYP" "CHN"  -57.22584 2004
    "CYP" "CHN"  -49.63297 2013
    "CYP" "CHN"  -58.10071 2001
    "CYP" "CHN"  -68.57891 2010
    "CYP" "CHN"   -63.3471 2005
    "CYP" "CHN"  -54.56297 2014
    "CYP" "CHN"  -71.86724 2008
    "CYP" "CHN"  -63.87872 2009
    "CYP" "CHN"  -52.51056 2012
    "CYP" "CHN"  -59.69622 2011
    "CYP" "CHN"  -65.14904 2003
    "CZE" "CHN" -24.281513 2002
    "CZE" "CHN" -34.032505 2007
    "CZE" "CHN" -14.584755 2014
    "CZE" "CHN" -17.311008 2013
    "CZE" "CHN" -17.492489 2001
    "CZE" "CHN"  -27.75855 2005
    "CZE" "CHN" -28.565166 2006
    "CZE" "CHN" -34.753216 2008
    "CZE" "CHN"  -23.58339 2004
    "CZE" "CHN" -11.225542 2000
    "CZE" "CHN"  -26.69204 2010
    "CZE" "CHN" -24.503586 2011
    "CZE" "CHN" -24.731274 2003
    "CZE" "CHN"  -28.88364 2009
    "CZE" "CHN" -19.054138 2012
    "EST" "CHN"  -39.77054 2005
    "EST" "CHN"  -39.66985 2014
    "EST" "CHN"  -42.98664 2008
    "EST" "CHN"  -42.96254 2006
    "EST" "CHN"  -58.36507 2002
    "EST" "CHN" -28.522646 2009
    "EST" "CHN"  -68.44804 2001
    "EST" "CHN"  -49.15324 2003
    "EST" "CHN"  -32.25482 2010
    "EST" "CHN" -30.581184 2013
    "EST" "CHN"  -42.99669 2000
    "EST" "CHN" -36.860653 2011
    "EST" "CHN"  -47.77432 2007
    "EST" "CHN"  -44.49921 2004
    "EST" "CHN"  -43.11666 2012
    "GRC" "CHN"  -65.65321 2008
    "GRC" "CHN"  -53.34895 2001
    "GRC" "CHN"  -47.14865 2003
    "GRC" "CHN"  -44.98263 2012
    "GRC" "CHN"  -50.84057 2014
    "GRC" "CHN"  -50.93797 2002
    "GRC" "CHN"  -55.27709 2006
    "GRC" "CHN" -37.631474 2004
    "GRC" "CHN"  -55.29412 2010
    "GRC" "CHN"  -53.39635 2011
    end
    The is a "trade" data set, where `origin' denotes the exporting countries and `dest' denotes the importing countries. Clearly, in the same `year', the absolute values of trade between any pairs of countries (`index_bilateral2') must be equal. The question is how to keep only one observation for each pair (for each year), say, from the perspective of exporting countries? Thanks.
    Ho-Chuan (River) Huang
    Stata 17.0, MP(4)

  • #2
    Nick solved a similar challenge here (that I replicated below). There is a Stata Journal article on it "Stata tip 71: The problem of split identity, or how to group dyads" here

    Code:
    sort origin dest year
    bysort origin dest (year): gen n = _n
    ssc install multencode
    multencode origin dest, gen(orig_id dest_id)
    gen dyad_id = cond(orig_id < dest_id, string(orig_id) + " " + string(dest_id), string(dest_id) + " " + string(orig_id))
    egen dyad = group(dyad_id), label
    
    . gen id = _n
    
    . list if n<=5 & id<=50, sepby(origin dest) nolabel
    
         +--------------------------------------------------------------------------------+
         | origin   dest   index_b~l   year   n   orig_id   dest_id   dyad_id   dyad   id |
         |--------------------------------------------------------------------------------|
      1. |    BEL    CHN   -15.19767   2000   1         1         4       1 4      1    1 |
      2. |    BEL    CHN   -9.647018   2001   2         1         4       1 4      1    2 |
      3. |    BEL    CHN    .9251509   2002   3         1         4       1 4      1    3 |
      4. |    BEL    CHN   -4.807569   2003   4         1         4       1 4      1    4 |
      5. |    BEL    CHN    -13.3949   2004   5         1         4       1 4      1    5 |
         |--------------------------------------------------------------------------------|
     16. |    BGR    CHN   -48.62837   2000   1         2         4       2 4      2   16 |
     17. |    BGR    CHN   -58.43769   2001   2         2         4       2 4      2   17 |
     18. |    BGR    CHN   -47.20781   2002   3         2         4       2 4      2   18 |
     19. |    BGR    CHN   -44.87311   2003   4         2         4       2 4      2   19 |
     20. |    BGR    CHN   -46.10373   2004   5         2         4       2 4      2   20 |
         |--------------------------------------------------------------------------------|
     31. |    BRA    CHN    15.62497   2000   1         3         4       3 4      3   31 |
     32. |    BRA    CHN    16.77491   2001   2         3         4       3 4      3   32 |
     33. |    BRA    CHN    24.25788   2002   3         3         4       3 4      3   33 |
     34. |    BRA    CHN    28.50089   2003   4         3         4       3 4      3   34 |
     35. |    BRA    CHN    21.86575   2004   5         3         4       3 4      3   35 |
         |--------------------------------------------------------------------------------|
     46. |    CHN    BEL    4.807569   2003   1         4         1       1 4      1   46 |
     47. |    CHN    BEL    3.225987   2013   2         4         1       1 4      1   47 |
         |--------------------------------------------------------------------------------|
     48. |    CHN    BGR    48.62837   2000   1         4         2       2 4      2   48 |
     49. |    CHN    BGR    58.43769   2001   2         4         2       2 4      2   49 |
     50. |    CHN    BGR    44.87311   2003   3         4         2       2 4      2   50 |
         +--------------------------------------------------------------------------------+
    
    sort dyad year
    duplicates tag dyad year, gen(dups)
    list origin dest index_bilateral year dups dyad dyad_id orig_id dest_id in 1/50, sepby(dyad) nolabel
    
         +------------------------------------------------------------------------------+
         | origin   dest   index_b~l   year   dups   dyad   dyad_id   orig_id   dest_id |
         |------------------------------------------------------------------------------|
      1. |    BEL    CHN   -15.19767   2000      0      1       1 4         1         4 |
      2. |    BEL    CHN   -9.647018   2001      0      1       1 4         1         4 |
      3. |    BEL    CHN    .9251509   2002      0      1       1 4         1         4 |
      4. |    BEL    CHN   -4.807569   2003      1      1       1 4         1         4 |
      5. |    CHN    BEL    4.807569   2003      1      1       1 4         4         1 |
      6. |    BEL    CHN    -13.3949   2004      0      1       1 4         1         4 |
      7. |    BEL    CHN   -18.86142   2005      0      1       1 4         1         4 |
      8. |    BEL    CHN   -22.55604   2006      0      1       1 4         1         4 |
      9. |    BEL    CHN   -21.12828   2007      0      1       1 4         1         4 |
     10. |    BEL    CHN   -25.02423   2008      0      1       1 4         1         4 |
     11. |    BEL    CHN   -11.47142   2009      0      1       1 4         1         4 |
     12. |    BEL    CHN   -7.969518   2010      0      1       1 4         1         4 |
     13. |    BEL    CHN   -5.933108   2011      0      1       1 4         1         4 |
     14. |    BEL    CHN    .8011382   2012      0      1       1 4         1         4 |
     15. |    BEL    CHN   -3.225987   2013      1      1       1 4         1         4 |
     16. |    CHN    BEL    3.225987   2013      1      1       1 4         4         1 |
     17. |    BEL    CHN   -4.850701   2014      0      1       1 4         1         4 |
         |------------------------------------------------------------------------------|
     18. |    CHN    BGR    48.62837   2000      1      2       2 4         4         2 |
     19. |    BGR    CHN   -48.62837   2000      1      2       2 4         2         4 |
     20. |    CHN    BGR    58.43769   2001      1      2       2 4         4         2 |
     21. |    BGR    CHN   -58.43769   2001      1      2       2 4         2         4 |
     22. |    BGR    CHN   -47.20781   2002      0      2       2 4         2         4 |
     23. |    BGR    CHN   -44.87311   2003      1      2       2 4         2         4 |
     24. |    CHN    BGR    44.87311   2003      1      2       2 4         4         2 |
     25. |    BGR    CHN   -46.10373   2004      0      2       2 4         2         4 |
     26. |    BGR    CHN   -56.58496   2005      0      2       2 4         2         4 |
     27. |    CHN    BGR    68.55055   2006      1      2       2 4         4         2 |
     28. |    BGR    CHN   -68.55055   2006      1      2       2 4         2         4 |
     29. |    BGR    CHN   -59.63894   2007      0      2       2 4         2         4 |
     30. |    CHN    BGR    61.54499   2008      1      2       2 4         4         2 |
     31. |    BGR    CHN   -61.54499   2008      1      2       2 4         2         4 |
     32. |    BGR    CHN   -46.04298   2009      0      2       2 4         2         4 |
     33. |    BGR    CHN    -25.2457   2010      1      2       2 4         2         4 |
     34. |    CHN    BGR     25.2457   2010      1      2       2 4         4         2 |
     35. |    BGR    CHN   -22.00906   2011      0      2       2 4         2         4 |
     36. |    BGR    CHN   -21.91966   2012      0      2       2 4         2         4 |
     37. |    CHN    BGR    16.15915   2013      1      2       2 4         4         2 |
     38. |    BGR    CHN   -16.15915   2013      1      2       2 4         2         4 |
     39. |    BGR    CHN   -17.00152   2014      0      2       2 4         2         4 |
         |------------------------------------------------------------------------------|
     40. |    BRA    CHN    15.62497   2000      0      3       3 4         3         4 |
     41. |    CHN    BRA   -16.77491   2001      1      3       3 4         4         3 |
     42. |    BRA    CHN    16.77491   2001      1      3       3 4         3         4 |
     43. |    CHN    BRA   -24.25788   2002      1      3       3 4         4         3 |
     44. |    BRA    CHN    24.25788   2002      1      3       3 4         3         4 |
     45. |    BRA    CHN    28.50089   2003      0      3       3 4         3         4 |
     46. |    BRA    CHN    21.86575   2004      0      3       3 4         3         4 |
     47. |    BRA    CHN    18.91014   2005      0      3       3 4         3         4 |
     48. |    BRA    CHN    10.03562   2006      0      3       3 4         3         4 |
     49. |    BRA    CHN    6.474211   2007      0      3       3 4         3         4 |
     50. |    BRA    CHN   -1.002199   2008      0      3       3 4         3         4 |
         +------------------------------------------------------------------------------+
    Last edited by David Benson; 21 Feb 2019, 00:38.

    Comment


    • #3
      Dear David, Thanks for this helpful suggestion. I will give a try.
      Ho-Chuan (River) Huang
      Stata 17.0, MP(4)

      Comment


      • #4
        Simple comparison is still applicable for string. Thus, below code should work.
        Code:
        gen pair = cond(origin<dest,origin+"-"+dest,dest+"-"+origin)
        bys year pair (origin): keep if _n==1

        Comment


        • #5
          Dear Romalpa, This is a brilliant solution. Thanks a lot.
          Last edited by River Huang; 21 Feb 2019, 18:10.
          Ho-Chuan (River) Huang
          Stata 17.0, MP(4)

          Comment


          • #6
            Dear Romalpa, It seems to me that we can use
            Code:
            keep if origin < dest
            to reach the same result.

            Ho-Chuan (River) Huang
            Stata 17.0, MP(4)

            Comment


            • #7
              Yes, it is very sharp logic and would jump directly on the desired target (given that there are no “real” duplicates in your data). Thank you for sharing it out.

              Besides, it should also be noted that ”bysort year” is still needed.

              Comment


              • #8
                Dear Romalpa, Thanks for the reply. I am not quite sure if we need the "bys year" prefix. See the following
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str6(export import) float(year trade)
                "A" "B" 2016  10
                "A" "B" 2017  12
                "A" "C" 2016  -5
                "A" "C" 2017  -9
                "B" "A" 2016 -10
                "B" "A" 2017 -12
                "B" "C" 2016   4
                "B" "C" 2017   8
                "C" "A" 2016   5
                "C" "A" 2017   9
                "C" "B" 2016  -4
                "C" "B" 2017  -8
                end
                and
                Code:
                . preserve 
                
                . keep if export < import
                (6 observations deleted)
                
                . list, sep(0)
                
                     +--------------------------------+
                     | export   import   year   trade |
                     |--------------------------------|
                  1. |      A        B   2016      10 |
                  2. |      A        B   2017      12 |
                  3. |      A        C   2016      -5 |
                  4. |      A        C   2017      -9 |
                  5. |      B        C   2016       4 |
                  6. |      B        C   2017       8 |
                     +--------------------------------+
                
                . restore
                
                . 
                . bys year: keep if export < import
                (6 observations deleted)
                
                . sort export import year
                
                . list, sep(0)
                
                     +--------------------------------+
                     | export   import   year   trade |
                     |--------------------------------|
                  1. |      A        B   2016      10 |
                  2. |      A        B   2017      12 |
                  3. |      A        C   2016      -5 |
                  4. |      A        C   2017      -9 |
                  5. |      B        C   2016       4 |
                  6. |      B        C   2017       8 |
                     +--------------------------------+
                Ho-Chuan (River) Huang
                Stata 17.0, MP(4)

                Comment


                • #9
                  Yes, you are right. bys year is not needed for this logic, by just my overthinking. Thanks again.

                  Comment


                  • #10
                    Dear Romalpa, Thank you for your inspiration as well.
                    Ho-Chuan (River) Huang
                    Stata 17.0, MP(4)

                    Comment

                    Working...
                    X