Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge (intersection)

    I have two datasets, ret and market_return. The ret file has more observations. To find intersection, I tried the following code.
    Code:
    merge 1:m company_id using ret
    variable company_id does not uniquely identify observations in the master data
    r(459);
    Am I using wrong code or is something wrong with my data format?

    Code:
    . describe
    
    Contains data from F:\Stata Working Directory\market_return.dta
      obs:        11,008                          
     vars:             3                          2 Jun 2018 17:20
     size:       110,080                          
    --------------------------------------------------------------------------------------------------------------------------
                  storage   display    value
    variable name   type    format     label      variable label
    --------------------------------------------------------------------------------------------------------------------------
    date            int     %td                   
    company_id      long    %12.0g     company_id2
                                                  
    market_return   long    %45.0g     market_return2
                                                  
    --------------------------------------------------------------------------------------------------------------------------
    Sorted by: company_id
    Code:
    . describe
    
    Contains data from F:\Stata Working Directory\ret.dta
      obs:        94,085                          
     vars:             3                          2 Jun 2018 16:39
     size:       940,850                          
    --------------------------------------------------------------------------------------------------------------------------
                  storage   display    value
    variable name   type    format     label      variable label
    --------------------------------------------------------------------------------------------------------------------------
    date            int     %td                   
    company_id      long    %12.0g     company_id2
                                                  
    ret             long    %45.0g     ret2       
    --------------------------------------------------------------------------------------------------------------------------
    Sorted by: company_id

  • #2
    variable company_id does not uniquely identify observations in the master data
    The message is self explanatory. There are values of the variable company_id that appear more than once in the first data set. If that isn't supposed to be the case, then there is something wrong with that data set and you need to fix the data. To find the offending repetitions of company_id you can run

    Code:
    duplicates tag company_id, gen(flag)
    browse if flag
    and then you can figure out what to do about it.

    If repeated values of company_id in that data set are supposed to be there, then your -merge- syntax is incompatible with that data set. Perhaps you meant to -merge m:1 company_id using ret-? Of course, that will only work if company_id uniquely identifies observations in that second data set.

    Based on the -describe- output, my hunch is that the same values of company_id appear more than once in both data sets and that what you really want to do is -merge company_id date using ret- to match each observation in the first data set with the observations for the same company on the same date in the second one.

    In any case, all of this is speculation because you do not show examples of the data in these two data sets. Without that, we can only guess what is going on. If the above suggestions do not solve your problem, please post back including examples of the data from both data sets. Use the -dataex- command to show the examples. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    Comment


    • #3
      You are right Clyde Schechter.
      Based on the -describe- output, my hunch is that the same values of company_id appear more than once in both data sets and that what you really want to do is -merge company_id date using ret- to match each observation in the first data set with the observations for the same company on the same date in the second one.
      Unfortunately, I have not still managed to solve the problem.

      Here is my sample data. Note that in the actual case, ret file has more observations. My goal is to get the "intersection".

      ret
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int date long(company_id ret)
      20494  1  5653
      20573  1  3508
      20494  2   230
      20573  2  7971
      20494  3  1370
      20573  3  2353
      20494  4  2052
      20573  4   178
      20494  5  3474
      20573  5  1956
      20494  6  2250
      20573  6  2413
      20494  7  2116
      20573  7   664
      20494  8   901
      20573  8  3232
      20494  9   310
      20573  9  1383
      20494 10   597
      20573 10  3436
      20494 11  1913
      20573 11  5052
      20494 12   812
      20573 12  1447
      20494 13  5653
      20573 13  5653
      20494 14  5668
      20573 14  3470
      20494 15  5653
      20573 15  5653
      20494 16  5653
      20573 16  5653
      20494 17   753
      20573 17  1312
      20494 18  5653
      20573 18  5653
      20494 19  5653
      20573 19  5653
      20494 20  2093
      20573 20  1150
      20494 21  2048
      20573 21  6177
      20494 22   137
      20573 22 12897
      20494 23  7153
      20573 23   352
      20494 24  5249
      20573 24 10899
      20494 25 12474
      20573 25 11777
      20494 26   777
      20573 26  5629
      20494 27  1865
      20573 27  1303
      20494 28  5300
      20573 28  5645
      20494 29   401
      20573 29  5640
      20494 30  2188
      20573 30  4993
      20494 31  3481
      20573 31   196
      20494 32  1232
      20573 32  9139
      20494 33  4652
      20573 33  9227
      20494 34  5658
      20573 34  6523
      20494 35  2867
      20573 35  8338
      20494 36  5653
      20573 36  5653
      20494 37  4047
      20573 37  7314
      20494 38   353
      20573 38  1533
      20494 39  5653
      20573 39  5653
      20494 40  5311
      20573 40  3487
      20494 41   961
      20573 41  3424
      20494 42   832
      20573 42 12924
      20494 43  7366
      20573 43 12921
      20494 44  4079
      20573 44  2085
      20494 45  1799
      20573 45  1799
      20494 46  3488
      20573 46 11849
      20494 47  7674
      20573 47 11247
      20494 48  5653
      20573 48  5653
      20494 49   470
      20573 49   672
      20494 50   708
      20573 50  5768
      end
      format %td date
      label values company_id company_id2
      label def company_id2 1 "BAPITZRK1005", modify
      label def company_id2 2 "BE0946620946", modify
      label def company_id2 3 "BG1100036026", modify
      label def company_id2 4 "CH0002178348", modify
      label def company_id2 5 "CH0012815459", modify
      label def company_id2 6 "CY0100120910", modify
      label def company_id2 7 "CY0100710710", modify
      label def company_id2 8 "CY0100962113", modify
      label def company_id2 9 "CY0101162119", modify
      label def company_id2 10 "DE0005141907", modify
      label def company_id2 11 "DE0005227201", modify
      label def company_id2 12 "DE0005493514", modify
      label def company_id2 13 "DE0006048267", modify
      label def company_id2 14 "DE0006053952", modify
      label def company_id2 15 "DE0006601602", modify
      label def company_id2 16 "DE0006778905", modify
      label def company_id2 17 "DE0008086000", modify
      label def company_id2 18 "DE0008114323", modify
      label def company_id2 19 "DE000A0B95Y8", modify
      label def company_id2 20 "DE000A0MSEQ3", modify
      label def company_id2 21 "DE000A13STW4", modify
      label def company_id2 22 "DE000A1X3X33", modify
      label def company_id2 23 "DE000A1YCMM2", modify
      label def company_id2 24 "DE000A2E3707", modify
      label def company_id2 25 "DK0015312474", modify
      label def company_id2 26 "ES0105026001", modify
      label def company_id2 27 "ES0113790226", modify
      label def company_id2 28 "ES0119037010", modify
      label def company_id2 29 "FI0009006381", modify
      label def company_id2 30 "FI0009008221", modify
      label def company_id2 31 "FI0009900336", modify
      label def company_id2 32 "FI4000157235", modify
      label def company_id2 33 "FR0000030827", modify
      label def company_id2 34 "FR0000038200", modify
      label def company_id2 35 "FR0000053514", modify
      label def company_id2 36 "FR0000061053", modify
      label def company_id2 37 "FR0000063091", modify
      label def company_id2 38 "FR0000075673", modify
      label def company_id2 39 "FR0000076994", modify
      label def company_id2 40 "FR0000121881", modify
      label def company_id2 41 "FR0000125684", modify
      label def company_id2 42 "FR0004042083", modify
      label def company_id2 43 "FR0004060671", modify
      label def company_id2 44 "FR0004176535", modify
      label def company_id2 45 "FR0010213215", modify
      label def company_id2 46 "FR0010263202", modify
      label def company_id2 47 "FR0010282822", modify
      label def company_id2 48 "FR0010396119", modify
      label def company_id2 49 "FR0010921163", modify
      label def company_id2 50 "FR0011284991", modify
      label values ret ret2
      label def ret2 137 "-1.36", modify
      label def ret2 178 "-1.77", modify
      label def ret2 196 "-1.95", modify
      label def ret2 230 "-10.29", modify
      label def ret2 310 "-11.11", modify
      label def ret2 352 "-11.54", modify
      label def ret2 353 "-11.55", modify
      label def ret2 401 "-12.05", modify
      label def ret2 470 "-12.77", modify
      label def ret2 597 "-14.12", modify
      label def ret2 664 "-14.81", modify
      label def ret2 672 "-14.89", modify
      label def ret2 708 "-15.25", modify
      label def ret2 753 "-15.73", modify
      label def ret2 777 "-15.98", modify
      label def ret2 812 "-16.38", modify
      label def ret2 832 "-16.6", modify
      label def ret2 901 "-17.35", modify
      label def ret2 961 "-17.98", modify
      label def ret2 1150 "-2.09", modify
      label def ret2 1232 "-2.91", modify
      label def ret2 1303 "-20.7", modify
      label def ret2 1312 "-20.79", modify
      label def ret2 1370 "-21.4", modify
      label def ret2 1383 "-21.57", modify
      label def ret2 1447 "-22.35", modify
      label def ret2 1533 "-23.4", modify
      label def ret2 1799 "-26.67", modify
      label def ret2 1865 "-27.63", modify
      label def ret2 1913 "-28.24", modify
      label def ret2 1956 "-28.81", modify
      label def ret2 2048 "-3", modify
      label def ret2 2052 "-3.04", modify
      label def ret2 2085 "-3.39", modify
      label def ret2 2093 "-3.47", modify
      label def ret2 2116 "-3.7", modify
      label def ret2 2188 "-30.6", modify
      label def ret2 2250 "-31.65", modify
      label def ret2 2353 "-33.3", modify
      label def ret2 2413 "-34.18", modify
      label def ret2 2867 "-4.89", modify
      label def ret2 3232 "-46.26", modify
      label def ret2 3424 "-5.15", modify
      label def ret2 3436 "-5.27", modify
      label def ret2 3470 "-5.62", modify
      label def ret2 3474 "-5.66", modify
      label def ret2 3481 "-5.73", modify
      label def ret2 3487 "-5.79", modify
      label def ret2 3488 "-5.8", modify
      label def ret2 3508 "-50", modify
      label def ret2 4047 "-6.59", modify
      label def ret2 4079 "-6.91", modify
      label def ret2 4652 "-7.75", modify
      label def ret2 4993 "-8.029999999999999", modify
      label def ret2 5052 "-8.619999999999999", modify
      label def ret2 5249 "-9.220000000000001", modify
      label def ret2 5300 "-9.74", modify
      label def ret2 5311 "-9.85", modify
      label def ret2 5629 ".76", modify
      label def ret2 5640 ".87", modify
      label def ret2 5645 ".92", modify
      label def ret2 5653 "0", modify
      label def ret2 5658 "1.04", modify
      label def ret2 5668 "1.14", modify
      label def ret2 5768 "10.17", modify
      label def ret2 6177 "11.68", modify
      label def ret2 6523 "12.94", modify
      label def ret2 7153 "15.47", modify
      label def ret2 7314 "16.11", modify
      label def ret2 7366 "16.67", modify
      label def ret2 7674 "18.15", modify
      label def ret2 7971 "2.06", modify
      label def ret2 8338 "22.21", modify
      label def ret2 9139 "29.61", modify
      label def ret2 9227 "3.27", modify
      label def ret2 10899 "46.84", modify
      label def ret2 11247 "50.68", modify
      label def ret2 11777 "6.14", modify
      label def ret2 11849 "6.88", modify
      label def ret2 12474 "7.89", modify
      label def ret2 12897 "8.08", modify
      label def ret2 12921 "8.33", modify
      label def ret2 12924 "8.359999999999999", modify
      market_return
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int date long(company_id market_return)
      20494  1 177
      20573  1 639
      20494  2 263
      20573  2 605
      20494  3 157
      20573  3 824
      20494  4 157
      20573  4 824
      20573  5 824
      20494  5 157
      20573  6 605
      20494  6 263
      20494  7 263
      20573  7 605
      20494  8 263
      20573  8 605
      20494  9 263
      20573  9 605
      20494 10 263
      20573 10 605
      20494 11 263
      20573 11 605
      20494 12 263
      20573 12 605
      20573 13 605
      20494 13 263
      20573 14 605
      20494 14 263
      20494 15 263
      20573 15 605
      20573 16 605
      20494 16 263
      20573 17 605
      20494 17 263
      20494 18 263
      20573 18 605
      20573 19 605
      20494 19 263
      20494 20 263
      20573 20 605
      20494 21 223
      20573 21 417
      20494 22 228
      20573 22 422
      20573 23 422
      20494 23 228
      20573 24 422
      20494 24 228
      20494 25 194
      20573 25 538
      20494 26 194
      20573 26 538
      20573 27 538
      20494 27 194
      20494 28 194
      20573 28 538
      20573 29 384
      20494 29 174
      20573 30 384
      20494 30 174
      20573 31 384
      20494 31 174
      20494 32 174
      20573 32 384
      20573 33 384
      20494 33 174
      20494 34 174
      20573 34 384
      20494 35 174
      20573 35 384
      20573 36 384
      20494 36 174
      20494 37 174
      20573 37 384
      20573 38 384
      20494 38 174
      20573 39 384
      20494 39 174
      20494 40 174
      20573 40 384
      20573 41 384
      20494 41 174
      20494 42 174
      20573 42 384
      20494 43 174
      20573 43 384
      20494 44 174
      20573 44 384
      20573 45 384
      20494 45 174
      20494 46 174
      20573 46 384
      20494 47 174
      20573 47 384
      20494 48 767
      20573 48 811
      20573 49 811
      20494 49 767
      20573 50 811
      20494 50 767
      end
      format %td date
      label values company_id company_id2
      label def company_id2 1 "CH0002178348", modify
      label def company_id2 2 "CH0012815459", modify
      label def company_id2 3 "CY0100120910", modify
      label def company_id2 4 "CY0100962113", modify
      label def company_id2 5 "CY0101162119", modify
      label def company_id2 6 "DE0005141907", modify
      label def company_id2 7 "DE0005227201", modify
      label def company_id2 8 "DE0005493514", modify
      label def company_id2 9 "DE0006048267", modify
      label def company_id2 10 "DE0006053952", modify
      label def company_id2 11 "DE0006601602", modify
      label def company_id2 12 "DE0006778905", modify
      label def company_id2 13 "DE0008086000", modify
      label def company_id2 14 "DE0008114323", modify
      label def company_id2 15 "DE000A0B95Y8", modify
      label def company_id2 16 "DE000A0MSEQ3", modify
      label def company_id2 17 "DE000A13STW4", modify
      label def company_id2 18 "DE000A1X3X33", modify
      label def company_id2 19 "DE000A1YCMM2", modify
      label def company_id2 20 "DE000A2E3707", modify
      label def company_id2 21 "DK0015312474", modify
      label def company_id2 22 "ES0105026001", modify
      label def company_id2 23 "ES0113790226", modify
      label def company_id2 24 "ES0119037010", modify
      label def company_id2 25 "FI0009006381", modify
      label def company_id2 26 "FI0009008221", modify
      label def company_id2 27 "FI0009900336", modify
      label def company_id2 28 "FI4000157235", modify
      label def company_id2 29 "FR0000030827", modify
      label def company_id2 30 "FR0000038200", modify
      label def company_id2 31 "FR0000053514", modify
      label def company_id2 32 "FR0000061053", modify
      label def company_id2 33 "FR0000063091", modify
      label def company_id2 34 "FR0000075673", modify
      label def company_id2 35 "FR0000076994", modify
      label def company_id2 36 "FR0000121881", modify
      label def company_id2 37 "FR0000125684", modify
      label def company_id2 38 "FR0004042083", modify
      label def company_id2 39 "FR0004060671", modify
      label def company_id2 40 "FR0004176535", modify
      label def company_id2 41 "FR0010213215", modify
      label def company_id2 42 "FR0010263202", modify
      label def company_id2 43 "FR0010282822", modify
      label def company_id2 44 "FR0010396119", modify
      label def company_id2 45 "FR0010921163", modify
      label def company_id2 46 "FR0011284991", modify
      label def company_id2 47 "FR0013182235", modify
      label def company_id2 48 "GB0000031285", modify
      label def company_id2 49 "GB0000282623", modify
      label def company_id2 50 "GB0000608009", modify
      label values market_return market_return2
      label def market_return2 157 "-11.87", modify
      label def market_return2 174 "-12.25", modify
      label def market_return2 177 "-12.32", modify
      label def market_return2 194 "-12.67", modify
      label def market_return2 223 "-13.859", modify
      label def market_return2 228 "-14.11", modify
      label def market_return2 263 "-16.06", modify
      label def market_return2 384 "-3.39", modify
      label def market_return2 417 "-4.039", modify
      label def market_return2 422 "-4.06", modify
      label def market_return2 538 "-5.55", modify
      label def market_return2 605 "-6.55", modify
      label def market_return2 639 "-7.1", modify
      label def market_return2 767 "-9.34", modify
      label def market_return2 811 ".72", modify
      label def market_return2 824 "1.17", modify

      Comment


      • #4
        Well, the example you show does not illustrate your problem. When save each of these files and attempt to -merge 1:1 company_id date...- Stata does the -merge- with no complaint or error message. And, at least in the example data, the two files match exactly.

        So you need to post a different example that illustrates your problem.

        That said, you have a much bigger problem with this data that you need to fix promptly. Your ret and market_return variables are all wrong. Evidently they were originally imported from some source and they came into Stata as string variables. Then somebody tried to turn them numeric by using -encode-, but that means that their values are 1, 2, 3, 4, .... etc. and not the numbers you see when you look at a -list- output or in the browser. If you try to do calculations with these variables you will get garbage. So you need to fix that promptly. Ideally, go back and re-generate both of those files, and if they are again imported as string variables, use -destring-, not -encode-. If you cannot go back to the source data, then in each of these files you need to apply -decode- to re-create the string variable and then run -destring-.

        See -help destring- and -help encode- and -help decode-.

        Comment

        Working...
        X