Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • simple m:1 merge

    Dear Stata Users

    I am performing simple m:1 merge (key variable is code)on two following datasets:

    Master File:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float date long tradenumber str6 code double(tradetime tradeprice)
    19725 560042 "1101  " 41656000  45.4
    19725 517187 "1101  " 40576000 45.35
    19725 358890 "1101  " 36976000  45.4
    19725 743572 "1101  " 46427000 45.25
    19725 344748 "1101  " 36796000  45.5
    19725 626748 "1102  " 43400000 38.15
    19725 197523 "1102  " 34503000  38.4
    19725 346654 "1102  " 36829000  38.1
    19725 613913 "1102  " 43070000  38.2
    19725 530370 "1103  " 40866000  15.7
    19725 547782 "1103  " 41317000 15.75
    19725 371040 "1103  " 37101000  15.7
    19725 331446 "070494" 36551000   .86
    19725 331454 "070973" 36551000  1.38
    end
    format %tdDD/NN/CCYY date
    format %tc_HH:MM:SS tradetime
    Using file:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str4 code str12 ISIN
    "1101" "TW0001101004"
    "1102" "TW0001102002"
    "1103" "TW0001103000"
    end
    Code:
     merge m:1 code using "G:\CODES.dta"
    Result # of obs.
    -----------------------------------------
    not matched 17
    from master 14 (_merge==1)
    from using 3 (_merge==2)

    matched 0 (_merge==3)
    -----------------------------------------

    Is the length of the code variable a problem? Or what am I missing here?

  • #2
    Hi Olena,
    it seems as if you have blank spaces in the code in the master file! Stata is sensitive to those, so make sure your codes really match precisely.
    I think you can do so by using
    gen new_code = substr(code, 1, 4) if substr(code, 1, 2) == "11"
    replace new_code = code if new_code==.

    and then match using this new code (ensuring that the master and using file have the same variable name).

    Comment


    • #3
      Barbara

      Thank you.

      Your code works for the sample data I've posted. In the original dataset, however, code does not necessarily have the first two numbers =="11".

      How can I change the code variable in the using file to have 2 blank spaces after 4 digits?

      Regards,
      Olena

      Comment


      • #4
        Barbara

        I think I have sorted my issue by removing trailing blanks:

        Code:
        gen code_n=strrtrim( code )
        Thank you.

        Regards,
        Olena
        Last edited by Olena Onishchenko; 19 Nov 2019, 17:17.

        Comment


        • #5
          Perfect! Glad you sorted it out!

          Comment

          Working...
          X