Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge two datesets by ID and closest date

    Hi everyone,

    this question might be asked few times, but I did not find a solution online. I have a dataset 1, which has a date variable including business days (Mondays - Fridays) and weekends.
    Click image for larger version

Name:	123.PNG
Views:	1
Size:	10.1 KB
ID:	1689665


    and dataset 2, which has a date variable including only business days.
    Click image for larger version

Name:	234.PNG
Views:	1
Size:	10.6 KB
ID:	1689666


    I would like to merge dataset1 and dataset2 by id and date. Since dataset1's date has weekends, whenever it is the weekends, match it with last Friday or next Monday (or even the best, always the last Friday) from dataset2's date. So the merged dateset looks like this:
    Click image for larger version

Name:	234545.PNG
Views:	1
Size:	13.7 KB
ID:	1689667


    The original data has more than 1 million observations. Thank you very much for your help in advance.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str9 person str12 id float date str1 index
    "XXX" "A" 19731 "1"
    "XXX" "A" 19732 "2"
    "XXX" "A" 19733 "3"
    "XXX" "A" 19734 "2"
    "XXX" "A" 19735 "4"
    "XXX" "A" 19736 "3"
    "XXX" "A" 19737 "4"
    "XXX" "A" 19738 "9"
    "XXX" "A" 19739 "2"
    "XXX" "B" 19740 "3"
    "XXX" "B" 19741 "2"
    "XXX" "B" 19742 "4"
    "XXX" "B" 19743 "6"
    "XXX" "B" 19744 "4"
    "XXX" "B" 19745 "6"
    end
    format %td date
    save "data1.dta", replace

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str12 id int date double price
    "A" 19724 1
    "A" 19725 2
    "A" 19726 3
    "A" 19729 2
    "A" 19730 4
    "A" 19731 2
    "A" 19732 4
    "A" 19733 2
    "A" 19736 3
    "A" 19737 4
    "A" 19738 4
    "B" 19739 5
    "B" 19740 5
    "B" 19743 5
    "B" 19744 5
    "B" 19745 2
    "B" 19746 0
    "B" 19747 2
    "B" 19750 4
    "B" 19751 5
    "B" 19752 4
    "B" 19753 3
    end
    format %td date













  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str9 person str12 id float date str1 index
    "XXX" "A" 19731 "1"
    "XXX" "A" 19732 "2"
    "XXX" "A" 19733 "3"
    "XXX" "A" 19734 "2"
    "XXX" "A" 19735 "4"
    "XXX" "A" 19736 "3"
    "XXX" "A" 19737 "4"
    "XXX" "A" 19738 "9"
    "XXX" "A" 19739 "2"
    "XXX" "B" 19740 "3"
    "XXX" "B" 19741 "2"
    "XXX" "B" 19742 "4"
    "XXX" "B" 19743 "6"
    "XXX" "B" 19744 "4"
    "XXX" "B" 19745 "6"
    end
    format %td date
    
    tempfile file1
    save `file1', replace
    
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str12 id int date double price
    "A" 19724 1
    "A" 19725 2
    "A" 19726 3
    "A" 19729 2
    "A" 19730 4
    "A" 19731 2
    "A" 19732 4
    "A" 19733 2
    "A" 19736 3
    "A" 19737 4
    "A" 19738 4
    "B" 19739 5
    "B" 19740 5
    "B" 19743 5
    "B" 19744 5
    "B" 19745 2
    "B" 19746 0
    "B" 19747 2
    "B" 19750 4
    "B" 19751 5
    "B" 19752 4
    "B" 19753 3
    end
    format %td date
    
    * Actual codes start here:
    merge 1:1 id date using `file1', nogen
    bysort id (date): replace price = price[_n-1] if dow(date) == 6 & date - date[_n-1] == 1
    bysort id (date): replace price = price[_n-2] if dow(date) == 0 & date - date[_n-2] == 2
    keep if !missing(person)
    Results:
    Code:
         +-----------------------------------------+
         | id        date   price   person   index |
         |-----------------------------------------|
      1. |  A   08jan2014       2      XXX       1 |
      2. |  A   09jan2014       4      XXX       2 |
      3. |  A   10jan2014       2      XXX       3 |
      4. |  A   11jan2014       2      XXX       2 |
      5. |  A   12jan2014       2      XXX       4 |
      6. |  A   13jan2014       3      XXX       3 |
      7. |  A   14jan2014       4      XXX       4 |
      8. |  A   15jan2014       4      XXX       9 |
      9. |  A   16jan2014       .      XXX       2 |
         |-----------------------------------------|
     10. |  B   17jan2014       5      XXX       3 |
     11. |  B   18jan2014       5      XXX       2 |
     12. |  B   19jan2014       5      XXX       4 |
     13. |  B   20jan2014       5      XXX       6 |
     14. |  B   21jan2014       5      XXX       4 |
     15. |  B   22jan2014       2      XXX       6 |
         +-----------------------------------------+
    Last edited by Ken Chui; 16 Nov 2022, 06:12.

    Comment


    • #3
      Originally posted by Ken Chui View Post
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str9 person str12 id float date str1 index
      "XXX" "A" 19731 "1"
      "XXX" "A" 19732 "2"
      "XXX" "A" 19733 "3"
      "XXX" "A" 19734 "2"
      "XXX" "A" 19735 "4"
      "XXX" "A" 19736 "3"
      "XXX" "A" 19737 "4"
      "XXX" "A" 19738 "9"
      "XXX" "A" 19739 "2"
      "XXX" "B" 19740 "3"
      "XXX" "B" 19741 "2"
      "XXX" "B" 19742 "4"
      "XXX" "B" 19743 "6"
      "XXX" "B" 19744 "4"
      "XXX" "B" 19745 "6"
      end
      format %td date
      
      tempfile file1
      save `file1', replace
      
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str12 id int date double price
      "A" 19724 1
      "A" 19725 2
      "A" 19726 3
      "A" 19729 2
      "A" 19730 4
      "A" 19731 2
      "A" 19732 4
      "A" 19733 2
      "A" 19736 3
      "A" 19737 4
      "A" 19738 4
      "B" 19739 5
      "B" 19740 5
      "B" 19743 5
      "B" 19744 5
      "B" 19745 2
      "B" 19746 0
      "B" 19747 2
      "B" 19750 4
      "B" 19751 5
      "B" 19752 4
      "B" 19753 3
      end
      format %td date
      
      * Actual codes start here:
      merge 1:1 id date using `file1', nogen
      bysort id (date): replace price = price[_n-1] if dow(date) == 6 & date - date[_n-1] == 1
      bysort id (date): replace price = price[_n-2] if dow(date) == 0 & date - date[_n-2] == 2
      keep if !missing(person)
      Results:
      Code:
      +-----------------------------------------+
      | id date price person index |
      |-----------------------------------------|
      1. | A 08jan2014 2 XXX 1 |
      2. | A 09jan2014 4 XXX 2 |
      3. | A 10jan2014 2 XXX 3 |
      4. | A 11jan2014 2 XXX 2 |
      5. | A 12jan2014 2 XXX 4 |
      6. | A 13jan2014 3 XXX 3 |
      7. | A 14jan2014 4 XXX 4 |
      8. | A 15jan2014 4 XXX 9 |
      9. | A 16jan2014 . XXX 2 |
      |-----------------------------------------|
      10. | B 17jan2014 5 XXX 3 |
      11. | B 18jan2014 5 XXX 2 |
      12. | B 19jan2014 5 XXX 4 |
      13. | B 20jan2014 5 XXX 6 |
      14. | B 21jan2014 5 XXX 4 |
      15. | B 22jan2014 2 XXX 6 |
      +-----------------------------------------+
      Thanks a lot.

      Comment

      Working...
      X