Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging datasets under conditions

    Hello all!

    I need some help with a merging of two datasets.

    In the first one I have the data of birth of every individual. In the second one I have a list of programs and for each of them I have the begin date and the end date.

    I need to merge the two datasets in order to obtain for every individual the program that was in place when the individual was born.

    The condition should be: date_begin<=date_birth<=date_end

    An additional problem is that for every individual there will be two programs since at each moment there were two programs in place.

    Any suggestions to solve this?


    Thanks a lot,

    Matteo

  • #2
    This is not a -merge- problem. Use the -joinby- command, and then drop observations where date_birth is not in range.

    Comment


    • #3
      I was thinking about this command but how can I use it if the values are different?

      Comment


      • #4
        For -joinby-, you can have, but do not have to have, a common variable. So you have two data sets. Say people.dta has person-level data (ids, dates of birth), and programs.dta has program-level data (start and end date). Then it's just:

        Code:
        use people, clear
        joinby using programs
        keep if inrange(date_birth, date_begin, date_end)
        Note: Assumes that all of the date_* variables are correct Stata numeric date variables. If they are just human-readable strings, you will get incorrect results--you need to convert to Stata numeric date variables first.

        Comment


        • #5
          I did as you said but it gives me this error: "no variables in common to master and using data thus a join is not possible"...

          Comment


          • #6
            Sorry, you are right. It is -cross-, not -joinby- that is needed here:

            Code:
            use people, clear
            cross using programs
            keep if inrange(date_birth, date_begin, date_end)

            Comment


            • #7
              I read and agreed with Clyde's answer, then saw your reply. I think I noticed the same issue when I used joinby recently. I believe I resolved it by adding a constant "variable" to both files. But I now see that Clyde has provided the cross command, which I was unaware of.

              As a comment for Statlist regulars, it seems to me that joinby could subsume cross with an option to allow it to bypass the grouping. Since doing so, and for that matter using cross, could conceivably result in unnecessarily large datasets, adding a keep option akin to what's available with merge might help both joinby and cross avoid that by dropping observations on the fly. Something for the wishlist?
              Last edited by William Lisowski; 30 Oct 2015, 11:22.

              Comment


              • #8
                Thanks a lot! It worked perfectly!

                Comment

                Working...
                X