  • Splitting time spans to match them with variable date interview data

    Hi All,

    I have spent a couple of days on this problem and nothing I try seems quite right, so I am hoping someone can help. I am using Stata MP 14.2.

    I have two sets of data on the same individuals: one is administrative time-span data and the other is a regular longitudinal survey. Note that the actual day of the survey interview can fall anywhere in an 18-month window, so each subject has different interview dates and a different amount of time between interviews.

    I want to split any spell that straddles an interview date so that each resulting spell is associated with only one wave of the longitudinal data.

    Here is a mock-up of the data structure I am dealing with:
    id  start      finish     interview
    1   12jan2012  12feb2012  14mar2012
    1   13feb2012  30jun2013  14mar2012
    2   9aug2013   21nov2013  31oct2013
    2   22nov2013  29nov2013  31oct2013
    As you can see, I have span data covering each individual's history, and each individual has an interview date that differs between ids but is constant within id (there is a separate variable for each wave of interviews, each holding a different date).

    What I want the data to look like is this:
    id  start      finish     interview  wave
    1   12jan2012  12feb2012  14mar2012  0
    1   13feb2012  13mar2012  14mar2012  0
    1   14mar2012  30jun2013  14mar2012  1
    2   9aug2013   30oct2013  31oct2013  0
    2   31oct2013  21nov2013  31oct2013  1
    2   22nov2013  29nov2013  31oct2013  1
    So I am trying to split the spans at the interview date. I have looked at the command
    Code:
    stsplit
    but it seems to split only at fixed intervals of analysis time, not at calendar dates that vary by individual.

    I am open to any suggestions.

    Thank you in advance

  • #2
    Code:
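    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte id float(start finish interview)
    1 19004 19035 19066
    1 19036 19539 19066
    2 19579 19683 19662
    2 19684 19691 19662
    end
    format %td start
    format %td finish
    format %td interview
    
    * Each id must have at most one spell per start date
    isid id start
    * Flag the spell that contains the interview date
    gen byte transition = inrange(interview, start, finish)
    * Duplicate the flagged spell so it can be cut in two
    expand 2 if transition
    * First copy ends the day before the interview; unflagged spells are left untouched
    by id transition (start), sort: replace finish = interview-1 if transition & _n == 1
    * Second copy starts on the interview date
    by id transition (start): replace start = interview if transition & _n > 1
    * wave is 0 before the interview and 1 from the interview date onward
    by id (start), sort: gen wave = (start >= interview)
    drop transition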
    In the future, when showing data examples, please use the -dataex- command to do so as I have here. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
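
    As a sanity check on the split data, a minimal sketch along the following lines could be run after the code above. It assumes the variable names used there (id, start, finish, interview, wave) and that each spell includes both its start and finish dates. The particular checks are illustrative additions, not part of the solution itself.
    Code:
    * No spell should end before it starts
    * (this would flag an interview that falls on the first day of a spell)
    assert finish >= start
    * No spell should straddle the interview date
    assert !(start < interview & finish >= interview)
    * Within id, consecutive spells should not overlap
    by id (start), sort: assert start > finish[_n-1] if _n > 1
    * Inspect the result, one block per id
    list id start finish interview wave, sepby(id) noobs
    If any assert fails, Stata stops with an error and reports how many observations contradict the condition, which points to the spells that need a closer look.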


    • #3
      Hi Clyde,

      I was aware of -dataex-; I have read the guidelines.

      My data is confidential, hence the use of an analogue. I did not know that had to be stated before posting.

      Your solution is much more efficient than the one I came up with after posting. I will look at implementing it.

      • #4
        I certainly understand confidentiality issues: as an epidemiologist I work under those restrictions all the time. When you need to show example data and you have confidentiality issues, the solution is to make a fake data set, one with data that is not real but close enough to represent the issues and problems you face. In fact, I imagine that is what you did to create the tableau in #1. Then, instead of typing that data into an HTML tableau, open up Stata's data editor and put that data there. Then apply -dataex- to that. It's no more time consuming for you, probably actually a bit less so. And it's a lot more helpful to those who want to try to solve your problem.

        I'm sorry for complaining, but it took me a lot longer to wrestle your tableau into Stata so I could try out my code than it took me to actually figure out, write, and test the code.
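
        One way to script the workflow described above, instead of using the data editor, might look like the sketch below. It types the mock-up from #1 in as strings and converts them to Stata daily dates before calling -dataex-. The temporary variable names (start_s, finish_s, interview_s) are hypothetical, and 30jun2013 is assumed for the date shown as 31jun2013 in #1, since June has 30 days.
        Code:
        clear
        * Enter the fake data as strings first (the *_s names are placeholders)
        input byte id str9(start_s finish_s interview_s)
        1 "12jan2012" "12feb2012" "14mar2012"
        1 "13feb2012" "30jun2013" "14mar2012"
        2 "9aug2013"  "21nov2013" "31oct2013"
        2 "22nov2013" "29nov2013" "31oct2013"
        end
        * Convert each string to a numeric daily date and give it a date format
        foreach v in start finish interview {
            gen long `v' = daily(`v'_s, "DMY")
            format %td `v'
            drop `v'_s
        }
        * Produce a copy-and-paste-ready listing for the forum post
        dataex
        The output of -dataex- can then be pasted into the post between code delimiters, and anyone can recreate the example by running it.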

        • #5
          No, fair enough, mate. I didn't realise that was the issue (although I totally get it, as anyone who has ever had to extract colour coding from an Excel file will know). I will keep it in mind for future posts.
