Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Time Series: Create a continuous variable based on signals from dummy

    Hello everyone,

    I have a minor problem which I hope someone can help me solve.

    I am working with data on rental cars and aim to end up with a binomial variable indicating whether a given car is active or parked.

    An example of my dataset is as follows:
    Date CarID StartContract EndContract
    01.01.2006 1 1 0
    01.02.2006 1 0 0
    01.03.2006 1 0 1
    01.04.2006 1 1 0
    01.05.2006 1 0 1
    01.01.2006 2 1 0
    01.02.2006 2 0 0
    01.03.2006 2 0 0
    01.04.2006 2 0 1
    01.05.2006 2 0 0
    01.01.2006 3 0 0
    01.02.2006 3 0 0
    01.03.2006 3 1 0
    01.04.2006 3 0 1
    01.05.2006 3 0 0
    Explanation: So for this example, Car No. 1 has been on rental from 01. January 2006 to 03. January 2006, and again from 04. January 2006 to 05. January 2006. Car No. 2 has been on rental from 01. January 2006 to 04. January 2006. Lastly, Car No. 3 only had a rental contract from 03. January 2006 to 04. January 2006

    What I would like Stata to do for me is to fill in the blanks in an "Rental column" containing a dummy variable which is assigned the value "1" if the car is on rental and "0" if the car is parked, like this:
    Date CarID StartContract EndContract On rent (1/0)
    01.01.2006 1 1 0 1
    01.02.2006 1 0 1 1
    01.03.2006 1 0 0 0
    01.04.2006 1 1 0 1
    01.05.2006 1 0 1 1
    01.01.2006 2 1 0 1
    01.02.2006 2 0 0 1
    01.03.2006 2 0 0 1
    01.04.2006 2 0 1 1
    01.05.2006 2 0 0 0
    01.01.2006 3 0 0 0
    01.02.2006 3 0 0 0
    01.03.2006 3 1 0 1
    01.04.2006 3 0 1 1
    01.05.2006 3 1 0 0
    I really hope someone can be able to help with this, and would be very grateful for any tips that can lead me to solve it.

    Regards,
    Morten
    Last edited by Morten VT; 25 Oct 2016, 03:09.

  • #2
    Hi Morten
    This is actually the wrong fora, since your question has little to with mata.
    And to boot that your 2 datasets above are not quite the same.

    I would try something like the code below using -sort-, -by- and _n
    Code:
    generate d = date(date, "DMY")
    format %tdCCYY-NN-DD d
    order date d carid
    sort carid d
    by carid: generate on_rent = startcontract if _n == 1
    by carid:replace on_rent = (on_rent[_n-1] & !endcontract[_n-1]) | (!on_rent[_n-1] & startcontract) if _n > 1
    The -replace- handles
    • that the car is on rent last date and that the contract didn't end there
    • or that the car is not rented at the last date but is rented now
    Hope it works for you
    Kind regards

    nhb

    Comment


    • #3
      Depending on whether your date in your dataset is saved as a date or not you can ignore the first 3 lines (Ignore if date is in date format)
      Kind regards

      nhb

      Comment


      • #4
        Hi Niel Henrik,

        and thank you very much for your reply, I really appreciate it. I am truly sorry for posting this in the wrong forum, I should have made more research in regards of the forum-rules as I am a new member.

        However, the code works almost perfectly for my purpose. The only problem I can address is that the "on_rent" indicator does not seem to pick up the cases where a car's rental contract ends on the same day as a new contract starts. Thus, when this happens the "on_rent" indicator only imply "0" from that date.

        Is there an easy way around that?

        You are truly of great help, as this is a major part of my Master's thesis.

        Regards,
        Morten

        Comment


        • #5
          Try
          Code:
          sort carid d startcontract
          This way registrations on the same day are sorted such that the date with the new start comes last
          Kind regards

          nhb

          Comment


          • #6
            I am sorry to bother you Niels Henrik, however I cannot seem to get this correctly.

            Here is an example for what happens using your code:
            d: carid: startcontract: endcontract on_rent
            2006-01-01 1 1 0 1
            2006-01-02 1 0 0 1
            2006-01-03 1 0 0 1
            2006-01-04 1 1 1 1
            2006-01-05 1 0 0 0
            2006-01-06 1 0 0 0
            2006-01-07 1 0 0 0
            2006-01-08 1 0 1 0
            2006-01-09 1 0 0 0

            Thus, when startcontract and endcontract both equal "1" (indicating that a rental contract is expiring the same day as another is starting), the on_rent variable does not catch the new rental contract. In other words: The contract initiated 2006-01-04 to 2006-01-08 is not being accounted for.

            You have been of big help so far, and I do understand if this is of no interest to you. However, I really appreciate you're expertise.

            Regards,
            Morten

            Comment


            • #7
              Hi Morten
              If possible I would not mix events in a 1 1 event for the same date.
              You have two events on the same date:
              1. A 0 1 event for ending the existing contract
              2. A 1 0 event for starting a new
              If this is possible then
              Code:
              sort carid d startcontract
              should work.

              If you insist on a 1 1 event then
              Code:
              by carid:replace on_rent = (on_rent[_n-1] & !endcontract[_n-1]) | (!on_rent[_n-1] & startcontract) | (startcontract & endcontract) if _n > 1
              might work.
              I haven't tested any of these suggestions.

              As you can see allowing for 1 1 events makes the logic more complex and hence more error prone
              Good luck
              Kind regards

              nhb

              Comment

              Working...
              X