  • Loop through date variable checking consistency

    Hi Stata Users,

    I have a dataset with date variables, as shown below, and would like to loop through the dates to flag any inconsistencies, namely:
    1. ActualDate1590 being after ActualDate1600, ActualDate1615, or ActualDate1620
    2. ActualDate1600 being after ActualDate1615 or ActualDate1620
    3. ActualDate1615 being after ActualDate1620
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float id int(ActualDate1590 ActualDate1600 ActualDate1615 ActualDate1620)
     1 21830 21837     .     .
     2 22333 22354     .     .
     3 21858 21775     .     .
     4 20997 20994 22153     .
     5     . 20746 22197     .
     6     . 20248     .     .
     7     . 20830     .     .
     8 21973 21975     .     .
     9 22434 22501     .     .
    10 22517 22510     .     .
    11 21771 21062     .     .
    12 21172 21180 22589     .
    13 21664 21703 22211     .
    14 22396 22424     .     .
    15 22386 22356     .     .
    16 22617     .     .     .
    17     . 20966     .     .
    18 22537 22436     .     .
    19 21951 21977     . 22333
    20 21951 21979 22221     .
    end
    format %tdnn/dd/CCYY ActualDate1590
    format %tdnn/dd/CCYY ActualDate1600
    format %tdnn/dd/CCYY ActualDate1615
    format %tdnn/dd/CCYY ActualDate1620
    Thanks in advance!

  • #2
    It's not entirely clear what you want here. In any case, no loop is needed. You just need to have the data in the much more usable long layout:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float id int(ActualDate1590 ActualDate1600 ActualDate1615 ActualDate1620)
     1 21830 21837     .     .
     2 22333 22354     .     .
     3 21858 21775     .     .
     4 20997 20994 22153     .
     5     . 20746 22197     .
     6     . 20248     .     .
     7     . 20830     .     .
     8 21973 21975     .     .
     9 22434 22501     .     .
    10 22517 22510     .     .
    11 21771 21062     .     .
    12 21172 21180 22589     .
    13 21664 21703 22211     .
    14 22396 22424     .     .
    15 22386 22356     .     .
    16 22617     .     .     .
    17     . 20966     .     .
    18 22537 22436     .     .
    19 21951 21977     . 22333
    20 21951 21979 22221     .
    end
    format %tdnn/dd/CCYY ActualDate1590
    format %tdnn/dd/CCYY ActualDate1600
    format %tdnn/dd/CCYY ActualDate1615
    format %tdnn/dd/CCYY ActualDate1620
    
    reshape long ActualDate, i(id) j(seq)
    drop if missing(ActualDate)
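    * egen's documentation cautions against explicit subscripts (_n, _N),
    * so compare each date with the next one under -by- first, then take the group maximum
    by id (seq): gen byte later_than_next = ActualDate > ActualDate[_n+1] & _n < _N
    by id: egen byte out_of_order = max(later_than_next)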
    browse if out_of_order
    This will identify all of the observations that have any of these inconsistencies and show them to you in the browser.
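
    To see what the flag does, here is a minimal sketch on a hypothetical three-id toy dataset (the values are made up for illustration). It assumes the flag is built in two steps, a row-level comparison followed by a group maximum, and it shows why the missing dates are dropped first: Stata treats a missing value as larger than any number, so a missing date followed by a real one would otherwise look out of order.

```stata
clear
input float id int(ActualDate1590 ActualDate1600 ActualDate1615)
1 21830 21837     .
2 21858 21775     .
3     . 20746 22197
end

reshape long ActualDate, i(id) j(seq)

* Missing counts as larger than every number in Stata, so without this drop
* id 3 (missing 1590 date, then a real 1600 date) would be flagged.
drop if missing(ActualDate)

* Compare each date with the next one inside the same id.
by id (seq), sort: gen byte later_than_next = ActualDate > ActualDate[_n+1] & _n < _N

* Spread the flag to every row of an id that has at least one reversal.
by id: egen byte out_of_order = max(later_than_next)

list id seq ActualDate out_of_order, sepby(id)
```

    In this toy data only id 2 is flagged, because its 1590 date falls after its 1600 date.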

    The main question is what to do about it. It may be that these inconsistencies arose through errors in data management that created the data set. If so, then rather than just rearranging the order, you need to review how the data set was created, find the error (and any other errors that may be there but haven't tripped you up yet), fix that (those), and re-generate the data set.

    If, however, it is reasonable for these to have been out of order in the original data set, and you just need to put them in correct order for your purposes going forward, you can follow up the above code with:

    Code:
    by id (ActualDate), sort: replace seq = 1590 if _n == 1
    by id (ActualDate): replace seq = 1600 if _n == 2
    by id (ActualDate): replace seq = 1615 if _n == 3
    by id (ActualDate): replace seq = 1620 if _n == 4
    And if you have some compelling need to go back to wide layout, -reshape wide- will take you there. But don't do that unless you are sure you must: almost everything in Stata's data management and analysis is easier, or only possible, with long data.
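
    If the wide layout really is needed, the round trip might look like the sketch below. It uses a hypothetical two-id toy dataset that is already in long form. Two things to keep in mind: -reshape wide- only creates a variable for each value of seq that actually occurs in the data, and every other variable must be constant within id or the command will stop with an error.

```stata
clear
input float id int(seq ActualDate)
1 1590 21830
1 1600 21837
2 1590 21775
2 1600 21858
end
format %tdnn/dd/CCYY ActualDate

* One row per id, one ActualDate#### variable per value of seq.
reshape wide ActualDate, i(id) j(seq)

list
```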

    • #3
      Code:
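      * Each mark is 1 if any later-stage date is on or before this stage's date,
      * 0 if not, and missing when this stage's own date is missing.
      * A missing later date counts as larger than any number, so it never sets the mark.
      gen mark1590 = ActualDate1600 <= ActualDate1590 | ///
                     ActualDate1615 <= ActualDate1590 | ///
                     ActualDate1620 <= ActualDate1590 if !missing(ActualDate1590)
                     
      gen mark1600 = ActualDate1615 <= ActualDate1600 | ///
                     ActualDate1620 <= ActualDate1600 if !missing(ActualDate1600)              
                     
      gen mark1615 = ActualDate1620 <= ActualDate1615 if !missing(ActualDate1615)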
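
      Since the original question asked for a loop, the same pairwise checks can also be sketched with nested -forvalues- loops over the stage suffixes. This is only an illustration on a hypothetical five-id toy dataset; the names stages and inconsistent are made up here. Unlike the marks above, this sketch produces a single flag per id and uses a strict comparison, so equal dates are not flagged.

```stata
clear
input float id int(ActualDate1590 ActualDate1600 ActualDate1615 ActualDate1620)
1 21830 21837     .     .
2 21858 21775     .     .
3     . 20746 22197     .
4 21951 21977     . 22333
5 20997 20994 22153     .
end

local stages 1590 1600 1615 1620
local n : word count `stages'
local last = `n' - 1
gen byte inconsistent = 0

* Visit every (earlier stage, later stage) pair exactly once.
forvalues i = 1/`last' {
    local a : word `i' of `stages'
    local start = `i' + 1
    forvalues j = `start'/`n' {
        local b : word `j' of `stages'
        * Flag only when both dates are present and the earlier stage is strictly later.
        replace inconsistent = 1 if ActualDate`a' > ActualDate`b' ///
            & !missing(ActualDate`a', ActualDate`b')
    }
}

list id inconsistent
```

      In this toy data ids 2 and 5 are flagged, because in both the 1590 date falls after the 1600 date.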
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------

      • #4
        Hi Clyde Schechter

        Thanks so much for the awesome answer. I appreciate you going a step further and suggesting approaches to dealing with inconsistencies in the data.

        Thanks once again!

        • #5
          Thank you Maarten Buis for your proposed approach. Much appreciated.
