  • Changing date format completely changes the date

    Hi,

    I am at a complete loss. I have a dataset with dates in the format %td (variable name: månedhunt2lq2). I want to change them to include only month and year, because I am comparing them to a dataset that includes only month and year, not the day. Those dates are in the format %tm (shown as e.g. "2007m5"). I tried this code: "format månedhunt2lq2 %tm", but that completely changed the dates. For example, 31jan1997 (13545 days since 01jan1960) changed into 3088m10. I then tried code I found elsewhere on this forum: "gen month = month(dofm( månedhunt2lq2 ))" and also "gen year = year(dofm( månedhunt2lq2 ))". This also gives month 10 (October) and year 3088. I think it happens because Stata then counts 13545 months since January 1960, but I really don't know how to proceed.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(månedhunt2lq2 month year)
    13545 10 3088
    13654 11 3097
    13732  5 3104
    13178  3 3058
    13728  1 3104
    13426 11 3078
    13262  3 3065
    13107  4 3052
    13492  5 3084
    13612  5 3094
    13254  7 3064
    13464  1 3082
    13412  9 3077
    13621  2 3095
        .  .    .
    13149 10 3055
    13550  3 3089
    13419  4 3078
        .  .    .
    13077 10 3049
    12833  6 3029
    13277  6 3066
    13086  7 3050
    13674  7 3099
    13119  4 3053
    13151 12 3055
    13314  7 3069
        .  .    .
    13305 10 3068
        .  .    .
    13171  8 3057
    13502  3 3085
    13473 10 3082
    13548  1 3089
        .  .    .
    13295 12 3067
        .  .    .
    13045  2 3047
    13128  1 3054
    13171  8 3057
    13420  5 3078
    13312  5 3069
    13747  8 3105
    13084  5 3050
    13092  1 3051
    13591  8 3092
        .  .    .
    13131  4 3054
    13157  6 3056
    13681  2 3100
    13676  9 3099
    13041 10 3046
        .  .    .
    13412  9 3077
    13325  6 3070
    12810  7 3027
        .  .    .
    13193  6 3059
    13448  9 3080
    13179  4 3058
    13033  2 3046
    13225  2 3062
    13232  9 3062
    13559 12 3089
        .  .    .
        .  .    .
        .  .    .
    13189  2 3059
    13419  4 3078
        .  .    .
    13089 10 3050
    13204  5 3060
    13129  2 3054
    13536  1 3088
    13490  3 3084
        .  .    .
        .  .    .
    13292  9 3067
        .  .    .
    13095  4 3051
    13585  2 3092
    13036  5 3046
    13226  3 3062
    13125 10 3053
    13591  8 3092
    13231  8 3062
    13565  6 3090
    13539  4 3088
    13067 12 3048
    13118  3 3053
    13199 12 3059
    13468  5 3082
    13414 11 3077
    13204  5 3060
    13660  5 3098
    13530  7 3087
    13169  6 3057
        .  .    .
    13031 12 3045
    13537  2 3088
    end
    format %td månedhunt2lq2


  • #2
    Code:
    format %tdMon_CCYY månedhunt2lq2
    As you suspected, the %tm format is only for use with dates that are coded as number of months since January 1960. What you want to do is use a %td format that is modified to just show the month and year. That is what the code above does. Read -help datetime display formats- for more details on displaying Stata date variables.

    Added: Be careful, though, in using this. This code does not create a monthly date variable. All that it has done is changed what Stata shows you. If you calculate the difference between two values of this variable, it is still the number of days, not the number of months. If you use time-series operators on it, the first lag of this variable is a variable whose value is the immediately preceding day, not the preceding month.
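
    A minimal sketch of that point, assuming a hypothetical one-row dataset (the variable name x is illustrative and not from the original data): the display changes with each format command, but the stored number of days does not.

    ```stata
    * Hypothetical one-row dataset; the variable name x is illustrative only
    clear
    set obs 1
    gen x = td(31jan1997)
    format x %td
    list x
    * Switch to a month-and-year display of the same daily date
    format x %tdMon_CCYY
    list x
    * The stored number is untouched by either format command
    display x[1]
    ```

    The last command should show 13545, the same count of days since 01jan1960 quoted in #1.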

    If you want a genuine monthly variable, one whose differences give the number of months and whose first lag is the previous month, that cannot be accomplished by changing the display format. For that you need a different approach:
    Code:
    gen mdate = mofd(månedhunt2lq2)
    format mdate %tm
    See -help datetime functions- (N.B. not -help datetime display formats-) for information about other ways of transforming and creating date variables.
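
    As a rough illustration of the difference, here is a sketch with two made-up dates (ddate and the dates themselves are assumptions for the example, not values from the thread's data). Subtracting the daily values gives days; subtracting the mofd() results gives months.

    ```stata
    * Two hypothetical daily dates; ddate is an illustrative name
    clear
    set obs 2
    gen ddate = td(31jan1997) in 1
    replace ddate = td(15mar1997) in 2
    format ddate %td
    * Convert the daily dates to monthly dates, then give them a monthly format
    gen mdate = mofd(ddate)
    format mdate %tm
    list ddate mdate
    * Difference of the daily dates is in days (43 here)
    display ddate[2] - ddate[1]
    * Difference of the monthly dates is in months (2 here)
    display mdate[2] - mdate[1]
    ```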

    Yes, it's a lot of information to absorb. I don't think anybody ever fully absorbs and retains it all. I have a few favorite datetime display formats that I like and use regularly. When I need to use anything else, I always have to go back to the help file to check the details.

    Note: Added material crossed with #3.
    Last edited by Clyde Schechter; 04 Apr 2023, 12:33.


    • #3
      The numeric values that are stored are unchanged by changing the format; what you see is changed as a matter of presentation only.

      That is a subtle difference and hence this is a common misunderstanding, common enough for someone to think it worth writing a short paper on it, as at https://www.stata-journal.com/articl...article=dm0067

      Further: what you have is already a daily date variable and the technique to get a monthly date out of it is, and is only, to push it through mofd() and then assign a monthly date display format.

      What you did is first say to Stata: this date variable is really a monthly date variable, so make a daily date variable out of it, and then make a month of year variable out of THAT. But the first step is wrong and the other steps don't do anything except mangle the data further, as you found. The second command to get a year out of the variable makes the same mistake initially.

      This code should be enough to demonstrate technique:

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input float(månedhunt2lq2 month year)
      12810  7 3027
      12833  6 3029
      13031 12 3045
      13033  2 3046
      13036  5 3046
      13041 10 3046
      13045  2 3047
      13067 12 3048
      13077 10 3049
      13084  5 3050
      13086  7 3050
      13089 10 3050
      13092  1 3051
      13095  4 3051
      13107  4 3052
      13118  3 3053
      13119  4 3053
      13125 10 3053
      13128  1 3054
      13129  2 3054
      end
      
      
      gen ydate = year(månedhunt2lq2)
      gen mdate = mofd(månedhunt2lq2)
      
      format mdate %tm 
      
      
           +-------------------------------------------+
           | månedh~2   month   year   ydate     mdate |
           |-------------------------------------------|
        1. |    12810       7   3027    1995    1995m1 |
        2. |    12833       6   3029    1995    1995m2 |
        3. |    13031      12   3045    1995    1995m9 |
        4. |    13033       2   3046    1995    1995m9 |
        5. |    13036       5   3046    1995    1995m9 |
           |-------------------------------------------|
        6. |    13041      10   3046    1995    1995m9 |
        7. |    13045       2   3047    1995    1995m9 |
        8. |    13067      12   3048    1995   1995m10 |
        9. |    13077      10   3049    1995   1995m10 |
       10. |    13084       5   3050    1995   1995m10 |
           |-------------------------------------------|
       11. |    13086       7   3050    1995   1995m10 |
       12. |    13089      10   3050    1995   1995m11 |
       13. |    13092       1   3051    1995   1995m11 |
       14. |    13095       4   3051    1995   1995m11 |
       15. |    13107       4   3052    1995   1995m11 |
           |-------------------------------------------|
       16. |    13118       3   3053    1995   1995m12 |
       17. |    13119       4   3053    1995   1995m12 |
       18. |    13125      10   3053    1995   1995m12 |
       19. |    13128       1   3054    1995   1995m12 |
       20. |    13129       2   3054    1995   1995m12 |
           +-------------------------------------------+
      
      .
      So, as you know, your month and year variables are just garbage.
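
      The garbage can be reproduced by hand for the value 13545 (31jan1997) quoted in #1. This is only a sketch using display on a literal number: passing a daily date through dofm() treats 13545 as a count of months, which lands in October 3088, whereas month(), year() and mofd() applied directly to the daily date give January 1997.

      ```stata
      * Wrong: dofm() reads 13545 as months since 1960m1
      display month(dofm(13545))
      display year(dofm(13545))
      * Right: 13545 is already a daily date, so extract directly
      display month(13545)
      display year(13545)
      * Right: convert the daily date to a monthly date
      display %tm mofd(13545)
      ```

      The first two commands should show 10 and 3088, matching the first row of the data in #1; the next two should show 1 and 1997; the last should show 1997m1.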

      Don't blame Stata for this and don't blame yourself. Dates and times are complicated and the fact that we all knew most of what we need to know by about age 11 doesn't help at all, as dates and times come in many different forms and there are many human and computer conventions to handle them.


      • #4
        Thank you Clyde and Nick!

        This code was exactly what I needed:
        gen mdate = mofd(månedhunt2lq2)
        format mdate %tm


        • #5
          Hi again,

          I have now encountered a very strange issue! This did not happen when I used this code a year ago.

          I am working on a new dataset, and used the code Clyde offered here previously (changing a date-month-year variable to a month-year variable). Everything looks correct, with the number being months since 1960, and after formatting it shows the correct month and year. But when I try to do some math comparing it to another month-year variable, it does not work. I wondered if that was due to the formatting, so I made two new variables that are exactly the same, but without formatting (%tm) (code: gen newvar1=oldvar1, and gen newvar2=oldvar2). Now the math works. I was curious to see if the formatting really was the problem, so I formatted the two new variables, and the math still works. Somehow the first variable made with the code "gen mdate = mofd(månedhunt2lq2)" cannot be used, but making an exact copy (code: gen newvar1=oldvar1) fixes the problem.

          Why STATA, why? :P

          Any ideas?


          • #6
            I can't see what the problem is in #5. As you imply, copying the variables shouldn't make a difference, and the point already made is that display format doesn't matter beyond its intent of tweaking what you see displayed.

            Unfortunately, "does not work" does not tell us precisely what went wrong.


            • #7
              Hi Nick,

              I was trying to figure out how many were operated on before the date of inclusion. Below you see that the result was that no one had a surgery date after inclusion, which I knew was incorrect.
              [Attached image: til stataforum1.PNG]

              Further, I made the two new variables based on inclusion date and surgery date, and suddenly I get the correct results. I tried with the original variables again, and as you can see in the pictures, that gives me another result.
              [Attached image: til stataforum2.PNG]


              • #8
                I can't see any of your dates, but it's clear that you have many missing values. Now missing counts as larger than any non-missing value and so some of your tables are bloated by missing values.

                If you go

                Code:
                tab masterop if inclusiondate > masteropdate &  !missing(inclusiondate, masteropdate)
                that should help. Some tabulations feature inclusionmonth and others inclusiondate, so results should be different unless the variables are the same.
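
                A small sketch of the missing-value point, using the thread's variable names inclusionmonth and masteropdate but entirely made-up values: an unguarded comparison counts the row where masteropdate is missing, because missing sorts above every number.

                ```stata
                * Made-up monthly values; only the variable names come from the thread
                clear
                input float(inclusionmonth masteropdate)
                450 445
                450 .
                440 445
                end
                format inclusionmonth masteropdate %tm
                * Unguarded: the missing masteropdate counts as larger than 450
                count if masteropdate > inclusionmonth
                * Guarded: only the genuinely later date is counted
                count if masteropdate > inclusionmonth & !missing(inclusionmonth, masteropdate)
                ```

                With these three rows the first count should be 2 and the second 1.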

                Also if you're comparing monthly dates and daily dates, Stata won't adjust on the fly for you.
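
                To see why mixing the two units misleads, here is a sketch with two arbitrary dates chosen for illustration. A daily date in 1997 is a number above 13000, while a monthly date in 2024 is a number below 1000, so a raw comparison says the 1997 date is later.

                ```stata
                * Daily date (days since 01jan1960) versus monthly date (months since 1960m1)
                display td(15mar1997)
                display tm(2024m2)
                * Mixed units: evaluates to 1 (true), although 1997 precedes 2024
                display td(15mar1997) > tm(2024m2)
                * Same units after mofd(): evaluates to 0 (false), as expected
                display mofd(td(15mar1997)) > tm(2024m2)
                ```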


                • #9
                  Masterop has many missing, while inclusionmonth has none. Both are monthly dates, not daily. Adding "& !missing(inclusiondate, masteropdate)" didn't change the result.

                  If you look in the last picture you see that test=inclusionmonth (without any changes) and test2=masteropdate (without any changes). But using test and test2 instead of inclusiondate and masteropdate completely changes the result. Btw, using masteropdate instead of test2 also works, so the only variable that acts strangely is "inclusionmonth", which I made from a daily date using this code:

                  gen inclusionmonth = mofd( inclusiondate )
                  format inclusionmonth %tm


                  I just made another similar variable, with the same code, where everything works perfectly right away, so I think this might just be some sort of weird thing that happened to "inclusionmonth".



                  EDIT! Oh man! I just figured it out. I have interchanged "inclusiondate" and "inclusionmonth" by mistake. I even see it now all over my text. Sorry for wasting your time!
                  Last edited by Tina Rosland; 07 Feb 2024, 08:15.


                  • #10
                    But "some sort of weird thing" must have an explanation. Someone else may have a better idea.
