Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Date formatting in Stata

    Dear all,

    I have a string variable for my date with the format "mmmyyyy" (e.g. "Jan2020") and I want to turn it into a stata date with the format 2020m1 so as to match the rest of the date variables in the merging data.

    I have tried:

    rename date datestring
    gen date = date(datestring, "MY")
    format date %tm

    but for some reason I get a float variable which for Jan2020 is 3786m4 instead of 2020m1. What am I doing wrong?

    Thank you very much in advance,

    Joan

  • #2
    you have not provided a -dataex- data sample (please read the FAQ), so I can't really give exact code but you want the "monthly" function for monthly dates; the "date" function is for daily dates; see
    Code:
    help datetime##s3
    follow the help file on what a "mask" should actually look like as one M and one Y is, again, almost certainly wrong
    Last edited by Rich Goldstein; 21 Dec 2020, 11:37.

    Comment


    • #3
      To Rich's advice let me add the following.

      Stata's "date and time" variables are complicated and there is a lot to learn. If you have not already read the very detailed Chapter 24 (Working with dates and times) of the Stata User's Guide PDF, do so now. If you have, it's time for a refresher. After that, the help datetime documentation will usually be enough to point the way. You can't remember everything; even the most experienced users end up referring to the help datetime documentation or back to the manual for details. But at least you will get a good understanding of the basics and the underlying principles. An investment of time that will be amply repaid.

      All Stata manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu.

      Comment


      • #4
        I would heed the advice of Rich and WIlliam, as it will pay off multiples to properly understand these concepts.

        Code:
        set obs 1
        gen have = "jan2020"
        gen want = monthly(have, "MY")
        format want %tm

        Comment


        • #5
          As Rich Goldstein pointed out, date() is for daily dates. William and Leonardo added excellent advice,

          I strongly recommend that people use, and think in terms of, the same function under its different name, daily()

          Code:
          . di daily("jan2020", "MY")
          21915
          
          . di %tm daily("jan2020", "MY")
           3786m4
          By default, given a monthly date string, daily() gives you the daily date that is the first of that month. 1 January 2020 is day 21915 on a scale in which 1 January 1960 is 0. To see that directly,


          Code:
          . di mdy(1, 1, 1960)
          0
          
          . di mdy(1, 1, 2020)
          21915
          When you assign that a monthly format, Stata takes you literally. 21915 months from January 1960 is some long years into the future.

          Comment


          • #6
            Thank you very much all, I (re-)read the documents you suggested and followed your advice, and it works perfectly.

            Thank you again!

            Joan

            Comment


            • #7
              Hello again, thank you again for all the advice above. I am trying to format some variables conditional on the date (which is formatted as "YYYYMM" (e.g. "2020m1").

              The code I am using is
              Code:
               gen VAR=0 if date < date("20200228","YMD")
              but it does not seem to be working (as suggested in some of the comments above). However, I have not managed to tweak the code based on the format of my data. I was wondering if someone knows how I should code the date() part of the command?

              Thank you very much!

              Joan

              Comment


              • #8
                What does

                it does not seem to be working
                mean? As recommended in the FAQs, provide a reproducible example and explain how it does not fulfill your expectation. I see nothing wrong with the code:

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input float date
                21971
                21972
                21974
                21975
                21967
                end
                format %td date
                gen VAR=0 if date < date("20200228","YMD")
                Res.:

                Code:
                . l
                
                     +-----------------+
                     |      date   VAR |
                     |-----------------|
                  1. | 26feb2020     0 |
                  2. | 27feb2020     0 |
                  3. | 29feb2020     . |
                  4. | 01mar2020     . |
                  5. | 22feb2020     0 |
                     +-----------------+

                Comment


                • #9
                  The bit we can test directly looks fine to me.

                  Code:
                  . di date("20200228","YMD")
                  21973
                  
                  . di %td 21973
                  28feb2020
                  so I think you need to expand on "does not seem to be working". If your other variable(s) is (are) monthly dates then numerically they are not comparable without adjustment.

                  That's not quite a complete picture but the results of Stata's rules are easy to explain.

                  Stata's monthly dates are 0 for January 1960 and increase or decrease by 1 for each month before or after.

                  Stata's daily dates are 0 for 1 January 1960 and increase or decrease by 1 for each day before or after.

                  So, evidently the monthly date for the month containing 1 January 1960 is equal to the daily date for 1 January 1960. Also, the monthly date for the month containing 31 December 1959 (-1) is equal to the daily date for 31 December 1959, also -1.

                  Otherwise it makes no more sense to compare monthly dates and daily dates without adjusting for units than it would to compare lengths in cm and mm without adjusting for units.

                  Comment


                  • #10
                    Sorry for not being more clear. When I run the code:
                    Code:
                    gen VAR=0 if date < date("20200228","YMD")
                    Then VAR=0 is created for all dates, including those after the 28th of February. The date variable is formatted as %tm, which I believe is what makes my command wrong but am not sure how to change it for it to work.

                    Comment


                    • #11
                      As Nick alludes, you are probably mixing up daily dates and year-month dates. The only way to give you a definitive answer is to show a data example. For monthly dates, the following is a (horrible) way to specify your condition:

                      Code:
                      gen VAR=0 if date < ym(year(date("20200228","YMD")), month(date("20200228","YMD")))
                      Better way:

                      Code:
                      gen VAR=0 if date < tm(2020m2)
                      Last edited by Andrew Musau; 09 Apr 2021, 06:30.

                      Comment


                      • #12
                        On #11

                        Code:
                        mofd(mdy(2, 28, 2020)) 
                        
                        ym(2020, 2)
                        are some other ways to do it.

                        Comment


                        • #13
                          This worked perfectly, thank you very much again!

                          Comment

                          Working...
                          X