Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Converting a 'float' date variable into proper date format

    I have been playing around with this for a while and do not seem to be arriving at a correct solution.

    I have a float variable storing dates of the format '%tm.. '. For example, one value is 'November 2005', another is 'March 2006', that is, there is a space between the month and year and there is no date.

    I want to convert the variable into a proper date format. I played with the substr() and tostring() options for a while, but to no avail.

    Any ideas would be very much appreciated.


  • #2
    Please define proper date format.

    Comment


    • #3
      A format that would make it more readable than ´November 2005´, like the format you'd expect when running a date() function. There is no day information, only month and year, which was making it harder to find a proper function with which I can do the conversion correctly. I tried using substr() and using the below logic:
      1. Transform the date in string, using the tostring function
      2. Gen a variable called length, equal to the length of the string containing the date.
      3. Generate two variables: month, year. Depending on the length of the string, the day can be in the 7th position, and having a length of 1 (ex. 6 November 1995), or in the 6th position and having a length of 2 (ex. 22 June 1999)
      4. Use the function “mdy” to link day, month and year together.
      However, no transformation worked correctly, so I wonder if my approach was correct.

      For context, this will be a time series type of analysis/regression, so I'd have to have the date variable stored in the most user-friendly possible way.

      Thanks for your help in advance again!

      Comment


      • #4
        If there is no information about day in the data set, then where do the 6 and 22 in 6 November 1995 and 22 June 1999 come from?

        What is unreadable about November 1995?

        Comment


        • #5
          I suspect you want (1) or maybe (2):

          Code:
          . clear
          
          . set obs 12
          Number of observations (_N) was 0, now 12.
          
          .
          . /* (1) generate a float tm date and give it a pretty custom format */
          . gen float t = _n + 755
          
          . format t %tm
          
          . head
               2023m1  
               2023m2  
               2023m3  
               2023m4  
               2023m5  
               2023m6  
               2023m7  
               2023m8  
               2023m9  
              2023m10  
          
          . format t %tm_M_CY
          
          . head
                January 2023  
               February 2023  
                  March 2023  
                  April 2023  
                    May 2023  
                   June 2023  
                   July 2023  
                 August 2023  
              September 2023  
                October 2023  
          
          .
          . /* (2) generate a string date, convert it to tm, and give it a pretty custom format */
          . tostring t, usedisplayformat gen(string_t) force
          string_t generated as str14
          string_t was forced to string; some loss of information
          
          . gen t2 = monthly(string_t,"MY")
          
          . format t2 %tm_M_CY
          
          . list if t >= tm(2023m3), clean noobs
          
                           t         string_t               t2  
                  March 2023       March 2023       March 2023  
                  April 2023       April 2023       April 2023  
                    May 2023         May 2023         May 2023  
                   June 2023        June 2023        June 2023  
                   July 2023        July 2023        July 2023  
                 August 2023      August 2023      August 2023  
              September 2023   September 2023   September 2023  
                October 2023     October 2023     October 2023  
               November 2023    November 2023    November 2023  
               December 2023    December 2023    December 2023

          Comment


          • #6
            Two details, one of some importance in that users (it seems) are often misled by it -- and the other just terminology.

            tostring is a command, not a function. That's just terminology, but it's Stata's terminology. In Stata commands and functions are disjoint.

            The function name date() goes back to Stata's introduction of support of daily dates as the first date type given special support. (Years don't usually need special support.) And it matches common usage insofar as a question in conversation What's the date? will usually elicit a daily date like 18 June (the year usually being presumed familiar). But date() as a function has no bearing on other kinds of date, such as monthly dates. StataCorp introduced daily() as a equivalent name some versions back and I wish they would use it more. I certainly try to use it in expositions. There could no question of withdrawing date() as doing that would break countless commands and do-files. Using daily() instead has negligible downsides, the only one that is serious being that it won't work for very old versions of Stata -- and if so, users can easily edit the code.

            Comment


            • #7
              Many thanks, all.

              Apologies, when giving my own example, as in, ´5 November 2022´, I did insert a date. However, my actual dataset does not contain dates - apologies for the confusion, I was giving a random example.

              What I think is 'unclear' is that my date variable is a float. Hence, I want to convert it to string format while keeping the same information on month and year since I have no way of obtaining information regarding exact dates. Perhaps it is not needed?

              Comment


              • #8
                Like others, I guess, I am still struggling mightily to see what the problem is here. The word format is overloaded in computing but in this Stata context the leading sense is display format. Evidently your date variable is numeric (float) -- that is a variable or storage type, not a format -- but it has a display format that lets you or anybody else see (e.g.) "November 2022" rather than 754. So a numeric date variable with a display format that shows the date in human-readable terms gives the best of both worlds.

                You can also get a string variable version of that using tostring or indeed strofreal() but there is no obvious point to doing so. The string variable is useless for most statistical or graphical purposes and holds the same information any way.
                Last edited by Nick Cox; 18 Jun 2023, 05:13.

                Comment


                • #9
                  Okay, I will explain in more detail.

                  I have a dataset that contains counts (Nr_attend) of the number of attendances at Cardiology outpatient clinics for 11 different health board areas (HB_n) in Scotland from March 2014 until June 2018. Health board areas mark different regions in Scotland. From January 2016 onwards a Scotland-wide intervention targeted at providing more services in GP practices was introduced with the aim to reduce the number of Cardiology outpatient appointments and attendances. I need to preprocess the data and then produce graphs that track the development of number of attendances over time for: * i) two health board areas of your choice (HB_n) * ii) Scotland overall.

                  So, in order to do that, I thought I would need to follow the below steps:

                  Convert the date variable to a Stata date format if it is not already in that format.
                  Code:
                  gen date = date(mydate, "DMY")
                  Generate a variable for the year and month from the date variable. This will help in aggregating the attendances by month and year. You can use the ym() function to extract the year and month. For example:
                  Code:
                  gen year = year(date)
                  gen month = month(date)
                  Calculate the total number of attendances for each health board area and for Scotland overall, grouped by year and month.
                  Code:
                  collapse (sum) Nr_attend, by(year month HB_n)
                  Generate separate datasets for the two health board areas I want to analyze and for Scotland overall.
                  Code:
                  keep if HB_n == "HealthBoard1"
                  save "healthboard1_data.dta", replace
                  keep if HB_n == "HealthBoard2"
                  save "healthboard2_data.dta", replace
                  save "scotland_data.dta", replace
                  To generate/extract month and year, I need a string date variable. So, my float 'mydate' containing just year and month (e.g., 'November 2005') does not seem to be suited.

                  I am trying to figure out how to extract the month and year separately so as to then perform the required time series analysis.

                  Many thanks again!
                  Last edited by Angelina Kancheva; 19 Jun 2023, 05:49.

                  Comment


                  • #10
                    To generate/extract month and year, I need a string date variable.
                    Not so, unless you mean that your data arrive in that form. To extract year and month you can start with a Stata daily date variable, which is necessarily numeric.

                    Also, a monthly date variable is equivalent to the combination of year and month, for which you could use mofd().

                    Further, different statements here remain contradictory or at least puzzling.

                    1.

                    Code:
                     
                     gen date = date(mydate, "DMY")
                    won't workunless mydate contains date information. But you also state that mydate contains values like "November 2005" with no day explicit. Possibly there is a resolution of this that the display format suppresses the day of the month. 2.

                    I am trying to figure out how to extract the month and year separately
                    But your post shows that you know how to do this, namely use year() and month() to pull year and month out of a numeric daily date.

                    So far as I can tell, all the information you need is contained in your own posts or in answers in this thread. Sorry, but I am at a loss to know what else you still need.

                    Comment


                    • #11
                      My general confusion stems from the fact that I need to perform time series analysis using this float variable as my time variable, which only contains information on the year and month, but not exact date.

                      Comment


                      • #12
                        I'm still a bit confused about exactly what sort of variable you are starting with, and I'm going to assume you have a numeric variable that Stata can interpret as a monthly date. I'm also going to infer that perhaps what you are looking for is a variable that will have a string for the month, such as "January", "February" etc.

                        If both those are true, then the following code may prove useful: (the bit before the gen commands can be ignored, it is only used to create an example):

                        Code:
                        clear
                        set obs 10
                        gen t = 500 + _n
                        format t %tm_M_CY
                        
                        gen month_str = word(strofreal(t,"%tm_M_CY"),1)
                        gen year = year(dofm(t))
                        which produces:

                        Code:
                        . list, noobs sep(0) abbrev(10)
                        
                          +----------------------------------+
                          |             t   month_str   year |
                          |----------------------------------|
                          |  October 2001     October   2001 |
                          | November 2001    November   2001 |
                          | December 2001    December   2001 |
                          |  January 2002     January   2002 |
                          | February 2002    February   2002 |
                          |    March 2002       March   2002 |
                          |    April 2002       April   2002 |
                          |      May 2002         May   2002 |
                          |     June 2002        June   2002 |
                          |     July 2002        July   2002 |
                          +----------------------------------+

                        Comment


                        • #13
                          My variable, 'mydate', is a float variable. An example value is 'November 2014' where each observation contains a year between 2014 and 2018 and a month. No date information.

                          Do I need to convert it in order to run a time series analysis using this variable as my time variable?

                          And many thanks all for your continued advice.

                          Comment


                          • #14
                            No, you don't need to convert it.

                            Consider this:

                            Code:
                            . clear
                            
                            . set seed 123
                            
                            . set obs 100
                            Number of observations (_N) was 0, now 100.
                            
                            . gen y = runiformint(1,30)
                            
                            . gen float x = 500 + _n
                            
                            . format x %tm_M_CY
                            
                            . tsset x
                            
                            Time variable: x, October 2001 to January 2010
                                    Delta: 1 month
                            
                            . gen month = month(dofm(x))
                            
                            . gen year = year(dofm(x))
                            
                            . list in 1/10, noobs
                            
                              +-----------------------------------+
                              |  y               x   month   year |
                              |-----------------------------------|
                              | 11    October 2001      10   2001 |
                              | 18   November 2001      11   2001 |
                              | 24   December 2001      12   2001 |
                              |  7    January 2002       1   2002 |
                              |  7   February 2002       2   2002 |
                              |-----------------------------------|
                              | 10      March 2002       3   2002 |
                              | 27      April 2002       4   2002 |
                              | 16        May 2002       5   2002 |
                              |  9       June 2002       6   2002 |
                              | 19       July 2002       7   2002 |
                              +-----------------------------------+
                            
                            . regress y L2.x i.month i.year
                            note: 2010.year omitted because of collinearity.
                            
                                  Source |       SS           df       MS      Number of obs   =        98
                            -------------+----------------------------------   F(20, 77)       =      0.72
                                   Model |  1129.54337        20  56.4771684   Prob > F        =    0.7931
                                Residual |    6032.875        77   78.349026   R-squared       =    0.1577
                            -------------+----------------------------------   Adj R-squared   =   -0.0611
                                   Total |  7162.41837        97  73.8393646   Root MSE        =    8.8515
                            
                            ------------------------------------------------------------------------------
                                       y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                                       x |
                                     L2. |  -.1516204   .1229375    -1.23   0.221    -.3964202    .0931794
                                         |
                                   month |
                                      2  |   -1.84838   4.406927    -0.42   0.676    -10.62369    6.926933
                                      3  |  -1.196759   4.391467    -0.27   0.786    -9.941288    7.547769
                                      4  |   .2048611   4.379405     0.05   0.963    -8.515648    8.925371
                                      5  |   .6064815   4.370769     0.14   0.890    -8.096831    9.309794
                                      6  |  -3.241898   4.365579    -0.74   0.460    -11.93488     5.45108
                                      7  |  -2.715278   4.363848    -0.62   0.536    -11.40481    5.974253
                                      8  |  -4.063657   4.365579    -0.93   0.355    -12.75664    4.629321
                                      9  |   1.712963   4.370769     0.39   0.696     -6.99035    10.41628
                                     10  |  -4.760417   4.379405    -1.09   0.280    -13.48093    3.960093
                                     11  |  -2.733796   4.391467    -0.62   0.535    -11.47832    6.010732
                                     12  |   .2928241   4.174447     0.07   0.944    -8.019562     8.60521
                                         |
                                    year |
                                   2002  |  -5.909722   8.743267    -0.68   0.501    -23.31979    11.50035
                                   2003  |  -4.756944   7.961566    -0.60   0.552    -20.61045    11.09656
                                   2004  |    -2.4375   7.394666    -0.33   0.743    -17.16216    12.28716
                                   2005  |  -1.951389   7.094249    -0.28   0.784    -16.07784    12.17507
                                   2006  |   1.784722   7.094249     0.25   0.802    -12.34173    15.91118
                                   2007  |   1.604167   7.394666     0.22   0.829     -13.1205    16.32883
                                   2008  |   4.756944   7.961566     0.60   0.552    -11.09656    20.61045
                                   2009  |   13.40972   8.743267     1.53   0.129    -4.000346    30.81979
                                   2010  |          0  (omitted)
                                         |
                                   _cons |   99.66898   67.93672     1.47   0.146    -35.61032    234.9483
                            ------------------------------------------------------------------------------

                            Comment


                            • #15
                              Many thanks!

                              Comment

                              Working...
                              X