Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Transforming a time variable into a categorical time variable in order to apply "if" command"

    Hello Stata users;

    I have this code:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str3 Code int(Nombre_Voyageurs_Mois_Ligne Mois) float Voyageurs_par_Ligne int Distance float Temps_de_Trajet
    "170"  526 744   8279 160 180
    "512" 1212 744 102064 541 475
    "178"   27 744    510 556 565
    "503" 1003 744  70957 563 570
    "129"    9 744    423 165 180
    "189" 3307 744  84738 551 525
    "133"   54 744   3038 181 210
    "101" 2092 744  82755 370 390
    "142"  948 744  12672 590 480
    "138"  409 744  28700 172 180
    "152" 2000 744  54501 171 195
    "106"  191 744   2508 291 285
    "118" 1623 744  51833 207 210
    "107" 2330 744  74227 323 310
    "502" 1012 744  86735 590 600
    "506" 2530 744  81870 509 510
    "113" 3311 744 105130 246 270
    "105" 2419 744  78207 270 240
    "135" 2248 744  64820 286 300
    "122" 2197 744  84414 202 230
    "175"    3 744    453 248 210
    "117"   34 744    777 520 480
    "300" 2232 744  58633 160 165
    "146"   18 744   6382 471 490
    "510"    2 744   4261 452 450
    "136"   50 744    265 323 300
    "103"    4 744  20402 404 400
    "134"    4 744    238 323 330
    "110"  754 744  18088 226 240
    "168"    7 744    372 186 300
    "115" 2227 744  77793 207 220
    "183" 2941 744  82723 465 420
    "116"    3 744   2353 286 300
    "109"   32 744    569 172 180
    "123" 2822 744  90892 212 245
    "151"    8 744   9720 181 210
    "131" 1845 744  56519 157 165
    "125"    1 744    235 177 195
    "145"  879 744  39787 200 225
    "174"  337 744   8466 227 230
    "139" 2843 744  94037 172 180
    "141" 2517 744  77068 232 270
    "124"  111 744  10299 181 210
    "509" 1616 744  71740 446 405
    "508" 1848 744  75901 333 360
    "518" 2953 744  80214 594 530
    "194" 3018 744  81208 605 570
    "219"    9 744    425 554 360
    "149" 3540 744  95732 485 435
    "173"  117 744  25727 160 180
    "182"    7 744    540 288 270
    "507"  495 744  36231 547 495
    "162" 3189 744  82181 397 450
    "167"   20 744    420 323 300
    "153"    3 744    214 171 195
    "192"  468 744  13568 232 270
    "169"   61 744   6882 202 240
    "102" 3895 744 107070 488 510
    "501" 1092 744  82224 487 445
    "143" 1443 744  84410 465 420
    "321"    1 744    130 160 165
    "119"  185 744   1620 172 180
    "100" 3146 744  88500 370 280
    "104" 2220 744  63134 563 510
    "517" 1572 744  88206 606 660
    "179" 3183 744 106033 696 660
    "176"   89 744   1611 214 240
    "504" 1105 744  69920 551 530
    "108" 3155 744  91167 246 270
    "164" 2012 744  74352 246 250
    "209" 4110 744  88627 502 465
    "114" 2013 744  80847 232 270
    "206" 2018 744  62786 177 195
    "182"    6 745    540 288 270
    "149" 3443 745  95732 485 435
    "168"    1 745    372 186 300
    "119"   39 745   1620 172 180
    "518" 2880 745  80214 594 530
    "162" 2502 745  82181 397 450
    "209" 3696 745  88627 502 465
    "124"   58 745  10299 181 210
    "507"  599 745  36231 547 495
    "135" 2382 745  64820 286 300
    "113" 3634 745 105130 246 270
    "502" 1407 745  86735 590 600
    "106"    5 745   2508 291 285
    "134"    6 745    238 323 330
    "504" 1529 745  69920 551 530
    "175"   21 745    453 248 210
    "176"    6 745   1611 214 240
    "138"  581 745  28700 172 180
    "153"    4 745    214 171 195
    "189" 3050 745  84738 551 525
    "132"    1 745  21312 149 160
    "503" 1320 745  70957 563 570
    "151"   23 745   9720 181 210
    "194" 2414 745  81208 605 570
    "108" 2894 745  91167 246 270
    "167"   18 745    420 323 300
    "164" 2111 745  74352 246 250
    end
    format %tmMonth_CCYY Mois
    As you can see, the third variable "Mois" is a time variable, yet its type is int, which means that it doesn't show as a "month-year" variable in that data example already, and I can apply an "if" command on it to generate another variable.

    My goal is to keep that time variable as it is, but to convert it in a way to generate a variable called "season" which applies a season number for each group of months.

    Is there any help? With thanks!

  • #2
    Aziz:
    youmay want to consider what follows:
    Code:
    . gen year=year( Mois )+60
    
    . bysort year: gen wanted=quarter( Mois )
    Kind regards,
    Carlo
    (StataNow 18.5)

    Comment


    • #3
      To extend on Carlo's answer: There is a whole lot you can do with date and time variables, see help datetime and don't forget the pdf-manual (link near the top of the helpfile) which contains even more information.
      Last edited by Maarten Buis; 26 Mar 2024, 04:12.
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------

      Comment


      • #4
        Carlo Lazzaro Hi, and thanks for the response;

        Perhaps I was a bit unclear about my explanation, since your advice generates the year, but not the month. What I want to do is to convert the "Mois" varibla in a format where it would give me the possibility to use the "if" command to generate a new variable out of that same "Mois" variable, so I don't know if it could be possible to convert it or to generate a new one.

        Thanks!

        Comment


        • #5
          The if command exists in Stata, but you probably mean the if condition. You don't need to create a variable for that, as you can often use the functions (which are also different from commands, just to add more confusing terminology...) directly inside the if condition. Say you only want to do some action for the observations in summer then you type some_command some_variables if inlist(month(Mois), 6,7,8), where some_command is whatever command you like, and some_variables are whatever variables you like.

          Alternatively you can do it like so:
          Code:
          local summer inlist(month(Mois), 6,7,8)
          some_command some_variables if `summer'
          Or you can create the variables first if you like. Especially if if conditions get complicated, it can make a big difference to split the condition up in several smaller steps, and intermediate variables can help with that.
          Code:
          gen byte summer = inlist(month(Mois),6,7,8) if !missing(Mois)
          some_command some_variables if summer == 1
          Last edited by Maarten Buis; 26 Mar 2024, 05:24.
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------

          Comment


          • #6
            I agree with the spirit of @Maarten Buis's answer as usual, but note that month() feeds on daily dates, not monthly dates.

            745 is an example monthly date. It corresponds to February 2022 and the month of the year it belongs can be extracted by using month(mofd()) or directly
            using mod(). Other way round month() alone will usually give you a wrong answer unless you feed it a daily date.

            Code:
            . di %tmMonth_CCYY 745
             February 2022
            
            . di month(745)
            1
            
            di %td 745
            15jan1962
            
            . di month(dofm(745))
            2
            
            di mod(745, 12) + 1 
            2
            I can't see any need to change your monthly data to anything else, although I don't know what a "categorical time variable" would look like in Stata. Perhaps you have in mind some entity defined in other software.

            Comment


            • #7
              @Maarten Buis Hello and Thanks for the response;

              Exactly, your response seems the most suitable for my request, thanks for that. Still, in the mean time, I did have the chance to clean my data, so I've solved some of the problem. Here's the cleaned data (if you want to call it that way):

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input int(Code_Ligne Nombre_Voyageurs_Mois__Ligne) str14 Mois long Voyageurs_par_Ligne int(Distance_en_Kms Temps_du_Trajet_en_Minutes)
              170  526 "January 2022"    8279 160 180
              512 1212 "January 2022"  102064 541 475
              178   27 "January 2022"     510 556 565
              503 1003 "January 2022"   70957 563 570
              129    9 "January 2022"     423 165 180
              189 3307 "January 2022"   84738 551 525
              133   54 "January 2022"    3038 181 210
              101 2092 "January 2022"   82755 370 390
              142  948 "January 2022"   12672 590 480
              138  409 "January 2022"   28700 172 180
              152 2000 "January 2022"   54501 171 195
              106  191 "January 2022"    2508 291 285
              118 1623 "January 2022"   51833 207 210
              107 2330 "January 2022"   74227 323 310
              502 1012 "January 2022"   86735 590 600
              506 2530 "January 2022"   81870 509 510
              113 3311 "January 2022"  105130 246 270
              105 2419 "January 2022"   78207 270 240
              135 2248 "January 2022"   64820 286 300
              122 2197 "January 2022"   84414 202 230
              175    3 "January 2022"     453 248 210
              117   34 "January 2022"     777 520 480
              300 2232 "January 2022"   58633 160 165
              146   18 "January 2022"    6382 471 490
              510    2 "January 2022"    4261 452 450
              136   50 "January 2022"     265 323 300
              103    4 "January 2022"   20402 404 400
              134    4 "January 2022"     238 323 330
              110  754 "January 2022"   18088 226 240
              168    7 "January 2022"     372 186 300
              115 2227 "January 2022"   77793 207 220
              183 2941 "January 2022"   82723 465 420
              116    3 "January 2022"    2353 286 300
              109   32 "January 2022"     569 172 180
              123 2822 "January 2022"   90892 212 245
              151    8 "January 2022"    9720 181 210
              131 1845 "January 2022"   56519 157 165
              125    1 "January 2022"     235 177 195
              145  879 "January 2022"   39787 200 225
              174  337 "January 2022"    8466 227 230
              139 2843 "January 2022"   94037 172 180
              141 2517 "January 2022"   77068 232 270
              124  111 "January 2022"   10299 181 210
              509 1616 "January 2022"   71740 446 405
              508 1848 "January 2022"   75901 333 360
              518 2953 "January 2022"   80214 594 530
              194 3018 "January 2022"   81208 605 570
              219    9 "January 2022"     425 554 360
              149 3540 "January 2022"   95732 485 435
              173  117 "January 2022"   25727 160 180
              182    7 "January 2022"     540 288 270
              507  495 "January 2022"   36231 547 495
              162 3189 "January 2022"   82181 397 450
              167   20 "January 2022"     420 323 300
              153    3 "January 2022"     214 171 195
              192  468 "January 2022"   13568 232 270
              169   61 "January 2022"    6882 202 240
              102 3895 "January 2022"  107070 488 510
              501 1092 "January 2022"   82224 487 445
              143 1443 "January 2022"   84410 465 420
              321    1 "January 2022"     130 160 165
              119  185 "January 2022"    1620 172 180
              100 3146 "January 2022"   88500 370 280
              104 2220 "January 2022"   63134 563 510
              517 1572 "January 2022"   88206 606 660
              179 3183 "January 2022"  106033 696 660
              176   89 "January 2022"    1611 214 240
              504 1105 "January 2022"   69920 551 530
              108 3155 "January 2022"   91167 246 270
              164 2012 "January 2022"   74352 246 250
              209 4110 "January 2022"   88627 502 465
              114 2013 "January 2022"   80847 232 270
              206 2018 "January 2022"   62786 177 195
              182    6 "February 2022"    540 288 270
              149 3443 "February 2022"  95732 485 435
              168    1 "February 2022"    372 186 300
              119   39 "February 2022"   1620 172 180
              518 2880 "February 2022"  80214 594 530
              162 2502 "February 2022"  82181 397 450
              209 3696 "February 2022"  88627 502 465
              124   58 "February 2022"  10299 181 210
              507  599 "February 2022"  36231 547 495
              135 2382 "February 2022"  64820 286 300
              113 3634 "February 2022" 105130 246 270
              502 1407 "February 2022"  86735 590 600
              106    5 "February 2022"   2508 291 285
              134    6 "February 2022"    238 323 330
              504 1529 "February 2022"  69920 551 530
              175   21 "February 2022"    453 248 210
              176    6 "February 2022"   1611 214 240
              138  581 "February 2022"  28700 172 180
              153    4 "February 2022"    214 171 195
              189 3050 "February 2022"  84738 551 525
              132    1 "February 2022"  21312 149 160
              503 1320 "February 2022"  70957 563 570
              151   23 "February 2022"   9720 181 210
              194 2414 "February 2022"  81208 605 570
              108 2894 "February 2022"  91167 246 270
              167   18 "February 2022"    420 323 300
              164 2111 "February 2022"  74352 246 250
              end
              As you can see, I do have the "Mois" variable, which is basically the months. I've already studied the distribution of the demand by month and I saw that there was some seasonality in it, so what I want to do is give a season code (1, 2, 3 or 4) to each monthly period, according to the following:
              Season 1: Starting from the first of February and ending in the last of March
              Season 2: Starting in April and ending in last May
              Season 3: Starting in June and ending in the last of September
              Season 4: Starting inOctober and ending in last of December

              Is there a suitable code so I could create that "Season" variable for both 2022 and 2023?

              With thanks!

              Comment


              • #8
                No, you made it much worse. Do not use strings for dates. Please make it easy on yourself and go back to Stata dates.
                ---------------------------------
                Maarten L. Buis
                University of Konstanz
                Department of history and sociology
                box 40
                78457 Konstanz
                Germany
                http://www.maartenbuis.nl
                ---------------------------------

                Comment


                • #9
                  Maarten Buis Ok, Thanks ! I was thinking that I could use that variable as a categorical variable. Thanks for the help!

                  Comment


                  • #10
                    No, strings don't function as categorical variables in Stata.
                    ---------------------------------
                    Maarten L. Buis
                    University of Konstanz
                    Department of history and sociology
                    box 40
                    78457 Konstanz
                    Germany
                    http://www.maartenbuis.nl
                    ---------------------------------

                    Comment


                    • #11
                      @Maarten Buis I understand, Thanks for the advice!

                      Could it be possible to ask if it is possible to have a command that gives the total of demand (for the two years 2022 and 2023) for each "Code_Ligne". I've tried to do so but I could not get the right command, I guess my data needs a reshape but I don't know about that. It would be helpful if you could just give me a further idea on that one. As you can see from the data, the "Code_Ligne" values are repeated with each observation, tho I want to have the total demand for each "Code_Ligne" as the total of the Monthly demand for 2022 and 2023 together.

                      Could you help me with that, please? It will be even better to have the result in another independant database.

                      With many thanks!
                      ​​​​​​​

                      Comment


                      • #12
                        extract the year from Mois and than use collapse
                        ---------------------------------
                        Maarten L. Buis
                        University of Konstanz
                        Department of history and sociology
                        box 40
                        78457 Konstanz
                        Germany
                        http://www.maartenbuis.nl
                        ---------------------------------

                        Comment

                        Working...
                        X