  • Dummy variables for day of the week and month of the year

    Hi, I am trying to deseasonalise my data and want to create dummy variables for the days of the week and all the months. My data is numerical so how do I create weekday variables when I do not know which data corresponds to which day?

  • #2
    Assuming that you have daily dates in a variable (you don't give its name), this script shows some technique:

    Code:
    . clear 
    
    . set obs 365 
    Number of observations (_N) was 0, now 365.
    
    . gen ddate = mdy(12,31,2021) + _n
    
    . 
    . gen dow = dow(ddate)
    
    . gen month = month(ddate)
    
    . 
    . tab month, gen(month) 
    
          month |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |         31        8.49        8.49
              2 |         28        7.67       16.16
              3 |         31        8.49       24.66
              4 |         30        8.22       32.88
              5 |         31        8.49       41.37
              6 |         30        8.22       49.59
              7 |         31        8.49       58.08
              8 |         31        8.49       66.58
              9 |         30        8.22       74.79
             10 |         31        8.49       83.29
             11 |         30        8.22       91.51
             12 |         31        8.49      100.00
    ------------+-----------------------------------
          Total |        365      100.00
    
    . 
    . d 
    
    Contains data
     Observations:           365                  
        Variables:            15                  
    -----------------------------------------------------------------------------------------
    Variable      Storage   Display    Value
        name         type    format    label      Variable label
    -----------------------------------------------------------------------------------------
    ddate           float   %9.0g                 
    dow             float   %9.0g                 
    month           float   %9.0g                 
    month1          byte    %8.0g                 month== 1.0000
    month2          byte    %8.0g                 month== 2.0000
    month3          byte    %8.0g                 month== 3.0000
    month4          byte    %8.0g                 month== 4.0000
    month5          byte    %8.0g                 month== 5.0000
    month6          byte    %8.0g                 month== 6.0000
    month7          byte    %8.0g                 month== 7.0000
    month8          byte    %8.0g                 month== 8.0000
    month9          byte    %8.0g                 month== 9.0000
    month10         byte    %8.0g                 month== 10.0000
    month11         byte    %8.0g                 month== 11.0000
    month12         byte    %8.0g                 month== 12.0000
    -----------------------------------------------------------------------------------------
    For most modelling purposes you don't need indicator variables and can just use factor variable notation, so creating new variables like dow and month is the only crucial step.
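
    As a minimal sketch of the factor-variable route, assuming a hypothetical outcome variable y (simulated here purely for illustration; it is not part of the original example), the i. prefix creates the indicators on the fly:

    Code:
    clear
    set obs 365
    set seed 2803
    gen ddate = mdy(12,31,2021) + _n
    format ddate %td
    gen dow = dow(ddate)
    gen month = month(ddate)
    * y is a hypothetical stand-in outcome, simulated only for illustration
    gen y = rnormal()
    * i.dow and i.month expand into indicators; one level of each is omitted as the base
    regress y i.dow i.month
    * one simple way (an assumption, not the only one) to strip the fitted day and month effects
    predict double y_resid, residuals

    Note that dow() returns 0 for Sunday through 6 for Saturday, which factor-variable notation accepts because it only requires nonnegative integers.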

    At some point, not reading the documentation wastes more time than it saves. Start with


    Code:
    help datetime
    which explains the basics of dates of various kinds. It is still a good idea to skim and skip what you don't need right now, while noting that there are details you can come back to.

    • #3
      Thank you so much Nick!

      • #4
        Just a question: I have over 1 million observations, as this is time series data. Setting the number of observations to 365 does not work for some reason. Is there a way to fix this?

        • #5
          #3 Glad it helped.

          #4 Correct; that does not work for you.

          You didn't give an example in #1 (compare our request at https://www.statalist.org/forums/help#stata) -- so I created one myself with 365 observations, although the point could have been made differently too.

          But set obs 365 is both unnecessary and wrong in your case:

          1. unnecessary, because you already have data in memory and don't need to create a data example.

          2. wrong, because Stata will only allow you to use set obs to make a dataset bigger (including the very useful case of bigger than 0 observations) and set obs 365 could only imply a contraction of your dataset.

          Typing
          Code:
          help set obs


          would have led you to an explanation. Please see also https://www.statalist.org/forums/help#before
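
          A minimal sketch for data already in memory, assuming a hypothetical string date variable sdate in day-month-year order. The small input block only stands in for your own dataset; with your data loaded you would skip it and would not need set obs at all:

          Code:
          clear
          input str9 sdate
          "03jan2022"
          "15feb2022"
          "31dec2022"
          end
          * convert the string to a numeric daily date; skip this if the variable is already a numeric daily date
          gen ddate = daily(sdate, "DMY")
          format ddate %td
          gen dow = dow(ddate)
          gen month = month(ddate)
          list

          If the time variable is instead a date-time (%tc) value, which is an assumption worth checking given more than a million observations, dofc() extracts the daily date first, as in gen ddate = dofc(mytime), where mytime is a hypothetical name.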
