Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help with splitting up dates and times, and categorizing date ranges.

    Greetings,

    I am using Stata 18. I have the following data, on the date on which bank branches have opened.
    While the below image is generated by 'dataex', the column values are programmed to look like dates & times, for example -
    26mar1992 00:00:00

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double rbi_date_of_open
    1.0171872e+12
     6.899904e+11
    1.6799616e+12
    1.6940448e+12
    1.6454016e+12
    1.6952544e+12
    1.4926464e+12
    1.2293856e+12
     1.680048e+12
     7.140096e+11
     7.572096e+11
     6.148224e+11
     7.221312e+11
    1.5619392e+12
    1.6802208e+12
    1.4956704e+12
      3.62016e+11
    1.8461952e+12
    -1.722816e+12
     6.810912e+11
     3.332448e+11
    1.2343968e+12
     6.244128e+11
     4.375296e+11
    1.8120672e+12
    1.3175136e+12
    1.2331872e+12
     1.869696e+12
    1.7116704e+12
    1.0140768e+12
     4.472064e+11
    1.8539712e+12
     8.672832e+11
     3.901824e+11
     1.869696e+12
     1.673136e+12
     1.672272e+12
    1.4573088e+12
    1.6592256e+12
     7.855488e+11
     1.385856e+12
     7.048512e+11
     7.139232e+11
    1.3175136e+12
     5.262624e+11
     6.785856e+11
    1.4700096e+12
     7.197984e+11
     1.167696e+12
    1.3709952e+12
     9.113472e+11
      4.17744e+11
    1.4979168e+12
    1.6720992e+12
     1.772064e+12
     1.404432e+12
    1.7117568e+12
      1.74312e+12
    1.6699392e+12
    1.7198784e+12
     4.166208e+11
     4.996512e+11
    -5.146848e+11
    1.4926464e+12
      6.51888e+11
     4.562784e+11
      5.17536e+11
    1.6564608e+12
     6.677856e+11
    1.3175136e+12
    1.8513792e+12
     7.363872e+11
    1.3281408e+12
     3.147552e+11
     5.363712e+11
      4.63104e+10
    -1.335744e+11
    1.3175136e+12
     3.619296e+11
        1.512e+12
     1.769904e+12
    1.4217984e+12
    1.6415136e+12
    1.6903296e+12
     1.291248e+12
    1.6564608e+12
      5.61168e+11
    1.4948064e+12
    1.7592768e+12
     6.701184e+11
      3.40416e+11
     1.497744e+12
    1.6172352e+12
      4.87728e+11
     5.038848e+11
    1.4979168e+12
    1.6401312e+12
      6.57072e+11
    1.4698368e+12
    1.5402528e+12
    end
    format %tc rbi_date_of_open
    I request help to -
    (i) Carve out just the date of opening, as I do not need the time
    (ii) Generate a 'branchopen' variable that takes the value 2013 for all branches opened during or before 2013, and thereafter 2014, 2015 etc. for branches that have opened during or before 2014, 2015 etc.

    Thank you,
    Sneha Thayyil

  • #2
    (i) is straightforward:
    Code:
    gen opening_date = dofc(rbi_date_of_open)
    format opening_date %td
    (ii) is unclear. If you want a separate variable for each year:
    Code:
    gen opening_year = year(opening_date)
    
    forvalues y = 2013/2024 {
        gen byte branchopen_`y' = (opening_year <= `y')
        replace branchopen_`y' = `y'*branchopen_`y'
    }
    But if what you want is a single variable, then I don't understand your meaning. After all, if a branch is opened during or before 2014, it is also opened before 2015, 2016, ... So which value would you want it to take? Perhaps what you really want is a single variable giving the year in which the branch opened. That is the variable opening_year created in the code above.

    Finally, I can't think of a good reason for a series of branchopen_* variables to take on the values 2013, 2014, etc.. since, by your description, they are intended to show whether or not a branch is open in a given year. That kind of yes-no variable is most commonly, and most effectively, represented by a 0 (no) vs 1 (yes) variable. So I would eliminate the-replace- command and just leave the original result.

    Comment


    • #3
      Dear Clyde,

      Firstly, question (i) is solved. Thank you so much, this forum and your replies are a godsend.
      Secondly, in response to question (ii) - the objective is to examine the maximum number of branches in a location in a given year. Hence, I was generating the cumulative frequency, by location, per year - within a single variable.

      Thank you once again,
      Sneha Thayyil
      Last edited by Sneha Thayyil; 13 Apr 2024, 21:40.

      Comment


      • #4
        the objective is to examine the maximum number of branches in a location in a given year. Hence, I was generating the cumulative frequency, by location, per year - within a single variable
        Well, the example data doesn't even have a location variable. But ignoring that, your data organization doesn't make that possible. There is no sensible way to create that variable in that dataset. What you need to do is create a different data set to contain that:

        Code:
        gen opening_year = year(dofc(rbi_date_of_open))
        
        frame create wanted int year int branches_open
        
        gen byte open_this_year = .
        forvalues y = 2013/2024 {
            replace open_this_year = (opening_year <= `y')
            summ open_this_year, meanonly
            frame post wanted (`y') (`r(sum)')
        }
        drop open_this_year
        
        frame change wanted
        list
        This will need to be modified to work by location as well.

        Note, too, that the logic only works if we assume that no branch, once open, ever closes.

        Comment


        • #5
          Thank you, Clyde. This worked exactly.

          Comment

          Working...
          X