Help with splitting up dates and times, and categorizing date ranges.

Sneha Thayyil

Join Date: Jun 2020
Posts: 10

Help with splitting up dates and times, and categorizing date ranges.

12 Apr 2024, 23:59

Greetings,

I am using Stata 18. I have the following data, on the date on which bank branches have opened.
While the below image is generated by 'dataex', the column values are programmed to look like dates & times, for example -
26mar1992 00:00:00

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double rbi_date_of_open
1.0171872e+12
 6.899904e+11
1.6799616e+12
1.6940448e+12
1.6454016e+12
1.6952544e+12
1.4926464e+12
1.2293856e+12
 1.680048e+12
 7.140096e+11
 7.572096e+11
 6.148224e+11
 7.221312e+11
1.5619392e+12
1.6802208e+12
1.4956704e+12
  3.62016e+11
1.8461952e+12
-1.722816e+12
 6.810912e+11
 3.332448e+11
1.2343968e+12
 6.244128e+11
 4.375296e+11
1.8120672e+12
1.3175136e+12
1.2331872e+12
 1.869696e+12
1.7116704e+12
1.0140768e+12
 4.472064e+11
1.8539712e+12
 8.672832e+11
 3.901824e+11
 1.869696e+12
 1.673136e+12
 1.672272e+12
1.4573088e+12
1.6592256e+12
 7.855488e+11
 1.385856e+12
 7.048512e+11
 7.139232e+11
1.3175136e+12
 5.262624e+11
 6.785856e+11
1.4700096e+12
 7.197984e+11
 1.167696e+12
1.3709952e+12
 9.113472e+11
  4.17744e+11
1.4979168e+12
1.6720992e+12
 1.772064e+12
 1.404432e+12
1.7117568e+12
  1.74312e+12
1.6699392e+12
1.7198784e+12
 4.166208e+11
 4.996512e+11
-5.146848e+11
1.4926464e+12
  6.51888e+11
 4.562784e+11
  5.17536e+11
1.6564608e+12
 6.677856e+11
1.3175136e+12
1.8513792e+12
 7.363872e+11
1.3281408e+12
 3.147552e+11
 5.363712e+11
  4.63104e+10
-1.335744e+11
1.3175136e+12
 3.619296e+11
    1.512e+12
 1.769904e+12
1.4217984e+12
1.6415136e+12
1.6903296e+12
 1.291248e+12
1.6564608e+12
  5.61168e+11
1.4948064e+12
1.7592768e+12
 6.701184e+11
  3.40416e+11
 1.497744e+12
1.6172352e+12
  4.87728e+11
 5.038848e+11
1.4979168e+12
1.6401312e+12
  6.57072e+11
1.4698368e+12
1.5402528e+12
end
format %tc rbi_date_of_open

I request help to -
(i) Carve out just the date of opening, as I do not need the time
(ii) Generate a 'branchopen' variable that takes the value 2013 for all branches opened during or before 2013, and thereafter 2014, 2015 etc. for branches that have opened during or before 2014, 2015 etc.

Thank you,
Sneha Thayyil

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

13 Apr 2024, 09:31

(i) is straightforward:

Code:

gen opening_date = dofc(rbi_date_of_open) format opening_date %td

(ii) is unclear. If you want a separate variable for each year:

Code:

gen opening_year = year(opening_date) forvalues y = 2013/2024 { gen byte branchopen_`y' = (opening_year <= `y') replace branchopen_`y' = `y'*branchopen_`y' }

But if what you want is a single variable, then I don't understand your meaning. After all, if a branch is opened during or before 2014, it is also opened before 2015, 2016, ... So which value would you want it to take? Perhaps what you really want is a single variable giving the year in which the branch opened. That is the variable opening_year created in the code above.

Finally, I can't think of a good reason for a series of branchopen_* variables to take on the values 2013, 2014, etc.. since, by your description, they are intended to show whether or not a branch is open in a given year. That kind of yes-no variable is most commonly, and most effectively, represented by a 0 (no) vs 1 (yes) variable. So I would eliminate the-replace- command and just leave the original result.
Comment
Sneha Thayyil

Join Date: Jun 2020

Posts: 10
#3

13 Apr 2024, 20:24

Dear Clyde,

Firstly, question (i) is solved. Thank you so much, this forum and your replies are a godsend.
Secondly, in response to question (ii) - the objective is to examine the maximum number of branches in a location in a given year. Hence, I was generating the cumulative frequency, by location, per year - within a single variable.

Thank you once again,
Sneha Thayyil

Last edited by Sneha Thayyil; 13 Apr 2024, 20:40.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

13 Apr 2024, 21:49

the objective is to examine the maximum number of branches in a location in a given year. Hence, I was generating the cumulative frequency, by location, per year - within a single variable

Well, the example data doesn't even have a location variable. But ignoring that, your data organization doesn't make that possible. There is no sensible way to create that variable in that dataset. What you need to do is create a different data set to contain that:

Code:

gen opening_year = year(dofc(rbi_date_of_open)) frame create wanted int year int branches_open gen byte open_this_year = . forvalues y = 2013/2024 { replace open_this_year = (opening_year <= `y') summ open_this_year, meanonly frame post wanted (`y') (`r(sum)') } drop open_this_year frame change wanted list

This will need to be modified to work by location as well.

Note, too, that the logic only works if we assume that no branch, once open, ever closes.
Comment
Sneha Thayyil

Join Date: Jun 2020

Posts: 10
#5

20 Jun 2024, 23:14

Thank you, Clyde. This worked exactly.
Comment

Announcement

Help with splitting up dates and times, and categorizing date ranges.

Comment

Comment

Comment

Comment