  • Symbols When Using Substr For Year

    Very simple, and it may have been answered in another post; I just wanted to get the question out before looking. I need to get the year from my date variable. For example, from 01-01-1999 I need 1999, which I then destring so that I get a number. However, I run into an issue. When I run -gen year = substr(data_var, -4, 4)-, no matter which date variable I use, one of the digits sometimes comes out as if Shift had been held down while typing it: 1994 becomes 199$, or 1991 becomes 199!. When I go back to the original date variable, the date there is all digits, so 199$ was 1994 in the original variable. I am not sure why this is happening.

    Load, let's say, 10 million random dates. I am sure that when you substring them to get the year, some of them (not a lot, maybe 5 at most) will come out in the weird format I mentioned.

  • #2
    Very simple, and it may have been answered in another post; I just wanted to get the question out before looking.
    Personally, I'll wait for you to look first. Sorry.

    • #3
      Load, let's say, 10 million random dates. I am sure that when you substring them to get the year, some of them (not a lot, maybe 5 at most) will come out in the weird format I mentioned.
      Nope!

      Code:
      . clear*
      
      . set obs 10000000
      Number of observations (_N) was 0, now 10,000,000.
      
      . set seed 1234
      
      . gen date = runiformint(-50000, 50000)
      
      . gen str_date = string(date, "%tdNN-DD-CCYY")
      
      .
      . gen year = substr(str_date, -4, 4)
      
      . count if missing(real(year))
        0
      Every single extracted year comes out entirely numeric.
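If the end goal is just a numeric year, one option that skips both substr() and destring is to parse the whole string as a date and take its year. This is a minimal sketch, assuming the strings are always in month-day-year order like 01-01-1999; data_var and the three sample values are hypothetical stand-ins for the poster's data. It does not by itself explain where stray characters like $ or ! come from, so the checks below are still worth running.

```stata
* Sketch: numeric year via date() + year(), no substr()/destring needed.
* Assumes month-day-year order; data_var and the values are hypothetical.
clear
input str10 data_var
"01-01-1999"
"12-31-1994"
"06-15-1991"
end

* date() converts the string to a Stata daily date (days since 01jan1960);
* year() then extracts the calendar year as a number
gen int year = year(date(data_var, "MDY"))

* Any entry Stata could not read as an MDY date ends up missing
list data_var if missing(year)
list
```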

      Some possibilities occur to me as explanations for what you are getting:
      1. Your data are not what you think they are. The original string date variables may contain some of the non-numeric characters that are then showing up later. Install Robert Picard's -chartab- from SSC and run -chartab str_date- (or whatever the name of your string date variable is) so you can see what's really there.
      2. Your Stata installation is somehow corrupted. There's a three-stage ritual to go through for this: first, reboot your computer and try again to see if the problem goes away. If that doesn't work, run -update all, force- to make sure that your Stata is fully updated and all your executable and ado files are in sync. If that still doesn't work, uninstall Stata, do a fresh install, and then run a full update.
      If none of that works, post back with a data example, posted using the -dataex- command, that reproduces your problem. (The simplest way to do that would be to find the observations in your existing data set that exhibit this problem, -keep- just those, and then post that.) If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
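A rough sketch of isolating the problem observations and posting them might look like the following. It assumes the date is stored as a string; data_var, year_str, bad_year and the three sample values (one of them deliberately malformed, mirroring the 199! case described in the question) are hypothetical. Using -dataex ... if- instead of -keep- leaves the full data set in memory; -keep if bad_year- followed by -dataex- would work just as well.

```stata
* Sketch: flag extracted years that are not exactly four digits,
* then post just those observations with -dataex-.
* Hypothetical data; replace data_var with your own string date variable.
clear
input str10 data_var
"01-01-1999"
"03-15-1994"
"07-04-199!"
end

* Pull the last four characters, as in the original post
gen str4 year_str = substr(data_var, -4, 4)

* 1 if the four characters are not all digits, 0 otherwise
gen byte bad_year = !regexm(year_str, "^[0-9][0-9][0-9][0-9]$")
count if bad_year

* Look at the offending values, then produce a postable example
list data_var year_str if bad_year
dataex data_var year_str if bad_year
```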
