  • Extracting the last four digits of a numeric variable

    Hello,

    I am using Stata 17 and have run into a data problem. I have a day-month-year variable that was entered inconsistently: some dates have a '0' in front of the combination (01012021 for January 1, 2021) and some do not (1012021 for January 1, 2021). As such, I am not confident in using the day and month parts of the date variable, nor do I need them. How can I extract the last four digits of this variable (i.e. the year)? I tried changing the format (Data drop-down menu, variable manager, create), but it does not work, presumably because the number of digits in the date variable differs across observations. I would like to end up with a four-digit year variable only. If anyone has any ideas or suggestions, they are most welcome.

    Thank you!

  • #2
    Try this
    Code:
    destring date, gen(datestring)
    gen fourdigits = real(substr(datestring, -4, 4))

    • #3
      Sorry, destring should be tostring. I assumed your date variable wasn't already stored as a string; if it is, you can drop that line of the code.
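      With that correction, and assuming date is stored as a numeric variable, the two lines would read roughly as follows (a sketch only; datestring and fourdigits are just the names used in #2):
      Code:
      * convert the numeric date to a string so that substr() can be applied
      tostring date, generate(datestring)
      * a negative start position counts from the end: take the last 4 characters
      generate fourdigits = real(substr(datestring, -4, 4))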

      • #4
        Thank you, Tom, everything worked great!

        • #5
          Also,
          Code:
          generate year = mod(date,10000)
          Code:
          . display mod(1012021,10000)
          2021
          
          . display mod(01012021,10000)
          2021
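
          The remainder after division by 10000 is simply the last four digits, so a leading zero in the day makes no difference. A quick check on two made-up dates (the values below are for illustration only, not from the original data):
          Code:
          clear
          input long date
          1012021
          31122021
          end
          * keep the remainder after dividing by 10000, i.e. the last four digits
          generate year = mod(date, 10000)
          * both made-up dates fall in 2021, so this should pass silently
          assert year == 2021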

          • #6
            Hello again,

            I have a similar issue with a different variable. I have a source-file variable that includes the year; its value is abc2000.dta, and it is a string variable.

            I again want to extract the year, but every combination of numbers I try in the substr() formula above just gives me missing values.

            For example:

            gen year2= real(substr(year, -4, 4)) gives me missing values.
            gen year2= real(substr(year, 4, 7)) gives me missing values.

            I've tried several combinations and none of them seems to work. Is it because the string variable contains a period?

            • #7
              #6 Data example please

              • #8
                Do you mean that "abc2000.dta" is a value in a string variable from which you wish to extract the year?

                If so, then rather than relying on trial and error, you should refer to the output of
                Code:
                help substr()
                to learn how to specify the starting position and length arguments.
                Code:
                . * Example generated by -dataex-. For more info, type help dataex
                . clear
                
                . input str11 var1
                
                            var1
                  1. "abc2000.dta"
                  2. end
                
                . generate s1 = substr(var1,4,4)
                
                . generate s2 = substr(var1,-8,4)
                
                . generate y1 = real(substr(var1,4,4))
                
                . generate y2 = real(substr(var1,-8,4))
                
                . list, clean
                
                              var1     s1     s2     y1     y2  
                  1.   abc2000.dta   2000   2000   2000   2000  
                
                .
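
                As for why the two attempts in #6 return missing values: neither substring is purely numeric, so real() cannot convert it. This can be checked directly on the literal value:
                Code:
                . display substr("abc2000.dta", -4, 4)
                .dta

                . display substr("abc2000.dta", 4, 7)
                2000.dt

                . display real("2000.dt")
                .
                If the number of characters before the year varies across observations, a regular expression may be more robust than fixed positions. The following is only a sketch of that alternative, assuming the first run of four digits in var1 is the year (y3 is a hypothetical name; ustrregexm() needs Stata 14 or later):
                Code:
                * ustrregexs(0) returns the text matched by the preceding ustrregexm()
                generate y3 = real(ustrregexs(0)) if ustrregexm(var1, "[0-9]{4}")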
                Last edited by William Lisowski; 19 Oct 2021, 15:59.

                • #9
                  Hi William,

                  Yes, the value of the variable itself is exactly how you understood it. I tried your code above and it worked perfectly! Thank you so much for your help.

                  Elena
