
  • extracting the first digit of a numeric variable

    To extract a portion of a string, we can use the -substr- command. My variable is numeric, so this command doesn't work. So I used the -tostring- command in order to use the -substr- command afterwards. I did as below and I don't know why I got an r(109) error.

    tostring cod_proc, replace
    replace gp_proc = substr(cod_proc,1,1)
    destring cod_proc gp_proc, replace

    But, anyway, it would be better for me to use a command that works on numeric variables directly, since my dataset is big and it takes a long time to -tostring- the cod_proc variable. Any suggestions?
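For reference: r(109) is Stata's "type mismatch" error. Assuming gp_proc already exists as a numeric variable (which the use of -replace- suggests), -replace- cannot store the string returned by substr() in it. A minimal sketch on hypothetical toy data that reproduces the error and then converts the extracted character back to a number with real():

```stata
* Hypothetical toy data: cod_proc is numeric, gp_proc already exists as numeric
clear
set obs 3
gen long cod_proc = 4000 + _n
gen gp_proc = .

* Fails: substr() returns a string but gp_proc is numeric, so r(109)
capture replace gp_proc = substr(string(cod_proc), 1, 1)
display "return code: " _rc

* Works: real() turns the one-character string back into a number
replace gp_proc = real(substr(string(cod_proc), 1, 1))
list, noobs
```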

  • #2
    Hi Paula,

    I don't know if there is a way to do this in a single line, but whenever I need to convert a number to a string I use:

    Code:
     gen temp_var1 = string(cod_proc,"%XX.0g")
    where XX is replaced by a display width for your string (since you only want the first digit, keep it short). So you could do something along the lines of:

    Code:
     gen temp_var1 = string(cod_proc,"%5.0g")
    capture drop gp_proc   // an existing numeric gp_proc cannot hold a string: r(109)
    gen gp_proc = substr(temp_var1,1,1)
    destring gp_proc, replace
    Hope this helps,

    Josh



    • #3
      Try this:

      gen gp_proc = floor(cod_proc/(10^floor(log10(cod_proc))))

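      In case it's not obvious, for a positive integer cod_proc,
      floor(log10(cod_proc)) = (number of digits of cod_proc) - 1
      so dividing by 10 to that power leaves a number between 1 and 10 whose integer part is the first digit.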
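A worked sketch of Liam's formula on hypothetical values. The abs() call, to handle negative values, is an addition not in the original; zero gives missing because log10(0) is missing. This is intended for integer codes: with non-integer values, floating-point division can land just below a digit boundary (in IEEE double arithmetic, 0.3/0.1 is slightly less than 3), so floor() could return the wrong digit.

```stata
* Hypothetical example values, including a negative number and zero
clear
input double cod_proc
4011
70325
9
-532
0
end

* floor(log10(|x|)) = number of digits - 1 (for integers with |x| >= 1)
gen ndig_m1 = floor(log10(abs(cod_proc)))

* divide by 10^(digits - 1) and truncate; 0 gives missing via log10(0)
gen gp_proc = floor(abs(cod_proc) / 10^ndig_m1)
list, noobs
```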


      Last edited by Liam Clegg; 08 Jun 2015, 01:14.



      • #4
        As Liam's answer also shows, there is often no need to create intermediate new variables here. Functions can be called within functions, so you can do things like this:

        Code:
         
        gen first_digit = real(substr(string(cod_proc, "%5.0g"), 1, 1))
        with an adjustment needed if values can be negative (an extra call to abs()).

        The situation necessarily resembles elementary algebra: what's on the inside is evaluated first and you need to be careful with parentheses.

        Note that substr() is a function, not a command.
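A minimal sketch of the abs() adjustment mentioned above, on hypothetical integer codes. Without abs(), the first character of a negative value is the minus sign, and real() of "-" returns missing.

```stata
* Hypothetical integer codes, one of them negative
clear
input long cod_proc
4011
-532
9
end

* abs() drops the minus sign so the first character is always a digit
gen first_digit = real(substr(string(abs(cod_proc), "%5.0g"), 1, 1))
list, noobs
```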



        • #5
          Hi Liam and Nick,
          Thanks for your suggestion!
          A small problem: when the number is 0.01, 0.001 or 0.0001, the first digit returned by the code above is 0 rather than 1, right? We would need to add one back.

          Sorry about my username; I am contacting the administrators to change it.

          Claire



          • #6
            Paper Paper: I think you have changed the problem to finding the first non-zero digit. Naturally, it's a convention that 123 is not written 0123, and a convention that 0.000123 is written with its leading zeros. Consider something like:

            Code:
            . di real(substr(string(1e-4, "%5.0e"), 1, 1))
            1
            
            . di real(substr(string(2e-5, "%5.0e"), 1, 1))
            2
            
            . di real(substr(string(abs(-0.000003), "%5.0e"), 1, 1))
            3
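One thing to watch with "%5.0e": it keeps only one significant digit, so the value is rounded before the first character is read. For example, 0.00096 would be expected to display as 1e-03, giving 1 rather than 9. A sketch on hypothetical values that keeps more digits in the mantissa, so rounding only matters at the 10th significant digit:

```stata
* Hypothetical small values, one negative
clear
input double x
0.00096
0.0001
0.00002
-0.000003
0.3
end

* one significant digit: the value is rounded first (0.00096 expected to give 1)
gen via_5_0e = real(substr(string(abs(x), "%5.0e"), 1, 1))

* ten significant digits: first character is the first non-zero digit
gen first_sig = real(substr(string(abs(x), "%15.9e"), 1, 1))
list, noobs
```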



            • #7
              Hi guys, in my case I have a birth variable that records the year, month and day of birth of each observation. This variable is numeric, like 20240210, meaning that the observation was born on 10 Feb 2024. I just want the year: I want to create a variable that records only the year. How can I do this in Stata? Many thanks!



              • #8
                If you do not want to use Stata's date functions:
                Code:
                . input baddate
                
                       baddate
                  1.  20240210
                  2. end
                
                . format baddate %11.0g
                
                .
                . gen year = int(baddate/10000)
                
                . list, noob
                
                  +-----------------+
                  |  baddate   year |
                  |-----------------|
                  | 20240210   2024 |
                  +-----------------+
                If you want to know why I named the birth variable "baddate", see Stata tip 130 by Nick Cox: http://www.stata-journal.com/article...article=dm0096. Using Stata's date functions, an alternative would be to transform "baddate" into the date variable "birthdate" and then use the date function year():
                Code:
                . clear
                
                . input baddate
                
                       baddate
                  1.  20240210
                  2. end
                
                . format baddate %11.0g
                
                .
                . gen birthdate = date(string(baddate, "%11.0g"), "YMD")
                
                . format birthdate %td
                
                .
                . gen year = year(birthdate)
                
                . list, noob
                
                  +-----------------------------+
                  |  baddate   birthdate   year |
                  |-----------------------------|
                  | 20240210   10feb2024   2024 |
                  +-----------------------------+
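One caveat for the logs above: -input baddate- creates a float (Stata's default storage type), and a float holds integers exactly only up to 16,777,216. 20240210 happens to be stored exactly because it is even, but an odd value such as 20240211 cannot be, so the day could come out wrong (the year is unaffected). A minimal sketch, with hypothetical values, that reads the variable as long and, as a further alternative, builds the date arithmetically with mdy():

```stata
clear
* long (or double) storage keeps 8-digit yyyymmdd values exact
input long baddate
20240211
19991231
end
format baddate %10.0g

* split yyyymmdd into its parts with integer arithmetic
gen year  = floor(baddate/10000)
gen month = floor(mod(baddate, 10000)/100)
gen day   = mod(baddate, 100)

* assemble a Stata daily date from the parts
gen birthdate = mdy(month, day, year)
format birthdate %td
list, noobs
```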
                Last edited by Dirk Enzmann; 25 Feb 2024, 04:52.

