  • String to numeric fails

    Hi,

    Consider the following dataset:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str13 grade float id
    "-1.565247655"   1
    "-5.57115221"    2
    "-6.743679047"   3
    "0.7140107751"   4
    "-.5790126324"   5
    "0.4549108744"   6
    "-.5358353257"   7
    "0.8597764373"   8
    "0.5713045597"   9
    "1.07125771"    10
    "-5.094827175"  11
    "0.8716378212"  12
    "NULL"          13
    "-5.167676449"  14
    "-0.8569977283" 15
    "-11.75496864"  16
    "-0.5923922658" 17
    "-0.9578948617" 18
    "-1.828699946"  19
    end
    I want to convert the grade variable into numeric.
    First I recode the "NULL" to ".a".

    Code:
    replace grade=".a" if grade=="NULL"
    destring grade, gen(newvar) force
    But the conversion still fails, and when I use the force option, everything is lost:

    grade: contains nonnumeric characters; newvar generated as byte
    (18 missing values generated)


    What seems to be the issue?

  • #2
    Try

    Code:
    list grade if missing(real(grade))
    and see Section 2.7 in https://journals.sagepub.com/doi/pdf...867X1801800413

    As the putative author of destring I want to flag that -- as in the rest of life -- force is the last option and the least desirable. The name is intended to convey that your data may be damaged by what you do and using it should imply that you don't mind, i.e. that the reason destring won't work is that you have garbage in some observations.
    Last edited by Nick Cox; 19 Jan 2024, 10:15.


    • #3
      Originally posted by Nick Cox
      Try

      Code:
      list grade if missing(real(grade))
      and see Section 2.7 in https://journals.sagepub.com/doi/pdf...867X1801800413

      As the putative author of destring I want to flag that -- as in the rest of life -- force is the last option and the least desirable. The name is intended to convey that your data may be damaged by what you do and using it should imply that you don't mind, i.e. that the reason destring won't work is that you have garbage in some observations.

      Thanks, it shows all of the observations. How can I fix this?

      Comment


      • #4
        using your example, I get:
        Code:
        . ta grade if real(grade)==.
        
                grade |      Freq.     Percent        Cum.
        --------------+-----------------------------------
                 NULL |          1      100.00      100.00
        --------------+-----------------------------------
                Total |          1      100.00
        r; t=0.00 11:19:38
        
        . list grade if missing(real(grade))
        
             +-------+
             | grade |
             |-------|
         13. |  NULL |
             +-------+
        This is quite different from what you say in #3, so the variable can be converted using -destring- with the ignore("NUL") option.
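
        A minimal sketch of that suggestion, using three rows from the data example in #1 and assuming "NULL" has not already been replaced with ".a". The variable name grade_num is only a placeholder.

        ```stata
        * Three observations copied from the data example in #1
        clear
        input str13 grade float id
        "-1.565247655"   1
        "0.7140107751"   4
        "NULL"          13
        end

        * ignore() strips the characters N, U and L wherever they occur,
        * so "NULL" becomes an empty string and is converted to missing
        destring grade, generate(grade_num) ignore("NUL")

        * Check which observations ended up missing
        list id grade grade_num if missing(grade_num)
        ```

        Because ignore() removes the listed characters wherever they appear, it is worth confirming first that they occur only in the placeholder text.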


        • #5
          This doesn't yet make sense to me. #1 says that all but 18 values of your variable can be regarded as numeric.

          Now #3 says none of them can be. I agree with Rich Goldstein that your data example contradicts that too.


          • #6
            Originally posted by Rich Goldstein
            using your example, I get:
            Code:
            . ta grade if real(grade)==.

                    grade |      Freq.     Percent        Cum.
            --------------+-----------------------------------
                     NULL |          1      100.00      100.00
            --------------+-----------------------------------
                    Total |          1      100.00
            r; t=0.00 11:19:38

            . list grade if missing(real(grade))

                 +-------+
                 | grade |
                 |-------|
             13. |  NULL |
                 +-------+
            This is quite different from what you say in #3, so the variable can be converted using -destring- with the ignore("NUL") option.

            I had to re-enter the data manually to post it here, because I cannot use dataex on the data server I use; that is probably the reason.
            I just checked the data on the server and the type of the grade variable is str10. Could that be the issue?


            • #7

              Being str10 is not an issue itself. The point of destring is to take a string variable and extract numeric contents. But why then is the variable str13 in your data example?

              I wonder if your variable is contaminated by some exotic character that is not visible. charlist or chartab from SSC could help there.
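
              A sketch of that check, assuming the string variable is named grade and that the machine can reach SSC for the one-off install:

              ```stata
              * One-off install of the community-contributed command from SSC
              ssc install charlist

              * Display every distinct character that occurs in the string variable
              charlist grade
              ```

              Anything in that list beyond digits, the minus sign and the decimal point deserves a closer look.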


              • #8
                Originally posted by Nick Cox
                Being str10 is not an issue itself. The point of destring is to take a string variable and extract numeric contents. But why then is the variable str13 in your data example?

                I wonder if your variable is contaminated by some exotic character that is not visible. charlist or chartab from SSC could help there.
                I get ,-0123456789LNU for charlist. Could it perhaps be the comma?


                • #9
                  The comma does indeed need special action. See

                  Code:
                  help destring
                  to learn about the dpcomma option. There weren't any commas in #1, so we could not see this problem.
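
                  A minimal sketch of how that might look. The values below are made up for illustration, since the real data with commas were never posted, and grade_num is a placeholder name.

                  ```stata
                  * Hypothetical data with a comma as the decimal separator
                  clear
                  input str13 grade
                  "-1,565247655"
                  "0,7140107751"
                  "-,5790126324"
                  "NULL"
                  end

                  * Blank out the text placeholder so that it converts to missing
                  replace grade = "" if grade == "NULL"

                  * dpcomma tells destring to read the comma as the decimal separator
                  destring grade, generate(grade_num) dpcomma

                  list
                  ```

                  Blanking "NULL" explicitly keeps the conversion free of both force and ignore(), so any remaining nonnumeric content would still stop destring and be noticed.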
