
  • Separating negative decimal numbers from hyphens in string variables

    Hello,

    I have a string variable that records various measures of accuracy, including standard errors and confidence intervals. It contains negative numbers on their own, and for confidence intervals the lower and upper limits are separated by a hyphen, sometimes right next to a negative sign. I've been unable to figure out how to create a variable that removes the hyphens without also removing the negative signs. Splitting the variable on the hyphen won't work, because it would also split at the negative signs. I also found this code on a website for tagging negative values, but it doesn't do what I need either:

    gen tagonenegative = regexm(moavalue_c11, "^-[.0-9]")

    Here are a few example values from the variable:

    moavalue_c11
    -.14
    -0.12-0.14
    .00345
    0.09–0.98
    0.687–1.012
    -0.270

    Any help with this would be greatly appreciated!

    Thanks,
    Fatima

  • #2
    Well, in the example data you show, with one possible exception, the minus signs are represented as ASCII character 45 (minus/hyphen), whereas the connectors between numbers are represented as en-dashes (Unicode \u2013). So
    Code:
    replace moavalue_c11 = subinstr(moavalue_c11, ustrunescape("\u2013"), " ", .)
    will replace the en-dashes with spaces and leave the minus/hyphens alone.
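To check which dash a particular value contains, one option is `ustrtohex()`, which writes every character as a \uXXXX escape. This is a sketch assuming Stata 14 or later (for the Unicode string functions): an en dash shows up as \u2013, while an ASCII hyphen-minus shows up with code point 002d.

```stata
* Inspect the code points of two example values from #1
* An en dash appears as \u2013; an ASCII hyphen-minus appears with code point 002d
display ustrtohex("0.09–0.98")
display ustrtohex("-0.12-0.14")
```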

    But there is one problematic observation in your example: "-0.12-0.14". It isn't clear to me what this one is supposed to be. The - between the 2 and the 0 is a minus/hyphen character, so the string looks like two negative numbers concatenated with no connector dash between them. What is the intended reading of this value? Is that - between the 2 and the 0 supposed to be a connector dash, specifying a range from -0.12 to +0.14 (written without the + sign)?
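Before deciding how to handle values like this, it can help to flag which separator each observation uses. This is a sketch assuming Stata 14 or later; the flag variables has_endash and digit_hyphen_digit are hypothetical names, and only a subset of the example values from #1 is entered so the block runs on its own.

```stata
* Flag which separator each value uses (flag variable names are hypothetical)
clear
input str13 moavalue_c11
"-.14"
"-0.12-0.14"
"0.09–0.98"
"-0.270"
end

* 1 if the value contains an en dash (U+2013), the intended range connector
generate byte has_endash = ustrpos(moavalue_c11, ustrunescape("\u2013")) > 0

* 1 if an ASCII hyphen-minus has a digit on both sides (a likely mistyped connector)
generate byte digit_hyphen_digit = ustrregexm(moavalue_c11, "[0-9]-[0-9]")

list, noobs
tabulate has_endash digit_hyphen_digit
```

Observations with digit_hyphen_digit equal to 1 are the ones that need a decision about their intended meaning.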



    • #3
      Thank you, Clyde! I appreciate it. Yes, there are a few cases where a negative number and a positive number, such as -0.12 and +0.14, are separated by a hyphen, so the hyphen looks like a negative sign. Would you suggest changing these by hand so that the connector dash is clear? There are only a handful of such values, so that is feasible, but I'm open to other suggestions as well.



      • #4
        It is generally not a good idea to do things by hand unless there is simply no alternative. In this case, it is easily handled because a minus sign (-) that occurs between two digits is a mistyped en dash and should be replaced by a space for your purposes. So the complete code is:
        Code:
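        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str13 moavalue_c11
        "-.14"
        "-0.12-0.14"
        ".00345"
        "0.09–0.98"
        "0.687–1.012"
        "-0.270 "
        end
        
        * Replace each en dash (U+2013) connector with a space
        replace moavalue_c11 = subinstr(moavalue_c11, ustrunescape("\u2013"), " ", .)
        
        * A hyphen-minus with a digit on both sides is a mistyped connector: replace it with a space
        * (lookbehind/lookahead keep the neighboring digits out of the match)
        replace moavalue_c11 = ustrregexra(moavalue_c11, "(?<=[0-9])-(?=[0-9])", " ")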
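Once every connector is a space, the estimates and interval limits can be separated into numeric variables with `split`. This is a sketch, assuming the two `replace` commands above have already run; the stub name `bound` is hypothetical, and the values are entered here in their already-cleaned form so the block runs on its own.

```stata
* Values as they look after the en-dash and hyphen cleanup
clear
input str13 moavalue_c11
"-.14"
"-0.12 0.14"
".00345"
"0.09 0.98"
"-0.270 "
end

* Remove leading/trailing blanks (e.g. the trailing blank in "-0.270 ")
replace moavalue_c11 = strtrim(moavalue_c11)

* Split on spaces into bound1, bound2 (hypothetical stub) and convert to numeric
split moavalue_c11, generate(bound) destring

list, noobs
```

Single values end up in bound1 with bound2 missing; for intervals, bound1 holds the lower limit and bound2 the upper limit.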



        • #5
          Noted! That worked perfectly, thank you so much!
