  • Remove part of value label

    Dear all, I have an issue with the value labels of one of my variables.

    The value labels appear as:

    Code:
     9259 Elementary sales occupations nec.
    Where 9259 is the value of that category which is repeated in the label. I would simply like to remove that part and only have the last non-numeric bit.

    I'm sure that this is fairly easy and that something like labvalch3 should do the trick, but can't figure out the exact syntax. Any help is appreciated.

  • #2
    Here is one way, although I am sure you can do it more efficiently with elabel from the Stata Journal, by daniel klein.

    Code:
    sysuse auto, clear
    *ADD LEADING DIGITS
    lab def origin 0 "00 Domestic", modify
    lab def origin 1 "11 Foreign", modify
    lab list
    *START HERE
    levelsof foreign, local(levels)
    foreach l of local levels{
        lab def `:val lab foreign' `l' "`=ustrregexra("`:lab (foreign) `l''", "(^\d+)\s+(\w+)", "$2")'", modify
    }
    lab list
    The only thing you need to change is the variable name (foreign in this example).

    Res.:

    Code:
    . 
    . lab list
    origin:
               0 00 Domestic
               1 11 Foreign
    
    . 
    . *START HERE
    
    . 
    . levelsof foreign, local(levels)
    0 1
    
    . 
    . foreach l of local levels{
      2. 
    .     lab def `:val lab foreign' `l' "`=ustrregexra("`:lab (foreign) `l''", "(^\d+)\s+(\w+)", "$2")'", modify
      3. 
    . }
    
    . 
    . lab list
    origin:
               0 Domestic
               1 Foreign



    • #3
      Let's just note that
      Code:
      numlabel
      can be used here.


      Code:
      . sysuse auto, clear
      (1978 automobile data)
      
      . label def foreign 0 "0 domestic" 1 "1 foreign"
      
      . label list
      foreign:
                 0 0 domestic
                 1 1 foreign
      origin:
                 0 Domestic
                 1 Foreign
      
      . numlabel foreign, remove mask(#)
      
      . label list
      foreign:
                 0  domestic
                 1  foreign
      origin:
                 0 Domestic
                 1 Foreign



      • #4
        Dear Andrew and Nick, thanks a lot for the replies. The solution in #3 works perfectly and, as I supposed, it was fairly easy. Thanks also to Andrew, I tried your loop. It worked perfectly in the sysuse auto dataset, but not in mine. I suspect that the issue (which I did not mention in my first message) is that I also have two labels containing negative numbers (for "Does not apply" and "No answer").



        • #5
          elabel (SSC) does not have a specific hook for this as numlabel does. Note that you probably want

          Code:
          numlabel ... , remove mask("# ")
          to avoid ending up with leading spaces in value labels as in #3
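
          A minimal sketch of the difference between the two masks, reusing the setup from #3. The label names demo_a and demo_b are made up for this illustration; the comments state what each mask is meant to do according to #3 and #5.

          ```stata
          sysuse auto, clear

          * two throwaway value labels with the value repeated in the text
          label define demo_a 0 "0 domestic" 1 "1 foreign"
          label define demo_b 0 "0 domestic" 1 "1 foreign"

          * mask from #3: strips the digit only, so the blank after it stays
          numlabel demo_a, remove mask(#)

          * mask from #5: treats the blank as part of the prefix to remove
          numlabel demo_b, remove mask("# ")

          * compare the two results side by side
          label list demo_a demo_b
          ```

          If the labels in the real data use a different separator after the number, the mask would have to be adjusted to match it.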


          The suggested solution in #2 boils down to

          Code:
          elabel define * (= #) (= ustrregexra(@, "(^\d+)\s+(\w+)", "$2")) , modify
          Note that elabel always applies to all labels (i.e., text) and does not rely on all values being observed in the data, as the loop following levelsof does.


          Edit: I just re-read the original question, which refers to the value label of one variable. elabel has an easy way of referring to value labels attached to variables:

          Code:
          elabel define (varname) (= #) (= ustrregexra(@, "(^\d+)\s+(\w+)", "$2")) , modify
          will change the value label attached to varname.
          Last edited by daniel klein; 21 Oct 2021, 10:50.



          • #6
            Thanks also to Andrew, I tried your loop. It worked perfectly in the sysuse auto dataset, but not in mine. I suspect that the issue (which I did not mention in my first message) is that I also have two labels containing negative numbers (for "Does not apply" and "No answer").
            You can incorporate the negative sign into the regular expression. daniel klein's point about looping across all levels of the variable does not bite here as the regex changes nothing if no label is present or if there is no match (although you should add capture in front of the command), but elabel is efficient in that it avoids this.

            Code:
            sysuse auto, clear
            *ADD LEADING DIGITS
            lab def origin 0 "-00 Domestic", modify
            lab def origin 1 "11 Foreign", modify
            lab list
            *START HERE
            levelsof foreign, local(levels)
            foreach l of local levels{
                cap lab def `:val lab foreign' `l' "`=ustrregexra("`:lab (foreign) `l''", "(^[\-]?\d+)\s+(\w+)", "$2")'", modify
            }
            lab list
            Res.:

            Code:
            .
            . lab list
            origin:
                       0 -00 Domestic
                       1 11 Foreign
            
            .
            . *START HERE
            
            .
            . levelsof foreign, local(levels)
            0 1
            
            .
            . foreach l of local levels{
              2.
            .     lab def `:val lab foreign' `l' "`=ustrregexra("`:lab (foreign) `l''", "(^[\-]?\d+)\s+(\w+)", "$2")'", modify
              3.
            . }
            
            .
            . lab list
            origin:
                       0 Domestic
                       1 Foreign



            • #7
              Originally posted by Andrew Musau
              daniel klein's point about looping across all levels of the variable does not bite here as the regex changes nothing if no label is present or if there is no match
              My concern is more so with values that have labels attached but do not appear in the data. Try running your commands on the subset of foreign cars for an illustration of my point:

              Code:
              . lab list
              origin:
                         0 00 Domestic
                         1 11 Foreign
              
              . *START HERE
              . keep if foreign == 1
              (52 observations deleted)
              
              . levelsof foreign, local(levels)
              1
              
              . foreach l of local levels{
                2.     lab def `:val lab foreign' `l' "`=ustrregexra("`:lab (foreign) `l''", "(^[\-]?\d+)\s+(\w+)", "$2")'", modify
                3. }
              
              . lab list
              origin:
                         0 00 Domestic
                         1 Foreign
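
              For readers without elabel, one possible way to visit every value defined in the label, rather than every value observed in the data, is to loop from r(min) to r(max) as returned by label list and skip values that carry no label. This is only a sketch under assumptions not stated in the thread: the labeled values are integers in a modest range, no extended missing values are labeled, and the pattern is simplified to "^-?\d+\s+" replaced by an empty string instead of the two-group pattern used above.

              ```stata
              sysuse auto, clear
              label define origin 0 "00 Domestic" 1 "11 Foreign", modify
              keep if foreign == 1            // value 0 no longer occurs in the data

              * name of the value label attached to the variable
              local lbl : value label foreign

              * r(min) and r(max) describe the label itself, not the data
              quietly label list `lbl'
              forvalues v = `r(min)'/`r(max)' {
                  * strict returns an empty string when no label is defined for v
                  local old : label `lbl' `v', strict
                  if `"`old'"' == "" continue
                  * drop an optional minus sign, the leading digits, and the blanks after them
                  local new = ustrregexra(`"`old'"', "^-?\d+\s+", "")
                  label define `lbl' `v' `"`new'"', modify
              }
              label list `lbl'
              ```

              Because the loop runs over the label's own range, the text for value 0 is cleaned as well, even though no domestic cars remain in the data.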



              • #8
                Yes, I see what you mean. Thanks.



                • #9
                  Hi, can this be applied to remove the numbers and leading space from the variable categories?

                  1 NOT AT ALL 2 SEVERAL DAYS 3 MORE THAN HALF THE DAYS 4 NEARLY EVERY DAY 5 REFUSED 6 DON'T KNOW.

                  If possible, I'd like to loop over all the variables whose levels contain numbers.

                  I couldn't use #3, #5 and #7 to solve this.



                  • #10
                    Data example, please!



                    • #11
                      Originally posted by daniel klein
                      Data example, please!
                      Thanks Daniel! Here is the dataex:


                      . dataex

                      ----------------------- copy starting from the next line -----------------------
                      Code:
                      * Example generated by -dataex-. For more info, type help dataex
                      clear
                      input double(age gender srh)
                      2 2 3
                      6 2 3
                      2 1 5
                      6 2 2
                      4 1 4
                      1 2 4
                      3 1 3
                      2 2 3
                      1 2 3
                      6 2 3
                      5 2 3
                      3 1 2
                      4 2 4
                      5 1 3
                      2 2 3
                      4 1 4
                      3 2 4
                      4 1 3
                      2 1 1
                      4 1 2
                      1 1 5
                      1 1 3
                      4 2 3
                      3 1 1
                      3 2 2
                      3 2 4
                      6 1 1
                      3 1 2
                      1 1 5
                      3 1 3
                      3 2 3
                      3 2 2
                      3 2 3
                      6 1 4
                      5 2 2
                      5 2 5
                      3 2 4
                      6 2 1
                      2 1 4
                      6 2 5
                      3 1 3
                      3 1 4
                      3 2 1
                      2 1 4
                      3 1 4
                      2 2 2
                      3 1 2
                      1 2 4
                      5 1 3
                      2 2 2
                      4 1 5
                      1 2 2
                      6 1 1
                      5 2 3
                      4 1 3
                      4 1 2
                      4 2 3
                      1 1 2
                      5 1 2
                      6 1 4
                      5 2 5
                      4 1 1
                      5 2 2
                      2 2 1
                      5 2 4
                      2 1 4
                      4 1 2
                      4 2 3
                      6 2 4
                      1 1 4
                      4 1 2
                      1 2 4
                      4 2 3
                      5 2 2
                      3 1 4
                      6 2 4
                      1 1 3
                      4 2 5
                      4 1 4
                      3 2 2
                      6 1 4
                      4 2 4
                      3 1 2
                      4 1 3
                      4 1 1
                      2 1 2
                      1 1 3
                      3 2 3
                      5 1 4
                      5 1 4
                      6 1 2
                      2 2 5
                      4 1 3
                      1 1 2
                      5 2 2
                      5 2 4
                      4 2 3
                      1 1 3
                      4 1 3
                      3 2 3
                      end
                      label values age r12d2intvrage
                      label def r12d2intvrage 1 "1 65 To 69", modify
                      label def r12d2intvrage 2 "2 70 To 74", modify
                      label def r12d2intvrage 3 "3 75 To 79", modify
                      label def r12d2intvrage 4 "4 80 To 84", modify
                      label def r12d2intvrage 5 "5 85 To 89", modify
                      label def r12d2intvrage 6 "6 90+", modify
                      label values gender r12dgender
                      label def r12dgender 1 "1 Male", modify
                      label def r12dgender 2 "2 Female", modify
                      label values srh hc12health
                      label def hc12health 1 "1 Excellent", modify
                      label def hc12health 2 "2 Very Good", modify
                      label def hc12health 3 "3 Good", modify
                      label def hc12health 4 "4 Fair", modify
                      label def hc12health 5 "5 Poor", modify
                      ------------------ copy up to and including the previous line ------------------

                      Listed 100 out of 6327 observations



                      • #12
                        This (from #5) works for me:

                        Code:
                        foreach var of varlist *{
                            elabel define (`var') (= #) (= ustrregexra(@, "(^\d+)\s+(\w+)", "$2")) , modify
                        }
                        Res.:

                        Code:
                        . lab list
                        hc12health:
                                   1 Excellent
                                   2 Very Good
                                   3 Good
                                   4 Fair
                                   5 Poor
                        r12dgender:
                                   1 Male
                                   2 Female
                        r12d2intvrage:
                                   1 65 To 69
                                   2 70 To 74
                                   3 75 To 79
                                   4 80 To 84
                                   5 85 To 89
                                   6 90+
                        What did you try and what didn't work?



                        • #13
                          The loop in #12 is even implemented in elabel:

                          Code:
                          elabel define (*) (= #) (= ustrregexra(@, "(^\d+)\s+(\w+)", "$2")) , modify
                          should be equivalent and, no surprise, works for me, too.

                          I am not sure why you want to do this for certain variables (and only in the current label language). I would typically prefer

                          Code:
                          elabel define * (= #) (= ustrregexra(@, "(^\d+)\s+(\w+)", "$2")) , modify

                          Anyways, perhaps the data example does not represent the problem(s)?



                          • #14
                            Originally posted by Andrew Musau
                            This (from #5) works for me:

                            Code:
                            foreach var of varlist *{
                            elabel define (`var') (= #) (= ustrregexra(@, "(^\d+)\s+(\w+)", "$2")) , modify
                            }
                            Res.:

                            Code:
                            . lab list
                            hc12health:
                            1 Excellent
                            2 Very Good
                            3 Good
                            4 Fair
                            5 Poor
                            r12dgender:
                            1 Male
                            2 Female
                            r12d2intvrage:
                            1 65 To 69
                            2 70 To 74
                            3 75 To 79
                            4 80 To 84
                            5 85 To 89
                            6 90+
                            What did you try and what didn't work?
                            Thank you Andrew. Perhaps the dataex is free from the issue the main data is suffering from (I don't know what it is):

                            Code:
                            . foreach var of varlist race{
                            2.     elabel define (`var') (= #) (=    ustrregexra(@,    "(^\d+)\s    (\w    )", "$2"))    ,    modify
                            3. }
                            
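                            . ta race

                            r12 d race and hispanic ethnicity when |
                                                             added |      Freq.     Percent        Cum.
                            ---------------------------------------+-----------------------------------
                                             1 White, non-hispanic |      4,024       63.60       63.60
                                             2 Black, non-hispanic |      1,309       20.69       84.29
                            3 Other (Am Indian/Asian/Native Hawaii |        265        4.19       88.48
                                                        4 Hispanic |        729       11.52      100.00
                            ---------------------------------------+-----------------------------------
                                                             Total |      6,327      100.00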



                            • #15
                              Originally posted by daniel klein
                              The loop in #12 is even implemented in elabel:

                              Code:
                              elabel define (*) (= #) (= ustrregexra(@, "(^\d+)\s+(\w+)", "$2")) , modify
                              should be equivalent and, no surprise, works for me, too.

                              I am not sure why you want to do this for certain variables (and only in the current label language). I would typically prefer

                              Code:
                              elabel define * (= #) (= ustrregexra(@, "(^\d+)\s+(\w+)", "$2")) , modify

                              Anyways, perhaps the data example does not represent the problem(s)?
                              Hi Daniel. Yes, it looks like the elabel method works perfectly on the dataex but not on the main dataset (about 40% of the variables remain unaffected perhaps because of the missing values).

                              Thanks so much for the help!
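
                              The thread ends without a diagnosis. Two checks that might narrow the problem down are sketched below on toy data: whether each variable has a value label attached at all, and whether the pattern from #12 matches a label text exactly as it is stored. The label name racelbl is hypothetical, and the third label text is copied as truncated in the tabulate output of #14.

                              ```stata
                              * toy stand-in for the real data
                              clear
                              input byte race
                              1
                              2
                              3
                              4
                              end
                              label define racelbl 1 "1 White, non-hispanic" 2 "2 Black, non-hispanic"
                              label define racelbl 3 "3 Other (Am Indian/Asian/Native Hawaii" 4 "4 Hispanic", add
                              label values race racelbl

                              * 1. which variables have a value label attached?
                              foreach var of varlist * {
                                  local lbl : value label `var'
                                  if "`lbl'" == "" {
                                      display as text "`var': no value label attached"
                                  }
                                  else {
                                      display as text "`var': value label `lbl'"
                                  }
                              }

                              * 2. does the pattern from #12 match the stored text? (1 = match, 0 = no match)
                              local txt : label racelbl 1, strict
                              display ustrregexm(`"`txt'"', "(^\d+)\s+(\w+)")
                              ```

                              Note that the pattern as displayed in the log in #14 has blanks where #12 has + quantifiers. This may only be a pasting artifact, but if the command was really run in that form, the pattern would likely not match and the labels would be left unchanged.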

