  • String function to identify a numeric character anywhere in the string?

    I have string data like:
    1/2/4
    2/3/7
    1/2/3/4/5
    4/2/1
    2

    The numerals represent affirmative answers to the items on a checklist. A 1 anywhere in the series means the participant answered yes to item 1; a pattern like 2/3/7 means the person affirmed items 2, 3, and 7; and so on. The strings vary in length, and a numeral appears at most once in a string, but it can be in any position.

    Is there a string function, or some kind of wildcard character, that would let me create an indicator variable coded 1 if a "1" appears anywhere in the string? I haven't worked much with string variables. I've read through the string functions section of the manual and don't see a function that would do that. Thanks.

  • #2
    Code:
    clear
    input str9 var
    "1/2/4"
    "2/3/7"
    "1/2/3/4/5"
    "4/2/1"
    "2"
    end
    
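    split var, gen(v) parse("/")    // pieces v1, v2, ..., one item number each
    destring v? , replace           // turn the pieces into numbers
    list
    
    label define yesno 0 "no" 1 "yes"
    
    // dK = 1 if any of v1-v9 equals K, 0 otherwise
    forvalues i = 1/7 {
        egen d`i' = anymatch(v?), values(`i')
        label values d`i' yesno
        label var d`i' "item `i'"
    }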
    list
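    A sketch of the same idea with two small changes, both editorial assumptions rather than part of the post: -split- converts the pieces directly through its -destring- option, and the pieces use the hypothetical stub piece, so that the wildcard piece* would also catch piece10, piece11, ... if a longer checklist ever produced ten or more pieces. With the original stub, v* would not work, because it also matches the string variable var; v? is fine for the 7-item checklist. -egen, anymatch()- returns 1 when any listed variable equals one of the values() and 0 otherwise, so the missing pieces of short strings simply count as no match.

```stata
clear
input str9 var
"1/2/4"
"2/3/7"
"1/2/3/4/5"
"4/2/1"
"2"
end

* split into numeric pieces: piece1, piece2, ... (hypothetical stub)
split var, gen(piece) parse("/") destring

* dK = 1 if any piece equals K, 0 otherwise (missing pieces never match)
forvalues i = 1/7 {
    egen byte d`i' = anymatch(piece*), values(`i')
}
list var d1-d7, noobs clean
```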
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    • #3
      Thanks so much, Maarten! That works.

      • #4
        Here's another approach, which crossed with Maarten's:

        The function -strpos()- is commonly used to detect whether a particular string (a single character in this case) is present in a string. Per the documentation, it returns 0 if the candidate string is not found, otherwise a nonzero integer denoting its position. So, you could detect and record the presence of a "1" by doing:
        Code:
        gen has1 = strpos(YourString, "1") > 0
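        A small illustration of the two kinds of return value described above, on three of the example strings; pos1 is a hypothetical helper variable that keeps the raw position so it can be compared with the 0/1 indicator.

```stata
clear
input str9 YourString
"1/2/4"
"2/3/7"
"4/2/1"
end

* raw position of the first "1" (0 when there is none)
gen pos1 = strpos(YourString, "1")
* comparing with 0 turns the position into a 0/1 indicator
gen byte has1 = strpos(YourString, "1") > 0
list, noobs clean
```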
        Assuming your checklist does not have any two-digit numbers, you could create indicators for all of your possible values by including a -generate- command for each possible value, or by using a loop:
        Code:
        local possible = "1 2 3 4 5 6 7"  // 7 item checklist
        foreach c of local possible {
           gen has`c' = strpos(YourString, "`c'" > 0)
        }
        If instead you might have response lists with two digit items, like "1/12/13/15", I'd pad the items in your string with blanks, and search for padded strings, thereby avoiding mistakenly finding a "3" that appears only as part of (say) "13."
        Code:
        replace YourString = " " + subinstr(YourString, "/", " ", .) + " "
        local possible = "1 2 3 4 5 6 7 8 9 10 11 12 13 14 15"
        foreach c of local possible {
           gen has`c' = strpos(YourString, " `c' ") > 0
        }
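        Why the padding matters can be checked on a hypothetical response "1/12/13/15", in which item 3 was not chosen. This sketch keeps the padded copy in a new variable, padded (a hypothetical name), instead of overwriting YourString.

```stata
clear
input str12 YourString
"1/12/13/15"
"3/4"
end

* naive search: finds the "3" inside "13", although item 3 was not selected
gen byte naive3 = strpos(YourString, "3") > 0

* padded search: " 1 12 13 15 " contains " 13 " but not " 3 "
gen padded = " " + subinstr(YourString, "/", " ", .) + " "
gen byte has3 = strpos(padded, " 3 ") > 0
list, noobs clean
```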

        • #5
          In Mike Lacy's helpful post,

          Code:
           
           strpos(YourString, "`c'" > 0)
          should be
          Code:
            
           strpos(YourString, "`c'") > 0

          • #6
            Taking just the very narrow question of finding observations with a 1 coded in a string of numbers separated by slashes, where the numbers can have one or more digits, the tool of choice is Stata's Unicode regular expression functions.
            Code:
            generate has1 = ustrregexm(var,"\b1\b")
            Code:
            . list, clean noobs
            
                      var   has1  
                    1/2/4      1  
                    2/3/7      0  
                1/2/3/4/5      1  
                    4/2/1      1  
                        2      0  
                    11/12      0  
            
            .
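            The same expression generalizes to any item number by building the pattern in a loop. A sketch, assuming the question's data plus the extra "11/12" observation that appears in the listing; the reN variable names are hypothetical. Because \b marks a word boundary, "1" matches only where it is not part of a longer number such as 11 or 12.

```stata
clear
input str9 var
"1/2/4"
"2/3/7"
"1/2/3/4/5"
"4/2/1"
"2"
"11/12"
end

* \b is a word boundary, so item K matches only as a whole number,
* never as part of a longer run of digits such as 11 or 12
forvalues i = 1/12 {
    gen byte re`i' = ustrregexm(var, "\b`i'\b")
}
list var re1 re2 re11 re12, noobs clean
```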
            For a more general solution, I might take the following approach.
            Code:
            generate id = _n
            split var, gen(V) parse("/") destring
            reshape long V, i(id) j(val)
            drop if missing(V)
            drop val
            generate has = 1
            reshape wide has, i(id) j(V)
            mvencode has*, mv(.=0)
            order id var
            list, noobs clean
            Code:
            . list, noobs clean
            
                id         var   has1   has2   has3   has4   has5   has7   has11   has12  
                 1       1/2/4      1      1      0      1      0      0       0       0  
                 2       2/3/7      0      1      1      0      0      1       0       0  
                 3   1/2/3/4/5      1      1      1      1      1      0       0       0  
                 4       4/2/1      1      1      0      1      0      0       0       0  
                 5           2      0      1      0      0      0      0       0       0  
                 6       11/12      0      0      0      0      0      0       1       1
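            Note that -reshape wide- creates indicators only for values that occur in the data, so has6 is absent above even though 6 is a valid checklist item. A sketch of one way to add zero-valued indicators for the unused items, assuming items run from 1 to 15 (an assumption; adjust the range to the real checklist).

```stata
clear
input str9 var
"1/2/4"
"2/3/7"
"1/2/3/4/5"
"4/2/1"
"2"
"11/12"
end

generate id = _n
split var, gen(V) parse("/") destring
reshape long V, i(id) j(val)
drop if missing(V)
drop val
generate has = 1
reshape wide has, i(id) j(V)
mvencode has*, mv(.=0)

* reshape wide made hasK only for observed values; add the rest as 0.
* If hasK already exists, generate fails and capture skips it.
forvalues i = 1/15 {
    capture generate byte has`i' = 0
}

* put has1, has2, ..., has15 in numeric order, then id and var first
order has*, sequential
order id var
list, noobs clean
```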

            • #7
              -separate- provides a tricky and effective way.
              Code:
              split var, gen(V) p("/") destring
              
              reshape long V, i(var) string
              
              separate V, by(V) gen(has)
              collapse (count) has*, by(var)
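              A self-contained version on the question's data, for anyone wanting to run it as is. It relies on two assumptions: var must uniquely identify observations because it serves as i(), and -separate- names its new variables after the integer values of V (has1, has2, ...). -collapse (count)- then counts nonmissing values, giving 1 when an item was ticked and 0 otherwise.

```stata
clear
input str9 var
"1/2/4"
"2/3/7"
"1/2/3/4/5"
"4/2/1"
"2"
end

split var, gen(V) p("/") destring

* var acts as the identifier, so it must be unique
reshape long V, i(var) string

* one new variable per value of V: hasK equals V where V == K, missing elsewhere
separate V, by(V) gen(has)

* (count) counts nonmissing values: 1 if item K was ticked, 0 if not
collapse (count) has*, by(var)
list, noobs clean
```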

              • #8
                Dear Mike, I tried what you suggested but got no results. I wonder whether the following is correct:
                Code:
                . local possible = "1 2 3 4 5 6 7"  // 7 item checklist
                . display `possible'
                1234567
                Thanks.
                Ho-Chuan (River) Huang
                Stata 17.0, MP(4)

                • #9
                  River Huang: -display `possible'- would need quotes around the dereferenced macro, as in -display "`possible'"-, so that Stata interprets it as a string. I don't see that command in what I posted above; perhaps it was there before I edited my post to correct some typographical errors.

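                  The difference the quotes make is easy to see in a short session. The loop at the end uses a deliberately misspelled, hypothetical macro name: an undefined local expands to nothing, so -foreach- runs zero times without any error message, which is one way a loop can silently produce no results.

```stata
local possible = "1 2 3 4 5 6 7"    // 7-item checklist
display `possible'                   // unquoted: the numbers run together
display "`possible'"                 // quoted: displayed as one string
local n : word count `possible'
display `n'                          // number of items in the list

* hypothetical misspelled name: it expands to nothing, so the body never runs
local iterations = 0
foreach c of local posssible {
    local ++iterations
}
display `iterations'
```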
                  • #10
                    A minor variation on #4:
                    Code:
                    forvalues i = 1/15 {
                    
                        gen byte _`i' = strpos("/"+var+"/", "/`i'/") > 0
                    }
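                    A sketch that runs this loop on a few strings, including the hypothetical two-digit response "1/12/13/15", shows why the surrounding slashes are needed: they let the first and last items be matched as "/K/" too, while "/3/" cannot match inside "13".

```stata
clear
input str12 var
"1/2/4"
"1/12/13/15"
"2"
end

* "/" + var + "/" turns "1/12/13/15" into "/1/12/13/15/", so every item,
* including the first and last, is delimited by slashes on both sides
forvalues i = 1/15 {
    gen byte _`i' = strpos("/" + var + "/", "/`i'/") > 0
}
list, noobs clean
```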

                    • #11
                      Dear Mike, I see, and thanks.

                      Ho-Chuan (River) Huang
                      Stata 17.0, MP(4)
