  • check whether string variable contains characters from another variable

    Hi all,

    one short question:
    I want to check whether, as in the title, a string variable (var1) contains the characters (that is, the exact value) of another string variable (var2).

    Example:
    var1                 var2   var3
    blabla AAA blabla    AAA    1
    blabla blabla        AAA    0
    bla bla              AAA    0
    bla bla bla          BBB    0
    bla bla BBB          BBB    1
    bla BBB blabla       BBB    1
    CCC bla bla          CCC    1
    thus something like

    gen var3=0
    replace var3=1 if strpos(var1,"var2")

    which of course does not work because, as far as I understand, s2 in strpos() is supposed to be a literal string, not a variable.
    Having to use a variable there is what makes it complicated for me.

    Thanks in advance

    Tim

  • #2
    s2 can certainly be a string variable, or indeed any string expression.

    Code:
    gen var3=strpos(var1, var2) > 0
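
    For illustration, here is a minimal sketch that applies this one-liner to the example data from #1. The -input- block is reconstructed from that post, and the storage types (str20, str3, byte) are assumptions made for the sketch.

    ```stata
    clear
    input str20 var1 str3 var2
    "blabla AAA blabla" "AAA"
    "blabla blabla"     "AAA"
    "bla bla"           "AAA"
    "bla bla bla"       "BBB"
    "bla bla BBB"       "BBB"
    "bla BBB blabla"    "BBB"
    "CCC bla bla"       "CCC"
    end

    * strpos(s1, s2) returns the position of s2 within s1, or 0 if s2 is not found.
    * Here s2 is the value of var2 in the same observation, not the literal text "var2".
    gen byte var3 = strpos(var1, var2) > 0

    list, sep(0)
    ```

    The resulting var3 should match the column in #1: 1, 0, 0, 0, 1, 1, 1.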



    • #3
      Stupid me.
      Thank you, Nick! Helpful as always!



      • #4
        Just out of interest: is there a way to run a check like this across all observations of a variable, rather than only within the same observation? In other words, I would like to go observation by observation, take a string (or part of one) from one variable, and check whether it is contained in any observation of the other variable.



        • #5
          Some technique:


          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str4(y1 y2)
          "frog" "toad"
          "frog" "toad"
          "toad" "toad"
          "newt" "newt"
          "frog" "newt"
          end
          
          . levelsof y1 , local(levels)
          `"frog"' `"newt"' `"toad"'
          
          . foreach level of local levels {
            2. list y2 if strpos(y2, "`level'")
            3. }
          
               +------+
               |   y2 |
               |------|
            4. | newt |
            5. | newt |
               +------+
          
               +------+
               |   y2 |
               |------|
            1. | toad |
            2. | toad |
            3. | toad |
               +------+
          
          . forval j = 1/`=_N' {
            2. di "`j'"
            3. list y2 if strpos(y2, y1[`j'])
            4. }
          1
          2
          3
          
               +------+
               |   y2 |
               |------|
            1. | toad |
            2. | toad |
            3. | toad |
               +------+
          4
          
               +------+
               |   y2 |
               |------|
            4. | newt |
            5. | newt |
               +------+
          5
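
          If the goal is an indicator variable rather than a listing, the second loop can be adapted along these lines. This is only a sketch under the same toy data; the variable name found and the local macro n are hypothetical names introduced here.

          ```stata
          clear
          input str4(y1 y2)
          "frog" "toad"
          "frog" "toad"
          "toad" "toad"
          "newt" "newt"
          "frog" "newt"
          end

          * found = 1 if the value of y1 in this observation occurs
          * inside y2 in ANY observation, 0 otherwise
          gen byte found = 0
          forval j = 1/`=_N' {
              quietly count if strpos(y2, y1[`j']) > 0
              local n = r(N)
              quietly replace found = (`n' > 0) in `j'
          }

          list, sep(0)
          ```

          With these data, found is 0 for the three "frog" observations and 1 for "toad" and "newt", which agrees with the listings above. The loop does one pass over the data per observation, so it may be slow on large datasets.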



          • #6
            Thank you as always, Nick Cox. Your solution worked very well. Now, if only there were a way to condition the items returned by the -list- command on another shared field. For example, I am working with a list of cities located within states. I would like to end up with a list of cities (var1) that are contained within the other variable (var2) only if they share the same state; that is, I would like to introduce some sort of -if- condition into the code.

            I include below a sample of the data that I am working with. I want to check whether (and which) cities in the variable "City" (_merge==2) are found as part of the variable containing a more complete version of place names (NAMELSAD) in _merge==1. I am really trying to check whether I could increase my overall match count by finding any places/cities not captured by an exactly matching common ID.

            In this direction, I wish there were a way to use a more flexible ID option when merging two datasets, for example being able to merge on a partial match (a portion of a string) between the two ID fields, master and using.
            Code:
            . list
            
                 +-------------------------------------------------------------------------------------+
                 |           City                          NAMELSAD            State            _merge |
                 |-------------------------------------------------------------------------------------|
              1. |                                     Tri-City CDP           Oregon   master only (1) |
              2. |                                 South Weldon CDP   North Carolina   master only (1) |
              3. |                                 Vaughnsville CDP             Ohio   master only (1) |
              4. |                                 Hunts Point town       Washington   master only (1) |
              5. |                              West Livingston CDP            Texas   master only (1) |
                 |-------------------------------------------------------------------------------------|
              6. |                                 Freeland borough     Pennsylvania   master only (1) |
              7. |                                       Piedra CDP         Colorado   master only (1) |
              8. |                  Parcelas La Milagrosa comunidad      Puerto Rico   master only (1) |
              9. |                                      Upsala city        Minnesota   master only (1) |
             10. |                                     Savannah CDP            Texas   master only (1) |
                 |-------------------------------------------------------------------------------------|
             11. |                                       Adams city        Wisconsin   master only (1) |
             12. |                                   Mansfield town       Washington   master only (1) |
             13. |     Rocky Hill                                        Connecticut    using only (2) |
             14. |    Mont Vernon                                      New Hampshire    using only (2) |
             15. |      Chikaming                                           Michigan    using only (2) |
                 |-------------------------------------------------------------------------------------|
             16. |         Pelham                                      New Hampshire    using only (2) |
             17. | Sagamore Hills                                               Ohio    using only (2) |
             18. |         Covert                                           Michigan    using only (2) |
             19. |      Dartmouth                                      Massachusetts    using only (2) |
             20. |      Lexington                                           Kentucky    using only (2) |
                 |-------------------------------------------------------------------------------------|
             21. |         Benton                                           Michigan    using only (2) |
             22. |            Lee                                      New Hampshire    using only (2) |
             23. |         Palmer                                      Massachusetts    using only (2) |
                 +-------------------------------------------------------------------------------------+
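
            One possible way to add the same-state condition, sketched here as an untested adaptation of the loop in #5 rather than a confirmed answer from the thread: it assumes the merged data shown above are in memory, that City, NAMELSAD and State are string variables, and that _merge takes the values 1 (master only) and 2 (using only).

            ```stata
            * Assumes the merged dataset from the listing above is in memory.
            forval j = 1/`=_N' {
                * Search only for using-only observations with a nonempty city name
                if _merge[`j'] == 2 & City[`j'] != "" {
                    * Count master-only places in the same state whose name contains the city
                    quietly count if _merge == 1 & State == State[`j'] & strpos(NAMELSAD, City[`j']) > 0
                    if r(N) > 0 {
                        display as text "City: " City[`j'] " (" State[`j'] ")"
                        list NAMELSAD State if _merge == 1 & State == State[`j'] & strpos(NAMELSAD, City[`j']) > 0, noobs
                    }
                }
            }
            ```

            Substring matching of this kind can over-match: a short name such as "Lee" would also be found inside a longer, unrelated place name, so any hits would still need to be checked by eye.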

