  • How to generate indicator variables from string variable

    Dear All,

    I want to generate indicator variables from the string variable q26_name_ghfac, which contains numbers from 1, 2, 3, ..., 14 separated by commas. The indicator variables should be named scheme_1, scheme_2, ..., scheme_14, each equal to 1 if the number is present in q26_name_ghfac and 0 otherwise.
    I tried split q26_name_ghfac, gen(scheme_) p(,) but it did not give the desired result.

    Data is
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str10 q26_name_ghfac
    "1"         
    "1"         
    "1,10"      
    "6"         
    ""          
    "7,11,12,13"
    ""          
    "11,12,13"  
    "6,12,13"   
    "12,13"     
    "12,13"     
    "1"         
    ""          
    "14"        
    "1"         
    "14"        
    "14"        
    ""          
    ""          
    "1,10"      
    ""          
    ""          
    ""          
    ""          
    ""          
    ""          
    ""          
    ""          
    ""          
    ""          
    ""          
    ""          
    "6"         
    "1"         
    ""          
    ""          
    ""          
    ""          
    ""          
    "10"        
    ""          
    ""          
    "6"         
    "1"         
    ""          
    ""          
    ""          
    ""          
    ""          
    "10"        
    ""          
    ""          
    ""          
    ""          
    "10"        
    ""          
    ""          
    "6"         
    "1"         
    ""          
    ""          
    ""          
    "6"         
    "1"         
    ""          
    "12"        
    "12"        
    "12"        
    ""          
    ""          
    "14"        
    ""          
    "10"        
    ""          
    ""          
    ""          
    ""          
    "14"        
    "6"         
    "1"         
    ""          
    "13"        
    ""          
    "5,12,13"   
    ""          
    ""          
    ""          
    "6"         
    "1,5,8,14"  
    ""          
    "1,5,14"    
    ""          
    ""          
    ""          
    ""          
    ""          
    ""          
    ""          
    "13"        
    "10"        
    end

    Thanks in advance

  • #2
    split parses into substrings. It doesn't yield a bundle of indicator variables.
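
    To make that concrete, here is a minimal sketch on three made-up rows in the same format as the original data. The stub part is an arbitrary name chosen for illustration. The new variables hold the substrings in the order in which they appear, so part1 is "first number listed", not "number 1 is present".

    ```stata
    clear
    input str10 q26_name_ghfac
    "1,10"
    "7,11,12,13"
    "12,13"
    end

    * split creates string variables part1, part2, ... by position
    split q26_name_ghfac, generate(part) parse(,)

    * part1 is whatever came first in each list
    assert part1 == "1"  in 1
    assert part1 == "7"  in 2
    assert part1 == "12" in 3

    * part2 is whatever came second
    assert part2 == "13" in 3

    list
    ```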

    Technique is discussed within

    SJ-22-4 . . . . . . . . . . Stata tip 148: Searching for words within strings
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
    Q4/22 SJ 22(4):998--1003 (no commands)
    tip on searching for words within strings
    https://journals.sagepub.com/doi/pdf...6867X221141068


    The main pitfall is that if we look for 1 we will -- if we are not careful -- find 1 within 11 12 13 14 as well as in 1, looking for 2 will find 2 within 12 as well as in 2, and so forth.
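
    A quick way to see the pitfall, as a sketch using literal strings rather than the data: strpos() returns the position of the first match, and it is positive here even though neither 1 nor 2 is an element of the lists searched.

    ```stata
    * "1" is found at the very start of "11,12,13"
    assert strpos("11,12,13", "1") > 0

    * "2" is found inside "12"
    assert strpos("12", "2") > 0

    * no false match when the digit is genuinely absent
    assert strpos("11,12,13", "5") == 0

    display strpos("11,12,13", "1")
    ```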

    Here is one of several ways to do it.

    The first trick is to work with a copy of the original.

    Then we look in turn for instances of 14 13 12 11 10 and remove whatever we find from the copy.

    Then we are safe in looking for 9 down to 1.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str10 q26_name_ghfac
    "1"         
    "1"         
    "1,10"      
    "6"         
    "7,11,12,13"
    "11,12,13"  
    "6,12,13"   
    "12,13"     
    "12,13"     
    "1"         
    "14"        
    "1"         
    "14"        
    "14"        
    "1,10"      
    "6"         
    "1"         
    "10"        
    "6"         
    "1"         
    "10"        
    "10"        
    "6"         
    "1"         
    "6"         
    "1"         
    "12"        
    "12"        
    "12"        
    "14"        
    "10"        
    "14"        
    "6"         
    "1"         
    "13"        
    "5,12,13"   
    "6"         
    "1,5,8,14"  
    "1,5,14"    
    "13"        
    "10"        
    end
    
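    sort q26 
    
    * Work on a copy so that the original variable is left untouched
    clonevar work = q26 
    * Search from 14 down to 1, removing each match from the copy,
    * so that 1 to 4 cannot later be found inside 10 to 14
    forval j = 14(-1)1 { 
        gen is`j' = strpos(work, "`j'") > 0 
        replace work = subinstr(work, "`j'", "", 1)
    }
    
    * Collect the indicators that are ever 1, to keep the listing compact
    quietly forval j = 1/14 { 
        count if is`j' 
        if r(N) local toshow `toshow' is`j'
    }
    
    list q26 `toshow' 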
    
         +-----------------------------------------------------------------------------+
         | q26_name~c   is1   is5   is6   is7   is8   is10   is11   is12   is13   is14 |
         |-----------------------------------------------------------------------------|
      1. |          1     1     0     0     0     0      0      0      0      0      0 |
      2. |          1     1     0     0     0     0      0      0      0      0      0 |
      3. |          1     1     0     0     0     0      0      0      0      0      0 |
      4. |          1     1     0     0     0     0      0      0      0      0      0 |
      5. |          1     1     0     0     0     0      0      0      0      0      0 |
         |-----------------------------------------------------------------------------|
      6. |          1     1     0     0     0     0      0      0      0      0      0 |
      7. |          1     1     0     0     0     0      0      0      0      0      0 |
      8. |          1     1     0     0     0     0      0      0      0      0      0 |
      9. |          1     1     0     0     0     0      0      0      0      0      0 |
     10. |       1,10     1     0     0     0     0      1      0      0      0      0 |
         |-----------------------------------------------------------------------------|
     11. |       1,10     1     0     0     0     0      1      0      0      0      0 |
     12. |     1,5,14     1     1     0     0     0      0      0      0      0      1 |
     13. |   1,5,8,14     1     1     0     0     1      0      0      0      0      1 |
     14. |         10     0     0     0     0     0      1      0      0      0      0 |
     15. |         10     0     0     0     0     0      1      0      0      0      0 |
         |-----------------------------------------------------------------------------|
     16. |         10     0     0     0     0     0      1      0      0      0      0 |
     17. |         10     0     0     0     0     0      1      0      0      0      0 |
     18. |         10     0     0     0     0     0      1      0      0      0      0 |
     19. |   11,12,13     0     0     0     0     0      0      1      1      1      0 |
     20. |         12     0     0     0     0     0      0      0      1      0      0 |
         |-----------------------------------------------------------------------------|
     21. |         12     0     0     0     0     0      0      0      1      0      0 |
     22. |         12     0     0     0     0     0      0      0      1      0      0 |
     23. |      12,13     0     0     0     0     0      0      0      1      1      0 |
     24. |      12,13     0     0     0     0     0      0      0      1      1      0 |
     25. |         13     0     0     0     0     0      0      0      0      1      0 |
         |-----------------------------------------------------------------------------|
     26. |         13     0     0     0     0     0      0      0      0      1      0 |
     27. |         14     0     0     0     0     0      0      0      0      0      1 |
     28. |         14     0     0     0     0     0      0      0      0      0      1 |
     29. |         14     0     0     0     0     0      0      0      0      0      1 |
     30. |         14     0     0     0     0     0      0      0      0      0      1 |
         |-----------------------------------------------------------------------------|
     31. |         14     0     0     0     0     0      0      0      0      0      1 |
     32. |    5,12,13     0     1     0     0     0      0      0      1      1      0 |
     33. |          6     0     0     1     0     0      0      0      0      0      0 |
     34. |          6     0     0     1     0     0      0      0      0      0      0 |
     35. |          6     0     0     1     0     0      0      0      0      0      0 |
         |-----------------------------------------------------------------------------|
     36. |          6     0     0     1     0     0      0      0      0      0      0 |
     37. |          6     0     0     1     0     0      0      0      0      0      0 |
     38. |          6     0     0     1     0     0      0      0      0      0      0 |
     39. |          6     0     0     1     0     0      0      0      0      0      0 |
     40. |    6,12,13     0     0     1     0     0      0      0      1      1      0 |
         |-----------------------------------------------------------------------------|
     41. | 7,11,12,13     0     0     0     1     0      0      1      1      1      0 |
         +-----------------------------------------------------------------------------+
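
    One possible cross-check, sketched here on a few made-up rows in the same format: wrap both the list and the target in commas, so that only whole elements can match, and assert that the two sets of indicators agree. This delimiter-wrapped search is an alternative technique, not the method used above, and the alt prefix is an arbitrary name. It assumes the values contain no spaces.

    ```stata
    clear
    input str10 q26_name_ghfac
    "1"
    "1,10"
    "7,11,12,13"
    "1,5,8,14"
    ""
    end

    * Remove-as-you-go search, as in the code above
    clonevar work = q26_name_ghfac
    forvalues j = 14(-1)1 {
        generate byte is`j' = strpos(work, "`j'") > 0
        replace work = subinstr(work, "`j'", "", 1)
    }

    * Delimiter-wrapped search: ",1," cannot match inside ",11,"
    forvalues j = 1/14 {
        generate byte alt`j' = strpos("," + q26_name_ghfac + ",", ",`j',") > 0
        assert is`j' == alt`j'
    }
    ```

    If any assert fails, Stata stops with an error, which would flag a row on which the two approaches disagree.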


    • #3
      Here is another way, building upon OP's idea of splitting the original string variable:

      Code:
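      * Split on commas into string variables num_1, num_2, ...
      split q26_name_ghfac, gen(num_) p(,)
      * Convert the pieces to numeric
      destring num_*, replace
      
      * scheme_i is 1 if any of the pieces equals i, and 0 otherwise
      forval i = 1/14 {
          egen byte scheme_`i' = anymatch(num_*) , val(`i')
      }
      
      * The pieces are no longer needed
      drop num_*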
      which yields (showing the first 10 observations as an example):

      Code:
      . list in 1/10, sep(0) abbrev(14) noobs
      
        +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
        | q26_name_ghfac   scheme_1   scheme_2   scheme_3   scheme_4   scheme_5   scheme_6   scheme_7   scheme_8   scheme_9   scheme_10   scheme_11   scheme_12   scheme_13   scheme_14 |
        |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
        |              1          1          0          0          0          0          0          0          0          0           0           0           0           0           0 |
        |              1          1          0          0          0          0          0          0          0          0           0           0           0           0           0 |
        |           1,10          1          0          0          0          0          0          0          0          0           1           0           0           0           0 |
        |              6          0          0          0          0          0          1          0          0          0           0           0           0           0           0 |
        |                         0          0          0          0          0          0          0          0          0           0           0           0           0           0 |
        |     7,11,12,13          0          0          0          0          0          0          1          0          0           0           1           1           1           0 |
        |                         0          0          0          0          0          0          0          0          0           0           0           0           0           0 |
        |       11,12,13          0          0          0          0          0          0          0          0          0           0           1           1           1           0 |
        |        6,12,13          0          0          0          0          0          1          0          0          0           0           0           1           1           0 |
        |          12,13          0          0          0          0          0          0          0          0          0           0           0           1           1           0 |
        +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
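
      For readers who want to check the logic on a small case, here is a sketch on three made-up rows, generating only two of the fourteen indicators. Because the pieces are compared as numbers after destring, 1 does not match 11, and a row with an empty string gets 0 throughout.

      ```stata
      clear
      input str10 q26_name_ghfac
      "1,10"
      ""
      "7,11,12,13"
      end

      split q26_name_ghfac, generate(num_) parse(,)
      destring num_*, replace

      * anymatch() is 1 if any variable in the list equals a listed value
      egen byte scheme_1  = anymatch(num_*), values(1)
      egen byte scheme_11 = anymatch(num_*), values(11)

      assert scheme_1  == 1 in 1
      assert scheme_1  == 0 in 2/3
      assert scheme_11 == 1 in 3
      assert scheme_11 == 0 in 1/2
      ```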
      Last edited by Hemanshu Kumar; 21 Jun 2024, 10:23.
