  • Data cleaning: Select all that apply

    Hi! I have a Qualtrics survey with several questions that have "select all that apply" options. I want to separate out all the responses into dummy variables so they can be a bit more flexible. I originally did this:

    gen feelings_traumatized = strmatch(feelings, "*4*")
    label var feelings_traumatized "Experienced feeling traumatized"
    label val feelings_traumatized feelings_traumatized


    However, when I ran into a "select all that apply" question that had more than 9 responses available, the whole thing fell apart. I tried stuff like gen var_a = strmatch(var, "*1*" & !"*11*" & !"*21*"), but that didn't work.

    Does anyone have a good way of cleaning these types of survey questions? If it helps, here is some of my output when I tab the variable:

                Feelings |      Freq.     Percent        Cum.
    ---------------------+-----------------------------------
                       1 |          1        0.29        0.29
       1,2,3,5,7,8,11,16 |          1        0.29        0.59
                     1,3 |          2        0.59        1.17
                   1,3,5 |          1        0.29        1.47
                1,3,5,11 |          1        0.29        1.76
                1,3,5,13 |          1        0.29        2.05
               1,3,5,6,7 |          1        0.29        2.35
                 1,3,5,7 |          6        1.76        4.11
              1,3,5,7,11 |          2        0.59        4.69
           1,3,5,7,11,16 |          1        0.29        4.99

    I've been stuck on this for a while and would really appreciate any insight anyone can offer. Thank you!

  • #2
    Code:
    split feelings, gen(response) parse(",") destring
    egen byte var_a = anymatch(response*), values(1)
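
    The same two steps extend to every option at once. What follows is only a sketch: it assumes the options are numbered 1 to 16 (the highest code in the posted table), that -split- has not already been run, and that no existing variable name starts with response. The names feelings_1 to feelings_16 are illustrative.

    Code:
    * Sketch under the assumptions above: one 0/1 indicator per option
    split feelings, gen(response) parse(",") destring
    forvalues j = 1/16 {
        * 1 if any of the split pieces equals `j', 0 otherwise
        egen byte feelings_`j' = anymatch(response*), values(`j')
    }
    * the split pieces are only needed to build the indicators
    drop response*

    Because -split- breaks the string at the commas and -destring- turns each piece into a number, 1 is compared with whole codes, so 11 and 21 cannot match it. Respondents who left the question blank get 0 on every indicator, which you may prefer to recode to missing.
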
    In the future, when asking for help with code, please show example data, and please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
    Last edited by Clyde Schechter; 18 Jan 2022, 16:26.



    • #3
      Clyde Schechter gave excellent advice as usual. In addition, a perhaps ancient-looking paper remains pertinent: https://www.stata-journal.com/articl...article=pr0008

      Here is another approach. I give results only for reasons 1 to 7 but the principle extends to 16 or whatever number is needed.

      Note the tricky detail. It's not enough to look for, say, the characters 1, 2 or 3, because then 10, 11, 12 or 13 would yield false positives. So the code adds a comma at each end of the string and searches for the code with a comma on each side of it.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str31 whatever
      "1"                
      "1,2,3,5,7,8,11,16"
      "1,3"              
      "1,3,5"            
      "1,3,5,11"         
      "1,3,5,13"         
      "1,3,5,6,7"        
      "1,3,5,7"          
      "1,3,5,7,11"       
      "1,3,5,7,11,16"    
      end
      
      
      forval j = 1/7 {
          gen ans`j' = strpos("," + whatever + ",", ",`j',") > 0
      }
      
      list 
      
           +--------------------------------------------------------------------+
           |          whatever   ans1   ans2   ans3   ans4   ans5   ans6   ans7 |
           |--------------------------------------------------------------------|
        1. |                 1      1      0      0      0      0      0      0 |
        2. | 1,2,3,5,7,8,11,16      1      1      1      0      1      0      1 |
        3. |               1,3      1      0      1      0      0      0      0 |
        4. |             1,3,5      1      0      1      0      1      0      0 |
        5. |          1,3,5,11      1      0      1      0      1      0      0 |
           |--------------------------------------------------------------------|
        6. |          1,3,5,13      1      0      1      0      1      0      0 |
        7. |         1,3,5,6,7      1      0      1      0      1      1      1 |
        8. |           1,3,5,7      1      0      1      0      1      0      1 |
        9. |        1,3,5,7,11      1      0      1      0      1      0      1 |
       10. |     1,3,5,7,11,16      1      0      1      0      1      0      1 |
           +--------------------------------------------------------------------+
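
      To see why the added commas matter, here is a small check on a made-up string in which option 1 was not selected but option 11 was. The string "3,5,11" is my own illustration, not taken from the data.

      Code:
      * Naive search: "1" is found inside "11", a false positive
      display strpos("3,5,11", "1") > 0

      * Delimited search: ",1," does not occur in ",3,5,11,"
      display strpos("," + "3,5,11" + ",", ",1,") > 0

      The first line displays 1 and the second displays 0. The same reasoning is why the loop above can run over 1/16 (or whatever the highest code is) without two-digit codes contaminating the single-digit indicators.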
