  • Collapsing cases with string variables.

    Hi Everyone, I've come up against an issue I can't seem to solve!
    I have a data file from Survey Monkey in which a participant completed an online survey twice. But they only supplied answers in the second attempt that they hadn’t already supplied in the first. In other words, there are two cases but no conflicting responses across variables. So I would like to collapse the two cases into a single one so that I keep all of the responses in one case. I cannot, however, figure out how to write code to combine the two cases into one. Collapse won’t work because many of the variables are string variables, so it gives me an error. Can anyone help??
    Thanks!
    Candace

  • #2
    use one of the options for string variables, e.g.,

    Code:
    collapse (mean) somenumericvar (first) stringvar


    • #3

      Suppose you identify the culprit as identifier 42. Then

      Code:
      count if id == 42 
      assert r(N) == 2 
      
      sort id 
      
      foreach v of var * { 
               by id: replace `v' =  `v'[2] if missing(`v'[1])  & id == 42 
               by id: replace `v' =  `v'[1] if missing(`v'[2])  & id == 42 
      }
      and then drop duplicates.

      Frankly, this could be easiest to do interactively and that will leave code in its wake. It just won't be self-explanatory code.


      • #4
        Thank you both! Jorrit, yours worked perfect. Very easy!! Thank you so much!


        • #5
          Hi All! Thanks for the help. No luck with either though. Jorrit, yours just kept one of the cases and deleted the responses from the second. Nick, yours worked but only for the numeric variables in my data set. For some reason the string variable did not carry over! Any other ideas? I'm not super familiar with Stata so I'm not even sure how to do it interactively to get the code.


          • #6
            For anyone new joining the thread, in case the dilemma is not clear, the data currently look like this:

            id  q1  q2  q3               q4  q5  q6
            1   1   5   string response
            1                            6   3   string response
            2   5   6   string response  7   8   string response
            3   6   8   string response  4   1   string response

            I want it to look like this:

            id  q1  q2  q3               q4  q5  q6
            1   1   5   string response  6   3   string response
            2   5   6   string response  7   8   string response
            3   6   8   string response  4   1   string response


            • #7

              This works for me.

              Code:
              clear 
              input id  numvar str4 strvar 
              1   56  "cat" 
              2   78  "dog"
              42  12  ""
              42   .  "frog"
              end
              
              count if id == 42 
              assert r(N) == 2 
              
              sort id 
              
              foreach v of var * { 
                       by id: replace `v' =  `v'[2] if missing(`v'[1])  & id == 42 
                       by id: replace `v' =  `v'[1] if missing(`v'[2])  & id == 42 
              }
              
              list 
              
                   +----------------------+
                   | id   numvar   strvar |
                   |----------------------|
                1. |  1       56      cat |
                2. |  2       78      dog |
                3. | 42       12     frog |
                4. | 42       12     frog |
                   +----------------------+
              In the absence of any data example or evidence that it doesn't work as intended, my only guess is that your missing values are not really missing. For example, spaces don't count as missing, only entirely empty strings.
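
The point about spaces can be checked directly. This is a minimal sketch: the first three lines use only string literals, and the last line assumes a string variable named `strvar`, as in the toy example above.

```stata
* an empty string counts as missing: displays 1
display missing("")
* a single space does not count as missing: displays 0
display missing(" ")
* trimming the space leaves an empty string: displays 1
display missing(trim(" "))

* count values of strvar that are nonempty but consist only of spaces
count if strvar != "" & trim(strvar) == ""
```

A nonzero count from the last line would suggest that some apparently blank answers are really spaces.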


              • #8
                collapse does support string variables for some statistics. The code below
                seems to address this issue directly.

                Code:
                collapse (lastnm) var*, by(id)


                • #9
                  Hi Nick!! You're right. It looks like Survey Monkey stores a blank string variable as a space rather than leaving it completely empty, and that is why it's not working!! Any suggestions on where to go from here knowing that? Thank you so much for your help!!!


                  • #10
                    Perhaps
                    Code:
                    replace strvar = "" if strvar==" "


                    • #11
                      Or

                      Code:
                      replace strvar = trim(strvar)


                      • #12
                        Nick's answer is better than mine, and good practice when string data isn't behaving as expected. An extra space in the middle of a string is easy to spot; at the ends of a string it is less so, and it leads to mystifying results like yours, and to tabulations that seem to show the same string value more than once. And of course, if Survey Monkey were to provide two spaces rather than one or none, my answer wouldn't work, and Nick's would.
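
With many string variables, the trimming can be applied to all of them at once. This is only a sketch of one way to do it: it uses `ds` to list the string variables and makes no assumption about their names.

```stata
* list every string variable in the dataset; the names are left in r(varlist)
ds, has(type string)

* strip leading and trailing spaces from each of them
foreach v of varlist `r(varlist)' {
    replace `v' = trim(`v')
}
```

Unlike a comparison with `" "`, this turns a value made of any number of spaces into an empty string.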


                        • #13
                          Code:
                          foreach var of var q*{
                          capture replace `var'=trim(`var')
                          }
                          collapse (lastnm) q*, by(id)
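
To try this approach before touching the real file, a small self-contained test may help. The variable names and values below are invented for illustration, with a single space standing in for the blank answers Survey Monkey appears to write.

```stata
* toy data: names and values are invented for illustration
clear
input id q1 str10 q2 q3 str10 q4
1 1 "yes"   . " "
1 . " "     3 "no"
2 5 "maybe" 7 "never"
end

* strip stray spaces so that blank answers become truly empty strings
foreach v of varlist q2 q4 {
    replace `v' = trim(`v')
}

* keep the last nonmissing value of each variable within each id
collapse (lastnm) q*, by(id)
list
```

If the stray spaces are the only obstacle, each id should end up as a single row holding the nonmissing answers from both attempts.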


                          • #14
                            I got it all to work using Nick's suggestions (replace strvar = "" if strvar==" ") Thank you soooo much!!


                            • #15
                              Pleased to hear that, but that wasn't my suggestion and it's not optimal....
