  • Collapse Strings

    Hello,

    I am trying to collapse data that looks like this...

    id activitydt status
    1 jan a
    2 jan b
    1 feb a
    2 feb c

    I want to collapse this data so that I can summarize status by activity date by counting the id's under each status at each activitydt.

    I tried the collapse command but id is a nonnumeric string that I cannot destring and it is too long to encode.

    Any ideas?

  • #2
    Code:
    clear
    input id str3 act str1 status
    1 jan a
    2 jan b
    1 feb a 
    2 feb c
    end
    levelsof status, local(s)
    foreach x in `s' {
        gen status_`x'= (status=="`x'")
    }
    collapse (sum) status_* , by(act)
    list
    Jorge Eduardo Pérez Pérez
    www.jorgeperezperez.com


    • #3
      What do you mean by "too long to encode"? Can you post examples? Please post exactly what you tried, with example data.
      You should:

      1. Read the FAQ carefully.

      2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

      3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

      4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.



      • #4
        This encoding works:

        Code:
        *clear all
        set more off
        
        input ///
        str100 somevar
        "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
        "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
        end
        
        encode somevar, gen(encovar)
        
        list, nolabel
        See help encode.

        Strings can be very long. See help data types.
        Last edited by Roberto Ferrer; 28 Apr 2014, 16:21.


        • #5
          Jorge-

          Your code seems to work, except that I get the error "' invalid name", so something must be wrong with the syntax of the variable I am generating.



          • #6
            Maybe your strings contain characters that cannot be used in Stata variable names.
            Try:

            Code:
            clear
            input id str3 act str1 status
            1 jan a
            2 jan b
            1 feb a 
            2 feb c
            end
            levelsof status, local(s)
            foreach x in `s' {
                loc name=strtoname("status_`x'")
                gen `name'=(status=="`x'")
            }
            collapse (sum) status_* , by(act)
            list
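
            To see why this helps, here is a minimal sketch of what strtoname() does to values that are not legal names. The status values below are made up for illustration and are not from the original poster's data.

            ```stata
            * Hypothetical status values containing a space and a hyphen
            clear
            input str10 status
            "a"
            "b c"
            "d-1"
            end
            * strtoname() replaces each character that is illegal in a name with _
            generate str32 vname = strtoname("status_" + status)
            list, clean noobs
            ```

            One caveat: two different values can map to the same name (for example "b c" and "b-c" would both become status_b_c), in which case the second gen in the loop would fail because the variable already exists.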
            Jorge Eduardo Pérez Pérez
            www.jorgeperezperez.com



            • #7
              Regardless of the mysterious error in code you don't show us, I don't see that whether your id is string has any bearing on counting the number of distinct values.

              Code:
               
              egen tag = tag(id date) 
              collapse (sum) tag, by(date)
              tells you the number of distinct values of your identifier at each date.

              See also

              SJ-8-4 dm0042 . . . . . . . . . . . . Speaking Stata: Distinct observations
              (help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
              Q4/08 SJ 8(4):557--568
              shows how to answer questions about distinct observations
              from first principles; provides a convenience command

              for a discussion from first principles. That's at http://www.stata-journal.com/sjpdf.h...iclenum=dm0042
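
              As a sketch only, the two lines above can be tried on the example data from this thread. The variable name act (instead of date) and the result name n_ids are assumptions borrowed or invented for illustration, and id is deliberately read as a string.

              ```stata
              * Example data from the thread, plus one repeated observation
              clear
              input str1 id str3 act str1 status
              1 jan a
              2 jan b
              1 feb a
              2 feb c
              1 jan a
              end
              * tag is 1 for exactly one observation per id-act pair and 0 otherwise
              egen tag = tag(id act)
              * summing the tags gives the number of distinct ids at each date
              collapse (sum) n_ids = tag, by(act)
              list, clean noobs
              ```

              The repeated observation (1 jan a) is tagged 0, so it does not inflate the count.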



              • #8
                Originally posted by elehman:
                I want to collapse this data so that I can summarize status by activity date by counting the id's under each status at each activitydt.
                1. It is most confusing in the Stata world to write "summarize status" since about half of the readers would imagine that you want to run the command summarize on variable status. Since it is a string variable this would not give you much.
                2. Note how Jorge started his solution with an input statement. Showing the data with list is wonderful, because it communicates the information needed to solve the problem. But writing the input statement yourself in the question would lower the cost of getting into the problem and encourage more people to actually try to help. It also resolves ambiguities about variable types: "is status a string variable with values a,b,c? or is it numeric with labeled values a,b,c? or is it a numeric variable without labels with some irrelevant numeric values which we will just denote a,b,c for this example?". In this case Jorge takes id to be numeric (according to his input statement), but you wrote in your original message that it is a string. If it is a string, then it probably has some non-numeric characters in it, so why not make that explicit in the example data?
                3. A standard way of dealing with ids is to use egen n=group(id) to obtain numeric ids running 1,2,3,...,N, which are more convenient to work with. The original id can be anything, including a list of several variables. See if that helps.
                4. My take on your problem is something like this:
                Code:
                clear
                input str1 id str3 act str1 status
                1 jan a
                2 jan b
                1 feb a
                2 feb c
                1 jan a
                2 jan b
                x feb a
                x feb c
                end
                
                local uniqueonly=1
                if (`uniqueonly') {
                  duplicates drop id act status, force
                }
                
                generate byte one=1
                collapse (count) one, by(act status)
                
                list, clean noobs
                which creates the following:
                Code:
                    act   status   one 
                    feb        a     2 
                    feb        c     2 
                    jan        a     1 
                    jan        b     1
                Adjust uniqueonly to your preference: set it to 0 to keep duplicate observations in the count.
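
                Point 3 above can be sketched as follows. The long string ids here are hypothetical stand-ins for the kind of identifier described in the original question.

                ```stata
                * Hypothetical long string ids, one of them repeated
                clear
                input str12 id str3 act
                "AB-0001-xyz" jan
                "CD-0002-xyz" jan
                "AB-0001-xyz" feb
                end
                * group() numbers the distinct values of id as 1, 2, ... in sorted order
                egen long n = group(id)
                list, clean noobs
                ```

                The new variable n can then be used wherever a numeric id is required, without touching the original string.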

                Regards, Sergiy Radyakin

