Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Count the frequency of change in a set of variables

    Hello folks, I ran into a small problem when I was trying to count the frequency of change in a set of variables , below is a sample of how the data looks like:

    Code:
    ID    cat1    cat2    cat3    cat4    cat5
    110    1    1    1    1    3
    134    1    2    3    1    1
    123    3    3    3    1    5
    145    4    5    1    1    1
    155    5    3    2    4    1
    167    4    4    3    1    2
    147    5    5    5    5    5
    166    4    3    3    3    1
    148    4    5    4    4    1
    in which ID stands for participant ID and there are five variables cat1 through cat5. Each variable has the same range of values from 1 to 5. However, I need to create a new variable counting the times of change from cat1 to cat5. For instance, for ID 110, the frequency of change from cat1 to cat5 is 1, for ID 155, the frequency of change from cat1 to cat 5 is 4. I hope I make myself clear. Is there any way for me to capture this information cuz I have a quite large data set. Thanks!

  • #2
    Try this:

    Code:
    gen count = 0
     foreach var of varlist cat1-cat5 {
       replace count = count +  (`var'>1)
    }
    Roman

    Comment


    • #3
      Hi Roman, thanks for the reply but I don't think your syntax works accurately.. For instance, I have an ID which has the same value for cat1 through cat5 but the newly created count variable has a value of 5 for that ID, which means the value from cat1 to cat5 changes 5 times, which is not accurate.

      Comment


      • #4
        I think Roman Mostazir's code in #2 will tell you the number of values of cat1 through cat5 that are > 1. I understand you to want the number of times that cat_i is different from cat_i-1.

        I would not do this in wide layout. I would go to long, and then it becomes very easy.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input int id byte(cat1 cat2 cat3 cat4 cat5)
        110 1 1 1 1 3
        134 1 2 3 1 1
        123 3 3 3 1 5
        145 4 5 1 1 1
        155 5 3 2 4 1
        167 4 4 3 1 2
        147 5 5 5 5 5
        166 4 3 3 3 1
        148 4 5 4 4 1
        end
        
        reshape long cat, i(id) j(_j)
        by id (_j), sort: egen wanted = total(cat != cat[_n-1] & _n > 1)
        
        reshape wide
        Note: Actually, given the way Stata works, whatever you do with this data subsequently is likely to be easier if you leave the data in long layout. So I advise you not to use the -reshape wide- that appears at the end unless there is a compelling reason to go back to wide layout.

        In the future, please use the -dataex- command to show data examples. It took me longer to import your example data into Stata than it did to solve your problem. Please be considerate of those who want to help you. Run -ssc install dataex- to get the -dataex- command, and run -help dataex- for simple instructions for using it. When you use -dataex- you enable those who want to help you to create a complete and faithful replica of your Stata example with a simple copy/paste operation.

        Added: Crossed with #3

        Comment


        • #5
          Hi Clyde, thanks sooo much for the syntax and it worked perfectly. Sorry about the inconvenience and I will make sure to use detaex in the future.

          Comment


          • #6
            Apologies for misunderstanding the problem domain
            Roman

            Comment


            • #7
              Clyde's advice is good as always.

              Two footnotes:

              1. The help for egen does warn about not depending on subscripts such as _n, but this application should be fine.

              2. It's also not too difficult to do this without a reshape

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input int id byte(cat1 cat2 cat3 cat4 cat5)
              110 1 1 1 1 3
              134 1 2 3 1 1
              123 3 3 3 1 5
              145 4 5 1 1 1
              155 5 3 2 4 1
              167 4 4 3 1 2
              147 5 5 5 5 5
              166 4 3 3 3 1
              148 4 5 4 4 1
              end
              
              gen count = 0
              
              quietly forval j = 2/5 {
                   replace count = count + (cat`j' != cat`=`j'-1')
              }
              
              list
              
                   +------------------------------------------------+
                   |  id   cat1   cat2   cat3   cat4   cat5   count |
                   |------------------------------------------------|
                1. | 110      1      1      1      1      3       1 |
                2. | 134      1      2      3      1      1       3 |
                3. | 123      3      3      3      1      5       2 |
                4. | 145      4      5      1      1      1       2 |
                5. | 155      5      3      2      4      1       4 |
                   |------------------------------------------------|
                6. | 167      4      4      3      1      2       3 |
                7. | 147      5      5      5      5      5       0 |
                8. | 166      4      3      3      3      1       2 |
                9. | 148      4      5      4      4      1       3 |
                   +------------------------------------------------+

              Comment


              • #8
                Hi Nick, thanks for the new approach and it also works perfectly in my case.

                Comment

                Working...
                X