  • Understanding the behavior of the group function

    Dear all,

    I am a bit puzzled by the behavior of the group function. My dataset looks like the table below (excluding the group column). I ran the following code and expected Stata to produce the 'group' column shown. Instead, the resulting group column contains many different group numbers within the same year, such as 234, 23, 12, 1, etc. Do you know the reason for this behavior?

    Code:
    bysort ID (year): gen group=group(year)
    ID      year  var1   group
    445678  2006  454    1
    445678  2006  788    1
    445678  2006  67567  1
    445678  2006  546    1
    445678  2007  678    2
    445678  2007  7868   2
    445678  2008  67878  3
    445678  2009  6709   4
    445678  2009  546    4

    Many thanks for your help!
    Riccardo

  • #2
    I don't think there is anything "weird" about the results. Perhaps the right question is why what you got differs from what you were expecting and, more importantly, what exactly were you expecting?
    Fernando

    • #3
      I was expecting a column like 'group expected' below. Instead, I got something like 'group obtained'. It's like -group- did not consider my by (varlist): statement.
      ID      year  var1   group expected  group obtained
      445678  2006  454    1               234
      445678  2006  788    1               23
      445678  2006  67567  1               344
      445678  2006  546    1               21
      445678  2007  678    2               12
      445678  2007  7868   2               567
      445678  2008  67878  3               53
      445678  2009  6709   4               29
      445678  2009  546    4               78
      445679  2007  656    1               435
      445679  2007  568    1               12
      Last edited by Riccardo Valboni; 19 Jun 2014, 12:55.

      • #4
        Try this:
        egen group_ob = group(ID year)

        • #5
          To get what you want, you should use the -egen- function group. (However, it cannot be used with -by-.)

          Code:
          egen group = group(year)
          will produce what you expected. The non-egen group() function you are using is a relic from earlier versions of Stata. It has been kept around so that old code that used it will still run. But it is not recommended for current use, and it disappeared long enough ago that I, at least, don't remember what it actually does (did).

          All of that said, what are you trying to accomplish here? You could also have gotten the same result with just

          Code:
          gen group = year - 2005
          And, even so, why do you need a variable that just encodes the year in a slightly different way? Although the -egen group()- function can certainly be run with just one variable, as here, in most applications the idea is to designate combinations of two or more variables. Perhaps that is what you had in mind by putting -bysort ID- in your code. Your example doesn't show us what you expect to get when there is more than one value of the ID variable in the data set. But perhaps what you are really looking for is

          Code:
          egen group = group(ID year)
          If none of this covers what you want, you should provide a more detailed explanation that also includes multiple values for the ID variable.
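
          To make the behavior of -egen group()- with two variables concrete, here is a minimal sketch on a cut-down version of the example data, with a second ID added. The variable names pair and yearno are made up for illustration. It shows that the codes run across the whole data set rather than restarting within each ID, and one possible way to restart them.

          ```stata
          clear
          input long ID int year
          445678 2006
          445678 2006
          445678 2007
          445678 2008
          445678 2009
          445679 2007
          445679 2008
          end

          * one code per distinct ID-year combination, numbered across all IDs
          egen pair = group(ID year)

          * the second ID continues the numbering instead of restarting at 1
          assert pair == 5 if ID == 445679 & year == 2007

          * one way to restart at 1 within each ID (hypothetical variable yearno):
          * subtract the first pair code of that ID and add 1
          bysort ID (pair): gen yearno = pair - pair[1] + 1
          assert yearno == 1 if ID == 445679 & year == 2007
          assert yearno == 4 if ID == 445678 & year == 2009

          list, sepby(ID)
          ```

          Under -by-, pair[1] refers to the first observation of the current ID, which is why the subtraction restarts the count.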

          • #6
            My example was unclear with respect to what I wanted. I would like -group- to restart counting every time the ID changes. See the edit of my example above. Apologies for the confusion.
            Last edited by Riccardo Valboni; 19 Jun 2014, 12:56.

            • #7
              As Fernando implies, the group() function invoked by generate is quite different from egen's group() function. It is now undocumented, but see e.g.

              http://www.stata.com/statalist/archi.../msg00406.html

              and its references.

              • #8
                .
                Last edited by FernandoRios; 19 Jun 2014, 13:04.

                • #9
                  Thanks Nick.
                  That is something I didn't know, although I have used egen group to create fixed effects for a while.
                  And Riccardo, if you provide a better example, it will probably be easier to give more accurate suggestions.

                  • #10
                    Thank you for the responses. I extended the table above a bit. In the first year in which an ID appears, I want to give it 1; in the second year, 2; and so on. The count restarts from 1 for every new ID. I hope this clarifies.
                    ID      year  var1   group expected
                    445678  2006  454    1
                    445678  2006  788    1
                    445678  2006  67567  1
                    445678  2006  546    1
                    445678  2007  678    2
                    445678  2007  7868   2
                    445678  2008  67878  3
                    445678  2009  6709   4
                    445678  2009  546    4
                    445679  2007  656    1
                    445679  2007  568    1
                    445679  2008  453    2
                    445679  2008  345    2

                    Last edited by Riccardo Valboni; 19 Jun 2014, 13:11.

                    • #11
                      I think I got it. My solution:

                      Code:
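                      use file.dta
                      * keep one row per ID-year pair
                      keep ID year
                      duplicates drop
                      * number the years within each ID (bysort, because by: needs sorted data)
                      bysort ID (year): gen gruppo = _n
                      * attach the counter back to the full data set
                      joinby ID year using file.dta
                      save file.dta, replace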
                      Thank you all for answering!
                      Riccardo

                      • #12
                        No need for file choreography:

                        Code:
                         
                        bysort ID year : gen gruppo = _n == 1 
                        by ID: replace gruppo = sum(gruppo)
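
                        For readers who want to see why these two lines work, here is a self-contained sketch that runs them on the example data from #10. The variable expected is the 'group expected' column from that post, typed in by hand so the result can be checked. The first line marks the first observation of each ID-year pair with 1 and all others with 0; the second takes a running sum of those marks within each ID, so the count rises by one at every new year and restarts with every new ID.

                        ```stata
                        clear
                        input long ID int year long var1 byte expected
                        445678 2006 454 1
                        445678 2006 788 1
                        445678 2006 67567 1
                        445678 2006 546 1
                        445678 2007 678 2
                        445678 2007 7868 2
                        445678 2008 67878 3
                        445678 2009 6709 4
                        445678 2009 546 4
                        445679 2007 656 1
                        445679 2007 568 1
                        445679 2008 453 2
                        445679 2008 345 2
                        end

                        * 1 for the first observation of each ID-year pair, 0 otherwise
                        bysort ID year: gen gruppo = _n == 1
                        * running sum of those flags within each ID
                        by ID: replace gruppo = sum(gruppo)

                        * check against the hand-typed expected column
                        assert gruppo == expected
                        list, sepby(ID)
                        ```

                        The order of observations within an ID-year pair is arbitrary after -bysort-, but that does not matter here because every observation in the pair ends up with the same value.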

                        • #13
                          Wow. That's brilliant.
