Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Limitations of "by"

    Hi,
    I wonder whether there are limitations to the number of variables one can list after the by command?
    Is it possible for instance to use a command like this:
    by a b c d e f g, sort: gen x=1 if a==1 & b==2 & c==1 & d==1 & e==2 & g==3

    Or how about this:
    by a b c year id date, sort: gen x=_n if a==1 & b==1 & c==0
    Would this command eventually sort by "id" and "date" within "id"? What would then be the difference compared to the following:
    by a b c year id (date), sort: gen x=_n if a==1 & b==1 & c==0
    Would one of these commands create the a list of "_n" sorted by id and date within id given all those other "bys" and "ifs"?

    I want something looking like this (randomly created in excel - did not fill in all the dates, just the ones I am interested in):
    a b c year id date _n
    0 1 0 2004 x
    0 1 0 2004 x
    0 1 0 2004 x
    0 1 0 2004 x
    0 1 0 2004 x
    0 1 0 2004 xx
    0 1 0 2005 xx
    0 1 0 2005 xx
    0 1 1 2004 x
    0 1 2 2004 x
    0 1 2 2005 xx
    0 2 0 2004 x
    0 2 0 2004 x
    0 2 0 2005 x
    0 2 0 2006 xx
    0 2 1 2004 y
    0 2 1 2004 xx
    0 2 2 2004 x
    0 2 2 2005 xx
    1 1 0 2004 a 1.1.2004 1
    1 1 0 2004 a 2.1.2004 2
    1 1 0 2004 a 3.1.2004 3
    1 1 0 2004 a 4.1.2004 4
    1 1 0 2004 a 5.1.2004 5
    1 1 0 2004 b 1.1.2004 1
    1 1 0 2004 b 2.1.2004 2
    1 1 0 2005 a 1.1.2005 1
    1 1 0 2005 a 2.1.2005 2
    1 1 0 2005 b 1.1.2005 1
    1 1 0 2005 b 2.1.2005 2
    1 1 0 2005 c 1.1.2005 1
    1 1 0 2005 c 2.1.2005 2
    1 1 0 2006 a 1.1.2006 1
    1 1 0 2006 a 2.1.2006 2
    1 1 0 2006 b 1.1.2006 1
    1 1 0 2006 c 1.1.2006 1
    1 1 1 2004 a
    1 1 1 2005 b
    1 1 2 2006 c
    1 2 0 2004 a
    1 2 0 2004 a
    1 2 0 2005 a
    1 2 0 2006 b
    1 2 1 2004 b
    1 2 1 2004 c
    1 2 1 2004 c
    1 2 2 2006 c
    Thankful for any advice.
    Last edited by Ellinor Reven; 15 Feb 2022, 03:41.

  • #2
    I don't think there are any limits. My guess is that you could mention thousands of variable names following by: and the command would still be legal so far as that was concerned.


    Code:
    by a b c d e f g, sort: gen x=1 if a==1 & b==2 & c==1 & d==1 & e==2 & g==3
    is possible and harmless but pointless as it is the same calculation as

    Code:
    gen x=1 if a==1 & b==2 & c==1 & d==1 & e==2 & g==3
    in the same way as 2 + 2 is the same calculation in the room you are in or a room next to it or in my room: that is, where you do the calculation is not material to the rule used or the result it produces.

    Code:
    by a b c year id date, sort: gen x=_n if a==1 & b==1 & c==0
    on the other hand is a calculation where the by: context is vital as the observation number _n is calculated within groups so defined and following the sort order so defined.

    In datasets in which id and date jointly identify observations, the new variable would be 1 if the condition a==1 & b==1 & c==0 was true and missing otherwise.

    Code:
    by a b c year id (date), sort: gen x=_n if a==1 & b==1 & c==0 
    is quite different from the previous command in principle and only identical in effect if id date jointly identify observations.

    Although we ask that you give example data, the key points can be made more simply with generally accessible datasets. Thus in the following example counter1 and counter2 are different in principle and usually different in practice. One is a counter within companies ordered by date (and a re-creation of the existing variable time and the other is only ever 1 as company year pairs identify observations uniquely.

    Code:
    . webuse grunfeld, clear
    
    . ds
    company  year     invest   mvalue   kstock   time
    
    . bysort company (year) : gen counter1 = _n
    
    . bysort company year : gen counter2 = _n
    
    . list company year time counter? in 1/25
    
         +---------------------------------------------+
         | company   year   time   counter1   counter2 |
         |---------------------------------------------|
      1. |       1   1935      1          1          1 |
      2. |       1   1936      2          2          1 |
      3. |       1   1937      3          3          1 |
      4. |       1   1938      4          4          1 |
      5. |       1   1939      5          5          1 |
         |---------------------------------------------|
      6. |       1   1940      6          6          1 |
      7. |       1   1941      7          7          1 |
      8. |       1   1942      8          8          1 |
      9. |       1   1943      9          9          1 |
     10. |       1   1944     10         10          1 |
         |---------------------------------------------|
     11. |       1   1945     11         11          1 |
     12. |       1   1946     12         12          1 |
     13. |       1   1947     13         13          1 |
     14. |       1   1948     14         14          1 |
     15. |       1   1949     15         15          1 |
         |---------------------------------------------|
     16. |       1   1950     16         16          1 |
     17. |       1   1951     17         17          1 |
     18. |       1   1952     18         18          1 |
     19. |       1   1953     19         19          1 |
     20. |       1   1954     20         20          1 |
         |---------------------------------------------|
     21. |       2   1935      1          1          1 |
     22. |       2   1936      2          2          1 |
     23. |       2   1937      3          3          1 |
     24. |       2   1938      4          4          1 |
     25. |       2   1939      5          5          1 |
         +---------------------------------------------+
    .

    Comment

    Working...
    X