  • bysort command

    Hello,
    I have a clarifying question about the bysort command.
    Suppose that in my data file HHID is the ID given to each household (HH), and that it is not unique.
    I want to run a command to see how many households have daughters.
    The command I used was:

    Code:
    bys HHID: egen HHCH= max(relation==11)
    where relation is the variable that shows relation to the household head.

    What is the difference between using this first command and the following command?

    Code:
    bys STATEID DISTID PSUID HHID: egen HHDaughter= max(relation==11)
    I have a unique identifier IDHH for the household.

    Here's my data:

    dataex STATEID DISTID PSUID HHSPLITID HHID IDHH

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int(STATEID DISTID PSUID HHSPLITID HHID) double IDHH
    1 2 1 0  1  10201010
    1 2 1 0  1  10201010
    1 2 1 0  1  10201010
    1 2 1 0  1  10201010
    1 2 1 0  1  10201010
    1 2 1 0  2  10201020
    1 2 1 0  2  10201020
    1 2 1 0  2  10201020
    1 2 1 0  2  10201020
    1 2 1 0  2  10201020
    1 2 1 0  2  10201020
    1 2 1 0  2  10201020
    1 2 1 0  2  10201020
    1 2 1 0  2  10201020
    1 2 1 0  2  10201020
    1 2 1 0  2  10201020
    1 2 1 0  2  10201020
    1 2 1 0  2  10201020
    1 2 1 0  2  10201020
    1 2 1 0  3  10201030
    end
    I understand I can use
    Code:
    bys IDHH: egen HHdaughter= max(relation==11)
    .

    But what I need clarified is the intuition behind using bys STATEID DISTID PSUID HHID.
    How does the sorting take place here, compared with "bys HHID"?

    Thank you.

  • #2
    To clarify we need to know what is unclear. Faced with a big complicated dataset, there is a simple strategy: play with a simpler fake dataset to understand the principles.

    Here is a start.

    Sorting is one thing. If you sort on two or more variables, the data will be sorted first by the first variable you name, then by the next variable you name, and so on.

    by: is another thing. Doing anything under by varlist: means calculating it separately for each group of observations defined by the by: variables.

    bysort is both at once. You will get the sort order needed -- which may be the same as that at present.
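
    As a small sketch of that equivalence, here are the two-step and one-step forms side by side. The auto dataset shipped with Stata and the variable names maxprice1 and maxprice2 are just my choices for illustration; they have nothing to do with your data.

    Code:
    sysuse auto, clear

    * two steps: sort first, then calculate group by group
    sort foreign rep78
    by foreign rep78 : egen maxprice1 = max(price)

    * one step: bysort sorts as needed, then calculates group by group
    bysort foreign rep78 : egen maxprice2 = max(price)

    * the two results should agree in every observation
    assert maxprice1 == maxprice2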

    In your case, if (and only if) a particular HHID is never repeated in other states, districts, etc., then you will get the same results for your calculation, but the sort order is likely to be different.

    In this fake example, the household identifier is never repeated in the other of the two states, so all that varies is the sort order of the results.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(stateid hhid relation)
    2 1  1
    2 1 11
    2 2  1
    2 2  2
    1 3  1
    1 3 11
    1 4  1
    1 4  2
    end
    
    bysort stateid hhid : egen daughter = max(relation == 11)
    
    list, sepby(stateid hhid)
    
    bysort hhid : egen DAUGHTER  = max(relation == 11)
    
    list, sepby(hhid stateid)
    Listings:

    Code:
    . list, sepby(stateid hhid)
    
         +--------------------------------------+
         | stateid   hhid   relation   daughter |
         |--------------------------------------|
      1. |       1      3          1          1 |
      2. |       1      3         11          1 |
         |--------------------------------------|
      3. |       1      4          1          0 |
      4. |       1      4          2          0 |
         |--------------------------------------|
      5. |       2      1         11          1 |
      6. |       2      1          1          1 |
         |--------------------------------------|
      7. |       2      2          2          0 |
      8. |       2      2          1          0 |
         +--------------------------------------+
    
    
    . list, sepby(hhid stateid)
    
         +-------------------------------------------------+
         | stateid   hhid   relation   daughter   DAUGHTER |
         |-------------------------------------------------|
      1. |       2      1          1          1          1 |
      2. |       2      1         11          1          1 |
         |-------------------------------------------------|
      3. |       2      2          2          0          0 |
      4. |       2      2          1          0          0 |
         |-------------------------------------------------|
      5. |       1      3         11          1          1 |
      6. |       1      3          1          1          1 |
         |-------------------------------------------------|
      7. |       1      4          1          0          0 |
      8. |       1      4          2          0          0 |
         +-------------------------------------------------+
    If in doubt then the only practical solution is to repeat the calculation and check that results are the same.

    assert daughter == DAUGHTER
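
    For contrast, here is a sketch of the opposite situation, again with made-up data: hhid 1 is used in both states, and only the household in state 1 has a daughter. The variable name repeated is hypothetical. In this case I would expect the two calculations to disagree for the household in state 2, because bysort hhid pools both households into one group.

    Code:
    clear
    input float(stateid hhid relation)
    1 1  1
    1 1 11
    2 1  1
    2 1  2
    end

    bysort stateid hhid : egen daughter = max(relation == 11)
    bysort hhid : egen DAUGHTER = max(relation == 11)

    * flag any hhid that occurs in more than one state
    bysort hhid (stateid) : gen byte repeated = stateid[1] != stateid[_N]

    sort stateid hhid
    list, sepby(stateid hhid)

    * count observations where the two results disagree
    count if daughter != DAUGHTER

    The same kind of check could be run on the real data before relying on bys HHID alone, extending it to DISTID and PSUID as needed.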
