Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Counting distinct donor_id for company_id and year

    company_id year grant donor_id
    co_001 2003 1 do_100
    co_001 2004 0
    co_001 2005 1 do_200
    co_001 2005 1 do_100
    co_002 2005 0
    co_002 2006 1 do_300
    co_002 2006 1 do_300
    co_002 2006 1 do_500
    co_003 2007 1 do_300
    co_003 2007 1 do_600
    co_003 2007 1 do_600
    co_003 2007 1 do_400
    co_004 2008 0
    co_004 2011 1 do_100

    Create a variable unique_donors that counts the number of distinct donors for each company each year. E.g. for co_001 in year 2005, there are 2 different donors- do_200 and do_100. Similarly, for co_003 in year 2007, there are a total of 4 grants but 3 unique donors- do_300, do_600 and do_400
    Code: egen id= group( company_id)
    xtset id
    if i do xtset id year then result box is showing repeated years. Then I did
    bysort id year donor_id : gen unique_id :_n== 1
    But I am unable to get my result. I ask my seniors here where am I going wrong ?

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str6 company_id int year byte grant str6 donor_id
    "co_001" 2003 1 "do_100"
    "co_001" 2004 0 ""      
    "co_001" 2005 1 "do_200"
    "co_001" 2005 1 "do_100"
    "co_002" 2005 0 ""      
    "co_002" 2006 1 "do_300"
    "co_002" 2006 1 "do_300"
    "co_002" 2006 1 "do_500"
    "co_003" 2007 1 "do_300"
    "co_003" 2007 1 "do_600"
    "co_003" 2007 1 "do_600"
    "co_003" 2007 1 "do_400"
    "co_004" 2008 0 ""      
    "co_004" 2011 1 "do_100"
    end
    
    by company_id year (donor_id), sort: egen donors = total(donor_id!=donor_id[_n-1] & donor_id!="")
    list, sepby(company_id year)
    Code:
    . list, sepby(company_id year)
    
         +---------------------------------------------+
         | compan~d   year   grant   donor_id   donors |
         |---------------------------------------------|
      1. |   co_001   2003       1     do_100        1 |
         |---------------------------------------------|
      2. |   co_001   2004       0                   0 |
         |---------------------------------------------|
      3. |   co_001   2005       1     do_100        2 |
      4. |   co_001   2005       1     do_200        2 |
         |---------------------------------------------|
      5. |   co_002   2005       0                   0 |
         |---------------------------------------------|
      6. |   co_002   2006       1     do_300        2 |
      7. |   co_002   2006       1     do_300        2 |
      8. |   co_002   2006       1     do_500        2 |
         |---------------------------------------------|
      9. |   co_003   2007       1     do_300        3 |
     10. |   co_003   2007       1     do_400        3 |
     11. |   co_003   2007       1     do_600        3 |
     12. |   co_003   2007       1     do_600        3 |
         |---------------------------------------------|
     13. |   co_004   2008       0                   0 |
         |---------------------------------------------|
     14. |   co_004   2011       1     do_100        1 |
         +---------------------------------------------+

    Comment


    • #3
      William Lisowski's code works fine in practice.

      In principle, users of egen in the past were warned in its documentation off using subscripts such as _n - 1 . That was because egen feels at liberty to sort observations temporarily to do what it has to do, and so subscript intentions may be mangled. This warning seems to have disappeared -- perhaps egen has been tightened up in this respect.

      In practice. William carefully arranges the
      sort order he needs and egen has no reason to change it.


      This equivalent way to proceed is based on https://www.stata-journal.com/articl...article=dm0042 -- see especially p.563. There is an
      egen function nvals() in egenmore from SSC, but I prefer to do the calculation this way and the author of nvals() advises similarly.


      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str6 company_id int year byte grant str6 donor_id
      "co_001" 2003 1 "do_100"
      "co_001" 2004 0 ""      
      "co_001" 2005 1 "do_200"
      "co_001" 2005 1 "do_100"
      "co_002" 2005 0 ""      
      "co_002" 2006 1 "do_300"
      "co_002" 2006 1 "do_300"
      "co_002" 2006 1 "do_500"
      "co_003" 2007 1 "do_300"
      "co_003" 2007 1 "do_600"
      "co_003" 2007 1 "do_600"
      "co_003" 2007 1 "do_400"
      "co_004" 2008 0 ""      
      "co_004" 2011 1 "do_100"
      end
      
      egen tag = tag(company_id donor_id year) if donor_id != ""
      egen wanted = total(tag), by(company_id year)
      sort company_id year
      list, sepby(company_id year)
      
          +---------------------------------------------------+
           | compan~d   year   grant   donor_id   tag   wanted |
           |---------------------------------------------------|
        1. |   co_001   2003       1     do_100     1        1 |
           |---------------------------------------------------|
        2. |   co_001   2004       0                0        0 |
           |---------------------------------------------------|
        3. |   co_001   2005       1     do_200     1        2 |
        4. |   co_001   2005       1     do_100     1        2 |
           |---------------------------------------------------|
        5. |   co_002   2005       0                0        0 |
           |---------------------------------------------------|
        6. |   co_002   2006       1     do_300     1        2 |
        7. |   co_002   2006       1     do_300     0        2 |
        8. |   co_002   2006       1     do_500     1        2 |
           |---------------------------------------------------|
        9. |   co_003   2007       1     do_300     1        3 |
       10. |   co_003   2007       1     do_600     1        3 |
       11. |   co_003   2007       1     do_600     0        3 |
       12. |   co_003   2007       1     do_400     1        3 |
           |---------------------------------------------------|
       13. |   co_004   2008       0                0        0 |
           |---------------------------------------------------|
       14. |   co_004   2011       1     do_100     1        1 |
           +---------------------------------------------------+


      Last edited by Nick Cox; 15 Mar 2022, 10:29.

      Comment


      • #4
        Thank you so much Senior. This helped alot . I wonder how can we use the collapse command to solve the above posted one.

        Comment


        • #5
          Originally posted by Nick Cox View Post
          William Lisowski's code works fine in practice.

          In principle, users of egen in the past were warned in its documentation off using subscripts such as _n - 1 . That was because egen feels at liberty to sort observations temporarily to do what it has to do, and so subscript intentions may be mangled. This warning seems to have disappeared -- perhaps egen has been tightened up in this respect.

          In practice. William carefully arranges the
          sort order he needs and egen has no reason to change it.


          This equivalent way to proceed is based on https://www.stata-journal.com/articl...article=dm0042 -- see especially p.563. There is an
          egen function nvals() in egenmore from SSC, but I prefer to do the calculation this way and the author of nvals() advises similarly.


          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str6 company_id int year byte grant str6 donor_id
          "co_001" 2003 1 "do_100"
          "co_001" 2004 0 ""
          "co_001" 2005 1 "do_200"
          "co_001" 2005 1 "do_100"
          "co_002" 2005 0 ""
          "co_002" 2006 1 "do_300"
          "co_002" 2006 1 "do_300"
          "co_002" 2006 1 "do_500"
          "co_003" 2007 1 "do_300"
          "co_003" 2007 1 "do_600"
          "co_003" 2007 1 "do_600"
          "co_003" 2007 1 "do_400"
          "co_004" 2008 0 ""
          "co_004" 2011 1 "do_100"
          end
          
          egen tag = tag(company_id donor_id year) if donor_id != ""
          egen wanted = total(tag), by(company_id year)
          sort company_id year
          list, sepby(company_id year)
          
          +---------------------------------------------------+
          | compan~d year grant donor_id tag wanted |
          |---------------------------------------------------|
          1. | co_001 2003 1 do_100 1 1 |
          |---------------------------------------------------|
          2. | co_001 2004 0 0 0 |
          |---------------------------------------------------|
          3. | co_001 2005 1 do_200 1 2 |
          4. | co_001 2005 1 do_100 1 2 |
          |---------------------------------------------------|
          5. | co_002 2005 0 0 0 |
          |---------------------------------------------------|
          6. | co_002 2006 1 do_300 1 2 |
          7. | co_002 2006 1 do_300 0 2 |
          8. | co_002 2006 1 do_500 1 2 |
          |---------------------------------------------------|
          9. | co_003 2007 1 do_300 1 3 |
          10. | co_003 2007 1 do_600 1 3 |
          11. | co_003 2007 1 do_600 0 3 |
          12. | co_003 2007 1 do_400 1 3 |
          |---------------------------------------------------|
          13. | co_004 2008 0 0 0 |
          |---------------------------------------------------|
          14. | co_004 2011 1 do_100 1 1 |
          +---------------------------------------------------+

          Thank you so much Senior. I wonder how can we use collapse command for the above.

          Comment


          • #6
            Originally posted by William Lisowski View Post
            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input str6 company_id int year byte grant str6 donor_id
            "co_001" 2003 1 "do_100"
            "co_001" 2004 0 ""
            "co_001" 2005 1 "do_200"
            "co_001" 2005 1 "do_100"
            "co_002" 2005 0 ""
            "co_002" 2006 1 "do_300"
            "co_002" 2006 1 "do_300"
            "co_002" 2006 1 "do_500"
            "co_003" 2007 1 "do_300"
            "co_003" 2007 1 "do_600"
            "co_003" 2007 1 "do_600"
            "co_003" 2007 1 "do_400"
            "co_004" 2008 0 ""
            "co_004" 2011 1 "do_100"
            end
            
            by company_id year (donor_id), sort: egen donors = total(donor_id!=donor_id[_n-1] & donor_id!="")
            list, sepby(company_id year)
            Code:
            . list, sepby(company_id year)
            
            +---------------------------------------------+
            | compan~d year grant donor_id donors |
            |---------------------------------------------|
            1. | co_001 2003 1 do_100 1 |
            |---------------------------------------------|
            2. | co_001 2004 0 0 |
            |---------------------------------------------|
            3. | co_001 2005 1 do_100 2 |
            4. | co_001 2005 1 do_200 2 |
            |---------------------------------------------|
            5. | co_002 2005 0 0 |
            |---------------------------------------------|
            6. | co_002 2006 1 do_300 2 |
            7. | co_002 2006 1 do_300 2 |
            8. | co_002 2006 1 do_500 2 |
            |---------------------------------------------|
            9. | co_003 2007 1 do_300 3 |
            10. | co_003 2007 1 do_400 3 |
            11. | co_003 2007 1 do_600 3 |
            12. | co_003 2007 1 do_600 3 |
            |---------------------------------------------|
            13. | co_004 2008 0 0 |
            |---------------------------------------------|
            14. | co_004 2011 1 do_100 1 |
            +---------------------------------------------+
            Thank you so much for your help Senior. I wonder how to use collapse command for the above post .

            Comment


            • #7
              If you want to go straight to a collapsed dataset, you can do this:

              Code:
              . egen distinct = tag(company_id donor_id year) if donor_id != ""
              
              . collapse (sum) distinct, by(company_id year)
              
              . list, sepby(company_id)
              
                   +----------------------------+
                   | compan~d   year   distinct |
                   |----------------------------|
                1. |   co_001   2003          1 |
                2. |   co_001   2004          0 |
                3. |   co_001   2005          2 |
                   |----------------------------|
                4. |   co_002   2005          0 |
                5. |   co_002   2006          2 |
                   |----------------------------|
                6. |   co_003   2007          3 |
                   |----------------------------|
                7. |   co_004   2008          0 |
                8. |   co_004   2011          1 |
                   +----------------------------+

              Comment


              • #8
                Originally posted by Nick Cox View Post
                If you want to go straight to a collapsed dataset, you can do this:

                Code:
                . egen distinct = tag(company_id donor_id year) if donor_id != ""
                
                . collapse (sum) distinct, by(company_id year)
                
                . list, sepby(company_id)
                
                +----------------------------+
                | compan~d year distinct |
                |----------------------------|
                1. | co_001 2003 1 |
                2. | co_001 2004 0 |
                3. | co_001 2005 2 |
                |----------------------------|
                4. | co_002 2005 0 |
                5. | co_002 2006 2 |
                |----------------------------|
                6. | co_003 2007 3 |
                |----------------------------|
                7. | co_004 2008 0 |
                8. | co_004 2011 1 |
                +----------------------------+
                Thank you so much Senior. You saved me.

                Comment

                Working...
                X