  • egen command related to count based on a condition

    Dear All
    Code:
    input str1 firm float(sales) int year
    "a" . 2010
    "a" . 2011
    "a" 5 2012
    "b" 6 2010
    "b" 7 2011
    "b" 8 2012
    end
    This is a sample dataset. First, I would like to create two groups: pre-regulation (years 2010 and 2011) and post-regulation (years 2012 and 2013). Then I would like Stata to count the number of times a firm reported sales during the pre-regulation period and store the result as a dummy. For instance, in my example, firm "a" did not report any sales during the pre-regulation period (2010-2011), whereas firm "b" reported sales in both 2010 and 2011 (hence 2 times). Thus "b" should be given a dummy. I tried egen salescount = count(sales), by(id), but that does not help in my case, since that count ignores the period.

    Here is the code I tried:
    Code:
    encode firm,gen(id)
    xtset id year
    gen pre_reg=1 if year <2012
    gen post_reg=1 if year >2011
    egen salescount = count( sales), by(id )         // gives count by id only and ignores year
    egen salescount2 = count( sales ), by(id  & pre_reg==1  )           // will not work, but I want some thing similar
    Am I making sense? If so, I would be grateful for help from members.


  • #2
    Lal:
    I do hope that what follows can be helpful:
    Code:
    . input str1 firm float(sales) int year
    
              firm      sales      year
      1. "a" . 2010
      2. "a" . 2011
      3. "a" 5 2012
      4. "b" 6 2010
      5. "b" 7 2011
      6. "b" 8 2012
      7. end
    
    . egen pre_post=group( firm year) if year<=2011
    
    . replace pre_post=0 if pre_post!=.
    
    . replace pre_post=1 if pre_post==.
    
    . label define pre_post 0 pre_regulation 1 post_regulation, modify
    
    . label val pre_post pre_post
    
    . bysort firm: gen wanted=1 if sales!=. & pre_post==0
    
    
    . bysort firm: egen sum_wanted=sum( sales ) if wanted!=.
    
    
    . list
    
         +-----------------------------------------------------------+
         | firm   sales   year          pre_post   wanted   sum_wa~d |
         |-----------------------------------------------------------|
      1. |    a       .   2010    pre_regulation        .          . |
      2. |    a       .   2011    pre_regulation        .          . |
      3. |    a       5   2012   post_regulation        .          . |
      4. |    b       6   2010    pre_regulation        1         13 |
      5. |    b       7   2011    pre_regulation        1         13 |
         |-----------------------------------------------------------|
      6. |    b       8   2012   post_regulation        .          . |
         +-----------------------------------------------------------+
    
    .
    
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Thanks, Carlo, for the prompt reply. This is fine.


      • #4
        This may help:

        Code:
        clear 
        
        input str1 firm float(sales) int year
        "a" . 2010
        "a" . 2011
        "a" 5 2012
        "b" 6 2010
        "b" 7 2011
        "b" 8 2012
        end
        
        egen count = total(sales < . & year < 2012), by(firm)
        
        gen indicator = count > 0 
        
        list, sepby(firm)
        
             +----------------------------------------+
             | firm   sales   year   count   indica~r |
             |----------------------------------------|
          1. |    a       .   2010       0          0 |
          2. |    a       .   2011       0          0 |
          3. |    a       5   2012       0          0 |
             |----------------------------------------|
          4. |    b       6   2010       2          1 |
          5. |    b       7   2011       2          1 |
          6. |    b       8   2012       2          1 |
             +----------------------------------------+
        See also https://www.stata-journal.com/articl...article=dm0055 My guess is that most beginner users will find there how to apply things they already know, and some things they didn't already know.

        If anyone reads it and knows all the small tricks, that's great, and you should be writing for the Stata Journal.


        • #5
          Nick, thanks for the reply. I have not read it thoroughly, but I have looked at a few articles from the link
          HTML Code:
          https://www.stata-journal.com/article.html?article=dm0055
          and applying them hands-on sometimes proves difficult.
          With respect to your command
          Code:
          egen count = total(sales < . & year < 2012), by(firm)
          what does this mean in words? I thought that total() gives the cumulative sum of a variable (here, sales). If so, why does the new variable count hold the value 2? I have not understood this yet and will read up on it myself, but if you could explain in words what the command does, I would have a better chance.


          • #6
            Names can mislead here. count() as an egen function counts non-missing values; it's not really a completely general function to count anything.

            But total() gives totals or sums (but not cumulative, meaning running, sums). And it can be used as a completely general function to count anything -- so long as you feed it the right information.
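
            As a sketch of that distinction, assuming the sample data posted earlier in the thread (the variable names n_nonmissing, sum_sales and running are made up for illustration), the two egen functions can be put next to a running sum from the sum() function used with generate:

            ```stata
            clear
            input str1 firm float(sales) int year
            "a" . 2010
            "a" . 2011
            "a" 5 2012
            "b" 6 2010
            "b" 7 2011
            "b" 8 2012
            end

            * count(): number of non-missing values of sales within each firm
            egen n_nonmissing = count(sales), by(firm)

            * total(): sum of sales within each firm, the same value on every row
            egen sum_sales = total(sales), by(firm)

            * running (cumulative) sum within firm, for contrast;
            * sum() treats missing values as zero
            bysort firm (year): gen running = sum(sales)

            list, sepby(firm)
            ```

            With these data, firm "a" should show n_nonmissing 1 and sum_sales 5, and firm "b" 3 and 21, constant within each firm. Only running changes from row to row (0, 0, 5 for "a" and 6, 13, 21 for "b").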

            How do you count anything, say how many pens are on your desk, or whatever? Here's a recipe. Score 1 if an object fits your rules and score 0 if it doesn't.

            So pen 1, pencil 0, book 0, pen 1, small child 0, and so on.

            Then add up. Add up means sum or count the 1s -- same answer from summing or counting. The 0s naturally can be ignored. So 1 + 1 + ... gives the count.
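
            The recipe can be spelled out in two explicit steps. This is only a sketch on the sample data from earlier in the thread; score and n_pre are hypothetical names, and the one-line egen call in #4 does the same thing without the intermediate variable:

            ```stata
            clear
            input str1 firm float(sales) int year
            "a" . 2010
            "a" . 2011
            "a" 5 2012
            "b" 6 2010
            "b" 7 2011
            "b" 8 2012
            end

            * Step 1: score 1 if the observation fits the rule, 0 otherwise
            gen byte score = sales < . & year < 2012

            * Step 2: add up the scores within each firm
            egen n_pre = total(score), by(firm)

            list, sepby(firm)
            ```

            Here n_pre should be 0 for firm "a" and 2 for firm "b" on every row of the firm.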

            When Stata looks at the expression

            Code:
            sales < . & year < 2012
            that is true whenever sales is less than missing (meaning, non-missing) and year is less than 2012. And logically true is always numerically evaluated as 1 and logically false is numerically evaluated as 0. For much more see https://www.stata.com/support/faqs/d...rue-and-false/
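
            A quick way to see this numeric evaluation is to let display show the result of a few comparisons. The values below are illustrative only and are not tied to any dataset:

            ```stata
            * a non-missing number is less than missing, so this is true: shows 1
            display 5 < .

            * missing is not less than missing, so this is false: shows 0
            display . < .

            * both conditions true, so the combined expression shows 1
            display (5 < .) & (2011 < 2012)

            * adding true/false results counts the trues: shows 2
            display (5 < .) + (6 < .) + (. < .)
            ```

            The last line is the counting idea in miniature: two of the three values are non-missing, and the sum of the three results is 2.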

            There is a difference in that when you instruct Stata to add up 1s and 0s, it is not smart enough to ignore the zeros, but you get the right answer anyway.
            Last edited by Nick Cox; 22 Jul 2020, 04:20.


            • #7
              And for fun, here is another proposal:

              Code:
              . egen dummy = count(sales) if year<2012, by(firm)
              (2 missing values generated)
              
              . bysort firm (dummy): replace dummy = dummy[1]>0
              (4 real changes made)


              • #8
                Thanks for the wonderful explanation, Nick. The explanation of the logic behind the command is very helpful.


                • #9
                  Joro, thanks for those commands. Once again, thanks to all those who helped me here.
