Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Determine the unique number of obs in panel dataset

    Dear Profs and Colleagues,

    I have a panel data set, which means that for a period of ten years, the id here is (NPC_FIC) repeats. I need to determine the unique total number of employees (pempl) during the period of study. Basically, there are a certain number of employees who work among these firms during these 10 years. I need to know how many there are. The point is that some firms may for some years reduce/ increase the amount of their employees. So dropping duplicates is not a good idea.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double(year NPC_FIC pempl)
    2010 500000001 3
    2010 500000002 5
    2011 500000002 4
    2012 500000002 5
    2013 500000002 4
    2014 500000002 1
    2015 500000002 2
    2016 500000002 2
    2017 500000002 2
    2018 500000002 1
    2010 500000033 2
    2011 500000033 2
    2012 500000033 2
    2013 500000033 1
    2014 500000033 1
    2015 500000033 2
    2016 500000033 1
    2017 500000033 1
    2018 500000033 1
    2019 500000033 1
    2010 500000050 2
    2011 500000050 1
    2012 500000050 1
    2013 500000050 1
    2014 500000050 2
    2015 500000050 2
    2016 500000069 1
    2017 500000069 1
    2019 500000073 2
    2010 500000083 1
    2011 500000083 1
    2012 500000083 1
    2013 500000083 1
    2014 500000083 1
    2015 500000083 1
    2016 500000083 1
    2017 500000083 1
    2018 500000083 2
    2019 500000083 2
    2015 500000101 1
    2016 500000101 1
    2017 500000101 1
    2018 500000101 1
    2019 500000101 1
    2010 500000104 2
    2011 500000106 1
    2011 500000113 1
    2012 500000113 1
    2013 500000113 1
    2014 500000113 1
    2010 500000119 8
    2011 500000119 9
    2012 500000119 8
    2010 500000121 2
    2011 500000150 1
    2012 500000150 1
    2013 500000150 1
    2014 500000150 1
    2010 500000156 2
    2016 500000156 1
    2017 500000156 1
    2018 500000156 1
    2019 500000156 1
    2010 500000157 4
    2011 500000157 3
    2012 500000157 2
    2010 500000165 4
    2011 500000165 4
    2012 500000165 3
    2013 500000165 3
    2014 500000165 3
    2015 500000165 3
    2016 500000165 3
    2017 500000165 3
    2010 500000180 1
    2010 500000198 3
    2011 500000198 3
    2012 500000198 2
    2013 500000198 2
    2014 500000198 2
    2015 500000198 2
    2016 500000198 2
    2017 500000198 2
    2010 500000201 1
    2011 500000201 1
    2012 500000201 1
    2013 500000201 1
    2010 500000204 2
    2011 500000204 2
    2012 500000204 2
    2013 500000204 3
    2014 500000204 3
    2015 500000204 3
    2016 500000204 3
    2017 500000204 3
    2018 500000204 1
    2019 500000204 2
    2010 500000212 1
    2011 500000212 1
    2012 500000212 1
    end
    The second question is I am going to determine firms that have:
    1- have 10 or few than 10 employees
    2- between 10-250 employees
    3- more than 250 employees

    Any ideas are appreciated.
    Cheers,
    Paris

  • #2
    assorted stuff of potential interest

    Code:
    tabstat pempl, stats(sum)
    
    egen firm_emp_all = sum(pempl), by(NPC_FIC)
    egen emp_all = sum(pempl)
    egen emp_all_year = sum(pempl) , by(year)
    
    g emp1 = pempl<=10
    g emp2 = pempl>10 & pempl<=250
    g emp3 = pempl>250
    
    *could use firm_emp_all for these 3 vars, or one of the others

    Comment


    • #3
      Dear Ford, none of these codes
      egen firm_emp_all = sum(pempl), by(NPC_FIC)

      egen emp_all = sum(pempl)

      egen emp_all_year = sum(pempl) , by(year)

      is not correct as I said because of the repeated number of employees exits , also codes for the emp1-2-3
      Code:
      . sum emp_all_year emp_all
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
      emp_all_year |  2,808,027     2896649    190608.9    2651500    3225343
           emp_all |  2,808,027    2.89e+07           0   2.89e+07   2.89e+07
      
      . sum emp1 emp2 emp3
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
              emp1 |  2,808,027    .8555922    .3515028          0          1
              emp2 |  2,808,027    .1410229    .3480452          0          1
              emp3 |  2,808,027    .0033849    .0580817          0          1

      Comment


      • #4
        Originally posted by Paris Rira View Post
        I have a panel data set, which means that for a period of ten years, the id here is (NPC_FIC) repeats. I need to determine the unique total number of employees (pempl) during the period of study. Basically, there are a certain number of employees who work among these firms during these 10 years. I need to know how many there are.
        It is impossible without an employee identifier.


        input double(year NPC_FIC pempl)
        2010 500000002 5
        2011 500000002 4
        The above could imply that:

        - 1 employee quitted in 2011, none was employed.
        - 2 employees quitted in 2011, 1 was employed.
        -
        -
        -
        - 5 employeed quitted in 2011, 4 were employed.

        Without an employee identifier, you cannot tell for sure which of the listed scenarios occurred.

        Comment


        • #5
          So, what's the number of employees yearly? Unique numbers? As you can see there are some repeated numbers in Pempl variable.

          Comment


          • #6
            As long as no employee works in more than one firm, then it is the sum of employees in a year. The uniqueness holds within-year, not between-year, for the reasons given in #4.

            Code:
            isid NPC_FIC year
            frame put pempl year, into(wanted)
            frame wanted: collapse (sum) pempl, by(year)
            frame wanted: list, sep(0)
            frame drop wanted
            Res.:

            Code:
            . frame wanted: list, sep(0)
            
                 +--------------+
                 | year   pempl |
                 |--------------|
              1. | 2010      43 |
              2. | 2011      34 |
              3. | 2012      30 |
              4. | 2013      18 |
              5. | 2014      15 |
              6. | 2015      16 |
              7. | 2016      15 |
              8. | 2017      15 |
              9. | 2018       7 |
             10. | 2019       9 |
                 +--------------+
            Last edited by Andrew Musau; 07 Apr 2024, 10:50.

            Comment


            • #7
              Prof Andrew, Thank you so much. It works great.
              Do you have any ideas for the second question? How many Firms have 10 employees or less?
              How many firms have 10-250 employees ? How many firms have more than 250 employees?

              Comment


              • #8
                All firms in your example have less than or equal to 10 employees, but here's how you'd count the firms.

                Code:
                isid NPC_FIC year
                assert !missing(pempl)
                frame put NPC_FIC year pempl, into(wanted)
                frame wanted{
                    gen which= cond(pempl<=10, "<=10 employees", cond(inrange(pempl, 11, 250), "11-250 employees", ">250 employees"))
                    contract which year, freq(count)
                    l, sepby(which)
                }
                frame drop wanted
                Res.:

                Code:
                .     l, sepby(which)
                
                     +-------------------------------+
                     | year            which   count |
                     |-------------------------------|
                  1. | 2010   <=10 employees      16 |
                  2. | 2011   <=10 employees      14 |
                  3. | 2012   <=10 employees      13 |
                  4. | 2013   <=10 employees      10 |
                  5. | 2014   <=10 employees       9 |
                  6. | 2015   <=10 employees       8 |
                  7. | 2016   <=10 employees       9 |
                  8. | 2017   <=10 employees       9 |
                  9. | 2018   <=10 employees       6 |
                 10. | 2019   <=10 employees       6 |
                     +-------------------------------+

                Comment


                • #9
                  Code:
                  . frame put NPC_FIC year pempl, into(wanted)
                  
                  . frame wanted{
                  .     gen which= cond(pempl<=10, "<=10 employees", cond(inrange(pempl, 11, 250), "11-250 employees", ">250 employees"))
                  .     contract which year, freq(count)
                  .     l, sepby(which)
                  
                       +----------------------------------+
                       | year              which    count |
                       |----------------------------------|
                    1. | 2010   11-250 employees    41212 |
                    2. | 2011   11-250 employees    39592 |
                    3. | 2012   11-250 employees    36250 |
                    4. | 2013   11-250 employees    35817 |
                    5. | 2014   11-250 employees    36909 |
                    6. | 2015   11-250 employees    38237 |
                    7. | 2016   11-250 employees    39471 |
                    8. | 2017   11-250 employees    41465 |
                    9. | 2018   11-250 employees    43223 |
                   10. | 2019   11-250 employees    43820 |
                       |----------------------------------|
                   11. | 2010     <=10 employees   224898 |
                   12. | 2011     <=10 employees   248481 |
                   13. | 2012     <=10 employees   238949 |
                   14. | 2013     <=10 employees   236654 |
                   15. | 2014     <=10 employees   240131 |
                   16. | 2015     <=10 employees   241254 |
                   17. | 2016     <=10 employees   243412 |
                   18. | 2017     <=10 employees   244437 |
                   19. | 2018     <=10 employees   245749 |
                   20. | 2019     <=10 employees   238561 |
                       |----------------------------------|
                   21. | 2010     >250 employees   623029 |
                   22. | 2011     >250 employees   579077 |
                   23. | 2012     >250 employees   556008 |
                   24. | 2013     >250 employees   544240 |
                   25. | 2014     >250 employees   546486 |
                   26. | 2015     >250 employees   569285 |
                   27. | 2016     >250 employees   589311 |
                   28. | 2017     >250 employees   620539 |
                   29. | 2018     >250 employees   642045 |
                   30. | 2019     >250 employees   674916 |
                       +----------------------------------+
                  . }
                  
                  . frame drop wanted
                  As you can see in the above example the majority of firms have less than 10 employees. So these results won't be true.

                  Comment


                  • #10
                    Can you show the results of the assert and isid commands? Perhaps you are counting firms with missing values in the "> 250 employees" category. You can start with:

                    Code:
                    frame put NPC_FIC year pempl if !missing(pempl), into(wanted)
                    if assert gives you an error message.

                    Comment


                    • #11
                      You are absolutely right. After using this command it worked!
                      Code:
                      . frame wanted{
                      .     gen which= cond(pempl<=10, "<=10 employees", cond(inrange(pempl, 11, 250), "11-250 employees", ">250 employees"))
                      .     contract which year, freq(count)
                      .     l, sepby(which)
                      
                           +----------------------------------+
                           | year              which    count |
                           |----------------------------------|
                        1. | 2010   11-250 employees    41212 |
                        2. | 2011   11-250 employees    39592 |
                        3. | 2012   11-250 employees    36250 |
                        4. | 2013   11-250 employees    35817 |
                        5. | 2014   11-250 employees    36909 |
                        6. | 2015   11-250 employees    38237 |
                        7. | 2016   11-250 employees    39471 |
                        8. | 2017   11-250 employees    41465 |
                        9. | 2018   11-250 employees    43223 |
                       10. | 2019   11-250 employees    43820 |
                           |----------------------------------|
                       11. | 2010     <=10 employees   224898 |
                       12. | 2011     <=10 employees   248481 |
                       13. | 2012     <=10 employees   238949 |
                       14. | 2013     <=10 employees   236654 |
                       15. | 2014     <=10 employees   240131 |
                       16. | 2015     <=10 employees   241254 |
                       17. | 2016     <=10 employees   243412 |
                       18. | 2017     <=10 employees   244437 |
                       19. | 2018     <=10 employees   245749 |
                       20. | 2019     <=10 employees   238561 |
                           |----------------------------------|
                       21. | 2010     >250 employees      906 |
                       22. | 2011     >250 employees      914 |
                       23. | 2012     >250 employees      855 |
                       24. | 2013     >250 employees      846 |
                       25. | 2014     >250 employees      872 |
                       26. | 2015     >250 employees      908 |
                       27. | 2016     >250 employees      962 |
                       28. | 2017     >250 employees     1026 |
                       29. | 2018     >250 employees     1084 |
                       30. | 2019     >250 employees     1132 |
                           +----------------------------------+
                      . }
                      
                      . frame drop wanted
                      But still, my problem remains.
                      Code:
                      tab firm_count, gen(num_firms)
                      
                      
                      tag(NPC_FIC |
                                ) |      Freq.     Percent        Cum.
                      ------------+-----------------------------------
                                0 |  2,258,162       80.42       80.42
                                1 |    549,865       19.58      100.00
                      ------------+-----------------------------------
                            Total |  2,808,027      100.00
                      As you can see, there are 549,865 unique firms so what I am looking for is how many of these firms have employees 10 or fewer ... The results are not according to 549,865. I am sorry maybe I am not able to express the question properly...

                      Comment


                      • #12
                        In some years, a firm may have 10 or fewer employees, while in others, it may have between 11 and 250 employees. Should the firm be counted twice? Remember, this is longitudinal data, not a snapshot at a single point in time.

                        Comment


                        • #13
                          Dear Andrew, in my dataset, the probability that a firm's workforce varies from 10 to 250 is really less possible if not impossible. That's why I made such a big rang. Should the firm be counted twice? NO

                          Comment


                          • #14
                            What is the rule if a firm falls into two or more categories during the sampled period? Should it be placed in the category in which it spent the longest time? And what if there is no most common category? Should ties be broken in favor of the highest or lowest category?

                            Comment


                            • #15
                              In this case, it can be counted twice or as many times that its employees' number changes. Another point is that what would be the firm-year level (not each year, in case we can't reach the unique firm count)?

                              Comment

                              Working...
                              X