Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Collapsing while creating fraction with condition

    Hi all,

    I am working on my MSc. Thesis and there is a command that I do not know how to create. Basically my problem is as follows:

    BACKGROUND:
    I have employee-level data, and I need to collapse it into firm-level data, in order to merge it with some firm-level financial variables. My (simplified) data looks like this:
    Employee Gender CompanyID Year NumberOfJobs
    A 1 1 2.000 3
    A 1 1 2.001 2
    B 1 1 2.000 2
    B 1 1 2.001 1
    C 0 2 2.000 1
    C 0 2 2.001 2
    Where Gender is a dummy (0 for male employee, 1 for female employee).
    Please note that every company has a different number of employees.

    The data that I need after collapsing is as follows:
    CompanyID Year MultiemployedFemales
    1 2.000 1
    1 2.001 1
    2 2.000 0
    2 2.001 0
    Basically, I need to collapse while doing the following calculation:
    • Under the new variable MultiemployedFemales (I gave it a random name, it can be any) I need the average number of simultaneous jobs that the female employees have in each company. So, only the average of the female employees, ignoring the male employees.
      • In the first line of the second table, it would be calculated as: 2/2 =1, because the 2 employees at firm 1 are female.
    Currently, I am using the following code to collapse:
    collapse (mean) NumberOfJobs , by(CompanyID, Year)
    list, clean

    But this code gives me the average of all employees in the company, irrespective of their gender.

    Could this be done?

    Thank you in advance!





    Last edited by carla Oth; 30 Apr 2022, 13:34.

  • #2
    Here is the data from post #1 given as dataex output immediately usable in Stata, for the convenience of anyone who wants to address this problem.
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str1 employee byte(gender companyid) int year byte numberofjobs
    "A" 1 1 2000 3
    "A" 1 1 2001 2
    "B" 1 1 2000 2
    "B" 1 1 2001 1
    "C" 0 2 2000 1
    "C" 0 2 2001 2
    end

    Comment


    • #3
      What you say you want in words and your calculations are not consistent with each other. So I'm confused whether you want the average number of jobs held by females in each company in each year, or if you want the proportion of females in each company and year who have more than one job.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str2 employee byte(gender companyid) float year byte numberofjobs
      "A " 1 1 2000 3
      "A " 1 1 2001 2
      "B " 1 1 2000 2
      "B " 1 1 2001 1
      "C " 0 2 2000 1
      "C " 0 2 2001 2
      end
      
      //  IF YOU WANT AVERAGE NUMBER OF JOBS PER EMPLOYED FEMALE
      drop if gender != 1
      collapse (mean)wanted = numberofjobs, by(companyid year)
      
      //  IF YOU WANT THE PROPORTION OF FEMALE EMPLOYEES WITH MULTIPLE JOBS
      gen byte multiple_jobs = (numberofjobs > 1) if !missing(numberofjobs) & gender == 1
      collapse (mean) wanted = multiple_jobs, by(companyid year)
      Note; You can't run both of the "paragraphs" of code I have shown because the first -collapse- eliminates data needed for the second. So it's one or the other. If you actually want both, you can combine them. Post back for additional help if needed.

      In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      Added: Crossed with #2. Thanks!

      Comment


      • #4
        Perhaps the following example using your data will start you in a useful direction.
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str1 employee byte(gender companyid) int year byte numberofjobs
        "A" 1 1 2000 3
        "A" 1 1 2001 2
        "B" 1 1 2000 2
        "B" 1 1 2001 1
        "C" 0 2 2000 1
        "C" 0 2 2001 2
        end
        label define GENDER 0 "male" 1 "female"
        label values gender GENDER
        order companyid year gender employee
        sort companyid year gender employee
        generate multiemployed = numberofjobs>1
        list, abbreviate(16) sepby(companyid year)
        collapse (mean) numberofjobs multiemployed, by(companyid year gender)
        list, abbreviate(16)
        Code:
        . label define GENDER 0 "male" 1 "female"
        
        . label values gender GENDER
        
        . order companyid year gender employee
        
        . sort companyid year gender employee
        
        . generate multiemployed = numberofjobs>1
        
        . list, abbreviate(16) sepby(companyid year)
        
             +---------------------------------------------------------------------+
             | companyid   year   gender   employee   numberofjobs   multiemployed |
             |---------------------------------------------------------------------|
          1. |         1   2000   female          A              3               1 |
          2. |         1   2000   female          B              2               1 |
             |---------------------------------------------------------------------|
          3. |         1   2001   female          A              2               1 |
          4. |         1   2001   female          B              1               0 |
             |---------------------------------------------------------------------|
          5. |         2   2000     male          C              1               0 |
             |---------------------------------------------------------------------|
          6. |         2   2001     male          C              2               1 |
             +---------------------------------------------------------------------+
        
        . collapse (mean) numberofjobs multiemployed, by(companyid year gender)
        
        . list, abbreviate(16)
        
             +----------------------------------------------------------+
             | companyid   year   gender   numberofjobs   multiemployed |
             |----------------------------------------------------------|
          1. |         1   2000   female            2.5               1 |
          2. |         1   2001   female            1.5              .5 |
          3. |         2   2000     male              1               0 |
          4. |         2   2001     male              2               1 |
             +----------------------------------------------------------+

        Comment

        Working...
        X