Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Gen Variable: Percentage Share (Row)

    Dear Statalisters,

    I'd like to calculate the percentage share of "experience" a member has in its project. I have prepared the following example data in dataex:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    * dataex project time exp_member1 exp_member2 exp_member3 exp_member4 exp_member5 exp_member6
    clear
    input str4 project byte(time exp_member1 exp_member2 exp_member3 exp_member4 exp_member5 exp_member6)
    "ava"  3 1 2 3 1 . .
    "ava"  4 2 3 4 2 . .
    "ava"  5 3 4 5 3 . .
    "neta" 1 1 1 . . . .
    "neta" 2 2 2 . . . .
    "beta" 4 4 3 4 2 4 4
    "beta" 5 5 4 5 3 5 5
    "beta" 6 6 5 6 4 6 6
    end
    A team consists of up to 6 team members (most of the times there are less than 6). I'd like to generate 6 variables that indicate the proportions of experience each member has in his/her team. For example for project "Ava" at time=3:
    expprop_member1 = 0.14
    expprop_member2 = 0.29
    expprop_member3 = 0.43
    expprop_member4 = 0.14
    expprop_member5 = .
    expprop_member6 = .

    Is there an easy way to do this? I have a total of approximately 12,900 observations and am using Stata 16.

    Thank you very much for your advice in advance.

    Maja

  • #2
    The data layout seems ill-advised for most Stata purposes. So, you want 6 more variables. What is your strategy for dealing with them? As you are asking this, the implication is that you are not comfortable with loops, which is sooner or later essential technique for what you are trying. Will the next task need six more variables?

    I would reshape long straight away and stay long. If you really need a wide version of this dataset as well, then reshape wide back again.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    * dataex project time exp_member1 exp_member2 exp_member3 exp_member4 exp_member5 exp_member6
    clear
    input str4 project byte(time exp_member1 exp_member2 exp_member3 exp_member4 exp_member5 exp_member6)
    "ava"  3 1 2 3 1 . .
    "ava"  4 2 3 4 2 . .
    "ava"  5 3 4 5 3 . .
    "neta" 1 1 1 . . . .
    "neta" 2 2 2 . . . .
    "beta" 4 4 3 4 2 4 4
    "beta" 5 5 4 5 3 5 5
    "beta" 6 6 5 6 4 6 6
    end
    
    reshape long exp_member, i(project time) j(member)
    
    rename exp_member exp 
    
    drop if missing(exp) 
    
    bysort project time : gen expsum = sum(exp)
    by project time : replace expsum = expsum[_N]
    gen expprop = exp/expsum 
    
    list, sepby(project time)
    
         +---------------------------------------------------+
         | project   time   member   exp   expsum    expprop |
         |---------------------------------------------------|
      1. |     ava      3        1     1        7   .1428571 |
      2. |     ava      3        2     2        7   .2857143 |
      3. |     ava      3        3     3        7   .4285714 |
      4. |     ava      3        4     1        7   .1428571 |
         |---------------------------------------------------|
      5. |     ava      4        1     2       11   .1818182 |
      6. |     ava      4        2     3       11   .2727273 |
      7. |     ava      4        3     4       11   .3636364 |
      8. |     ava      4        4     2       11   .1818182 |
         |---------------------------------------------------|
      9. |     ava      5        1     3       15         .2 |
     10. |     ava      5        2     4       15   .2666667 |
     11. |     ava      5        3     5       15   .3333333 |
     12. |     ava      5        4     3       15         .2 |
         |---------------------------------------------------|
     13. |    beta      4        1     4       21   .1904762 |
     14. |    beta      4        2     3       21   .1428571 |
     15. |    beta      4        3     4       21   .1904762 |
     16. |    beta      4        4     2       21   .0952381 |
     17. |    beta      4        5     4       21   .1904762 |
     18. |    beta      4        6     4       21   .1904762 |
         |---------------------------------------------------|
     19. |    beta      5        1     5       27   .1851852 |
     20. |    beta      5        2     4       27   .1481481 |
     21. |    beta      5        3     5       27   .1851852 |
     22. |    beta      5        4     3       27   .1111111 |
     23. |    beta      5        5     5       27   .1851852 |
     24. |    beta      5        6     5       27   .1851852 |
         |---------------------------------------------------|
     25. |    beta      6        1     6       33   .1818182 |
     26. |    beta      6        2     5       33   .1515152 |
     27. |    beta      6        3     6       33   .1818182 |
     28. |    beta      6        4     4       33   .1212121 |
     29. |    beta      6        5     6       33   .1818182 |
     30. |    beta      6        6     6       33   .1818182 |
         |---------------------------------------------------|
     31. |    neta      1        1     1        2         .5 |
     32. |    neta      1        2     1        2         .5 |
         |---------------------------------------------------|
     33. |    neta      2        1     2        4         .5 |
     34. |    neta      2        2     2        4         .5 |
         +---------------------------------------------------+

    Comment


    • #3
      Hi Nick,

      thank you very much for this solution. You're right, reshaping the dataset makes it far easier and it's about time to get familiar with loops.

      Really appreciate your advice and quick response!

      Comment

      Working...
      X