  • generate new variable

    Hi Statalist,

    In my airline dataset, I have 2 variables: carrier and origin airport. I want to generate a variable (say X) that equals 1 if there is only one carrier at an airport. It should also equal 1 if an airport has more than one carrier but more than 90% of that airport's observations come from a single carrier; in that case, X should be 1 only for that carrier's observations.
    I would appreciate any insights.

    The following table is just an example. I want to know how to code this.
    Thanks,
    carrier origin
    2 2
    4 2
    2 2
    4 2
    4 2
    4 2
    3 2
    2 2
    2 2
    3 2
    2 2
    3 2
    1 2
    4 2
    3 3
    2 5
    2 5
    2 5
    2 6
    2 6
    2 6
    2 6
    2 6
    2 6
    2 6
    2 6
    2 6
    2 6
    2 6
    3 7
    4 7
    3 7
    3 7
    3 7
    3 7
    3 7
    3 7
    3 7
    3 7
    3 7
    2 9
    2 9
    2 9
    2 9
    2 9
    2 9
    2 9
    2 9
    2 9
    2 9
    2 9
    2 9
    2 9
    2 9
    2 9
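Statalist usually asks for example data to be posted with dataex. As a sketch of an equivalent, more compact way to enter the example above, each carrier and origin pair can be typed once together with a hypothetical helper variable count holding its number of observations, and expand then rebuilds one row per observation. The row order differs from the table, which does not matter for the share calculations in this thread.

```stata
clear
* one row per carrier and origin pair; count = number of observations in the example
input carrier origin count
1 2 1
2 2 5
3 2 3
4 2 5
3 3 1
2 5 3
2 6 11
3 7 10
4 7 1
2 9 15
end
* turn each row into count identical observations, then drop the helper
expand count
drop count
sort origin carrier
```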

  • #2
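    Code:
    * number of observations for each carrier at each origin airport
    bys origin carrier: gen n = _N
    * total number of observations at each origin airport
    bys origin: gen n_total = _N
    * the carrier's share of the airport's observations
    gen share = n / n_total
    * 1 if the carrier has more than 90% of the airport's observations (a sole carrier has share 1)
    gen wanted = share > 0.9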
    Code:
    . list carrier origin wanted
    
         +---------------------------+
         | carrier   origin   wanted |
         |---------------------------|
      1. |       1        2        0 |
      2. |       2        2        0 |
      3. |       2        2        0 |
      4. |       2        2        0 |
      5. |       2        2        0 |
         |---------------------------|
      6. |       2        2        0 |
      7. |       3        2        0 |
      8. |       3        2        0 |
      9. |       3        2        0 |
     10. |       4        2        0 |
         |---------------------------|
     11. |       4        2        0 |
     12. |       4        2        0 |
     13. |       4        2        0 |
     14. |       4        2        0 |
     15. |       3        3        1 |
         |---------------------------|
     16. |       2        5        1 |
     17. |       2        5        1 |
     18. |       2        5        1 |
     19. |       2        6        1 |
     20. |       2        6        1 |
         |---------------------------|
     21. |       2        6        1 |
     22. |       2        6        1 |
     23. |       2        6        1 |
     24. |       2        6        1 |
     25. |       2        6        1 |
         |---------------------------|
     26. |       2        6        1 |
     27. |       2        6        1 |
     28. |       2        6        1 |
     29. |       2        6        1 |
     30. |       3        7        1 |
         |---------------------------|
     31. |       3        7        1 |
     32. |       3        7        1 |
     33. |       3        7        1 |
     34. |       3        7        1 |
     35. |       3        7        1 |
         |---------------------------|
     36. |       3        7        1 |
     37. |       3        7        1 |
     38. |       3        7        1 |
     39. |       3        7        1 |
     40. |       4        7        0 |
         |---------------------------|
     41. |       2        9        1 |
     42. |       2        9        1 |
     43. |       2        9        1 |
     44. |       2        9        1 |
     45. |       2        9        1 |
         |---------------------------|
     46. |       2        9        1 |
     47. |       2        9        1 |
     48. |       2        9        1 |
     49. |       2        9        1 |
     50. |       2        9        1 |
         |---------------------------|
     51. |       2        9        1 |
     52. |       2        9        1 |
     53. |       2        9        1 |
     54. |       2        9        1 |
     55. |       2        9        1 |
         +---------------------------+
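The two conditions in the question reduce to the single test share > 0.9 because a sole carrier always has a share of exactly 1. A sketch that spells out both conditions separately, assuming the example data are entered compactly as counts (count, first and n_carriers are hypothetical helper names, and share is stored as double here rather than the default float):

```stata
clear
input carrier origin count
1 2 1
2 2 5
3 2 3
4 2 5
3 3 1
2 5 3
2 6 11
3 7 10
4 7 1
2 9 15
end
expand count
drop count

* carrier's share of each origin airport's observations
bysort origin carrier: gen n = _N
bysort origin: gen n_total = _N
gen double share = n / n_total

* number of distinct carriers per origin: tag one observation per carrier, then total the tags
egen byte first = tag(origin carrier)
bysort origin: egen n_carriers = total(first)

* condition 1: a sole carrier; condition 2: a carrier with more than 90% of the airport's observations
gen byte X = (n_carriers == 1) | (share > 0.9)
```

Because condition 1 implies condition 2, X should agree with wanted from the code above; keeping the conditions apart mainly documents the logic and leaves n_carriers available for inspection.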

    • #3
      Thank you very much, Fei! You helped a lot.

      I want to generate another variable (say Y) which takes 1 if the aggregate share of any 2 carriers (=sum of their shares) is greater than 90%. Is this doable?

      • #4
        Originally posted by Saber Feizy
        Thank you very much, Fei! You helped a lot.

        I want to generate another variable (say Y) which takes 1 if the aggregate share of any 2 carriers (=sum of their shares) is greater than 90%. Is this doable?

        Your request is not quite clear to me. For example, there are three carriers in an origin, and shares are 0.05, 0.07, and 0.88. Which carriers would be assigned 1?

        Maybe a more reasonable operation is to check if the largest two carriers sum up to a share greater than 90%?
        Last edited by Fei Wang; 13 Dec 2021, 00:13.

        • #5
          Originally posted by Fei Wang

          Your request is not quite clear to me. For example, there are three carriers in an origin, and shares are 0.05, 0.07, and 0.88. Which carriers would be assigned 1?

          Maybe a more reasonable operation is to check if the largest two carriers sum up to a share greater than 90%?

          Yes, that's right. In that case it would be the sum of the two leading carriers, the 0.88 and the 0.07 ones.
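A small sketch of the hypothetical three-carrier origin from #4 makes the difference between the two readings concrete. Under the top-two rule, 0.88 + 0.07 = 0.95 > 0.9, so only those two carriers qualify; under the literal "any 2 carriers" reading, 0.88 + 0.05 = 0.93 also exceeds 0.9, so every carrier would belong to some qualifying pair. The variable top2 is a hypothetical name, and ties are ignored here.

```stata
clear
* hypothetical origin from #4: three carriers with shares 0.05, 0.07 and 0.88
input carrier share
1 0.05
2 0.07
3 0.88
end

* sort from largest to smallest share
gsort -share
* top-two rule: flag the first two rows if their shares sum to more than 0.9
gen byte top2 = _n <= 2 & share[1] + share[2] > 0.9
list carrier share top2
```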

          • #6
            If several carriers tie for the second-largest share, all of them are assigned 1.

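            Code:
            * carrier shares within each origin airport, as in #2
            bys origin carrier: gen n = _N
            bys origin: gen n_total = _N
            gen share = n / n_total

            * number carriers 1, 2, ... within each origin in order of descending share
            gsort origin -share carrier
            by origin: gen group = sum(carrier!=carrier[_n-1])

            * largest and second-largest shares, copied to every observation of the origin
            gen max1 = share if group == 1
            gen max2 = share if group == 2
            bys origin (max1): replace max1 = max1[1] if mi(max1)
            bys origin (max2): replace max2 = max2[1] if mi(max2)

            * 1 for the top two carriers if their combined share exceeds 0.9; missing otherwise
            * (with a single carrier, max2 is missing, and missing counts as larger than 0.9)
            gen wanted = 1 if max1 + max2 > 0.9 & inlist(share, float(max1), float(max2))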
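A variant of the code above, sketched under a few assumptions: the ranking step sorts on a hypothetical negated-share helper (negshare) inside bysort instead of using gsort, shares are stored as double so inlist() can compare them directly without float(), and the result (hypothetically named Y) is coded 0 rather than missing for observations that do not qualify.

```stata
clear
input carrier origin count
1 2 1
2 2 5
3 2 3
4 2 5
3 3 1
2 5 3
2 6 11
3 7 10
4 7 1
2 9 15
end
expand count
drop count

* carrier shares within each origin airport, stored as double
bysort origin carrier: gen n = _N
bysort origin: gen n_total = _N
gen double share = n / n_total

* number carriers by descending share within origin; negshare is a hypothetical helper
gen double negshare = -share
bysort origin (negshare carrier): gen group = sum(carrier != carrier[_n-1])

* largest and second-largest shares, copied to every observation of the origin
gen double max1 = share if group == 1
gen double max2 = share if group == 2
bysort origin (max1): replace max1 = max1[1] if mi(max1)
bysort origin (max2): replace max2 = max2[1] if mi(max2)

* Y = 1 for the top two carriers when their combined share exceeds 0.9, else 0
* (single-carrier origins: max2 is missing, so the sum is missing, which counts as > 0.9)
gen byte Y = max1 + max2 > 0.9 & inlist(share, max1, max2)
```

On the example data, the two leading carriers at origin 2 hold 5/14 each (about 0.71 combined), so Y is 0 there; every other origin has either a single carrier or, at origin 7, two carriers covering all observations.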

            • #7
              As a footnote to @Fei Wang's helpful answer, note that

              Code:
              gen max1 = share if group == 1
              gen max2 = share if group == 2
              bys origin (max1): replace max1 = max1[1] if mi(max1)
              bys origin (max2): replace max2 = max2[1] if mi(max2)
              could also be written as


              Code:
               
              egen max1 = max(cond(group == 1, share, .)), by(origin) 
              egen max2 = max(cond(group == 2, share, .)), by(origin)
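To check the equivalence on the example data, a sketch that computes the maxima both ways and compares them. The _e suffix marks hypothetical egen copies, and the ranking step again sorts on a hypothetical negated share with bysort rather than gsort.

```stata
clear
input carrier origin count
1 2 1
2 2 5
3 2 3
4 2 5
3 3 1
2 5 3
2 6 11
3 7 10
4 7 1
2 9 15
end
expand count
drop count

bysort origin carrier: gen n = _N
bysort origin: gen n_total = _N
gen share = n / n_total
gen double negshare = -share
bysort origin (negshare carrier): gen group = sum(carrier != carrier[_n-1])

* #6: fill in by sorting the missing values to the end of each origin
gen max1 = share if group == 1
gen max2 = share if group == 2
bysort origin (max1): replace max1 = max1[1] if mi(max1)
bysort origin (max2): replace max2 = max2[1] if mi(max2)

* #7: the same quantities with egen
egen max1_e = max(cond(group == 1, share, .)), by(origin)
egen max2_e = max(cond(group == 2, share, .)), by(origin)
```

assert stops with an error if any observation differs. Stata treats two missing values as equal, so single-carrier origins, where the second-largest share is missing in both versions, also pass.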
