

  • creating a variable based on status and industry membership

    Dear Statalist,

    I have a categorical variable, status, which takes the value of 1, 2 or 3 for mandatory, voluntary and non-disclosure respectively. I also have another variable, industry, indicating which industry the firm belongs to (from 1 to 13). I want to create a dummy variable which = 1 if status==2 and the firm is in the same industry as other firms with status==3, and 0 otherwise. Or alternatively, I could use a categorical variable which = 1 if status==2 and the firm is in the same industry as other firms with status==3, and = 2 if status==3 and the firm is in the same industry as other firms with status==2. I don't know if this is possible, and if so, how to write the code for them.

    Thanks a lot in advance for any kind help!

  • #2
    Hello Flora. Here is my understanding of your question.


    industry: values 1-13

    You want to do this:
    generate byte flag = status==2 if {some condition is true}
    You said that the condition = "same industry as other firms with status==3".

    If I follow, you want to generate a list of industry codes for industries that have at least one firm with status==3. Is that right?

    If it is right, then perhaps you want something like this?

    generate byte flag = status==2 if inlist(industry,a,b,c,...)
    ...replacing a,b,c, etc with the relevant industry codes.
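
A minimal sketch of how that list could be built rather than typed by hand, assuming the variables are named status and industry as in #1. The local macro name ind3 is made up for illustration. Note that this looks across the whole dataset and ignores any time dimension.

```stata
* Collect the codes of industries with at least one status==3 firm,
* separated by commas so they can be passed straight to inlist()
levelsof industry if status == 3, local(ind3) separate(,)

* flag is 1 for status==2 firms in those industries, 0 for other firms in
* those industries, and missing for firms in all other industries.
* This fails if ind3 is empty, i.e. if no firm has status==3.
generate byte flag = status == 2 if inlist(industry, `ind3')
```

With only 13 industry codes the list stays well within the number of arguments that inlist() accepts.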

    In addition to indicating whether I understood correctly or not, please use -dataex- to provide a small sample dataset. (See the FAQ for details about how to do that.)


    Bruce Weaver
    Version: Stata/MP 18.5 (Windows)


    • #3
      As an aside to Bruce's helpful reply, check whether at least part of your query can be handled by the -group()- function available from -egen-.
      Kind regards,
      (StataNow 18.5)


      • #4
        Hi Bruce and Carlo,

        Thanks a lot for your replies. Here is an example of my data. For simplicity, I only include industries 1-3, status 2 (voluntary) and 3 (non-disclosure), and year 2009. Symbol identifies each individual firm. In my original data, there are 11 years (2009-2019) and 13 industries (1-13).

        I want to create a dummy variable which = 1 if the firm has status==2 and at least one of its industry peers (other firms in the same industry) in the same year has status==3, and 0 otherwise. I realize I do need to specify "at least one" industry peer, otherwise it would be misleading.

        input float Year long(Symbol industry status)
        2009   2286  3 3
        2009   2013  2 3
        2009   2287  1 2
        2009   2020  1 3
        2009   2170  3 3
        2009   2321  1 3
        2009   2172  2 2
        2009   2045  2 3
        2009   2097  3 2
        2009   2114  3 3
        2009   2053  1 2
        2009   2019  1 2
        2009   2067  2 2
        2009   2006  3 2
        end


        • #5
          you may want to try:
          . bysort industry Year: gen wanted=1 if status==2 | status==3
          . replace wanted=0 if status!=2
          . list
               | Year   Symbol   industry   status   wanted |
            1. | 2009     2020          1        3        0 |
            2. | 2009     2287          1        2        1 |
            3. | 2009     2321          1        3        0 |
            4. | 2009     2019          1        2        1 |
            5. | 2009     2053          1        2        1 |
            6. | 2009     2013          2        3        0 |
            7. | 2009     2045          2        3        0 |
            8. | 2009     2172          2        2        1 |
            9. | 2009     2067          2        2        1 |
           10. | 2009     2170          3        3        0 |
           11. | 2009     2006          3        2        1 |
           12. | 2009     2286          3        3        0 |
           13. | 2009     2097          3        2        1 |
           14. | 2009     2114          3        3        0 |
          Kind regards,
          (StataNow 18.5)


          • #6
            Thanks Carlo that works perfectly!


            • #7
              Happy to read that it works for you.
              Due to a copy and paste mishap, the first line of the code was reported twice (now amended). Sorry for that.
              Kind regards,
              (StataNow 18.5)


              • #8
                @Carlo Lazzaro's code can be simplified to

                gen wanted = status==2 
                and that says nothing about the status of any other firms.

                This may get closer to what you want.

                bysort industry Year : egen any3 = max(status == 3)
                gen wanted = status == 2 & any3
                given that (as stated in #4 but not #1) you want comparisons to be for the same industry in the same year. If you wanted to look across years, just don't mention year in the code.

                See also for the "any" technique here.

                This would also work

                gen any3 = status == 3
                bysort industry Year (any3) : replace any3 = any3[_N]
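
The categorical alternative mentioned in #1 could be sketched along the same lines. This assumes the variable names from the data example in #4 (Year, industry, status); has2, has3 and wanted2 are names invented here for illustration.

```stata
* has2/has3: 1 if the industry-year contains at least one firm with that status
bysort industry Year: egen byte has2 = max(status == 2)
bysort industry Year: egen byte has3 = max(status == 3)

* 1 = status 2 firm with at least one status 3 peer,
* 2 = status 3 firm with at least one status 2 peer,
* 0 = everything else
generate byte wanted2 = cond(status == 2 & has3, 1, cond(status == 3 & has2, 2, 0))
```

Because a firm with status 2 cannot itself be the firm with status 3 in its group, a group-level indicator is enough here; there is no need to exclude the firm's own observation.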


                • #9
                  Thanks Nick they both work!


                  • #10
                    If they both work, that is because empirically there is always a firm with status 3 whenever there is one with status 2 in the same industry and year. As said, the code in #5 pays no attention to any other firm.
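
A small made-up example may make the difference concrete. The four observations below are hypothetical, not taken from the data in #4: industry 2 contains only status 2 firms, so it has no status 3 peer. The names wanted5 and wanted8 are chosen here just to label the two approaches.

```stata
clear
input float Year long(Symbol industry status)
2009 1 1 2
2009 2 1 3
2009 3 2 2
2009 4 2 2
end

* What the code in #5 reduces to: it looks only at the firm's own status
generate byte wanted5 = status == 2

* The code in #8: also requires a status 3 firm in the same industry and year
bysort industry Year: egen byte any3 = max(status == 3)
generate byte wanted8 = status == 2 & any3

list, sepby(industry)
```

The two variables agree in industry 1 and disagree for both firms in industry 2, where wanted5 is 1 and wanted8 is 0.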


                    • #11
                      Yes exactly... There are always both status 2 and 3 in the same industry in a year.
                      Last edited by Flora Yin; 12 Feb 2022, 08:54.

