Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • City Industry leader

    Hello,
    I am trying to create the following dummy variable:

    dummy variable that takes the value of 1 (and 0 otherwise) if an audit firm is the number one auditor in terms of aggregated audit fees in a specific year (a minimum of 2 observations in required.

    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str8 var1 str10 var2 str11 var3 str4 var4 str9 var5
    "city" "Audit firm" "client code" "year" "audit fee"
    "Antwerp" "EY" "A" "2020" "1000"
    "Antwerp" "EY" "B" "2020" "200"
    "Brussels" "EY" "C" "2019" "500"
    "Paris" "Deloitte" "D" "2013" "2000"
    "Paris" "Deloitte" "E" "2013" "1000"
    "Vienna" "Deloitte" "F" "2014" "100"
    "Vienna" "Deloitte" "G" "2014" "75"
    end
    [/CODE]
    ------------------ copy up to and including the previous line ------------------

    Listed 8 out of 8 observations


    Expected Outcome:
    city Audit firm client code year audit fee Industry leader
    Antwerp EY A 2020 1000 1
    Antwerp EY B 2020 200 0
    Brussels EY C 2019 500 0
    Paris Deloitte D 2013 2000 1
    Paris Deloitte E 2013 1000 0
    Vienna Deloitte F 2014 100 1
    Vienna Deloitte G 2014 75 0


    Thank you in advance!







  • #2
    I had to work a bit on the label of the variables because your dataex treated first row with variables labels.

    Anyway:
    Code:
    destring year, replace
    destring audit_fee, replace
    egen rank_fee = rank(audit_fee), by(year)
    gen dummy_fee=1 if rank_fee==1
    replace dummy_fee=0 if dummy_fee==.
    Here you have two variables, the dummy that you requested and also the rank, by year.
    When it comes to cases like your one the function -rank- is really helpful and I use it before to create dummies

    Comment


    • #3
      Your code can be trimmed down:

      Code:
      destring year audit_fee, replace
      egen rank_fee = rank(audit_fee), by(year)
      gen dummy_fee = rank_fee==1
      The help for destring shows that you can specify a varlist.

      Most explanations of how to generate indicator variables underline that true-or-false statements evaluate directly to 1 or 0, for example

      https://www.stata.com/support/faqs/d...mmy-variables/

      https://www.stata-journal.com/articl...article=dm0099

      https://www.stata.com/support/faqs/d...rue-and-false/



      Comment


      • #4
        Thanks, Nick Cox !

        Comment


        • #5
          Thank you @Marco Errico and @Nick Cox, Thats very useful

          In case I want to add more details to the dataset with the type of industry and country, how will this code change?



          Reminder that the dummy variable that takes the value of 1 (and 0 otherwise) if an audit firm is the number one auditor in a spcific year and for a specific industry in terms of aggregated audit fees in a specific year (a minimum of 2 observations in required.



          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str8 city str10 Audit_firm str11 client_code str4 year str9 audit_fee str7 Country str13 Industry
          "Antwerp"  "EY"       "A" "2020" "1000" "Belgium" "Services"    
          "Antwerp"  "EY"       "B" "2020" "200"  "Belgium" "Services"    
          "Brussels" "EY"       "C" "2019" "500"  "Belgium" "Banking"      
          "Paris"    "Deloitte" "D" "2013" "2000" "France"  "Manufacturing"
          "Paris"    "Deloitte" "E" "2013" "1000" "France"  "Manufacturing"
          "Vienna"   "Deloitte" "F" "2014" "100"  "Austria" "Banking"      
          "Vienna"   "Deloitte" "G" "2014" "75"   "Austria" "Banking"      
          end
          ------------------ copy up to and including the previous line ------------------

          Listed 7 out of 7 observations




          Expected Outcome:
          city Audit_firm client_code year audit_fee Country Industry rank Industry leader dummy variable
          Antwerp EY A 2020 1000 Belgium Services 1 1
          Antwerp EY B 2020 200 Belgium Services 2 0
          Brussels EY C 2019 500 Belgium Banking 1 0
          Paris Deloitte D 2013 2000 France Manufacturing 1 1
          Paris Deloitte E 2013 1000 France Manufacturing 2 0
          Vienna Deloitte F 2014 100 Austria Banking 1 1
          Vienna Deloitte G 2014 75 Austria Banking 2 0
          Thanking you in advance
          Last edited by Maysam Ayoub; 25 Dec 2021, 12:02.

          Comment


          • #6
            Dear Mays Ay
            Sorry for getting back late.
            Regarding your question is not clear whether you want a rank in audit_fee by combination of industry and year. Therefore, I created some code with unique combination of country and indutry, by using the -group- function of -egen- command. See below
            Code:
            egen country_industry=group(Country Industry)
            egen count=count(country_industry), by(country_industry)
            drop if count<2
            drop count
            destring year, replace
            destring audit_fee, replace
            egen rank_fee = rank(audit_fee), by (year)
            gen dummy_fee = rank_fee==1
            One thing that I have to clarify is the following. It seems that you want at least 2 observations in the combination country-industry in order to be considered for the rank.
            Here I'm tackling the issue simply dropping those country-industry that don't have the requirement you specified.
            But I don't know whether is irrelevant to you to drop those observation

            Comment


            • #7
              Dear Marco Errico
              Thank you so much!
              It works perfectly now, i just did few changes. I made sure not to drop those observations with less than two observations
              .
              I also learnt to add a minus for the rank so that its in descending order.


              egen city_industry_year=group(city Industry year)
              egen count=count(city_industry_year), by(city_industry_year)
              destring year, replace
              destring audit_fee, replace
              egen rank_fee = rank(-audit_fee), by (year)
              gen dummy_fee = rank_fee==1
              gen industry_leader = dummy_fee ==1 & count>=2








              Comment

              Working...
              X