Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Number of C-level executives at a company

    Dear community,
    For an analysis I need to count the number of executives in a dataset with a couple hundred companies.
    I have given four columns: A company ID number; the name of an executive at the company; the entry date and in the fourth column the day they left. What I would need is the number of executives each company is having during each year within the last 10 years but I'm a bit stuck.

    Help or Ideas would be very much welcomed :-)
    (Using Stata 17 on the Mac)

  • #2
    Very often a simple description really isn't clear without more detail, or at a minimum it is too difficult to guess at a good answer from what has been shared. When that happens other members will decide not to answer the question, or ask for an improved presentation that could have been provided to begin with.

    Please help us help you. Show example data. If you have run a command that didn't work, show your code and show us what Stata told you. The Statalist FAQ provides advice on effectively posing your questions, posting data, and sharing Stata output.

    You will find that the time spend preparing a well-formed question will often pay off in a quicker answer that requires less followup.

    Comment


    • #3
      Thank you a lot for your quick response, I tried to come up with a first idea in the meantime to have some kind of code to display in order to make my problem more tangible.
      I now reduced my analysis to just figuring out if at specific reporting dates more than 1 executives have been listed for the same company.

      the code I tried is based on the four columns I startet with (manager_name, CompanyID, beginn, end)

      Code:
      sort CompanyId beginn
      gen number_managers = 1
      gen date1 = "01-01-06"
      generate report_date1 = date(date1, "MD20Y")
      format report_date1 %td
      
      forvalues i = 2/25971 {
        if (CompanyId[`i'] == CompanyId[`i'-1]) & (begin[`i'] < end[`i'-1]) & (begin[`i'] <= report_date1) & (report_date1 < end[`i']) {
          replace number_managers = number_managers[`i'-1] + 1
        }
        if n_managers > 1 {
          display "not_solo"
        }
      }
      However, when I try to run I get invalid syntax
      r(198);

      What do I need to adjust - or what different code would I try in order to get the information if the company was run by one or more than one executive at a given date?

      Thanks for the help!

      Comment


      • #4
        Are "begin" and "beginn" supposed to be the same variable?

        Are "n_managers" and "number_managers" supposed to be the same?

        Note that if you add:
        Code:
        set trace on
        set tracedepth 1
        Stata will show the exact line within a forvalues loop that is generating the error.

        Comment


        • #5
          Thank you for the reply!
          Yes, 'begin' and 'beginn' as well as 'n_managers' and 'number_managers' are supposed to be the the same, sorry for that! (I was typing the code again instead of copy-pasting it)

          I just added the code you suggested and retried again and this time there was no invalid syntax error. However now the number of manager in the column is 7568 everywhere now... Any Idea if there is a workaround to have the number of managers variable working correctly?

          using
          Code:
          replace n_managers[`i'] == n_managers[`i'-1] + 1
          results in an weights not allowed error

          Comment


          • #6
            Code:
            replace n_managers == n_managers[`i'-1] + 1 in `i'
            because subscripts are not allowed on the left side of the equal sign in generate and replace.

            But really, the loop is not necessary. This loop
            Code:
            sort CompanyId begin
            forvalues i = 2/25971 {
              if (CompanyId[`i'] == CompanyId[`i'-1]) & (begin[`i'] < end[`i'-1]) & (begin[`i'] <= report_date1) & (report_date1 < end[`i']) {
                replace number_managers = number_managers[`i'-1] + 1
              }
            }
            can be written as
            Code:
            sort CompanyId begin
            replace number_managers = number_managers[_n-1] + 1 if (CompanyId == CompanyId[_n-1]) & (begin < end[_n-1]) & (begin <= report_date1) & (report_date1 < end)
            or even
            Code:
            bysort CompanyId (begin): replace number_managers = number_managers[_n-1] + 1 if (_n>1) & (begin < end[_n-1]) & (begin <= report_date1) & (report_date1 < end)

            Comment


            • #7
              William Lisowski has made the main point and this post is just to add a detail.

              Note that not only is no loop needed here, but also

              Code:
               if number_managers > 1 {     display "not_solo"   }
              would not do what you want. It's not obvious without experiment but this would get interpreted as referring to number_managers[1] so you need a subscript too. The loop machinery does not guarantee that Stata will look at the current observation. Beyond that, even when fixed you would just get a series of messages
              Code:
              not solo not solo  not solo
              and so on, so it best to take the command out of the loop and just use other commands to look at results afterwards.

              Comment


              • #8
                Thank you both for your help!

                Both of the codes in #6 are running without any errors.
                The only remaining problem, as already anticipated by Nick Cox, is that to distinguish between companies that are managed by a single person and companies that have several executive managers employed at the same time, I would need a extra variable that eventually provides a reliable indication whether there was more than one or exactly one executive working at the company on that date.

                Is it possible to tell Stata to show me of each CompanyID only the rows with the max number_managers because currently, Stata assigns both, the first manager of a group and the single manager the value 1 in the number_managers column, so I can't make a final distinction "if n_manager >= 2 then company I is managed by a group of executives"

                Is there a way to add something that eventually gives a reliable statement 'there was more than one (exactly one) executive working at the company on that date'?

                Comment


                • #9
                  I imagine what you want is quite straightforward to code but it's hard without a concrete data example. Please see especially https://www.statalist.org/forums/help#stata in the FAQ Advice. We just need to see a few companies and the variables mentioned in #1. If the data are confidential a realistic fake will serve.

                  See also https://www.stata-journal.com/articl...article=dm0068

                  Comment


                  • #10
                    You have yet to respond to the request in post #2 above.

                    Please take a few moments to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. It is particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ.

                    Descriptions of data are well-meant but insufficient to help those who want to help you. Even the best descriptions of data are no substitute for a well chosen example example of the data. In order to get a helpful response, you need to show some example data.

                    Be sure to use the dataex command to do this. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, dataex is already part of your official Stata installation. If not, run ssc install dataex to get it. Either way, run help dataex and read the simple instructions for using it. dataex will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

                    When asking for help with code, always show example data. When showing example data, always use the dataex command.

                    Comment


                    • #11
                      Thanks again for the quick and helpful response.


                      I read through the Stata FAQs in more detail and hope that the following data excerpt will help to get an Idea of the dataset I am dealing with.

                      Please note that this is the 'unadjusted' data, before I startet working on it so here even the dates are not yet in Stata format and the report_date and number_managers variables are not added and unnecessary variables like country and education are still included.


                      Code:
                      * Example generated by -dataex-. For more info, type help dataex
                      clear
                      input str19 executive_name str10(begin end) str8 companyid str3 country float education
                      "Herbert E. Ehlers"   "1993-02-01" "1995-01-01" "VKN005FM" "UK" 3
                      "Gerald Levine"       "1986-07-25" "1995-01-01" "VKN0059I" "UK" 3
                      "Victor J. Melone"    "1993-07-01" "1995-01-01" "VKN0057T" "UK" 3
                      "Robert S. Salomon"   "1987-09-01" "1995-01-01" "VKN0001S" "UK" 3
                      "Donald A. Yacktman"  "1993-02-01" "1995-01-01" "VKN0000N" "UK" 3
                      "Andrew R. Midler"    "1991-12-01" "1995-01-01" "VKN001X8" "UK" 3
                      "Byron B. Snider"     "1989-08-01" "1995-01-01" "VKN0019H" "UK" 3
                      "Michael R. Peers"    "1993-10-08" "1995-01-01" "VKN002BS" "UK" 3
                      "James J. LaTorre"    "1987-10-21" "1995-01-01" "VKN000SA" "UK" 3
                      "Gordon Fines"        "1987-10-21" "1995-01-01" "VKN0029W" "UK" 3
                      "James Pollock"       "1994-01-01" "1995-01-01" "VKN001AT" "UK" 3
                      "Jim Hering"          "1993-10-01" "1995-01-01" "VKN001AT" "UK" 3
                      "Andrew R. Midler"    "1994-01-03" "1995-01-01" "VKN005NM" "UK" 3
                      "William Trimbur"     "1993-06-30" "1995-01-01" "VKN0000V" "UK" 3
                      "Carol R. Miller"     "1994-01-01" "1995-01-01" "VKN0000T" "UK" 3
                      "Robert S. Salomon"   "1994-06-14" "1995-01-01" "VKN005F8" "UK" 3
                      "Linda Greenburg"     "1990-05-01" "1994-01-01" "VKN06UKA" "UK" 3
                      "Martin A. Weisberg"  "1985-01-01" "1994-01-01" "VKN001ZO" "UK" 3
                      "Harvey M. Salkin"    "1988-10-31" "1994-02-01" "VKN001DV" "UK" 3
                      "Otto G. Hinzmann"    "1991-11-01" "1994-02-10" "VKN001DV" "UK" 3
                      "Jo Ann Lyndon"       "1983-01-01" "1994-03-01" "VKN006C3" "UK" 3
                      "Goli Hohenberg"      "1990-08-21" "1994-03-31" "VKN006C2" "UK" 3
                      "James S. McClure"    "1991-04-02" "1994-04-01" "VKN006BD" "UK" 3
                      "Sara H. Brandaleone" "1961-09-28" "1994-04-01" "VKN0063S" "UK" 3
                      "Lawrence Luchini"    "1989-08-01" "1994-04-01" "VKN005ZK" "UK" 3
                      "Edwin W. Bragdon"    "1990-01-12" "1994-04-01" "VKN005Y0" "UK" 3
                      "A. Walker Martin"    "1991-12-01" "1994-04-01" "VKN005RR" "UK" 3
                      "Robert K. Jermain"   "1991-12-01" "1994-04-01" "VKN005NM" "UK" 3
                      "Kelly McDermott"     "1990-01-01" "1994-04-29" "VKN005K7" "UK" 3
                      "Varilyn K. Schock"   "1987-04-20" "1994-05-01" "VKN005K7" "UK" 3
                      "Ben A. Hock"         "1989-11-01" "1994-05-01" "VKN005JU" "UK" 3
                      "C. Beth Cotner"      "1991-10-01" "1994-05-01" "VKN005JU" "UK" 3
                      "Bill McKee"          "1991-10-01" "1994-05-01" "VKN005JR" "UK" 3
                      end

                      (The original Dataset has ~50.000 observations)

                      Comment


                      • #12
                        Sorry to join the thread at this late stage, but now that example data has been made available, I see the question posed in #1 as ambiguous in a few ways.
                        1. When you refer to "the number of executives each company is having" in a given year, do you mean the number of people who are executives at some point during the year? Or do you mean the maximum number of people who were executives on any given date in that year. So, for example, if Mr. A, Ms. B, and Ms. C are executives at Firm X from January 1 through September 30, and then Ms. C leaves and is replaced by Mr. D, then there are four different people who have been executives that year, but there were never more than three in place at any time. So is it four or three you want?
                        2. If an executive leaves a firm at some point during the year and then returns later in that year, do you want to count that executive twice or only once?
                        3. I notice that many of the end dates are on January 1 of year, so that person is in the firm for only one day of the year. And especially since most people are not particularly productive on their final day at any job, I wonder if you want to count them in that year or ignore them.
                        4. Similarly (and this may be irrelevant depending on your answer to 1), if Ms. C ends on October 1 and Mr. D starts on October 1, do you want to count them both as being present concurrently?
                        5. Does "the last ten years" mean the current year and the nine preceding ones, or does it mean starting from the preceding year and going back to ten years earlier?
                        Finally, a concern about the data. Are you absolutely certain that the names of the executives are used consistently throughout. For example, you have a "Byron B. Snyder." Are you certain that there is no other observation in the data that refers to this same person as "Byron Snyder" or "B. Snyder" or has some misspelled version of his name?

                        Comment


                        • #13
                          Thank you for pointing out the lack of clarity.
                          I adjusted the depth of the analysis a bit so that now, for the sake of less complexity, it would be sufficient to see if at a specific date (e.g. 12-31-1994) one or more than one executive was running the company.
                          (I would check for robustness by testing if results are sufficiently similar if I take another reporting date during that year (e.g. 06-30-1994))

                          1. So "the number of executives each company is having" just need to reflect the number of executives at a specific date. (max. three in your example as there were never more than three at once)
                          2. In this case the executive should only be counted once.
                          3. I would refrain from setting the reporting date on the 1st of January, but generally speaking if they are only active one day of the year, they should be ignored.
                          4. If the specific date would be the day where the change is happening it should not be counted as two managers (so start of manager C < end of manager B, instead of '<=' )
                          5. The 'last ten years' was formulated quite misleading. It is just a ~10 year period I want to analyse so the first reporting date would be in 1994 and the last in 2005.

                          The names should be consistent but it is possible that there is more that one person called Byron Snyder so it can not serve as a unique identifier. However, I like to get the number of executives at one company in a specific year and the company IDs are unique identifiers.

                          I hope it this explanation helps.

                          Comment


                          • #14
                            The link at the end of #9 is important, I venture to suggest. I don't think these problems are at all easy with your present data structure.

                            Comment


                            • #15
                              I'm not sure this does everything you want, but it should be a good starting point.
                              Code:
                              list if companyid=="VKN005NM", noobs abbreviate(12)
                              generate seq = _n, before(executive_name)
                              generate time1 = daily(begin,"YMD"), after(begin)
                              generate time2 = daily(end,"YMD"), after(end)
                              format %td time1 time2
                              drop begin end
                              reshape long time, i(seq) j(value)
                              replace value = -1 if value==2
                              order value, last
                              drop seq
                              bysort companyid (time value): generate c = sum(value)
                              bysort companyid time: egen count = min(c)
                              drop c
                              list if companyid=="VKN005NM", noobs abbreviate(12)
                              // create yearly  summary
                              generate year = yofd(time)
                              encode companyid, gen(cid)
                              collapse (max) count, by(cid year)
                              xtset cid year
                              tsfill 
                              bysort cid (year): replace count = L.count if missing(count)
                              list if cid=="VKN005NM":cid, noobs abbreviate(12)
                              Code:
                              . list if companyid=="VKN005NM", noobs abbreviate(12)
                              
                                +-------------------------------------------------------------------------------+
                                |    executive_name        begin          end   companyid   country   education |
                                |-------------------------------------------------------------------------------|
                                |  Andrew R. Midler   1994-01-03   1995-01-01    VKN005NM        UK           3 |
                                | Robert K. Jermain   1991-12-01   1994-04-01    VKN005NM        UK           3 |
                                +-------------------------------------------------------------------------------+
                              Code:
                              . list if companyid=="VKN005NM", noobs abbreviate(12)
                              
                                +---------------------------------------------------------------------------------+
                                |    executive_name        time   companyid   country   education   value   count |
                                |---------------------------------------------------------------------------------|
                                | Robert K. Jermain   01dec1991    VKN005NM        UK           3       1       1 |
                                |  Andrew R. Midler   03jan1994    VKN005NM        UK           3       1       2 |
                                | Robert K. Jermain   01apr1994    VKN005NM        UK           3      -1       1 |
                                |  Andrew R. Midler   01jan1995    VKN005NM        UK           3      -1       0 |
                                +---------------------------------------------------------------------------------+
                              Code:
                              . list if cid=="VKN005NM":cid, noobs abbreviate(12)
                              
                                +-------------------------+
                                | year        cid   count |
                                |-------------------------|
                                | 1991   VKN005NM       1 |
                                | 1992   VKN005NM       1 |
                                | 1993   VKN005NM       1 |
                                | 1994   VKN005NM       2 |
                                | 1995   VKN005NM       0 |
                                +-------------------------+

                              Comment

                              Working...
                              X