  • Only use selected observations

    Hi everybody,

    I got quite a large dataset (700K observations and around 20 variables), but I'm only interested in a small part of this. I combined firm financial specifics (compustat) and director specifics (ISS) and I'm trying to select only firms that appointed directors in two or more different years.

    I'm a complete beginner with STATA, but I tried identifying firms by CUSIP code and director appointments by the year a director started (directorsince). However, I can't figure out the commands for selecting only this subset of my database. I could be totally wrong, but I think I need a command that does something similar as:

    only keep observations if directorsince is for at least two different years and the same CUSIP code (firm)


    Hopefully I described my problem clearly enough for someone to be able to help me!





  • #2
    Rick - can you create an additional binary variable that indicates what you want? What format does the 'directorsince' variable take? And if a firm appointed directors in two or more different years, would there be multiple years in the same variable, or would each director have their own 'directorsince' value?



    • #3
      Hi Matthew,

      Thanks for replying! The directorsince variable holds years (type: double, format %ty). It is linked to the firm (CUSIP code), so the data look like this:

      cusip (firm ID) ------ directorsince
      123456 ------------ 2007
      123456 ------------ 2009
      123456 ------------ 2011

      This would indicate that the same firm hired directors in several different years. However, my dataset also contains many observations for firms that hired no directors at all, hired in only one year, or hired several directors in the same year.

      Hope this makes it a bit clearer.



      • #4
        Hi Rick - I don't think this is clear. It would help if you showed an actual example of the data, since we don't know what your variable names mean or how your data are structured. Here's an example with a few options that might help:


        Code:
        **create some fake data:
        clear
        set obs 60
        g firm = int(runiform()*5)
        bys firm: g year = 2000+_n
        g directorsince = year if runiform()<.15
        
        
        
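        **option 1: keep firms with at least 2 nonmissing directorsince values
        **(count() counts rows, so this equals "2+ different years" only when
        **each year appears at most once per firm, as in this fake data)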
        
        bys firm (year): egen select = count(directorsince)
        
        keep if select >=2
        
        
        **option 2: keep firms whose earliest and latest directorsince differ
        bys firm (year): egen firstdirector = min(directorsince)
        bys firm (year): egen lastdirector = max(directorsince)
        
        
        keep if firstdirector!=lastdirector & !mi(lastdirector)
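        
        A further option, sketched here on the assumption that several directors can start at the same firm in the same year (as Rick describes in #3), counts distinct appointment years per firm rather than rows. The names tagged and nyears are made up for this illustration:

        Code:
        **option 3 (sketch): count distinct directorsince years per firm
        **tag() marks one row per firm-year pair; rows with missing directorsince get 0
        egen tagged = tag(firm directorsince)
        bys firm: egen nyears = total(tagged)
        
        keep if nyears >= 2
        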
        Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX



        • #5
          Hi Eric,

          Thanks a lot for trying to help me out. I copied the following part of my dataset:
          Code:
           
          cusip    dirsince
          03600T104    2004
          03600T104    2012
          98933Q108    2010
          50060P106    2004
          50060P106    2008
          98933Q108    2016
          247361702    2010
          910047109    2010
          37045V100    2009
          053332102    2002
          584688105    2003
          So from the part of the dataset I copied, I only want to keep the CUSIP codes that have two or more dirsince observations in different years.



          • #6
            Perhaps this will do what you need.
            Code:
            by cusip (dirsince), sort: keep if dirsince[1]!=dirsince[_N]
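
            A minimal sketch for trying this on the sample posted in #5. The input block simply retypes those rows; storing cusip as str9 is an assumption about how the real data are typed:

            Code:
            clear
            input str9 cusip int dirsince
            "03600T104" 2004
            "03600T104" 2012
            "98933Q108" 2010
            "50060P106" 2004
            "50060P106" 2008
            "98933Q108" 2016
            "247361702" 2010
            "910047109" 2010
            "37045V100" 2009
            "053332102" 2002
            "584688105" 2003
            end

            * keep every row of a firm whose smallest and largest dirsince differ
            by cusip (dirsince), sort: keep if dirsince[1]!=dirsince[_N]
            list, sepby(cusip)

            On these rows it should keep six observations: the two each for 03600T104, 50060P106 and 98933Q108.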



            • #7
              Originally posted by William Lisowski
              Perhaps this will do what you need.
              Code:
              by cusip (dirsince), sort: keep if dirsince[1]!=dirsince[_N]
              What exactly does the command do, William? So far it seems to work.



              • #8
                The keys to understanding the command are:
                • the by prefix, with its sort option, sorts the data by cusip and, within cusip, by dirsince, and then runs the command following the colon separately for each value of cusip
                • dirsince[1] is the value of dirsince in the first observation and dirsince[_N] is the value in the last; because of the by prefix, these are the first and last observations for a given value of cusip, so after sorting they hold that firm's smallest and largest dirsince
                • the keep command will thus keep all observations for each cusip that has at least two different values of dirsince
                But you should thoroughly read the output of help by, since it is an important tool in Stata and you will need it repeatedly in your work.
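
                To see what the subscripts pick up within each by-group, here is a small sketch. The names firstyr, lastyr and nobs are made up for illustration, and it assumes dirsince has no missing values, because numeric missing sorts last in Stata and would then be what dirsince[_N] returns:

                Code:
                * sort so that, within each cusip, dirsince runs from smallest to largest
                sort cusip dirsince
                * within each cusip: value in the first row, value in the last row, row count
                by cusip: gen firstyr = dirsince[1]
                by cusip: gen lastyr = dirsince[_N]
                by cusip: gen nobs = _N
                list cusip dirsince firstyr lastyr nobs, sepby(cusip)

                Firms where firstyr and lastyr differ are the ones the keep command retains.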

                I'm sympathetic to you as a new user of Stata - it's a lot to absorb. And it is even worse if you are under pressure to produce some output quickly.

                When I began using Stata in a serious way, I started, as have others here, by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. There are a lot of examples to copy and paste into Stata's do-file editor to run yourself, and better yet, to experiment with changing the options to see how the results change.

                All of these manuals are included as PDFs in the Stata installation (since version 11) and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu. The objective in doing the reading was not so much to master Stata (several years later and I still won't claim mastery) as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and PDF manuals.

                Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.



                • #9
                  Thanks a lot for your in-depth explanation, William!
