  • Bachelor's thesis help :)

    Hi everyone!

    I'm currently writing my bachelor's thesis and could do with some help. I have a dataset of companies with Tobin's q for three years (2007, 2008 and 2009). The company identifier is gvkey. I'd like to drop all observations of companies that have Tobin's q for only one or two of the years rather than all three. Also, some companies have three values, but all in the same year, e.g. 2007. I would like to drop these companies from the dataset as well.

    Essentially, I want to keep only the companies that have Tobin's q for each of 2007, 2008 and 2009. I think I need a for loop, but that's about where my knowledge ends. Does anybody know what I could do?
    Last edited by Mary Poppins; 19 May 2023, 04:21.

  • #2
    Whether you are writing a Bachelor's thesis or are a very distinguished researcher or someone in between doesn't really define your question! Please give an informative title to your posts, such as "Cleaning panel data for distinct years and non-missing values".

    Your first rule is presumably that a value for q must be present and not missing. Now you need to find three such observations for three distinct years for each company.

    Here's how it works with fake data, absent a data example from you. I have no idea what Tobin's q is beyond knowing that the man deserves his T. Here panel 1 is good (3 values for 3 years), but panels 2, 3 and 4 are not. Panel 2 has repeated years, panel 3 has a missing value, and panel 4 has only 2 years present.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(gvkey year q)
    1 2007 42
    1 2008 42
    1 2009 42
    2 2007 42
    2 2007 42
    2 2008 42
    3 2007 42
    3 2008 42
    3 2009  .
    4 2007 42
    4 2008 42
    end
    
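    * tag one observation per company-year, counting only observations with non-missing q
    egen tag = tag(gvkey year) if !missing(q)
    
    * count the tagged (distinct, non-missing) years within each company
    bysort gvkey: egen n_good = total(tag)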
    
    list, sepby(gvkey)
    
         +----------------------------------+
         | gvkey   year    q   tag   n_good |
         |----------------------------------|
      1. |     1   2007   42     1        3 |
      2. |     1   2008   42     1        3 |
      3. |     1   2009   42     1        3 |
         |----------------------------------|
      4. |     2   2007   42     1        2 |
      5. |     2   2007   42     0        2 |
      6. |     2   2008   42     1        2 |
         |----------------------------------|
      7. |     3   2007   42     1        2 |
      8. |     3   2008   42     1        2 |
      9. |     3   2009    .     0        2 |
         |----------------------------------|
     10. |     4   2007   42     1        2 |
     11. |     4   2008   42     1        2 |
         +----------------------------------+
    So,

    Code:
    keep if n_good == 3
    keeps only the good companies. That selects companies as well as observations automatically, because the count of suitable observations was made separately for each company, so every observation of a company carries the same value of n_good.
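
If the real data could ever include years other than 2007, 2008 and 2009 (the question suggests they do not), a count of 3 could be reached with the wrong years. Here is a minimal sketch of a more defensive variant, assuming the same variable names as the example: it restricts the tag to those three years and then checks the outcome on the fake data with -assert-.

```stata
* Re-create the fake example data
clear
input float(gvkey year q)
1 2007 42
1 2008 42
1 2009 42
2 2007 42
2 2007 42
2 2008 42
3 2007 42
3 2008 42
3 2009  .
4 2007 42
4 2008 42
end

* tag one observation per company-year, counting only non-missing q in 2007-2009
egen tag = tag(gvkey year) if !missing(q) & inrange(year, 2007, 2009)

* number of distinct good years per company
bysort gvkey: egen n_good = total(tag)

keep if n_good == 3

* on these data only company 1 should survive, with its three observations
assert gvkey == 1
assert _N == 3
```

Observations excluded by the -if- condition get tag 0, as observation 9 does in the listing above, so they never add to n_good.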

    The problem could be solved with a for loop, but I would rather not do it that way. I really wouldn't expect an undergraduate to know that solution unless they had found it in a post. If you want more on the principles, see p.563 at https://journals.sagepub.com/doi/pdf...867X0800800408
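
For comparison, the same count can be made without -egen, tag()- by using -by:- and -_n- directly, in the spirit of the by-group approach covered in the linked article (that reading of the article is my assumption, so check it). This is a minimal sketch on the same fake data; first and n_distinct are hypothetical variable names chosen for illustration.

```stata
* Re-create the fake example data
clear
input float(gvkey year q)
1 2007 42
1 2008 42
1 2009 42
2 2007 42
2 2007 42
2 2008 42
3 2007 42
3 2008 42
3 2009  .
4 2007 42
4 2008 42
end

* Within each company-year, sort on q so non-missing values come first,
* then flag the first observation only if its q is non-missing
bysort gvkey year (q): generate byte first = _n == 1 & !missing(q)

* Data are still sorted by gvkey, so count flagged company-years per company
by gvkey: egen n_distinct = total(first)

keep if n_distinct == 3

* on these data only company 1 should survive
assert gvkey == 1
assert _N == 3
```

Sorting on q inside each company-year matters: missing values sort last in Stata, so the first observation has a non-missing q whenever any observation in that company-year does.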

    Our FAQ Advice at https://www.statalist.org/forums/help is meant to give focused advice on how to ask good questions. Points that seem to apply here are:

    #7 choose informative titles

    #6 use full real names (Mary Poppins always had good solutions, not problems, IIRC)

    #12 give a data example

    • #3
      Hi,

      Thanks so much for your help. Sorry about the poor question formatting.
