Bachelor's thesis help :)

Mary Poppins

Join Date: May 2023

Posts: 3
#1

19 May 2023, 03:44

Hi everyone!

I'm currently writing my bachelor's thesis and could do with some help. I have a dataset of companies with Tobin's q for three years (2007, 2008 and 2009). The unique identifier is gvkey. I'd like to drop all observations for companies that have Tobin's q for only one or two of the years rather than all three. Also, some companies have three values, but all in the same year, e.g. 2007. I would like to drop these companies from the dataset as well.

Essentially, I want to keep only the companies that have Tobin's q for all of 2007, 2008 and 2009. I know I have to use a for loop, but that's about where my knowledge ends. Does anybody know what I could do?

Last edited by Mary Poppins; 19 May 2023, 04:21.
Nick Cox

Join Date: Mar 2014

Posts: 35433
#2

19 May 2023, 04:19

Whether you are writing a Bachelor's thesis or are a very distinguished researcher or someone in between doesn't really define your question! Please give an informative title to your posts, such as "Cleaning panel data for distinct years and non-missing values".

Your first rule is presumably that a value for q must be present and not missing. Now you need to find three such observations for three distinct years for each company.

Here's how it works with fake data, absent a data example from you. I have no idea what Tobin's q is beyond knowing that the man deserves his T. Here panel 1 is good (3 values for 3 years), but panels 2, 3 and 4 are not. Panel 2 has repeated years, panel 3 has a missing value, and panel 4 has only 2 years present.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(gvkey year q)
1 2007 42
1 2008 42
1 2009 42
2 2007 42
2 2007 42
2 2008 42
3 2007 42
3 2008 42
3 2009  .
4 2007 42
4 2008 42
end

egen tag = tag(gvkey year) if !missing(q)
bysort gvkey: egen n_good = total(tag)

list, sepby(gvkey)

     +----------------------------------+
     | gvkey   year    q   tag   n_good |
     |----------------------------------|
  1. |     1   2007   42     1        3 |
  2. |     1   2008   42     1        3 |
  3. |     1   2009   42     1        3 |
     |----------------------------------|
  4. |     2   2007   42     1        2 |
  5. |     2   2007   42     0        2 |
  6. |     2   2008   42     1        2 |
     |----------------------------------|
  7. |     3   2007   42     1        2 |
  8. |     3   2008   42     1        2 |
  9. |     3   2009    .     0        2 |
     |----------------------------------|
 10. |     4   2007   42     1        2 |
 11. |     4   2008   42     1        2 |
     +----------------------------------+

So,

Code:

keep if n_good == 3

would keep the good companies. That selects whole companies, not just individual observations, because n_good is computed within each company and so is the same for every observation of that company.
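One case the fake data above does not cover is a company that has all three years but also a repeated year. The following is only a sketch under assumptions: company 5 is a hypothetical addition to the example data, and the choice to keep a single observation per company and year is an assumption about what the thesis needs, not something stated in the question.

```stata
* Sketch only: the example data plus a hypothetical company 5
clear
input float(gvkey year q)
1 2007 42
1 2008 42
1 2009 42
2 2007 42
2 2007 42
2 2008 42
3 2007 42
3 2008 42
3 2009 .
4 2007 42
4 2008 42
5 2007 42
5 2007 43
5 2008 42
5 2009 42
end

* Count distinct years with non-missing q within each company
egen tag = tag(gvkey year) if !missing(q)
bysort gvkey: egen n_good = total(tag)
keep if n_good == 3

* Company 5 passes the rule but still carries a repeated year
duplicates report gvkey year

* Assumption: one non-missing observation per company-year is wanted.
* With force, the first observation in each duplicate group is kept,
* which is arbitrary here.
drop if missing(q)
duplicates drop gvkey year, force

* Errors out unless gvkey and year now jointly identify observations
isid gvkey year
```

If the repeated values differ, as they do for the hypothetical company 5, it may be better to inspect them with `duplicates list gvkey year` before dropping anything.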

The problem could be solved with a for loop, but I would rather not do it that way. I really wouldn't expect an undergraduate to know that solution unless they had found it in a post. If you want more on the principles, see p.563 at https://journals.sagepub.com/doi/pdf...867X0800800408

Our FAQ Advice at https://www.statalist.org/forums/help is meant to give focused advice on how to ask good questions. Points that seem to apply here are

#7 choose informative titles

#6 use full real names (Mary Poppins always had good solutions, not problems, IIRC)

#12 give a data example
Mary Poppins

Join Date: May 2023

Posts: 3
#3

19 May 2023, 04:26

Hi,

Thanks so much for your help. Sorry about the poor question formatting.