Classify or label observations (where classification changes with time)

Nellie Henriksson

Join Date: Jan 2016

Posts: 6
#1


11 Jan 2016, 10:36

Hi!
I am using Stata 13.1, and I am trying to classify (label) each company in my dataset as young or mature (i.e., 1 or 0).
My dataset contains companies with stock price data from 1988 to 2008. I should analyse stock price movements, but as a first step I have to classify the companies.

The variables I want to use are:
gvkey (long, %12.0g)
year (int, %9.0g)
prcc (double, %10.0g)

Company names (conm) are also still included.

A company should be labeled mature if prcc has six or more entries (i.e., the company is listed with a prcc value six times, not necessarily in consecutive years), and young as long as prcc has one to five entries. I tried label but was not able to code the "jump" from young to old.
In my dataset, if there is an entry for a company (gvkey) in a given year (year), there is always a stock price (prcc). It looks like this:

Here is my problem: does anyone have an idea how I can label the companies, or create a new variable that classifies them?
Some companies start in 1995, some in 1999, and so on.
A company's classification should change from young to old when it "turns" six (e.g., Kofax should be "young" (1) from 1998 to 2002 and "old" (0) from 2003 onward).

If anyone could help me out, that would be nice!

Nellie
daniel klein

Join Date: Mar 2014

Posts: 3850
#2

11 Jan 2016, 10:58

Would

Code:

bysort gvkey (year) : generate old = (_n >= 6)
label define old 0 "young" 1 "old"
label values old old

do for you?

Best
Daniel
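To see the suggestion above in action, here is a minimal sketch on a small made-up dataset (the gvkey values, years, and prices are hypothetical; firm 1 simply mirrors the Kofax timing described in the question). Because the data are sorted by year within gvkey, _n is the running count of firm-years, so the sixth row onward is flagged as old. The second part is an assumed variant, not part of the original answer: it counts only nonmissing prcc values cumulatively with sum(), which would matter only if some firm-years could lack a price, and it codes young = 1 as the question originally asked.

```stata
* Hypothetical toy data: firm 1 has 7 years (1998-2004), firm 2 has 3 years
clear
input long gvkey int year double prcc
1 1998 10.5
1 1999 11.2
1 2000  9.8
1 2001 12.0
1 2002 13.1
1 2003 12.7
1 2004 14.0
2 2005 20.1
2 2006 21.3
2 2007 19.9
end

* Within each gvkey, sorted by year, _n is the row position (1, 2, 3, ...)
bysort gvkey (year) : generate old = (_n >= 6)
label define old 0 "young" 1 "old"
label values old old

* Assumed variant: running count of nonmissing prcc values within each firm;
* sum() is a running sum that restarts in each by-group
bysort gvkey (year) : generate nprcc = sum(!missing(prcc))
generate young = (nprcc <= 5)

list gvkey year prcc old young, sepby(gvkey)
```

With no missing prices, young is exactly the complement of old.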
Nellie Henriksson

Join Date: Jan 2016

Posts: 6
#3

11 Jan 2016, 11:08

Thank you, Daniel! It looks so easy, but for me it wasn't.

It works perfectly!

Nellie
Nellie Henriksson

Join Date: Jan 2016

Posts: 6
#4

13 Jan 2016, 03:14

Hi again!

I am struggling with something similar to the above. I looked through old threads but could not find an answer.
My data are the same as above. I want to keep all companies with 5 or more values of the variable div (cash dividends, type float). There are no missing values for div, but some companies have just two or three observations (e.g., years 2006 and 2007), which is not enough, so I want to drop them. All companies with five or more observations of div in total are the ones I want to keep in the dataset.

I tried the above code adjusted to div and "<= 4", but that also drops the first four observations of a company with, e.g., seven years of div.
Only the last three years are kept in the dataset, which is not the idea...
Any recommendations? Again, thank you for your help!
Nick Cox

Join Date: Mar 2014

Posts: 35698
#5

13 Jan 2016, 03:25

Code:

bysort gvkey : drop if _N < 5
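A minimal sketch of this one-liner on made-up data (the firms, years, and dividend values are hypothetical): firm 2 has only two years of div, so all of its rows go, while all seven rows of firm 1 are kept. The egen line is an assumed alternative, not part of the original answer: count() returns, for each firm, the number of nonmissing div values, which would be the safer count if div could contain missings. Here div has no missings, so both counts agree.

```stata
* Hypothetical toy data: firm 1 has 7 years of div, firm 2 only 2
clear
input long gvkey int year float div
1 2001 0.5
1 2002 0.5
1 2003 0.6
1 2004 0.6
1 2005 0.7
1 2006 0.7
1 2007 0.8
2 2006 1.0
2 2007 1.1
end

* Assumed alternative: number of nonmissing div values per firm
bysort gvkey : egen ndiv = count(div)

* Under by:, _N is the size of the current gvkey group, so the whole
* firm is dropped when it has fewer than 5 rows
bysort gvkey : drop if _N < 5

list, sepby(gvkey)
```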
Nellie Henriksson

Join Date: Jan 2016

Posts: 6
#6

13 Jan 2016, 03:37

That is straightforward and elegant! Sorry for that question...
Thank you!
Nellie
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#7

13 Jan 2016, 03:48

When I first learned Stata in the early 1990s, if and in were immediate and obvious, but I had to work harder at getting used to by: and what it could do. My perhaps unusual tip is that writing an article on something helps you understand it, just as preparing a lecture or lesson does. That's not essential, but the article I wrote may still be helpful: http://www.stata-journal.com/sjpdf.h...iclenum=pr0004
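The key idea behind both answers in this thread is the difference between _n and _N under by:. A minimal sketch on made-up data (firms and years are hypothetical) puts them side by side: _n is the position of the row within its group, while _N is the number of rows in the group. That is why a condition on _n (as in the attempt with "<= 4") acts on the first few rows of every firm, whereas a condition on _N acts on whole firms.

```stata
* Hypothetical toy data: two firms with different numbers of years
clear
input long gvkey int year
1 2001
1 2002
1 2003
2 2005
2 2006
end

* Within each gvkey group sorted by year:
*   _n = position of the row in the group, _N = number of rows in the group
bysort gvkey (year) : generate pos  = _n
bysort gvkey (year) : generate size = _N

list, sepby(gvkey)
```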
Nellie Henriksson

Join Date: Jan 2016

Posts: 6
#8

14 Jan 2016, 06:27

Thank you for the advice and for that link! It helps, yes!