
  • Classify or label observations (where classification changes with time)

    Hi!
    I am using Stata 13.1, and I am trying to classify (label) each company in my dataset as young or mature (coded 1 and 0, respectively).
    My dataset contains companies with stock price data from 1988 to 2008. I need to analyse stock price movements, but as a first step I have to classify the companies.

    The variables I want to use are:
    gvkey (long, %12.0g)
    year (int, %9.0g)
    prcc (double, %10.0g)

    The company names (conm) are also still included.

    A company should be labeled mature once prcc has six or more entries (i.e., the company is listed with a prcc value six times; the years need not be consecutive),
    and young as long as prcc has one to five entries. I tried label but was not able to code the "jump" from young to old.
    In my dataset, if there is an entry for a company (gvkey) in a certain year (year), there is always a stock price (prcc). It looks like this:

    [Image: Snapshot.JPG, a screenshot of the data]


    Here is my problem: does anyone have an idea how I can label the companies, or create a new variable that classifies them?
    Some companies start in 1995, some in 1999, and so on.
    The classification for a company should change from young to old when it "turns" six (e.g., Kofax should be "young" (1) from 1998 to 2002 and "old" (0) from 2003 onwards).

    If anyone could help me out, that would be nice!

    Nellie


  • #2
    Would

    Code:
    bysort gvkey (year) : generate old = (_n >= 6)
    label define old 0 "young" 1 "old"
    label values old old
    do for you?

    Best
    Daniel
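    A self-contained check of this approach, using made-up data (the gvkey and prcc values below are invented for illustration). Company 1 mimics the Kofax pattern (first listed in 1998), and company 2 has a gap in 2001 to show that it is the sixth listed year, not the sixth calendar year, that triggers the switch. The extra variable young is a hypothetical addition giving the 1/0 coding asked for in the question.

```stata
* Made-up illustrative data: hypothetical companies 1 and 2, invented prcc
* Company 2 has no entry for 2001
clear
input long gvkey int year double prcc
1 1998 10.5
1 1999 11.0
1 2000 12.2
1 2001 9.8
1 2002 13.1
1 2003 14.0
1 2004 15.3
2 1999 20.0
2 2000 21.5
2 2002 19.9
2 2003 22.4
2 2004 23.0
2 2005 24.1
end

* Within each gvkey, sorted by year, _n counts the listed years,
* so the 6th listed year (gaps allowed) is the first "old" one
bysort gvkey (year) : generate old = (_n >= 6)
label define old 0 "young" 1 "old"
label values old old

* Hypothetical reverse coding (young = 1, mature = 0), as in the question
generate byte young = 1 - old

list gvkey year old young, sepby(gvkey)
```

    Company 1 switches to old in 2003 and company 2 in 2005, its sixth listed year.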



    • #3
      Thank you, Daniel! It looks so easy, but for me it wasn't.

      It works perfectly!

      Nellie



      • #4
        Hi again!

        I am struggling with something similar to the above. I looked for old threads but could not find an answer.
        My data are the same as above. I want to keep all companies with at least 5 values of the variable div (cash dividends, type float). There are no missing values for div, but some companies have just two or three observations (e.g., years 2006 and 2007), which is not enough, so I want to drop them. All companies that have 5 or more observations of div overall are the ones I want to keep in the dataset.

        I tried the code above, adjusted to div and "<= 4", but that also drops the first 4 observations of a company with, e.g., 7 years of div.
        Only the last three years are kept in the dataset, which is not the idea...
        Any recommendations? Again, thank you for your help!



        • #5
          Code:
          bysort gvkey : drop if _N < 5
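          A self-contained illustration, with made-up data (gvkey and div values are invented), of why the _n-based attempt in #4 failed: under by:, _n is the running observation number within a company, whereas _N is the total number of observations for that company.

```stata
* Made-up illustrative data: hypothetical company 1 has 7 years of div,
* company 2 only 2 years; div values are invented
clear
input long gvkey int year float div
1 2001 0.10
1 2002 0.12
1 2003 0.12
1 2004 0.15
1 2005 0.15
1 2006 0.18
1 2007 0.20
2 2006 0.05
2 2007 0.06
end

* The attempt from #4: drops the first four observations of EVERY company
preserve
bysort gvkey (year) : drop if _n <= 4
assert _N == 3   // only company 1's years 2005-2007 remain
restore

* _N is the group size, so whole companies with fewer than 5 obs go,
* and companies with 5 or more are kept intact
bysort gvkey : drop if _N < 5
list, sepby(gvkey)
```

          If div could contain missing values, counting only nonmissing values would be safer, e.g. `bysort gvkey : egen n_div = count(div)` followed by `drop if n_div < 5`.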



          • #6
            That is straightforward and elegant! Sorry for that question...
            Thank you!
            Nellie



            • #7
              When I first learned Stata in the early 1990s, if and in were immediate and obvious, but I had to work harder at getting used to by: and what it could do. My perhaps unusual tip is that writing an article on something helps you understand it, just as preparing a lecture or lesson does. That's not essential, but the article I wrote may still be helpful: http://www.stata-journal.com/sjpdf.h...iclenum=pr0004
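              A small sketch of what by: changes, using made-up data (three hypothetical companies identified only by gvkey): without by:, _n and _N refer to the whole dataset; with by:, they restart within each group, which is what both solutions in this thread rely on.

```stata
* Made-up illustrative data: three hypothetical companies
clear
input long gvkey int year
1 2000
1 2001
1 2002
2 2001
2 2002
3 2002
end

* Without by:, _n runs 1..6 and _N is 6 for every observation
generate n_all = _n
generate N_all = _N

* With by:, _n and _N are computed within each gvkey group
bysort gvkey (year) : generate n_firm = _n
by gvkey : generate N_firm = _N

* Tag one observation per company, e.g. to count companies
by gvkey : generate byte first = (_n == 1)
count if first

list, sepby(gvkey)
```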



              • #8
                Thank you for the advice and for that link! It helps, yes!
