  • How to identify a variable's top/bottom 30% of each year in panel data?

    Dear Stata users,


    I am trying to run a cross-sectional regression on firms in the bottom 30% and top 30% of the distribution of book-to-market value in panel data.
    I tried to rank firms every year, but I can't identify the top/bottom 30% of them, because this is an unbalanced panel and the total number of firms differs from year to year.
    I would be grateful if someone could help me identify these firms each year.

    Here's the code I use:
    Code:
    sort gvkey year
    local i=1964 // the time period is 1964-2014
    while `i'<=2014{
    quietly egen per70`i'=pctile(btm), p(70) //btm is the book-to-market value, and I have to find out the firms with top/bottom 30% of the distribution of btm
    quietly drop if btm<70`i' 
    quietly drop per70`i' 
    local i=`i'+1
    }
    Here's part of my data
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long gvkey double year float btm
     2403 1964 .16921677
     9103 1964 .16258383
     1608 1964 .18209743
    10060 1964 .18040167
     1481 1964 .14549348
     4780 1964  .1590813
     3874 1964  .1805083
     3235 1964  .1769771
    11535 1964 .17150614
     4475 1964  .1712105
     4453 1964 .12189302
     4021 1964 .18900825
    11264 1964 .17115825
    10878 1964 .19931643
     6502 1964 .18697643
     8645 1964 .16715898
     6113 1964 .16654503
     3489 1964  .1921034
    11280 1964 .16220094
     9616 1964 .18040165
    end
    Thank you for your help in advance!

  • #2
    Hey Noah, your example is not ideal because it only shows a single year. I took the liberty of simulating data so that there is more than one year.

    Code:
    clear
    set obs 3
    gen n = 10
    gen year = _n+1963
    expand n
    sort year
    gen btm = runiform()
    drop n
    So now there are 10 observations per year for 3 years. The not-so-elegant code below creates one variable per year that flags which observations fall in the bottom and top 30% of btm values in that year. In the "topbottom" variables, 0 codes for the bottom 30% and 1 codes for the top 30%. This should be robust to years with different numbers of observations. It might not work properly if there are fewer than 10 observations in a year, so you might have to check for that. The code also doesn't pay any attention to ties in the assignment of deciles (to obtain the top and bottom 30%, I first classify the observations for each year into deciles).

    Code:
    levelsof year, local(levels)
    
    foreach year of local levels{
        xtile topbottom`year' = btm if year==`year', nq(10)
    }
    
    foreach var of varlist topbottom1964 - topbottom1966{
        replace `var' = 0 if `var' <=3
        replace `var' = 1 if `var' >=8 & `var' !=.
        replace `var' = . if `var' >1 & `var' !=.
    }
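    If a single indicator is more convenient than one variable per year, the same decile idea could be sketched roughly as below. This is only a sketch under assumptions: the simulated data (with a different number of observations in each year) and the variable name topbottom are illustrative, a temporary variable holds each year's deciles, and ties are still not treated specially.

    Code:
    clear
    set seed 2019
    set obs 3
    gen year = _n + 1963
    gen n = 10 * _n                 // 10, 20 and 30 observations: an unbalanced setup
    expand n
    drop n
    gen btm = runiform()

    gen byte topbottom = .          // 0 = bottom 30%, 1 = top 30%, missing = middle
    levelsof year, local(levels)
    foreach y of local levels {
        tempvar decile
        xtile `decile' = btm if year == `y', nq(10)   // deciles within this year only
        replace topbottom = 0 if `decile' <= 3
        replace topbottom = 1 if `decile' >= 8 & `decile' < .
        drop `decile'
    }
    tabulate year topbottom, missing

    Because the loop runs over whatever levelsof finds, no variable range such as topbottom1964 - topbottom1966 has to be typed by hand.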
    Last edited by Igor Paploski; 15 Aug 2019, 14:06.


    • #3
      Thank you for your help!
      There are hundreds of firms each year in my data so the code works well.


      • #4
        I can't see any reason to loop here. Or rather, the egen call can use by() to calculate separately by year, after which you have all the ingredients you need.

        Code:
        egen per70 = pctile(btm), p(70) by(year)
        egen per30 = pctile(btm), p(30) by(year)
        gen wanted = cond(btm <= per30, 1, cond(btm <= per70, 2, 3)) if btm < . 
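
        As a usage sketch, the wanted variable can then restrict the regression to the extreme groups. Everything except btm, year, per30, per70 and wanted is an assumption here: the data are simulated, y and x are hypothetical outcome and regressor names, and the by-prefix form of egen is used as an equivalent of the by() option.

        Code:
        clear
        set seed 2019
        set obs 600
        gen year = 1964 + mod(_n, 3)
        gen btm = runiform()
        gen x = rnormal()               // hypothetical regressor
        gen y = 0.5 * x + rnormal()     // hypothetical outcome

        bysort year: egen per30 = pctile(btm), p(30)
        bysort year: egen per70 = pctile(btm), p(70)
        gen wanted = cond(btm <= per30, 1, cond(btm <= per70, 2, 3)) if btm < .

        regress y x if wanted == 1      // bottom 30% of btm within each year
        regress y x if wanted == 3      // top 30% of btm within each year

        The if btm < . qualifier matters: without it, observations with missing btm would be assigned to group 3, because missing values count as larger than any number in Stata.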


        • #5
          Originally posted by Nick Cox
          I can't see any reason to loop here. Or rather, the egen call can use by() to calculate separately by year, after which you have all the ingredients you need.

          Code:
          egen per70 = pctile(btm), p(70) by(year)
          egen per30 = pctile(btm), p(30) by(year)
          gen wanted = cond(btm <= per30, 1, cond(btm <= per70, 2, 3)) if btm < . 
          Thank you for your code!
          Before reading your post, I thought pctile() could not be used together with by().
          Since it can be used like this, it is much more convenient now. Thank you!
