Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • conditional sorting

    Hello, new STATA learner here

    I have a sample of my data and I try to sort it based on date. but I can`t figure out the code.
    this is a sample of my dataset:

    Date Companies Return MarketCap PBR t
    31 Dec 2018 Comp1 -.0753 2.310e+08 1.095 1
    31 Dec 2018 Comp2 -.1307 44044143 .89241 2
    31 Dec 2018 Comp3 -.0356 20107234 3.07592 3
    31 Dec 2018 Comp4 .1555 25572773 6.154 4
    31 Dec 2018 Comp5 .0043 24495543 1.55772 5
    31 Dec 2018 Comp6 .0605 15059432 1.25528 6
    31 Dec 2018 Comp7 .1075 25319680 .47472 7
    31 Dec 2018 Comp8 -.0845 8588941 1.67169 8
    31 Dec 2018 Comp9 .1049 13660745 .49748 9
    31 Dec 2018 Comp10 -.0148 19442186 .53016 10
    31 Dec 2018 Comp11 -.0182 21186401 .47182 11
    31 Dec 2018 Comp12 .0526 18495334 .59724 12
    31 Dec 2018 Comp13 -.0544 28468515 10.56176 13
    31 Dec 2018 Comp14 -.1407 10195259 .78793 14
    31 Dec 2018 Comp15 -.0865 16597569 .87337 15
    31 Dec 2018 Comp16 -.0388 18778304 .52334 16
    31 Dec 2018 Comp17 .0293 20012300 .92337 17
    31 Dec 2018 Comp18 -.0764 18293677 1.05184 18
    31 Dec 2018 Comp19 -.0509 17195635 5.44452 19
    31 Dec 2018 Comp20 -.0359 10883775 .41181 20
    31 Jan 2019 Comp1 .1925 2.755e+08 1.22975 21
    31 Jan 2019 Comp2 .2215 53799375 1.06682 22
    31 Jan 2019 Comp3 .1148 22414622 3.11504 23
    31 Jan 2019 Comp4 .0323 26399835 6.06272 24
    31 Jan 2019 Comp5 .0605 25977982 1.65942 25
    31 Jan 2019 Comp6 .0205 15368872 1.24155 26
    31 Jan 2019 Comp7 .0928 27670030 .50101 27
    31 Jan 2019 Comp8 -.0359 8280406 1.63821 28
    31 Jan 2019 Comp9 .0786 14734958 .50472 29

    I have 20 companies and 37 dates (Only showing 2). I`m trying to sort Marketcap and PBR by each date, for example, 31/12/2018, 31/01/2019, and so on. after sorting market cap, for example, I want to get the correspondent return for the sorted market cap and get the average of smalls, and the average of bigs and then the average of these 2. and I want to do the same thing to PBR.

    could you help me figure out the right code for this.
    I later have to use a bigger dataset, more than 2000 comps.

    thank you

  • #2
    Please see

    https://www.statalist.org/forums/help#stata (please use dataex)

    https://www.statalist.org/forums/help#realnames

    https://www.statalist.org/forums/help#spelling

    What are "smalls" and "bigs"?

    Consider

    Code:
    webuse grunfeld, clear 
    and definitions means of "the smallest 3" and "the largest 3" in each year. Then that would be

    Code:
    bysort year (invest) : gen smallest3 = (invest[1] + invest[2] + invest[3]) / 3
    by year : gen largest3 = (invest[_N] + invest[_N-1] + invest[_N-2]) / 3
    but that gets a bit awkward for other definitions and for coping with missing values. Here is a more general recipe.


    Code:
    webuse grunfeld, clear
    
    egen rank = rank(invest), by(year) unique
    
    egen max_rank = max(rank), by(year)
    
    egen smallest3 = mean(cond(rank <= 3, invest, .)), by(year)
    
    egen largest3 = mean(cond(rank > max_rank - 3, invest, .)), by(year)
    
    sort year invest
    
    list year invest rank max_rank smallest3 largest3 in 1/10
    
         +-------------------------------------------------------+
         | year   invest   rank   max_rank   smalle~3   largest3 |
         |-------------------------------------------------------|
      1. | 1935     2.54      1         10   11.94333   189.2633 |
      2. | 1935    12.93      2         10   11.94333   189.2633 |
      3. | 1935    20.36      3         10   11.94333   189.2633 |
      4. | 1935    24.43      4         10   11.94333   189.2633 |
      5. | 1935    26.63      5         10   11.94333   189.2633 |
         |-------------------------------------------------------|
      6. | 1935     33.1      6         10   11.94333   189.2633 |
      7. | 1935    39.68      7         10   11.94333   189.2633 |
      8. | 1935    40.29      8         10   11.94333   189.2633 |
      9. | 1935    209.9      9         10   11.94333   189.2633 |
     10. | 1935    317.6     10         10   11.94333   189.2633 |
         +-------------------------------------------------------+
    You need to repeat it for different variables (with different names) and vary it for different definitions of smallest and largest.

    A column with the title "The largest five" will appear in Stata Journal 22(2) 2022 at around the end of June. That could interest readers of this thread after that time.
    Last edited by Nick Cox; 07 May 2022, 05:07.

    Comment


    • #3
      Hello~
      thank you for your reply. this is my first post so I`m still learning about the rules here
      what I mean by bigs and smalls is that I`m trying to replicate Fama and French`s factors using monthly variables
      if you see, I have only 20 companies. I need to use marketcap to get the size factor. so I need to get the median or the mean each month and get 2 groups, companies with big marketcap and companies with small marketcap. that way I can get the mean of each group and then get my size factor by mirroring the marketcaps to the returns.
      I hope you understand what I`m trying to say
      what I`m trying to do is as follows:

      1- each month, I want to sort my marketcap (ex: 31/12/2018)
      2- get the mean of that month`s marketcap
      3-divide the companies into 2 groups (Big Marketcap companies vs Small marketcap Companies)
      4-mirror the sorted companies with their returns
      5- get the mean of the returns for each group
      6- get the overall mean of both groups

      I need to do the same thing for PBR

      the steps are very clear but I`m still unable to write the code
      could you help, please

      Comment


      • #4
        Other way round, I am not an economist and don't work with your kind of data and I've never read Fama and French and couldn't even cite the reference. Your prescription is pretty fuzzy to me and you don't seem clear whether you want to use the mean or the median or precisely what big and small are (above and below the mean? the median? something else?).

        However, #2 already contains code hints you should be able to use. You're still not using dataex but here is how to get the mean, the mean of company investment below the mean, and the mean of company investment above or equal to the mean. That may help. If not, I think you're expecting replies from people who know your kind of analysis. Naturally you need to adapt to your variable names -- and to repeat for eac

        Code:
        webuse grunfeld, clear
        
        egen mean_invest = mean(invest), by(year)
        
        egen small_invest = mean(cond(invest &lt; mean, invest, .)), by(year)
        
        egen large_invest = mean(cond(invest &gt;= mean, invest, .)), by(year)

        Comment


        • #5
          I`m currently using your code but may I know how to get the time to vary
          from 1 to 20 for each company. i would like to do that to my data set too

          Comment


          • #6
            If I understand #1 correctly, doing your calculation by(Date) should be fine. Your Companies and Date variables don't matter otherwise.

            Comment

            Working...
            X