Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to generate Median for each industry and year

    Hi , Iam trying to generate median for the variable name = ROA for each industry (SIC Code) and year and my sample from 2009 to 2013
    but its seem some missing in this code Code:

    Code:
    gen n1=.
    gen n2=.
    
    
    foreach v of var  roa {
             forval y=2009/2014 {
                              forval  i= 1/89 {
                              
                               count if industry==`i' & year==`y'
        
                                    
    gen ggg if industry== `i' & year==`y'= r(mean)
    gen hhh if industry== `i' & year==`y'= r(median)
       replace n1= ggg if industry== `i' & year==`y'
       replace n2= hhh if industry== `i' & year==`y'
         drop ggg 
         drop hhh 
         
     }
    }
    }

  • #2
    -egen- is your friend:

    Code:
    bysort industry year : egen mean=mean(ROA)
    hth,
    Jeph

    Comment


    • #3
      Thank you for reply , but my questions here I have a lot of SIC code from 10 to 83 for the 2009 to 2013 . the questions here I want the median/mean for 2009 for the same SIC code 10 and mean/median for the same SIC code 11 for 2010 and etc....

      Comment


      • #4
        Jeph is right. Note that his example was for calculating the mean, but there's a median() function too.

        The problem in #3 is no problem and indeed Jeph already answered that.

        It seems that you want

        Code:
         
        bysort industry year : egen median_roa= median(roa)

        Comment


        • #5
          Thank you Nick for teply , for example my sample like that and I want to generate the median for sic2 no. 10 for year 2009
          YEAR roa SIC2
          2009 6059 10
          2010 16344 10
          2011 17436 10
          2012 4360 10
          2013 3088 10
          2014 1200 10
          2009 171353 28
          2010 210090 28
          2011 245036 28
          2012 240776 28
          2013 251324 28
          2014 288745 28
          2009 89329 10
          2010 100068 10
          2011 121603 10
          2012 129111 10
          2013 141196 10
          2014 163480 10
          when I check my result on excel to ensure that code is correct , the result of median is different . in excel he gave me the median for sic2 no. 10 for 2009 is 95388
          and when I use this code above they give this no. 47694 . please advise me

          Comment


          • #6
            I can reproduce your result in Stata:

            Code:
            . clear 
            
            . input YEAR      roa     SIC2
            
                      YEAR        roa       SIC2
              1. 2009    6059    10
              2. 2010    16344   10
              3. 2011    17436   10
              4. 2012    4360    10
              5. 2013    3088    10
              6. 2014    1200    10
              7. 2009    171353  28
              8. 2010    210090  28
              9. 2011    245036  28
             10. 2012    240776  28
             11. 2013    251324  28
             12. 2014    288745  28
             13. 2009    89329   10
             14. 2010    100068  10
             15. 2011    121603  10
             16. 2012    129111  10
             17. 2013    141196  10
             18. 2014    163480  10
             19. end 
            
            . sort SIC2 YEAR 
            
            . egen median_roa = median(roa), by(SIC2 YEAR) 
            
            . list, sepby(SIC2 YEAR) 
            
                 +---------------------------------+
                 | YEAR      roa   SIC2   median~a |
                 |---------------------------------|
              1. | 2009    89329     10      47694 |
              2. | 2009     6059     10      47694 |
                 |---------------------------------|
              3. | 2010    16344     10      58206 |
              4. | 2010   100068     10      58206 |
                 |---------------------------------|
              5. | 2011   121603     10    69519.5 |
              6. | 2011    17436     10    69519.5 |
                 |---------------------------------|
              7. | 2012   129111     10    66735.5 |
              8. | 2012     4360     10    66735.5 |
                 |---------------------------------|
              9. | 2013   141196     10      72142 |
             10. | 2013     3088     10      72142 |
                 |---------------------------------|
             11. | 2014   163480     10      82340 |
             12. | 2014     1200     10      82340 |
                 |---------------------------------|
             13. | 2009   171353     28     171353 |
                 |---------------------------------|
             14. | 2010   210090     28     210090 |
                 |---------------------------------|
             15. | 2011   245036     28     245036 |
                 |---------------------------------|
             16. | 2012   240776     28     240776 |
                 |---------------------------------|
             17. | 2013   251324     28     251324 |
                 |---------------------------------|
             18. | 2014   288745     28     288745 |
                 +---------------------------------+
            As far as Excel is concerned, you get a different result. So something is wrong there. Perhaps the data are different; perhaps the code doesn't do what you think, perhaps there is a bug in Excel's code. (I am not fond of Excel as a statistical engine at all, but that I guess is the least likely here.) We can't say which on the evidence you've given.


            Comment


            • #7
              Thank you very much for help and time , I got it

              Comment


              • #8
                Note that there are exactly two observations for sic2=10 and year=2009, and the largest of the two is 89,329. The median cannot possible by 95,388 as you assert Excel reported, since that is larger than the largest value in the data. When I compute the median of those two observations in Excel, I get the same answer as Stata.

                So I agree with Nick that checking the Excel answer in Stata shows that something is wrong with the calculations in Excel, not the other way around.

                Comment


                • #9
                  Originally posted by Nick Cox View Post
                  I can reproduce your result in Stata:

                  Code:
                  . clear
                  
                  . input YEAR roa SIC2
                  
                  YEAR roa SIC2
                  1. 2009 6059 10
                  2. 2010 16344 10
                  3. 2011 17436 10
                  4. 2012 4360 10
                  5. 2013 3088 10
                  6. 2014 1200 10
                  7. 2009 171353 28
                  8. 2010 210090 28
                  9. 2011 245036 28
                  10. 2012 240776 28
                  11. 2013 251324 28
                  12. 2014 288745 28
                  13. 2009 89329 10
                  14. 2010 100068 10
                  15. 2011 121603 10
                  16. 2012 129111 10
                  17. 2013 141196 10
                  18. 2014 163480 10
                  19. end
                  
                  . sort SIC2 YEAR
                  
                  . egen median_roa = median(roa), by(SIC2 YEAR)
                  
                  . list, sepby(SIC2 YEAR)
                  
                  +---------------------------------+
                  | YEAR roa SIC2 median~a |
                  |---------------------------------|
                  1. | 2009 89329 10 47694 |
                  2. | 2009 6059 10 47694 |
                  |---------------------------------|
                  3. | 2010 16344 10 58206 |
                  4. | 2010 100068 10 58206 |
                  |---------------------------------|
                  5. | 2011 121603 10 69519.5 |
                  6. | 2011 17436 10 69519.5 |
                  |---------------------------------|
                  7. | 2012 129111 10 66735.5 |
                  8. | 2012 4360 10 66735.5 |
                  |---------------------------------|
                  9. | 2013 141196 10 72142 |
                  10. | 2013 3088 10 72142 |
                  |---------------------------------|
                  11. | 2014 163480 10 82340 |
                  12. | 2014 1200 10 82340 |
                  |---------------------------------|
                  13. | 2009 171353 28 171353 |
                  |---------------------------------|
                  14. | 2010 210090 28 210090 |
                  |---------------------------------|
                  15. | 2011 245036 28 245036 |
                  |---------------------------------|
                  16. | 2012 240776 28 240776 |
                  |---------------------------------|
                  17. | 2013 251324 28 251324 |
                  |---------------------------------|
                  18. | 2014 288745 28 288745 |
                  +---------------------------------+
                  As far as Excel is concerned, you get a different result. So something is wrong there. Perhaps the data are different; perhaps the code doesn't do what you think, perhaps there is a bug in Excel's code. (I am not fond of Excel as a statistical engine at all, but that I guess is the least likely here.) We can't say which on the evidence you've given.

                  Dear Nick, if i would like to generate a median every 3 years, how could i do that?

                  In this examples. if I would like to generate the median of ROA every 3 year based on SIC. What would be the code?

                  Thanks

                  Comment


                  • #10
                    Lucas Long You haven't addressed my questions at https://www.statalist.org/forums/for...-every-3-years

                    -- so asking another vague question isn't going to help.

                    Comment


                    • #11
                      Hi all,

                      I have a similar problem as Basiem. I want to generate median values of askbid for daily observations in a panel dataset. So there are multiple companies (c_id) and multiple years of data per c_id.


                      Code:
                      * Example generated by -dataex-. To install: ssc install dataex
                      clear
                      input float(date year2 fperiod askbid c_id)
                      20804 2016 20819   .002751032 3
                      20807 2016 20819   .002247191 3
                      20808 2016 20819  .0011129661 3
                      20809 2016 20819 .00056038104 3
                      20810 2016 20819  .0016925246 3
                      20811 2016 20819  .0011229647 3
                      20814 2016 20819            . 3
                      20815 2016 20819  .0016877637 3
                      20816 2016 20819            0 3
                      20817 2016 20819  .0011370097 3
                      20818 2016 20819  .0011514105 3
                      20821 2017 21184   .002258611 3
                      20822 2017 21184  .0010718113 3
                      20823 2017 21184  .0016291067 3
                      20824 2017 21184  .0016116035 3
                      20825 2017 21184            . 3
                      20828 2017 21184  .0005275653 3
                      20829 2017 21184  .0020920502 3
                      20830 2017 21184   .001545197 3
                      20831 2017 21184   .002545177 3
                      20832 2017 21184  .0015209125 3
                      end
                      format %tdDD/NN/CCYY date
                      format %tdDD/NN/CCYY fperiod

                      I use to get the median:
                      Code:
                      bysort c_id year2 : egen median_ab= median(askbid)
                      Which works as supposed. The problem I have, is that now I generated the median per year per company. But I want to generate the median measured over the period spanning month -9 to month +3 relative to a firm’s fiscal year-end(the value in fperiod). Helpfull could be that fperiod changes on the date that the fiscal year ends.

                      I tried some things but it didn't work out that well. Who could help me solve this puzzle? Many thanks in advance!

                      Comment


                      • #12
                        Nick Cox I hope that you could please help me.

                        For every CEO in my dataset and for every year, I have a numerical variable in my sample called GAI. My timeframe is 1992 – 2016.
                        I used the following function to create the median of a variable called “GAI” in my sample: egen median_GAI = median(GAI), by (fyear)

                        Now, I need to create a dummy variable that says:
                        if the GAI of a CEO in year x, is higher than the median_GAI variable for that CEO in that specific year, then = 1, and this 1, should be labeled as Generalist. If the GAI is lower than the median_GAI, then = 0, and this variable is labeled Specialist.

                        Is there an easy way to do this in stata?

                        Comment

                        Working...
                        X