
  • #16
    Note that you can do this with rangestat (from SSC): far simpler to code and much much faster! To install rangestat, type in Stata's Command window:
    Code:
    ssc install rangestat
    Just as a proof of concept, here's some synthetic data that has the correct structure. I copied Clyde's code from #12 (altered slightly to remove all the output):

    Code:
    clear all
    set obs 50
    gen ticker = string(_n)
    expand 10
    bysort ticker: gen year = 2006 + _n
    expand 365
    bysort ticker year: gen Date = mdy(1,1,year) + _n - 1
    sort ticker year Date
    format %td Date
    gen company_stock_return = runiform()
    gen sp_500_return = runiform()
    drop year
    
    
    * Clyde's code
    gen int year = yofd(Date)
    gen beta = .
    
    levelsof ticker, local(firms)
    qui foreach f of local firms {
        levelsof year if ticker == `"`f'"', local(years)
        foreach y of local years {
            capture regress company_stock_return sp_500_return if year == `y' ///
                & ticker ==`"`f'"'
            if c(rc) == 0 { // SUCCESSFUL REGRESSION, STORE RESULTS
                replace beta = _b[sp_500_return] if year == `y' & ticker == `"`f'"'
            }
            else if !inlist(c(rc), 2000, 2001) { // SOME PROBLEM OTHER THAN TOO FEW OBSERVATIONS
                display in red `"Unexpected error with ticker = `f' and year = `y'"'
                exit c(rc)
            }
        }
    }
    
    * compare with rangestat (from SSC).
    rangestat (reg) company_stock_return sp_500_return, interval(year 0 0) by(ticker year)
    assert beta == float(b_sp_500_return)
    Note that there are many ways to code the interval() option. Here, within each group of ticker and year, the year does not change, so interval(year 0 0) includes all observations in the group. Since all observations within the group have the same interval bounds, the regression is calculated once and the results are carried over to the other observations in the group.
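
    As one illustration of a different interval() specification, here is a minimal sketch of a rolling window, assuming rangestat is installed and using small synthetic data with the same variable names as above. The three-year window and the name beta_3yr are assumptions chosen for the example, not something taken from the original question.

    ```stata
    * Small synthetic panel: 5 tickers, 4 years, 50 observations per ticker-year
    clear
    set seed 12345
    set obs 5
    gen ticker = string(_n)
    expand 4
    bysort ticker: gen int year = 2006 + _n
    expand 50
    gen company_stock_return = runiform()
    gen sp_500_return = runiform()

    * Rolling window: for each observation, regress over all observations of the
    * same ticker whose year lies in [year - 2, year]
    rangestat (reg) company_stock_return sp_500_return, interval(year -2 0) by(ticker)

    * beta_3yr is a hypothetical name for the rolling slope
    rename b_sp_500_return beta_3yr
    ```

    With by(ticker) instead of by(ticker year), the window can reach back into earlier years of the same firm; the first years of each firm simply use fewer observations.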



    • #17
      Thank you very much for that addition! I am currently working on the next step of the analysis and my dataset looks like this now:

      Code:
      year   ceo   cfo   ticker   beta        emp    gp   nbeta
      2012   0     0     CBG      1.974823    37     0    1.974823
      2009   0     0     ALL      1.870906    36.8   0    1.870906
      2009   0     1     JLL      1.947649    36.6   0    1.947649
      2009   0     0     JLL      1.947649    36.6   0    1.947649
      2009   0     0     JLL      1.947649    36.6   0    1.947649
      2013   0     0     CI       .8702291    36.5   0    .8702291
      2009   0     0     AON      .5969396    36.2   0    .5969396
      2008   0     1     JLL      1.393131    36.2   0    1.393131
      2008   0     0     JLL      1.393131    36.2   0    1.393131
      2008   0     0     JLL      1.393131    36.2   0    1.393131
      2012   0     0     CI       .9133391    35.8   0    .9133391
      2010   0     0     ALL      1.089363    35.7   0    1.089363
      2010   0     0     GS       .8921949    35.7   0    .8921949
      2008   1     0     AET      1.255258    35.5   0    1.255258
      2008   0     0     AET      1.255258    35.5   0    1.255258
      2008   0     0     AET      1.255258    35.5   0    1.255258
      2008   0     0     AET      1.255258    35.5   0    1.255258
      2007   1     0     AET      .8773144    35.2   0    .8773144
      Note: This is just part of the entire dataset.

      Here emp = number of employees (thousands) and gp = gross profit (millions). The ceo and cfo variables give the gender of the CEO and CFO of a specific company in a specific year, where 0 = male and 1 = female. What I would like to do now is create a box plot of the dataset, with the beta variable and the ceo and cfo variables. I guess in this case I will need two separate box plots, one that analyses beta and the ceo variable, and another one that analyses beta and the cfo variable.

      I already changed the beta variable into a numerical variable, denoted by nbeta.

      What I also would like to know are some "descriptive statistics" of the betas that correspond to the "0"s and "1"s of the ceo and cfo variables.

      Does someone also know what the best way would be for me to do a regression with the control variables? Basically I want to run a regression that looks like this:

      beta = a + b*gender(ceo/cfo)

      !and I want to include the control variables gp and emp in this!

      This would enable me to find out the coefficients and t-stats of the model.

      Thank you very much for any help,

      Konstantin
      Last edited by Konstantin Schmeisser; 19 Jun 2017, 04:07.
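
      The three requests above (box plots, descriptive statistics by gender, and a regression with controls) could be approached along the following lines. This is only a sketch: it enters a handful of rows from the listing above by hand so that it runs on its own, and the value label name gender is an assumption made for the example. Note that gp is 0 in every row shown, so in this small sample Stata will omit it from the regression; that would not happen in a dataset where gp varies.

      ```stata
      * A few rows copied from the listing above, so the example is self-contained
      clear
      input int year byte(ceo cfo) str4 ticker double(nbeta emp gp)
      2012 0 0 "CBG" 1.974823 37   0
      2009 0 1 "JLL" 1.947649 36.6 0
      2008 0 1 "JLL" 1.393131 36.2 0
      2008 1 0 "AET" 1.255258 35.5 0
      2007 1 0 "AET" .8773144 35.2 0
      2010 0 0 "GS"  .8921949 35.7 0
      end

      * Value labels so graphs and tables show male/female instead of 0/1
      * (the label name "gender" is hypothetical)
      label define gender 0 "male" 1 "female"
      label values ceo cfo gender

      * One box plot of beta per executive role
      graph box nbeta, over(ceo) title("Beta by CEO gender") name(box_ceo, replace)
      graph box nbeta, over(cfo) title("Beta by CFO gender") name(box_cfo, replace)

      * Descriptive statistics of beta for the 0s and 1s of each variable
      tabstat nbeta, by(ceo) statistics(n mean sd min p50 max)
      tabstat nbeta, by(cfo) statistics(n mean sd min p50 max)

      * beta = a + b*gender + controls, once per role
      regress nbeta i.ceo emp gp
      regress nbeta i.cfo emp gp
      ```

      In the regress output, the coefficient on 1.ceo (or 1.cfo) is the estimated difference in beta for female relative to male executives with emp and gp held fixed, and the t statistics are reported alongside it.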



      • #18
        Originally posted by Clyde Schechter
        It would have been more helpful had you posted an example of your actual data, rather than this schematic. I'll assume that Date is a Stata internal format numeric date, that ticker is a string variable, and that company_stock_return and sp_500_return are the names of your two return variables. Then you loop over firms, and loop over years within that, placing the regression coefficient in a variable beta as you go:

        Code:
        gen int year = yofd(Date)
        gen beta = .
        
        levelsof ticker, local(firms)
        foreach f of local firms {
            levelsof year if ticker == `"`f'"', local(years)
            foreach y of local years {
                regress company_stock_return sp_500_return if year == `y' & ticker == `"`f'"'
                replace beta = _b[sp_500_return] if year == `y' & ticker == `"`f'"'
            }
        }
        Note: Not tested; beware of typos.
        Dear Clyde: Wouldn't it be easier (and much faster) to use "statsby" as
        Code:
        statsby _b, by(firms year) saving("coef.dta", replace): regress company_stock_return sp_500_return
        merge m:1 using coef.dta
        Ho-Chuan (River) Huang
        Stata 19.0, MP(4)



        • #19
          Re #18:

          Yes -statsby- would accomplish the same thing. The merge back into the original data (which requires key variables to be specified) adds a bit of complication. But I'm pretty sure -statsby- would be slower, not faster. Perhaps not in every case, but most of the time. -statsby- is set up to cope with a broad range of situations, and it has overhead involved to detect and handle them.

          But, in any case, the -rangestat- solution proposed by Robert Picard in #16 is by far the simplest and the fastest way to do this. -rangestat- is a relatively new command (though I have been using it very often in my own work since it came out), and the (reg) specification was only very recently added in a new update. Since the (reg) option is new, and because my own uses of -rangestat- don't involve regressions, I had forgotten about it. But had I remembered it, the -rangestat- solution is definitely what I would have advised here.
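
          For completeness, the -statsby- route discussed above might look like the following minimal sketch on small synthetic data. It assumes the firm identifier is the variable ticker, as in the code in #16 (substitute your own key variables), and it uses a temporary file rather than a named .dta file; both choices are assumptions made for the example.

          ```stata
          * Small synthetic panel: 3 tickers, 2 years, 30 observations per ticker-year
          clear
          set seed 2017
          set obs 3
          gen ticker = string(_n)
          expand 2
          bysort ticker: gen int year = 2006 + _n
          expand 30
          gen company_stock_return = runiform()
          gen sp_500_return = runiform()

          * One regression per ticker-year; coefficients are saved to a temporary file
          tempfile coef
          statsby _b, by(ticker year) saving("`coef'"): ///
              regress company_stock_return sp_500_return

          * Merge the coefficients back; the key variables must be listed explicitly
          merge m:1 ticker year using "`coef'", assert(match) nogenerate

          * statsby names the slope _b_sp_500_return; beta is a hypothetical new name
          rename _b_sp_500_return beta
          ```

          The explicit key list in -merge- is the extra step mentioned above; without it the command is a syntax error.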



          • #20
            Got it, and thanks for your answer. I am starting to use -rangestat- now. And I should have added `firms year' as
            Code:
            merge m:1 firms year using coef.dta
            in my post above.

            Ho-Chuan (River) Huang
            Stata 19.0, MP(4)
