Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating the standard deviation for a defined time window before a certain date

    Hello statalist community,

    I am working on my thesis and have encountered the following issue when preparing my dataset:

    I have merged daily return data from CRSP with quarterly reporting data from Compustat. The quarterly data was made public on the date indicated by the reporting_date, but the data published in that report obviously corresponds to a date previous to that. Daily return data ranges over many years, for many different stocks. I now need to calculate the standard deviation of the excess returns, in each case for a 45-day period, ending 10 days before the reporting dates. I would like to do this for every reporting_date for every permno. Please see an example of my data below:

    input double permno long(date reporting_date) float excess_return
    10010 10156 . -.02708268
    10010 10157 . .005845358
    10010 10160 . -.02454966
    10010 10161 . -.05564585
    10010 10162 . -.03952751
    10010 10163 . -.002447468
    10010 10164 . .0455881
    10010 10165 10245 .
    10010 10167 . .06420771
    10010 10168 . -.15954015
    10010 10169 . .13668042
    10010 10170 . .01815607
    10010 10171 . .01033406
    10010 10174 . -.04951261
    10010 10175 . .13724907
    10010 10176 . -.08172845
    10010 10177 . .05281845
    10010 10178 . .04306433
    10010 10181 . -.002760191
    10010 10182 . -.02001336
    10010 10183 . -.008839875
    10010 10184 . .01964553
    10010 10185 . -.07655262
    10010 10188 . -.15768693
    10010 10189 . -.01353516
    10010 10190 . .005196954
    10010 10192 . .056017
    10010 10195 . .16974324
    10010 10196 . -.005207345
    10010 10197 . .1496127
    10010 10198 . -.07011753
    10010 10199 . .08459848
    10010 10202 . -.01641954
    10010 10203 . -.05716385
    10010 10204 . .05511259
    10010 10205 . -.017439324
    10010 10206 . .02835849
    10010 10209 . -.05973966
    10010 10210 . -.003885546
    10010 10211 . -.01931514
    10010 10212 . .08233552
    10010 10213 . .010260553
    10010 10216 . -.003440613
    10010 10217 . .0003619183
    10010 10218 . -.04361133
    10010 10219 . .065786
    10010 10223 . -.05249239
    10010 10224 . .02006706
    10010 10225 . -.01194168
    10010 10226 . .06602075
    10010 10230 . -.06364613
    10010 10231 . -.04398114
    10010 10232 . -.03553922
    10010 10233 . .02590137
    10010 10234 . .10373606
    10010 10237 . -.02420747
    10010 10238 . -.02279297
    10010 10239 . -.002029145
    10010 10240 . .015212037
    10010 10241 . -.03876884
    10010 10244 . -.0003627767
    end
    format %td date
    format %td reporting_date

    I have tried various code in an attempt to generate my return volatility variable RETVOL, for example:
    egen RETVOL = std(excess_return) if date >= (reporting_date - 55) & date < (reporting_date - 10), by(permno reporting_date)
    However, I can't seem to get it to work. Would love some help.

    Thank you in advance,
    Benedikt

  • #2
    (1) If I understand correctly reporting date is almost always missing in your data example.

    For any observation with missing reporting date, your code boils down to

    [CODE]egen RETVOL = std(excess_return) if date >= . & date < . , by(permno) /CODE]

    (2) date is not ever missing in your example

    (3) More fundamentally, date can't be BOTH >= . AND < . in a given observation. Those conditions are disjoint. So, I don't see that you are specifying any observations for Stata to use.

    Otherwise put, Stata is going to evaluate your if condition separately in each observation, which does not help. For the one observation in the data example with both dates, excess return is missing too, so no joy there.

    So, where to start: date and reporting date must both be non-missing in each observation as otherwise the comparison between them is not helpful. The by() option won't make Stata look forward or backward in each stock to compare dates within the stock.

    The problem I guess lies upstream. Your merge didn't do what you need.

    Comment


    • #3
      Thanks for your reply. I guess I didn't explain the data very well. The excess return data is daily, and the reporting date is quarterly, so its true that for any stock for any given year, I have around 250 observations for excess return, and only 4 for the reporting date.

      I am trying to create a return volatility variable, where I see how volatile each stocks excess returns (relative to an index) were in the build up to their quarterly financial reporting date.

      I am not interested in the observations where I do not have a reporting date. I am only interested in the standard deviation of the excess returns from 55 days to 10 days prior to the reporting date.

      In the example I posted above, the reporting date is 10245, so I want to take the standard deviation of the excess returns that correspond to the dates 10190 (inclusive) all the way to 10235 (not inclusive).

      Ideally I would have a code that does this around every reporting date. I hope that's more clear.

      Thanks,
      Benedikt

      Comment


      • #4
        Your data organization is a bit unorthodox and confusing, but I think I understand what you want. Try this:
        Code:
        gen lb = reporting_date - 55
        gen ub = reporting_date - 10
        rangestat (sd) wanted = excess_return, by(permno) interval(date lb ub)
        replace wanted = . if missing(reporting_date)
        -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer. It is available from SSC.

        If this isn't what you had in mind, please handwork the calculation for a few return date/permno pairs, edit the results into a new example, run -dataex-, and show us what you have in mind.

        Comment


        • #5
          That does in fact do exactly what I was looking for, thank you!

          Comment

          Working...
          X