
  • calculating a dummy for extreme returns

    I have the following data for each of the 12 Fama-French industries, with one return per industry per quarter. I would like to create a dummy variable for extreme returns in each quarter, where an extreme return is either the lowest or the highest. So, in each quarter, the industries with the highest and the lowest return should get the value 1 on the extreme dummy, and all other industries should get 0.

    FF12 qrt returns max_return min_return
    11 2011q1 -6242.417 86014.92 -6242.417
    10 2011q1 -1363.764 86014.92 -6242.417
    1 2011q1 -1015.994 86014.92 -6242.417
    7 2011q1 1189.521 86014.92 -6242.417
    2 2011q1 1588.364 86014.92 -6242.417
    8 2011q1 1696.335 86014.92 -6242.417
    5 2011q1 1937.876 86014.92 -6242.417
    9 2011q1 3223.934 86014.92 -6242.417
    6 2011q1 7801.93 86014.92 -6242.417
    12 2011q1 24711.94 86014.92 -6242.417
    3 2011q1 33777.23 86014.92 -6242.417
    4 2011q1 86014.92 86014.92 -6242.417
    11 2011q2 -11506.95 63269.62 -11506.95
    10 2011q2 -1200.312 63269.62 -11506.95
    1 2011q2 239.7879 63269.62 -11506.95
    7 2011q2 830.6815 63269.62 -11506.95
    2 2011q2 1393.649 63269.62 -11506.95
    5 2011q2 1508.391 63269.62 -11506.95
    8 2011q2 1987.859 63269.62 -11506.95
    6 2011q2 2698.404 63269.62 -11506.95
    9 2011q2 4551.7 63269.62 -11506.95
    12 2011q2 19606.52 63269.62 -11506.95
    3 2011q2 45043.75 63269.62 -11506.95
    4 2011q2 63269.62 63269.62 -11506.95


    I tried the following commands:
    gen extreme = .
    replace extreme = 1 if (returns == max_return)
    replace extreme = 1 if (returns == min_return)

    However, I then get a variable extreme in which every observation has the value . (missing).

  • #2
    Code:
    * within each quarter, sort by returns: returns[1] is the lowest, returns[_N] the highest
    by qrt (returns), sort: gen byte extreme_return ///
        = inlist(returns, returns[1], returns[_N])
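    The one-liner above assumes that returns has no missing values within a quarter. Because Stata sorts missing values after all numbers, a missing return would become returns[_N], so the missing observation would be flagged and the true highest return would not. Below is a minimal sketch of one way to guard against that. It uses a few rows from the posted listing plus one made-up missing value (FF12 2 in 2011q1 is set to missing purely for illustration); the helper variable miss_return is hypothetical.

```stata
* Hypothetical sketch: flag lowest/highest return per quarter, ignoring missings
clear
input byte FF12 str6 qrt double returns
11 "2011q1" -6242.417
 1 "2011q1" -1015.994
 4 "2011q1" 86014.92
 2 "2011q1" .
11 "2011q2" -11506.95
 1 "2011q2" 239.7879
 4 "2011q2" 63269.62
end

* put observations with a missing return in their own by-group, so that
* returns[_N] in the non-missing group is the real highest return
gen byte miss_return = missing(returns)
by qrt miss_return (returns), sort: gen byte extreme_return ///
    = !miss_return & inlist(returns, returns[1], returns[_N])
drop miss_return

list, sepby(qrt)
```

    Here FF12 11 and 4 are flagged in both quarters, and the observation with the missing return gets 0 rather than 1.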
    Regarding the code you attempted, I cannot replicate the problem you report. I imported your data into Stata using the Data Editor's Paste Special function, and the code you tried produces 1s in the appropriate observations. I suspect, however, that the example data you provided is not an exact replica of the real data set as Stata sees it internally. It looks like -list- output, and -list- takes certain steps to make the listing more appealing to human eyes; that beautification comes at the cost of precision. I suspect that in your real data, whatever you did to calculate max_return and min_return introduced some rounding error, so that none of the values of returns exactly matches max_return or min_return. (In general, in all software, not just Stata, relying on exact equality of floating-point numbers is hazardous.)

    Verifying this would require an exact replica of the data as Stata sees it, which you can post here with the -dataex- command. If you are running version 18, 17, 16, or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also lets those who want to help you create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
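    On the rounding-error point: the original post does not say how max_return and min_return were created, so the following is only an illustration of one common way an exact-equality test can fail in Stata, under the assumption that returns is stored as double while a derived variable was created with -generate- under the default float storage type (-set type float-). The variable max_copy is hypothetical.

```stata
* Hypothetical illustration: returns is double, a derived copy is float
clear
input byte FF12 str6 qrt double returns
11 "2011q1" -6242.417
 1 "2011q1" -1015.994
 4 "2011q1" 86014.92
end

gen max_copy = 86014.92                // default storage type is float
count if returns == max_copy           // 0: the float copy is a different number
count if float(returns) == max_copy    // 1: comparing at float precision matches

* A safer route: compute the extremes from returns itself, stored as double
bysort qrt: egen double max_return = max(returns)
bysort qrt: egen double min_return = min(returns)
gen byte extreme = (returns == max_return) | (returns == min_return)
```

    Because max_return and min_return are then exact copies of values of returns, the equality test is reliable, and every observation gets 0 or 1.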

    In fact, the vast majority of coding problems require for their solution a proper replica of the data as Stata sees it. That is why it is strongly recommended that you always show example data and always use -dataex- to do that when asking for help with code.
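    For concreteness, here is a minimal sketch of a -dataex- call. The three input rows are copied from the listing in the question only so the block runs on its own; with your real data in memory you would skip the -input- step and name your own variables and observation range.

```stata
* Hypothetical sketch: a few rows in memory, then -dataex- prints them
clear
input byte FF12 str6 qrt double returns
11 "2011q1" -6242.417
10 "2011q1" -1363.764
 4 "2011q1" 86014.92
end

* With the real data, something like:
*   dataex FF12 qrt returns max_return min_return in 1/24
dataex FF12 qrt returns in 1/3
```

    -dataex- prints a block of Stata code that recreates the selected observations exactly, including storage types and formats; that block is what you paste into your post.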

    Finally, let me point out that it is a bad practice in Stata to create indicator variables like this with values 1 and missing. That is a recipe for errors later on. A much better practice is to create such variables with values 1 (yes) and 0 (no), which is what my code here does. Use missing values only when the correct yes/no status cannot be ascertained from the data.
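    A small illustration of why 1/missing coding causes trouble: the same four observations coded both ways. The variable names extreme_1miss and extreme_01 are hypothetical.

```stata
* Hypothetical illustration: one indicator coded 1/missing, one coded 1/0
clear
input byte extreme_1miss byte extreme_01
1 1
. 0
. 0
1 1
end

summarize extreme_1miss       // mean 1 over 2 obs: the "no" cases silently drop out
summarize extreme_01          // mean .5 over 4 obs: the share of extreme observations
count if extreme_1miss > 0    // 4: Stata treats missing as larger than any number
count if extreme_01 > 0       // 2
```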
    Last edited by Clyde Schechter; 03 Apr 2024, 13:53.
