
  • Egen xtile & portfolio sorting

    Hi everybody,
    this is the dataset that I have:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long permno float date double Return long numboftrades float(MeR RF MEt Exret) double firstbeta float(monthidio dummyendofmonth)
    10000  9527 -.014085 .  .0092 .00025     16100 -.014335  2.5143529763022494   .06413759 1
    10000  9555        0 .  .0019 .00028     11960  -.00028   .9500515549985641   .03131281 1
    10000  9586  .007092 .  .0006  .0003     16330  .006792    .975191955026165   .04484615 1
    10000  9616 -.015385 .  -.019 .00024     15172 -.015625   .5980838126958284  .013047387 1
    10000  9646  .015306 . -.0013 .00023 11793.878  .015076   .2896041894082146   .03917417 1
    10000  9677  .010204 .  .0052 .00025 11734.594  .009954  .38846733224656743  .019607043 1
    10000  9708  .096386 . -.0012 .00024 10786.344  .096146   .5768399748387934   .04623256 1
    10000  9737        0 .  .0004 .00022 4148.5938  -.00022   .3954379265009734     .096399 1
    10000  9769  .015385 .  .0065 .00021  3911.531  .015175   .5341150474454677   .04805862 1
    10000  9800        0 .  .0005  .0002  3002.344   -.0002   .6142194073333234   .04291996 1
    10000  9828        0 .  .0019 .00021  3182.504  -.00021   .5121709319162139   .03591325 1
    10000  9861        0 . -.0037 .00022  1981.566  -.00022   .6205842421862775   .04912256 1
    10000  9891        0 . -.0002  .0002 1581.5313   -.0002   .4051215011366245   .03017019 1
    10000  9919 -.071429 .  .0037 .00023 1581.5313 -.071659   .3677733912958665  .025082354 1
    10000  9951 -.111111 .  .0069 .00021    973.25 -.111321  .01944200731215832   .05910159 1
    10000  9981  .153846 .  .0112 .00021  912.4413  .153636  .10909586043253999   .04275477 1
    10000 10010        0 . -.0004 .00019  851.5938  -.00019 -.20707865128488628   .01545156 1
    10000 10024        . .  .0086 .00022         .        .                   . .0011192125 1
    10001  9527  .010309 .  .0092 .00025  6033.125  .010059    .807071924428063  .011264178 1
    10001  9555 -.019608 .  .0019 .00028   6156.25 -.019888    .272703714704386   .00933914 1
    end
    format %td date
    My aim is to sort stocks (identified by permno) into 10 portfolios based on the volatility of their residuals (monthidio), repeating the sort at the end of every month. I call the variable "monthidio" because I have already reduced the complete dataset to one that contains only the end-of-month dates (as you can see). To sort stocks into 10 portfolios based on monthidio on every date, I run:

    egen voladecile= xtile(monthidio), by(date) nq(10)

    which gives me the error message "too many values". I find this very strange, because the same command worked on a similar dataset that contained even more values for both date and monthidio. Do you have any idea why this happens and how to solve it?

    I have already read https://www.stata.com/statalist/arch.../msg00365.html and https://www.statalist.org/forums/for...y-values-error but they don't seem to fit my case exactly, because I would like to avoid loops and because I have missing values in monthidio, so my situation differs from those two cases.
    Thank you in advance

  • #2
    I wanted to add that a very mechanical solution I found is to split the dataset into parts (for example, 3) by year (e.g. the first from 1963 to 1982, the second from 1983 to 2000, and so on) and then simply append the results (permno and date together are unique identifiers). However, I don't find this an elegant solution, and I hope someone can help me do it all at once.
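
    One loop-free alternative, sketched here under assumptions rather than as a replacement for xtile(): assign groups from within-date ranks using only built-in commands. The variable names n_valid and voladecile_rank are hypothetical, and ties and bin boundaries are handled differently from xtile(), so the groups need not match exactly.

```stata
* Sketch only: rank-based deciles within each date, no egenmore needed.
* n_valid and voladecile_rank are hypothetical names, not from the thread.

* Number of non-missing monthidio values on each date
egen long n_valid = count(monthidio), by(date)

* Sort within date by monthidio; missing values sort last,
* so non-missing observations get _n = 1, ..., n_valid
bysort date (monthidio): gen byte voladecile_rank = ///
    ceil(10 * _n / n_valid) if !missing(monthidio)

drop n_valid
```

    Observations with missing monthidio are left with a missing group, and dates with fewer than 10 non-missing values will not fill all 10 groups.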



    • #3
      The xtile() function of egen is community-contributed (ssc install egenmore), as you're asked to explain (FAQ Advice #12).

      To get more information on why it's failing you need something like

      Code:
      set trace on 
      set traced 2
      before your egen call. I don't have any guesses as to what is wrong here.



      • #4
        --------------------------------------------------------------------------------------------------------------------------------- begin egen ---
        - version 6, missing
        - local cvers = _caller()
        - gettoken type 0 : 0, parse(" =(")
        - gettoken name 0 : 0, parse(" =(")
        - if `"`name'"'=="=" {
        = if `"="'=="=" {
        - local name `"`type'"'
        = local name `"voladecile"'
        - local type : set type
        - }
        - else {
        gettoken eqsign 0 : 0, parse(" =(")
        if `"`eqsign'"' != "=" {
        error 198
        }
        }
        - confirm new variable `name'
        = confirm new variable voladecile
        - gettoken fcn 0 : 0, parse(" =(")
        - gettoken args 0 : 0, parse(" ,") match(par)
        - if "`c(adoarchive)'"=="1" {
        = if ""=="1" {
        capture qui _stfilearchive find _g`fcn'.ado
        if _rc {
        di as error "unknown egen function `fcn'()"
        exit 133
        }
        }
        - else {
        - capture qui findfile _g`fcn'.ado
        = capture qui findfile _gxtile.ado
        - if (`"`r(fn)'"' == "") {
        = if (`"c:\ado\plus/_/_gxtile.ado"' == "") {
        di as error "unknown egen function `fcn'()"
        exit 133
        }
        - }
        - if `"`par'"' != "(" {
        = if `"("' != "(" {
        exit 198
        }
        - if `"`args'"' == "_all" | `"`args'"' == "*" {
        = if `"monthidio"' == "_all" | `"monthidio"' == "*" {
        version 7.0, missing
        unab args : _all
        local args : subinstr local args "`_sortindex'" "", all word
        version 6.0, missing
        }
        - syntax [if] [in] [, *]
        - if _by() {
        local byopt "by(`_byvars')"
        local cma ","
        }
        - else if `"`options'"' != "" {
        = else if `"by(date) nq(10)"' != "" {
        - local cma ","
        - }
        - tempvar dummy
        - global EGEN_Varname `name'
        = global EGEN_Varname voladecile
        - version 7.0, missing
        - global EGEN_SVarname `_sortindex'
        = global EGEN_SVarname __000000
        - version 6.0, missing
        - if ("`fcn'" == "mode" | "`fcn'" == "concat") {
        = if ("xtile" == "mode" | "xtile" == "concat") {
        local vv : display "version " string(`cvers') ", missing:"
        }
        - capture noisily `vv' _g`fcn' `type' `dummy' = (`args') `if' `in' `cma' `byopt' `options'
        = capture noisily _gxtile float __000001 = (monthidio) , by(date) nq(10)
        too many values
        - global EGEN_SVarname
        - global EGEN_Varname
        - if _rc { exit _rc }
        ----------------------------------------------------------------------------------------------------------------------------------- end egen ---

        Thank you, Nick. This is what I get (sorry, but I couldn't find a way to post the output the way dataex does for the dataset). The error is on the fourth-to-last line. Unfortunately, I am not an expert in Stata code of this type. Do you have an idea of what is going on? I mean, you have already said you have no guesses; have you already looked at the code?



        • #5
          Yes, I looked at the code before posting my previous reply.

          The way to show code properly is to use CODE delimiters. This is explained in the FAQ Advice, tweaked slightly here:

          12.3 How to use CODE delimiters



          Stata code (i.e. the exact commands issued) is very much easier to read if presented as such.

          When you are editing an answer you should see a # button in the toolbar above the text area. Click on # to insert beginning and end CODE mark-up. Write your code between, paying particular attention to linebreaks and indentation.

          If you do not see that button, then click on the “Toggle Advanced Editor” button (an underlined A) in the area above to show the toolbar.

          If you do not have access to the Advanced Editor in your interface, you can just insert those mark-ups manually before, or indeed after, you insert your code. Many people fast at typing do that any way.

          Examples of your data (or of realistic similar datasets) are also much easier to read if presented as CODE. dataex, explained just above, automatically generates text including CODE delimiters, which can be copied and pasted into Statalist posts.

          What is valuable with presenting code or data as CODE is that other members can easily copy and paste what you post to play with in their Stata installation.
          You need

          Code:
          set traced 3
          as you seem to be running code from a script and we only need to see an expansion of

          Code:
          - capture noisily `vv' _g`fcn' `type' `dummy' = (`args') `if' `in' `cma' `byopt' `options'
          = capture noisily _gxtile float __000001 = (monthidio) , by(date) nq(10)
          too many values
          - global EGEN_SVarname
          - global EGEN_Varname
          - if _rc { exit _rc }
