  • Moving Median

    I am trying to compute a moving median of profit. Within ticker (date), egen medprofit=median(profit) won't do, as it computes the median of all the observations within the group. mvsumm profit, force end stat(med) window(30) generate(medprofit) is also a no-go, as it requires a time variable without gaps and I do not wish to fill the gaps using tsfill. I would like the median profit within ticker (date) for (1) the non-missing values among the last 30 observations, whether or not profit is missing in some of them, and (2) the last 30 non-missing values.

  • #2
    rangestat (SSC) can do what you want, I think, except that I don't get the difference between (1) and (2) as missings will always get ignored.

    • #3
      Hey Nick, thanks for the help. (1) would generate a median over the last 30 observations, regardless of whether profit is missing for some of the dates, whereas (2) would generate a median over the last 30 non-missing profit values, so it would reach back beyond the 30-observation window if some of the values were missing.

      • #4
        This may help:

        Code:
        . clear
        
        . set obs 10 
        number of observations (_N) was 0, now 10
        
        . gen t = _n 
        
        . set seed 2803 
        
        . gen foo = runiformint(1, 10) if mod(_n, 2) 
        (5 missing values generated)
        
        . gen counter = sum(!missing(foo))  
        
        . 
        . rangestat (median) m1=foo (count) c1=foo, int(t -4 0) 
        
        . rangestat (median) m2=foo (count) c2=foo, int(counter -4 0)
        
        . 
        . list 
        
             +-----------------------------------------+
             |  t   foo   counter    m1   c1   m2   c2 |
             |-----------------------------------------|
          1. |  1     6         1     6    1    6    1 |
          2. |  2     .         1     6    1    6    1 |
          3. |  3     2         2     4    2    4    2 |
          4. |  4     .         2     4    2    4    2 |
          5. |  5     6         3     6    3    6    3 |
             |-----------------------------------------|
          6. |  6     .         3     4    2    6    3 |
          7. |  7     1         4     2    3    4    4 |
          8. |  8     .         4   3.5    2    4    4 |
          9. |  9     3         5     3    3    3    5 |
         10. | 10     .         5     2    2    3    5 |
             +-----------------------------------------+

        • #5
          Thanks Nick. Is there any way to make the median refer to the previous five observations, instead of the current observation and the previous four?

          • #6
            Do please refer to the help. int(time -5 -1) would be some syntax.

            • #7
              ok. I entered the following code:

              Code:
              egen tickergroup=group(ticker)
              bysort tickergroup: gen t = _n
              by tickergroup: gen counter = sum(!missing(profit))  
              rangestat (median) medprofit= profit(count) c1profit=profit, int(t -35 -1)
              I got the following error message: "factor variables and time-series operators not allowed". Profit is definitely not one of these. Ticker was a categorical variable, and tickergroup would be a factor variable. However, I really need to use by with my code, and I don't know how to proceed.

              • #8
                set trace on would give you (and us) a precise idea of what is failing where, particularly as you don't give a data example. (Please refer to FAQ Advice #12.)

                But it is easy to guess what is happening.


                rangestat isn't as flexible with spurious spaces as you're implying there. The syntax really is as documented: new_varname=varname. My guess is that the error message is a side-effect of confusion given your syntax.

                The help for rangestat does explain, more than once and with many examples, that you need by() here.

                by(varlist) groups observations, so that statistics are generated using only observations within the same
                group. For example, this option should be specified when you wish calculations to be restricted to
                given panels or given times for panel or longitudinal data.
                This works for example:

                Code:
                clear
                set obs 10
                gen t = _n
                set seed 2803
                gen foo = runiformint(1, 10) if mod(_n, 2)
                gen counter = sum(!missing(foo))  
                rangestat (median) m1=foo (count) c1=foo, int(t -5 -1)
                rangestat (median) m2=foo (count) c2=foo, int(counter -5 -1)
                expand 2
                gen group = _n > 10
                rangestat (median) M2=foo (count) C2=foo, int(counter -5 -1) by(group)
                
                list, sepby(group)
                
                     +-----------------------------------------------------------+
                     |  t   foo   counter    m1   c1   m2   c2   group   M2   C2 |
                     |-----------------------------------------------------------|
                  1. |  1     6         1     .    .    .    .       0    .    . |
                  2. |  2     .         1     6    1    .    .       0    .    . |
                  3. |  3     2         2     6    1    6    1       0    6    1 |
                  4. |  4     .         2     4    2    6    1       0    6    1 |
                  5. |  5     6         3     4    2    4    2       0    4    2 |
                  6. |  6     .         3     6    3    4    2       0    4    2 |
                  7. |  7     1         4     4    2    6    3       0    6    3 |
                  8. |  8     .         4     2    3    6    3       0    6    3 |
                  9. |  9     3         5   3.5    2    4    4       0    4    4 |
                 10. | 10     .         5     3    3    4    4       0    4    4 |
                     |-----------------------------------------------------------|
                 11. |  1     6         1     .    .    .    .       1    .    . |
                 12. |  2     .         1     6    1    .    .       1    .    . |
                 13. |  3     2         2     6    1    6    1       1    6    1 |
                 14. |  4     .         2     4    2    6    1       1    6    1 |
                 15. |  5     6         3     4    2    4    2       1    4    2 |
                 16. |  6     .         3     6    3    4    2       1    4    2 |
                 17. |  7     1         4     4    2    6    3       1    6    3 |
                 18. |  8     .         4     2    3    6    3       1    6    3 |
                 19. |  9     3         5   3.5    2    4    4       1    4    4 |
                 20. | 10     .         5     3    3    4    4       1    4    4 |
                     +-----------------------------------------------------------+
                By extension something like this should work with your data (surely you have a time variable already?)

                Code:
                rangestat (median) medprofit=profit (count) c1profit=profit, int(time -35 -1) by(ticker)
                I note that you're now using 35 not 30 as the number in the window and presume that's deliberate.


                Note: compare your last thread https://www.statalist.org/forums/for...other-variable where you promised to use dataex next time!





                Last edited by Nick Cox; 30 Jun 2018, 03:13.

                • #9
                  Thanks Nick. Sorry for not using dataex. I wasn't sure if it applied to this question, but I see how it is helpful to always provide examples. I made the last correction you stated, but I still get the same error. With set trace on:
                  rangestat (median) medmoneylastmedcombined= moneylastmedcombined(count) c1moneylastmedcombined=moneylastmedcombined, int(date -35 -1) by(ticker)
                  = if "=" == "=" {
                  - if !`: list statword in single_stats' {
                  = if !1 {
                  dis as error "new_varname=varname syntax is restricted to built-in single stats"
                  exit 198
                  }
                  - confirm name `what'
                  = confirm name medmoneylastmedcombined
                  - local vres`n' `vres`n'' `what'
                  = local vres1 medmoneylastmedcombined
                  - gettoken v slist: next
                  - unab v : `v'
                  = unab v : moneylastmedcombined(count)
                  --------------------------------------------------------------------- begin unab ---
                  - version 6
                  - gettoken user 0: 0, parse(" :")
                  - gettoken colon 0: 0, parse(" :")
                  - if `"`colon'"' != ":" { error 198 }
                  = if `":"' != ":" { error 198 }
                  - syntax [varlist(default=empty)] [, MIN(integer 1) MAX(integer 32767) NAME(string)]
                  factor variables and time-series operators not allowed
                  ----------------------------------------------------------------------- end unab ---
                  I am not sure how to read this, so I am hoping for advice.
                  Last edited by michael joe; 30 Jun 2018, 06:07.

                  • #10
                    It's just more sloppy syntax. Syntax diagrams are intended to be taken literally unless and until a user knows better. The trace shows that Stata thinks that you intended

                    Code:
                    moneylastmedcombined(count)
                    to be regarded as a variable name. Stata is not seeing (count) separately as is needed for the syntax to work. The space after the variable name is essential. So, try

                    Code:
                    rangestat (median) medmoneylastmedcombined=moneylastmedcombined (count) c1moneylastmedcombined=moneylastmedcombined, int(date -35 -1) by(ticker)
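
Pulling the thread together, here is a minimal sketch of both variants from post #1 for panel data. It assumes the variable names ticker, date, and profit from that post and a 30-observation window that excludes the current observation. The names obsno, counter, med1, n1, med2, and n2 are made up for illustration. This is one possible arrangement, not tested on the original poster's data.

```stata
* Assumes rangestat is installed (ssc install rangestat).
* Assumed variable names from post #1: ticker, date, profit.
* obsno, counter, med1, n1, med2, n2 are illustrative names only.

* observation number within each ticker, in date order
bysort ticker (date): gen long obsno = _n

* running count of non-missing profit values within each ticker
by ticker: gen long counter = sum(!missing(profit))

* (1) median over the previous 30 observations; missing values are ignored
rangestat (median) med1=profit (count) n1=profit, interval(obsno -30 -1) by(ticker)

* (2) median over previous non-missing values, windowed on the counter
rangestat (median) med2=profit (count) n2=profit, interval(counter -30 -1) by(ticker)
```

One point to check against the listing in post #8: in an observation where profit is missing, counter has the same value as in the most recent non-missing observation, so a window ending at counter - 1 leaves that most recent value out (rows 2, 4, 6, ... in that listing). If such rows matter, the window for them needs to end at the current counter value instead.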

                    • #11
                      Ahhh. Thanks a bunch. Finally worked.
