Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Moving average by groups

    Dear All,

    I got stuck on a seemingly very simple problem. More precisely, I want to calculate the average of a group's value and the respective value of the preceding group for that variable. Thereby, the value for each group ("threshold") is constant within the group ("country") but varies across groups.

    I tried:

    Code:
    bysort country: gen threshold_new = (threshold[_n-1]+threshold)/2
    However, I want threshold_new to be the same within each country, i.e. I want _n-1 to refer to the previous group and not to the previous observation. There might be a solution with "rangestat", but as I was having trouble installing the package, I was wondering if there is a workaround similar to the code above?

    Many thanks in advance.

  • #2
    You may try asrol
    Code:
    ssc install asrol
    
    * Example data
    webuse grunfeld
    
    bys company : asrol invest, stat(mean) window(year 2)
    list in 1/20
    
         +--------------------------------------------------------------+
         | company   year   invest   mvalue   kstock   time   invest_~2 |
         |--------------------------------------------------------------|
      1. |       1   1935    317.6   3078.5      2.8      1   317.60001 |
      2. |       1   1936    391.8   4661.7     52.6      2       354.7 |
      3. |       1   1937    410.6   5387.1    156.9      3       401.2 |
      4. |       1   1938    257.7   2792.2    209.2      4   334.15001 |
      5. |       1   1939    330.8   4313.2    203.4      5      294.25 |
         |--------------------------------------------------------------|
      6. |       1   1940    461.2   4643.9    207.2      6         396 |
      7. |       1   1941      512   4551.2    255.2      7   486.60001 |
      8. |       1   1942      448   3244.1    303.7      8         480 |
      9. |       1   1943    499.6   4053.7    264.1      9       473.8 |
     10. |       1   1944    547.5   4379.3    201.6     10      523.55 |
         |--------------------------------------------------------------|
     11. |       1   1945    561.2   4840.9      265     11   554.35001 |
     12. |       1   1946    688.1   4900.9    402.2     12   624.64999 |
     13. |       1   1947    568.9   3526.5    761.5     13       628.5 |
     14. |       1   1948    529.2   3254.7    922.4     14   549.05002 |
     15. |       1   1949    555.1   3700.2   1020.1     15   542.14999 |
         |--------------------------------------------------------------|
     16. |       1   1950    642.9   3755.6     1099     16         599 |
     17. |       1   1951    755.9     4833   1207.7     17   699.40002 |
     18. |       1   1952    891.2   4924.9   1430.5     18   823.55002 |
     19. |       1   1953   1304.4   6241.7   1777.3     19      1097.8 |
     20. |       1   1954   1486.7   5593.6   2226.3     20     1395.55 |
         +--------------------------------------------------------------+
    If you are facing a problem in installing asrol, please provide details.
    Regards
    --------------------------------------------------
    Attaullah Shah, PhD.
    Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
    FinTechProfessor.com
    https://asdocx.com
    Check out my asdoc program, which sends outputs to MS Word.
    For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

    Comment


    • #3
      please post a data example using dataex, also did you install rangestat by typing,
      Code:
      ssc install rangestat
      ?

      Comment


      • #4
        Hi, thanks for your answers.

        @Oyvind: Rangestat installation worked now (had to do it offline), and here is a simple data example:

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input float(country risk threshold)
        1 .01 .2
        1 .02 .2
        1 .03 .2
        2 .01 .4
        2 .02 .4
        2 .03 .4
        3 .01 .7
        3 .02 .7
        3 .03 .7
        end
        But if I now do
        Code:
        rangestat (mean) threshold, interval(country -1 1) by(country)
        I obtain the moving average within each group. However, I would like to obtain the moving average across group (always the average of n and n-1). That is, "threshold_mean" should be 0.2 for country 1, 0.3 for country 2, and 0.55 for country 3.

        @ Attaullah: Thanks for the suggestion. Wouldn't I run into the same issue with asrol as I did with rangestat above?

        Comment


        • #5
          there are probably better solutions but if threshold is constant within country,
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input float(country risk threshold)
          1 .01 .2
          1 .02 .2
          1 .03 .2
          2 .01 .4
          2 .02 .4
          2 .03 .4
          3 .01 .7
          3 .02 .7
          3 .03 .7
          end
          
          tempfile temp
          save `temp'
          
          collapse threshold, by(country)
          gen wanted = cond(_n>1, (threshold[_n-1] + threshold)/2, threshold)
          merge 1:m country using `temp', nogen

          Comment


          • #6
            Code:
              
             rangestat (mean) threshold, interval(country -1 1) by(country)

            won't, I guess, help at all. Even if country is numeric then what this would do is something like averaging over countries 12, 13, 14, given country 13, except that by(country) will enforce separate calculations. Why would country 12 be the previous group for 13, and so forth, and if you want to use the previous group why use the following group too.

            Also, this code says nothing whatsoever about values previous or following in time.

            I did not post a positive answer earlier because I didn't understand what you want to do, and that is still true.

            @Attaullah Shah's interpretation of your question is that for each country you want the average of the previous value (if available) and the present value. Your wording sounds quite different from that, except that I cannot follow what your wording means.

            Comment


            • #7
              Nick Cox Thanks for your response. Starting from the simple example I provided above, I need a moving average for the "threshold" variable over groups, whereby threshold is always constant within a group but differs across groups.

              @Ovind Snilsberg Many thanks for the help, this code gives me the output I need.

              Comment


              • #8
                I am glad to hear that #5 solves your problem but remain completely clueless about how countries can be put in an order here. Your "simple" problem has three different reactions from three experienced people: I don't understand it; Attaullah Shah reads it as being about averages in timel and Oyvind Snilsberg apparently guessed right.
                Last edited by Nick Cox; 24 Jan 2022, 06:51.

                Comment


                • #9
                  I am sorry if the wording of my example has created confusion. In fact, "country" was just an example name for a numerical variable that came up to my mind, you could also name the variable "risk_level" or whatever other name. I was more focusing on the problem of having different groups, for which I want to generate a moving average that is constant within a group but differs across groups (i.e. where we calculate an average for the "threshold" value of 2 groups).

                  Hope it helps.

                  Comment


                  • #10
                    #9 certainly helps. Thanks.

                    Comment


                    • #11
                      I am happy that I could clarify this. Working in the same dataset, I was encountering another technical issue. Please let me know in case this should be posted in a different/new thread.

                      In the following, I provide a data example based on that from before (I replaced "country" by "risk_level"):

                      Code:
                      * Example generated by -dataex-. For more info, type help dataex
                      clear
                      input float(risk_level risk_aversion capital minimum_capital)
                      1 .01   44 12.5
                      1 .02   34 12.5
                      1 .03 12.5 12.5
                      1 .04 12.5 12.5
                      1 .05   19 12.5
                      2 .01   20   13
                      2 .02   21   13
                      2 .03   13   13
                      2 .04   13   13
                      2 .05   13   13
                      3 .01   36   18
                      3 .02   18   18
                      3 .03   28   18
                      3 .04   18   18
                      3 .05   39   18
                      end


                      Here, my aim is to create a variable "riskaversion_min" that equals the risk_aversion value of the observation with the minimum value for "capital" within each risk-level. As you can see, for each risk level/within each group there are >1 observations that meet the minimum capital-value requirement.

                      I am now wondering, if there is a possibility to always choose the value for risk_aversion, such that the distance between minimum "risk_aversion" values of two neighbouring groups/risk-levels is minimised.

                      So, in this example "riskaversion_min" should equal 0.3 for risk-level 1 and 0.3 for risk-level 2, since these are the values for risk_aversion that correspond to observations that have a minimum for "capital" within their group, but at the same time minimise the distance between the "risk_aversion" values of two neighbouring groups.

                      This code

                      Code:
                      bysort risk_level: gen riskaversion_min = cond(min(risk_aversion[_n-1] - risk_aversion[_n]) & capital==minimum_capital, risk_aversion, .)
                      is technically wrong, but hopefully gives you an idea of what I have in mind.

                      Thanks a lot!
                      Last edited by Roman von Vorgestern; 24 Jan 2022, 08:29.

                      Comment

                      Working...
                      X