Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Geometric mean in the row

    Hello,

    I'm looking around since few hours but I didn't find a solution.

    I would like to calculate the Geometric Mean in the row.
    I tried first to calculate the products of the row with egen newvar = rprod(varlist) and that worked.
    Now I need to calculate for the product of every row the nth root. I have 7 variables so I need the 7th root of the products.
    I didn't find any solution to calculate the nth root.
    I Hope someone can help me

    Thank you

    Lorenzo

  • #2
    You can use the rprod() function here but you need to document for Statalist where it comes from. See FAQ Advice #12.

    dm87 from http://www.stata.com/stb/stb60
    STB-60 dm87. Calculating the row product of observations / STB insert by
    Philip Ryan, University of Adelaide, Australia / Support:
    [email protected] / After installation, see help rprod


    I think this yields to a loop and standard functions. This is a self-contained example with an independent check.

    Code:
    clear
    set obs 10
    set seed 2803
    forval j = 1/5 {
          gen y`j' = ceil(100 * (runiform()^2))
    }
    
    list
    
         +-------------------------+
         | y1   y2   y3    y4   y5 |
         |-------------------------|
      1. | 86   63   45     8    1 |
      2. | 12   40   73   100    4 |
      3. | 60    1   74    61    4 |
      4. |  2    1    4     2   54 |
      5. | 12    1   22    22    4 |
         |-------------------------|
      6. |  1    7   15    84   14 |
      7. |  4    1   12    94    7 |
      8. | 40    2   15     2   89 |
      9. | 16   34   25     7    6 |
     10. | 15    6    3    44    6 |
         +-------------------------+
    
    gen double gm = y1
    
    quietly forval j = 2/5 {
        replace gm = gm * y`j'
    }
    
    replace gm = exp(log(gm)/5)
    
    list
    
         +-------------------------------------+
         | y1   y2   y3    y4   y5          gm |
         |-------------------------------------|
      1. | 86   63   45     8    1   18.114581 |
      2. | 12   40   73   100    4   26.873536 |
      3. | 60    1   74    61    4   16.104771 |
      4. |  2    1    4     2   54   3.8663641 |
      5. | 12    1   22    22    4   7.4682237 |
         |-------------------------------------|
      6. |  1    7   15    84   14   10.430841 |
      7. |  4    1   12    94    7   7.9413333 |
      8. | 40    2   15     2   89   11.639123 |
      9. | 16   34   25     7    6   14.169602 |
     10. | 15    6    3    44    6   9.3453063 |
         +-------------------------------------+
    
    matrix test = (86, 63, 45, 8, 1)
    
    gen test = test[1, _n]
    
    means test
    
        Variable |    Type             Obs        Mean       [95% Conf. Interval]
    -------------+---------------------------------------------------------------
            test | Arithmetic            5        40.6       -4.225618   85.42562
                 |  Geometric            5    18.11458        1.794746   182.8326
                 |   Harmonic            5    4.256322               .          .
    -----------------------------------------------------------------------------
    Missing values in confidence intervals for harmonic mean indicate
    that confidence interval is undefined for corresponding variables.
    Consult Reference Manual for details.
    .

    Comment


    • #3
      Hello Nick,

      I didn't understand so much about your solution. I guess there is not the possibility to do a nth root in stata, is it correct?

      Following 20obs of my dataset

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double Year str19 Firm_uniquecode float(eco_subsidies_ha share_legumes variety_per_ha share_cereals_rev easysoluble_fert_ha_rev protection_products_per_ha_rev share_external_input_rev)
      2010 "2222009010010000001"         0          0 .031354517  .3636364 259.68002    367.71  .4532276
      2010 "2222008180614000001"  275.3485          0  .37402245  1.293252     259.7    367.71  .8092318
      2011 "2222008180614000001" 267.90164          0  .33681375 1.2444043     259.7    357.86  .7939668
      2012 "2222008180614000001" 255.12402 .031984337   .2610966 1.2389628     243.6    357.86  .7336684
      2013 "2222008180614000001" 256.43503          0   .2804612 1.2891583     243.6    363.15  .7828866
      2014 "2222008180614000001"  233.2149  .01776199   .2072232 1.2221326     259.7       366  .7971369
      2014 "2222008070396000001" .01724138    .862069  .03448276  .8636364     259.7       366  .7375501
      2012 "2502012035204000001"         0          0 .074074075 1.3636364     259.7    332.47  .7254438
      2010 "2302008005368000001"         0   .3444043  .28880867 1.2957664    252.98    355.84  .6092678
      2008 "2302008000358000001" 143.21817  .04691491  .05377067 .41055125    255.85  329.6348  .7618185
      2009 "2302008000358000001"         0 .005108214    .040328  .3687446     259.7  326.1545  .5595968
      2010 "2302008000306000001"         0   .2112331    .039208  .6063339    247.34     367.6  .5560856
      2010 "2302008032024000001"         0  .07454707 .014850015  .4381834     259.7    367.73  .5415391
      2010 "2302008002190000001"         0          0 .005250998  .3636364 256.93002    364.09  .6694973
      2011 "2302008032024000001"  46.56222  .07699732 .014850015  .4406337    252.03     360.8  .4913416
      2012 "2302011093120000001"  165.7143   .7005291   .3174603 1.3636364    252.03 264.39502  .7458543
      2013 "2302011093120000001"  81.28588   .6808266   .3444317 1.3636364    252.03 340.67105  .7536048
      2014 "2302011093120000001"         0        .25   .3826531 1.1723099    248.37  337.9313  .7730006
      2010 "2302008000223000001"         0          0  .14587893   .462834 259.65002    316.11  .7484564
      2010 "2302008002341000001"         0   .5370213  .12765957  .9006577     259.7    316.11 .58342004
      end

      Comment


      • #4
        My solution is general. I showed you the geometric mean of 5 numbers, which is the 5th root of the product of 5 numbers. Notice how the solution for the first row that I got matches what the means command produces.

        Otherwise put, the geometric mean is obtained by exponentiation of the mean of the natural logarithms of the values. You don't need a general root function or operator (which is pretty rare in software, and if available usually implemented that way any way).

        At most you just need to change 5 to whatever number is relevant.
        Last edited by Nick Cox; 20 Oct 2016, 08:58.

        Comment


        • #5
          If iI understood correctly about your example;
          You first created a random set of variables.
          then you created a loop in order to multiply the first variable for the other four.
          then you substituted variables' products (gm) for the log of itself divided by 5.
          then you check with a matrix

          Is it correct?

          Comment


          • #6
            Correct. I needed a data example because you didn't give one originally. But you should just use your data.

            Comment


            • #7
              Ok thank you.

              I only have last doubt.
              the geometric mean is obtained by exponentiation of the mean of the natural logarithms of the values.
              is it the same as calculate the products of seven numbers and then doing the 7th root of that? (this is my lack in mathematics knowledge)

              thank you

              Comment


              • #8
                Yes. The only warning is that the 7th root can be zero or negative (the 7th root of 0 is 0 and of -1 is -1) whereas the use of logarithms depends on all values being positive.

                I am remembering this from secondary school mathematics. Perhaps it is not taught any more.

                Comment


                • #9
                  Hello again,

                  I realised that I have some negative values. When I do the logarithm STATA gives back missing value. What are the solutions for this? Using the ratio as exponential?
                  thank you

                  Comment


                  • #10
                    Either the geometric mean makes no sense in that circumstance or minimally you can't calculate it that way. You perhaps should explain what the data are and why a geometric mean is wanted.

                    Comment


                    • #11
                      Perhaps I am missing something, but you can calculate the 7th root of a value in any programming package, including Stata by remembering that the 7th root = the 1/7th power.

                      Code:
                      clear
                      set obs 10
                      
                      gen x = rnormal()
                      gen x7 = x^7
                      gen x_again = x7^(1/7)
                      Note that it will still only work with positive numbers.

                      PS: A lot more complicated is to find the root of an equation, perhaps that's the source of the confusion?

                      Comment


                      • #12
                        I had negative values because I reversed the scale of some variables in order to have the same rule for everyone: the higher the number the better it is.
                        I checked and with geometric aggregation is not necessary to have the same scale for every variables. So I will use the normal (not-reversed) dataset.

                        Another problem is for the 0 value in some variables. In that case the product of the row is 0 and that want I don't want. Neither the missing value. Which solutions do you suggest for it?

                        Thank you

                        Comment


                        • #13
                          Jesse is right of course. But as he says, the power operator does less than you think and can't cope with negative roots even when they are perfectly well defined, because in practice it just works internally by taking logarithms, etc.

                          A general root operator would be able to return negative roots for example for odd integer powers. It's that which is missing.

                          More discussion at http://www.stata-journal.com/sjpdf.h...iclenum=st0223

                          http://blog.stata.com/2011/01/20/how...ulates-powers/

                          Comment


                          • #14
                            Sorry, I don't understand #12. You are asking for geometric means when they are doubly undefined by there being zeros and negative values. Again, why do you think you need geometric means?

                            I don't recognise "geometric aggregation", so that needs to be explained too, unless someone else jumps in.

                            Also, the data here remain totally anonymous. Some context might make clearer what you should be doing.

                            Comment


                            • #15
                              Yes maybe is better to explain what I'm doing, in short.

                              In my dataset I have around 10k farms and for each of them 7 variables, that are indicators of, for example, kg/ha of fertiliser used, Kg/ha of plant protection product, share of legumes on total agricultural area etc.
                              I want to aggregate all the 7 variables in order to build a composite indicator which should provide a sort of environmental score for each farm.
                              To aggregate variables there are several different methods and one of them is the geometric aggregation that consist of geometric mean.

                              I have some 0 values because of course some farms doesn't use fertiliser at all.

                              In order to eliminate zero values I was thinking to standardise the variables using
                              Code:
                              egen newvar=std(varlist)
                              The problem of standardisation is that I obtain some negative values





                              Last edited by Lore Fuma; 20 Oct 2016, 11:28.

                              Comment

                              Working...
                              X