Geometric mean in the row

Lore Fuma

Join Date: Aug 2016

Posts: 17
#1

Geometric mean in the row

20 Oct 2016, 05:15

Hello,

I'm looking around since few hours but I didn't find a solution.

I would like to calculate the Geometric Mean in the row.
I tried first to calculate the products of the row with egen newvar = rprod(varlist) and that worked.
Now I need to calculate for the product of every row the nth root. I have 7 variables so I need the 7th root of the products.
I didn't find any solution to calculate the nth root.
I Hope someone can help me

Thank you

Lorenzo
Tags: None

Nick Cox

Join Date: Mar 2014
Posts: 35697

20 Oct 2016, 05:37

You can use the rprod() function here but you need to document for Statalist where it comes from. See FAQ Advice #12.

dm87 from http://www.stata.com/stb/stb60
STB-60 dm87. Calculating the row product of observations / STB insert by
Philip Ryan, University of Adelaide, Australia / Support:
[email protected] / After installation, see help rprod

I think this yields to a loop and standard functions. This is a self-contained example with an independent check.

Code:

clear
set obs 10
set seed 2803
forval j = 1/5 {
      gen y`j' = ceil(100 * (runiform()^2))
}

list

     +-------------------------+
     | y1   y2   y3    y4   y5 |
     |-------------------------|
  1. | 86   63   45     8    1 |
  2. | 12   40   73   100    4 |
  3. | 60    1   74    61    4 |
  4. |  2    1    4     2   54 |
  5. | 12    1   22    22    4 |
     |-------------------------|
  6. |  1    7   15    84   14 |
  7. |  4    1   12    94    7 |
  8. | 40    2   15     2   89 |
  9. | 16   34   25     7    6 |
 10. | 15    6    3    44    6 |
     +-------------------------+

gen double gm = y1

quietly forval j = 2/5 {
    replace gm = gm * y`j'
}

replace gm = exp(log(gm)/5)

list

     +-------------------------------------+
     | y1   y2   y3    y4   y5          gm |
     |-------------------------------------|
  1. | 86   63   45     8    1   18.114581 |
  2. | 12   40   73   100    4   26.873536 |
  3. | 60    1   74    61    4   16.104771 |
  4. |  2    1    4     2   54   3.8663641 |
  5. | 12    1   22    22    4   7.4682237 |
     |-------------------------------------|
  6. |  1    7   15    84   14   10.430841 |
  7. |  4    1   12    94    7   7.9413333 |
  8. | 40    2   15     2   89   11.639123 |
  9. | 16   34   25     7    6   14.169602 |
 10. | 15    6    3    44    6   9.3453063 |
     +-------------------------------------+

matrix test = (86, 63, 45, 8, 1)

gen test = test[1, _n]

means test

    Variable |    Type             Obs        Mean       [95% Conf. Interval]
-------------+---------------------------------------------------------------
        test | Arithmetic            5        40.6       -4.225618   85.42562
             |  Geometric            5    18.11458        1.794746   182.8326
             |   Harmonic            5    4.256322               .          .
-----------------------------------------------------------------------------
Missing values in confidence intervals for harmonic mean indicate
that confidence interval is undefined for corresponding variables.
Consult Reference Manual for details.

Comment

Lore Fuma

Join Date: Aug 2016
Posts: 17

20 Oct 2016, 07:27

Hello Nick,

I didn't understand so much about your solution. I guess there is not the possibility to do a nth root in stata, is it correct?

Following 20obs of my dataset

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double Year str19 Firm_uniquecode float(eco_subsidies_ha share_legumes variety_per_ha share_cereals_rev easysoluble_fert_ha_rev protection_products_per_ha_rev share_external_input_rev)
2010 "2222009010010000001"         0          0 .031354517  .3636364 259.68002    367.71  .4532276
2010 "2222008180614000001"  275.3485          0  .37402245  1.293252     259.7    367.71  .8092318
2011 "2222008180614000001" 267.90164          0  .33681375 1.2444043     259.7    357.86  .7939668
2012 "2222008180614000001" 255.12402 .031984337   .2610966 1.2389628     243.6    357.86  .7336684
2013 "2222008180614000001" 256.43503          0   .2804612 1.2891583     243.6    363.15  .7828866
2014 "2222008180614000001"  233.2149  .01776199   .2072232 1.2221326     259.7       366  .7971369
2014 "2222008070396000001" .01724138    .862069  .03448276  .8636364     259.7       366  .7375501
2012 "2502012035204000001"         0          0 .074074075 1.3636364     259.7    332.47  .7254438
2010 "2302008005368000001"         0   .3444043  .28880867 1.2957664    252.98    355.84  .6092678
2008 "2302008000358000001" 143.21817  .04691491  .05377067 .41055125    255.85  329.6348  .7618185
2009 "2302008000358000001"         0 .005108214    .040328  .3687446     259.7  326.1545  .5595968
2010 "2302008000306000001"         0   .2112331    .039208  .6063339    247.34     367.6  .5560856
2010 "2302008032024000001"         0  .07454707 .014850015  .4381834     259.7    367.73  .5415391
2010 "2302008002190000001"         0          0 .005250998  .3636364 256.93002    364.09  .6694973
2011 "2302008032024000001"  46.56222  .07699732 .014850015  .4406337    252.03     360.8  .4913416
2012 "2302011093120000001"  165.7143   .7005291   .3174603 1.3636364    252.03 264.39502  .7458543
2013 "2302011093120000001"  81.28588   .6808266   .3444317 1.3636364    252.03 340.67105  .7536048
2014 "2302011093120000001"         0        .25   .3826531 1.1723099    248.37  337.9313  .7730006
2010 "2302008000223000001"         0          0  .14587893   .462834 259.65002    316.11  .7484564
2010 "2302008002341000001"         0   .5370213  .12765957  .9006577     259.7    316.11 .58342004
end

Comment

Nick Cox

Join Date: Mar 2014

Posts: 35697
#4

20 Oct 2016, 07:55

My solution is general. I showed you the geometric mean of 5 numbers, which is the 5th root of the product of 5 numbers. Notice how the solution for the first row that I got matches what the means command produces.

Otherwise put, the geometric mean is obtained by exponentiation of the mean of the natural logarithms of the values. You don't need a general root function or operator (which is pretty rare in software, and if available usually implemented that way any way).

At most you just need to change 5 to whatever number is relevant.

Last edited by Nick Cox; 20 Oct 2016, 07:58.
Comment
Lore Fuma

Join Date: Aug 2016

Posts: 17
#5

20 Oct 2016, 07:59

If iI understood correctly about your example;
You first created a random set of variables.
then you created a loop in order to multiply the first variable for the other four.
then you substituted variables' products (gm) for the log of itself divided by 5.
then you check with a matrix

Is it correct?
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35697
#6

20 Oct 2016, 08:00

Correct. I needed a data example because you didn't give one originally. But you should just use your data.
Comment
Lore Fuma

Join Date: Aug 2016

Posts: 17
#7

20 Oct 2016, 08:06

Ok thank you.

I only have last doubt.

the geometric mean is obtained by exponentiation of the mean of the natural logarithms of the values.

is it the same as calculate the products of seven numbers and then doing the 7th root of that? (this is my lack in mathematics knowledge)

thank you
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35697
#8

20 Oct 2016, 08:13

Yes. The only warning is that the 7th root can be zero or negative (the 7th root of 0 is 0 and of -1 is -1) whereas the use of logarithms depends on all values being positive.

I am remembering this from secondary school mathematics. Perhaps it is not taught any more.
Comment
Lore Fuma

Join Date: Aug 2016

Posts: 17
#9

20 Oct 2016, 09:23

Hello again,

I realised that I have some negative values. When I do the logarithm STATA gives back missing value. What are the solutions for this? Using the ratio as exponential?
thank you
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35697
#10

20 Oct 2016, 09:33

Either the geometric mean makes no sense in that circumstance or minimally you can't calculate it that way. You perhaps should explain what the data are and why a geometric mean is wanted.
Comment
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#11

20 Oct 2016, 09:54

Perhaps I am missing something, but you can calculate the 7th root of a value in any programming package, including Stata by remembering that the 7th root = the 1/7th power.

Code:

clear set obs 10 gen x = rnormal() gen x7 = x^7 gen x_again = x7^(1/7)

Note that it will still only work with positive numbers.

PS: A lot more complicated is to find the root of an equation, perhaps that's the source of the confusion?
1 like
Comment
Lore Fuma

Join Date: Aug 2016

Posts: 17
#12

20 Oct 2016, 10:05

I had negative values because I reversed the scale of some variables in order to have the same rule for everyone: the higher the number the better it is.
I checked and with geometric aggregation is not necessary to have the same scale for every variables. So I will use the normal (not-reversed) dataset.

Another problem is for the 0 value in some variables. In that case the product of the row is 0 and that want I don't want. Neither the missing value. Which solutions do you suggest for it?

Thank you
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35697
#13

20 Oct 2016, 10:05

Jesse is right of course. But as he says, the power operator does less than you think and can't cope with negative roots even when they are perfectly well defined, because in practice it just works internally by taking logarithms, etc.

A general root operator would be able to return negative roots for example for odd integer powers. It's that which is missing.

More discussion at http://www.stata-journal.com/sjpdf.h...iclenum=st0223

http://blog.stata.com/2011/01/20/how...ulates-powers/
1 like
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35697
#14

20 Oct 2016, 10:11

Sorry, I don't understand #12. You are asking for geometric means when they are doubly undefined by there being zeros and negative values. Again, why do you think you need geometric means?

I don't recognise "geometric aggregation", so that needs to be explained too, unless someone else jumps in.

Also, the data here remain totally anonymous. Some context might make clearer what you should be doing.
Comment
Lore Fuma

Join Date: Aug 2016

Posts: 17
#15

20 Oct 2016, 10:26

Yes maybe is better to explain what I'm doing, in short.

In my dataset I have around 10k farms and for each of them 7 variables, that are indicators of, for example, kg/ha of fertiliser used, Kg/ha of plant protection product, share of legumes on total agricultural area etc.
I want to aggregate all the 7 variables in order to build a composite indicator which should provide a sort of environmental score for each farm.
To aggregate variables there are several different methods and one of them is the geometric aggregation that consist of geometric mean.

I have some 0 values because of course some farms doesn't use fertiliser at all.

In order to eliminate zero values I was thinking to standardise the variables using

Code:

egen newvar=std(varlist)

The problem of standardisation is that I obtain some negative values

Last edited by Lore Fuma; 20 Oct 2016, 10:28.
Comment

Announcement