Hi,
I'm struggling with the geometric mean computation in the following case.
I need to create composite indexes based on the geometric (row) mean of multiple variables. The indexes are composed of different numbers of variables, and the variables have different distributions.
I created a syntax following these steps:
1) standardization of the variables by generating a "modified z-scores" based on median absolute deviation (to minimize the impact of extreme values);
2) log transformation: store the sign of the values before the logarithmic transformation and log transform abs(`var'), adding 1 so it returns zeros when `var' == 0;
3) exponentiate the arithmetic rowmean of the log transformed variables: store its sign, exponentiate it, subtract 1, and restore its sign.
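Written as formulas, steps 2 and 3 amount to the following sketch. The symbols are my own shorthand and do not appear in the code: x_ij is the value of variable j in row i, k_i is the number of non-missing variables in that row, and s(.) is the stored sign (-1 for negative values, +1 otherwise, including zero).

```latex
\begin{align*}
t_{ij} &= s(x_{ij}) \, \ln\bigl(1 + |x_{ij}|\bigr) \\
m_i &= \frac{1}{k_i} \sum_{j} t_{ij} \\
\text{index}_i &= s(m_i) \, \bigl(e^{|m_i|} - 1\bigr)
\end{align*}
```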
This syntax is:
//Step1 - standardization: compute "modified z-scores" (based on median absolute deviation to minimize the impact of extreme values)
Code:
foreach var of varlist v* {
    qui su `var', det
    gen double `var'_zsco = ((`var'-`r(p50)')/`r(p50)')* 0.6745
}
//Step 2 - log transformation
Code:
foreach var of varlist *zsco {
    //store the sign of the values before the logarithmic transformation
    gen s_`var' = .
    replace s_`var' = -1 if `var' < 0 & `var' != .
    replace s_`var' = 1 if `var' > 0 & `var' != .
    replace s_`var' = 1 if `var' == 0 & `var' != . /*to avoid missing values for (zsco==0)*/
    //logarithmic transformation of `var', adding 1 so it returns zeros when `var' == 0
    gen double i_`var' = ln(1+(abs(`var')))*s_`var'
}
//Step 3 - compute the arithmetic rowmean of the ln transformed variables and exponentiate it
Code:
egen double i_Mean = rmean(i_*)
foreach var of varlist i_Mean {
    //store the sign of the values of var
    gen s_`var' = .
    replace s_`var' = -1 if `var' < 0 & `var' != .
    replace s_`var' = 1 if `var' > 0 & `var' != .
    replace s_`var' = 1 if `var' == 0 & `var' != .
    // exponentiate the arithmetic mean
    gen double exp_`var' = (exp(abs(`var')))-1
    //restore the sign of var values
    replace exp_`var' = s_`var'*exp_`var'
}
I created an independent check for rows with positive z scores only (as the gmean() function for egen in egenmore (SSC) ignores zeros and negatives).
Taking for granted that step 1 is irrelevant for the actual problem, I simulated steps 2 and 3 on a previous example provided by Nick (https://www.statalist.org/forums/for...62#post1360962)
I get values very close to what my syntax generates, but it is not an exact match (I get a .9948 correlation), and I just can't find where my mistake is.
All the values I get from my own Steps 2 and 3 are slightly higher than the expected values.
//Generating example data
Code:
clear
set obs 10
set seed 2803
forval j = 1/5 {
    gen y`j' = ceil(100 * (runiform()^2))
}
list

     +-------------------------+
     | y1   y2   y3    y4   y5 |
     |-------------------------|
  1. | 86   63   45     8    1 |
  2. | 12   40   73   100    4 |
  3. | 60    1   74    61    4 |
  4. |  2    1    4     2   54 |
  5. | 12    1   22    22    4 |
     |-------------------------|
  6. |  1    7   15    84   14 |
  7. |  4    1   12    94    7 |
  8. | 40    2   15     2   89 |
  9. | 16   34   25     7    6 |
 10. | 15    6    3    44    6 |
     +-------------------------+
Code:
gen double M1 = y1
quietly forval j = 2/5 {
    replace M1 = M1 * y`j'
}
replace M1 = exp(log(M1)/5)
list
//independent check 2 proposed by Nick
Code:
matrix test = (86, 63, 45, 8, 1)
gen test = test[1, _n]
means test
egen gmean = mean(ln(test))
replace gmean = exp(gmean)

means test

    Variable |    Type             Obs        Mean       [95% Conf. Interval]
-------------+---------------------------------------------------------------
        test | Arithmetic            5        40.6      -4.225618   85.42562
             |  Geometric            5    18.11458       1.794746   182.8326
             |   Harmonic            5    4.256322              .          .
-----------------------------------------------------------------------------
Missing values in confidence intervals for harmonic mean indicate
that confidence interval is undefined for corresponding variables.
Consult Reference Manual for details.
//Applying my syntax
//Step 2 - log transformation
Code:
foreach var of varlist y* {
    //store the sign of the values before the log transformation
    gen s_`var' = .
    replace s_`var' = -1 if `var' < 0 & `var' != .
    replace s_`var' = 1 if `var' > 0 & `var' != .
    replace s_`var' = 1 if `var' == 0 & `var' != . /*to avoid missing values when var == 0*/
    //log transformation of `var', adding 1 so it returns zeros when `var' == 0
    gen double i_`var' = ln(1+(abs(`var')))*s_`var'
}
//Step 3 - compute the arithmetic rowmean of the ln transformed variables and exponentiate it
Code:
egen double i_Mean = rmean(i_*)
foreach var of varlist i_Mean {
    //store the sign of the values of var
    gen s_`var' = .
    replace s_`var' = -1 if `var' < 0 & `var' != .
    replace s_`var' = 1 if `var' > 0 & `var' != .
    replace s_`var' = 1 if `var' == 0 & `var' != . /*to avoid missing values when var == 0*/
    // exponentiate the arithmetic mean
    gen double exp_`var' = exp(abs(`var'))-1
    //restore the sign of var values
    replace exp_`var' = s_`var'*exp_`var'
}
list y1 y2 y3 y4 y5 M1 exp_i_Mean

     +-------------------------------------------------+
     | y1   y2   y3    y4   y5          M1   exp_i_M~n |
     |-------------------------------------------------|
  1. | 86   63   45     8    1   18.114581   20.515226 |
  2. | 12   40   73   100    4   26.873536    27.83036 |
  3. | 60    1   74    61    4   16.104771    18.52345 |
  4. |  2    1    4     2   54   3.8663641   4.4817729 |
  5. | 12    1   22    22    4   7.4682237   8.2785434 |
     |-------------------------------------------------|
  6. |  1    7   15    84   14   10.430841   11.669224 |
  7. |  4    1   12    94    7   7.9413333    8.975884 |
  8. | 40    2   15     2   89   11.639123   12.966184 |
  9. | 16   34   25     7    6   14.169602    14.40053 |
 10. | 15    6    3    44    6   9.3453063    9.713163 |
     +-------------------------------------------------+
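As a hand check on row 1 of the listing, both columns can be reproduced directly from the five values. This is only a sketch of the arithmetic. Because every value in the example is positive, the sign handling drops out, so exp_i_Mean reduces to the fifth root of the product of the (1 + y) terms, minus 1, while M1 is the fifth root of the product of the y values themselves.

```latex
\begin{align*}
\text{M1}_1 &= (86 \cdot 63 \cdot 45 \cdot 8 \cdot 1)^{1/5} = 1950480^{1/5} \approx 18.1146 \\
\text{exp\_i\_Mean}_1 &= (87 \cdot 64 \cdot 46 \cdot 9 \cdot 2)^{1/5} - 1 = 4610304^{1/5} - 1 \approx 20.5152
\end{align*}
```

Both figures agree with the listed 18.114581 and 20.515226.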
Best,
Martin