Thanks to Kit Baum, I have made available on SSC a new -egen- function, wmean(). It calculates an (optionally) weighted, (optionally) byable arithmetic, geometric, or harmonic mean.
Type
Code:
. ssc desc _gwmean
from within Stata to find it, and follow the instructions there to install.
The motivation for writing this -egen- function is that the official -egen- functions do not support weights, although weights are much needed. In particular, the question "How can I calculate the weighted mean?" comes up often on Statalist.
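For readers who want to see what is being automated, here is a minimal sketch of a weighted arithmetic mean by group, built from official -egen- functions only. It is not how _gwmean is implemented; the variable names num, den and wm_byhand are illustrative, and the unmodified auto data are assumed.

```stata
* Sketch only: weighted arithmetic mean of price by foreign, with weights in weight
sysuse auto, clear

* Numerator: sum of w*x within group; egen's total() ignores missing products
egen double num = total(weight * price), by(foreign)

* Denominator: sum of w over observations where x is not missing
* (if weight is missing, the product is missing and is ignored as well)
egen double den = total(weight * !missing(price)), by(foreign)

generate double wm_byhand = num / den

* Cross-check one group against official Stata
summarize price [aw=weight] if foreign == 0
display r(mean)
```

This takes three statements per mean and is easy to get wrong when either variable has missing values, which is the bookkeeping a weighted -egen- function is meant to hide.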
The most popular weighted-mean -egen- function is _gwtmean.ado by David Kantor, but it was written for Stata version 3.0. It recently became apparent that _gwtmean does not parse string variables correctly, apparently because version 3 of Stata is too old. The issue is explained in this thread:
https://www.statalist.org/forums/for...-bug-in-wtmean
Therefore I wrote my own weighted-mean function, _gwmean, for Stata 11, which does not suffer from the defect in the old _gwtmean.
While I was at it, I also made _gwmean able to calculate weighted geometric and weighted harmonic means in addition to the arithmetic mean. I believe this functionality is novel: as far as I know, no other -egen- function can do that.
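For reference, the three weighted means can be written as follows. These are the standard textbook definitions, which I assume are the ones intended here; x_i is the value, w_i the weight, and the sums run over the observations in a group with non-missing x_i and w_i (and x_i > 0 for the geometric and harmonic means).

```latex
\begin{align*}
A &= \frac{\sum_i w_i x_i}{\sum_i w_i}, \\
G &= \exp\!\left( \frac{\sum_i w_i \ln x_i}{\sum_i w_i} \right), \\
H &= \frac{\sum_i w_i}{\sum_i w_i / x_i}.
\end{align*}
```

With all weights equal, these reduce to the usual unweighted arithmetic, geometric, and harmonic means.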
I also added the option -label-, which automatically gives the newly generated variable a descriptive label if desired.
Some test code follows:
Code:
. *** This file tests the egen function _gwmean
. *** (weighted, byable, arithmetic, geometric and harmonic means)
. *** 31 March 2021, Gueorgui I. Kolev
.
. sysuse auto, clear
(1978 Automobile Data)
.
. keep foreign price weight
.
. * Introduce some missing, and negative values
.
. sort foreign
.
. replace price = . in 1/4
(4 real changes made, 4 to missing)
.
. replace weight = . in 4/7
(4 real changes made, 4 to missing)
.
. replace price = -price in -3/l
(3 real changes made)
.
. egen arimean = wmean(price), by(foreign) weights(weight) label // the default is Arithmetic mean,
. //Weights can be abbreviated to w. Option Label can be abbreviated to l, and labels the new generated va
> riable.
.
. egen geomean = wmean(price), by(foreign) w(weight) geometric
Geometric and Harmonic mean are defined for Xi>0 only. If some Xi<=0, I discard them, and compute on the basis of those Xi>0 only.
. // Geometric mean option can be abbreviated to g
> .
.
. egen harmean = wmean(price), by(foreign) w(weight) harmonic label // Harmonic mean option can be
Geometric and Harmonic mean are defined for Xi>0 only. If some Xi<=0, I discard them, and compute on the basis of those Xi>0 only.
. // abbreviated to h.
.
. * The native Stata's -ameans- calculate on this data the same Arithmetic,
. * Geometric, and Harmonic means as our -egen, wmean- function above.
.
. by foreign: ameans price [aw=weight]

----------------------------------------------------------------------------------------------------------
-> foreign = Domestic

    Variable |    Type             Obs        Mean       [95% Conf. Interval]
-------------+---------------------------------------------------------------
       price | Arithmetic           45    6682.039        5627.334   7736.743
             |  Geometric           45     5996.11        5244.565   6855.351
             |   Harmonic           45    5511.785        4968.547   6188.396
-----------------------------------------------------------------------------

----------------------------------------------------------------------------------------------------------
-> foreign = Foreign

    Variable |    Type             Obs        Mean       [95% Conf. Interval]
-------------+---------------------------------------------------------------
       price | Arithmetic           22    4415.504        1747.879    7083.13
             |  Geometric           19    6070.164        5066.556    7272.57
             |   Harmonic           19    5706.753        4905.482   6820.889
-----------------------------------------------------------------------------

.
. tabstat arimean geomean harmean, by(foreign) stat(mean count) notot

Summary statistics: mean, N
  by categories of: foreign (Car type)

 foreign |   arimean   geomean   harmean
---------+------------------------------
Domestic |  6682.039   5996.11  5511.785
         |        52        52        52
---------+------------------------------
 Foreign |  4415.504  6070.164  5706.753
         |        22        22        22
----------------------------------------

.
. * And an example where the argument of the function is a general expression, and with If and In.
.
. egen arimeanexpre = wmean(log(price)*price) if weight>3000 in 10/l, by(foreign) weights(weight) label
(41 missing values generated)
.
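The geometric and harmonic means can also be reproduced by hand with official -egen- functions, which may help when checking results. The following is a sketch under my own assumptions, not the code inside _gwmean: the names pos, wsum, gnum, hden, gm_byhand and hm_byhand are illustrative, and non-positive values are dropped in the same spirit as the message printed in the log.

```stata
* Sketch only: weighted geometric and harmonic means of price by foreign
sysuse auto, clear

* Usable observations: x > 0 and not missing (missing counts as large in Stata, so test it explicitly)
generate byte pos = price > 0 & !missing(price)

* Sum of weights over usable observations; total() ignores the missing values produced by cond()
egen double wsum = total(cond(pos, weight, .)), by(foreign)

* Geometric: exp( sum(w*ln(x)) / sum(w) )
egen double gnum = total(cond(pos, weight * ln(price), .)), by(foreign)
generate double gm_byhand = exp(gnum / wsum)

* Harmonic: sum(w) / sum(w/x)
egen double hden = total(cond(pos, weight / price, .)), by(foreign)
generate double hm_byhand = wsum / hden

* Compare with official Stata
bysort foreign: ameans price [aw=weight]
tabstat gm_byhand hm_byhand, by(foreign) stat(mean) notot
```

If an observation has a missing weight, its terms are missing in both numerator and denominator, so it drops out of every sum consistently.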