Creating a value that equals the average of other values for the same variable

Sonnen Blume

Join Date: Aug 2018

Posts: 342
#1

Creating a value that equals the average of other values for the same variable

30 May 2019, 17:55

(Sorry about the poor phrasing of the question)
My dataset contains variables such as country and prevalence of obesity, for 34 countries. I want to create a new value for country variable that will be the average of obesity of all the countries, i.e there will be 35 categories under the country variable. Is there any command to do that in Stata 14.
Tags: None
Rich Goldstein

Join Date: Mar 2014

Posts: 4439
#2

30 May 2019, 18:05

your question is not completely clear to me (e.g., how do you get 35 categories when you have 34 countries) and there is no data example, but the following FAQ might be what you are looking for:
https://www.stata.com/support/faqs/d...ng-properties/
1 like
Comment
Sonnen Blume

Join Date: Aug 2018

Posts: 342
#3

30 May 2019, 18:29

Originally posted by Rich Goldstein View Post

your question is not completely clear to me (e.g., how do you get 35 categories when you have 34 countries) and there is no data example, but the following FAQ might be what you are looking for:
https://www.stata.com/support/faqs/d...ng-properties/

Hi, thanks for the reference. It doesn't seem to be the thing I'm looking for. Its a huge dataset so dataex won't be a good choice. May the table below can explain the question better. For instance the country variable has 7 categories now, and I want to create a new one that will be the average of these and will be labelled Africa. If this is feasible, the Country variable will then have 8 categories.
Comment
Dimitriy V. Masterov

Join Date: Mar 2014

Posts: 609
#4

30 May 2019, 21:29

There are better ways to produce output that looks like this (such as -table make, c(mean price) row-), but this should do the trick:

Code:

sysuse auto, clear sum price set obs `=_N+1' replace make = "All Cars" in l replace price = r(mean) in l list make price in l, clean noobs

Effectively, this adds one more row at the bottom of your data. Then you add the mean and the new name there.

You may want to consider using weights to avoid treating countries of different sizes the same in your average. Or you may also want to consider a geometric mean since you're averaging a ratio.

Last edited by Dimitriy V. Masterov; 30 May 2019, 21:49.
Comment

Dimitriy V. Masterov

Join Date: Mar 2014
Posts: 609

30 May 2019, 21:57

Here's an example demonstrating the suggested portion of the advice for this calculation:

Code:

clear
input str41 country long(adult_pop obese_adults)
"China"                            952269953 65706627
"India"                            904750094 44332755
"United States of America"         219204813 73872022
"Indonesia"                        178352576 10166097
"Brazil"                           141395161 28279032
"Pakistan"                         133103979  7187615
"Nigeria"                          128962792 14185907
"Bangladesh"                       111250884  4005032
"Russian Federation"                97279478 23444354
end

format adult_pop obese_adults %16.0fc
gen double ratio = 100*obese_adult/adult_pop
format ratio %9.1f

ameans ratio      // no weights
ameans ratio      [aw=adult_pop]
ameans ratio      [aw=obese_adults]
summarize ratio   [iw=adult_pop]


/* calculation by hand */
collapse (sum) adult_pop obese_adults
gen double ratio = 100*obese_adults/adult_pop
list, clean noobs

Here the pop-weighted arithmetic means, the obese-weighted harmonic, and the pop-weighted geometric get you the exact right answer.

The unweighted geometric mean comes very close, which saves you the need to go get the weights data if you don't have that handy.

The unweighted arithmetic mean is ~35% too large.

Last edited by Dimitriy V. Masterov; 30 May 2019, 22:11.

Comment

Sonnen Blume

Join Date: Aug 2018
Posts: 342

01 Jun 2019, 12:57

Originally posted by Dimitriy V. Masterov View Post

Here's an example demonstrating the suggested portion of the advice for this calculation:

Code:

clear
input str41 country long(adult_pop obese_adults)
"China" 952269953 65706627
"India" 904750094 44332755
"United States of America" 219204813 73872022
"Indonesia" 178352576 10166097
"Brazil" 141395161 28279032
"Pakistan" 133103979 7187615
"Nigeria" 128962792 14185907
"Bangladesh" 111250884 4005032
"Russian Federation" 97279478 23444354
end

format adult_pop obese_adults %16.0fc
gen double ratio = 100*obese_adult/adult_pop
format ratio %9.1f

ameans ratio // no weights
ameans ratio [aw=adult_pop]
ameans ratio [aw=obese_adults]
summarize ratio [iw=adult_pop]


/* calculation by hand */
collapse (sum) adult_pop obese_adults
gen double ratio = 100*obese_adults/adult_pop
list, clean noobs

Thank you for the explanation@Dimitry!

Announcement