Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating a value that equals the average of other values for the same variable

    (Sorry about the poor phrasing of the question)
    My dataset contains variables such as country and prevalence of obesity, for 34 countries. I want to create a new value for country variable that will be the average of obesity of all the countries, i.e there will be 35 categories under the country variable. Is there any command to do that in Stata 14.

  • #2
    your question is not completely clear to me (e.g., how do you get 35 categories when you have 34 countries) and there is no data example, but the following FAQ might be what you are looking for:
    https://www.stata.com/support/faqs/d...ng-properties/

    Comment


    • #3
      Originally posted by Rich Goldstein View Post
      your question is not completely clear to me (e.g., how do you get 35 categories when you have 34 countries) and there is no data example, but the following FAQ might be what you are looking for:
      https://www.stata.com/support/faqs/d...ng-properties/
      Hi, thanks for the reference. It doesn't seem to be the thing I'm looking for. Its a huge dataset so dataex won't be a good choice. May the table below can explain the question better. For instance the country variable has 7 categories now, and I want to create a new one that will be the average of these and will be labelled Africa. If this is feasible, the Country variable will then have 8 categories.
      Click image for larger version

Name:	Capture.JPG
Views:	1
Size:	57.9 KB
ID:	1500947

      Comment


      • #4
        There are better ways to produce output that looks like this (such as -table make, c(mean price) row-), but this should do the trick:

        Code:
        sysuse auto, clear
        sum price
        set obs `=_N+1'
        replace make = "All Cars" in l
        replace price = r(mean) in l
        list make price in l, clean noobs
        Effectively, this adds one more row at the bottom of your data. Then you add the mean and the new name there.

        You may want to consider using weights to avoid treating countries of different sizes the same in your average. Or you may also want to consider a geometric mean since you're averaging a ratio.
        Last edited by Dimitriy V. Masterov; 30 May 2019, 21:49.

        Comment


        • #5
          Here's an example demonstrating the suggested portion of the advice for this calculation:

          Code:
          clear
          input str41 country long(adult_pop obese_adults)
          "China"                            952269953 65706627
          "India"                            904750094 44332755
          "United States of America"         219204813 73872022
          "Indonesia"                        178352576 10166097
          "Brazil"                           141395161 28279032
          "Pakistan"                         133103979  7187615
          "Nigeria"                          128962792 14185907
          "Bangladesh"                       111250884  4005032
          "Russian Federation"                97279478 23444354
          end
          
          format adult_pop obese_adults %16.0fc
          gen double ratio = 100*obese_adult/adult_pop
          format ratio %9.1f
          
          ameans ratio      // no weights
          ameans ratio      [aw=adult_pop]
          ameans ratio      [aw=obese_adults]
          summarize ratio   [iw=adult_pop]
          
          
          /* calculation by hand */
          collapse (sum) adult_pop obese_adults
          gen double ratio = 100*obese_adults/adult_pop
          list, clean noobs
          Here the pop-weighted arithmetic means, the obese-weighted harmonic, and the pop-weighted geometric get you the exact right answer.

          The unweighted geometric mean comes very close, which saves you the need to go get the weights data if you don't have that handy.

          The unweighted arithmetic mean is ~35% too large.
          Last edited by Dimitriy V. Masterov; 30 May 2019, 22:11.

          Comment


          • #6
            Originally posted by Dimitriy V. Masterov View Post
            Here's an example demonstrating the suggested portion of the advice for this calculation:

            Code:
            clear
            input str41 country long(adult_pop obese_adults)
            "China" 952269953 65706627
            "India" 904750094 44332755
            "United States of America" 219204813 73872022
            "Indonesia" 178352576 10166097
            "Brazil" 141395161 28279032
            "Pakistan" 133103979 7187615
            "Nigeria" 128962792 14185907
            "Bangladesh" 111250884 4005032
            "Russian Federation" 97279478 23444354
            end
            
            format adult_pop obese_adults %16.0fc
            gen double ratio = 100*obese_adult/adult_pop
            format ratio %9.1f
            
            ameans ratio // no weights
            ameans ratio [aw=adult_pop]
            ameans ratio [aw=obese_adults]
            summarize ratio [iw=adult_pop]
            
            
            /* calculation by hand */
            collapse (sum) adult_pop obese_adults
            gen double ratio = 100*obese_adults/adult_pop
            list, clean noobs
            Here the pop-weighted arithmetic means, the obese-weighted harmonic, and the pop-weighted geometric get you the exact right answer.

            The unweighted geometric mean comes very close, which saves you the need to go get the weights data if you don't have that handy.

            The unweighted arithmetic mean is ~35% too large.
            Thank you for the explanation@Dimitry!

            Comment

            Working...
            X