  • Calculate weighted mean by calculating weights first

    Dear All,

    I'm trying to calculate weighted averages by city (one average each for var1, var2, and var3). Below is what the data look like:

    [Image: Screen Shot 2020-03-09 at 5.43.29 PM.png]

    And below is the result I'm trying to produce:
    [Image: Screen Shot 2020-03-09 at 5.46.51 PM.png]

    Below are my questions:

    1. The first step is to calculate the population weight (the population of each city divided by the total population of all cities), which requires calculating the total population of all cities first. Since the data are in long shape (and I would like to keep them that way), I'm wondering if there's a way to calculate the total population across the individual cities?

    2. After calculating the weight, I would like to calculate the weighted average of each variable (average*weight). Since I need to output the result to Excel, I think the collapse command would be convenient. I'm thinking about setting pw=weight in the command, but since pw is a sampling weight and I do not have any sample design here, I'm not sure if this will do the trick. Does anyone know? If setting pw=weight does not calculate the weighted average I'm looking for, is there a better way to calculate it for several variables and output the result to Excel?

    Any help will be appreciated, thank you very much!

    Best,
    Craig



  • #2
    I can suggest the following to create a total population variable.

    Code:
    egen once = tag(city)
    egen totpop = total(once*population)
    I'm not so sure about the weighted average. I don't understand why one of your cities has just three observations and others have seven. I think you may need to account for that, as well as for the population size. So I'll leave that part of your question for another reader to address.
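
    For readers following along, here is a minimal sketch of how totpop might be turned into the population weight from question 1. The toy data and the variable names weight and check are assumptions made for illustration, and the sketch assumes population is constant within each city.

```stata
* Hypothetical toy data: two cities in long shape (values invented for illustration)
clear
input str1 city population var1
"A" 100 1
"A" 100 3
"B" 300 5
"B" 300 7
end

* Flag exactly one observation per city, then total population over flagged rows only
egen once = tag(city)
egen totpop = total(once*population)

* Population share of each city (assumes population does not vary within a city)
generate double weight = population/totpop

* Sanity check: the shares sum to 1 when each city is counted once
egen double check = total(once*weight)
assert abs(check - 1) < 1e-8
```

    The data stay in long shape throughout; totpop and weight are simply repeated on every row of a city.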

    Along with using CODE delimiters to present code, as you have done in the past, be sure to use the dataex command to present sample data. If you are running version 15.1 or later, or a fully updated version 14.2, dataex is already part of your official Stata installation. If not, run ssc install dataex to get it. Either way, run help dataex and read the simple instructions for using it. dataex will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
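
    As a small illustration of the advice above, the sketch below runs dataex on the auto dataset that ships with Stata. The choice of variables and the 1/5 range are arbitrary; in practice you would list the variables relevant to your question.

```stata
* Load a built-in example dataset and print a copy-and-paste data example
sysuse auto, clear
dataex make price mpg in 1/5
```

    The output can be pasted between CODE delimiters in a post.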

    When asking for help with code, always show example data. When showing example data, always use dataex.
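
    On the weighted average in question 2, which is left open above, the following is only one possible reading, not a confirmed answer: average each variable within city first, then combine the city means using the population shares. Averaging within city first means a city with three rows and a city with seven rows each contribute a single mean. The toy data, the variable name weight, and the file name results.xlsx are assumptions. As far as I know, collapse (mean) gives the same point estimates with aweights as with pweights, and aweights are not affected by rescaling, so the share and the raw population should lead to the same result.

```stata
* Hypothetical toy data: two cities, two variables (values invented for illustration)
clear
input str1 city population var1 var2
"A" 100 1 10
"A" 100 3 20
"B" 300 5 30
"B" 300 7 50
end

* Population share of each city, counting each city once
egen once = tag(city)
egen totpop = total(once*population)
generate double weight = population/totpop

* Step 1: one row per city holding the within-city means
collapse (mean) var1 var2, by(city weight)

* Step 2: population-weighted mean of the city means
collapse (mean) var1 var2 [aw=weight]
list

* To send the result to Excel (hypothetical file name), one could then run:
* export excel using "results.xlsx", firstrow(variables) replace
```

    Whether this matches the intended result depends on what the extra rows per city represent, so treat it as a starting point.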



    • #3
      Hello William,

      Thank you very much for your help and suggestions! The number of observations per city differs because there is another variable for each city, which I removed to make the screenshot simpler. I will use dataex next time; I didn't know about it before.

      Thank you,
      Craig
