What is the formula for aweight?

Yuqing Hu

Join Date: May 2014

Posts: 17
#1

22 Aug 2018, 08:35

Does anyone know the exact formula for aweight?

I have a variable "x" and its weight "w". I want to run

Code:

tab x [aw=w], m

Could anyone tell me how to calculate the weighted frequencies manually? I want to code the process in some other software. Thank you very much in advance!
Wessel de Kroo

Join Date: Feb 2017

Posts: 92
#2

22 Aug 2018, 10:25

Ai, other software: an unholy sin in Stataland.

What about:

Code:

egen total = sum(w)

gen weight1 = w / total
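
A note for anyone trying this outside Stata: dividing by the total makes the weights sum to 1, so summing weight1 within each category gives that category's weighted share (the Percent column divided by 100), and multiplying the share by the number of observations gives the weighted frequency. A minimal Python sketch follows; the data are made up and all variable names are purely illustrative.

```python
# Hypothetical data: category variable a and individual weights w
a = [1, 1, 2, 2, 2]
w = [0.5, 1.5, 1.0, 2.0, 3.0]

total = sum(w)                               # counterpart of the egen total
weight1 = [wi / total for wi in w]           # weights now sum to 1

share = {}
for ai, wi in zip(a, weight1):
    share[ai] = share.get(ai, 0.0) + wi      # weighted share per category

n = len(a)
freq = {k: s * n for k, s in share.items()}  # shares scaled up to N observations

print(share)  # {1: 0.25, 2: 0.75}
print(freq)   # {1: 1.25, 2: 3.75}
```

The frequencies add up to the number of observations (5 here), which is the property the Total row of the weighted table reflects.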
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

22 Aug 2018, 11:18

The details of weighted estimation are discussed in Section 20.24 of the Stata User's Guide PDF included with your Stata installation and accessible through Stata's Help menu.
Yuqing Hu

Join Date: May 2014

Posts: 17
#4

22 Aug 2018, 19:07

Originally posted by William Lisowski

The details of weighted estimation are discussed in Section 20.24 of the Stata User's Guide PDF included with your Stata installation and accessible through Stata's Help menu.

Thank you, William. But the section on weighted estimation discusses regression with at least two variables. What about only one variable? (I only want to tabulate one variable and see the weighted frequencies.)
Dirk Enzmann

Join Date: Apr 2014

Posts: 536
#5

22 Aug 2018, 21:11

To me your question is not clear: If you already have an a_weight variable (you call it w), why do you need a formula for it? If you want to know how the weight has been constructed, you have to ask the person who did this, because this can be done in many ways. Normally an a_weight is scaled such that the mean of the weights is 1. Is this what you are looking for?

Yuqing Hu

Join Date: May 2014
Posts: 17

22 Aug 2018, 23:49

Originally posted by Dirk Enzmann

To me your question is not clear: If you already have an a_weight variable (you call it w), why do you need a formula for it? If you want to know how the weight has been constructed, you have to ask the person who did this, because this can be done in many ways. Normally an a_weight is scaled such that the mean of the weights is 1. Is this what you are looking for?

Thank you, Dirk! Let me clarify it here. Yes, the weight variable w is already created.

Code:

. tab a, m nol

            |
          a |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |     73,015       13.68       13.68
          2 |    308,638       57.82       71.50
          3 |      7,888        1.48       72.98
          4 |      4,196        0.79       73.77
          5 |      4,176        0.78       74.55
          6 |      3,094        0.58       75.13
          . |    132,763       24.87      100.00
------------+-----------------------------------
      Total |    533,770      100.00


. tab a [aw=w], m

            |
          a |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |73,814.2124       13.83       13.83
          2 |301,603.919       56.50       70.33
          3 | 7,546.3846        1.41       71.75
          4 | 3,464.6331        0.65       72.40
          5 | 3,435.2578        0.64       73.04
          6 | 4,270.5778        0.80       73.84
          . | 139,635.02       26.16      100.00
------------+-----------------------------------
      Total |    533,770      100.00

I just want to know how a weighted frequency, such as "73,814.2124" for a==1, is constructed. Do you know how?


Dirk Enzmann

Join Date: Apr 2014
Posts: 536

23 Aug 2018, 05:50

If the weights are normalized to sum to N (as is done automatically when using analytic weights) and the weights are constant within the categories of your variable a, the frequencies of the weighted data are simply the unweighted frequencies per category multiplied by w.

Perhaps the following demonstration helps:

Code:

. sysuse auto, clear
(1978 Automobile Data)

. tab1 foreign

-> tabulation of foreign  

   Car type |      Freq.     Percent        Cum.
------------+-----------------------------------
   Domestic |         52       70.27       70.27
    Foreign |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00

. /* Create w1 (normalized to sum to N)
>    to produce 37 cases per category of foreign: */
. qui gen double w1 = 37/52 if foreign==0
. qui replace w1 = 37/22 if foreign==1

. /* Create w2 (normalized to sum to N)
>    to reverse the number of cases per category of foreign: */
. qui gen double w2 = 22/52 if foreign==0
. qui replace w2 = 52/22 if foreign==1

. * Show that the mean of the weights is 1:
. sum w1 w2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          w1 |         74           1    .4465115   .7115385   1.681818
          w2 |         74           1    .8930231   .4230769   2.363636


. * Frequencies of weighted data:
. tab1 foreign [aw = w1]

-> tabulation of foreign  

   Car type |      Freq.     Percent        Cum.
------------+-----------------------------------
   Domestic |         37       50.00       50.00
    Foreign |         37       50.00      100.00
------------+-----------------------------------
      Total |         74      100.00

. tab1 foreign [aw = w2]

-> tabulation of foreign  

   Car type |      Freq.     Percent        Cum.
------------+-----------------------------------
   Domestic |         22       29.73       29.73
    Foreign |         52       70.27      100.00
------------+-----------------------------------
      Total |         74      100.00

. * ------------------------------------------------------------------------------
. * Demonstrate that frequencies of weighted data are simply freq*w :
.
. contract foreign, f(freq)

. * Multiply frequencies with w1:
. qui gen freq_w1 = round(freq * 37/52) if foreign==0
. qui replace freq_w1 = round(freq * 37/22) if foreign==1

. * Multiply frequencies with w2:
. qui gen freq_w2 = round(freq * 22/52) if foreign==0
. qui replace freq_w2 = round(freq * 52/22) if foreign==1

. * Original frequencies and frequencies weighted by w1 and w2:
. list

     +-------------------------------------+
     |  foreign   freq   freq_w1   freq_w2 |
     |-------------------------------------|
  1. | Domestic     52        37        22 |
  2. |  Foreign     22        37        52 |
     +-------------------------------------+

Last edited by Dirk Enzmann; 23 Aug 2018, 05:59. Reason: (I changed the format of the weight variables from float to double to avoid nasty decimals in the weighted frequencies)
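
The same arithmetic can be reproduced outside Stata. Below is a minimal Python sketch using the counts and weights from the demonstration above. The function name `weighted_counts` is invented for this illustration, and the sketch assumes the weight is constant within each category. The rescaling step changes nothing here, because w1 and w2 already sum to N, but it is kept so that arbitrarily scaled weights would also work.

```python
counts = {"Domestic": 52, "Foreign": 22}   # unweighted frequencies of foreign

def weighted_counts(counts, w):
    """Scale per-category counts by a weight that is constant within each category."""
    n = sum(counts.values())
    sum_w = sum(counts[k] * w[k] for k in counts)  # sum of weights over all observations
    scale = n / sum_w                              # rescale so the weights sum to N
    return {k: counts[k] * w[k] * scale for k in counts}

w1 = {"Domestic": 37 / 52, "Foreign": 37 / 22}     # equalizes the two categories
w2 = {"Domestic": 22 / 52, "Foreign": 52 / 22}     # swaps the two category sizes

freq_w1 = weighted_counts(counts, w1)
freq_w2 = weighted_counts(counts, w2)

# Rounded only to hide floating-point noise in the last digits
print({k: round(v, 6) for k, v in freq_w1.items()})  # {'Domestic': 37.0, 'Foreign': 37.0}
print({k: round(v, 6) for k, v in freq_w2.items()})  # {'Domestic': 22.0, 'Foreign': 52.0}
```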


William Lisowski

Join Date: Dec 2014

Posts: 10150
#8

23 Aug 2018, 07:38

But the weighted estimation discusses about regression for at least two variables.

For others who may read this, regression is the example the manual uses for illustration. But the text explains how the weights are used in general circumstances. In the case of tabulation, each observation counts not as 1 observation but as the value of its weight, after the weights are rescaled to sum to the total number of observations. Consider the following example, with 3 observations and weights summing to 6.

Code:

. list, clean noobs

    x   w
    1   2
    .   1
    4   3

. tabulate x [aw=w], missing

          x |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          1       33.33       33.33
          4 |        1.5       50.00       83.33
          . |         .5       16.67      100.00
------------+-----------------------------------
      Total |          3      100.00

For each observation, its frequency is its weight multiplied by (3/6), so that the total frequency matches the number of observations in the dataset.
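
For anyone porting this to other software: in formula terms, each observation contributes w * N / sum(w) to the frequency of its category, where N is the number of observations. Here is a minimal Python sketch of that rule, applied to the three-observation example. The function name `aweight_freq` is made up for this illustration, `None` stands in for Stata's missing value, and the sketch assumes every weight is positive and nonmissing.

```python
from collections import defaultdict

def aweight_freq(values, weights):
    """Weighted frequencies in the style of `tabulate x [aw=w], missing` (a sketch).

    Assumes all weights are positive and nonmissing.
    """
    n = len(values)
    scale = n / sum(weights)       # rescale so the weights sum to N
    freq = defaultdict(float)
    for x, w in zip(values, weights):
        freq[x] += w * scale       # each observation counts as its rescaled weight
    return dict(freq)

# x = 1, ., 4 with w = 2, 1, 3 (weights sum to 6, N = 3)
freqs = aweight_freq([1, None, 4], [2, 1, 3])
print(freqs)  # {1: 1.0, None: 0.5, 4: 1.5}
```

Missing values of x form their own category here, mirroring the missing option; dropping them before the call would change both N and the sum of the weights.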
Yuqing Hu

Join Date: May 2014

Posts: 17
#9

24 Aug 2018, 04:14

Thank you very much, Dirk and William. This is very helpful!

But the problem is, the "w" weight in my data is different for each individual, and "a" is a categorical variable (e.g., the education level). Do you know how we can construct the weights for each category based on the individual weights?

Yuqing Hu

Join Date: May 2014
Posts: 17

#10

24 Aug 2018, 07:07

Thank you, everyone. I have solved it:

Code:

      

. tab a [aw=w], m

          a |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |73,814.2124       13.83       13.83
          2 |301,603.919       56.50       70.33
          3 | 7,546.3846        1.41       71.75
          4 | 3,464.6331        0.65       72.40
          5 | 3,435.2578        0.64       73.04
          6 | 4,270.5778        0.80       73.84
          . | 139,635.02       26.16      100.00
------------+-----------------------------------
      Total |    533,770      100.00

. egen mean_w=mean(w)

. gen w_de_mean=w/mean_w

. bysort a: egen sum_w_de_mean=sum(w_de_mean)

. keep a sum_w_de_mean

. duplicates drop


. list

     +---------------------+
     |       a    sum_w_~n |
     |---------------------|
  1. |     1      73814.21 |
  2. |     2      301603.9 |
  3. |     3      7546.384 |
  4. |     4      3464.633 |
  5. |     5      3435.258 |
     |---------------------|
  6. |     6      4270.578 |
  7. |     .        139635 |
     +---------------------+
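
The same two steps (divide each weight by the mean weight, then total within the categories of a) carry over directly to other software. Below is a minimal Python sketch with made-up data, since the original dataset is not shown; `None` stands in for Stata's missing value and all names are illustrative. It also derives the Percent and Cum. columns from the weighted frequencies.

```python
# Hypothetical data: a is categorical, w differs by individual
a = [1, 2, 2, None]
w = [1.0, 2.0, 4.0, 1.0]

n = len(a)
mean_w = sum(w) / n                          # counterpart of egen mean_w = mean(w)
w_de_mean = [wi / mean_w for wi in w]        # rescaled weights have mean 1

sum_w_de_mean = {}
for ai, wi in zip(a, w_de_mean):             # counterpart of bysort a: egen ... sum()
    sum_w_de_mean[ai] = sum_w_de_mean.get(ai, 0.0) + wi

rows = []
cum = 0.0
# Sort categories with the missing category last, as in the tabulate output
for key in sorted(sum_w_de_mean, key=lambda k: (k is None, k or 0)):
    freq = sum_w_de_mean[key]
    pct = 100 * freq / n
    cum += pct
    rows.append((key, freq, pct, cum))

for row in rows:
    print(row)
# (1, 0.5, 12.5, 12.5)
# (2, 3.0, 75.0, 87.5)
# (None, 0.5, 12.5, 100.0)
```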


Dirk Enzmann

Join Date: Apr 2014
Posts: 536

#11

24 Aug 2018, 15:14

Another variant, which avoids dropping variables and cases (and is no slower while using somewhat less memory), would be:

Code:

. sysuse auto, clear
(1978 Automobile Data)

. rename price w
. rename rep78 a

. * ------------------------------------------
. tab a [aw=w], mi

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 | 1.48071692        2.00        2.00
          2 | 7.74355422       10.46       12.47
          3 | 31.2845041       42.28       54.74
          4 |  17.726269       23.95       78.70
          5 | 10.5499256       14.26       92.95
          . | 5.21503017        7.05      100.00
------------+-----------------------------------
      Total |         74      100.00

. * ------------------------------------------
. tempvar w_de_mean
. qui sum w, meanonly
. gen `w_de_mean' = w/r(mean)
. bys a: egen sum_w_de_mean = sum(`w_de_mean')
. egen pick_a = tag(a), missing
. list a sum_w_de_mean if pick_a, noob sep(0)

  +--------------+
  | a   sum_w_~n |
  |--------------|
  | 1   1.480717 |
  | 2   7.743554 |
  | 3    31.2845 |
  | 4   17.72627 |
  | 5   10.54993 |
  | .    5.21503 |
  +--------------+

Last edited by Dirk Enzmann; 24 Aug 2018, 15:20.
