  • What is the formula for aweight?

    Does anyone know the exact formula for aweight?

    I have a variable "x" and its weight "w". I want to run
    Code:
    tab x [aw=w], m
    Could anyone tell me how to calculate the weighted frequencies manually? I want to code the process in some other software. Thank you very much in advance!

  • #2
    Ai, other software: an unholy sin in Stataland.

    What about:
    egen total = total(w)
    gen weight1 = w / total


    • #3
      The details of weighted estimation are discussed in Section 20.24 of the Stata User's Guide PDF included with your Stata installation and accessible through Stata's Help menu.


      • #4
        Originally posted by William Lisowski
        The details of weighted estimation are discussed in Section 20.24 of the Stata User's Guide PDF included with your Stata installation and accessible through Stata's Help menu.
        Thank you, William. But the weighted estimation section discusses regression with at least two variables. What about a single variable? (I only want to tabulate one variable and see the weighted frequencies.)


        • #5
          To me your question is not clear: if you already have an aweight variable (you call it w), why do you need a formula for it? If you want to know how the weight was constructed, you have to ask the person who constructed it, because this can be done in many ways. Normally an aweight is scaled such that the mean of the weights is 1. Is this what you are looking for?


          • #6
            Originally posted by Dirk Enzmann
            To me your question is not clear: if you already have an aweight variable (you call it w), why do you need a formula for it? If you want to know how the weight was constructed, you have to ask the person who constructed it, because this can be done in many ways. Normally an aweight is scaled such that the mean of the weights is 1. Is this what you are looking for?
            Thank you, Dirk! Let me clarify. Yes, the weight variable w has already been created.
            Code:
            . tab a, m nol
            
                        |
                      a |      Freq.     Percent        Cum.
            ------------+-----------------------------------
                      1 |     73,015       13.68       13.68
                      2 |    308,638       57.82       71.50
                      3 |      7,888        1.48       72.98
                      4 |      4,196        0.79       73.77
                      5 |      4,176        0.78       74.55
                      6 |      3,094        0.58       75.13
                      . |    132,763       24.87      100.00
            ------------+-----------------------------------
                  Total |    533,770      100.00
            
            
            . tab a [aw=w], m
            
                        |
                      a |      Freq.     Percent        Cum.
            ------------+-----------------------------------
                      1 |73,814.2124       13.83       13.83
                      2 |301,603.919       56.50       70.33
                      3 | 7,546.3846        1.41       71.75
                      4 | 3,464.6331        0.65       72.40
                      5 | 3,435.2578        0.64       73.04
                      6 | 4,270.5778        0.80       73.84
                      . | 139,635.02       26.16      100.00
            ------------+-----------------------------------
                  Total |    533,770      100.00
            I just want to know how a weighted frequency, such as "73,814.2124" for a==1, is constructed. Do you know how?


            • #7
              If the weights are normalized to sum to N (as is done automatically when using analytic weights) and the weights are constant within the categories of your variable a, the frequencies of the weighted data are simply the unweighted frequencies per category multiplied by w.

              Perhaps the following demonstration helps:
              Code:
              . sysuse auto, clear
              (1978 Automobile Data)
              
              . tab1 foreign
              
              -> tabulation of foreign  
              
                 Car type |      Freq.     Percent        Cum.
              ------------+-----------------------------------
                 Domestic |         52       70.27       70.27
                  Foreign |         22       29.73      100.00
              ------------+-----------------------------------
                    Total |         74      100.00
              
              . /* Create w1 (normalized to sum to N)
              >    to produce 37 cases per category of foreign: */
              . qui gen double w1 = 37/52 if foreign==0
              . qui replace w1 = 37/22 if foreign==1
              
              . /* Create w2 (normalized to sum to N)
              >    to reverse the number of cases per category of foreign: */
              . qui gen double w2 = 22/52 if foreign==0
              . qui replace w2 = 52/22 if foreign==1
              
              . * Show that the mean of the weights is 1:
              . sum w1 w2
              
                  Variable |        Obs        Mean    Std. Dev.       Min        Max
              -------------+---------------------------------------------------------
                        w1 |         74           1    .4465115   .7115385   1.681818
                        w2 |         74           1    .8930231   .4230769   2.363636
              
              
              . * Frequencies of weighted data:
              . tab1 foreign [aw = w1]
              
              -> tabulation of foreign  
              
                 Car type |      Freq.     Percent        Cum.
              ------------+-----------------------------------
                 Domestic |         37       50.00       50.00
                  Foreign |         37       50.00      100.00
              ------------+-----------------------------------
                    Total |         74      100.00
              
              . tab1 foreign [aw = w2]
              
              -> tabulation of foreign  
              
                 Car type |      Freq.     Percent        Cum.
              ------------+-----------------------------------
                 Domestic |         22       29.73       29.73
                  Foreign |         52       70.27      100.00
              ------------+-----------------------------------
                    Total |         74      100.00
              
              . * ------------------------------------------------------------------------------
              . * Demonstrate that frequencies of weighted data are simply freq*w :
              .
              . contract foreign, f(freq)
              
              . * Multiply frequencies with w1:
              . qui gen freq_w1 = round(freq * 37/52) if foreign==0
              . qui replace freq_w1 = round(freq * 37/22) if foreign==1
              
              . * Multiply frequencies with w2:
              . qui gen freq_w2 = round(freq * 22/52) if foreign==0
              . qui replace freq_w2 = round(freq * 52/22) if foreign==1
              
              . * Original frequencies and frequencies weighted by w1 and w2:
              . list
              
                   +-------------------------------------+
                   |  foreign   freq   freq_w1   freq_w2 |
                   |-------------------------------------|
                1. | Domestic     52        37        22 |
                2. |  Foreign     22        37        52 |
                   +-------------------------------------+
              Last edited by Dirk Enzmann; 23 Aug 2018, 06:59. Reason: (I changed the format of the weight variables from float to double to avoid nasty decimals in the weighted frequencies)
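
As a sketch of the rule this demonstration illustrates (the notation is introduced here for illustration and is not taken from the Stata manuals): let N be the number of observations, w_i the weight of observation i, n_k the unweighted count in category k, and f_k the weighted frequency in category k. Assuming analytic weights are rescaled to sum to N, the general case and the constant-weight special case can be written as:

```latex
f_k \;=\; N \cdot \frac{\sum_{i:\, a_i = k} w_i}{\sum_{j=1}^{N} w_j},
\qquad
\text{and if } \sum_{j=1}^{N} w_j = N \text{ and } w_i = c_k \text{ for every } i \text{ in category } k:
\quad f_k \;=\; n_k \, c_k .
```

With w1 above, for example, Domestic gives 52 × 37/52 = 37 and Foreign gives 22 × 37/22 = 37, which is what the tabulation shows.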


              • #8
                "But the weighted estimation section discusses regression with at least two variables."
                For others who may read this: regression is the example the manual uses for illustration, but the text explains how the weights are used in general. In the case of tabulation, each observation counts not as 1 observation but as the value of its weight, after the weights are rescaled to sum to the total number of observations. Consider the following example, with 3 observations and weights summing to 6.
                Code:
                . list, clean noobs
                
                    x   w  
                    1   2  
                    .   1  
                    4   3  
                
                . tabulate x [aw=w], missing
                
                          x |      Freq.     Percent        Cum.
                ------------+-----------------------------------
                          1 |          1       33.33       33.33
                          4 |        1.5       50.00       83.33
                          . |         .5       16.67      100.00
                ------------+-----------------------------------
                      Total |          3      100.00
                For each observation, its frequency is its weight multiplied by (3/6), so the total frequency matches the number of observations in the dataset.
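
For anyone porting this to other software, the rescaling can be sketched in a few lines of Python. This is only an illustration under the assumptions stated in this post: the weights are rescaled to sum to the number of observations, and missing values of x form their own category, as with the missing option. The function name `aweight_tab` and the use of `None` to mark a missing value are choices made for this sketch, not anything Stata provides, and it makes no attempt to handle missing or nonpositive weights.

```python
def aweight_tab(values, weights):
    """Weighted one-way frequencies in the style of `tabulate x [aw=w], missing`.

    Hypothetical helper for illustration. Returns a dict mapping each
    category to a (frequency, percent) pair; None marks a missing value.
    """
    n = len(values)
    scale = n / sum(weights)  # rescale so the weights sum to n
    freqs = {}
    for value, weight in zip(values, weights):
        # each observation contributes its rescaled weight to its category
        freqs[value] = freqs.get(value, 0.0) + weight * scale
    return {k: (f, 100.0 * f / n) for k, f in freqs.items()}


# The three-observation example from this post: x = 1, ., 4 with w = 2, 1, 3.
table = aweight_tab([1, None, 4], [2, 1, 3])
for category, (freq, percent) in table.items():
    print(category, freq, round(percent, 2))
```

The frequencies come out as 1, 0.5, and 1.5 for x = 1, missing, and 4, matching the table above. Because weights are summed per observation, the same sketch applies when weights differ within a category.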


                • #9
                  Thank you very much, Dirk and William. This is very helpful!

                  But the problem is, the "w" weight in my data is different for each individual, and "a" is a categorical variable (e.g., the education level). Do you know how we can construct the weights for each category based on the individual weights?


                  • #10
                    Thank you, everyone. I have solved it:

                    Code:
                          
                    
                    . tab a [aw=w], m
                    
                              a |      Freq.     Percent        Cum.
                    ------------+-----------------------------------
                              1 |73,814.2124       13.83       13.83
                              2 |301,603.919       56.50       70.33
                              3 | 7,546.3846        1.41       71.75
                              4 | 3,464.6331        0.65       72.40
                              5 | 3,435.2578        0.64       73.04
                              6 | 4,270.5778        0.80       73.84
                              . | 139,635.02       26.16      100.00
                    ------------+-----------------------------------
                          Total |    533,770      100.00
                    
                    . egen mean_w=mean(w)
                    
                    . gen w_de_mean=w/mean_w
                    
                    . bysort a: egen sum_w_de_mean=sum(w_de_mean)
                    
                    . keep a sum_w_de_mean
                    
                    . duplicates drop
                    
                    
                    . list
                    
                         +---------------------+
                         |       a    sum_w_~n |
                         |---------------------|
                      1. |     1      73814.21 |
                      2. |     2      301603.9 |
                      3. |     3      7546.384 |
                      4. |     4      3464.633 |
                      5. |     5      3435.258 |
                         |---------------------|
                      6. |     6      4270.578 |
                      7. |     .        139635 |
                         +---------------------+


                    • #11
                      Another variant, which avoids dropping variables and cases (and is no slower and uses somewhat less memory), would be:
                      Code:
                      . sysuse auto, clear
                      (1978 Automobile Data)
                      
                      . rename price w
                      . rename rep78 a
                      
                      . * ------------------------------------------
                      . tab a [aw=w], mi
                      
                           Repair |
                      Record 1978 |      Freq.     Percent        Cum.
                      ------------+-----------------------------------
                                1 | 1.48071692        2.00        2.00
                                2 | 7.74355422       10.46       12.47
                                3 | 31.2845041       42.28       54.74
                                4 |  17.726269       23.95       78.70
                                5 | 10.5499256       14.26       92.95
                                . | 5.21503017        7.05      100.00
                      ------------+-----------------------------------
                            Total |         74      100.00
                      
                      . * ------------------------------------------
                      . tempvar w_de_mean
                      . qui sum w, meanonly
                      . gen `w_de_mean' = w/r(mean)
                      . bys a: egen sum_w_de_mean = sum(`w_de_mean')
                      . egen pick_a = tag(a), missing
                      . list a sum_w_de_mean if pick_a, noob sep(0)
                      
                        +--------------+
                        | a   sum_w_~n |
                        |--------------|
                        | 1   1.480717 |
                        | 2   7.743554 |
                        | 3    31.2845 |
                        | 4   17.72627 |
                        | 5   10.54993 |
                        | .    5.21503 |
                        +--------------+
                      Last edited by Dirk Enzmann; 24 Aug 2018, 16:20.
