  • Calculate average income of 20% lowest income household

    Dear all,

    I have household survey data. I wish to calculate the average income of each fifth of households: the 20% lowest income households, the 20% low-middle income households, the 20% middle income households, the 20% high-middle income households and the 20% highest income households. However, I have no idea how to compute this at the moment. Could anyone give me advice on this issue, please?

    Thank you very much
    Best regards
    Linh

  • #2
    Hi Linh,

    You'll probably want to take a look at the Statalist post "Quartiles, Quintiles, Deciles, and Percentiles" (see post #14), which is where I took the example below from.

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte id int year long income
      8 2010  10146
     26 2010  11174
     67 2010  12490
     95 2010  14750
     17 2010  15392
     96 2010  15481
     68 2010  16039
     39 2010  16896
      3 2010  17656
     31 2010  17974
     78 2010  18385
     92 2010  18417
     64 2010  21173
     21 2010  22585
     93 2010  25789
     41 2010  29347
     52 2010  32706
     71 2010  33138
      9 2010  33310
     55 2010  33631
     35 2010  34246
     74 2010  34562
     47 2010  36497
     12 2010  39458
     42 2010  44494
     28 2010  44914
     50 2010  45368
     69 2010  46077
     24 2010  46679
     32 2010  47606
     83 2010  48673
     88 2010  50659
      5 2010  53302
     86 2010  53347
     73 2010  54214
     54 2010  54732
     85 2010  55895
     38 2010  56922
     29 2010  57893
     72 2010  60131
     16 2010  63376
     65 2010  64058
     90 2010  65238
     46 2010  66863
     76 2010  67018
    100 2010  68745
     15 2010  69031
     63 2010  69923
      6 2010  71757
     53 2010  72277
     49 2010  74093
     79 2010  74296
     87 2010  75773
     40 2010  76920
     22 2010  77845
     33 2010  78005
     97 2010  80743
     27 2010  80949
     19 2010  83517
      4 2010  83694
     77 2010  83710
     82 2010  84529
     11 2010  84599
     44 2010  87208
     18 2010  87913
     48 2010  88552
     91 2010  88939
      7 2010  89292
     56 2010  89751
     81 2010  89834
     80 2010  90717
     70 2010  92030
     57 2010  94179
     14 2010  95399
     34 2010  95427
     99 2010  99290
     25 2010  99516
      1 2010 101805
     13 2010 102272
     66 2010 103682
     37 2010 104408
     98 2010 104637
     60 2010 105810
     20 2010 108473
     36 2010 110640
     62 2010 111550
     61 2010 111954
     30 2010 112570
     58 2010 113010
     23 2010 113743
     43 2010 113936
     94 2010 114439
     89 2010 115663
     84 2010 116281
      2 2010 116677
     45 2010 116889
     59 2010 121481
     10 2010 121729
     75 2010 122364
     51 2010 122740
    end
    
    
    // I didn't know whether you already had your groups (lowest income, low-middle income, middle income, etc.) defined
    ssc install sumdist
    
    
    sumdist income if year==2010, n(5) qgp(group)  // The n(5) splits into 5 equal groups (quintiles)
     
    Distributional summary statistics, 5 quantile groups
    
    ---------------------------------------------------------------------------
    Quantile  |
    group     |    Quantile  % of median     Share, %      L(p), %        GL(p)
    ----------+----------------------------------------------------------------
            1 |   33631.000       45.953        5.972        5.972     4164.790
            2 |   60131.000       82.163       13.847       19.819    13821.480
            3 |   83694.000      114.359       20.994       40.813    28462.690
            4 |  103682.000      141.671       26.508       67.321    46949.130
            5 |                                32.679      100.000    69739.070
    ---------------------------------------------------------------------------
    Share = quantile group share of total income;
    L(p)=cumulative group share; GL(p)=L(p)*mean(income)
    
    
    
    tabstat income, by(group) stats(n mean median min max) format(%9.1gc)
    
    Summary for variables: income
         by categories of: group (Quantile group)
    
       group |         N      mean       p50       min       max
    ---------+--------------------------------------------------
           1 |        20    20,824    18,180    10,146    33,631
           2 |        20    48,283    48,140    34,246    60,131
           3 |        20    73,206    73,185    63,376    83,694
           4 |        20    92,432    90,276    83,710   103,682
           5 |        20   113,950   113,840   104,408   122,740
    ---------+--------------------------------------------------
       Total |       100    69,739    73,185    10,146   122,740
    ------------------------------------------------------------
    
    // This is how to get the 20th percentile of each group (if that's what you wanted)
    table group, c(mean income median income p20 income) format(%9.1gc) row
    
    ----------------------------------------------------
    Quantile  |
    group     | mean(income)   med(income)   p20(income)
    ----------+-----------------------------------------
            1 |       20,824        18,180        15,071
            2 |       48,283        48,140        41,976
            3 |       73,206        73,185        66,941
            4 |       92,432        90,276        87,561
            5 |      113,950       113,840       109,557
              |
        Total |       69,739        73,185        33,939
    ----------------------------------------------------
    
    // NOTE: Could also use Stata's xtile command to divide into quartiles, quintiles, deciles, etc
    xtile quintile = income, nq(5)
    
    tabstat income, by(quintile) stats(n mean median min max) format(%9.1gc)
    
    Summary for variables: income
         by categories of: quintile (5 quantiles of income)
    
    quintile |         N      mean       p50       min       max
    ---------+--------------------------------------------------
           1 |        20    20,824    18,180    10,146    33,631
           2 |        20    48,283    48,140    34,246    60,131
           3 |        20    73,206    73,185    63,376    83,694
           4 |        20    92,432    90,276    83,710   103,682
           5 |        20   113,950   113,840   104,408   122,740
    ---------+--------------------------------------------------
       Total |       100    69,739    73,185    10,146   122,740
    ------------------------------------------------------------
    Last edited by David Benson; 04 Nov 2019, 00:46.
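
A minimal sketch of one way to go from the quintile variable to "the average income of each 20% group" stored as a variable: `egen` with `by()` attaches the group mean to every household. The ten incomes below are made up for illustration, and the names `quintile` and `mean_q` are arbitrary choices, not part of the example above.

```stata
* Made-up data: 10 hypothetical households
clear
input long income
10
20
30
40
50
60
70
80
90
100
end

* Quintile groups: 1 = lowest 20%, ..., 5 = highest 20%
xtile quintile = income, nq(5)

* Attach each group's mean income to every household in that group
egen double mean_q = mean(income), by(quintile)

* Show households grouped by quintile, with the group mean alongside
list income quintile mean_q, sepby(quintile)
```

With real survey data, the same two commands would be run on the actual income variable instead of the typed-in values.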



    • #3
      Hi David,

      Thank you for your reply. To be honest, I am confused by the material I am dealing with. Actually, I would like to create 5 dummy variables:
      1. 1st variable - var1: Average income of 20% of lowest income household. Var1=1 if the household belongs to the group: 20% of lowest income household and var1=0 if the household does not belong...
      2. 2nd variable - var2: Average income of 20% of low-middle income household. Var2=1 if the household belongs to the group: 20% of low-middle income household and var2=0 if the household does not belong...
      3. 3rd variable - var3: Average income of 20% of middle income household. Var3=1 if the household belongs to the group: 20% of middle income household and var3=0 if the household does not belong...
      4. 4th variable - var4: Average income of 20% of high-middle income household. Var4=1 if the household belongs to the group: 20% of high-middle income household and var4=0 if the household does not belong...
      5. 5th variable - var5: Average income of 20% of highest income household. Var5=1 if the household belongs to the group: 20% of highest income household and var5=0 if the household does not belong...

      These variables will be generated from the household survey data. At the moment, I do not have any of the 5 household income groups (lowest, low-middle, middle, high-middle and highest) defined.
      Your advice above is really helpful but, to be honest, it is hard for me to find the correct answer by myself as I am new to Stata. Could you please give me the proper command to solve my problem? Sorry if my question bothers you.

      Thank you again,
      Best regards
      Linh



      • #4
        David Benson provided you with all the elements of the solution to your question. It helped that you provided illustrative data using dataex, but now may be a good time to do some reading of the Stata manuals. I recommend starting with [U]. You might be interested in the following code, which simply expands on David's. Copy/paste it all into your do-file editor and run it.

        Code:
        clear
        input byte id int year long income
          8 2010  10146
         26 2010  11174
         67 2010  12490
         95 2010  14750
         17 2010  15392
         96 2010  15481
         68 2010  16039
         39 2010  16896
          3 2010  17656
         31 2010  17974
         78 2010  18385
         92 2010  18417
         64 2010  21173
         21 2010  22585
         93 2010  25789
         41 2010  29347
         52 2010  32706
         71 2010  33138
          9 2010  33310
         55 2010  33631
         35 2010  34246
         74 2010  34562
         47 2010  36497
         12 2010  39458
         42 2010  44494
         28 2010  44914
         50 2010  45368
         69 2010  46077
         24 2010  46679
         32 2010  47606
         83 2010  48673
         88 2010  50659
          5 2010  53302
         86 2010  53347
         73 2010  54214
         54 2010  54732
         85 2010  55895
         38 2010  56922
         29 2010  57893
         72 2010  60131
         16 2010  63376
         65 2010  64058
         90 2010  65238
         46 2010  66863
         76 2010  67018
        100 2010  68745
         15 2010  69031
         63 2010  69923
          6 2010  71757
         53 2010  72277
         49 2010  74093
         79 2010  74296
         87 2010  75773
         40 2010  76920
         22 2010  77845
         33 2010  78005
         97 2010  80743
         27 2010  80949
         19 2010  83517
          4 2010  83694
         77 2010  83710
         82 2010  84529
         11 2010  84599
         44 2010  87208
         18 2010  87913
         48 2010  88552
         91 2010  88939
          7 2010  89292
         56 2010  89751
         81 2010  89834
         80 2010  90717
         70 2010  92030
         57 2010  94179
         14 2010  95399
         34 2010  95427
         99 2010  99290
         25 2010  99516
          1 2010 101805
         13 2010 102272
         66 2010 103682
         37 2010 104408
         98 2010 104637
         60 2010 105810
         20 2010 108473
         36 2010 110640
         62 2010 111550
         61 2010 111954
         30 2010 112570
         58 2010 113010
         23 2010 113743
         43 2010 113936
         94 2010 114439
         89 2010 115663
         84 2010 116281
          2 2010 116677
         45 2010 116889
         59 2010 121481
         10 2010 121729
         75 2010 122364
         51 2010 122740
        end
        
        * create quintile groups, identified by "qgroup" membership
        sumdist income if year==2010, n(5) qgp(qgroup)
        return list // for information
        
        * save the results needed below now: the r-class commands that follow
        * (tabulate, describe) overwrite r()
        scalar mean_all = r(mean)
        forvalues q = 1/5 {
            scalar share`q' = r(sh`q')
        }
        
        * dummy variables identifying groups
        tabulate qgroup, ge(dummy_qgp)
        de dummy_qgp*
        ta dummy_qgp1  // etc. for other groups
        
        * average income by quintile group
        *   Income share of group i = group's income share of total income
        *                           = (group mean X n_group) / (mean X n_total)
        *   But (n_group/n_total) = 1/5, by construction
        * So,   group mean = 5 * mean * group_share
        *   (this assumes r(sh#) holds shares as proportions, not percentages;
        *    check the -return list- output above)
        * We can generate a new variable in which each person is attributed the
        * mean of the quintile group to which they belong, using the saved results
        
        ge qgpmean = .
        forvalues q = 1/5 {
            replace qgpmean = 5 * scalar(mean_all) * scalar(share`q') if qgroup == `q'
        }
        ta qgpmean
        
        * Alternatively, ...
        
        mean income if year == 2010, over(qgroup)
        ereturn list
        matrix m = e(b)
        matrix list m // 1x5 matrix
        di "Mean for second poorest fifth = " m[1,2]
        
        * Putting quintile group means into local macros (economical)
        forvalues g = 1/5 {
            local mean_qgp_`g'= m[1,`g']
            di "Mean for quintile group " `g' "   =   "  `mean_qgp_`g''
        }
        
        bysort qgroup: summarize income if year == 2010
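
The algebra in the code comments of #4 can be written out as follows. The symbols are introduced here only for illustration: $s_q$ is the income share of quintile group $q$, $\bar{y}_q$ its mean income, $n_q$ its size, and $\bar{y}$ and $n$ are the overall mean and sample size.

```latex
s_q = \frac{n_q\,\bar{y}_q}{n\,\bar{y}}
\quad\Longrightarrow\quad
\bar{y}_q = \frac{n}{n_q}\, s_q\, \bar{y} = 5\, s_q\, \bar{y}
\qquad \text{since } \frac{n_q}{n} = \frac{1}{5}.
```

As a check against the output in #2: the lowest group has a share of 5.972% and the overall mean is 69,739.07, so $5 \times 0.05972 \times 69{,}739.07 \approx 20{,}824$, which agrees with the group 1 mean reported by tabstat.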




        • #5
          Hello @Stephen Jenkins
          Thank you for your reply. Sorry for my late response.
          Actually, I used the code
          Code:
          xtile incomegroup=totalincome, nq(5)
          As I mentioned in my question above, however, I am not sure whether this code is the correct way to solve my problem. Could you explain the code I used and tell me whether it is wrong?

          Many thanks Stephen
          Regards
          Linh



          • #6
            xtile creates a variable containing quantile categories: the 5 quintile groups in your example. You can calculate the mean income of each group once you have done this. My sumdist uses xtile, so the result should be the same! Look at the code in #4 to see how to calculate the means.
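
One possible way to do that, sketched on made-up data: `collapse` reduces the data to one row per quintile group, holding the group mean and the number of households. The incomes are hypothetical; `totalincome` and `incomegroup` follow the names used in #5, while `mean_income` and `n_households` are illustrative names.

```stata
* Made-up data: 10 hypothetical households with incomes 10, 20, ..., 100
clear
set obs 10
generate long totalincome = 10 * _n

* Quintile groups, as in #5
xtile incomegroup = totalincome, nq(5)

* One row per group: mean income and number of households
* (preserve/restore keeps the household-level data intact afterwards)
preserve
collapse (mean) mean_income = totalincome (count) n_households = totalincome, by(incomegroup)
list, noobs
restore
```

`collapse` replaces the data in memory, which is why the sketch wraps it in `preserve` and `restore`.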



            • #7
              Hi Stephen,
              As you explained, xtile creates quantile categories. So if I wish to create the 5 dummy variables below, is the code xtile incomegroup=totalincome, nq(5) correct?

              1. 1st variable - var1: Average income of 20% of lowest income household. Var1=1 if the household belongs to the group: 20% of lowest income household and var1=0 if the household does not belong to the group 20% of lowest income household
              2. 2nd variable - var2: Average income of 20% of low-middle income household. Var2=1 if the household belongs to the group: 20% of low-middle income household and var2=0 if the household does not belong to the group 20% of low-middle income household
              3. 3rd variable - var3: Average income of 20% of middle income household. Var3=1 if the household belongs to the group: 20% of middle income household and var3=0 if the household does not belong to the group 20% of middle income household
              4. 4th variable - var4: Average income of 20% of high-middle income household. Var4=1 if the household belongs to the group: 20% of high-middle income household and var4=0 if the household does not belong to the group 20% of high-middle income household
              5. 5th variable - var5: Average income of 20% of highest income household. Var5=1 if the household belongs to the group: 20% of highest income household and var5=0 if the household does not belong to the group 20% of highest income household

              Thank you
              Best regards
              Linh
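
A hedged sketch of how the two steps asked about here could fit together, using made-up incomes and the variable names from #5 (the `var` stub and `lowest20` are illustrative choices). `xtile` on its own only creates the group variable coded 1 to 5; the 0/1 dummies are a separate step, for example with `tabulate, generate()`.

```stata
* Made-up data: 10 hypothetical households with incomes 10, 20, ..., 100
clear
set obs 10
generate long totalincome = 10 * _n

* Step 1: group variable, 1 = lowest 20%, ..., 5 = highest 20%
xtile incomegroup = totalincome, nq(5)

* Step 2: one 0/1 dummy per group, named var1 ... var5
tabulate incomegroup, generate(var)

* The same dummy built by hand for the lowest group
* (missing when incomegroup is missing)
generate byte lowest20 = (incomegroup == 1) if !missing(incomegroup)

* The two versions agree, and the dummy can select a group for summaries
assert var1 == lowest20
summarize totalincome if var1 == 1
```

If the dummies are only needed as regressors, factor-variable notation such as `i.incomegroup` in the estimation command gives the same indicators without creating them.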
