Calculate average income of the 20% lowest-income households

Linh mt

#1


03 Nov 2019, 22:27

Dear all,

I have household survey data. I wish to calculate, from the household data, the average income of the 20% lowest-income households, the 20% low-middle-income households, the 20% middle-income households, the 20% high-middle-income households and the 20% highest-income households (i.e. the income quintiles). However, at the moment I have no idea how to compute this. Could anyone give me advice on this, please?

Thank you very much
Best regards
Linh

David Benson


03 Nov 2019, 23:32

Hi Linh,

You'll probably want to take a look at the Statalist post "Quartiles, Quintiles, Deciles, and Percentiles" (see post #14), which is where I took the example below from.


Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id int year long income
  8 2010  10146
 26 2010  11174
 67 2010  12490
 95 2010  14750
 17 2010  15392
 96 2010  15481
 68 2010  16039
 39 2010  16896
  3 2010  17656
 31 2010  17974
 78 2010  18385
 92 2010  18417
 64 2010  21173
 21 2010  22585
 93 2010  25789
 41 2010  29347
 52 2010  32706
 71 2010  33138
  9 2010  33310
 55 2010  33631
 35 2010  34246
 74 2010  34562
 47 2010  36497
 12 2010  39458
 42 2010  44494
 28 2010  44914
 50 2010  45368
 69 2010  46077
 24 2010  46679
 32 2010  47606
 83 2010  48673
 88 2010  50659
  5 2010  53302
 86 2010  53347
 73 2010  54214
 54 2010  54732
 85 2010  55895
 38 2010  56922
 29 2010  57893
 72 2010  60131
 16 2010  63376
 65 2010  64058
 90 2010  65238
 46 2010  66863
 76 2010  67018
100 2010  68745
 15 2010  69031
 63 2010  69923
  6 2010  71757
 53 2010  72277
 49 2010  74093
 79 2010  74296
 87 2010  75773
 40 2010  76920
 22 2010  77845
 33 2010  78005
 97 2010  80743
 27 2010  80949
 19 2010  83517
  4 2010  83694
 77 2010  83710
 82 2010  84529
 11 2010  84599
 44 2010  87208
 18 2010  87913
 48 2010  88552
 91 2010  88939
  7 2010  89292
 56 2010  89751
 81 2010  89834
 80 2010  90717
 70 2010  92030
 57 2010  94179
 14 2010  95399
 34 2010  95427
 99 2010  99290
 25 2010  99516
  1 2010 101805
 13 2010 102272
 66 2010 103682
 37 2010 104408
 98 2010 104637
 60 2010 105810
 20 2010 108473
 36 2010 110640
 62 2010 111550
 61 2010 111954
 30 2010 112570
 58 2010 113010
 23 2010 113743
 43 2010 113936
 94 2010 114439
 89 2010 115663
 84 2010 116281
  2 2010 116677
 45 2010 116889
 59 2010 121481
 10 2010 121729
 75 2010 122364
 51 2010 122740
end


// I didn't know whether you already had your groups (lowest income, low-middle income, middle income, etc.) defined
ssc install sumdist


sumdist income if year==2010, n(5) qgp(group)  // The n(5) splits into 5 equal groups (quintiles)
 
Distributional summary statistics, 5 quantile groups

---------------------------------------------------------------------------
Quantile  |
group     |    Quantile  % of median     Share, %      L(p), %        GL(p)
----------+----------------------------------------------------------------
        1 |   33631.000       45.953        5.972        5.972     4164.790
        2 |   60131.000       82.163       13.847       19.819    13821.480
        3 |   83694.000      114.359       20.994       40.813    28462.690
        4 |  103682.000      141.671       26.508       67.321    46949.130
        5 |                                32.679      100.000    69739.070
---------------------------------------------------------------------------
Share = quantile group share of total income;
L(p)=cumulative group share; GL(p)=L(p)*mean(income)



tabstat income, by(group) stats(n mean median min max) format(%9.1gc)

Summary for variables: income
     by categories of: group (Quantile group)

   group |         N      mean       p50       min       max
---------+--------------------------------------------------
       1 |        20    20,824    18,180    10,146    33,631
       2 |        20    48,283    48,140    34,246    60,131
       3 |        20    73,206    73,185    63,376    83,694
       4 |        20    92,432    90,276    83,710   103,682
       5 |        20   113,950   113,840   104,408   122,740
---------+--------------------------------------------------
   Total |       100    69,739    73,185    10,146   122,740
------------------------------------------------------------

// This is how to get the 20th percentile of each group (if that's what you wanted)
table group, c(mean income median income p20 income) format(%9.1gc) row

----------------------------------------------------
Quantile  |
group     | mean(income)   med(income)   p20(income)
----------+-----------------------------------------
        1 |       20,824        18,180        15,071
        2 |       48,283        48,140        41,976
        3 |       73,206        73,185        66,941
        4 |       92,432        90,276        87,561
        5 |      113,950       113,840       109,557
          |
    Total |       69,739        73,185        33,939
----------------------------------------------------

// NOTE: Could also use Stata's xtile command to divide into quartiles, quintiles, deciles, etc
xtile quintile = income, nq(5)

tabstat income, by(quintile) stats(n mean median min max) format(%9.1gc)

Summary for variables: income
     by categories of: quintile (5 quantiles of income)

quintile |         N      mean       p50       min       max
---------+--------------------------------------------------
       1 |        20    20,824    18,180    10,146    33,631
       2 |        20    48,283    48,140    34,246    60,131
       3 |        20    73,206    73,185    63,376    83,694
       4 |        20    92,432    90,276    83,710   103,682
       5 |        20   113,950   113,840   104,408   122,740
---------+--------------------------------------------------
   Total |       100    69,739    73,185    10,146   122,740
------------------------------------------------------------

Last edited by David Benson; 03 Nov 2019, 23:46.
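If each household should carry its group's mean income as a variable, rather than only seeing the means in a table, `egen` with a `by` prefix does that. A minimal sketch follows. It loads Stata's shipped auto dataset and uses `price` as a stand-in for household income (an assumption, only so the block runs on its own); `quintile` and `quintile_mean` are hypothetical variable names.

```stata
* Sketch: attach each quintile group's mean to every observation.
* Assumption: auto dataset, with price standing in for household income.
sysuse auto, clear
xtile quintile = price, nq(5)                      // quintile groups 1-5
bysort quintile: egen quintile_mean = mean(price)  // group mean on each row
tabstat price quintile_mean, by(quintile) stats(n mean)
```

With your own data, replace `price` with your income variable (`income` in the example above).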


Linh mt

#3

04 Nov 2019, 02:05

Hi David,

Thank you for your reply. To be honest, I am confused by what I am dealing with. Actually, I would like to create 5 dummy variables:
1. 1st variable - var1: Average income of the 20% lowest-income households. var1=1 if the household belongs to the 20% lowest-income group and var1=0 if it does not.
2. 2nd variable - var2: Average income of the 20% low-middle-income households. var2=1 if the household belongs to the 20% low-middle-income group and var2=0 if it does not.
3. 3rd variable - var3: Average income of the 20% middle-income households. var3=1 if the household belongs to the 20% middle-income group and var3=0 if it does not.
4. 4th variable - var4: Average income of the 20% high-middle-income households. var4=1 if the household belongs to the 20% high-middle-income group and var4=0 if it does not.
5. 5th variable - var5: Average income of the 20% highest-income households. var5=1 if the household belongs to the 20% highest-income group and var5=0 if it does not.

These variables will be generated from the household survey data. At the moment, I do not have the 5 household income groups (lowest, low-middle, middle, high-middle and highest).
Your advice above is really helpful, but to be honest it's hard for me to find the correct answer by myself, as I am new to Stata. Could you please give me the proper command to solve my problem? Sorry if my question is a bother.

Thank you again,
Best regards
Linh

Stephen Jenkins

#4

04 Nov 2019, 03:46

David Benson provided you with all the elements of the solutions to your questions. It helped that illustrative data were provided using dataex, but now may be a good time to do some reading of the Stata manuals. I recommend starting with the User's Guide [U]. You might be interested in the following code, which simply expands on David's. Copy/paste it all into your do-file editor and run it.

Code:

clear
input byte id int year long income
  8 2010  10146
 26 2010  11174
 67 2010  12490
 95 2010  14750
 17 2010  15392
 96 2010  15481
 68 2010  16039
 39 2010  16896
  3 2010  17656
 31 2010  17974
 78 2010  18385
 92 2010  18417
 64 2010  21173
 21 2010  22585
 93 2010  25789
 41 2010  29347
 52 2010  32706
 71 2010  33138
  9 2010  33310
 55 2010  33631
 35 2010  34246
 74 2010  34562
 47 2010  36497
 12 2010  39458
 42 2010  44494
 28 2010  44914
 50 2010  45368
 69 2010  46077
 24 2010  46679
 32 2010  47606
 83 2010  48673
 88 2010  50659
  5 2010  53302
 86 2010  53347
 73 2010  54214
 54 2010  54732
 85 2010  55895
 38 2010  56922
 29 2010  57893
 72 2010  60131
 16 2010  63376
 65 2010  64058
 90 2010  65238
 46 2010  66863
 76 2010  67018
100 2010  68745
 15 2010  69031
 63 2010  69923
  6 2010  71757
 53 2010  72277
 49 2010  74093
 79 2010  74296
 87 2010  75773
 40 2010  76920
 22 2010  77845
 33 2010  78005
 97 2010  80743
 27 2010  80949
 19 2010  83517
  4 2010  83694
 77 2010  83710
 82 2010  84529
 11 2010  84599
 44 2010  87208
 18 2010  87913
 48 2010  88552
 91 2010  88939
  7 2010  89292
 56 2010  89751
 81 2010  89834
 80 2010  90717
 70 2010  92030
 57 2010  94179
 14 2010  95399
 34 2010  95427
 99 2010  99290
 25 2010  99516
  1 2010 101805
 13 2010 102272
 66 2010 103682
 37 2010 104408
 98 2010 104637
 60 2010 105810
 20 2010 108473
 36 2010 110640
 62 2010 111550
 61 2010 111954
 30 2010 112570
 58 2010 113010
 23 2010 113743
 43 2010 113936
 94 2010 114439
 89 2010 115663
 84 2010 116281
  2 2010 116677
 45 2010 116889
 59 2010 121481
 10 2010 121729
 75 2010 122364
 51 2010 122740
end

* create quintile groups, identified by "qgroup" membership
sumdist income if year==2010, n(5) qgp(qgroup)
return list // for information

* dummy variables identifying groups
tabulate qgroup, ge(dummy_qgp)
de dummy_qgp*
ta dummy_qgp1  // etc. for other groups

* average income by quintile group
*   Income share of group i = group's share of total income
*                           = (group mean X n_group) / (mean X n_total)
*   But (n_group/n_total) = 1/5, by construction
* So,   group mean = 5 * mean * group_share
* We can generate a new variable in which each person is attributed the mean
* of the quintile group to which they belong, using the saved results.
* tabulate and describe (above) are r-class commands that overwrite the
* r() results saved by sumdist, so rerun sumdist quietly first.
quietly sumdist income if year==2010, n(5)

ge qgpmean = .
forvalues q = 1/5 {
    replace qgpmean = 5 * r(mean) * r(sh`q') if qgroup == `q'
}
ta qgpmean

* Alternatively, ...

mean income if year == 2010, over(qgroup)
ereturn list
matrix m = e(b)
matrix list m // 1x5 matrix
di "Mean for second poorest fifth = " m[1,2]

* Putting quintile group means into local macros (economical)
forvalues g = 1/5 {
    local mean_qgp_`g'= m[1,`g']
    di "Mean for quintile group " `g' "   =   "  `mean_qgp_`g''
}

bysort qgroup: summarize income if year == 2010
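The identity used in the comments of the code above can be written out as follows, with N households split into five equal-sized groups of n_g = N/5 members, overall mean \bar{y}, group mean \bar{y}_g and group income share s_g. Plugging in the group-1 share from David's sumdist table (5.972%) and the overall mean (69,739) recovers the group-1 mean that tabstat reports.

```latex
s_g = \frac{n_g \, \bar{y}_g}{N \, \bar{y}}
\quad\Longrightarrow\quad
\bar{y}_g = \frac{N}{n_g}\, s_g \, \bar{y} = 5\, s_g \, \bar{y},
\qquad
\bar{y}_1 = 5 \times 0.05972 \times 69{,}739 \approx 20{,}824
```

The factor 5 is exact only when every group has N/5 members; if ties at a cutpoint make the groups unequal, use the actual N/n_g instead.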


Linh mt

#5

23 Nov 2019, 04:26

Hello @Stephen Jenkins
Thank you for your reply, and sorry for my late feedback.
In fact, I used this code:

Code:

xtile incomegroup=totalincome, nq(5)

As I mentioned in my question above, I am not sure whether this code correctly solves my problem. Could you explain the code, in case I used the wrong one?

Many thanks Stephen
Regards
Linh
Stephen Jenkins

#6

23 Nov 2019, 05:43

xtile creates a variable containing quantile categories, the 5 quintile groups in your example. You can calculate the mean income of each group once you have done this. My sumdist uses xtile, so the result should be the same! See the code in #4 for how to calculate the means.
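To see what xtile does, the sketch below rebuilds its quintile groups by hand from the cutpoints that `_pctile` returns. A value equal to a cutpoint falls into the lower group, which is why group 1 in David's tables ends exactly at the first cutpoint (33,631). It uses the auto dataset with `price` standing in for household income (an assumption for self-containment); `q` and `q_manual` are hypothetical names.

```stata
* Sketch: rebuild xtile's quintile groups from _pctile's cutpoints.
* Assumption: auto dataset, with price standing in for household income.
sysuse auto, clear
xtile q = price, nq(5)            // quintile group of each observation
_pctile price, nquantiles(5)      // cutpoints returned in r(r1)-r(r4)
* A value equal to a cutpoint goes into the lower group. price has no
* missing values here; missing incomes would need an explicit if condition.
generate q_manual = 1 + (price > r(r1)) + (price > r(r2)) + (price > r(r3)) + (price > r(r4))
tabulate q q_manual
```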
Linh mt

#7

23 Nov 2019, 22:06

Hi Stephen,
As you explained, xtile creates quantile categories. So if I wish to create the 5 dummy variables below, is the code xtile incomegroup=totalincome, nq(5) correct?

1. 1st variable - var1: Average income of the 20% lowest-income households. var1=1 if the household belongs to the 20% lowest-income group and var1=0 if it does not.
2. 2nd variable - var2: Average income of the 20% low-middle-income households. var2=1 if the household belongs to the 20% low-middle-income group and var2=0 if it does not.
3. 3rd variable - var3: Average income of the 20% middle-income households. var3=1 if the household belongs to the 20% middle-income group and var3=0 if it does not.
4. 4th variable - var4: Average income of the 20% high-middle-income households. var4=1 if the household belongs to the 20% high-middle-income group and var4=0 if it does not.
5. 5th variable - var5: Average income of the 20% highest-income households. var5=1 if the household belongs to the 20% highest-income group and var5=0 if it does not.
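A minimal sketch of the five dummies built on top of xtile's groups follows. A dummy only records membership, so each household has exactly one of var1-var5 equal to 1; the average income of each group is a separate number, shown here with tabstat. It uses the auto dataset with `price` standing in for `totalincome` (an assumption, only so the block runs on its own). `tabulate incomegroup, generate(var)` would create the same dummies in one line.

```stata
* Sketch: xtile groups plus one 0/1 dummy per income quintile.
* Assumption: auto dataset, with price standing in for totalincome.
sysuse auto, clear
xtile incomegroup = price, nq(5)   // 1 = lowest fifth, ..., 5 = highest fifth
forvalues q = 1/5 {
    * dummy = 1 for households in this quintile, 0 otherwise
    generate byte var`q' = (incomegroup == `q') if !missing(incomegroup)
}
* Average income of each quintile group (a summary, not a dummy)
tabstat price, by(incomegroup) stats(n mean)
```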

Thank you
Best regards
Linh