  • discrepancies in estimates between sumdist and pshare

    Hi friends,

    I have income data for two populations (population id=1 and 2). I have collapsed my data into a cross-tabulation because in my research I assume all people in the same cell have exactly the same income. My original data are pasted at the end of this post. I want to calculate the income share of each decile in each of the 2 populations separately. I have tried pshare (by Ben Jann) and sumdist (by Stephen Jenkins). Both can be installed with "ssc install ...". However, their estimates are slightly different. Could this be due to my incorrect usage of these commands?

    For example, the income share of the top 10% in the population with id=1 is 25.35% by pshare but 25.31% by sumdist. I also do not understand why there is an additional row with "." in the first column of the sumdist output.

    Thank you in advance!

    Code:
    . pshare estimate income [iw=freq], over(id) n(10) gini
    (variance estimation not supported with iweights)
    
    Percentile shares (proportion)    Number of obs   =         40
    
                1: id = 1
                2: id = 2
    
    --------------------------------------------------------------
          income |      Coef.   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
    1            |
            0-10 |   .0193008          .             .           .
           10-20 |    .034384          .             .           .
           20-30 |   .0535935          .             .           .
           30-40 |   .0679086          .             .           .
           40-50 |   .0808105          .             .           .
           50-60 |   .0935073          .             .           .
           60-70 |   .1067746          .             .           .
           70-80 |     .12858          .             .           .
           80-90 |   .1615961          .             .           .
          90-100 |   .2535447          .             .           .
    -------------+------------------------------------------------
    2            |
            0-10 |   .0844804          .             .           .
           10-20 |    .087861          .             .           .
           20-30 |   .0886734          .             .           .
           30-40 |   .0904544          .             .           .
           40-50 |   .0927363          .             .           .
           50-60 |   .0940319          .             .           .
           60-70 |   .0971719          .             .           .
           70-80 |   .1013188          .             .           .
           80-90 |   .1245144          .             .           .
          90-100 |   .1387574          .             .           .
    --------------------------------------------------------------
    
    -------------------------
                 |      Gini
    -------------+-----------
               1 |  .3525345
               2 |  .0837606
    -------------------------
    
    
    
    . sumdist income [aw=freq] if id==1, ngps(10)
    Distributional summary statistics, 10 quantile groups
    
    ---------------------------------------------------------------------------
    Quantile  |
    group     |    Quantile  % of median     Share, %      L(p), %        GL(p)
    ----------+----------------------------------------------------------------
            1 |    15090.00        22.30         1.93         1.93      1510.64
            2 |    31181.00        46.07         3.44         5.38      4204.04
            3 |    44526.00        65.79         5.37        10.75      8401.91
            4 |    53645.00        79.27         6.78        17.52     13700.59
            5 |    67677.00       100.00         8.09        25.62     20027.01
            6 |    76289.00       112.73         9.36        34.98     27347.58
            7 |    85703.00       126.64        10.65        45.63     35676.55
            8 |   104176.00       153.93        12.87        58.51     45741.38
            9 |   131401.00       194.16        16.18        74.69     58393.76
           10 |                                 25.31       100.00     78183.37
            . |                                  0.00       100.00     78183.37
    ---------------------------------------------------------------------------
    
    . sumdist income [aw=freq] if id==2, ngps(10)
    Distributional summary statistics, 10 quantile groups
    
    ---------------------------------------------------------------------------
    Quantile  |
    group     |    Quantile  % of median     Share, %      L(p), %        GL(p)
    ----------+----------------------------------------------------------------
            1 |    66284.00        92.84        11.99        11.99      9144.18
            2 |    67528.00        94.59         7.39        19.37     14777.63
            3 |    67677.00        94.80         7.08        26.45     20178.05
            4 |    70314.00        98.49         8.77        35.22     26866.54
            5 |    71393.00       100.00        13.36        48.58     37056.61
            6 |    74037.00       103.70        10.45        59.03     45026.36
            7 |    74224.00       103.97         5.40        64.43     49145.43
            8 |    82858.00       116.06        11.07        75.50     57592.88
            9 |    97434.00       136.48        10.66        86.16     65721.22
           10 |                                 13.84       100.00     76281.04
            . |                                  0.00       100.00     76281.04
    ---------------------------------------------------------------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(income freq id)
     58340  46 2
     63079  35 2
     66056 132 2
     66284 175 2
     67528 230 2
     67677 220 2
     67773  88 2
     68921  79 2
     70314 100 2
     70686 196 2
     70696  56 2
     71393 144 2
     71651 123 2
     72303  11 2
     74037 167 2
     74224 153 2
     75841 188 2
     82858 109 2
     97434 230 2
    105867 275 2
     15090 276 1
     17279   0 1
     24071 166 1
     29035   0 1
     31181 110 1
     32539   0 1
     34466   0 1
     38506   0 1
     38660 122 1
     39778   0 1
     39786   0 1
     40480   0 1
     44162   0 1
     44190   0 1
     44421   0 1
     44526 154 1
     45027   0 1
     47497   0 1
     47751   0 1
     48988   0 1
     50961   0 1
     51042   0 1
     51800  78 1
     51840   0 1
     51950   0 1
     52751   0 1
     53645 197 1
     53694   0 1
     55029   0 1
     57285   0 1
     57428   0 1
     57832   0 1
     58306   0 1
     58340   0 1
     59029   0 1
     59230   0 1
     59411  34 1
     59767   0 1
     62126   0 1
     62644   0 1
     63079   0 1
     63122   0 1
     63521 230 1
     66056   0 1
     66284   0 1
     67528   0 1
     67677  12 1
     67773   0 1
     68921   0 1
     70314   0 1
     70686   0 1
     70696   0 1
     71393   0 1
     71651   0 1
     72303 219 1
     74037   0 1
     74161   0 1
     74224   0 1
     75841   0 1
     76289  57 1
     76468   0 1
     78953   0 1
     79146   0 1
     81072   0 1
     81112   0 1
     82224 174 1
     82370   0 1
     82858   0 1
     83180   0 1
     83960   0 1
     85703 101 1
     86941   0 1
     88456   0 1
     88713   0 1
     91106   0 1
     91921   0 1
     93882   0 1
     96068   0 1
     96173   0 1
     96513 131 1
     97434   0 1
     98657   0 1
     98719   0 1
    102264   0 1
    104176 145 1
    104184   0 1
    104435   0 1
    105847   0 1
    105867   0 1
    105867   0 1
    111956   0 1
    112972   0 1
    115492  87 1
    116767   0 1
    118917   0 1
    120198   0 1
    121313   0 1
    121679   0 1
    122757   0 1
    130627   0 1
    131401 189 1
    133087   0 1
    134584   0 1
    135674   0 1
    144051   0 1
    145649   0 1
    147470   0 1
    150625   0 1
    159560  45 1
    205999 230 1
     15090   0 2
     17279   0 2
     24071   0 2
     29035   0 2
     31181   0 2
     32539   0 2
     34466   0 2
     38506   0 2
     38660   0 2
     39778   0 2
     39786   0 2
     40480   0 2
     44162   0 2
     44190   0 2
     44421   0 2
     44526   0 2
     45027   0 2
     47497   0 2
     47751   0 2
     48988   0 2
     50961   0 2
     51042   0 2
     51800   0 2
     51840   0 2
     51950   0 2
     52751   0 2
     53645   0 2
     53694   0 2
     55029   0 2
     57285   0 2
     57428   0 2
     57832   0 2
     58306   0 2
     59029   0 2
     59230   0 2
     59411   0 2
     59767   0 2
     62126   0 2
     62644   0 2
     63122   0 2
     63521   0 2
     74161   0 2
     76289   0 2
     76468   0 2
     78953   0 2
     79146   0 2
     81072   0 2
     81112   0 2
     82224   0 2
     82370   0 2
     83180   0 2
     83960   0 2
     85703   0 2
     86941   0 2
     88456   0 2
     88713   0 2
     91106   0 2
     91921   0 2
     93882   0 2
     96068   0 2
     96173   0 2
     96513   0 2
     98657   0 2
     98719   0 2
    102264   0 2
    104176   0 2
    104184   0 2
    104435   0 2
    105847   0 2
    105867   0 2
    111956   0 2
    112972   0 2
    115492   0 2
    116767   0 2
    118917   0 2
    120198   0 2
    121313   0 2
    121679   0 2
    122757   0 2
    130627   0 2
    131401   0 2
    133087   0 2
    134584   0 2
    135674   0 2
    144051   0 2
    145649   0 2
    147470   0 2
    150625   0 2
    159560   0 2
    205999   0 2
    end
    Last edited by shem shen; 08 Dec 2019, 22:07.

  • #2
    Short answer: I can't explain the slight differences in the estimates of the income share of the top 10% (in the 4th d.p.!). The same difference appears if one uses svylorenz (on SSC). I suspect that Ben Jann and I are using slightly different algorithms. (We benchmarked our code against each other's when Ben was developing pshare, and I don't recall us finding a problem with share estimation.) It might also have to do with how the programmes treat your grouped data (N_groups = 220).
    Similarly, I have no idea why there is an extra row in the sumdist output relative to usual. This has not occurred in my testing over the years (admittedly not with Stata 16). Again, it might have something to do with the small number of groups. Note that with Stata 15, I get a message that 2 missing values were generated (I don't get this in Stata 16).

    Code:
    . sumdist income [fw=freq] if id==1, ngps(10)
    (2 missing values generated)
    
    
    Distributional summary statistics, 10 quantile groups
    
    ---------------------------------------------------------------------------
    Quantile  |
    group     |    Quantile  % of median     Share, %      L(p), %        GL(p)
    ----------+----------------------------------------------------------------
            1 |   15090.000       22.297        1.932        1.932     1510.642
            2 |   31181.000       46.073        3.445        5.377     4204.039
            3 |   44526.000       65.792        5.369       10.746     8401.908
            4 |   53645.000       79.266        6.777       17.524    13700.589
            5 |   67677.000      100.000        8.092       25.615    20027.005
            6 |   76289.000      112.725        9.363       34.979    27347.582
            7 |   85703.000      126.635       10.653       45.632    35676.555
            8 |  104176.000      153.931       12.873       58.505    45741.380
            9 |  131401.000      194.159       16.183       74.688    58393.753
           10 |                                25.312      100.000    78183.369
            . |                                 0.000      100.000    78183.369
    ---------------------------------------------------------------------------
    Share = quantile group share of total income; 
    L(p)=cumulative group share; GL(p)=L(p)*mean(income)

    I set trace on and then re-ran (in Stata 15) the same command. It seems the 2 missing values arise when sumdist calls the weighting variable "freq". I can't tell what the issue is in your case. Again I can only say I've not seen it before and suspect it's something to do with only having 220 groups.
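
    One candidate reading of the "slightly different algorithms" suspicion can be checked by hand. The Python sketch below is an illustration under stated assumptions, not a reading of either program's source code. It takes the nonzero cells for id==1 from the dataex listing and computes decile shares under two hypothetical rules: (a) each cell goes whole into one quantile group, using an xtile-style cut at the smallest income whose cumulative weight reaches k/10 of the total, and (b) the Lorenz curve is cut at exactly p = k/10, so a cell that straddles a boundary is split between two groups. The function names are invented for this example.

```python
# Nonzero cells for id == 1, copied from the dataex listing: (income, freq)
cells = [
    (15090, 276), (24071, 166), (31181, 110), (38660, 122), (44526, 154),
    (51800, 78), (53645, 197), (59411, 34), (63521, 230), (67677, 12),
    (72303, 219), (76289, 57), (82224, 174), (85703, 101), (96513, 131),
    (104176, 145), (115492, 87), (131401, 189), (159560, 45), (205999, 230),
]


def shares_whole_cells(cells, n_groups=10):
    """Rule (a), assumed: every cell is assigned whole to one group."""
    cells = sorted(c for c in cells if c[1] > 0)
    total_w = sum(w for _, w in cells)
    total_y = sum(y * w for y, w in cells)
    out = [0.0] * n_groups
    cum_before = 0  # weight of all cells with strictly lower income
    for y, w in cells:
        # Equivalent to: group k if quantile(k-1) < income <= quantile(k),
        # where quantile(k) is the smallest income with cum. weight >= k/n.
        g = cum_before * n_groups // total_w
        out[g] += y * w / total_y
        cum_before += w
    return out


def shares_split_cells(cells, n_groups=10):
    """Rule (b), assumed: groups hold exactly 1/n of the weight each."""
    cells = sorted(c for c in cells if c[1] > 0)
    total_w = sum(w for _, w in cells)
    total_y = sum(y * w for y, w in cells)
    out = [0.0] * n_groups
    lo = 0.0  # cumulative weight below the current cell
    for y, w in cells:
        hi = lo + w
        for g in range(n_groups):
            g_lo = g * total_w / n_groups
            g_hi = (g + 1) * total_w / n_groups
            # Part of this cell's weight that falls inside group g
            overlap = min(hi, g_hi) - max(lo, g_lo)
            if overlap > 0:
                out[g] += y * overlap / total_y
        lo = hi
    return out


whole = shares_whole_cells(cells)
split = shares_split_cells(cells)
print(f"top decile, whole cells: {whole[-1]:.5f}")
print(f"top decile, split cells: {split[-1]:.7f}")
```

    For id==1 the total weight is 2757, so an exact decile holds 275.7 people while the first cell holds 276. Under rule (a) the top-decile share comes out at about 0.25312 and under rule (b) at about 0.2535447, which are the figures shown in the sumdist and pshare output above. That agreement is suggestive only: it does not establish how either command is actually implemented.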

    • #3
      Hi Professor Jenkins, thank you very much for your quick response. Yes, this is just a minor difference. I will let you know if I have any new findings on this particular issue.
