
  • Quintile

    Hi, I am new to the Stata forum. I am currently working on my thesis on "poverty and consumption inequality". I need help creating quintile groups in Stata 13 for different variables.

  • #2
    You could try something along these lines:

    . quietly sysuse auto, clear

    . centile gear_ratio, centile(20 40 60 80)

                                                           -- Binom. Interp. --
        Variable |       Obs  Percentile    Centile        [95% Conf. Interval]
    -------------+-------------------------------------------------------------
      gear_ratio |        74         20        2.56            2.46        2.73
                 |                   40        2.93            2.73        2.97
                 |                   60        3.08            2.94        3.21
                 |                   80        3.54            3.21        3.70

    . generate byte quantile_group = 1

    . forvalues i = 1/4 {
      2. quietly replace quantile_group = quantile_group + 1 if gear_ratio > r(c_`i')
      3. }

    . table quantile_group, contents(min gear_ratio mean gear_ratio max gear_ratio n gear_ratio)

    --------------------------------------------------------------------------
    quantile_ |
    group     |  min(gear_r~o)  mean(gear_r~o)   max(gear_r~o)     N(gear_r~o)
    ----------+---------------------------------------------------------------
            1 |           2.19            2.41            2.56              16
            2 |           2.73            2.82            2.93              20
            3 |           2.94            3.05            3.08              14
            4 |           3.15            3.26            3.37               7
            5 |           3.54            3.68            3.89              17
    --------------------------------------------------------------------------

    .



    • #3
      help xtile for a direct route to this. (Note that xtile can also accept weights whereas centile does not.)

      Code:
      . set obs 1000
      number of observations (_N) was 74, now 1,000
      
      . ge y = rnormal(100, 10)
      
      . su y, de
      
                                    y
      -------------------------------------------------------------
            Percentiles      Smallest
       1%     75.85376       68.65125
       5%     83.46385       72.31691
      10%     86.39888        72.3798       Obs               1,000
      25%     93.17144       73.14119       Sum of Wgt.       1,000
      
      50%     99.54034                      Mean           99.56744
                              Largest       Std. Dev.      9.914096
      75%     106.2554       125.2206
      90%     112.3539       125.2546       Variance       98.28931
      95%     115.9575       126.1244       Skewness      -.0771469
      99%      121.436       127.9486       Kurtosis       2.843817
      
      . xtile y_q = y, nq(5)
      
      . ta y_q
      
      5 quantiles |
             of y |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                1 |        200       20.00       20.00
                2 |        200       20.00       40.00
                3 |        200       20.00       60.00
                4 |        200       20.00       80.00
                5 |        200       20.00      100.00
      ------------+-----------------------------------
            Total |      1,000      100.00
      Joseph Coveney's (@Joseph Coveney) method works, and it is particularly instructive -- it is a salutary warning that you should not expect the numbers of observations within each quantile group to be exactly the same in practice, especially if the total number of observations is 'small'. Another potential complication is ties in the data. Look at the following:

      Code:
      . quietly sysuse auto, clear
      
      . xtile gr_q = gear_ratio, nq(5) 
      
      . ta gr_q
      
      5 quantiles |
               of |
       gear_ratio |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                1 |         16       21.62       21.62
                2 |         20       27.03       48.65
                3 |         14       18.92       67.57
                4 |         10       13.51       81.08
                5 |         14       18.92      100.00
      ------------+-----------------------------------
            Total |         74      100.00
      
      
      . centile gear_ratio, centile(20 40 60 80)
      
                                                             -- Binom. Interp. --
          Variable |       Obs  Percentile    Centile        [95% Conf. Interval]
      -------------+-------------------------------------------------------------
        gear_ratio |        74         20        2.56            2.46        2.73
                   |                   40        2.93            2.73        2.97
                   |                   60        3.08            2.94        3.21
                   |                   80        3.54            3.21        3.70
      
      . generate byte quantile_group = 1
       
      . forvalues i = 1/4 {
        2.   quietly replace quantile_group = quantile_group + 1 if gear_ratio > r(c_`i')
        3.    }
      
      . tabulate quantile_group gr_q
      
      quantile_g |               5 quantiles of gear_ratio
            roup |         1          2          3          4          5 |     Total
      -----------+-------------------------------------------------------+----------
               1 |        16          0          0          0          0 |        16 
               2 |         0         20          0          0          0 |        20 
               3 |         0          0         14          0          0 |        14 
               4 |         0          0          0          7          0 |         7 
               5 |         0          0          0          3         14 |        17 
      -----------+-------------------------------------------------------+----------
           Total |        16         20         14         10         14 |        74 
      
      
      . list gear_ratio if quantile_group == 5 & gr_q == 4
      
           +----------+
           | gear_r~o |
           |----------|
       20. |     3.54 |
       45. |     3.54 |
       58. |     3.54 |
           +----------+
      I am unsure at this stage what definitional quirks are leading to the differences between my and Joseph's calculations.
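
      One way to start investigating the discrepancy is to print the cutpoints that each approach works from at full precision. The following is only a diagnostic sketch, not an explanation: it assumes the auto data as above, uses _pctile (documented in the same manual entry as xtile) to obtain a second set of cutpoints, and prints the float-stored value of 3.54 for comparison. It makes no claim about which definitional or precision issue is actually responsible.

      ```stata
      * Diagnostic sketch (an assumption-laden starting point, not a diagnosis)
      quietly sysuse auto, clear

      * Cutpoints from centile, displayed at full double precision
      quietly centile gear_ratio, centile(20 40 60 80)
      forvalues i = 1/4 {
          display "centile c_`i' = " %21.18f r(c_`i')
      }

      * Cutpoints from _pctile for 5 quantile groups
      quietly _pctile gear_ratio, nq(5)
      forvalues i = 1/4 {
          display "_pctile r`i'  = " %21.18f r(r`i')
      }

      * gear_ratio is stored as a float; show what 3.54 looks like in that storage type
      display "float(3.54)   = " %21.18f float(3.54)
      ```

      Comparing the fourth cutpoint from each command with the stored value of the three tied observations (3.54) shows on which side of each cutpoint those observations fall.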

      BTW Asad. Welcome to Stata and the Statalist Forum. There are many community-contributed commands for inequality and poverty analysis available from SSC. search will find you many of them. (Beware unashamed trumpet blow ...) Some you might want to check out using ssc describe name are the following, where name = povdeco, ineqdeco and ineqdec0, sumdist, svylorenz, svyatk_svygei, glcurve









      • #4
        Originally posted by Stephen Jenkins
        help xtile for a direct route to this. (Note that xtile can also accept weights whereas centile does not.)

        Let me try this, as I am handling a large data set. Can you please tell me whether the above commands will automatically create the quintiles and choose the correct cut-off values? I want the first quintile to be the poorest 20% of households according to the consumption expenditure data, the next group the poor, then middle income, then rich, and the last group the richest of all, on the basis of expenditure.



        • #5
          Originally posted by Joseph Coveney
          You could try something along these lines:

          Thanks, I will try the above commands and let you know if that is what I need. I am a new Stata user and I really appreciate your help. Can you please tell me whether these commands will create correct quintile groups according to the expenditure made by the households in the survey data?



          • #6
            -help xtile-



            • #7
              Originally posted by Stephen Jenkins
              -help xtile-
              Respected Stephen Jenkins, my one-year food consumption data has 14,084 observations, that is, 14,084 households in total. Each household has a food consumption value summed over the different food items. I want to create quintiles of households, so that the first group is the poorest, the second the poor, the third middle income, the fourth the rich and the fifth the richest, based on consumption expenditure. Consumption expenditure for each household is the sum of the values (prices paid for the different food items). I have 8 years of consumption data in total, collected through a questionnaire, and I have already separated the food and non-food data into different Excel sheets. I want to create the quintiles and then compare all 5 groups in each year with the next year, across all 8 years. I also want to compute the Gini coefficient for each year and compare it across the 8 years. The household variables I am considering are the highest education level of the household head, the employment sector of the household head, province and region (rural/urban). This is what I want to do in my thesis. I would be grateful if you could help me with this.



              • #8
                Asad: sorry, but I don't have time or other resources to provide online supervision -- you should make use of your university's resources. Focus here on Stata-related issues, and you may get some helpful advice from various people, not only me. Read the Forum FAQ and digest the advice about how to post effectively (how you should formulate questions; how to report Stata code used and output received, etc.)
                Related: you have shown no Stata input or output so far. (As the FAQ advises, you do not have to show all of your data with dataex in order to seek help, just a meaningful amount.) Moreover, it is unclear from your posts that you have read the help files of the programs that you have been directed towards (let alone the academic literature that is cited within them). For example, if you study the help-file for xtile, you'll see how to create quintile groups for each year of your data. (The Gini coefficient can be calculated using programs that I referred you to earlier.)
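
                For readers following along, here is a minimal sketch of per-year quintile groups. It assumes a household-level data set with hypothetical variables year and totalspending (neither name comes from the thread) and uses simulated data so that it runs on its own. Because xtile accepts an if qualifier, one approach is to loop over the years and fill in a single group variable:

                ```stata
                * Simulated stand-in for the survey data (variable names are hypothetical)
                clear
                set seed 12345
                set obs 8000
                generate int year = 2001 + mod(_n, 8)
                generate double totalspending = exp(rnormal(10, 1))

                * Quintile group within each year, built one year at a time
                generate byte quint = .
                levelsof year, local(years)
                foreach y of local years {
                    tempvar q
                    quietly xtile `q' = totalspending if year == `y', nq(5)
                    quietly replace quint = `q' if year == `y'
                    drop `q'
                }

                * Group 1 = lowest spending, group 5 = highest, separately within each year
                tabulate year quint
                ```

                With real survey data, sampling weights would normally be supplied to xtile as well; that is left out of this sketch.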



                • #9
                  OK sir, thank you. Let me study the complete help file for xtile. Shall I upload my data too, since you said I have not shown any input or output so far? When I typed xtile in the help search, I also found astile, by another author.



                  • #10
                    I just tried and learned how to make quintiles using the syntax xtile quint = var1, nq(5). I face one problem: I want to filter the food and non-food commodities in one variable. Can you guide me on that? I searched the help and also tried sorting the data, but failed to do so.



                    • #11
                      i want to filter the food and non food commodities in 1 variable
                      I, for one, don't know what you mean by that.

                      If food spending is held in a variable called "f" and non-food spending in a variable called "nf", then you could create a total (food + nonfood) spending variable thus:

                      Code:
                      generate totalspending = f + nf
                      Re your questions in #9, please read the Forum FAQ in its entirety, and especially the parts about how to present Stata code input and output to the Forum, and also how to use dataex



                      • #12
                        I downloaded survey data from the Statistics Bureau. The variable named "itc" contains both food and non-food items combined. The observations include items such as milk and apples as food, and items such as books and cars as non-food, with their respective values (paid and consumed) in the next column, named V1, which shows the amount paid to buy the specific product. I am attaching a screenshot so you can have a look. First, I separated the food and non-food data in Excel using a pivot table, which summed the values of V1 (the amount paid to buy/consume the product) for each household, and I succeeded in doing that. Then, after learning how to make quintiles in Stata, I copied column V1 from the Excel sheet and pasted it into the Stata data editor to run the xtile command, but unfortunately I got a mismatch error when I ran the command. Now I want to ask: is there any way to copy and paste the data arranged in Excel into Stata, so that I can run the quintile command directly? If not, I will have to separate the food items from the non-food items in Stata, which I tried to do with Manage value labels. Food and non-food are both inside one variable and are not separated in the data.
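
                        The Excel pivot step described above can also be done inside Stata. The sketch below is only an illustration under assumptions: hhid (a household identifier) and isfood (a 0/1 flag marking food items) are hypothetical variables that are not in the original data description, and the six-row data set is made up. How to build isfood from the item codes in itc depends on the survey's codebook.

                        ```stata
                        * Toy item-level data: one row per household and item (values are made up)
                        clear
                        input hhid str10 itc V1 byte isfood
                        1 "milk"   120 1
                        1 "books"  300 0
                        2 "apple"   80 1
                        2 "milk"    60 1
                        3 "car"   5000 0
                        3 "apple"   40 1
                        4 "milk"   200 1
                        5 "apple"  150 1
                        end

                        * With a real spreadsheet, something like
                        *     import excel using "myfile.xlsx", firstrow clear
                        * reads the first row as variable names ("myfile.xlsx" is a placeholder).

                        * Keep food items and sum spending within household (the pivot-table step)
                        keep if isfood == 1
                        collapse (sum) foodspend = V1, by(hhid)

                        * Quintile groups of household food spending
                        xtile quint = foodspend, nq(5)
                        list, noobs
                        ```

                        Keeping the item-level data in Stata and using collapse avoids copying columns between programs, which is where header rows and text cells tend to cause type errors.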



                        • #13


                          the
                          Attached Files



                          • #14
                            Sir Stephen, I finally managed to make quintiles of the food data that I had separated in the Excel sheet. The error mentioned above occurred because a row was being treated as the variable name; it is now solved. I will now have to combine some variables with it and find the Gini coefficient year by year for 8 different periods.
                            Attached Files
