  • Constructing Gini Index for Household data

    Dear All
    I'm computing the Gini coefficient in Stata 16 with the "ineqdeco" command. I'm using panel data from the Household Surveys in 2009, 2012, and 2016, with about 59,000 observations. I'd like to estimate the Gini coefficient for each household in each year, but the Gini coefficient is only generated per Household_Code (the grouping variable). As a result, each household has the same Gini coefficient in all three years. However, I want to calculate the Gini index for each Household_Code for each year.

    For example, the output appears as:

    Year Household_Code Gini
    2009 1 0.6456
    2012 1 0.6456
    2016 1 0.6456
    2009 2 0.3423
    2012 2 0.3423
    2016 2 0.3423

    Code:
    egen Household_Code = group(District Psu Snumber)
    su Household_Code, meanonly
    gen gini = .
    program do_it
        qui ineqdeco Household_Income, by(Household_Code)
        replace gini = r(gini)
    end
    runby do_it, by(Household_Code) verbose
    I also tried the following code, but it fails with the error "too many variables".

    Code:
    egen Household_Code = group(District Psu Snumber)
    su Household_Code, meanonly
    gen gini = .
    program do_it
        qui ineqdeco Household_Income, by(Year Household_Code)
        replace gini = r(gini)
    end
    runby do_it, by(Year Household_Code) verbose
    Could someone help me figure out the Stata command to calculate the Gini index for each Household_Code for each year?
    Last edited by Sandamali Wijayarathne; 28 Nov 2021, 23:53.

  • #2
    Sandamali:
    Welcome to this forum.
    With the caveat that I have no experience with the community-contributed module -ineqdeco- (which you kindly identified as such, as the FAQ asks for sound reasons), you might try to link -Household_Code- and -Year- via:
    Code:
    egen wanted=group( Household_Code Year )
    Let's hope that Stephen Jenkins chimes in (my hope does not imply that you should contact him privately, as explained in the FAQ).
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Dear Lazzaro
      I highly appreciate your reply. I tried the command you suggested. Although it creates the group variable, using it with the "ineqdeco" command produces the error message "too many variables" (r134).
      I'm hoping for a reply from Stephen Jenkins and Nick Cox, which would be extremely beneficial to my PhD.

      Regards
      Sandamali



      • #4
        I am unclear what your data set-up is. I think you intend to have one observation on 'Household_Income' per household and households are uniquely identified by "Household_Code". And each household may appear in different years.

        Assuming this set-up, the "by()" should refer to "by(Year)", I think. You are getting the error message because -ineqdeco- is trying to calculate a Gini separately for each and every household in the dataset, and that number is too large.



        • #5
          Let me suggest the following. In post #1 you do not discuss the idea of Gini decomposition, but rather that you want to calculate a Gini "for each household for each year".

          I think you intend to calculate three Gini coefficients - for the distribution of Household_Income in 2009, in 2012, and in 2016 - and then assign to each observation the Gini coefficient for that year's distribution of Household_Income.

          For that, the by(Year) option on your ineqdeco command is not needed or wanted. The output of help ineqdeco tells us the by() option is used for decomposition, which is what you would be doing if, for example, you wanted to see to what degree the inequality in Household_Income reflects differences between households in different Districts, versus to what extent the inequality reflects a commonality across all Districts. (Stephen Jenkins could say this much better than I, so if this leads him to further thoughts, his comments should be taken more seriously than mine.) But it doesn't make sense to think about the difference between households, because each household has but one observation in a given year.
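
          To make that last point concrete, here is the textbook definition of the Gini coefficient. It is given as background only and is not necessarily the computational formula that -ineqdeco- uses internally. For n positive incomes y_1, ..., y_n with mean mu:

          ```latex
          G = \frac{1}{2 n^{2} \mu} \sum_{i=1}^{n} \sum_{j=1}^{n} \lvert y_i - y_j \rvert ,
          \qquad
          \mu = \frac{1}{n} \sum_{i=1}^{n} y_i .
          % Single observation (n = 1): the only term is |y_1 - y_1| = 0, hence G = 0.
          ```

          With one observation per household per year, a household-by-year "distribution" has n = 1, so G = 0 whatever the income is. There is no inequality to measure within a single observation, which is why the Gini has to be computed across households within a year.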

          If my understanding is correct, though, you do want the by(Year) option on your runby command, to separately calculate inequality in 2009, 2012, and 2016.

          I think your code would be modified to
          Code:
          program do_it
              ineqdeco Household_Income
              generate gini = r(gini)
          end
          runby do_it, by(Year) verbose
          The 2009 observations will have the Gini for the distribution of Household_Income in 2009; similarly for the 2012 and 2016 observations. All households will therefore share the same three values, one per year.

          For those not familiar with runby, the following should be equivalent.
          Code:
          gen gini = .
          foreach y in 2009 2012 2016 {
              ineqdeco Household_Income if Year==`y'
              replace gini = r(gini) if Year==`y'
          }
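
          As a hedged variation on the loop above, the years need not be hard-coded. The sketch below assumes that -ineqdeco- is installed from SSC and that Year and Household_Income are named as in post #1; it also adds a check that the stored Gini is constant within each year.

          ```stata
          * Sketch only: assumes ineqdeco is installed (ssc install ineqdeco)
          * and that Year and Household_Income exist as in post #1.
          capture drop gini
          generate double gini = .

          * Collect the distinct survey years instead of typing them by hand.
          levelsof Year, local(years)
          foreach y of local years {
              quietly ineqdeco Household_Income if Year == `y'
              replace gini = r(gini) if Year == `y'
          }

          * Sanity check: within each year, gini should take a single value.
          bysort Year (gini): assert gini[1] == gini[_N]

          * Quick look at the coefficient stored for each year.
          tabstat gini, by(Year) statistics(mean min max)
          ```

          If the assert passes silently and min equals max in every row of the tabstat output, each year carries exactly one Gini value.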
          Last edited by William Lisowski; 29 Nov 2021, 08:24.



          • #6
            Dear Stephen and William
            Thank you both very much for your replies; both approaches work. However, they produce different results (I do not know why). I initially made a significant mistake in thinking that the Gini is calculated for each household in each year; after the discussion with both of you, I realised that the Gini is calculated for each year and then assigned to each household.

            Finally, I decided to use the following code to calculate the Gini for each year and assign it to each household.

            Code:
            gen gini = .
            program do_it
                qui ineqdeco Household_Income
                replace gini = r(gini)
            end
            runby do_it, by(Year) verbose
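
            A slightly more defensive version of the same idea is sketched below. It assumes that -runby- and -ineqdeco- are both installed from SSC. Dropping the program and the variable first lets the do-file be re-run without "already defined" errors, and the final step lists the distinct Year/gini pairs so the result can be inspected.

            ```stata
            * Sketch only: assumes runby and ineqdeco are installed from SSC.
            capture program drop do_it      // avoid "already defined" on re-runs
            program define do_it
                quietly ineqdeco Household_Income
                generate double gini = r(gini)
            end

            capture drop gini               // start without a stale gini variable
            runby do_it, by(Year) verbose

            * Show the distinct Year/gini pairs; expect one row per year.
            preserve
            contract Year gini
            list, noobs
            restore
            ```

            Seeing more than one row for a year in the listing would indicate that the Gini was not assigned consistently within that year.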

