  • Split variable into groups and build quintiles within the groups

    Hi everyone,

    I have the variable household size (hhmemb) with 5,000 observations, plus the income for each household. I want to group household size into 1 Person Household, 2 Person Household, ..., 5+ Person Household. Within these groups I want to build quintiles according to income. I tried the following:

    gen hhgroup = cond(hhmemb == 1, "1 Person Household", ///
    hhmemb == 2, "2 Person Household", ///
    hhmemb == 3, "3 Person Household", ///
    hhmemb == 4, "4 Person Household", ///
    hhmemb >= 5, "5+ Person Household", ///)
    sort hhgroup inc

    by hhgroup: egen q20 = pctile(inc), p(20)
    by hhgroup: egen q40 = pctile(inc), p(40)
    by hhgroup: egen q60 = pctile(inc), p(60)
    by hhgroup: egen q80 = pctile(inc), p(80)

    list hhgroup Income q20 q40 q60 q80

    The first part works fine, but then Stata says: "variable hhgroup not found".
    Last edited by elli charles; 23 Jan 2024, 10:17.

  • #2
    Was hhgroup created? The ///) at the end would likely cause it to fail.

    • #3
      g hhgroup = hhmemb
      replace hhgroup = 5 if hhgroup > 5 & !missing(hhgroup)

      bys hhgroup: egen q20 = pctile(inc), p(20)
      bys hhgroup: egen q40 = pctile(inc), p(40)
      bys hhgroup: egen q60 = pctile(inc), p(60)
      bys hhgroup: egen q80 = pctile(inc), p(80)
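
      The four cutoffs can then be turned into a quintile indicator. This is only a sketch: it assumes q20, q40, q60 and q80 were all created as above, and incq is a hypothetical variable name, not one from the thread.

      ```stata
      * Assumes q20-q80 hold the within-group income cutoffs computed above.
      * incq is a hypothetical name for the quintile indicator (1 to 5).
      * Each comparison evaluates to 0 or 1, so the sum counts the cutoffs
      * that inc exceeds. Missing inc is excluded explicitly, because Stata
      * treats missing as larger than any number.
      gen byte incq = 1 + (inc > q20) + (inc > q40) + (inc > q60) + (inc > q80) if !missing(inc)

      * Cross-tabulate to check the split within each household group.
      tabulate hhgroup incq
      ```

      An income exactly equal to a cutoff lands in the lower quintile here; use >= instead of > if the opposite convention is wanted.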

      • #4
        I thought I created hhgroup with the first part of the code: gen hhgroup = cond(hhmemb == 1, "1 Person Household",..
        Currently I just have hhmemb, which tells me how many people live in one household. I want to use that and put it into groups.

        • #5
          Hmm, interesting. This is sort of a bug in Stata.*

          Your first part did not "work fine." If you stopped the code immediately after it, you would see that the variable hhgroup was not actually created. Nor should it have been, because your command to create it has incorrect syntax. The problem is that no error message is reported. One might have expected Stata to halt with an error message.

          I'm not going to go into the correct way to use -cond()- here, because in this situation its use is unnecessary and vastly more complicated than is needed.

          Here's a better way to create this variable:
          Code:
          forvalues i = 1/4 {
              label define hhgroup `i' "`i' Person Household", modify
          }
          label define hhgroup 5 "5+ Person Household", modify
          * min() ignores missing arguments, so exclude missing hhmemb explicitly
          gen int hhgroup:hhgroup = min(hhmemb, 5) if !missing(hhmemb)
          Note: Instead of producing a string variable, this code produces a value-labeled numeric variable, which will be much more useful. For example, you can use it as a variable in a regression and calculate any kind of statistics you like with it.
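
          As a rough illustration of that point (a sketch only, assuming inc is the income variable from #1 and hhgroup was created as above):

          ```stata
          * The value labels appear in the output, but the variable is numeric.
          tabulate hhgroup

          * Factor-variable notation (i.) needs a numeric variable with
          * nonnegative integer values; a string variable would not work here.
          regress inc i.hhgroup
          ```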

          *It is subtle, and not technically a bug. On my setup, if I try to run your code exactly as written, I do get an error message:
          Code:
          . gen hhgroup = cond(hhmemb == 1, "1 Person Household", ///
          > hhmemb == 2, "2 Person Household", ///
          > hhmemb == 3, "3 Person Household", ///
          > hhmemb == 4, "4 Person Household", ///
          > hhmemb >= 5, "5+ Person Household" ///)
          > sort hhgroup inc
          too few ')' or ']'
          r(132);

          However, if I highlight only the -gen- command (down through the closing right paren), and just run that, no error message appears.

          Code:
          . gen hhgroup = cond(hhmemb == 1, "1 Person Household", ///
          > hhmemb == 2, "2 Person Household", ///
          > hhmemb == 3, "3 Person Household", ///
          > hhmemb == 4, "4 Person Household", ///
          > hhmemb >= 5, "5+ Person Household" ///)
          
          end of do-file
          This happens, I believe, because the /// in the final line tells Stata to ignore the final paren and also to expect the command to continue on the next line. So Stata doesn't know that the command is done and thinks more is to come. As a result, Stata neither parses nor attempts to run the command, because it's waiting for the rest of it.

          So, technically Stata is performing within the bounds of how it defines the behavior to expect in all of these circumstances, which is not a bug. And although in this circumstance Stata's behavior is counter-intuitive, it would be impossible to "fix" this, because there is no way Stata can tell that the user isn't actually going to provide the final part of the -gen hhgroup- command soon.
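
          For reference only, here is a sketch of how the string version could be written so that it parses. It assumes that nested -cond()- calls were what was intended, and hhgroup_str is a hypothetical name chosen so it does not clash with the numeric hhgroup above.

          ```stata
          * Sketch only: cond() takes three (or four) arguments, so the
          * branches must be nested. All closing parentheses sit on the last
          * line, and no /// follows them, so nothing is commented out.
          gen hhgroup_str = cond(hhmemb == 1, "1 Person Household", ///
              cond(hhmemb == 2, "2 Person Household", ///
              cond(hhmemb == 3, "3 Person Household", ///
              cond(hhmemb == 4, "4 Person Household", ///
              cond(hhmemb >= 5 & !missing(hhmemb), "5+ Person Household", "")))))
          ```

          The !missing() check matters because Stata treats missing values as larger than any number, so hhmemb >= 5 alone would also be true for missing household sizes.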

          Added: Crossed with #2, 3, 4.

          • #6
            Thanks, Clyde. You're right, I need a numeric variable and not a string variable.
