
  • generating household size variable

    I'm using Household Income Expenditure Survey data for a research project. There is no variable called Household Size; instead there is a variable called 'serial number of family members' (serial_no). I want to generate a household size variable.

    I have attached an image of the Data Editor window.

  • #2
    Please read https://www.statalist.org/forums/help#stata which explains why attachments are not as much help as you hope. My guess would be that

    Code:
    egen hh_size = max(serial_no), by(snumber)
    may be what you want. Oddly hhno appears to be something else.

    If snumber is ambiguous (not an identifier) within the dataset, you need more variables in by().
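    One way to check whether snumber is ambiguous is sketched below. It assumes only the variable names mentioned in this thread (snumber, serial_no) and that each observation is one person; it is a diagnostic, not a fix.

    ```stata
    * Each person should appear once per household if snumber is a true household identifier.
    * Surplus copies reported here suggest snumber restarts within some higher-level unit.
    duplicates report snumber serial_no

    * isid exits with an error if the listed variables do not jointly identify observations;
    * capture noisily shows the message without stopping a do-file.
    capture noisily isid snumber serial_no
    ```

    If duplicates appear, the variables that distinguish the repeated households are the ones to add to by().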

    (Assuming that people here are familiar with the Household Income Expenditure Survey, or any other data source, cuts down your readership. I don't even know what country or countries it refers to and indeed have never heard of it.)



    • #3
      I expect that Nick's code is not what you want, because your observation 18 has a serial_no of 41, an unlikely size for a household. My thought is that it is an indicator for someone - a former wife, perhaps, since the relationship is 2 and the sex is 2 - who no longer lives in the household. Should they be counted in the household size?

      I also suspect that your dataset is panel data with monthly observations for more than one month (although your variable month shows only the value 1 in your screenshot), so you will need to recognize that the household size may change from month to month.

      And, as Nick suggests, if the snumber that appears to identify your households restarts from 1 when the district or sector changes, you will need to take that into account.
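      A minimal sketch of what taking those groupings into account might look like, assuming the dataset really contains variables named district, sector and month (these names are guesses based on the description above and should be adjusted to the actual data):

      ```stata
      * Assumed variable names: district, sector, month (not confirmed by the poster).
      * Household size is computed separately for each household in each month,
      * so snumber restarting across districts or sectors does not merge households.
      egen hh_size = max(serial_no), by(district sector snumber month)
      ```

      Whether month belongs in by() depends on whether a household size that varies over time is wanted.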

      To advise you better on the code you need, we need a much more complete explanation of your data than a screenshot or sample data can give.



      • #4
        William Lisowski Well spotted on the 41, but it looks like a data entry error to me. Previous values run 1, 2, 3 and it looks as if later variables in the same observation are messed up because the 1 digit belongs elsewhere.

        Given such problems,

        Code:
        bysort snumber : gen hh_size = _N
        may work better.
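        To see where the two approaches disagree, both can be computed side by side. This is only a sketch; hh_max and hh_n are illustrative names, and snumber is assumed to identify households on its own.

        ```stata
        * Largest serial number within each household
        egen hh_max = max(serial_no), by(snumber)

        * Number of observations (people) recorded for each household
        bysort snumber: gen hh_n = _N

        * Show only households where the two measures differ
        list snumber serial_no hh_max hh_n if hh_max != hh_n, sepby(snumber)
        ```

        Households listed here have gaps, duplicates or unusual codes in serial_no and are worth inspecting by eye.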



        • #5
          Thank you both.

          egen hh_size = max(serial_no), by(snumber)
          command worked. https://www.statalist.org/forums/help#stata was very useful.



          • #6
            As William Lisowski pointed out, that will give you 41 for one household.



            • #7
              41 represents household members who usually live elsewhere in the country or abroad. They are outside the scope of my topic because the survey does not collect further information about them (such as income or expenditure).
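              If those members are to be left out of the count, one possible sketch is to count only the remaining observations in each household. It assumes that 41 is the only serial_no code used for non-resident members and that snumber identifies households; hh_size_res is an illustrative name.

              ```stata
              * Assumption: serial_no == 41 is the only code marking non-resident members.
              * The expression is 1 for resident members and 0 otherwise,
              * so its total within a household is the resident head count.
              egen hh_size_res = total(serial_no != 41 & !missing(serial_no)), by(snumber)
              ```

              Unlike taking the maximum of serial_no, this does not return 41 for a household that contains such a member.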



              • #8
                Hi! I have a question related to this post. I am also using HIES data for my research, but I do not have the variable snumber. Can anyone suggest how I can generate it? I have serial_no in my data but no snumber.



                • #9
                  #8 Please follow the duplicate thread at https://www.statalist.org/forums/for...ld-member-data if interested.
