  • Creating individual ID in a household survey dataset

    Dear Users,
    I have a household survey dataset with information on each household member, including the household heads. The dataset has no individual ID, but it does have a household ID. The data are ordered so that the first person in each household is the head. How do I create an individual ID variable such that every household head takes the value 1?
    Thank you,
    Dapel

  • #2
    If the household ID is hhid and you want to generate pid as the individual ID within each household, first sort by hhid with the stable option to preserve the sequence within the household:
    Code:
    sort hhid , stable
    by hhid: generate pid = _n
    Now number 1 in each household is the head.

    • #3
      This is beautiful. It works perfectly. Thanks a million, dear Svend. How do I create a variable for the household size of each household?

      • #4
        Code:
        sort hhid pid
        by hhid: generate hhsize=_N
        
        * About _n and _N, see help subscripting
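
        To see both steps together, here is a minimal sketch on made-up data. The household IDs and names are hypothetical, and the explicit int storage type is an assumption for illustration, not something the code above specifies:

        ```stata
        * Toy data: two households, listed head first (values are invented)
        clear
        input hhid str10 name
        101 "Ada"
        101 "Ben"
        102 "Cy"
        102 "Dee"
        102 "Eve"
        end

        * Stable sort keeps the original order within each household
        sort hhid, stable

        * _n is the running observation number within the by-group,
        * _N is the total number of observations in the by-group
        by hhid: generate int pid = _n
        by hhid: generate int hhsize = _N

        list, sepby(hhid)
        ```

        In this toy case pid runs 1, 2 in household 101 and 1, 2, 3 in household 102, while hhsize is 2 and 3 respectively.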

        • #5
          This is really helpful. Thanks

          • #6
            Be very careful regarding precision issues when creating ID variables. (There are previous posts about this.) If there are many respondents -- hence large numbers for ID -- then the default float format of generate may not be sufficient. Consider generate long IDVAR = .... Or hold IDs using a string variable. (See the previous posts.)
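
            A small sketch of the precision issue, using an arbitrary example value. A float holds integers exactly only up to 16,777,216 (2^24), so the next integer up is silently rounded:

            ```stata
            * One observation is enough to show the rounding
            clear
            set obs 1

            * 16777217 = 2^24 + 1 cannot be represented exactly as a float
            generate float id_float = 16777217
            generate long  id_long  = 16777217

            * Show all digits so the difference is visible
            format id_float id_long %12.0f
            list
            ```

            Here id_float is stored as 16777216, while id_long keeps 16777217. IDs that stay below 2^24 are not affected by this particular rounding.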

            • #7
              Dear Prof Jenkins, I've got over six hundred thousand observations.

              • #8
                Stephen Jenkins is right that, in general, individual ID variables should be stored as longs or strings. In my proposed code, however, I deliberately used int because what we are generating here is a variable that counts up the number of individuals in a single household. For this purpose even -byte- would probably have been sufficient. And I was trying to be mindful of not wasting storage since I know that household survey data tends to include hundreds of thousands or millions of observations.
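
                If you would rather not pick the storage type by hand, one option is to let compress choose the smallest type that holds the values without loss. This is a sketch on a hypothetical five-person household, not part of the code proposed above:

                ```stata
                * Hypothetical within-household counter, deliberately stored too wide
                clear
                set obs 5
                generate long pid = _n

                * compress demotes each variable to the smallest type that loses nothing;
                * values 1 to 5 fit in a byte (a byte holds integers from -127 to 100)
                compress pid
                describe pid
                ```

                Because a byte tops out at 100, it is only safe for pid or hhsize when no household has more than 100 members; int (up to 32,740) leaves more room.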
