
  • Creating unique household ID using PSU and sub-PSU household number

    Hi all,

    I've been searching around for a while without finding an answer to this, so I would be really grateful for any help!

    I am analysing a nation-wide household survey which contains about 30,000 individual observations. The data has 500 PSUs (xhpsu), and within each
    PSU, a number of households (xhnum) were sampled. The sample was also stratified (xstra).

    . des xhpsu xhnum xstra

                  storage   display    value
    variable name   type    format     label      variable label
    --------------------------------------------------------------------------------------------
    xhpsu           int     %8.0g                 psu
    xhnum           byte    %8.0g                 household number
    xstra           int     %8.0g      XSTRA      strata in the sample

    . sum xhpsu xhnum xstra

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
           xhpsu |      28769    1481.253    225.1336       1001       1800
           xhnum |      28769    9.287254    5.263503          1         34
           xstra |      28769    254.7213    63.50279        100        325

    In browse view, the individual observations (household members) are like this:

    xhpsu xhnum xstra
    1001 11 100/moun
    1001 1 100/moun
    1001 16 100/moun
    ...
    xhpsu xhnum xstra
    1002 10 100/moun
    1002 13 100/moun
    1002 7 100/moun
    ...
    xhpsu xhnum xstra
    1003 17 100/moun
    1003 16 100/moun
    1003 7 100/moun

    and so on.

    How can I link each household member to a unique household ID using xhpsu, xhnum and xstra? The purpose of this would be to analyse characteristics for the household as a whole.

    Thanks very much in advance for any help.

    Magnus
    Last edited by Are Magnus; 27 Oct 2015, 12:26.

  • #2
    Well, your "variable" xstra doesn't vary: it's always 100/moun, so it seems you can ignore it. Or perhaps that's just an artifact of the sample of data you showed us. Assuming that xstra does really vary in the whole data set and needs to be accounted for in the identification of distinct households:

    Code:
    egen hhid = group(xhpsu xstra xhnum)
    should do it.
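
    As a minimal sketch of how the resulting hhid might then be used for household-level work, assuming hhid was created as above (hhsize and hhtag are illustrative names, not variables from the original data):

    ```stata
    * Number of members recorded in each household
    bysort hhid: gen hhsize = _N

    * Flag exactly one observation per household
    egen hhtag = tag(hhid)

    * Count the distinct households
    count if hhtag
    ```

    Restricting later commands with "if hhtag" would keep each household from being counted once per member.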



    • #3
      Thanks very much, Clyde! Yes, xstra does vary as well.



      • #4
        Hello again,

        I encountered a new problem and tried to search around for solutions but to no avail... would be really grateful for any leads!

        So, as explained above, the households are identified with a village ID (xhpsu) and a household number within that village (xhnum). I have so far used this command to group them, which worked well:

        egen hhid = group(xhpsu xhnum)

        However, now that I've started to merge the various files (the survey covers different topics: education, health, expenditure, migration etc.), I've realized that I need to identify households using a unique code or automatically generated number based on just xhpsu and xhnum. This is because, for example, only some households have migrants and are included in the migration file. Using the egen command in the migration file generates hhid 1, 2, 3, 4 etc. for a subset of households, and these values do not correspond to the hhid 1, 2, 3, 4 in the files where all the households are included.

        So, my question is: what is the easiest and most reliable way of generating a unique code that is based only on xhpsu and xhnum and that works across all files? Is there a way, for example, to combine them like this:

        xhpsu xhnum hhid_new
        1001 1 10011
        1001 1 10011
        1001 2 10012
        1001 2 10012
        1002 1 10021
        1002 4 10024
        ...

        1005 1 10051
        1005 10 100510
        1006 17 100617

        A potential problem (maybe?) would be that hhid_new are of different lengths?

        Is this a good idea or are there smarter ways to go about this?

        Thanks very much in advance!

        Magnus



        • #5
          I'd consider using egen, concat() with spaces as separators.
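
          A minimal sketch of that suggestion, assuming xhpsu and xhnum are named and coded identically in every file. The names hhid_str and hhid_num are illustrative, and the numeric variant is an alternative that relies on xhnum never exceeding 99 (its maximum in the summary above is 34):

          ```stata
          * String ID: PSU and household number joined with a space, e.g. "1005 10"
          egen hhid_str = concat(xhpsu xhnum), punct(" ")

          * Numeric alternative: reserve two digits for xhnum, e.g. 1005 and 10 -> 100510
          gen long hhid_num = xhpsu*100 + xhnum
          ```

          Either ID is computed from the values of xhpsu and xhnum rather than from their rank within the current file, so the same household should get the same ID in every file. The separator (or the fixed two-digit width) also keeps the PSU part and the household part distinguishable when household numbers differ in length.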



          • #6
            Thanks a lot! That works!!



            • #7
              Sorry to revive this thread after four years, but I have a similar issue that I am seeking help with. If I am not mistaken, Magnus had xhpsu, xhnum and xstra both in the main files (like poverty.dta, sample.dta or the weight calculation files) and in the files covering different topics (like health, education, distance etc.). But the dataset I have (which is also an NLSS household survey) has the variables xhpsu, xhnum and xstra in the main files like poverty, sample.dta, but not in the individual files (like health, education etc.). Can I add xhpsu, xhnum and xstra to these files so that I can merge them efficiently? Please see the attached picture for more detail.
