Creating unique household ID using PSU and sub-PSU household number

Are Magnus

Join Date: Oct 2015

Posts: 11
#1


27 Oct 2015, 11:24

Hi all,

I've been searching around for a while without finding an answer to this, so I would be really grateful for any help!

I am analysing a nationwide household survey containing about 30,000 individual observations. The data have 500 PSUs (xhpsu), and within each PSU a number of households (xhnum) were sampled. The sample was also stratified (xstra).

. des xhpsu xhnum xstra

              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------------------
xhpsu           int     %8.0g                 psu
xhnum           byte    %8.0g                 household number
xstra           int     %8.0g      XSTRA      strata in the sample

. sum xhpsu xhnum xstra

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       xhpsu |     28769    1481.253    225.1336       1001       1800
       xhnum |     28769    9.287254    5.263503          1         34
       xstra |     28769    254.7213    63.50279        100        325

In browse view, the individual observations (household members) are like this:

xhpsu xhnum xstra
1001 11 100/moun
1001 1 100/moun
1001 16 100/moun
...
xhpsu xhnum xstra
1002 10 100/moun
1002 13 100/moun
1002 7 100/moun
...
xhpsu xhnum xstra
1003 17 100/moun
1003 16 100/moun
1003 7 100/moun

and so on.

How can I link each household member to a unique household ID using xhpsu, xhnum and xstra? The purpose of this would be to analyse characteristics for the household as a whole.

Thanks very much in advance for any help.

Magnus

Last edited by Are Magnus; 27 Oct 2015, 11:26.
Tags: household, lsms, psu, survey, village
Clyde Schechter

Join Date: Apr 2014

Posts: 29906
#2

27 Oct 2015, 12:12

Well, your "variable" xstra doesn't vary: it's always 100/moun, so it seems you can ignore it. Or perhaps that's just an artifact of the sample of data you showed us. Assuming that xstra does really vary in the whole data set and needs to be accounted for in the identification of distinct households:

Code:

egen hhid = group(xhpsu xstra xhnum)

should do it.
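As a minimal sketch of how this plays out, the toy data below stand in for the survey (the values are made up for illustration, and hhsize and hhtag are hypothetical variable names not used in the thread). It builds the ID and then derives one household-level characteristic from it:

```stata
* Toy data standing in for the survey (illustrative values only)
clear
input int xhpsu byte xhnum int xstra
1001 11 100
1001 11 100
1001  1 100
1002 11 100
1002  7 100
1002  7 100
end

* One integer ID per distinct (xhpsu, xstra, xhnum) combination
egen hhid = group(xhpsu xstra xhnum)

* Example household-level characteristic: number of members listed
bysort hhid: gen hhsize = _N

* Flag exactly one observation per household, for household-level tabulations
egen byte hhtag = tag(hhid)

list, sepby(hhid)
```

Note that household number 11 appears in both PSU 1001 and PSU 1002, and the two get different hhid values because the PSU is part of the grouping. Restricting later commands with `if hhtag` counts each household once rather than once per member.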
Are Magnus

Join Date: Oct 2015

Posts: 11
#3

27 Oct 2015, 12:15

Thanks very much, Clyde! Yes, xstra does vary as well.
Are Magnus

Join Date: Oct 2015

Posts: 11
#4

09 Jan 2016, 06:43

Hello again,

I encountered a new problem and tried to search around for solutions but to no avail... would be really grateful for any leads!

So, as explained above, the households are identified with a village ID (xhpsu) and a household number within that village (xhnum). I have so far used this command to group them, which worked well:

egen hhid = group(xhpsu xhnum)

However, now that I've started to merge the various files (the survey covers different topics: education, health, expenditure, migration, etc.), I've realized that I need to identify households using a unique code or automatically generated number based on just xhpsu and xhnum. This is because, for example, only some households have migrants and are included in the migration file. Using the egen command in the migration file generates hhid 1, 2, 3, 4, etc. for a subset of households, and these do not correspond to hhid 1, 2, 3, 4 in the files where all the households are included.

So, my question is: what is the easiest and most reliable way of generating a unique code based only on xhpsu and xhnum that would work across all files? Is there a way, for example, to combine them like this:

xhpsu xhnum hhid_new
1001 1 10011
1001 1 10011
1001 2 10012
1001 2 10012
1002 1 10021
1002 4 10024
...

1005 1 10051
1005 10 100510
1006 17 100617

A potential problem (maybe?) would be that hhid_new are of different lengths?

Is this a good idea or are there smarter ways to go about this?

Thanks very much in advance!

Magnus
Nick Cox

Join Date: Mar 2014

Posts: 35336
#5

09 Jan 2016, 08:55

I'd consider using egen, concat() with spaces as separators.
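A minimal sketch of that suggestion, on made-up toy data: the `punct()` option of `egen, concat()` supplies the separator. The numeric variant at the end is an alternative not proposed in the thread; it assumes xhnum never exceeds 99, which is consistent with the maximum of 34 in the summarize output above. The name hhid_num is hypothetical.

```stata
* Toy data standing in for any one of the survey files (illustrative values only)
clear
input int xhpsu byte xhnum
1001  1
1001  1
1001 11
1005 10
1006 17
end

* String key built from the identifiers themselves, separated by a space,
* so a given household gets the same key in every file
egen hhid_new = concat(xhpsu xhnum), punct(" ")

* Assumed alternative: a numeric key, valid only while xhnum < 100.
* long is needed because values such as 180034 do not fit in an int.
gen long hhid_num = xhpsu*100 + xhnum

list xhpsu xhnum hhid_new hhid_num, sepby(xhpsu)
```

Because the key depends only on the values of xhpsu and xhnum, and not on which households happen to be present, it matches across files, unlike the output of `egen, group()`. The separator also means that keys of different lengths cannot collide. Since xhpsu and xhnum jointly identify a household, they can also be given to `merge` directly as the key variables, without creating a combined ID at all.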
Are Magnus

Join Date: Oct 2015

Posts: 11
#6

09 Jan 2016, 13:10

Thanks a lot! That works!!
Ashis Adhikary

Join Date: Oct 2019

Posts: 1
#7

22 Oct 2019, 07:05

Sorry to disturb after four years, but I have a similar issue that I am seeking help with. If I am not mistaken, Magnus had xhpsu, xhnum and xstra both in the main files (like poverty.dta, sample.dta or the weight calculation files) and in the files covering different topics (like health, education, distance, etc.). The dataset I have (which is also an NLSS household survey) has the variables xhpsu, xhnum and xstra in the main files, such as poverty and sample.dta, but not in the individual files (like health, education, etc.). Can I add xhpsu, xhnum and xstra to these files so that I can merge them efficiently? Please view the attached picture for more understanding.