How to randomly select one member from a household and use them in the panel data?

Sue Park

Join Date: Apr 2020

Posts: 6
#1

22 Apr 2020, 15:43

Hi! As a novice Stata user, I am so glad that I found this forum, and I sincerely appreciate any input or help in advance. I am using longitudinal (panel) data with 12 waves. For my longitudinal analyses I am trying to randomly choose one member of each household and use that person's 12 waves of data, even though one or more people from the same household may be included in the survey design. I tried to randomly select one member per household with the syntax below and got results formatted as shown underneath.

<SYNTAX>
set seed 12345
gen random = uniform()
bysort hhid (random) : gen byte select = _n == 1
sort hhidpn wave

**hhidpn is a unique ID for participants, hhid is the household ID, and pn is the person number, for your information**

<RESULTS AS simplified examples>

hhidpn wave hhid pn select
3010 1 3 10 0
3010 2 3 10 0
3010 3 3 10 0
3010 4 3 10 1
3010 5 3 10 0
3010 6 3 10 0
3010 7 3 10 0
3010 8 3 10 0
3010 9 3 10 0
3010 10 3 10 0
3010 11 3 10 0
3010 12 3 10 0
3020 1 3 20 0
3020 2 3 20 0
3020 3 3 20 0
3020 4 3 20 0
3020 5 3 20 0
3020 6 3 20 0
3020 7 3 20 0
3020 8 3 20 0
3020 9 3 20 0
3020 10 3 20 0
3020 11 3 20 0
3020 12 3 20 0

So, in this simplified example, both 3010 and 3020 (hhidpn) are from the same household, 3 (hhid), but only hhidpn 3010 has been selected (select = 1), and I would like to use all of 3010's variables collected across the 12 waves in my analyses.

In this case, how can I keep and use all 12 waves of variables only for the randomly selected hhidpn (such as 3010) in my longitudinal data set?

This may be a simple question, but I am still confused even after looking through previous posts in the forum. Any advice would be appreciated!

Last edited by Sue Park; 22 Apr 2020, 16:01.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

22 Apr 2020, 16:03

You can use - egen - with tag(), then - sample - if tag. Please be aware that you should - preserve - the data first, look at the results for the randomly selected individuals, do the estimations, and then - restore - it.

Best regards,

Marcos
Sue Park

Join Date: Apr 2020

Posts: 6
#3

22 Apr 2020, 20:28

Originally posted by Marcos Almeida

You can use - egen - with tag(), then - sample - if tag. Please be aware that you should - preserve - the data first, look at the results for the randomly selected individuals, do the estimations, and then - restore - it.

Thank you very much, Marcos!

Nick Cox

Join Date: Mar 2014
Posts: 35807
#4

23 Apr 2020, 01:31

Sorry to contradict Marcos Almeida, but I don't quite understand what is recommended in #2.

What tag() does in practice is to select the first observation in a group on the grounds that if they are all the same then it doesn't matter which one you select. Selecting the first or the last are the only rules that apply to groups that might be as small as one observation.
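To make the behaviour of tag() concrete, here is a minimal sketch on a cut-down version of the example data (two people, three waves each; the variable name mytag is only illustrative). It shows that tag() marks exactly one observation per person, which is why it cannot by itself flag all waves of a randomly chosen person.

```stata
* Cut-down toy data: the two people from the thread, three waves each
clear
input float(hhidpn hhid pn)
3010 3 10
3020 3 20
end
expand 3
bysort hhidpn : gen wave = _n

* tag() puts a 1 on exactly one observation per distinct hhidpn, 0 elsewhere
egen byte mytag = tag(hhidpn)
list, sepby(hhidpn)

* The number of tagged observations equals the number of distinct people
count if mytag
```

Because only one row per person carries the tag, any selection based on it still has to be spread to that person's other waves.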

What you want is, I think, different, and goes just one command beyond the code in #1.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(hhidpn wave hhid pn)
3010  1 3 10
3010  2 3 10
3010  3 3 10
3010  4 3 10
3010  5 3 10
3010  6 3 10
3010  7 3 10
3010  8 3 10
3010  9 3 10
3010 10 3 10
3010 11 3 10
3010 12 3 10
3020  1 3 20
3020  2 3 20
3020  3 3 20
3020  4 3 20
3020  5 3 20
3020  6 3 20
3020  7 3 20
3020  8 3 20
3020  9 3 20
3020 10 3 20
3020 11 3 20
3020 12 3 20
end

set seed 12345
gen random = uniform()
bysort hhid (random) : gen byte select = _n == 1
bysort hhid hhidpn (select) : replace select = select[_N]

list, sepby(hhid hhidpn)

     +-----------------------------------------------+
     | hhidpn   wave   hhid   pn     random   select |
     |-----------------------------------------------|
  1. |   3010      5      3   10   .5744513        0 |
  2. |   3010      3      3   10   .6893833        0 |
  3. |   3010      2      3   10   .4004426        0 |
  4. |   3010      9      3   10   .4693434        0 |
  5. |   3010      4      3   10   .5597356        0 |
  6. |   3010      8      3   10   .6889245        0 |
  7. |   3010      7      3   10   .0286627        0 |
  8. |   3010      6      3   10   .2076905        0 |
  9. |   3010     10      3   10   .2071526        0 |
 10. |   3010     11      3   10   .0039323        0 |
 11. |   3010      1      3   10   .3576297        0 |
 12. |   3010     12      3   10   .0130297        0 |
     |-----------------------------------------------|
 13. |   3020      2      3   20   .6161914        1 |
 14. |   3020      7      3   20   .1079424        1 |
 15. |   3020     10      3   20   .9400924        1 |
 16. |   3020      4      3   20   .4106361        1 |
 17. |   3020     11      3   20   .6912759        1 |
 18. |   3020      8      3   20    .366684        1 |
 19. |   3020      3      3   20   .8948836        1 |
 20. |   3020      6      3   20   .0267289        1 |
 21. |   3020     12      3   20   .9186656        1 |
 22. |   3020      5      3   20   .2607687        1 |
 23. |   3020      1      3   20   .4204224        1 |
 24. |   3020      9      3   20   .0038868        1 |
     +-----------------------------------------------+
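Once select is constant within each person, keeping the chosen person's 12 waves is a matter of an if qualifier or a keep. The following is a sketch under assumptions rather than a definitive recipe: household 5 and person 5010 are made up so that there is more than one household, wave is generated with expand instead of being read from real data, and ptag and nsel are illustrative names.

```stata
* Toy data: household 3 is from the thread; household 5 is made up for illustration
clear
input float(hhidpn hhid pn)
3010 3 10
3020 3 20
5010 5 10
end
expand 12
bysort hhidpn : gen wave = _n

* Selection code from this post
set seed 12345
gen random = uniform()
bysort hhid (random) : gen byte select = _n == 1
bysort hhid hhidpn (select) : replace select = select[_N]

* Check: exactly one person is flagged in each household
egen byte ptag = tag(hhidpn)
egen nsel = total(ptag & select), by(hhid)
assert nsel == 1

* Option A: keep the full data and restrict commands with an if qualifier
xtset hhidpn wave
xtdescribe if select

* Option B: drop everyone else (use preserve/restore if the full data are needed later)
keep if select
sort hhidpn wave
```

Option A leaves the data set intact, so other selections remain possible; Option B is simpler when only the selected people are ever analysed.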


Marcos Almeida

Join Date: Apr 2014
Posts: 4047
#5

23 Apr 2020, 04:39

Nick Cox: sorry for the clumsy code.

This is just to explain the way I tried to tackle the issue:

Code:

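* tag one observation per person
egen byte mytag = tag(hhidpn)
preserve
* reduce to one observation per person, then draw one person at random
keep if mytag
set seed 1234
sample 1, count
list
restore
* flag all waves of the drawn person (ID hard-coded after inspecting the listing)
gen touse = hhidpn == 3020
list, sepby( hhidpn touse )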

     +-------------------------------------------+
     | hhidpn   wave   hhid   pn   mytag   touse |
     |-------------------------------------------|
  1. |   3010      1      3   10       1       0 |
  2. |   3010      2      3   10       0       0 |
  3. |   3010      3      3   10       0       0 |
  4. |   3010      4      3   10       0       0 |
  5. |   3010      5      3   10       0       0 |
  6. |   3010      6      3   10       0       0 |
  7. |   3010      7      3   10       0       0 |
  8. |   3010      8      3   10       0       0 |
  9. |   3010      9      3   10       0       0 |
 10. |   3010     10      3   10       0       0 |
 11. |   3010     11      3   10       0       0 |
 12. |   3010     12      3   10       0       0 |
     |-------------------------------------------|
 13. |   3020      1      3   20       1       1 |
 14. |   3020      2      3   20       0       1 |
 15. |   3020      3      3   20       0       1 |
 16. |   3020      4      3   20       0       1 |
 17. |   3020      5      3   20       0       1 |
 18. |   3020      6      3   20       0       1 |
 19. |   3020      7      3   20       0       1 |
 20. |   3020      8      3   20       0       1 |
 21. |   3020      9      3   20       0       1 |
 22. |   3020     10      3   20       0       1 |
 23. |   3020     11      3   20       0       1 |
 24. |   3020     12      3   20       0       1 |
     +-------------------------------------------+


Best regards,

Marcos


Nick Cox

Join Date: Mar 2014

Posts: 35807
#6

23 Apr 2020, 05:28

Marcos Almeida: Thanks very much for the detail. I see how that would work for selecting one person, but for selecting many you would need more work. In any case, you can do it in place.
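One possible way to combine the two ideas in place, for many households at once, is to tag one observation per person, draw a single random number per person, and pick the smallest draw in each household. This is only a sketch: household 5 and its members are made up for illustration, the names ptag, u and touse are arbitrary, and runiform() is used as the current name of the uniform random-number function.

```stata
* Toy data: household 3 is from the thread; household 5 is made up for illustration
clear
input float(hhidpn hhid pn)
3010 3 10
3020 3 20
5010 5 10
5020 5 20
5030 5 30
end
expand 12
bysort hhidpn : gen wave = _n

* Tag one observation per person, then draw one random number per person
egen byte ptag = tag(hhidpn)
set seed 1234
gen double u = runiform() if ptag

* Within each household the smallest draw sorts first (missing values sort last)
bysort hhid (u) : gen byte touse = ptag & (_n == 1)

* Spread the flag to every wave of the chosen person
bysort hhid hhidpn (touse) : replace touse = touse[_N]

list hhidpn hhid touse if ptag
```

Drawing one number per person rather than one per observation gives every household member the same chance of selection even when people contribute different numbers of waves; with one draw per observation, a person with more waves is more likely to hold the household minimum.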
Sue Park

Join Date: Apr 2020

Posts: 6
#7

24 Apr 2020, 08:49

Marcos Almeida, Nick Cox: Thank you both very much for your input. I am learning a lot from your suggestions and the logic behind them!

Nick Cox: I am especially glad to have learned this line from you, because it looks like the logic I wanted to express to solve my issue:

bysort hhid hhidpn (select) : replace select = select[_N]

Thanks very much, and I hope you are staying well, healthy, and safe!