  • How to randomly select one member from a household and use them in the panel data?


    Hi! As a novice Stata user, I am so glad that I found this forum, and I sincerely appreciate any input or help in advance. I am using longitudinal (panel) data with 12 waves. For my longitudinal analyses, I am trying to randomly choose one member of each household and use all 12 waves of that member's data, even though the survey design may include more than one person from the same household. I tried to randomly select one member of each household with the following syntax and got the results shown below.

    <SYNTAX>
    set seed 12345
    gen random = uniform()
    bysort hhid (random) : gen byte select = _n == 1
    sort hhidpn wave

    **For your information: hhidpn is a unique ID for each participant, hhid is the household ID, and pn is the person number.**

    <RESULTS, as a simplified example>

    hhidpn wave hhid pn select
    3010 1 3 10 0
    3010 2 3 10 0
    3010 3 3 10 0
    3010 4 3 10 1
    3010 5 3 10 0
    3010 6 3 10 0
    3010 7 3 10 0
    3010 8 3 10 0
    3010 9 3 10 0
    3010 10 3 10 0
    3010 11 3 10 0
    3010 12 3 10 0
    3020 1 3 20 0
    3020 2 3 20 0
    3020 3 3 20 0
    3020 4 3 20 0
    3020 5 3 20 0
    3020 6 3 20 0
    3020 7 3 20 0
    3020 8 3 20 0
    3020 9 3 20 0
    3020 10 3 20 0
    3020 11 3 20 0
    3020 12 3 20 0


    So, in this simplified example, even though both 3010 and 3020 (hhidpn) are from the same household (hhid 3), only hhidpn 3010 has been selected ("select = 1", in wave 4 only), and I would like to use all of 3010's variables collected across the 12 waves for my analyses.

    In this case, how can I keep and use all 12 waves of variables from only the randomly selected hhidpn (such as 3010) in my longitudinal data set?

    It may be a simple question, but I am still confused even after looking through previous posts on the forum. Any advice would be appreciated!
    Last edited by Sue Park; 22 Apr 2020, 16:01.

  • #2
    You can use -egen- with tag(), then use -sample- on the tagged observations. Please note that you should -preserve- the data, look at the results for the randomly selected individuals, do the estimations, and then -restore- it.
    Best regards,

    Marcos



    • #3
      Originally posted by Marcos Almeida
      You can use -egen- with tag(), then use -sample- on the tagged observations. Please note that you should -preserve- the data, look at the results for the randomly selected individuals, do the estimations, and then -restore- it.
      Thank you very much, Marcos!



      • #4
        Sorry to contradict Marcos Almeida, but I don't quite understand what is recommended in #2.

        What tag() does in practice is to select the first observation in a group, on the grounds that if they are all the same, it doesn't matter which one you select. Selecting the first or selecting the last are the only rules that work for groups that might be as small as one observation.

        What you want is, I think, different, and goes just one command beyond the code in #1.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float(hhidpn wave hhid pn)
        3010  1 3 10
        3010  2 3 10
        3010  3 3 10
        3010  4 3 10
        3010  5 3 10
        3010  6 3 10
        3010  7 3 10
        3010  8 3 10
        3010  9 3 10
        3010 10 3 10
        3010 11 3 10
        3010 12 3 10
        3020  1 3 20
        3020  2 3 20
        3020  3 3 20
        3020  4 3 20
        3020  5 3 20
        3020  6 3 20
        3020  7 3 20
        3020  8 3 20
        3020  9 3 20
        3020 10 3 20
        3020 11 3 20
        3020 12 3 20
        end
        
        set seed 12345
        gen random = uniform()
        bysort hhid (random) : gen byte select = _n == 1
        bysort hhid hhidpn (select) : replace select = select[_N]
        
        list, sepby(hhid hhidpn)
        
             +-----------------------------------------------+
             | hhidpn   wave   hhid   pn     random   select |
             |-----------------------------------------------|
          1. |   3010      5      3   10   .5744513        0 |
          2. |   3010      3      3   10   .6893833        0 |
          3. |   3010      2      3   10   .4004426        0 |
          4. |   3010      9      3   10   .4693434        0 |
          5. |   3010      4      3   10   .5597356        0 |
          6. |   3010      8      3   10   .6889245        0 |
          7. |   3010      7      3   10   .0286627        0 |
          8. |   3010      6      3   10   .2076905        0 |
          9. |   3010     10      3   10   .2071526        0 |
         10. |   3010     11      3   10   .0039323        0 |
         11. |   3010      1      3   10   .3576297        0 |
         12. |   3010     12      3   10   .0130297        0 |
             |-----------------------------------------------|
         13. |   3020      2      3   20   .6161914        1 |
         14. |   3020      7      3   20   .1079424        1 |
         15. |   3020     10      3   20   .9400924        1 |
         16. |   3020      4      3   20   .4106361        1 |
         17. |   3020     11      3   20   .6912759        1 |
         18. |   3020      8      3   20    .366684        1 |
         19. |   3020      3      3   20   .8948836        1 |
         20. |   3020      6      3   20   .0267289        1 |
         21. |   3020     12      3   20   .9186656        1 |
         22. |   3020      5      3   20   .2607687        1 |
         23. |   3020      1      3   20   .4204224        1 |
         24. |   3020      9      3   20   .0038868        1 |
             +-----------------------------------------------+
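Carrying #4 one step further to answer the question in #1: once select is constant within each person, keeping the chosen member's 12 waves takes one more line, -keep if select-. The sketch below is an illustration under stated assumptions: it rebuilds the toy data from #1 with -expand- instead of the -dataex- input, and it calls -runiform()-, the current name of the function that #1 calls -uniform()-. Because the data are built in a different order, the member drawn may differ from the listing in #4.

```stata
* rebuild the two-person, 12-wave toy panel from #1
* (built with -expand- here, an assumption of this sketch)
clear
input float(hhid pn)
3 10
3 20
end
gen float hhidpn = 1000*hhid + pn   // gives 3010 and 3020, as in #1
expand 12                           // 12 observations (waves) per person
bysort hhidpn : gen wave = _n

* the selection from #1 plus the extra line from #4
set seed 12345
gen random = runiform()
bysort hhid (random) : gen byte select = _n == 1
bysort hhid hhidpn (select) : replace select = select[_N]

* keep every wave of the selected member only
keep if select
drop random
sort hhidpn wave
list hhidpn wave hhid pn select, sepby(hhidpn)
```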



        • #5
          Nick Cox: sorry for the clumsy code.

          This is just to explain the way I tried to tackle the issue:


          Code:
          egen byte mytag = tag(hhidpn)
          preserve
          keep if mytag
          set seed 1234
          sample 1, count
          list
          restore
          gen touse = hhidpn ==3020
          list, sepby( hhidpn touse )
          
               +-------------------------------------------+
               | hhidpn   wave   hhid   pn   mytag   touse |
               |-------------------------------------------|
            1. |   3010      1      3   10       1       0 |
            2. |   3010      2      3   10       0       0 |
            3. |   3010      3      3   10       0       0 |
            4. |   3010      4      3   10       0       0 |
            5. |   3010      5      3   10       0       0 |
            6. |   3010      6      3   10       0       0 |
            7. |   3010      7      3   10       0       0 |
            8. |   3010      8      3   10       0       0 |
            9. |   3010      9      3   10       0       0 |
           10. |   3010     10      3   10       0       0 |
           11. |   3010     11      3   10       0       0 |
           12. |   3010     12      3   10       0       0 |
               |-------------------------------------------|
           13. |   3020      1      3   20       1       1 |
           14. |   3020      2      3   20       0       1 |
           15. |   3020      3      3   20       0       1 |
           16. |   3020      4      3   20       0       1 |
           17. |   3020      5      3   20       0       1 |
           18. |   3020      6      3   20       0       1 |
           19. |   3020      7      3   20       0       1 |
           20. |   3020      8      3   20       0       1 |
           21. |   3020      9      3   20       0       1 |
           22. |   3020     10      3   20       0       1 |
           23. |   3020     11      3   20       0       1 |
           24. |   3020     12      3   20       0       1 |
               +-------------------------------------------+
          
          .
          Best regards,

          Marcos
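The preserve/sample route in #5 can, as a hedged sketch, be extended from one person overall to one person per household: -sample- accepts a by() option, and the drawn IDs can be merged back so that every wave of each chosen member is kept instead of hard-coding hhidpn == 3020. The tempfile name chosen is hypothetical, and the toy data are rebuilt with -expand- as an assumption. Because the draw is made among tagged observations (one per person), each member has the same chance of selection whatever their number of waves.

```stata
* rebuild the toy panel from #1 (assumption: built with -expand-)
clear
input float(hhid pn)
3 10
3 20
end
gen float hhidpn = 1000*hhid + pn
expand 12
bysort hhidpn : gen wave = _n

* one tagged observation per person, as in #5
egen byte mytag = tag(hhidpn)

preserve
keep if mytag
set seed 1234
* draw one person within each household, not one person overall
sample 1, count by(hhid)
keep hhidpn
tempfile chosen                     // hypothetical tempfile name
save `chosen'
restore

* keep all waves of the drawn member(s); unmatched people are dropped
merge m:1 hhidpn using `chosen', keep(match) nogenerate
drop mytag
sort hhidpn wave
```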



          • #6
            Marcos Almeida: thanks very much for the detail. I see how that would work for selecting one person, but selecting many would need more work. You can do it in place in any case.
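A possible in-place version for every household at once (not necessarily what #6 had in mind) is to draw one random number per person rather than per observation. This also sidesteps a feature of the code in #1 and #4: because that draw is made per observation, in an unbalanced panel a member with more waves has proportionally more chances of being picked. The variable names ptag and pdraw are hypothetical, and the toy data are rebuilt with -expand- as an assumption.

```stata
* rebuild the toy panel from #1 (assumption: built with -expand-)
clear
input float(hhid pn)
3 10
3 20
end
gen float hhidpn = 1000*hhid + pn
expand 12
bysort hhidpn : gen wave = _n

set seed 12345
* one draw per person: only one observation of each hhidpn gets a number
egen byte ptag = tag(hhidpn)
gen double pdraw = runiform() if ptag
* copy that draw to all of the person's observations (missing sorts last)
bysort hhidpn (pdraw) : replace pdraw = pdraw[1]
* within each household, flag every observation of the smallest draw
bysort hhid (pdraw hhidpn) : gen byte select = pdraw == pdraw[1]

keep if select
drop ptag pdraw
sort hhidpn wave
```

Using hhidpn as a secondary sort key only makes the order reproducible; with double-precision draws, a tie between two members is practically impossible.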



            • #7
              Marcos Almeida, Nick Cox: I really appreciate both of your inputs. I am learning a lot from your input and logic!
              Nick Cox, I am especially glad that I learned this line from you, because it expresses exactly the logic I wanted for my issue:
              bysort hhid hhidpn (select) : replace select = select[_N]
              Thanks very much, and I hope you are staying well, healthy, and safe!
