Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Matching several observations via new variable

    Hello everyone,

    Currently I'm working with a dataset that contains several waves with various ID's (mothers) and their corresponding children, who also have their own ID's .
    Each ID of the mother is matched in one line with the ID of their children.
    However, In the dataset also contains mothers who have more than ohne child, so one ID can be potentially matched with several CIDs (As you can see in the example).
    Wave ID CID
    1 21001 1001
    1 21001 1002
    To see how many mothers have more than one child (via the CID) in the Dataset, I need a new variable that contains mothers with multiple children.
    This command should say (by sense)

    "gen children if there is more than one CID hat belongs to the ID"

    Hopefully you got my point and can help me figure this out.

    Thanks in advance!

  • #2
    It is not clear to me whether the same mothers and children carry the same ID and CID, respectively, into subsequent waves, or if the numbering changes. I will assume that the IDs and CIDs do not change from wave to wave.

    Code:
    by ID (CID), sort: gen n_children = sum(CID != CID[_n-1])
    by ID (CID): replace n_children = n_children[_N]
    The variable n_children will tell you the number of distinct children a given woman in the data has. Note that a woman who has, say, one child at wave 1 but subsequently has another, will be designated as having 2 children in all of her observations.

    Comment


    • #3
      Thank you very much for the code.
      As you mentioned and assumed correctly, the IDs and CIDs don't change over time, they stay identical.
      Sadly the "n_children" variable only has the value "1" over all observations, which can't be correct.
      Last edited by Hannah Zllnr; 27 Jan 2022, 17:08.

      Comment


      • #4
        The code works with the data in the tableau shown in #1:

        Code:
        . * Example generated by -dataex-. For more info, type help dataex
        . clear
        
        . input byte wave int(id cid)
        
                 wave        id       cid
          1. 1 21001 1001
          2. 1 21001 1002
          3. end
        
        .
        . by id (cid), sort: gen n_children = sum(cid != cid[_n-1])
        
        . by id (cid): replace n_children = n_children[_N]
        (1 real change made)
        
        .
        . list, noobs clean abbrev(12)
        
            wave      id    cid   n_children  
               1   21001   1001            2  
               1   21001   1002            2  
        
        .
        end of do-file


        Please use -dataex-, as I have done here, and post example data that exhibits this difficulty and I will troubleshoot. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
        Last edited by Clyde Schechter; 27 Jan 2022, 18:03.

        Comment


        • #5
          Thanks for your help Clyde, I used Dataex and the following table is my sample:


          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input long(cid id) int wave
            6151201   6151000 10
            6151201   6151000  5
            6151201   6151000  6
            6151201   6151000  9
            6151201   6151000  7
            6151201   6151000  8
            6151201   6151000 11
            6151203   6151000  9
            6151203   6151000 11
            6151203   6151000 10
           35363201  35363000  6
           35363201  35363000  4
           35363201  35363000  7
           35363201  35363000  8
           35363202  35363000  8
           35363202  35363000  7
           35363202  35363000  4
           35363202  35363000  6
           35363203  35363000  8
           35363203  35363000  7
           77982201  77982000  6
           77982201  77982000  3
           77982201  77982000  8
           77982201  77982000  9
           77982201  77982000  2
           77982202  77982000  9
           77982202  77982000  6
           77982202  77982000  8
           77982202  77982000 10
           77982202  77982000 11
          118729201 118729000  9
          118729201 118729000  4
          118729201 118729000  6
          118729201 118729000  8
          118729201 118729000  5
          118729202 118729000  6
          118729202 118729000 11
          118729202 118729000  9
          118729202 118729000  8
          118729202 118729000 10
          138339203 138339000  2
          138339203 138339000  6
          138339203 138339000  5
          138339203 138339000  3
          138339203 138339000  4
          138339203 138339000  7
          138339203 138339000  8
          138339204 138339000  8
          138339204 138339000  7
          138339204 138339000  6
          157976201 157976000  7
          157976201 157976000  6
          157976201 157976000  8
          157976201 157976000  4
          157976201 157976000 11
          157976201 157976000  9
          157976202 157976000  7
          157976202 157976000 11
          157976202 157976000  9
          157976202 157976000  8
          164090201 164090000  6
          164090201 164090000  7
          164090201 164090000  5
          164090201 164090000  4
          164090201 164090000  8
          164090202 164090000  8
          164090202 164090000  7
          164090202 164090000  9
          164090202 164090000  5
          164090202 164090000  6
          191816201 191816000  8
          191816201 191816000  9
          191816201 191816000 10
          191816201 191816000 11
          191816201 191816000  7
          191816202 191816000  7
          191816202 191816000  9
          191816202 191816000  8
          191816202 191816000 10
          191816202 191816000 11
          193344201 193344000  7
          193344201 193344000  6
          193344201 193344000  4
          193344201 193344000  5
          193344201 193344000  3
          193344202 193344000  5
          193344202 193344000 11
          193344202 193344000  6
          193344202 193344000  7
          193344203 193344000 11
          198387201 198387000  5
          198387201 198387000  2
          198387201 198387000  4
          198387201 198387000  6
          198387201 198387000  7
          198387201 198387000  8
          198387202 198387000  5
          198387202 198387000  8
          198387202 198387000  7
          198387202 198387000  6
          end
          label values wave WAVE_prt2
          label def WAVE_prt2 2 "2 2009/10", modify
          label def WAVE_prt2 3 "3 2010/11", modify
          label def WAVE_prt2 4 "4 2011/12", modify
          label def WAVE_prt2 5 "5 2012/13", modify
          label def WAVE_prt2 6 "6 2013/14", modify
          label def WAVE_prt2 7 "7 2014/15", modify
          label def WAVE_prt2 8 "8 2015/16", modify
          label def WAVE_prt2 9 "9 2016/17", modify
          label def WAVE_prt2 10 "10 2017/18", modify
          label def WAVE_prt2 11 "11 2018/19", modify

          Comment


          • #6
            An alternative code to get what you want is

            Code:
            bys id (cid): gen wanted= cid[1]!=cid[_N]
            That said, Clyde's code or the above does not support the assertion:

            Sadly the "n_children" variable only has the value "1" over all observations, which can't be correct.
            (at least for the sample that you provide). You can use contract to see this:

            Code:
            preserve
            contract id cid
            l, sepby(id)
            restore
            Res.:

            Code:
            . l, sepby(id)
            
                 +-------------------------------+
                 |       cid          id   _freq |
                 |-------------------------------|
              1. |   6151201     6151000       7 |
              2. |   6151203     6151000       3 |
                 |-------------------------------|
              3. |  35363201    35363000       4 |
              4. |  35363202    35363000       4 |
              5. |  35363203    35363000       2 |
                 |-------------------------------|
              6. |  77982201    77982000       5 |
              7. |  77982202    77982000       5 |
                 |-------------------------------|
              8. | 118729201   118729000       5 |
              9. | 118729202   118729000       5 |
                 |-------------------------------|
             10. | 138339203   138339000       7 |
             11. | 138339204   138339000       3 |
                 |-------------------------------|
             12. | 157976201   157976000       6 |
             13. | 157976202   157976000       4 |
                 |-------------------------------|
             14. | 164090201   164090000       5 |
             15. | 164090202   164090000       5 |
                 |-------------------------------|
             16. | 191816201   191816000       5 |
             17. | 191816202   191816000       5 |
                 |-------------------------------|
             18. | 193344201   193344000       5 |
             19. | 193344202   193344000       4 |
             20. | 193344203   193344000       1 |
                 |-------------------------------|
             21. | 198387201   198387000       6 |
             22. | 198387202   198387000       4 |
                 +-------------------------------+
            
            .

            Comment


            • #7
              Re #3 & #5: As Andrew Musau points out, this example does not produce the problematic results you reported.:

              Code:
              . * Example generated by -dataex-. To install: ssc install dataex
              . clear
              
              . input long(cid id) int wave
              
                            cid            id      wave
                1.   6151201   6151000 10
                2.   6151201   6151000  5
                3.   6151201   6151000  6
                4.   6151201   6151000  9
                5.   6151201   6151000  7
                6.   6151201   6151000  8
                7.   6151201   6151000 11
                8.   6151203   6151000  9
                9.   6151203   6151000 11
               10.   6151203   6151000 10
               11.  35363201  35363000  6
               12.  35363201  35363000  4
               13.  35363201  35363000  7
               14.  35363201  35363000  8
               15.  35363202  35363000  8
               16.  35363202  35363000  7
               17.  35363202  35363000  4
               18.  35363202  35363000  6
               19.  35363203  35363000  8
               20.  35363203  35363000  7
               21.  77982201  77982000  6
               22.  77982201  77982000  3
               23.  77982201  77982000  8
               24.  77982201  77982000  9
               25.  77982201  77982000  2
               26.  77982202  77982000  9
               27.  77982202  77982000  6
               28.  77982202  77982000  8
               29.  77982202  77982000 10
               30.  77982202  77982000 11
               31. 118729201 118729000  9
               32. 118729201 118729000  4
               33. 118729201 118729000  6
               34. 118729201 118729000  8
               35. 118729201 118729000  5
               36. 118729202 118729000  6
               37. 118729202 118729000 11
               38. 118729202 118729000  9
               39. 118729202 118729000  8
               40. 118729202 118729000 10
               41. 138339203 138339000  2
               42. 138339203 138339000  6
               43. 138339203 138339000  5
               44. 138339203 138339000  3
               45. 138339203 138339000  4
               46. 138339203 138339000  7
               47. 138339203 138339000  8
               48. 138339204 138339000  8
               49. 138339204 138339000  7
               50. 138339204 138339000  6
               51. 157976201 157976000  7
               52. 157976201 157976000  6
               53. 157976201 157976000  8
               54. 157976201 157976000  4
               55. 157976201 157976000 11
               56. 157976201 157976000  9
               57. 157976202 157976000  7
               58. 157976202 157976000 11
               59. 157976202 157976000  9
               60. 157976202 157976000  8
               61. 164090201 164090000  6
               62. 164090201 164090000  7
               63. 164090201 164090000  5
               64. 164090201 164090000  4
               65. 164090201 164090000  8
               66. 164090202 164090000  8
               67. 164090202 164090000  7
               68. 164090202 164090000  9
               69. 164090202 164090000  5
               70. 164090202 164090000  6
               71. 191816201 191816000  8
               72. 191816201 191816000  9
               73. 191816201 191816000 10
               74. 191816201 191816000 11
               75. 191816201 191816000  7
               76. 191816202 191816000  7
               77. 191816202 191816000  9
               78. 191816202 191816000  8
               79. 191816202 191816000 10
               80. 191816202 191816000 11
               81. 193344201 193344000  7
               82. 193344201 193344000  6
               83. 193344201 193344000  4
               84. 193344201 193344000  5
               85. 193344201 193344000  3
               86. 193344202 193344000  5
               87. 193344202 193344000 11
               88. 193344202 193344000  6
               89. 193344202 193344000  7
               90. 193344203 193344000 11
               91. 198387201 198387000  5
               92. 198387201 198387000  2
               93. 198387201 198387000  4
               94. 198387201 198387000  6
               95. 198387201 198387000  7
               96. 198387201 198387000  8
               97. 198387202 198387000  5
               98. 198387202 198387000  8
               99. 198387202 198387000  7
              100. 198387202 198387000  6
              101. end
              
              . label values wave WAVE_prt2
              
              . label def WAVE_prt2 2 "2 2009/10", modify
              
              . label def WAVE_prt2 3 "3 2010/11", modify
              
              . label def WAVE_prt2 4 "4 2011/12", modify
              
              . label def WAVE_prt2 5 "5 2012/13", modify
              
              . label def WAVE_prt2 6 "6 2013/14", modify
              
              . label def WAVE_prt2 7 "7 2014/15", modify
              
              . label def WAVE_prt2 8 "8 2015/16", modify
              
              . label def WAVE_prt2 9 "9 2016/17", modify
              
              . label def WAVE_prt2 10 "10 2017/18", modify
              
              . label def WAVE_prt2 11 "11 2018/19", modify
              
              .
              . by id (cid), sort: gen n_children = sum(cid != cid[_n-1])
              
              . by id (cid): replace n_children = n_children[_N]
              (63 real changes made)
              
              .
              end of do-file
              
              . tab n_children
              
               n_children |      Freq.     Percent        Cum.
              ------------+-----------------------------------
                        2 |         80       80.00       80.00
                        3 |         20       20.00      100.00
              ------------+-----------------------------------
                    Total |        100      100.00
              I cannot see any reason that the code in #2 would produce incorrect results, and I cannot help thinking that you made some change that broke it.

              If you do present an example of data where that code fails, I am happy to try to figure out what is going wrong, but so far, I stand by my original code.

              By the way, the code in #6 functions a bit differently from what I posted in #2. The code in #2 tags each observation with a count of the number of distinct children each mother is associated with in the data set. The code in #6 instead creates a dichotomous variable that indicates whether there are multiple children, or only one.

              Comment


              • #8
                Hello ,

                I tested it again after I changed my sample and now it works perfectly.
                Thanks to both Clyde and Andrew for your helpful input.

                Comment

                Working...
                X