  • Deleting duplicate IDs to make IDs uniquely identifiable in household survey.

    Dear Experts,

    I am conducting a survey analysis and facing challenges with making case IDs uniquely identifiable. I hope someone can assist me with it.

    I have household survey data with more than 9000 observations, and my ID variable is caseid. v001 is the cluster number, v133 is education in single years, and so on. After carefully looking at the data, I noticed some minor differences between a few repeated caseids that are confusing. While most responses are the same across repeated caseids, the variable bord had different responses between repeated caseids, and the variable b11 follows a similar pattern.

    I took the following steps to resolve the problem.

    1. I sorted the data using the following code:
    Code:
    sort caseid
    2. Then I tested for unique identification using
    Code:
    isid caseid
    and Stata reported that
    Code:
    variable caseid does not uniquely identify the observations
    3. The trouble begins when I run a duplicates report using
    Code:
    duplicates report
    and
    Code:
    duplicates list
    At this point, Stata produces results suggesting no duplicates (if I am correct):

    copies   observations   surplus
         1           9733         0


    4. But when I run duplicates in terms of caseid using
    Code:
    duplicates report caseid
    I obtain the following results, which suggest that there are duplicate IDs. My concern now is how to proceed.

    copies   observations   surplus
         1           3719         0
         2           4518      2259
         3           1347       898
         4            124        93
         5             25        20

    5. Next, I tried to drop the duplicates using
    Code:
    duplicates drop
    but Stata reports
    (0 observations are duplicates)
    6. Finally, I flagged the duplicate IDs, as suggested by one member of this forum:
    Code:
    duplicates tag caseid, gen(flag)
    My data example is shown below:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str15 caseid int v001 byte(v101 v133 v012 v714 hw1 bord v190) int hw70 byte(v152 v632) int b11 byte flag
    "       1 125  2" 1 5  9 31 1 22  4 2 -255 47 .  62 0
    "       1 125  3" 1 5  0 19 0  .  1 2    . 47 .   . 0
    "       1 147  2" 1 5 10 22 1  .  1 4    . 38 .   . 0
    "       1 191  3" 1 5  1 20 0  .  1 3    . 30 .   . 0
    "       1 198  1" 1 5  6 26 1 50  4 3   22 26 .  22 0
    "       1 252  2" 1 5  0 37 0  2  8 3 -292 51 .  34 1
    "       1 252  2" 1 5  0 37 0 36  7 3 -405 51 .  37 1
    "       1 310  3" 1 5  6 15 0  .  1 2    . 50 .   . 0
    "       1 334  2" 1 5  5 36 1 16  9 2 -214 45 .  46 0
    "       1 819  1" 1 5  4 41 1  .  4 3    . 41 .  28 1
    "       1 819  1" 1 5  4 41 1  .  3 3    . 41 .  97 1
    "       1 897  1" 1 5  0 40 0  2 13 3 -106 40 .  36 1
    "       1 897  1" 1 5  0 40 0 39 12 3 -471 40 .  28 1
    "       11040  1" 1 5  3 32 0  .  2 3    . 32 . 115 0
    "       11052  2" 1 5  3 16 0 11  1 2 -185 25 .   . 0
    "       11103  2" 1 5  0 38 1  .  7 2    . 45 .  33 0
    "       11128  2" 1 5  0 26 1  3  4 3   75 37 .  27 1
    "       11128  2" 1 5  0 26 1 31  3 3 -181 37 .  39 1
    "       11204  2" 1 5  6 27 1  5  5 2 -153 28 .  29 2
    "       11204  2" 1 5  6 27 1 34  4 2  -68 28 .  22 2
    "       11204  2" 1 5  6 27 1 56  3 2  -20 28 .  11 2
    "       2  16  5" 2 4  8 18 0  .  2 3    . 56 .  31 1
    "       2  16  5" 2 4  8 18 0  .  1 3    . 56 .   . 1
    "       2  16  7" 2 4  6 36 1  .  6 3    . 56 .  42 0
    "       2  75  2" 2 4  4 38 1  .  5 2    . 44 .  27 1
    "       2  75  2" 2 4  4 38 1  .  4 2    . 44 .  69 1
    "       2 191  1" 2 4  6 33 1  .  5 2    . 30 1  44 0
    "       2 240  4" 2 4  6 22 1 15  1 1 -345 80 .   . 0
    "       2 431  2" 2 4  6 38 1  .  6 3    . 52 .  14 1
    "       2 431  2" 2 4  6 38 1  .  5 3    . 52 .  30 1
    "       2 432  3" 2 4  5 25 1  1  6 2  -47 48 .  41 2
    "       2 432  3" 2 4  5 25 1  .  5 2    . 48 .  23 2
    "       2 432  3" 2 4  5 25 1 43  4 2 -515 48 .  23 2
    "       2 608  2" 2 4  6 35 1 41  5 2  -37 32 .  39 0
    "       2 681  3" 2 4  9 23 1 30  2 2 -120 56 3  44 0
    "       2 689  2" 2 4  6 28 1  6  6 1  116 38 .  30 1
    "       2 689  2" 2 4  6 28 1  .  5 1    . 38 .  30 1
    "       2 757  3" 2 4  6 35 1  1  7 3 -257 85 .  31 1
    "       2 757  3" 2 4  6 35 1 32  6 3 -483 85 .  30 1
    "       2 757 12" 2 4  6 27 1 17  4 3 -145 85 1  23 1
    "       2 757 12" 2 4  6 27 1 41  3 3   21 85 1  42 1
    "       2 757 18" 2 4  4 23 1 25  3 3 -367 85 .  27 1
    "       2 757 18" 2 4  4 23 1 53  2 3 -289 85 .  30 1
    "       2 757 22" 2 4  9 22 1  1  4 3 -353 85 .  32 1
    "       2 757 22" 2 4  9 22 1 33  3 3 -403 85 .  28 1
    "       2 764  2" 2 4  7 23 1  .  4 3    . 30 .  27 2
    "       2 764  2" 2 4  7 23 1  .  3 3    . 30 .  24 2
    "       2 764  2" 2 4  7 23 1  .  2 3    . 30 .  26 2
    "       2 905  2" 2 4 11 24 1  .  1 3    . 35 .   . 0
    "       21024  2" 2 4  2 32 1  .  7 2    . 48 .  52 0
    "       21138  2" 2 4  5 33 1  .  7 1    . 35 .  43 1
    "       21138  2" 2 4  5 33 1  .  6 1    . 35 .  27 1
    "       21138  9" 2 4  5 25 1  .  5 1    . 35 .  27 2
    "       21138  9" 2 4  5 25 1  .  4 1    . 35 .  24 2
    "       21138  9" 2 4  5 25 1  .  3 1    . 35 .  26 2
    "       3  62  3" 3 2  4 40 1  . 10 3    . 64 .  24 1
    "       3  62  3" 3 2  4 40 1  .  9 3    . 64 .  40 1
    "       3 191  2" 3 2  8 25 1 13  4 3 -177 26 .  30 1
    "       3 191  2" 3 2  8 25 1 43  3 3 9998 26 .  33 1
    "       3 368  5" 3 2  7 27 1 32  4 3  102 37 3  34 0
    "       3 368  9" 3 2 13 27 0 14  2 3 -139 37 3  22 1
    "       3 368  9" 3 2 13 27 0 37  1 3  -81 37 3   . 1
    "       3 369  2" 3 2  1 45 1 17 10 2 -295 47 3  70 0
    "       3 369  3" 3 2  5 18 0 30  1 2 -214 47 .   . 0
    "       3 478  2" 3 2  8 29 1 30  4 4 -260 45 .  23 1
    "       3 478  2" 3 2  8 29 1 53  3 4 -308 45 .  34 1
    "       3 478  8" 3 2 14 28 0  .  1 4    . 45 .   . 0
    "       3 552  2" 3 2  6 40 1 29  7 3  -46 58 .  84 0
    "       3 552  3" 3 2  7 18 0 19  1 3 -253 58 .   . 0
    "       3 665  2" 3 2 12 23 1  .  1 3    . 22 3   . 0
    "       3 712  6" 3 2  8 27 1  .  2 5    . 53 .   . 1
    "       3 712  6" 3 2  8 27 1  .  1 5    . 53 .   . 1
    "       3 715 12" 3 2  3 30 1 41  3 4   -1 57 .  74 0
    "       3 724  9" 3 2  5 43 1  .  8 2    . 48 .  48 0
    "       3 756  5" 3 2  6 24 1 34  3 2 -236 49 .  42 0
    "       3 814  4" 3 2 10 17 0  .  1 3    . 54 .   . 0
    "       3 814 11" 3 2 11 22 0  .  1 3    . 54 .   . 0
    "       3 845  3" 3 2  5 27 1  .  4 3    . 55 .  27 1
    "       3 845  3" 3 2  5 27 1  .  3 3    . 55 .  28 1
    "       3 879  3" 3 2 11 22 0 32  1 3    6 49 .   . 0
    "       3 896  3" 3 2  6 21 1 18  2 3 -155 70 .  23 1
    "       3 896  3" 3 2  6 21 1 42  1 3 -138 70 .   . 1
    "       3 896  6" 3 2  6 25 1 27  3 3 -117 70 .  48 0
    "       3 932  7" 3 2  6 33 1  .  1 5    . 43 .   . 0
    "       3 932  8" 3 2  9 26 1  .  1 5    . 43 .   . 0
    "       3 957  2" 3 2  8 20 1  7  1 4  157 42 3   . 0
    "       31005  2" 3 2  6 31 1  .  3 3    . 35 .  47 0
    "       31005  8" 3 2  8 26 1  .  4 3    . 35 .  29 1
    "       31005  8" 3 2  8 26 1  .  3 3    . 35 .  53 1
    "       31007  3" 3 2  7 22 1  9  3 3    6 53 .  16 1
    "       31007  3" 3 2  7 22 1  .  2 3    . 53 .  47 1
    "       31007  4" 3 2  7 19 1  .  2 3    . 53 .  31 1
    "       31007  4" 3 2  7 19 1 49  1 3 -166 53 .   . 1
    "       31038  3" 3 2 11 19 1  .  1 3    . 56 3   . 0
    "       31038  4" 3 2  9 29 1  .  3 3    . 56 3  58 0
    "       31038 11" 3 2  6 26 1  .  1 3    . 56 .   . 0
    "       31038 12" 3 2  7 19 0  .  1 3    . 56 .   . 0
    "       31038 13" 3 2  7 34 1  .  6 3    . 56 .  66 1
    "       31038 13" 3 2  7 34 1  .  5 3    . 56 .  66 1
    "       31072  2" 3 2  9 23 1  .  1 3    . 39 .   . 0
    end
    label values v101 V101
    label def V101 2 "centre (without yaounde)", modify
    label def V101 4 "east", modify
    label def V101 5 "far-north", modify
    label values v133 V133
    label values v714 V714
    label def V714 0 "no", modify
    label def V714 1 "yes", modify
    label values v190 V190
    label def V190 1 "poorest", modify
    label def V190 2 "poorer", modify
    label def V190 3 "middle", modify
    label def V190 4 "richer", modify
    label def V190 5 "richest", modify
    label values hw70 HW70
    label def HW70 9998 "flagged cases", modify
    label values v152 V152
    label values v632 V632
    label def V632 1 "mainly respondent", modify
    label def V632 3 "joint decision", modify
    Thank you in advance!
    Last edited by Beri Parfait; 02 Nov 2024, 16:04.

  • #2
    In addition to bord and b11, hw1 and hw70 also can take on different values within a caseid cluster. There may be other variables like that you haven't yet noticed. I suggest you do this:

    Code:
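    * Collect the names of all variables that are not constant within caseid
    local conflicts
    foreach v of varlist _all {
        * Within each caseid, sort on the variable and compare first and last values;
        * they differ exactly when the variable varies within that caseid
        by caseid (`v'), sort: gen byte flag_`v' = `v'[1] != `v'[_N]
        capture assert flag_`v' == 0
        if c(rc) != 0 {
            local conflicts `conflicts' `v'
        }
    }
    display `"`conflicts'"'
    * any_flag is 1 for every observation in a caseid group with at least one conflict
    egen any_flag = rowmax(flag_*)
    browse caseid `conflicts' if any_flag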
    The output of the -display- command will give you a complete list of all the variables that take on different values within a single caseid, and the -browse- command will show all the relevant variables to you.

    You then have to figure out what to do about it. It may be that one or more of these variables needs to be used in conjunction with caseid to form a true unique observation identification variable. Or it may be that these contrasting observations with the same caseid represent data errors that you have to figure out how to fix. Nothing more specific can be said at this point because I have no idea what these variables are or what they mean in the real world. You will have to use your understanding of the meaning of the variables to decide which, if any, can be used with caseid to form a unique id variable, and how to resolve the conflicting values of the remaining variables. Ways of resolving conflicts include retaining just one of the values; combining the values in some way, such as a mean, max, median, or min, or with some other mathematical expression; replacing all of the conflicting values with missing values; or dropping the entire group of observations with the caseid if there is no basis for any of the other actions.
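As a rough illustration of the options listed above, here is a minimal sketch on made-up data. The three observations and their values are hypothetical, and which option (if any) is appropriate depends entirely on what the variables mean; none of this is a recommendation for the real dataset.

```stata
* Hypothetical toy data: caseid "A" has two conflicting hw70 values; "B" is unique.
clear
input str1 caseid int hw70 byte bord
"A" -255 4
"A" -405 3
"B"   22 1
end

* Option 1: retain one observation per caseid (here, arbitrarily, the lowest bord).
preserve
by caseid (bord), sort: keep if _n == 1
isid caseid
restore

* Option 2: combine the conflicting values, for example with the mean.
preserve
collapse (mean) hw70, by(caseid)
isid caseid
restore

* Option 3: set conflicting values to missing, then keep one row per caseid.
by caseid (hw70), sort: gen byte conflict = hw70[1] != hw70[_N]
replace hw70 = . if conflict
bysort caseid: keep if _n == 1
isid caseid
```

Each branch ends with -isid caseid-, which stops with an error if caseid still fails to identify the observations, so the check doubles as a safeguard.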



    • #3
      Dear Clyde Schechter

      Thank you for your feedback and observation. I apologise for the long write-up. I want to be explicit so that readers can comprehend the data and assist (make IDs uniquely identifiable). I have gone through the suggestions. You mentioned that:

      Ways of resolving conflicts include retaining just one of the values; combining the values in some way, such as a mean, max, median, or min, or with some other mathematical expression; replacing all of the conflicting values with missing values; or dropping the entire group of observations with the caseid if there is no basis for any of the other actions.
      My initial intuition was to drop duplicates, but now that there are variations in some responses, that might be a terrible idea. If I retain just one of the values, I will need to justify the choice, and it is not yet clear to me how to do that.

      The objective is to examine the effect of maternal education (v133) on a child's nutritional status (hw70). The data comes from the X-country Demographic Health Survey (DHS).

      I ran the code in #2 and discovered the full list of variables in the dataset with different values within a caseid cluster. They are as defined below:

      hw1: child's age in months
      bord: birth order number: I do not need this variable and it will be safely dropped from the data
      hw70: height/age standard deviation (new who)
      b11: preceding birth interval (months): I do not need this variable and it will be safely dropped from the data
      hw57: anemia level (categorical)
      hw71: weight/age standard deviation (new who)
      hw72: weight/height standard deviation (new who)
      hw73: bmi standard deviation (new who)

      To further shed light on the dataset, I add the variables important for understanding the survey:
      caseid: case identification
      v001: cluster number
      v101: region (the country has ten regions)
      v133: education in single years

      v745b v739 v743f are additional variables I added to the dataex but the responses are not conflicting.

      I would appreciate any further suggestions you may have on how to proceed, now that the dataset is clearer!
      Thank you


      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str15 caseid int v001 byte(v101 v133 hw1 bord) int(hw70 b11) byte hw57 int(hw71 hw72 hw73) byte(v745b v739 v743f)
      "       1 125  2" 1 5  9 22  4 -255  62 4 -136  -13   38 0 1 4
      "       1 125  3" 1 5  0  .  1    .   . .    .    .    . 0 . .
      "       1 147  2" 1 5 10  .  1    .   . .    .    .    . 0 4 4
      "       1 191  3" 1 5  1  .  1    .   . .    .    .    . 0 . 4
      "       1 198  1" 1 5  6 50  4   22  22 4  -38  -80  -81 0 1 4
      "       1 252  2" 1 5  0  2  8 -292  34 . -250   41 -124 0 . 4
      "       1 252  2" 1 5  0 36  7 -405  37 2 -244   12   57 0 . 4
      "       1 310  3" 1 5  6  .  1    .   . .    .    .    . 0 . 4
      "       1 334  2" 1 5  5 16  9 -214  46 3 -144  -58  -17 0 . 4
      "       1 819  1" 1 5  4  .  3    .  97 .    .    .    . 2 1 4
      "       1 819  1" 1 5  4  .  4    .  28 .    .    .    . 2 1 4
      "       1 897  1" 1 5  0  2 13 -106  36 . -134  -71 -104 0 . 4
      "       1 897  1" 1 5  0 39 12 -471  28 3 -221   87  167 0 . 4
      "       11040  1" 1 5  3  .  2    . 115 .    .    .    . 0 . .
      "       11052  2" 1 5  3 11  1 -185   . 3 -156  -87  -64 2 . 4
      "       11103  2" 1 5  0  .  7    .  33 .    .    .    . 0 1 4
      "       11128  2" 1 5  0  3  4   75  27 .  118   98  102 0 1 4
      "       11128  2" 1 5  0 31  3 -181  39 3  -31   96  125 0 1 4
      "       11204  2" 1 5  6  5  5 -153  29 . -344 -352 -357 0 1 2
      "       11204  2" 1 5  6 56  3  -20  11 4   54   96  100 0 1 2
      "       11204  2" 1 5  6 34  4  -68  22 4   90  184  194 0 1 2
      "       2  16  5" 2 4  8  .  1    .   . .    .    .    . 0 . .
      "       2  16  5" 2 4  8  .  2    .  31 .    .    .    . 0 . .
      "       2  16  7" 2 4  6  .  6    .  42 .    .    .    . 2 . .
      "       2  75  2" 2 4  4  .  5    .  27 .    .    .    . 2 2 4
      "       2  75  2" 2 4  4  .  4    .  69 .    .    .    . 2 2 4
      "       2 191  1" 2 4  6  .  5    .  44 .    .    .    . 1 1 2
      "       2 240  4" 2 4  6 15  1 -345   . 2  -18  203  258 0 . .
      "       2 431  2" 2 4  6  .  5    .  30 .    .    .    . 0 2 2
      "       2 431  2" 2 4  6  .  6    .  14 .    .    .    . 0 2 2
      "       2 432  3" 2 4  5  1  6  -47  41 .  -75  -43  -73 0 1 4
      "       2 432  3" 2 4  5 43  4 -515  23 4 -282   49  135 0 1 4
      "       2 432  3" 2 4  5  .  5    .  23 .    .    .    . 0 1 4
      "       2 608  2" 2 4  6 41  5  -37  39 3  -39  -24  -27 0 1 2
      "       2 681  3" 2 4  9 30  2 -120  44 3  -18   64   84 0 1 4
      "       2 689  2" 2 4  6  6  6  116  30 2   65   14    0 0 4 2
      "       2 689  2" 2 4  6  .  5    .  30 .    .    .    . 0 4 2
      "       2 757  3" 2 4  6 32  6 -483  30 4 -229   82  158 0 . 2
      "       2 757  3" 2 4  6  1  7 -257  31 .  103 9998  373 0 . 2
      "       2 757 12" 2 4  6 41  3   21  42 4    2  -13  -20 0 1 4
      "       2 757 12" 2 4  6 17  4 -145  23 4  -56   14   45 0 1 4
      "       2 757 18" 2 4  4 25  3 -367  27 2 -234  -42   19 0 1 4
      "       2 757 18" 2 4  4 53  2 -289  30 2 -143   53   81 0 1 4
      "       2 757 22" 2 4  9 33  3 -403  28 3 -230    2   69 0 . 2
      "       2 757 22" 2 4  9  1  4 -353  32 .  -92  342  162 0 . 2
      "       2 764  2" 2 4  7  .  2    .  26 .    .    .    . 2 1 4
      "       2 764  2" 2 4  7  .  4    .  27 .    .    .    . 2 1 4
      "       2 764  2" 2 4  7  .  3    .  24 .    .    .    . 2 1 4
      "       2 905  2" 2 4 11  .  1    .   . .    .    .    . 0 2 2
      "       21024  2" 2 4  2  .  7    .  52 .    .    .    . 2 2 2
      "       21138  2" 2 4  5  .  6    .  27 .    .    .    . 0 1 4
      "       21138  2" 2 4  5  .  7    .  43 .    .    .    . 0 1 4
      "       21138  9" 2 4  5  .  5    .  27 .    .    .    . 0 1 4
      "       21138  9" 2 4  5  .  4    .  24 .    .    .    . 0 1 4
      "       21138  9" 2 4  5  .  3    .  26 .    .    .    . 0 1 4
      "       3  62  3" 3 2  4  . 10    .  24 .    .    .    . 2 2 2
      "       3  62  3" 3 2  4  .  9    .  40 .    .    .    . 2 2 2
      "       3 191  2" 3 2  8 13  4 -177  30 4 -222 -187 -163 0 2 2
      "       3 191  2" 3 2  8 43  3 9998  33 2 -173  361  463 0 2 2
      "       3 368  5" 3 2  7 32  4  102  34 4   82   42   23 0 2 2
      "       3 368  9" 3 2 13 14  2 -139  22 4  -12   67   95 0 . 2
      "       3 368  9" 3 2 13 37  1  -81   . 4   79  183  192 0 . 2
      "       3 369  2" 3 2  1 17 10 -295  70 3  -83   76  137 2 1 4
      "       3 369  3" 3 2  5 30  1 -214   . 3 -157  -57  -29 0 . .
      "       3 478  2" 3 2  8 30  4 -260  23 3 -222 -111  -79 0 1 1
      "       3 478  2" 3 2  8 53  3 -308  34 3 -219  -41  -20 0 1 1
      "       3 478  8" 3 2 14  .  1    .   . .    .    .    . 0 . .
      "       3 552  2" 3 2  6 29  7  -46  84 4   48  100  110 0 . .
      "       3 552  3" 3 2  7 19  1 -253   . 3   -8  151  201 0 . .
      "       3 665  2" 3 2 12  .  1    .   . .    .    .    . 0 1 4
      "       3 712  6" 3 2  8  .  2    .   . .    .    .    . 0 . .
      "       3 712  6" 3 2  8  .  1    .   . .    .    .    . 0 . .
      "       3 715 12" 3 2  3 41  3   -1  74 2   42   64   61 0 2 2
      "       3 724  9" 3 2  5  .  8    .  48 .    .    .    . 0 . .
      "       3 756  5" 3 2  6 34  3 -236  42 4 -113   26   62 0 2 2
      "       3 814  4" 3 2 10  .  1    .   . .    .    .    . 0 . .
      "       3 814 11" 3 2 11  .  1    .   . .    .    .    . 0 . .
      "       3 845  3" 3 2  5  .  3    .  28 .    .    .    . 0 2 2
      "       3 845  3" 3 2  5  .  4    .  27 .    .    .    . 0 2 2
      "       3 879  3" 3 2 11 32  1    6   . 2  106  148  146 0 . .
      "       3 896  3" 3 2  6 18  2 -155  23 4  -15   78  109 0 . .
      "       3 896  3" 3 2  6 42  1 -138   . 4   40  172  181 0 . .
      "       3 896  6" 3 2  6 27  3 -117  48 3   20  114  134 0 2 2
      "       3 932  7" 3 2  6  .  1    .   . .    .    .    . 0 2 2
      "       3 932  8" 3 2  9  .  1    .   . .    .    .    . 0 . .
      "       3 957  2" 3 2  8  7  1  157   . 4  179  145  126 0 1 2
      "       31005  2" 3 2  6  .  3    .  47 .    .    .    . 0 2 2
      "       31005  8" 3 2  8  .  3    .  53 .    .    .    . 1 . .
      "       31005  8" 3 2  8  .  4    .  29 .    .    .    . 1 . .
      "       31007  3" 3 2  7  9  3    6  16 4  -40  -53  -57 0 1 2
      "       31007  3" 3 2  7  .  2    .  47 .    .    .    . 0 1 2
      "       31007  4" 3 2  7 49  1 -166   . 4 -184 -124 -116 0 . .
      "       31007  4" 3 2  7  .  2    .  31 .    .    .    . 0 . .
      "       31038  3" 3 2 11  .  1    .   . .    .    .    . 0 2 2
      "       31038  4" 3 2  9  .  3    .  58 .    .    .    . 0 1 4
      "       31038 11" 3 2  6  .  1    .   . .    .    .    . 0 2 2
      "       31038 12" 3 2  7  .  1    .   . .    .    .    . 0 . .
      "       31038 13" 3 2  7  .  6    .  66 .    .    .    . 0 . .
      "       31038 13" 3 2  7  .  5    .  66 .    .    .    . 0 . .
      "       31072  2" 3 2  9  .  1    .   . .    .    .    . 0 2 2
      end
      label values v101 V101
      label def V101 2 "centre (without yaounde)", modify
      label def V101 4 "east", modify
      label def V101 5 "far-north", modify
      label values v133 V133
      label values hw70 HW70
      label def HW70 9998 "flagged cases", modify
      label values hw57 HW57
      label def HW57 2 "moderate", modify
      label def HW57 3 "mild", modify
      label def HW57 4 "not anemic", modify
      label values hw71 HW71
      label values hw72 HW72
      label def HW72 9998 "flagged cases", modify
      label values hw73 HW73
      label values v745b V745B
      label def V745B 0 "does not own", modify
      label def V745B 1 "alone only", modify
      label def V745B 2 "jointly only", modify
      label values v739 V739
      label def V739 1 "respondent alone", modify
      label def V739 2 "respondent and husband/partner", modify
      label def V739 4 "husband/partner alone", modify
      label values v743f V743F
      label def V743F 1 "respondent alone", modify
      label def V743F 2 "respondent and husband/partner", modify
      label def V743F 4 "husband/partner alone", modify
      Last edited by Beri Parfait; 03 Nov 2024, 01:05.



      • #4
        Update,

        I ran the following code to drop duplicates in terms of caseid:

        Code:
        duplicates drop caseid, force
        This is code I got from the forum. The outcome is frightening, as more than 3,000 observations were dropped.

        Code:
        (3,270 observations deleted)
        I'm stuck now with finding a suitable explanation for what I did, why I did it and how it reshaped my data. Since I do not have a background in coding, I'm not sure if it is reasonable. Perhaps someone could give me more insights about the process and its implications.
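To make the effect of that command concrete, here is a minimal sketch on made-up data (the three observations and their values are hypothetical). With a varlist and the -force- option, -duplicates drop- keeps the first occurrence of each caseid and deletes the rest, whether or not the other variables agree. That is why the number deleted matches the total of the surplus column from -duplicates report caseid- in #1 (2259 + 898 + 93 + 20 = 3,270).

```stata
* Hypothetical toy data: two rows share caseid "A" but differ on bord and hw70.
clear
input str1 caseid byte bord int hw70
"A" 4 -255
"A" 3 -405
"B" 1   22
end

* Without a varlist, whole observations are compared; no two rows are identical.
duplicates report

* With a varlist, only caseid is compared; "A" appears twice (surplus of 1).
duplicates report caseid

* -force- is required because the rows with the same caseid differ on other variables.
* One "A" row is deleted along with its bord and hw70 values.
duplicates drop caseid, force
list
```

In the toy data, the deleted row carries information (a different bord and hw70) that is lost for good, which is the risk with the real dataset as well.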



        • #5
          Dear Stata experts,

          With respect to the comment made by Clyde in #2, the following is a reasonable way to proceed:

          The output of the -display- command will give you a complete list of all the variables that take on different values within a single caseid, and the -browse- command will show all the relevant variables to you. You then have to figure out what to do about it. It may be that one or more of these variables needs to be used in conjunction with caseid to form a true unique observation identification variable.
          From reading the published report, the data contains an interview that was done with mothers that had children aged 0-5. In the dataset, there is a variable (b2) indicating when each child was born. Some parents had more than one child that was within the above age group, which is the reason why there are repeated cases, since the parents provided responses for each of their children. Please see a summary of children for which the parents were responding.

               year of |
                 birth |      Freq.     Percent        Cum.
          -------------+-----------------------------------
                  2013 |        431        4.43        4.43
                  2014 |      2,007       20.62       25.05
                  2015 |      1,922       19.75       44.80
                  2016 |      1,936       19.89       64.69
                  2017 |      1,969       20.23       84.92
                  2018 |      1,468       15.08      100.00
          -------------+-----------------------------------
                 Total |      9,733      100.00



          So, my main concern now is to use caseid in conjunction with b2 to form a true unique observation identification variable, as suggested by Clyde. Apart from b2, which also varies within caseid (that is, within each interviewed mother), the variables are the same as described in #3.
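Before relying on caseid and b2 together, it seems worth checking that the pair really is unique. The sketch below uses five rows copied from the data example in this post (caseid, b2, and bord only); the variable name dup_b2 is made up for illustration. In these rows, caseid "2 432 3" has two children born in 2015, so the pair may not identify observations in the full data either.

```stata
* Five rows taken from the data example in this post (other variables omitted).
clear
input str15 caseid int b2 byte bord
"       2 432  3" 2015 4
"       2 432  3" 2015 5
"       2 432  3" 2018 6
"       2 757  3" 2016 6
"       2 757  3" 2018 7
end

* -isid- exits with an error if the key is not unique, so wrap it in -capture-
* and inspect the return code (nonzero means not unique).
capture isid caseid b2
display "isid return code: " _rc

* Tag and list the observations that share both caseid and b2.
duplicates tag caseid b2, generate(dup_b2)
list caseid b2 bord if dup_b2 > 0, sepby(caseid)
```

Running the same two checks on the full dataset would show how many mothers have more than one child with the same year of birth.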


          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str15 caseid int v001 byte(v101 v133) int b2 byte(hw1 bord) int(hw70 b11) byte hw57 int(hw71 hw72 hw73) byte(v745b v739 v743f)
          "       1 125  2" 1 5  9 2016 22  4 -255  62 4 -136  -13   38 0 1 4
          "       1 125  3" 1 5  0 2015  .  1    .   . .    .    .    . 0 . .
          "       1 147  2" 1 5 10 2015  .  1    .   . .    .    .    . 0 4 4
          "       1 191  3" 1 5  1 2016  .  1    .   . .    .    .    . 0 . 4
          "       1 198  1" 1 5  6 2014 50  4   22  22 4  -38  -80  -81 0 1 4
          "       1 252  2" 1 5  0 2015 36  7 -405  37 2 -244   12   57 0 . 4
          "       1 252  2" 1 5  0 2018  2  8 -292  34 . -250   41 -124 0 . 4
          "       1 310  3" 1 5  6 2018  .  1    .   . .    .    .    . 0 . 4
          "       1 334  2" 1 5  5 2017 16  9 -214  46 3 -144  -58  -17 0 . 4
          "       1 819  1" 1 5  4 2014  .  3    .  97 .    .    .    . 2 1 4
          "       1 819  1" 1 5  4 2016  .  4    .  28 .    .    .    . 2 1 4
          "       1 897  1" 1 5  0 2015 39 12 -471  28 3 -221   87  167 0 . 4
          "       1 897  1" 1 5  0 2018  2 13 -106  36 . -134  -71 -104 0 . 4
          "       11040  1" 1 5  3 2017  .  2    . 115 .    .    .    . 0 . .
          "       11052  2" 1 5  3 2017 11  1 -185   . 3 -156  -87  -64 2 . 4
          "       11103  2" 1 5  0 2016  .  7    .  33 .    .    .    . 0 1 4
          "       11128  2" 1 5  0 2015 31  3 -181  39 3  -31   96  125 0 1 4
          "       11128  2" 1 5  0 2018  3  4   75  27 .  118   98  102 0 1 4
          "       11204  2" 1 5  6 2013 56  3  -20  11 4   54   96  100 0 1 2
          "       11204  2" 1 5  6 2015 34  4  -68  22 4   90  184  194 0 1 2
          "       11204  2" 1 5  6 2018  5  5 -153  29 . -344 -352 -357 0 1 2
          "       2  16  5" 2 4  8 2017  .  2    .  31 .    .    .    . 0 . .
          "       2  16  5" 2 4  8 2014  .  1    .   . .    .    .    . 0 . .
          "       2  16  7" 2 4  6 2014  .  6    .  42 .    .    .    . 2 . .
          "       2  75  2" 2 4  4 2014  .  4    .  69 .    .    .    . 2 2 4
          "       2  75  2" 2 4  4 2016  .  5    .  27 .    .    .    . 2 2 4
          "       2 191  1" 2 4  6 2015  .  5    .  44 .    .    .    . 1 1 2
          "       2 240  4" 2 4  6 2017 15  1 -345   . 2  -18  203  258 0 . .
          "       2 431  2" 2 4  6 2016  .  6    .  14 .    .    .    . 0 2 2
          "       2 431  2" 2 4  6 2015  .  5    .  30 .    .    .    . 0 2 2
          "       2 432  3" 2 4  5 2015 43  4 -515  23 4 -282   49  135 0 1 4
          "       2 432  3" 2 4  5 2015  .  5    .  23 .    .    .    . 0 1 4
          "       2 432  3" 2 4  5 2018  1  6  -47  41 .  -75  -43  -73 0 1 4
          "       2 608  2" 2 4  6 2015 41  5  -37  39 3  -39  -24  -27 0 1 2
          "       2 681  3" 2 4  9 2016 30  2 -120  44 3  -18   64   84 0 1 4
          "       2 689  2" 2 4  6 2018  6  6  116  30 2   65   14    0 0 4 2
          "       2 689  2" 2 4  6 2015  .  5    .  30 .    .    .    . 0 4 2
          "       2 757  3" 2 4  6 2016 32  6 -483  30 4 -229   82  158 0 . 2
          "       2 757  3" 2 4  6 2018  1  7 -257  31 .  103 9998  373 0 . 2
          "       2 757 12" 2 4  6 2017 17  4 -145  23 4  -56   14   45 0 1 4
          "       2 757 12" 2 4  6 2015 41  3   21  42 4    2  -13  -20 0 1 4
          "       2 757 18" 2 4  4 2014 53  2 -289  30 2 -143   53   81 0 1 4
          "       2 757 18" 2 4  4 2016 25  3 -367  27 2 -234  -42   19 0 1 4
          "       2 757 22" 2 4  9 2016 33  3 -403  28 3 -230    2   69 0 . 2
          "       2 757 22" 2 4  9 2018  1  4 -353  32 .  -92  342  162 0 . 2
          "       2 764  2" 2 4  7 2018  .  4    .  27 .    .    .    . 2 1 4
          "       2 764  2" 2 4  7 2016  .  3    .  24 .    .    .    . 2 1 4
          "       2 764  2" 2 4  7 2014  .  2    .  26 .    .    .    . 2 1 4
          "       2 905  2" 2 4 11 2017  .  1    .   . .    .    .    . 0 2 2
          "       21024  2" 2 4  2 2014  .  7    .  52 .    .    .    . 2 2 2
          "       21138  2" 2 4  5 2015  .  6    .  27 .    .    .    . 0 1 4
          "       21138  2" 2 4  5 2018  .  7    .  43 .    .    .    . 0 1 4
          "       21138  9" 2 4  5 2018  .  5    .  27 .    .    .    . 0 1 4
          "       21138  9" 2 4  5 2014  .  3    .  26 .    .    .    . 0 1 4
          "       21138  9" 2 4  5 2016  .  4    .  24 .    .    .    . 0 1 4
          "       3  62  3" 3 2  4 2016  . 10    .  24 .    .    .    . 2 2 2
          "       3  62  3" 3 2  4 2014  .  9    .  40 .    .    .    . 2 2 2
          "       3 191  2" 3 2  8 2015 43  3 9998  33 2 -173  361  463 0 2 2
          "       3 191  2" 3 2  8 2017 13  4 -177  30 4 -222 -187 -163 0 2 2
          "       3 368  5" 3 2  7 2016 32  4  102  34 4   82   42   23 0 2 2
          "       3 368  9" 3 2 13 2017 14  2 -139  22 4  -12   67   95 0 . 2
          "       3 368  9" 3 2 13 2015 37  1  -81   . 4   79  183  192 0 . 2
          "       3 369  2" 3 2  1 2017 17 10 -295  70 3  -83   76  137 2 1 4
          "       3 369  3" 3 2  5 2016 30  1 -214   . 3 -157  -57  -29 0 . .
          "       3 478  2" 3 2  8 2016 30  4 -260  23 3 -222 -111  -79 0 1 1
          "       3 478  2" 3 2  8 2014 53  3 -308  34 3 -219  -41  -20 0 1 1
          "       3 478  8" 3 2 14 2016  .  1    .   . .    .    .    . 0 . .
          "       3 552  2" 3 2  6 2016 29  7  -46  84 4   48  100  110 0 . .
          "       3 552  3" 3 2  7 2017 19  1 -253   . 3   -8  151  201 0 . .
          "       3 665  2" 3 2 12 2014  .  1    .   . .    .    .    . 0 1 4
          "       3 712  6" 3 2  8 2017  .  1    .   . .    .    .    . 0 . .
          "       3 712  6" 3 2  8 2017  .  2    .   . .    .    .    . 0 . .
          "       3 715 12" 3 2  3 2015 41  3   -1  74 2   42   64   61 0 2 2
          "       3 724  9" 3 2  5 2014  .  8    .  48 .    .    .    . 0 . .
          "       3 756  5" 3 2  6 2015 34  3 -236  42 4 -113   26   62 0 2 2
          "       3 814  4" 3 2 10 2017  .  1    .   . .    .    .    . 0 . .
          "       3 814 11" 3 2 11 2014  .  1    .   . .    .    .    . 0 . .
          "       3 845  3" 3 2  5 2017  .  4    .  27 .    .    .    . 0 2 2
          "       3 845  3" 3 2  5 2015  .  3    .  28 .    .    .    . 0 2 2
          "       3 879  3" 3 2 11 2016 32  1    6   . 2  106  148  146 0 . .
          "       3 896  3" 3 2  6 2017 18  2 -155  23 4  -15   78  109 0 . .
          "       3 896  3" 3 2  6 2015 42  1 -138   . 4   40  172  181 0 . .
          "       3 896  6" 3 2  6 2016 27  3 -117  48 3   20  114  134 0 2 2
          "       3 932  7" 3 2  6 2017  .  1    .   . .    .    .    . 0 2 2
          "       3 932  8" 3 2  9 2016  .  1    .   . .    .    .    . 0 . .
          "       3 957  2" 3 2  8 2018  7  1  157   . 4  179  145  126 0 1 2
          "       31005  2" 3 2  6 2017  .  3    .  47 .    .    .    . 0 2 2
          "       31005  8" 3 2  8 2014  .  3    .  53 .    .    .    . 1 . .
          "       31005  8" 3 2  8 2016  .  4    .  29 .    .    .    . 1 . .
          "       31007  3" 3 2  7 2018  9  3    6  16 4  -40  -53  -57 0 1 2
          "       31007  3" 3 2  7 2016  .  2    .  47 .    .    .    . 0 1 2
          "       31007  4" 3 2  7 2014 49  1 -166   . 4 -184 -124 -116 0 . .
          "       31007  4" 3 2  7 2017  .  2    .  31 .    .    .    . 0 . .
          "       31038  3" 3 2 11 2018  .  1    .   . .    .    .    . 0 2 2
          "       31038  4" 3 2  9 2015  .  3    .  58 .    .    .    . 0 1 4
          "       31038 11" 3 2  6 2016  .  1    .   . .    .    .    . 0 2 2
          "       31038 12" 3 2  7 2015  .  1    .   . .    .    .    . 0 . .
          "       31038 13" 3 2  7 2015  .  6    .  66 .    .    .    . 0 . .
          "       31038 13" 3 2  7 2015  .  5    .  66 .    .    .    . 0 . .
          "       31072  2" 3 2  9 2016  .  1    .   . .    .    .    . 0 2 2
          end
          label values v101 V101
          label def V101 2 "centre (without yaounde)", modify
          label def V101 4 "east", modify
          label def V101 5 "far-north", modify
          label values v133 V133
          label values hw70 HW70
          label def HW70 9998 "flagged cases", modify
          label values hw57 HW57
          label def HW57 2 "moderate", modify
          label def HW57 3 "mild", modify
          label def HW57 4 "not anemic", modify
          label values hw71 HW71
          label values hw72 HW72
          label def HW72 9998 "flagged cases", modify
          label values hw73 HW73
          label values v745b V745B
          label def V745B 0 "does not own", modify
          label def V745B 1 "alone only", modify
          label def V745B 2 "jointly only", modify
          label values v739 V739
          label def V739 1 "respondent alone", modify
          label def V739 2 "respondent and husband/partner", modify
          label def V739 4 "husband/partner alone", modify
          label values v743f V743F
          label def V743F 1 "respondent alone", modify
          label def V743F 2 "respondent and husband/partner", modify
          label def V743F 4 "husband/partner alone", modify
          Last edited by Beri Parfait; 03 Nov 2024, 07:52. Reason: I made this edit after discovering the variable that I needed to use in conjunction with caseid to form unique IDs



          • #6
            Beri,

            Are you working at the mother level or the child level? Based on your data description, I guess you are using the DHS. If you are working at the mother level, you should use the individual module (IR data). In the individual data,
            caseid should uniquely identify the observations. If you are working at the child level, caseid and bord should uniquely identify the observations. So, given the data example in #5:
            Code:
            isid caseid bord
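            If the check fails, the rows responsible can be listed. A minimal sketch, assuming the example data from #5 is in memory; n_same is a made-up variable name, not part of the DHS file:
            Code:
            * isid exits with a nonzero return code when the key is not unique
            capture isid caseid bord
            display _rc
            * count the rows that share each caseid-bord pair
            bysort caseid bord: gen n_same = _N
            list caseid bord if n_same > 1, sepby(caseid)
            An empty listing together with a return code of 0 means the pair is a valid key.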



            • #7
              Dung Le's response in #6 looks like the solution to O.P.'s problem. To actually create a variable that combines caseid and bord and can serve as an id in its own right, you can do:
              Code:
              egen child_id = group(caseid bord)
              isid child_id
              A couple of points I think are worth remarking on:
              1. Knowing what the variables mean (as opposed to their cryptic names) is clearly the key to a quick solution. Knowing the nature of the data as a whole is important. Had I known that this was a dataset of information about children and that caseid identified their mother, and known that bord is a birth-order variable, I would have come up with this solution myself.
              2. O.P. came up with a false solution when saying that caseid and b2 would serve as a unique identifier. In fact, even in the example data posted in #5, this fails, as -isid caseid b2- produces "variables caseid and b2 do not uniquely identify the observations." Knowing that b2 is the child's age, I think it was predictable that caseid and b2 would not jointly identify unique children, because a mother can have twins (or higher-order multiple births), who are always the same age, or even two children who are born less than 1 year apart and are, at certain times of the year, the same age. This again points out the importance of knowing the meaning of the variables in figuring out how to manage data.
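              The failure described in point 2 can be inspected directly. This is only a sketch, assuming the example data from #5 (with caseid, b2 and bord) is in memory; dup_b2 is a hypothetical variable name chosen for illustration:
              Code:
              * how many caseid-b2 pairs occur more than once?
              duplicates report caseid b2
              * flag the offending rows and show that bord tells them apart
              duplicates tag caseid b2, generate(dup_b2)
              list caseid b2 bord if dup_b2 > 0, sepby(caseid)
              Rows that share caseid and b2 but differ in bord are distinct children of the same mother, not duplicate records.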



              • #8

                Thank you for responding Dung Le and I apologise for the delayed response.

                Yes, I am using DHS data. I do not understand what you mean by "you then should use individual module (IR data)", but the proposed solution doesn't seem to improve the situation, although it appears to identify the observations uniquely. I am looking at the parent level (maternal education) and how it affects the child's health status. As predicted by Clyde Schechter in #7, I found a variable (b0) identifying the children. Some parents had twins and some parents had two births in a year. Therefore, caseid needs to be used in conjunction with b0 (individual children in the study) to come out with uniquely identifying observations.

                Clyde Schechter's code in #7 successfully creates children's IDs, but maybe another step is needed from there to make the case IDs unique, given the information for each child and the other conflicting information in the dataset, as explained in #2.


                Originally posted by Dung Le
                Beri,

                Are you working at the mother level or the child level? Based on your data description, I guess you are using the DHS. If you are working at the mother level, you should use the individual module (IR data). In the individual data,
                caseid should uniquely identify the observations. If you are working at the child level, caseid and bord should uniquely identify the observations. So, given the data example in #5:
                Code:
                isid caseid bord


                Please, can someone suggest some additional resources I can read to understand how to navigate this part of data cleaning (making IDs unique across different datasets)? I seem to encounter this problem quite often.
                Last edited by Beri Parfait; 05 Nov 2024, 16:23.



                • #9
                  Clyde Schechter's code in #7 successfully creates children's IDs, but maybe another step is needed from there to make the case IDs unique, given the information for each child and the other conflicting information in the dataset, as explained in #2.
                  I don't understand what you mean. If the combination of caseid and b0 uniquely identifies observations, then there is no possibility for conflicting information when there is only a single observation in focus.



                  • #10
                    but the proposed solution doesn't seem to improve the situation, although it appears to identify the observations uniquely
                    I don’t understand what you meant here. On the one hand, you said the code can be used to uniquely identify the observations but on the other hand you said the code didn’t give what you wanted. Is finding variables that uniquely identify the observations what you want?

                    When I say IR that means the individual module (women aged 15-49) in the DHS. You may know that a DHS contains several modules such as IR, BR, CR, and so on. Each module deals with different levels (e.g., mother level, household level or child level).

                    Therefore, caseid needs to be used in conjunction with b0 (individual children in the study) to come out with uniquely identifying observations.
                    Your understanding of the data structure is not correct. Since a mother can have more than one child, the variable that needs to go with caseid (mother id) is bord (birth order).

                    You can take a look at the DHS documentation to better understand the data structure. Which data module are you using? BR (birth history) or CR (children under 5)?



                    • #11
                      Dear Clyde Schechter

                      Thank you for your patience with me.

                      I am looking at mother-level information, but the code you provided created child_id. Since a mother can have more than one child in a year, Dung Le suggested that I use birth order (bord) in conjunction with caseid. Kindly see the revised dataex example below, which now includes bord. All variables are defined in #3.


                      Code:
                      * Example generated by -dataex-. For more info, type help dataex
                      clear
                      input str15 caseid int v001 byte(v101 bord v133 hw1) int(hw70 b11) byte hw57 int(hw71 hw72 hw73) byte(v745b v739)
                      "       1 125  2" 1 5  4  9 22 -255  62 4 -136  -13   38 0 1
                      "       1 125  3" 1 5  1  0  .    .   . .    .    .    . 0 .
                      "       1 147  2" 1 5  1 10  .    .   . .    .    .    . 0 4
                      "       1 191  3" 1 5  1  1  .    .   . .    .    .    . 0 .
                      "       1 198  1" 1 5  4  6 50   22  22 4  -38  -80  -81 0 1
                      "       1 252  2" 1 5  8  0  2 -292  34 . -250   41 -124 0 .
                      "       1 252  2" 1 5  7  0 36 -405  37 2 -244   12   57 0 .
                      "       1 310  3" 1 5  1  6  .    .   . .    .    .    . 0 .
                      "       1 334  2" 1 5  9  5 16 -214  46 3 -144  -58  -17 0 .
                      "       1 819  1" 1 5  4  4  .    .  28 .    .    .    . 2 1
                      "       1 819  1" 1 5  3  4  .    .  97 .    .    .    . 2 1
                      "       1 897  1" 1 5 13  0  2 -106  36 . -134  -71 -104 0 .
                      "       1 897  1" 1 5 12  0 39 -471  28 3 -221   87  167 0 .
                      "       11040  1" 1 5  2  3  .    . 115 .    .    .    . 0 .
                      "       11052  2" 1 5  1  3 11 -185   . 3 -156  -87  -64 2 .
                      "       11103  2" 1 5  7  0  .    .  33 .    .    .    . 0 1
                      "       11128  2" 1 5  4  0  3   75  27 .  118   98  102 0 1
                      "       11128  2" 1 5  3  0 31 -181  39 3  -31   96  125 0 1
                      "       11204  2" 1 5  5  6  5 -153  29 . -344 -352 -357 0 1
                      "       11204  2" 1 5  4  6 34  -68  22 4   90  184  194 0 1
                      "       11204  2" 1 5  3  6 56  -20  11 4   54   96  100 0 1
                      "       2  16  5" 2 4  2  8  .    .  31 .    .    .    . 0 .
                      "       2  16  5" 2 4  1  8  .    .   . .    .    .    . 0 .
                      "       2  16  7" 2 4  6  6  .    .  42 .    .    .    . 2 .
                      "       2  75  2" 2 4  5  4  .    .  27 .    .    .    . 2 2
                      "       2  75  2" 2 4  4  4  .    .  69 .    .    .    . 2 2
                      "       2 191  1" 2 4  5  6  .    .  44 .    .    .    . 1 1
                      "       2 240  4" 2 4  1  6 15 -345   . 2  -18  203  258 0 .
                      "       2 431  2" 2 4  6  6  .    .  14 .    .    .    . 0 2
                      "       2 431  2" 2 4  5  6  .    .  30 .    .    .    . 0 2
                      "       2 432  3" 2 4  6  5  1  -47  41 .  -75  -43  -73 0 1
                      "       2 432  3" 2 4  5  5  .    .  23 .    .    .    . 0 1
                      "       2 432  3" 2 4  4  5 43 -515  23 4 -282   49  135 0 1
                      "       2 608  2" 2 4  5  6 41  -37  39 3  -39  -24  -27 0 1
                      "       2 681  3" 2 4  2  9 30 -120  44 3  -18   64   84 0 1
                      "       2 689  2" 2 4  6  6  6  116  30 2   65   14    0 0 4
                      "       2 689  2" 2 4  5  6  .    .  30 .    .    .    . 0 4
                      "       2 757  3" 2 4  7  6  1 -257  31 .  103 9998  373 0 .
                      "       2 757  3" 2 4  6  6 32 -483  30 4 -229   82  158 0 .
                      "       2 757 12" 2 4  4  6 17 -145  23 4  -56   14   45 0 1
                      "       2 757 12" 2 4  3  6 41   21  42 4    2  -13  -20 0 1
                      "       2 757 18" 2 4  3  4 25 -367  27 2 -234  -42   19 0 1
                      "       2 757 18" 2 4  2  4 53 -289  30 2 -143   53   81 0 1
                      "       2 757 22" 2 4  4  9  1 -353  32 .  -92  342  162 0 .
                      "       2 757 22" 2 4  3  9 33 -403  28 3 -230    2   69 0 .
                      "       2 764  2" 2 4  4  7  .    .  27 .    .    .    . 2 1
                      "       2 764  2" 2 4  3  7  .    .  24 .    .    .    . 2 1
                      "       2 764  2" 2 4  2  7  .    .  26 .    .    .    . 2 1
                      "       2 905  2" 2 4  1 11  .    .   . .    .    .    . 0 2
                      "       21024  2" 2 4  7  2  .    .  52 .    .    .    . 2 2
                      "       21138  2" 2 4  7  5  .    .  43 .    .    .    . 0 1
                      "       21138  2" 2 4  6  5  .    .  27 .    .    .    . 0 1
                      "       21138  9" 2 4  5  5  .    .  27 .    .    .    . 0 1
                      "       21138  9" 2 4  4  5  .    .  24 .    .    .    . 0 1
                      "       21138  9" 2 4  3  5  .    .  26 .    .    .    . 0 1
                      "       3  62  3" 3 2 10  4  .    .  24 .    .    .    . 2 2
                      "       3  62  3" 3 2  9  4  .    .  40 .    .    .    . 2 2
                      "       3 191  2" 3 2  4  8 13 -177  30 4 -222 -187 -163 0 2
                      "       3 191  2" 3 2  3  8 43 9998  33 2 -173  361  463 0 2
                      "       3 368  5" 3 2  4  7 32  102  34 4   82   42   23 0 2
                      "       3 368  9" 3 2  2 13 14 -139  22 4  -12   67   95 0 .
                      "       3 368  9" 3 2  1 13 37  -81   . 4   79  183  192 0 .
                      "       3 369  2" 3 2 10  1 17 -295  70 3  -83   76  137 2 1
                      "       3 369  3" 3 2  1  5 30 -214   . 3 -157  -57  -29 0 .
                      "       3 478  2" 3 2  4  8 30 -260  23 3 -222 -111  -79 0 1
                      "       3 478  2" 3 2  3  8 53 -308  34 3 -219  -41  -20 0 1
                      "       3 478  8" 3 2  1 14  .    .   . .    .    .    . 0 .
                      "       3 552  2" 3 2  7  6 29  -46  84 4   48  100  110 0 .
                      "       3 552  3" 3 2  1  7 19 -253   . 3   -8  151  201 0 .
                      "       3 665  2" 3 2  1 12  .    .   . .    .    .    . 0 1
                      "       3 712  6" 3 2  2  8  .    .   . .    .    .    . 0 .
                      "       3 712  6" 3 2  1  8  .    .   . .    .    .    . 0 .
                      "       3 715 12" 3 2  3  3 41   -1  74 2   42   64   61 0 2
                      "       3 724  9" 3 2  8  5  .    .  48 .    .    .    . 0 .
                      "       3 756  5" 3 2  3  6 34 -236  42 4 -113   26   62 0 2
                      "       3 814  4" 3 2  1 10  .    .   . .    .    .    . 0 .
                      "       3 814 11" 3 2  1 11  .    .   . .    .    .    . 0 .
                      "       3 845  3" 3 2  4  5  .    .  27 .    .    .    . 0 2
                      "       3 845  3" 3 2  3  5  .    .  28 .    .    .    . 0 2
                      "       3 879  3" 3 2  1 11 32    6   . 2  106  148  146 0 .
                      "       3 896  3" 3 2  2  6 18 -155  23 4  -15   78  109 0 .
                      "       3 896  3" 3 2  1  6 42 -138   . 4   40  172  181 0 .
                      "       3 896  6" 3 2  3  6 27 -117  48 3   20  114  134 0 2
                      "       3 932  7" 3 2  1  6  .    .   . .    .    .    . 0 2
                      "       3 932  8" 3 2  1  9  .    .   . .    .    .    . 0 .
                      "       3 957  2" 3 2  1  8  7  157   . 4  179  145  126 0 1
                      "       31005  2" 3 2  3  6  .    .  47 .    .    .    . 0 2
                      "       31005  8" 3 2  4  8  .    .  29 .    .    .    . 1 .
                      "       31005  8" 3 2  3  8  .    .  53 .    .    .    . 1 .
                      "       31007  3" 3 2  3  7  9    6  16 4  -40  -53  -57 0 1
                      "       31007  3" 3 2  2  7  .    .  47 .    .    .    . 0 1
                      "       31007  4" 3 2  2  7  .    .  31 .    .    .    . 0 .
                      "       31007  4" 3 2  1  7 49 -166   . 4 -184 -124 -116 0 .
                      "       31038  3" 3 2  1 11  .    .   . .    .    .    . 0 2
                      "       31038  4" 3 2  3  9  .    .  58 .    .    .    . 0 1
                      "       31038 11" 3 2  1  6  .    .   . .    .    .    . 0 2
                      "       31038 12" 3 2  1  7  .    .   . .    .    .    . 0 .
                      "       31038 13" 3 2  6  7  .    .  66 .    .    .    . 0 .
                      "       31038 13" 3 2  5  7  .    .  66 .    .    .    . 0 .
                      "       31072  2" 3 2  1  9  .    .   . .    .    .    . 0 2
                      end
                      label values v101 V101
                      label def V101 2 "centre (without yaounde)", modify
                      label def V101 4 "east", modify
                      label def V101 5 "far-north", modify
                      label values v133 V133
                      label values hw70 HW70
                      label def HW70 9998 "flagged cases", modify
                      label values hw57 HW57
                      label def HW57 2 "moderate", modify
                      label def HW57 3 "mild", modify
                      label def HW57 4 "not anemic", modify
                      label values hw71 HW71
                      label values hw72 HW72
                      label def HW72 9998 "flagged cases", modify
                      label values hw73 HW73
                      label values v745b V745B
                      label def V745B 0 "does not own", modify
                      label def V745B 1 "alone only", modify
                      label def V745B 2 "jointly only", modify
                      label values v739 V739
                      label def V739 1 "respondent alone", modify
                      label def V739 2 "respondent and husband/partner", modify
                      label def V739 4 "husband/partner alone", modify


                      Originally posted by Clyde Schechter
                      I don't understand what you mean. If the combination of caseid and b0 uniquely identifies observations, then there is no possibility for conflicting information when only a single observation is in focus.
                      Last edited by Beri Parfait; 10 Nov 2024, 15:35.



                      • #12
                        Originally posted by Dung Le
                        Which data modules are you using? BR (birth history) or CR (children under 5)?
                        I am working with CR (children under 5) and my dependent variable is the mother's level of education (v133)



                        • #13
                          Re #11, as I said in #7, Dung Le got it right in #6. It is the combination of caseid and bord, not caseid and b2, that is a proper unique identifier of children in this data. Even in your original example data, caseid and b2 fail to uniquely identify children, and that is no surprise once we know that b2 is child age: there is no reason a mother can't have two different children of the same age. On the other hand, every mother's child has a different birth order. So -egen child_id = group(caseid bord)- will work.

                          Dung Le was able to take advantage of knowledge of this data set and what its variables mean to reach this conclusion. That was an advantage I did not have as I am not familiar with this data set and had to rely only on whatever was said about it in the thread.



                          • #14
                            Originally posted by Beri Parfait

                            I am working with CR (children under 5) and my dependent variable is the mother's level of education (v133)
                            Beri,

                            Since you are examining the relationship between maternal education and the health of under-5 children, I don't think you need to worry too much about which variables uniquely identify the observations. The reason is that the DHS selects only one under-5 child of the mother (let's call it the index child) and asks the mother about the health status of that child (e.g., anthropometry, fever, cough, or diarrhea). In other words, no matter how many children a mother has, the DHS collects this information only on the index child (that is, one mother and one child). If this is what you are doing, you just need the "IR" data module (the mother module), which contains maternal education (and other characteristics) and child health. In this case, caseid is the variable that uniquely identifies the mother.

                            If you want to look at the relationship between maternal education and child mortality, you then need to use the "BR" module (the mother's birth history). In this case, caseid and bord are the variables that together uniquely identify the observations, because a mother may have more than one deceased child and this module records all of them.
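                            If you stay with a child-level file but want mother-level summaries, one row per mother can be marked rather than dropped. A rough sketch, assuming a child-level file such as the example in #11 is in memory and that v133 is meant to be constant within caseid; one_per_mother is a hypothetical variable name:
                            Code:
                            * stop with an error if maternal education varies within a mother
                            bysort caseid (v133): assert v133[1] == v133[_N]
                            * tag exactly one observation per caseid
                            egen byte one_per_mother = tag(caseid)
                            * mother-level summaries then use only the tagged rows
                            summarize v133 if one_per_mother
                            This keeps every child record available for child-level analysis while avoiding counting a mother once per child.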

                            Hope this helps.
