

  • Keeping only households with one wife and one husband

    // Generating indicators for husband, wife, son, and daughter
    gen is_wife = (sex == 3 & (relationtohead == 1 | relationtohead == 2))
    gen is_husband = (sex == 1 & (relationtohead == 1 | relationtohead == 2))
    gen is_daughter = (sex == 3 & (relationtohead == 3 | relationtohead == 4))
    gen is_son = (sex == 1 & (relationtohead == 3 | relationtohead == 4))

    // Checking the number of husbands and wives in each household and wave
    bysort household_id wave: egen wives_per_wave = total(is_wife)
    bysort household_id wave: egen husbands_per_wave = total(is_husband)

    // Determining if there is at most one husband and one wife in each household
    bys household_id: egen max_husbands = max(husbands_per_wave)
    bys household_id: egen max_wives = max(wives_per_wave)
    bys household_id: gen unique_husband = (max_husbands == 1)
    bys household_id: gen unique_wife = (max_wives == 1)

    // Keeping households with at most one husband and one wife
    bys household_id: keep if unique_husband == 1 & unique_wife == 1

    However, although browsing the data suggests that each household has only one husband and one wife (across the various waves), when I run

    distinct pidlink if is_wife == 1
    distinct pidlink if is_husband == 1

    I still get more wives than husbands (pooled across waves, since distinct counts each pidlink once regardless of wave). This implies that I have kept some households with more than one wife.

    I provide a data example below:

    clear
    input str10(household_id person_id) float(wave is_husband is_wife)
    "0010651" "001060004" 5 0 1
    "0010651" "001065102" 5 1 0
    "0010651" "001065103" 5 0 0
    "0010651" "001065104" 5 0 0
    "0010851" "001080003" 5 1 0
    "0010851" "001080012" 5 0 1
    "0010851" "001085103" 5 0 0
    "0010851" "001085104" 5 0 0
    "0010851" "001085105" 5 0 0
    "0010851" "001085106" 5 0 0
    "0010851" "001085107" 5 0 0
    "0012200" "001220001" 1 1 0
    "0012200" "001220001" 2 1 0
    "0012200" "001220001" 3 1 0
    "0012200" "001220001" 4 1 0
    "0012200" "001220001" 5 1 0
    "0012200" "001220002" 1 0 1
    "0012200" "001220002" 2 0 1
    "0012200" "001220002" 3 0 1
    "0012200" "001220002" 4 0 1
    "0012200" "001220002" 5 0 1
    "0012200" "001220003" 1 0 0
    "0012200" "001220003" 2 0 0
    "0012200" "001220003" 3 0 0
    "0012241" "001220003" 4 1 0
    "0012241" "001220003" 5 1 0
    "0012200" "001220004" 1 0 0
    "0012200" "001220004" 2 0 0
    "0012200" "001220004" 3 0 0
    "0012200" "001220004" 4 0 0
    "0012200" "001220004" 5 0 0
    "0012200" "001220005" 1 0 0
    "0012200" "001220005" 2 0 0
    "0012200" "001220005" 3 0 0
    "0012200" "001220005" 4 0 0
    "0012200" "001220005" 5 0 0
    "0012200" "001220006" 1 0 0
    "0012200" "001220006" 2 0 0
    "0012200" "001220006" 3 0 0
    "0012200" "001220006" 4 0 0
    "0012200" "001220006" 5 0 0
    "0012200" "001220007" 1 0 0
    "0012200" "001220007" 2 0 0
    "0012200" "001220007" 3 0 0
    "0012200" "001220007" 4 0 0
    "0012200" "001220007" 5 0 0
    "0012200" "001220008" 1 0 0
    "0012200" "001220008" 2 0 0
    "0012200" "001220008" 3 0 0
    "0012200" "001220009" 1 0 0
    "0012200" "001220009" 2 0 0
    "0012200" "001220009" 3 0 0
    "0012200" "001220009" 4 0 0
    "0012200" "001220010" 1 0 0
    "0012200" "001220010" 2 0 0
    "0012200" "001220010" 3 0 0
    "0012200" "001220010" 4 0 0
    "0012200" "001220010" 5 0 0
    "0012241" "001220011" 4 0 1
    "0012241" "001220011" 5 0 1
    "0012241" "001220013" 4 0 0
    "0012241" "001220013" 5 0 0
    "0012200" "001220014" 4 0 0
    "0012200" "001220014" 5 0 0
    "0012200" "001220015" 4 0 0
    "0012200" "001220015" 5 0 0
    "0012241" "001224104" 4 0 0
    "0012241" "001224104" 5 0 0
    "0012241" "001224105" 4 0 0
    "0012241" "001224105" 5 0 0
    "0012241" "001224106" 5 0 0
    "0012400" "001240001" 1 1 0
    "0012400" "001240001" 2 1 0
    "0012400" "001240001" 5 1 0
    "0012400" "001240002" 1 0 1
    "0012400" "001240002" 2 0 1
    "0012400" "001240002" 5 0 1
    "0012400" "001240003" 1 0 0
    "0012400" "001240003" 2 0 0
    "0012400" "001240003" 5 0 0
    "0012400" "001240004" 1 0 0
    "0012400" "001240004" 2 0 0
    "0012400" "001240004" 5 0 0
    "0012400" "001240005" 1 0 0
    "0012400" "001240005" 2 0 0
    "0012451" "001240005" 5 0 1
    "0012400" "001240006" 1 0 0
    "0012400" "001240006" 2 0 0
    "0012400" "001240006" 5 0 0
    "0012400" "001240007" 1 0 0
    "0012400" "001240007" 2 0 0
    "0012400" "001240007" 5 0 0
    "0012400" "001240008" 1 0 0
    "0012400" "001240008" 2 0 0
    "0012400" "001240008" 5 0 0
    "0012400" "001240009" 1 0 0
    "0012400" "001240009" 2 0 0
    "0012452" "001240009" 5 0 1
    "0012400" "001240010" 1 0 0
    "0012400" "001240010" 2 0 0
    end

    Thank you in advance,
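Both distinct calls pool all households and waves: a pidlink counts once if it carries the flag in any row. Below is a minimal sketch that reproduces those counts without the community-contributed distinct command. It uses a few rows copied from the data example above, with person_id renamed pidlink to match the distinct calls; ever_wife, ever_husband and first_row are hypothetical names. In these rows, 001240005 is a non-spouse member of 0012400 in wave 1 but the only spouse in 0012451 in wave 5. That one-wife household alone is enough to produce more distinct wives than husbands.

```stata
* Toy rows copied from the data example above (person_id renamed pidlink)
clear
input str10(household_id pidlink) float(wave is_husband is_wife)
"0010651" "001060004" 5 0 1
"0010651" "001065102" 5 1 0
"0010651" "001065103" 5 0 0
"0012400" "001240001" 1 1 0
"0012400" "001240001" 2 1 0
"0012400" "001240002" 1 0 1
"0012400" "001240002" 2 0 1
"0012400" "001240005" 1 0 0
"0012451" "001240005" 5 0 1
end

* Flag each person who is a wife (husband) in ANY row: any household, any wave
bysort pidlink: egen byte ever_wife = max(is_wife)
bysort pidlink: egen byte ever_husband = max(is_husband)

* Count each person once
bysort pidlink: gen byte first_row = (_n == 1)
count if first_row & ever_wife      // "distinct" column of: distinct pidlink if is_wife == 1
count if first_row & ever_husband   // "distinct" column of: distinct pidlink if is_husband == 1
```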


  • #2
    Hi Enrique,

    Thank you for supplying some sample data.

    list if inlist(household, "0012451", "0012452")
    produces the following, indicating that your sample data contain two household-wave pairs where there is a wife but no husband.

         | househ~d   person_id   wave   is_hus~d   is_wife |
     86. |  0012451   001240005      5          0         1 |
     98. |  0012452   001240009      5          0         1 |
    When I run the following part of your code, those two observations are dropped. Note that they satisfy the condition stated in your comment ("at most one husband and one wife") but not the condition your code appears to check for ("exactly one husband and one wife").

    // Checking the number of husbands and wives in each household and wave
    bysort household_id wave: egen wives_per_wave = total(is_wife)
    bysort household_id wave: egen husbands_per_wave = total(is_husband)
    // Determining if there is at most one husband and one wife in each household
    bys household_id: egen max_husbands = max(husbands_per_wave)
    bys household_id: egen max_wives = max(wives_per_wave)
    bys household_id: gen unique_husband = (max_husbands == 1)
    bys household_id: gen unique_wife = (max_wives == 1)
    bys household_id: keep if unique_husband == 1 & unique_wife == 1
    pidlink is not defined, so I can't comment on that part of your code.

    Devra Golbe
    Professor Emerita, Dept. of Economics
    Hunter College, CUNY
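If the goal is exactly one husband and exactly one wife in every wave in which a household is observed, one minimal sketch checks each household-wave first and then requires that condition across all waves with min(). It uses the variable names from the first post and toy rows copied from its data example. couple_wave and couple_all_waves are hypothetical names.

```stata
* Toy rows copied from the data example in the first post
clear
input str10(household_id person_id) float(wave is_husband is_wife)
"0012400" "001240001" 1 1 0
"0012400" "001240002" 1 0 1
"0012400" "001240003" 1 0 0
"0012400" "001240001" 2 1 0
"0012400" "001240002" 2 0 1
"0012400" "001240001" 5 1 0
"0012400" "001240002" 5 0 1
"0012451" "001240005" 5 0 1
end

* Number of wives and husbands in each household-wave
bysort household_id wave: egen wives_per_wave = total(is_wife)
bysort household_id wave: egen husbands_per_wave = total(is_husband)

* 1 if this household-wave has exactly one husband AND exactly one wife
gen byte couple_wave = (wives_per_wave == 1 & husbands_per_wave == 1)

* The minimum over waves is 1 only if EVERY observed wave of the household qualifies
bysort household_id: egen byte couple_all_waves = min(couple_wave)
keep if couple_all_waves == 1
```

To keep only the qualifying household-waves rather than whole households, `keep if couple_wave == 1` would be the looser alternative.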


    • #3
      drop if state=="Utah"


      • #4
        Originally posted by Devra Golbe (#2)
        Hi Devra,

        Thank you so much for your help. The variable pidlink is actually person_id in the data example. Do you know how I could change the code to require exactly one husband and one wife?

        Thanks a lot,



        • #5
          bys household wave: egen husbandwife = total(is_husband + is_wife)
          looking for a 2
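One caveat with a combined total is that a household-wave with two wives and no husband also sums to 2. A minimal sketch that counts husbands and wives separately and reports how many rows pass the combined filter without being one husband plus one wife is below. The toy households "A" and "B" are made up purely for illustration; n_husband, n_wife and one_couple are hypothetical names.

```stata
* Hypothetical toy data: household "A" has one husband and one wife,
* household "B" has two wives and no husband (made up for illustration)
clear
input str10(household_id person_id) float(wave is_husband is_wife)
"A" "A01" 1 1 0
"A" "A02" 1 0 1
"A" "A03" 1 0 0
"B" "B01" 1 0 1
"B" "B02" 1 0 1
end

* Combined count: 2 for husband + wife, but also for two wives
bysort household_id wave: egen husbandwife = total(is_husband + is_wife)

* Separate counts make the "exactly one of each" condition explicit
bysort household_id wave: egen n_husband = total(is_husband)
bysort household_id wave: egen n_wife = total(is_wife)
gen byte one_couple = (n_husband == 1 & n_wife == 1)

* Rows that pass husbandwife == 2 but are not one husband plus one wife
count if husbandwife == 2 & !one_couple
```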


          • #6
            Originally posted by George Ford (#5)
            Hi George,

            Thanks a lot, this is a smart and elegant solution.

            However, some wives appear in certain waves where husbands do not (although both are present in the household). As a result, after keeping observations where husbandwife == 2, "distinct pidlink if is_husband == 1" and "distinct pidlink if is_wife == 1" still give slightly different counts (i.e., there are still slightly more wives than husbands). Do you know how I can fix this?

            Thanks a lot,

            Last edited by Enrique Alameda; Today, 16:43.


            • #7

              I think your code does give exactly one husband and one wife. Indeed, when I run George's code after the code I quoted in my post, husbandwife == 2 for all observations. And when I run

              distinct person_id if is_wife == 1
              distinct person_id if is_husb == 1
              I get identical results from each command.

              Perhaps you need to supply a different sample of observations in order to illustrate the issues you are having.

              Devra Golbe
              Professor Emerita, Dept. of Economics
              Hunter College, CUNY


              • #8
                Originally posted by Devra Golbe (#7)
                Hi Devra and George,

                Here is a data example from before running my code or George's. I have renamed the variables to make it easier to understand:

                clear
                input str10(pidlink hhid) float(wave husband wife)
                "272010001" "2720100" 1 1 0
                "272010001" "2720100" 2 1 0
                "272010001" "2720100" 3 1 0
                "272010001" "2720100" 4 1 0
                "272010001" "2720100" 5 1 0
                "272010002" "2720100" 1 0 1
                "272010002" "2720100" 2 0 1
                "272010002" "2720100" 3 0 1
                "272010002" "2720100" 4 0 1
                "272010002" "2720151" 5 0 0
                "272010003" "2720100" 1 0 0
                "272010003" "2720100" 2 0 0
                "272010003" "2720100" 3 0 0
                "272010003" "2720100" 4 0 0
                "272010003" "2720100" 5 0 0
                "272010004" "2720100" 2 0 0
                "272010004" "2720100" 3 0 0
                "272010004" "2720100" 4 0 0
                "272010004" "2720100" 5 0 0
                "272010005" "2720100" 5 0 0
                "272010006" "2720100" 5 0 1
                "272020001" "2720200" 1 1 0
                "272020001" "2720211" 2 1 0
                "272020001" "2720211" 3 1 0
                "272020001" "2720211" 4 1 0
                "272020001" "2720211" 5 1 0
                "272020002" "2720200" 1 0 1
                "272020002" "2720211" 2 0 1
                "272020002" "2720211" 3 0 1
                "272020002" "2720211" 4 0 1
                "272020002" "2720211" 5 0 1
                "272020003" "2720200" 1 0 0
                "272020003" "2720211" 2 0 0
                "272020003" "2720211" 3 0 0
                "272020003" "2720211" 4 0 0
                "272020003" "2720211" 5 0 0
                "272020004" "2720200" 2 1 0
                "272020004" "2720200" 3 1 0
                "272020004" "2720200" 4 1 0
                "272020004" "2720200" 5 1 0
                "272020005" "2720200" 2 0 1
                "272020005" "2720200" 3 0 1
                "272020005" "2720200" 4 0 1
                "272020005" "2720200" 5 0 0
                "272021104" "2720211" 2 0 0
                "272021104" "2720211" 3 0 0
                "272021104" "2720211" 4 0 0
                "272021104" "2720211" 5 0 0
                "272030001" "2720300" 1 1 0
                "272030001" "2720300" 2 1 0
                "272030001" "2720300" 3 1 0
                "272030001" "2720300" 4 1 0
                "272030001" "2720300" 5 1 0
                "272030002" "2720300" 1 0 1
                "272030002" "2720300" 2 0 1
                "272030002" "2720300" 3 0 1
                "272030002" "2720300" 4 0 1
                "272030002" "2720300" 5 0 1
                "272030003" "2720300" 1 0 0
                "272030003" "2720300" 2 0 0
                "272030003" "2720300" 3 0 0
                "272030003" "2720341" 4 0 1
                "272030004" "2720300" 2 0 0
                "272030004" "2720300" 3 0 0
                "272030004" "2720300" 4 0 0
                "272030004" "2720300" 5 0 0
                "272030005" "2720300" 3 0 0
                "272030005" "2720300" 4 0 0
                "272030005" "2720300" 5 0 0
                "272034101" "2720341" 4 1 0
                "272034101" "2720341" 5 0 0
                "272034103" "2720341" 4 0 0
                "272034104" "2720341" 5 1 0
                "272040001" "2720400" 1 1 0
                "272040001" "2720400" 2 1 0
                "272040001" "2720400" 3 1 0
                "272040001" "2720400" 4 1 0
                "272040001" "2720400" 5 1 0
                "272040002" "2720400" 1 0 1
                "272040002" "2720400" 2 0 1
                "272040002" "2720400" 3 0 1
                "272040002" "2720400" 4 0 1
                "272040002" "2720400" 5 0 1
                "272040003" "2720400" 1 0 0
                "272040003" "2720400" 2 0 0
                "272040003" "2720400" 3 0 0
                "272040003" "2720400" 4 0 0
                "272040003" "2720451" 5 1 0
                "272040004" "2720400" 1 0 0
                "272040004" "2720400" 2 0 0
                "272040004" "2720400" 3 0 0
                "272040004" "2720400" 4 0 0
                "272040004" "2720400" 5 0 0
                "272040005" "2720400" 4 0 0
                "272040005" "2720400" 5 0 0
                "272050001" "2720500" 1 1 0
                "272050001" "2720500" 2 1 0
                "272050001" "2720500" 3 1 0
                "272050001" "2720500" 4 1 0
                "272050001" "2720500" 5 1 0
                end

                Upon running the code, if I do:

                keep if husbandwife == 2
                (41,234 observations deleted)

                And then the distinct:

                . distinct pidlink if wife == 1

                        |        Observations
                        |      total   distinct
                --------+----------------------
                pidlink |      39619      15910

                . distinct pidlink if husband == 1

                        |        Observations
                        |      total   distinct
                --------+----------------------
                pidlink |      40025      16008

                As you can see, the numbers of husbands and wives differ. Below is a data example after this step:

                clear
                input str10(pidlink hhid) float(wave husband wife)
                "001060005" "0010600" 1 0 0
                "001060006" "0010600" 1 0 0
                "001060001" "0010600" 1 1 0
                "001060002" "0010600" 1 0 1
                "001060004" "0010600" 1 0 0
                "001060003" "0010600" 1 0 0
                "001060004" "0010600" 2 0 0
                "001060002" "0010600" 2 0 1
                "001060006" "0010600" 2 0 0
                "001060005" "0010600" 2 0 0
                "001060003" "0010600" 2 0 0
                "001060001" "0010600" 2 1 0
                "001065104" "0010651" 5 0 0
                "001060004" "0010651" 5 0 1
                "001065102" "0010651" 5 1 0
                "001065103" "0010651" 5 0 0
                "001080008" "0010800" 1 0 0
                "001080006" "0010800" 1 0 0
                "001080003" "0010800" 1 0 0
                "001080005" "0010800" 1 0 0
                "001080007" "0010800" 1 0 0
                "001080009" "0010800" 1 0 0
                "001080002" "0010800" 1 0 1
                "001080010" "0010800" 1 0 0
                "001080004" "0010800" 1 0 0
                "001080001" "0010800" 1 1 0
                "001080009" "0010800" 2 0 0
                "001080008" "0010800" 2 0 0
                "001080010" "0010800" 2 0 0
                "001080001" "0010800" 2 1 0
                "001080004" "0010800" 2 0 0
                "001080006" "0010800" 2 0 0
                "001080005" "0010800" 2 0 0
                "001080003" "0010800" 2 0 0
                "001080007" "0010800" 2 0 0
                "001080002" "0010800" 2 0 1
                "001085107" "0010851" 5 0 0
                "001085104" "0010851" 5 0 0
                "001085105" "0010851" 5 0 0
                "001080012" "0010851" 5 0 1
                "001080003" "0010851" 5 1 0
                "001085106" "0010851" 5 0 0
                "001085103" "0010851" 5 0 0
                "001220009" "0012200" 1 0 0
                "001220008" "0012200" 1 0 0
                "001220002" "0012200" 1 0 1
                "001220001" "0012200" 1 1 0
                "001220003" "0012200" 1 0 0
                "001220004" "0012200" 1 0 0
                "001220006" "0012200" 1 0 0
                "001220005" "0012200" 1 0 0
                "001220010" "0012200" 1 0 0
                "001220007" "0012200" 1 0 0
                "001220010" "0012200" 2 0 0
                "001220001" "0012200" 2 1 0
                "001220004" "0012200" 2 0 0
                "001220002" "0012200" 2 0 1
                "001220008" "0012200" 2 0 0
                "001220006" "0012200" 2 0 0
                "001220009" "0012200" 2 0 0
                "001220007" "0012200" 2 0 0
                "001220005" "0012200" 2 0 0
                "001220003" "0012200" 2 0 0
                "001220009" "0012200" 3 0 0
                "001220002" "0012200" 3 0 1
                "001220003" "0012200" 3 0 0
                "001220007" "0012200" 3 0 0
                "001220001" "0012200" 3 1 0
                "001220005" "0012200" 3 0 0
                "001220008" "0012200" 3 0 0
                "001220010" "0012200" 3 0 0
                "001220006" "0012200" 3 0 0
                "001220004" "0012200" 3 0 0
                "001220004" "0012200" 4 0 0
                "001220007" "0012200" 4 0 0
                "001220006" "0012200" 4 0 0
                "001220001" "0012200" 4 1 0
                "001220005" "0012200" 4 0 0
                "001220009" "0012200" 4 0 0
                "001220002" "0012200" 4 0 1
                "001220014" "0012200" 4 0 0
                "001220015" "0012200" 4 0 0
                "001220010" "0012200" 4 0 0
                "001220004" "0012200" 5 0 0
                "001220006" "0012200" 5 0 0
                "001220007" "0012200" 5 0 0
                "001220014" "0012200" 5 0 0
                "001220002" "0012200" 5 0 1
                "001220005" "0012200" 5 0 0
                "001220015" "0012200" 5 0 0
                "001220001" "0012200" 5 1 0
                "001220010" "0012200" 5 0 0
                "001220003" "0012241" 4 1 0
                "001224104" "0012241" 4 0 0
                "001224105" "0012241" 4 0 0
                "001220011" "0012241" 4 0 1
                "001220013" "0012241" 4 0 0
                "001220003" "0012241" 5 1 0
                "001220013" "0012241" 5 0 0
                "001224105" "0012241" 5 0 0
                end

                I am grateful for any help, as I cannot figure out my mistake.

                Thank you in advance,
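One way to locate the households behind a gap between the two distinct counts is to count distinct wives and distinct husbands within each hhid and list the households where the two differ. The minimal sketch below assumes the variable names used in this post (pidlink, hhid, wave, husband, wife), and its toy rows are copied from the first data example above. ever_wife, ever_husband, first_in_hh, n_wives and n_husbands are hypothetical names. In these rows every household-wave passes the husbandwife == 2 filter, yet hhid 2720100 has two distinct wives (272010002 in wave 4, 272010006 in wave 5) and one husband.

```stata
* Toy rows copied from the first data example in this post
clear
input str10(pidlink hhid) float(wave husband wife)
"272010001" "2720100" 4 1 0
"272010002" "2720100" 4 0 1
"272010003" "2720100" 4 0 0
"272010001" "2720100" 5 1 0
"272010006" "2720100" 5 0 1
"272010003" "2720100" 5 0 0
"272030001" "2720300" 4 1 0
"272030002" "2720300" 4 0 1
"272030001" "2720300" 5 1 0
"272030002" "2720300" 5 0 1
end

* George's per-wave filter: every household-wave here has husbandwife == 2
bysort hhid wave: egen husbandwife = total(husband + wife)

* Is each person ever a wife / husband within this hhid?
bysort hhid pidlink: egen byte ever_wife = max(wife)
bysort hhid pidlink: egen byte ever_husband = max(husband)
bysort hhid pidlink: gen byte first_in_hh = (_n == 1)

* Distinct wives and husbands per hhid, pooled over waves
bysort hhid: egen n_wives = total(ever_wife * first_in_hh)
bysort hhid: egen n_husbands = total(ever_husband * first_in_hh)

list pidlink hhid wave husband wife if n_wives != n_husbands, sepby(hhid) noobs
```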



                • #9
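                  I get the same with this.

                  . bys hhid wave: egen husbandwife = total(husband + wife)

                  . keep if husbandwife==2
                  (3 observations deleted)

                  . distinct pidlink if wife == 1

                          |        Observations
                          |      total   distinct
                  --------+----------------------
                  pidlink |         12          6

                  . distinct pidlink if husband == 1

                          |        Observations
                          |      total   distinct
                  --------+----------------------
                  pidlink |         12          6

                  . distinct hhid if wife == 1

                          |        Observations
                          |      total   distinct
                  --------+----------------------
                     hhid |         12          6

                  . distinct hhid if husband == 1

                          |        Observations
                          |      total   distinct
                  --------+----------------------
                     hhid |         12          6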


                  • #10
                    If the problem is missing data, then any solution is going to be a bit of a kluge.

                    One option is to assume that if husbandwife runs 2 2 1 2 2 across waves, the 1 in between should really be a 2 and reflects missing data. I'd have more confidence if the household member count followed the same pattern.

                    bys hhid wave: egen hhsize = count(hhid)
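The 2 2 1 2 2 idea can be checked mechanically. This is a minimal sketch assuming the variable names from #8 (pidlink, hhid, wave, husband, wife). It tags one row per household-wave, compares each wave's husbandwife with the previous and next observed wave, and optionally requires household size to dip by one in the same wave. hw_first, dip, size_dip, dip_hw and size_dip_hw are hypothetical names. hhsize is computed here as _N, which matches count(hhid) when hhid is never missing. "Previous and next" means adjacent observed waves, so a household absent in some wave is compared across the gap. The toy rows come from hhid 0012200 in #8 with the wave-3 wife row removed, similar to the test data George builds in #12.

```stata
* Toy rows: hhid 0012200 (waves 2-4, three members), wave-3 wife removed
clear
input str10(pidlink hhid) float(wave husband wife)
"001220001" "0012200" 2 1 0
"001220002" "0012200" 2 0 1
"001220004" "0012200" 2 0 0
"001220001" "0012200" 3 1 0
"001220004" "0012200" 3 0 0
"001220001" "0012200" 4 1 0
"001220002" "0012200" 4 0 1
"001220004" "0012200" 4 0 0
end

bysort hhid wave: egen husbandwife = total(husband + wife)
bysort hhid wave: gen hhsize = _N

* One tagged row per household-wave, so neighbours are waves, not people
egen byte hw_first = tag(hhid wave)

* Within the tagged rows of each household, compare with the previous and next
* observed wave (subscripts do not cross by-groups, so edges give missing)
bysort hw_first hhid (wave): gen byte dip = hw_first & husbandwife == 1 ///
    & husbandwife[_n-1] == 2 & husbandwife[_n+1] == 2

* Stricter version: household size also drops by exactly one in that wave
bysort hw_first hhid (wave): gen byte size_dip = dip ///
    & hhsize == hhsize[_n-1] - 1 & hhsize == hhsize[_n+1] - 1

* Spread the household-wave flags back to every member row
bysort hhid wave: egen byte dip_hw = max(dip)
bysort hhid wave: egen byte size_dip_hw = max(size_dip)
list hhid wave husbandwife hhsize dip_hw size_dip_hw if hw_first, noobs
```

Flagged household-waves could then be recoded with `replace husbandwife = 2 if size_dip_hw` (or `dip_hw` for the looser rule). Whether that imputation is defensible depends on the survey.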


                    • #11
                      Originally posted by George Ford (#10)
                      Hi George,

                      Thanks a lot for all your help.

                      Here is the output from your code. Note that the dataset also contains sons and daughters of these couples (i.e., to identify child-wife pairs):

                      bys hhid wave: egen hhsize = count(hhid)

                      end of do-file

                      . ta hhsize

                           hhsize |      Freq.     Percent        Cum.
                      ------------+-----------------------------------
                                2 |     10,314        5.96        5.96
                                3 |     34,371       19.88       25.84
                                4 |     48,292       27.93       53.77
                                5 |     36,365       21.03       74.80
                                6 |     21,594       12.49       87.28
                                7 |     11,137        6.44       93.72
                                8 |      6,056        3.50       97.23
                                9 |      2,943        1.70       98.93
                               10 |      1,160        0.67       99.60
                               11 |        495        0.29       99.88
                               12 |        132        0.08       99.96
                               13 |         52        0.03       99.99
                               15 |         15        0.01      100.00
                      ------------+-----------------------------------
                            Total |    172,926      100.00


                      • #12
                        This might get you started. I've dropped the middle observation of wife for hhid==0012200 to create some interest.

                        I think the kids condition may be redundant in markreplace.

                        The math to address a change in kids should be easy enough to implement, but this might tell you how big the problem is.

                        clear
                        input str10(pidlink hhid) float(wave husband wife)
                        "001060005" "0010600" 1 0 0
                        "001060006" "0010600" 1 0 0
                        "001060001" "0010600" 1 1 0
                        "001060002" "0010600" 1 0 1
                        "001060004" "0010600" 1 0 0
                        "001060003" "0010600" 1 0 0
                        "001060004" "0010600" 2 0 0
                        "001060002" "0010600" 2 0 1
                        "001060006" "0010600" 2 0 0
                        "001060005" "0010600" 2 0 0
                        "001060003" "0010600" 2 0 0
                        "001060001" "0010600" 2 1 0
                        "001065104" "0010651" 5 0 0
                        "001060004" "0010651" 5 0 1
                        "001065102" "0010651" 5 1 0
                        "001065103" "0010651" 5 0 0
                        "001080008" "0010800" 1 0 0
                        "001080006" "0010800" 1 0 0
                        "001080003" "0010800" 1 0 0
                        "001080005" "0010800" 1 0 0
                        "001080007" "0010800" 1 0 0
                        "001080009" "0010800" 1 0 0
                        "001080002" "0010800" 1 0 1
                        "001080010" "0010800" 1 0 0
                        "001080004" "0010800" 1 0 0
                        "001080001" "0010800" 1 1 0
                        "001080009" "0010800" 2 0 0
                        "001080008" "0010800" 2 0 0
                        "001080010" "0010800" 2 0 0
                        "001080001" "0010800" 2 1 0
                        "001080004" "0010800" 2 0 0
                        "001080006" "0010800" 2 0 0
                        "001080005" "0010800" 2 0 0
                        "001080003" "0010800" 2 0 0
                        "001080007" "0010800" 2 0 0
                        "001080002" "0010800" 2 0 1
                        "001085107" "0010851" 5 0 0
                        "001085104" "0010851" 5 0 0
                        "001085105" "0010851" 5 0 0
                        "001080012" "0010851" 5 0 1
                        "001080003" "0010851" 5 1 0
                        "001085106" "0010851" 5 0 0
                        "001085103" "0010851" 5 0 0
                        "001220009" "0012200" 1 0 0
                        "001220008" "0012200" 1 0 0
                        "001220002" "0012200" 1 0 1
                        "001220001" "0012200" 1 1 0
                        "001220003" "0012200" 1 0 0
                        "001220004" "0012200" 1 0 0
                        "001220006" "0012200" 1 0 0
                        "001220005" "0012200" 1 0 0
                        "001220010" "0012200" 1 0 0
                        "001220007" "0012200" 1 0 0
                        "001220010" "0012200" 2 0 0
                        "001220001" "0012200" 2 1 0
                        "001220004" "0012200" 2 0 0
                        "001220002" "0012200" 2 0 1
                        "001220008" "0012200" 2 0 0
                        "001220006" "0012200" 2 0 0
                        "001220009" "0012200" 2 0 0
                        "001220007" "0012200" 2 0 0
                        "001220005" "0012200" 2 0 0
                        "001220003" "0012200" 2 0 0
                        "001220009" "0012200" 3 0 0
                        /*"001220002" "0012200" 3 0 1*/
                        "001220003" "0012200" 3 0 0
                        "001220007" "0012200" 3 0 0
                        "001220001" "0012200" 3 1 0
                        "001220005" "0012200" 3 0 0
                        "001220008" "0012200" 3 0 0
                        "001220010" "0012200" 3 0 0
                        "001220006" "0012200" 3 0 0
                        "001220004" "0012200" 3 0 0
                        "001220004" "0012200" 4 0 0
                        "001220007" "0012200" 4 0 0
                        "001220006" "0012200" 4 0 0
                        "001220001" "0012200" 4 1 0
                        "001220005" "0012200" 4 0 0
                        "001220009" "0012200" 4 0 0
                        "001220002" "0012200" 4 0 1
                        "001220014" "0012200" 4 0 0
                        "001220015" "0012200" 4 0 0
                        "001220010" "0012200" 4 0 0
                        "001220004" "0012200" 5 0 0
                        "001220006" "0012200" 5 0 0
                        "001220007" "0012200" 5 0 0
                        "001220014" "0012200" 5 0 0
                        "001220002" "0012200" 5 0 1
                        "001220005" "0012200" 5 0 0
                        "001220015" "0012200" 5 0 0
                        "001220001" "0012200" 5 1 0
                        "001220010" "0012200" 5 0 0
                        "001220003" "0012241" 4 1 0
                        "001224104" "0012241" 4 0 0
                        "001224105" "0012241" 4 0 0
                        "001220011" "0012241" 4 0 1
                        "001220013" "0012241" 4 0 0
                        "001220003" "0012241" 5 1 0
                        "001220013" "0012241" 5 0 0
                        "001224105" "0012241" 5 0 0
                        end
                        * husband + wife count and household size in each household-wave
                        bys hhid wave: egen husbandwife = total(husband + wife)
                        bys hhid wave: egen hhsize = count(hhid)
                        g kids = hhsize - husbandwife // members other than the husband and wife
                        bys hhid: egen max_husbandwife = max(husbandwife)
                        keep if max_husbandwife==2 // delete cases not desired
                        bys hhid: egen max_hhsize = max(hhsize)
                        bys hhid: egen max_kids = max(kids)
                        bys hhid: egen min_wave = min(wave)
                        bys hhid: egen max_wave = max(wave)
                        * one spouse and one member fewer than the household maximum, same number of kids,
                        * and neither the first nor the last observed wave
                        g markreplace = (husbandwife - max_husbandwife)==-1 & (hhsize - max_hhsize)==-1 & (kids==max_kids) & (wave != min_wave) & (wave != max_wave)
                        * wave included as condition since you don't have data to either side to confirm.
                        * same dip in spouses and size, but the number of kids differs from its maximum: worth checking by hand
                        g markcheck = (husbandwife - max_husbandwife)==-1 & (hhsize - max_hhsize)==-1 & (kids!=max_kids)
                        replace husbandwife = 2 if markreplace


                        • #13

                          The data sample you provide cannot be the data sample on which you ran your code (> 40,000 observations deleted). Try to extract a small set of data that produces the phenomenon that concerns you: more distinct wives than husbands.

                          I suspect also that you will get better answers if you describe your data more fully. I guess that you are using some kind of household sample that is resurveyed multiple times (waves). I don't know anything about this sample, not even how a household is defined. I would not be surprised if the number of children changes over time in the same HH ID. But suppose a head of household changes spouses. Does the HH ID change or stay the same? If the spouse who is not designated head of HH gets a new spouse, does he drop out of the survey or get a new HH ID? Given the possible changes in HH composition, I'm not certain that the numbers of distinct husbands and wives should be the same.
                          Devra Golbe
                          Professor Emerita, Dept. of Economics
                          Hunter College, CUNY
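Whether spouse changes drive the gap can be checked directly. A minimal sketch pairs each husband with the wife in the same household-wave and then counts distinct wives per husband, pooled over households and waves. A husband with two wives adds one pidlink to the husband count but two to the wife count. It assumes at most one wife per household-wave (e.g., after the filters discussed above) and the variable names from #8. wife_id, new_pair and n_wives_of_husband are hypothetical names. The toy rows, copied from #8, also show a couple (272020001 and 272020002) whose pairing is unchanged while their hhid changes from 2720200 to 2720211.

```stata
* Toy rows copied from the first data example in #8
clear
input str10(pidlink hhid) float(wave husband wife)
"272010001" "2720100" 4 1 0
"272010002" "2720100" 4 0 1
"272010001" "2720100" 5 1 0
"272010006" "2720100" 5 0 1
"272020001" "2720200" 1 1 0
"272020002" "2720200" 1 0 1
"272020001" "2720211" 2 1 0
"272020002" "2720211" 2 0 1
end

* Attach the wife's pidlink to every row of her household-wave:
* "" sorts first, so the last row in each group holds the wife's id
gen str10 wife_id = pidlink if wife == 1
bysort hhid wave (wife_id): replace wife_id = wife_id[_N]

* Tag each distinct (husband, wife) pair once, on a husband row
bysort pidlink wife_id husband: gen byte new_pair = (_n == 1) & husband == 1 & wife_id != ""

* Number of distinct wives linked to each husband across households and waves
bysort pidlink: egen n_wives_of_husband = total(new_pair)
list pidlink hhid wave wife_id if n_wives_of_husband > 1, sepby(pidlink) noobs
```

Swapping the roles (a husband_id attached to wife rows) would give the symmetric count of distinct husbands per wife.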

