
  • Keeping only households with one wife and one husband

    // Generating indicators for husband, wife, son, and daughter
    gen is_wife = (sex == 3 & (relationtohead == 1 | relationtohead == 2))
    gen is_husband = (sex == 1 & (relationtohead == 1 | relationtohead == 2))
    gen is_daughter = (sex == 3 & (relationtohead == 3 | relationtohead == 4))
    gen is_son = (sex == 1 & (relationtohead == 3 | relationtohead == 4))

    // Checking the number of husbands and wives in each household and wave
    bysort household_id wave: egen wives_per_wave = total(is_wife)
    bysort household_id wave: egen husbands_per_wave = total(is_husband)

    // Determining if there is at most one husband and one wife in each household
    bys household_id: egen max_husbands = max(husbands_per_wave)
    bys household_id: egen max_wives = max(wives_per_wave)
    bys household_id: gen unique_husband = (max_husbands == 1)
    bys household_id: gen unique_wife = (max_wives == 1)

    // Keeping households with at most one husband and one wife
    bys household_id: keep if unique_husband == 1 & unique_wife == 1

    However, although browsing the data suggests that each household has only one husband and one wife (across the various waves), when I run

    Code:
    distinct pidlink if is_wife == 1
    distinct pidlink if is_husband == 1
    I still get more wives than husbands (regardless of wave, since distinct counts persons across all waves). This implies that I have kept households in which more than one wife exists.

    I provide a data example below:

    clear
    input str10(household_id person_id) float(wave is_husband is_wife)
    "0010651" "001060004" 5 0 1
    "0010651" "001065102" 5 1 0
    "0010651" "001065103" 5 0 0
    "0010651" "001065104" 5 0 0
    "0010851" "001080003" 5 1 0
    "0010851" "001080012" 5 0 1
    "0010851" "001085103" 5 0 0
    "0010851" "001085104" 5 0 0
    "0010851" "001085105" 5 0 0
    "0010851" "001085106" 5 0 0
    "0010851" "001085107" 5 0 0
    "0012200" "001220001" 1 1 0
    "0012200" "001220001" 2 1 0
    "0012200" "001220001" 3 1 0
    "0012200" "001220001" 4 1 0
    "0012200" "001220001" 5 1 0
    "0012200" "001220002" 1 0 1
    "0012200" "001220002" 2 0 1
    "0012200" "001220002" 3 0 1
    "0012200" "001220002" 4 0 1
    "0012200" "001220002" 5 0 1
    "0012200" "001220003" 1 0 0
    "0012200" "001220003" 2 0 0
    "0012200" "001220003" 3 0 0
    "0012241" "001220003" 4 1 0
    "0012241" "001220003" 5 1 0
    "0012200" "001220004" 1 0 0
    "0012200" "001220004" 2 0 0
    "0012200" "001220004" 3 0 0
    "0012200" "001220004" 4 0 0
    "0012200" "001220004" 5 0 0
    "0012200" "001220005" 1 0 0
    "0012200" "001220005" 2 0 0
    "0012200" "001220005" 3 0 0
    "0012200" "001220005" 4 0 0
    "0012200" "001220005" 5 0 0
    "0012200" "001220006" 1 0 0
    "0012200" "001220006" 2 0 0
    "0012200" "001220006" 3 0 0
    "0012200" "001220006" 4 0 0
    "0012200" "001220006" 5 0 0
    "0012200" "001220007" 1 0 0
    "0012200" "001220007" 2 0 0
    "0012200" "001220007" 3 0 0
    "0012200" "001220007" 4 0 0
    "0012200" "001220007" 5 0 0
    "0012200" "001220008" 1 0 0
    "0012200" "001220008" 2 0 0
    "0012200" "001220008" 3 0 0
    "0012200" "001220009" 1 0 0
    "0012200" "001220009" 2 0 0
    "0012200" "001220009" 3 0 0
    "0012200" "001220009" 4 0 0
    "0012200" "001220010" 1 0 0
    "0012200" "001220010" 2 0 0
    "0012200" "001220010" 3 0 0
    "0012200" "001220010" 4 0 0
    "0012200" "001220010" 5 0 0
    "0012241" "001220011" 4 0 1
    "0012241" "001220011" 5 0 1
    "0012241" "001220013" 4 0 0
    "0012241" "001220013" 5 0 0
    "0012200" "001220014" 4 0 0
    "0012200" "001220014" 5 0 0
    "0012200" "001220015" 4 0 0
    "0012200" "001220015" 5 0 0
    "0012241" "001224104" 4 0 0
    "0012241" "001224104" 5 0 0
    "0012241" "001224105" 4 0 0
    "0012241" "001224105" 5 0 0
    "0012241" "001224106" 5 0 0
    "0012400" "001240001" 1 1 0
    "0012400" "001240001" 2 1 0
    "0012400" "001240001" 5 1 0
    "0012400" "001240002" 1 0 1
    "0012400" "001240002" 2 0 1
    "0012400" "001240002" 5 0 1
    "0012400" "001240003" 1 0 0
    "0012400" "001240003" 2 0 0
    "0012400" "001240003" 5 0 0
    "0012400" "001240004" 1 0 0
    "0012400" "001240004" 2 0 0
    "0012400" "001240004" 5 0 0
    "0012400" "001240005" 1 0 0
    "0012400" "001240005" 2 0 0
    "0012451" "001240005" 5 0 1
    "0012400" "001240006" 1 0 0
    "0012400" "001240006" 2 0 0
    "0012400" "001240006" 5 0 0
    "0012400" "001240007" 1 0 0
    "0012400" "001240007" 2 0 0
    "0012400" "001240007" 5 0 0
    "0012400" "001240008" 1 0 0
    "0012400" "001240008" 2 0 0
    "0012400" "001240008" 5 0 0
    "0012400" "001240009" 1 0 0
    "0012400" "001240009" 2 0 0
    "0012452" "001240009" 5 0 1
    "0012400" "001240010" 1 0 0
    "0012400" "001240010" 2 0 0
    end

    Thank you in advance,

    Enrique

  • #2
    Hi Enrique,

    Thank you for supplying some sample data.

    Code:
    list if inlist(household, "0012451", "0012452")
    produces the following, indicating that your sample data contain two household-wave pairs where there is a wife but no husband.

    Code:
         +--------------------------------------------------+
         | househ~d   person_id   wave   is_hus~d   is_wife |
         |--------------------------------------------------|
     86. |  0012451   001240005      5          0         1 |
     98. |  0012452   001240009      5          0         1 |
         +--------------------------------------------------+
    When I run the following part of your code, those two observations are dropped. Note that they satisfy the condition stated in your comment ("at most one husband and one wife") but not the condition your code appears to check for ("exactly one husband and one wife").

    Code:
    // Checking the number of husbands and wives in each household and wave
    bysort household_id wave: egen wives_per_wave = total(is_wife)
    bysort household_id wave: egen husbands_per_wave = total(is_husband)
    
    // Determining if there is at most one husband and one wife in each household
    bys household_id: egen max_husbands = max(husbands_per_wave)
    bys household_id: egen max_wives = max(wives_per_wave)
    bys household_id: gen unique_husband = (max_husbands == 1)
    bys household_id: gen unique_wife = (max_wives == 1)
    
    bys household_id: keep if unique_husband == 1 & unique_wife == 1
    pidlink is not defined, so I can't comment on that part of your code.
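A minimal sketch of the difference between the two conditions, assuming the variable names of the data example in #1 and using five of its rows. It works at the household-wave level. The last step keeps a household only if every one of its waves has exactly one husband and one wife (egen min()). That is stricter than the max()-based check in #1, under which a household with a couple in one wave and a lone wife in another wave would survive. The names husbands, wives, at_most_one, exactly_one and all_waves_ok are illustrative, not part of the original code.

```stata
* Five rows taken from the data example in #1
clear
input str10(household_id person_id) float(wave is_husband is_wife)
"0012241" "001220003" 4 1 0
"0012241" "001220003" 5 1 0
"0012241" "001220011" 4 0 1
"0012241" "001220011" 5 0 1
"0012451" "001240005" 5 0 1
end

* Number of husbands and wives in each household-wave
bysort household_id wave: egen husbands = total(is_husband)
bysort household_id wave: egen wives    = total(is_wife)

* "At most one" keeps a wife whose husband is absent; "exactly one" does not
gen byte at_most_one = husbands <= 1 & wives <= 1
gen byte exactly_one = husbands == 1 & wives == 1
list household_id wave person_id husbands wives at_most_one exactly_one, sepby(household_id wave)

* Keep a household only if every one of its waves has exactly one couple
bysort household_id: egen all_waves_ok = min(exactly_one)
keep if all_waves_ok == 1
```

In this subset, household 0012451 (wave 5: one wife, no husband) passes the "at most one" test but fails "exactly one", so only the four rows of 0012241 remain.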

    Devra Golbe
    Professor Emerita, Dept. of Economics
    Hunter College, CUNY


    • #3
      drop if state=="Utah"


      • #4
        Hi Devra,

        Thank you so much for your help. The variable pidlink is actually person_id in the data example. Do you know how I could change the code to require exactly one husband and one wife?

        Thanks a lot,

        Enrique


        • #5
          Code:
          bys household wave: egen husbandwife = total(is_husband + is_wife)
          You are looking for husbandwife == 2.


          • #6
            Hi George,

            Thanks a lot, this is a smart and elegant solution.

            However, some wives appear in certain waves where their husbands do not (although both are present in the household). As a result, after keeping only the observations where husbandwife == 2, "distinct pidlink if is_husband == 1" still yields slightly different counts from "distinct pidlink if is_wife == 1" (i.e., there are still slightly more wives than husbands). Do you know how I can fix this?

            Thanks a lot,

            Enrique.

            • #7
              Enrique,

              I think your code gives exactly one husband and one wife. Indeed, when I run George's code after the code I quoted in my post, husbandwife == 2 for all observations. And when I run

              Code:
              distinct person_id if is_wife == 1
              distinct person_id if is_husb == 1
              I get identical results from each command.

              Perhaps you need to supply a different sample of observations in order to illustrate the issues you are having.

              Devra Golbe


              • #8
                Hi Devra and George,

                Here is a data example from before running either my code or George's. I have renamed the variables to make it easier to understand:

                clear
                input str10(pidlink hhid) float(wave husband wife)
                "272010001" "2720100" 1 1 0
                "272010001" "2720100" 2 1 0
                "272010001" "2720100" 3 1 0
                "272010001" "2720100" 4 1 0
                "272010001" "2720100" 5 1 0
                "272010002" "2720100" 1 0 1
                "272010002" "2720100" 2 0 1
                "272010002" "2720100" 3 0 1
                "272010002" "2720100" 4 0 1
                "272010002" "2720151" 5 0 0
                "272010003" "2720100" 1 0 0
                "272010003" "2720100" 2 0 0
                "272010003" "2720100" 3 0 0
                "272010003" "2720100" 4 0 0
                "272010003" "2720100" 5 0 0
                "272010004" "2720100" 2 0 0
                "272010004" "2720100" 3 0 0
                "272010004" "2720100" 4 0 0
                "272010004" "2720100" 5 0 0
                "272010005" "2720100" 5 0 0
                "272010006" "2720100" 5 0 1
                "272020001" "2720200" 1 1 0
                "272020001" "2720211" 2 1 0
                "272020001" "2720211" 3 1 0
                "272020001" "2720211" 4 1 0
                "272020001" "2720211" 5 1 0
                "272020002" "2720200" 1 0 1
                "272020002" "2720211" 2 0 1
                "272020002" "2720211" 3 0 1
                "272020002" "2720211" 4 0 1
                "272020002" "2720211" 5 0 1
                "272020003" "2720200" 1 0 0
                "272020003" "2720211" 2 0 0
                "272020003" "2720211" 3 0 0
                "272020003" "2720211" 4 0 0
                "272020003" "2720211" 5 0 0
                "272020004" "2720200" 2 1 0
                "272020004" "2720200" 3 1 0
                "272020004" "2720200" 4 1 0
                "272020004" "2720200" 5 1 0
                "272020005" "2720200" 2 0 1
                "272020005" "2720200" 3 0 1
                "272020005" "2720200" 4 0 1
                "272020005" "2720200" 5 0 0
                "272021104" "2720211" 2 0 0
                "272021104" "2720211" 3 0 0
                "272021104" "2720211" 4 0 0
                "272021104" "2720211" 5 0 0
                "272030001" "2720300" 1 1 0
                "272030001" "2720300" 2 1 0
                "272030001" "2720300" 3 1 0
                "272030001" "2720300" 4 1 0
                "272030001" "2720300" 5 1 0
                "272030002" "2720300" 1 0 1
                "272030002" "2720300" 2 0 1
                "272030002" "2720300" 3 0 1
                "272030002" "2720300" 4 0 1
                "272030002" "2720300" 5 0 1
                "272030003" "2720300" 1 0 0
                "272030003" "2720300" 2 0 0
                "272030003" "2720300" 3 0 0
                "272030003" "2720341" 4 0 1
                "272030004" "2720300" 2 0 0
                "272030004" "2720300" 3 0 0
                "272030004" "2720300" 4 0 0
                "272030004" "2720300" 5 0 0
                "272030005" "2720300" 3 0 0
                "272030005" "2720300" 4 0 0
                "272030005" "2720300" 5 0 0
                "272034101" "2720341" 4 1 0
                "272034101" "2720341" 5 0 0
                "272034103" "2720341" 4 0 0
                "272034104" "2720341" 5 1 0
                "272040001" "2720400" 1 1 0
                "272040001" "2720400" 2 1 0
                "272040001" "2720400" 3 1 0
                "272040001" "2720400" 4 1 0
                "272040001" "2720400" 5 1 0
                "272040002" "2720400" 1 0 1
                "272040002" "2720400" 2 0 1
                "272040002" "2720400" 3 0 1
                "272040002" "2720400" 4 0 1
                "272040002" "2720400" 5 0 1
                "272040003" "2720400" 1 0 0
                "272040003" "2720400" 2 0 0
                "272040003" "2720400" 3 0 0
                "272040003" "2720400" 4 0 0
                "272040003" "2720451" 5 1 0
                "272040004" "2720400" 1 0 0
                "272040004" "2720400" 2 0 0
                "272040004" "2720400" 3 0 0
                "272040004" "2720400" 4 0 0
                "272040004" "2720400" 5 0 0
                "272040005" "2720400" 4 0 0
                "272040005" "2720400" 5 0 0
                "272050001" "2720500" 1 1 0
                "272050001" "2720500" 2 1 0
                "272050001" "2720500" 3 1 0
                "272050001" "2720500" 4 1 0
                "272050001" "2720500" 5 1 0
                end


                After running the code, I do:

                . keep if husbandwife == 2
                (41,234 observations deleted)

                And then run distinct:

                . distinct pidlink if wife == 1

                        |        Observations
                        |      total   distinct
                --------+----------------------
                pidlink |      39619      15910

                . distinct pidlink if husband == 1

                        |        Observations
                        |      total   distinct
                --------+----------------------
                pidlink |      40025      16008



                As you can see, the numbers of husbands and wives differ. Below is a data example taken after this step:

                clear
                input str10(pidlink hhid) float(wave husband wife)
                "001060005" "0010600" 1 0 0
                "001060006" "0010600" 1 0 0
                "001060001" "0010600" 1 1 0
                "001060002" "0010600" 1 0 1
                "001060004" "0010600" 1 0 0
                "001060003" "0010600" 1 0 0
                "001060004" "0010600" 2 0 0
                "001060002" "0010600" 2 0 1
                "001060006" "0010600" 2 0 0
                "001060005" "0010600" 2 0 0
                "001060003" "0010600" 2 0 0
                "001060001" "0010600" 2 1 0
                "001065104" "0010651" 5 0 0
                "001060004" "0010651" 5 0 1
                "001065102" "0010651" 5 1 0
                "001065103" "0010651" 5 0 0
                "001080008" "0010800" 1 0 0
                "001080006" "0010800" 1 0 0
                "001080003" "0010800" 1 0 0
                "001080005" "0010800" 1 0 0
                "001080007" "0010800" 1 0 0
                "001080009" "0010800" 1 0 0
                "001080002" "0010800" 1 0 1
                "001080010" "0010800" 1 0 0
                "001080004" "0010800" 1 0 0
                "001080001" "0010800" 1 1 0
                "001080009" "0010800" 2 0 0
                "001080008" "0010800" 2 0 0
                "001080010" "0010800" 2 0 0
                "001080001" "0010800" 2 1 0
                "001080004" "0010800" 2 0 0
                "001080006" "0010800" 2 0 0
                "001080005" "0010800" 2 0 0
                "001080003" "0010800" 2 0 0
                "001080007" "0010800" 2 0 0
                "001080002" "0010800" 2 0 1
                "001085107" "0010851" 5 0 0
                "001085104" "0010851" 5 0 0
                "001085105" "0010851" 5 0 0
                "001080012" "0010851" 5 0 1
                "001080003" "0010851" 5 1 0
                "001085106" "0010851" 5 0 0
                "001085103" "0010851" 5 0 0
                "001220009" "0012200" 1 0 0
                "001220008" "0012200" 1 0 0
                "001220002" "0012200" 1 0 1
                "001220001" "0012200" 1 1 0
                "001220003" "0012200" 1 0 0
                "001220004" "0012200" 1 0 0
                "001220006" "0012200" 1 0 0
                "001220005" "0012200" 1 0 0
                "001220010" "0012200" 1 0 0
                "001220007" "0012200" 1 0 0
                "001220010" "0012200" 2 0 0
                "001220001" "0012200" 2 1 0
                "001220004" "0012200" 2 0 0
                "001220002" "0012200" 2 0 1
                "001220008" "0012200" 2 0 0
                "001220006" "0012200" 2 0 0
                "001220009" "0012200" 2 0 0
                "001220007" "0012200" 2 0 0
                "001220005" "0012200" 2 0 0
                "001220003" "0012200" 2 0 0
                "001220009" "0012200" 3 0 0
                "001220002" "0012200" 3 0 1
                "001220003" "0012200" 3 0 0
                "001220007" "0012200" 3 0 0
                "001220001" "0012200" 3 1 0
                "001220005" "0012200" 3 0 0
                "001220008" "0012200" 3 0 0
                "001220010" "0012200" 3 0 0
                "001220006" "0012200" 3 0 0
                "001220004" "0012200" 3 0 0
                "001220004" "0012200" 4 0 0
                "001220007" "0012200" 4 0 0
                "001220006" "0012200" 4 0 0
                "001220001" "0012200" 4 1 0
                "001220005" "0012200" 4 0 0
                "001220009" "0012200" 4 0 0
                "001220002" "0012200" 4 0 1
                "001220014" "0012200" 4 0 0
                "001220015" "0012200" 4 0 0
                "001220010" "0012200" 4 0 0
                "001220004" "0012200" 5 0 0
                "001220006" "0012200" 5 0 0
                "001220007" "0012200" 5 0 0
                "001220014" "0012200" 5 0 0
                "001220002" "0012200" 5 0 1
                "001220005" "0012200" 5 0 0
                "001220015" "0012200" 5 0 0
                "001220001" "0012200" 5 1 0
                "001220010" "0012200" 5 0 0
                "001220003" "0012241" 4 1 0
                "001224104" "0012241" 4 0 0
                "001224105" "0012241" 4 0 0
                "001220011" "0012241" 4 0 1
                "001220013" "0012241" 4 0 0
                "001220003" "0012241" 5 1 0
                "001220013" "0012241" 5 0 0
                "001224105" "0012241" 5 0 0
                end


                I am grateful for any help, as I cannot find my mistake.

                Thank you in advance,

                Enrique


                • #9
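                  With this code I get the same number of distinct wives and husbands:
                  Code:
                  bys hhid wave: egen husbandwife = total(husband + wife)
                  
                  . keep if husbandwife==2
                  (3 observations deleted)
                  
                  . distinct pidlink if wife == 1
                  
                          |        Observations
                          |      total   distinct
                  --------+----------------------
                  pidlink |         12          6
                  
                  . distinct pidlink if husband == 1
                  
                          |        Observations
                          |      total   distinct
                  --------+----------------------
                  pidlink |         12          6
                  
                  . distinct hhid if wife == 1
                  
                          |        Observations
                          |      total   distinct
                  --------+----------------------
                     hhid |         12          6
                  
                  . distinct hhid if husband == 1
                  
                          |        Observations
                          |      total   distinct
                  --------+----------------------
                     hhid |         12          6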


                  • #10
                    If the problem is missing data, then any solution is going to be a bit of a kluge.

                    One option is to assume that if husbandwife runs 2 2 1 2 2 across waves, the middle value should also be a 2 and the 1 reflects a missing record. I'd have more confidence in that if the household member count followed the same pattern.

                    Code:
                    bys hhid wave: egen hhsize = count(hhid)
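A minimal sketch of the "2 2 1 2 2" idea, assuming one record per person per household-wave as in the #8 data. It uses a trimmed subset of hhid 0012200 from #8 (husband 001220001, wife 001220002 and child 001220010), with the wave-3 wife record removed as in #12, so husbandwife runs 2 2 1 2 2 and hhsize runs 3 3 2 3 3. A household-wave is flagged only when the waves immediately before and after it are observed, both have husbandwife == 2, and both have exactly one more member. The names hw_tag, prev_ok, next_ok and interior_gap are hypothetical. The sketch only flags candidate gaps; it does not recreate the missing record.

```stata
* Trimmed subset of hhid 0012200 from #8; the wave-3 wife record is omitted, as in #12
clear
input str10(pidlink hhid) float(wave husband wife)
"001220001" "0012200" 1 1 0
"001220001" "0012200" 2 1 0
"001220001" "0012200" 3 1 0
"001220001" "0012200" 4 1 0
"001220001" "0012200" 5 1 0
"001220002" "0012200" 1 0 1
"001220002" "0012200" 2 0 1
"001220002" "0012200" 4 0 1
"001220002" "0012200" 5 0 1
"001220010" "0012200" 1 0 0
"001220010" "0012200" 2 0 0
"001220010" "0012200" 3 0 0
"001220010" "0012200" 4 0 0
"001220010" "0012200" 5 0 0
end

bysort hhid wave: egen husbandwife = total(husband + wife)
bysort hhid wave: egen hhsize = count(hhid)

* One tagged record per household-wave, so [_n-1] and [_n+1] refer to neighbouring waves
egen byte hw_tag = tag(hhid wave)
bysort hw_tag hhid (wave): gen byte prev_ok = husbandwife[_n-1] == 2 & wave[_n-1] == wave - 1 & hhsize[_n-1] == hhsize + 1
bysort hw_tag hhid (wave): gen byte next_ok = husbandwife[_n+1] == 2 & wave[_n+1] == wave + 1 & hhsize[_n+1] == hhsize + 1
gen byte interior_gap = hw_tag == 1 & husbandwife == 1 & prev_ok & next_ok

* Copy the flag to every member of a flagged household-wave
bysort hhid wave (interior_gap): replace interior_gap = interior_gap[_N]
list hhid wave husbandwife hhsize interior_gap if hw_tag, noobs
```

Requiring adjacent waves means a household observed only in waves 1, 2 and 5 (like 0012400 in #1) is never flagged; relaxing that is a judgment call about how much imputation is acceptable.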


                    • #11
                      Hi George,

                      Thanks a lot for all your help.

                      Here is the output from your code. Note that the dataset also contains the sons and daughters of these couples (I need them to identify child-wife pairs):

                      . bys hhid wave: egen hhsize = count(hhid)

                      .
                      end of do-file

                      . ta hhsize

                           hhsize |      Freq.     Percent        Cum.
                      ------------+-----------------------------------
                                2 |     10,314        5.96        5.96
                                3 |     34,371       19.88       25.84
                                4 |     48,292       27.93       53.77
                                5 |     36,365       21.03       74.80
                                6 |     21,594       12.49       87.28
                                7 |     11,137        6.44       93.72
                                8 |      6,056        3.50       97.23
                                9 |      2,943        1.70       98.93
                               10 |      1,160        0.67       99.60
                               11 |        495        0.29       99.88
                               12 |        132        0.08       99.96
                               13 |         52        0.03       99.99
                               15 |         15        0.01      100.00
                      ------------+-----------------------------------
                            Total |    172,926      100.00




                      • #12
                        This might get you started. I've dropped the middle observation of wife for hhid==0012200 to create some interest.

                        I think the kids condition may be redundant in markreplace.

                        The math to address a change in kids should be easy enough to implement, but this might tell you how big the problem is.

                        Code:
                        clear
                        input str10(pidlink hhid) float(wave husband wife)
                        "001060005" "0010600" 1 0 0
                        "001060006" "0010600" 1 0 0
                        "001060001" "0010600" 1 1 0
                        "001060002" "0010600" 1 0 1
                        "001060004" "0010600" 1 0 0
                        "001060003" "0010600" 1 0 0
                        "001060004" "0010600" 2 0 0
                        "001060002" "0010600" 2 0 1
                        "001060006" "0010600" 2 0 0
                        "001060005" "0010600" 2 0 0
                        "001060003" "0010600" 2 0 0
                        "001060001" "0010600" 2 1 0
                        "001065104" "0010651" 5 0 0
                        "001060004" "0010651" 5 0 1
                        "001065102" "0010651" 5 1 0
                        "001065103" "0010651" 5 0 0
                        "001080008" "0010800" 1 0 0
                        "001080006" "0010800" 1 0 0
                        "001080003" "0010800" 1 0 0
                        "001080005" "0010800" 1 0 0
                        "001080007" "0010800" 1 0 0
                        "001080009" "0010800" 1 0 0
                        "001080002" "0010800" 1 0 1
                        "001080010" "0010800" 1 0 0
                        "001080004" "0010800" 1 0 0
                        "001080001" "0010800" 1 1 0
                        "001080009" "0010800" 2 0 0
                        "001080008" "0010800" 2 0 0
                        "001080010" "0010800" 2 0 0
                        "001080001" "0010800" 2 1 0
                        "001080004" "0010800" 2 0 0
                        "001080006" "0010800" 2 0 0
                        "001080005" "0010800" 2 0 0
                        "001080003" "0010800" 2 0 0
                        "001080007" "0010800" 2 0 0
                        "001080002" "0010800" 2 0 1
                        "001085107" "0010851" 5 0 0
                        "001085104" "0010851" 5 0 0
                        "001085105" "0010851" 5 0 0
                        "001080012" "0010851" 5 0 1
                        "001080003" "0010851" 5 1 0
                        "001085106" "0010851" 5 0 0
                        "001085103" "0010851" 5 0 0
                        "001220009" "0012200" 1 0 0
                        "001220008" "0012200" 1 0 0
                        "001220002" "0012200" 1 0 1
                        "001220001" "0012200" 1 1 0
                        "001220003" "0012200" 1 0 0
                        "001220004" "0012200" 1 0 0
                        "001220006" "0012200" 1 0 0
                        "001220005" "0012200" 1 0 0
                        "001220010" "0012200" 1 0 0
                        "001220007" "0012200" 1 0 0
                        "001220010" "0012200" 2 0 0
                        "001220001" "0012200" 2 1 0
                        "001220004" "0012200" 2 0 0
                        "001220002" "0012200" 2 0 1
                        "001220008" "0012200" 2 0 0
                        "001220006" "0012200" 2 0 0
                        "001220009" "0012200" 2 0 0
                        "001220007" "0012200" 2 0 0
                        "001220005" "0012200" 2 0 0
                        "001220003" "0012200" 2 0 0
                        "001220009" "0012200" 3 0 0
                        /*"001220002" "0012200" 3 0 1*/
                        "001220003" "0012200" 3 0 0
                        "001220007" "0012200" 3 0 0
                        "001220001" "0012200" 3 1 0
                        "001220005" "0012200" 3 0 0
                        "001220008" "0012200" 3 0 0
                        "001220010" "0012200" 3 0 0
                        "001220006" "0012200" 3 0 0
                        "001220004" "0012200" 3 0 0
                        "001220004" "0012200" 4 0 0
                        "001220007" "0012200" 4 0 0
                        "001220006" "0012200" 4 0 0
                        "001220001" "0012200" 4 1 0
                        "001220005" "0012200" 4 0 0
                        "001220009" "0012200" 4 0 0
                        "001220002" "0012200" 4 0 1
                        "001220014" "0012200" 4 0 0
                        "001220015" "0012200" 4 0 0
                        "001220010" "0012200" 4 0 0
                        "001220004" "0012200" 5 0 0
                        "001220006" "0012200" 5 0 0
                        "001220007" "0012200" 5 0 0
                        "001220014" "0012200" 5 0 0
                        "001220002" "0012200" 5 0 1
                        "001220005" "0012200" 5 0 0
                        "001220015" "0012200" 5 0 0
                        "001220001" "0012200" 5 1 0
                        "001220010" "0012200" 5 0 0
                        "001220003" "0012241" 4 1 0
                        "001224104" "0012241" 4 0 0
                        "001224105" "0012241" 4 0 0
                        "001220011" "0012241" 4 0 1
                        "001220013" "0012241" 4 0 0
                        "001220003" "0012241" 5 1 0
                        "001220013" "0012241" 5 0 0
                        "001224105" "0012241" 5 0 0
                        end
                        
                        bys hhid wave: egen husbandwife = total(husband + wife)
                        bys hhid wave: egen hhsize = count(hhid)
                        g kids = hhsize - husbandwife
                        
                        bys hhid: egen max_husbandwife = max(husbandwife)
                        keep if max_husbandwife==2 // delete cases not desired
                        
                        bys hhid: egen max_hhsize = max(hhsize)
                        bys hhid: egen max_kids = max(kids)
                        bys hhid: egen min_wave = min(wave)
                        bys hhid: egen max_wave = max(wave)
                        
                        g markreplace = (husbandwife - max_husbandwife)==-1 & (hhsize - max_hhsize)==-1 & (kids==max_kids) & (wave != min_wave) & (wave != max_wave)
                        * wave included as condition since you don't have data to either side to confirm.
                        g markcheck = (husbandwife - max_husbandwife)==-1 & (hhsize - max_hhsize)==-1 & (kids!=max_kids)
                        replace husbandwife = 2 if markreplace


                        • #13
                          Enrique,

                          The data sample you provide cannot be the data sample on which you ran your code (more than 40,000 observations were deleted). Try to extract a small set of data that reproduces the phenomenon that concerns you: more distinct wives than husbands.

                          I also suspect that you will get better answers if you describe your data more fully. I guess that you are using some kind of household sample which is resurveyed multiple times (waves). I don't know anything about this sample, not even how a household is defined. I would not be surprised if the number of children changes over time within the same HH ID. But suppose a head of household changes spouses. Does the HH ID change or stay the same? If the spouse who is not designated head of HH gets a new spouse, does he drop out of the survey or get a new HH ID? Given the possible changes in HH composition, I'm not certain that the numbers of distinct husbands and wives should be the same.
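The spouse-change scenario can be checked directly in the data. In Enrique's first data example in #8, household 2720100 has the same husband (272010001) in all five waves, but its wife is 272010002 in waves 1-4 and 272010006 in wave 5, while 272010002 appears in hhid 2720151 in wave 5 with neither flag set. Every household-wave of 2720100 passes the husbandwife == 2 check, yet distinct pidlink would count two wives and one husband. A minimal sketch using a trimmed subset of those rows (children omitted), with built-in egen tag() and total() standing in for the community-contributed distinct command; the names tag_h, tag_w, n_husbands, n_wives, person_h, person_w, distinct_husbands and distinct_wives are hypothetical:

```stata
* Trimmed subset of the first data example in #8 (children omitted)
clear
input str10(pidlink hhid) float(wave husband wife)
"272010001" "2720100" 1 1 0
"272010001" "2720100" 2 1 0
"272010001" "2720100" 3 1 0
"272010001" "2720100" 4 1 0
"272010001" "2720100" 5 1 0
"272010002" "2720100" 1 0 1
"272010002" "2720100" 2 0 1
"272010002" "2720100" 3 0 1
"272010002" "2720100" 4 0 1
"272010002" "2720151" 5 0 0
"272010006" "2720100" 5 0 1
end

* George's check: each household-wave of 2720100 has one husband plus one wife
bysort hhid wave: egen husbandwife = total(husband + wife)

* Distinct husbands and wives within each household, across all waves
egen byte tag_h = tag(hhid pidlink husband)
egen byte tag_w = tag(hhid pidlink wife)
bysort hhid: egen n_husbands = total(tag_h * husband)
bysort hhid: egen n_wives    = total(tag_w * wife)

* Person-level counts over the whole file, analogous to -distinct pidlink if wife == 1-
egen byte person_h = tag(pidlink husband)
egen byte person_w = tag(pidlink wife)
egen distinct_husbands = total(person_h * husband)
egen distinct_wives    = total(person_w * wife)

list hhid wave pidlink husbandwife n_husbands n_wives if hhid == "2720100", sepby(wave)
display "distinct husbands: " distinct_husbands[1] "   distinct wives: " distinct_wives[1]
```

If n_husbands or n_wives exceeds 1 in many households of the full data, spouse changes rather than a coding error could account for the gap between the two distinct counts. Whether such households should be kept depends on how the survey defines a household, which is the question raised above.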
                          Devra Golbe
