  • How to drop split households from panel data?

    Hi, I am struggling with how to handle observations from split households. I have three rounds of household panel data, and households split after the round 1 survey. For example, a household with HH ID 3 in round 1 splits in round 2 into 3.1 and 3.2. In round 3, these split again into 3.11, 3.12 and 3.21, 3.22.
    I would like to keep only the HH IDs ending in .1 and .11 (the main household from baseline).

    I tried the following command: keep if HHid == *.1 | HHid == *.11

    But this command does not work, and I am stuck. Any insight would be really helpful. Thank you!
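A note on why the command in #1 fails, with a minimal sketch of a wildcard-based fix. In a Stata expression, == tests exact equality, and * is the multiplication operator rather than a wildcard, so `HHid == *.1` is not valid syntax. The sketch assumes HHid is stored as a float, as in the dataex examples later in this thread, and uses the IDs described in #1; hhid_str is a hypothetical helper variable. It converts the ID to a string with a fixed two-decimal format (so 3.1, which a float can only store approximately, becomes "3.10") and then uses strmatch(), which does accept * and ? wildcards.

```stata
* IDs as described in #1 (assumed float storage)
clear
input float HHid
3
3.1
3.2
3.11
3.12
3.21
3.22
end

* Hypothetical helper: fixed two-decimal string, so 3.1 -> "3.10"
* (strtrim() guards against any leading blanks from the format)
gen hhid_str = strtrim(string(HHid, "%9.2f"))

* strmatch() treats * as "any characters"; keep IDs ending in .10 or .11
keep if strmatch(hhid_str, "*.10") | strmatch(hhid_str, "*.11")
list, sep(0)
```

One caveat: once IDs are stored as numbers, 3.1 and a possible tenth split 3.10 are the same value, so storing HHid as a string from the start would avoid that ambiguity.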

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float HHid
     3.1
    3.11
    3.12
    3.13
       4
    4.11
    4.34
     5.1
    5.12
    5.99
    6.11
    end
    
    keep if ustrregexm(string(HHid), ".*\.1\b|.*\.11\b")
    Result:

    Code:
    . l, sep(0)
    
         +------+
         | HHid |
         |------|
      1. |  3.1 |
      2. | 3.11 |
      3. | 4.11 |
      4. |  5.1 |
      5. | 6.11 |
         +------+
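For readers following the regular expression in #2: ustrregexm() reports a match anywhere in the string, so the leading .* is not strictly needed, and \b (a word boundary) is what stops \.1 from matching the start of .12. A minimal sketch of an anchored variant, assuming the same float HHid data as #2, formats the ID explicitly with %9.2f instead of relying on the default format of string(). With that format .1 is rendered as .10, so both target suffixes can be matched with \.1[01]$ (a dot, a 1, then a 0 or 1, then the end of the string).

```stata
* Same example data as #2 (float storage)
clear
input float HHid
 3.1
3.11
3.12
3.13
   4
4.11
4.34
 5.1
5.12
5.99
6.11
end

* Explicit two-decimal format: 3.1 -> "3.10", 4 -> "4.00", 4.11 -> "4.11"
* "\.1[01]$" keeps strings ending in .10 or .11
keep if ustrregexm(strtrim(string(HHid, "%9.2f")), "\.1[01]$")
list, sep(0)
```

With this data it should keep the same five records as the listing above (3.1, 3.11, 4.11, 5.1, 6.11).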



    • #3
      The code in #2 is a correct solution for a reasonable interpretation of the question posed in #1. But the language there is ambiguous, and I think something different was intended. I think the O.P. wants to look at the integer part of the HHid, and then retain all records whose HHid has that integer part, provided that there are also records with that integer part + .1 and + .11. If I have understood that correctly, then:
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input float HHid
         1
       1.1
         2
       3.1
      3.11
      3.12
      3.13
         4
      4.11
      4.34
       5.1
      5.12
      5.99
         6
      6.11
      end
      
      gen root_hhid = string(floor(HHid), "%1.0f")
      tostring HHid, gen(hhid) format(%3.2f) force
      foreach x in "10" "11" {
          by root_hhid, sort: egen byte has_`x' = max(hhid == root_hhid + "." + "`x'")
      }
      keep if has_10 & has_11
      In the future, when asking for help with code, please use the -dataex- command and show example data. Although sometimes, as here, it is possible to give an answer that has a reasonable probability of being correct, this is usually not the case. Moreover, such answers are necessarily based on experience-based guesses or intuitions about the nature of your data. When those guesses are wrong, both you and the person trying to help you have wasted your time, as you end up with useless code. To avoid this, a -dataex- based example provides all of the information needed to develop and test a solution.

      If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
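The code in #3 compares formatted strings. As a hedged alternative, assuming IDs never have more than two decimal places, the same check can be done numerically: compare HHid with its integer part plus .1 and plus .11, using a small tolerance because a float cannot store .1 or .11 exactly. This sketch reuses the example data from #3; the variable names and the tolerance of 0.001 are illustrative assumptions, not part of the code in #3. With this data, only the records with integer part 3 should remain.

```stata
* Example data from #3 (float storage)
clear
input float HHid
   1
 1.1
   2
 3.1
3.11
3.12
3.13
   4
4.11
4.34
 5.1
5.12
5.99
   6
6.11
end

* Integer part of the ID (the baseline household)
gen long root = floor(HHid)

* Within each root, flag whether root + .1 and root + .11 both occur;
* the tolerance absorbs float rounding (3.1 is stored as about 3.0999999)
by root, sort: egen byte has_10 = max(abs(HHid - (root + 0.1)) < 0.001)
by root, sort: egen byte has_11 = max(abs(HHid - (root + 0.11)) < 0.001)

keep if has_10 & has_11
list, sep(0)
```

The tolerance only needs to be smaller than half the gap between adjacent IDs, which is 0.01 under the two-decimal assumption.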



      • #4
        Thank you so much, Clyde and Andrew! The code in #3 works perfectly to retain the households I want. I will certainly use dataex from now on.

