Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Cleaning misfit Data

    Hello Everyone,
    I am face with another challenge, I am pretty sure I will get the solution. below is the sample of my dataset
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str4 hhid str9 memberid float relationdhip str10 phoneno
    "1" "1" 1 "0803431100"
    "1" "1" 2 "0803431100"
    "1" "2" 2 "0803431100"
    "1" "3" 3 "0803431100"
    "2" "1" 1 "0709849988"
    "2" "2" 5 "0709823420"
    "3" "1" 1 "0812782771"
    "3" "2" 2 "08126377" 
    "3" "3" 4 "0812782771"
    "3" "4" 3 "081279882"
    "4" "1" 1 "0809588399"
    "5" "1" 2 "0907626666"
    "5" "2" 1 "0907626666"
    end
    hhid is the household id repeats for all entries in the household
    memberid is theid assigned to each member within the household
    relationship indicates hierarchy within the household (1 is for head of the household)
    phoneno stored the phone number

    the last column stored the telephone number of the household. I want to
    1. ensure that all member of the household have the same phone number using the phone number assigned to the head of the household (relationship ==1);
    2. ensure that the phone number is 11 digits.
    3. list out all the household that did not meet the above criteria for further cleaning.

  • #2
    Hi,

    Try these commands:

    Code:
    clear
    input str4 hhid str9 memberid float relationdhip str10 phoneno
    "1" "1" 1 "0803431100"
    "1" "1" 2 "0803431100"
    "1" "2" 2 "0803431100"
    "1" "3" 3 "0803431100"
    "2" "1" 1 "0709849988"
    "2" "2" 5 "0709823420"
    "3" "1" 1 "0812782771"
    "3" "2" 2 "08126377" 
    "3" "3" 4 "0812782771"
    "3" "4" 3 "081279882"
    "4" "1" 1 "0809588399"
    "5" "1" 2 "0907626666"
    "5" "2" 1 "0907626666"
    end
    
    *CREATES samephoneno == 1 WHEN ALL INDIVIDUALS OF A HOUSEHOLD HAVE THE SAME PHONENO, 0 TO ALL HH INDIVIDUALS IF AT LEAST ONE OF THEM HAVE DIFFERENT PHONENO
    bysort hhid: gen try = 1 if _n==1
    replace try = 1 if phoneno == phoneno[_n-1]
    replace try = 0 if try==.
    bysort hhid: egen samephoneno = mean(try)
    drop try
    replace samephoneno = 0 if samephone < 1
    
    *COUNTING CHARACTERS IN PHONENO AND CREATING VAR WITH # OF CHARACTERS
    gen correctchar = 1 if strlen(phoneno) == 11
    gen char = strlen(phoneno)
    
    *LISTING ALL HH THAT DID NOT MEET ABOVE CRITERIA
    gen clean = 1 if (correctchar != 1 | samephoneno == 0)
    Please note that none of the phoneno you shared is 11 digits long, all observations you shared have to be reviewed by this criteria.

    Best;

    Comment


    • #3
      Some technique. I see only phone numbers with 10 or fewer digits here.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str4 hhid str9 memberid float relationship str10 phoneno
      "1" "1" 1 "0803431100"
      "1" "1" 2 "0803431100"
      "1" "2" 2 "0803431100"
      "1" "3" 3 "0803431100"
      "2" "1" 1 "0709849988"
      "2" "2" 5 "0709823420"
      "3" "1" 1 "0812782771"
      "3" "2" 2 "08126377" 
      "3" "3" 4 "0812782771"
      "3" "4" 3 "081279882"
      "4" "1" 1 "0809588399"
      "5" "1" 2 "0907626666"
      "5" "2" 1 "0907626666"
      end
      
      * head of hh? 
      gen ishead = relationship == 1
      
      * ishh 1 sorted after ishh 0 
      bysort hhid (ishead) : gen newphone = phoneno[_N] if ishead[_N] 
      
      * for example, do equal heads of hh have same number? 
      bysort hhid ishead : gen OK = newphone[1] == newphone[_N] 
       
       * length of number 
      gen length = length(newphone) 
      gen bad = length != 10 | missing(newphone) | !OK 
      
      list hhid-newphone bad, sepby(hhid) 
      
           +---------------------------------------------------------------------+
           | hhid   memberid   relati~p      phoneno   ishead     newphone   bad |
           |---------------------------------------------------------------------|
        1. |    1          2          2   0803431100        0   0803431100     0 |
        2. |    1          3          3   0803431100        0   0803431100     0 |
        3. |    1          1          2   0803431100        0   0803431100     0 |
        4. |    1          1          1   0803431100        1   0803431100     0 |
           |---------------------------------------------------------------------|
        5. |    2          2          5   0709823420        0   0709849988     0 |
        6. |    2          1          1   0709849988        1   0709849988     0 |
           |---------------------------------------------------------------------|
        7. |    3          3          4   0812782771        0   0812782771     0 |
        8. |    3          2          2     08126377        0   0812782771     0 |
        9. |    3          4          3    081279882        0   0812782771     0 |
       10. |    3          1          1   0812782771        1   0812782771     0 |
           |---------------------------------------------------------------------|
       11. |    4          1          1   0809588399        1   0809588399     0 |
           |---------------------------------------------------------------------|
       12. |    5          1          2   0907626666        0   0907626666     0 |
       13. |    5          2          1   0907626666        1   0907626666     0 |
           +---------------------------------------------------------------------+

      Comment


      • #4
        Please disregard my prior answer, as I didn't read the head of household rule. I apologize.

        Comment


        • #5
          thank you, I appreciate you all. the solution shared by Mr Cox worked for me. I also appreciate the contribution of Mr Paploski you don't owe me apology cos no knowledge is a waste.

          Kind Regards

          Comment

          Working...
          X