Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Cleaning data

    Dear all,

    I am cleaning a raw dataset like this
    hhid code24h code7day
    1 145 238
    1 238 .
    1 396 145
    1 396
    1 001
    2 115 204
    2 452
    2 375
    .... ......
    In principle, for each hhid, the code appeared in code24h should be in code7day as well. For example, I must add 115 into code7day for hhid No2. Right now, I am doing it manually but it is too time-consuming and not transparent.

    Does anyone have any idea how to solve this?

    Thank you so much.

    Lanna Ng

  • #2
    Lanna:
    do you mean something like:
    Code:
    bysort hhid: replace code7day=code24h
    Kind regards,
    Carlo
    (StataNow 18.5)

    Comment


    • #3
      I have some guesses about how you holding your data and what "add" means here.

      Perhaps this is what you want:

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte hhid str3(code24h code7day)
      1 "145" "238"
      1 "238" "."  
      1 "396" "145"
      1 ""    "396"
      1 ""    "001"
      2 "115" "204"
      2 ""    "452"
      2 ""    "375"
      end
      
      replace code7day = code7day + " " + code24h if code24h != "" & code24h != code7day
      
      list 
      
           +---------------------------+
           | hhid   code24h   code7day |
           |---------------------------|
        1. |    1       145    238 145 |
        2. |    1       238      . 238 |
        3. |    1       396    145 396 |
        4. |    1                  396 |
        5. |    1                  001 |
           |---------------------------|
        6. |    2       115    204 115 |
        7. |    2                  452 |
        8. |    2                  375 |
           +---------------------------+

      Comment


      • #4
        Thanks for trying to help Carlo and Nick.

        To be specific, I would like to transform the previous table to something like this:
        hhid code24h code7day
        1 145 238
        1 238 .
        1 396 145
        1 396
        1 001
        2 115 204
        2 452
        2 375
        2 115
        .... ......

        Right now, I am adding the row and number "115" into code7day manually (because it appears in code24h but not in code7day).

        I hope I make myself clear enough now.

        Best,
        Last edited by Lanna Ng; 17 Feb 2022, 09:11.

        Comment


        • #5
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input byte hhid str3(code24h code7day)
          1 "145" "238"
          1 "238" ""   
          1 "396" "145"
          1 ""    "396"
          1 ""    "001"
          2 "115" "204"
          2 ""    "452"
          2 ""    "375"
          end
          
          expand 2, g(added)
          replace code7day= code24h  if added
          bys hhid code7day (added): keep if !added|(_n==1& added & !missing(code7day))

          Res.:

          Code:
          . sort hhid
          
          . l, sepby(hhid)
          
               +-----------------------------------+
               | hhid   code24h   code7day   added |
               |-----------------------------------|
            1. |    1       238                  0 |
            2. |    1                  001       0 |
            3. |    1       396        145       0 |
            4. |    1       145        238       0 |
            5. |    1                  396       0 |
               |-----------------------------------|
            6. |    2       115        115       1 |
            7. |    2       115        204       0 |
            8. |    2                  375       0 |
            9. |    2                  452       0 |
               +-----------------------------------+

          Comment


          • #6
            Code:
            clear
            input byte hhid str3(code24h code7day)
            1 "145" "238"
            1 "238" "."  
            1 "396" "145"
            1 ""    "396"
            1 ""    "001"
            2 "115" "204"
            2 ""    "452"
            2 ""    "375"
            end
            
            bysort hhid : gen id = _n 
            
            reshape long code, i(hhid id) j(which) string 
            
            replace code = "" if code == "."
            
            bysort  hhid code (which) : gen toexpand = 2 if which[1] == "24h" & which[2] == "" & code != "" 
            
            expand toexpand, gen(expanded)
            
            replace which = "7day" if expanded 
            
            su id, meanonly 
            
            replace id = r(max)  + _n if expanded 
            
            drop toexpand expanded 
            
            reshape wide code, i(hhid id) j(which) string 
            
            list, sepby(hhid)
            
                +--------------------------------+
                 | hhid   id   code24h   code7day |
                 |--------------------------------|
              1. |    1    1       145        238 |
              2. |    1    2       238            |
              3. |    1    3       396        145 |
              4. |    1    4                  396 |
              5. |    1    5                  001 |
                 |--------------------------------|
              6. |    2    1       115        204 |
              7. |    2    2                  452 |
              8. |    2    3                  375 |
              9. |    2   22                  115 |
                 +--------------------------------+

            Comment

            Working...
            X