Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Expanding unbalanced panal data with gaps

    Dear Statalist,

    I have an unbalanced panel dataset with missing data between the first and last observation month in some of the panel ids. Example data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float time str1 id float obsvar
    744 "A"   20
    745 "A" 20.5
    747 "A" 19.5
    749 "A"   18
    693 "B"  100
    697 "B"  120
    end
    format %tm time
    I can fill the gaps using this code:

    Code:
    encode id, gen(id_dum)
    xtset id_dum time
    tsfill
    which results in the following output:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float time str1 id float obsvar long id_dum
    744 "A"   20 1
    745 "A" 20.5 1
    746 ""     . 1
    747 "A" 19.5 1
    748 ""     . 1
    749 "A"   18 1
    693 "B"  100 2
    694 ""     . 2
    695 ""     . 2
    696 ""     . 2
    697 "B"  120 2
    end
    format %tm time
    label values id_dum id_dum
    label def id_dum 1 "A", modify
    label def id_dum 2 "B", modify
    However, for each ID, I would like to have two additional observations: One in the month prior to the first observation, the other in the subsequent month of the last, so that the output ends up looking like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float time str1 id float obsvar long id_dum
    743 "A"    . 1
    744 "A"   20 1
    745 "A" 20.5 1
    746 ""     . 1
    747 "A" 19.5 1
    748 ""     . 1
    749 "A"   18 1
    750 "A"    . 1
    692 "B"    . 2
    693 "B"  100 2
    694 ""     . 2
    695 ""     . 2
    696 ""     . 2
    697 "B"  120 2
    698 "B"    . 2
    end
    format %tm time
    label values id_dum id_dum
    label def id_dum 1 "A", modify
    label def id_dum 2 "B", modify
    Kind regards
    Ingo

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float time str1 id float obsvar long 
    744 "A"   20 
    745 "A" 20.5 
    746 ""     . 
    747 "A" 19.5 
    748 ""     . 
    749 "A"   18 
    693 "B"  100 
    694 ""     . 
    695 ""     . 
    696 ""     . 
    697 "B"  120 
    end
    format %tm time
    
    drop if missing(id)
    bys id (time): frame put time id if (_n==1 |_n==_N) & !missing(id), into(merge)
    frame merge{
        bys id (time): replace time= cond(_n==1, time-1, time+1)
        tempfile merge
        save `merge', replace
    }
    frame drop merge
    append using `merge' 
    encode id, gen(id_dum)
    xtset id_dum time
    tsfill
    Res.:

    Code:
    . l, sepby(id_)
    
         +--------------------------------+
         |    time   id   obsvar   id_dum |
         |--------------------------------|
      1. | 2021m12    A        .        A |
      2. |  2022m1    A       20        A |
      3. |  2022m2    A     20.5        A |
      4. |  2022m3             .        A |
      5. |  2022m4    A     19.5        A |
      6. |  2022m5             .        A |
      7. |  2022m6    A       18        A |
      8. |  2022m7    A        .        A |
         |--------------------------------|
      9. |  2017m9    B        .        B |
     10. | 2017m10    B      100        B |
     11. | 2017m11             .        B |
     12. | 2017m12             .        B |
     13. |  2018m1             .        B |
     14. |  2018m2    B      120        B |
     15. |  2018m3    B        .        B |
         +--------------------------------+

    Comment


    • #3
      There may be a nicer way of doing this, but one way would be:

      Code:
      * tag first and last observation for each ID
      byso id_dum: egen first = min(time)
      replace first = (time == first)
      
      byso id_dum: egen last = max(time)
      replace last = (time == last)
      
      * duplicate first/last observations and change time
      expand 2 if first == 1, gen(dup1)
      replace time = time - 1 if first == 1 & dup1 == 1
      
      expand 2 if last == 1, gen(dup2)
      replace time = time + 1 if last == 1 & dup2 == 1
      
      * make values for duplicated observation missing
      replace obsvar = . if dup1 == 1 | dup2 == 1
      
      sort id_dum time
      drop dup1 dup2 first last

      Comment


      • #4
        The two approaches work as intended. Thank you both very much!

        I haven't been aware of the frame command so far. After taking a first look into the help file, this seems to be quite useful in a number of situations.

        Comment

        Working...
        X