Expanding unbalanced panal data with gaps

Ingo Geburtig

Join Date: Jun 2018
Posts: 8

Expanding unbalanced panal data with gaps

23 Mar 2024, 05:57

Dear Statalist,

I have an unbalanced panel dataset with missing data between the first and last observation month in some of the panel ids. Example data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float time str1 id float obsvar
744 "A"   20
745 "A" 20.5
747 "A" 19.5
749 "A"   18
693 "B"  100
697 "B"  120
end
format %tm time

I can fill the gaps using this code:

Code:

encode id, gen(id_dum)
xtset id_dum time
tsfill

which results in the following output:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float time str1 id float obsvar long id_dum
744 "A"   20 1
745 "A" 20.5 1
746 ""     . 1
747 "A" 19.5 1
748 ""     . 1
749 "A"   18 1
693 "B"  100 2
694 ""     . 2
695 ""     . 2
696 ""     . 2
697 "B"  120 2
end
format %tm time
label values id_dum id_dum
label def id_dum 1 "A", modify
label def id_dum 2 "B", modify

However, for each ID, I would like to have two additional observations: One in the month prior to the first observation, the other in the subsequent month of the last, so that the output ends up looking like this:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float time str1 id float obsvar long id_dum
743 "A"    . 1
744 "A"   20 1
745 "A" 20.5 1
746 ""     . 1
747 "A" 19.5 1
748 ""     . 1
749 "A"   18 1
750 "A"    . 1
692 "B"    . 2
693 "B"  100 2
694 ""     . 2
695 ""     . 2
696 ""     . 2
697 "B"  120 2
698 "B"    . 2
end
format %tm time
label values id_dum id_dum
label def id_dum 1 "A", modify
label def id_dum 2 "B", modify

Kind regards
Ingo

Tags: None

Andrew Musau

Join Date: Oct 2014
Posts: 9945

23 Mar 2024, 09:48

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float time str1 id float obsvar long 
744 "A"   20 
745 "A" 20.5 
746 ""     . 
747 "A" 19.5 
748 ""     . 
749 "A"   18 
693 "B"  100 
694 ""     . 
695 ""     . 
696 ""     . 
697 "B"  120 
end
format %tm time

drop if missing(id)
bys id (time): frame put time id if (_n==1 |_n==_N) & !missing(id), into(merge)
frame merge{
    bys id (time): replace time= cond(_n==1, time-1, time+1)
    tempfile merge
    save `merge', replace
}
frame drop merge
append using `merge' 
encode id, gen(id_dum)
xtset id_dum time
tsfill

Res.:

Code:

. l, sepby(id_)

     +--------------------------------+
     |    time   id   obsvar   id_dum |
     |--------------------------------|
  1. | 2021m12    A        .        A |
  2. |  2022m1    A       20        A |
  3. |  2022m2    A     20.5        A |
  4. |  2022m3             .        A |
  5. |  2022m4    A     19.5        A |
  6. |  2022m5             .        A |
  7. |  2022m6    A       18        A |
  8. |  2022m7    A        .        A |
     |--------------------------------|
  9. |  2017m9    B        .        B |
 10. | 2017m10    B      100        B |
 11. | 2017m11             .        B |
 12. | 2017m12             .        B |
 13. |  2018m1             .        B |
 14. |  2018m2    B      120        B |
 15. |  2018m3    B        .        B |
     +--------------------------------+

Comment

George Taylor

Join Date: Dec 2023
Posts: 14

23 Mar 2024, 09:55

There may be a nicer way of doing this, but one way would be:

Code:

* tag first and last observation for each ID
byso id_dum: egen first = min(time)
replace first = (time == first)

byso id_dum: egen last = max(time)
replace last = (time == last)

* duplicate first/last observations and change time
expand 2 if first == 1, gen(dup1)
replace time = time - 1 if first == 1 & dup1 == 1

expand 2 if last == 1, gen(dup2)
replace time = time + 1 if last == 1 & dup2 == 1

* make values for duplicated observation missing
replace obsvar = . if dup1 == 1 | dup2 == 1

sort id_dum time
drop dup1 dup2 first last

Comment

Ingo Geburtig

Join Date: Jun 2018

Posts: 8
#4

24 Mar 2024, 04:16

The two approaches work as intended. Thank you both very much!

I haven't been aware of the frame command so far. After taking a first look into the help file, this seems to be quite useful in a number of situations.
Comment

Announcement

Expanding unbalanced panal data with gaps

Comment

Comment

Comment