Hi all, I have some time-series data on children and various "statuses" of their father (dadstatus). The status is often missing but, when valid, takes values 0-5; it almost always starts at 0 and escalates in some order over the ensuing months.
What I've already done is create a months pre/post variable based on dadstatus=5 (month_since_first5dad in the code below).
What I want to create next are analogous variables for the other statuses, but only when a given child's sequence never reaches dadstatus=5. So in the data below, if I wanted to create a month_since_first2dad variable, its t=0 would be month 7 for idchild 1, because that child never reaches dadstatus=5. The month_since_first2dad variable would be missing for both idchild 2 and 3, because they eventually reach dadstatus=5. What is the most efficient way to do this?
Code:
clear
input idchild month dadstatus
1 1 .
1 2 .
1 3 .
1 4 0
1 5 1
1 6 1
1 7 2
1 8 .
1 9 .
1 10 .
1 11 .
2 1 .
2 2 .
2 3 0
2 4 2
2 5 4
2 6 5
2 7 5
2 8 5
2 9 5
2 10 5
2 11 .
3 1 .
3 2 .
3 3 0
3 4 1
3 5 3
3 6 .
3 7 .
3 8 1
3 9 2
3 10 5
3 11 5
end

gen status5 = 1 if dadstatus == 5
bysort idchild: egen first5dad = min(month) if status5 == 1
replace first5dad = 0 if first5dad != month & first5dad != .
replace first5dad = 1 if first5dad == month
gen first5dad_month0 = month if first5dad == 1
by idchild: egen first5dad_month = max(first5dad_month0)
gen month_since_first5dad = 0 if first5dad == 0
replace month_since_first5dad = month-first5dad_month
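To make the target concrete, here is a minimal sketch of one possible way to build the status-2 analogue. It assumes the example data above are already in memory. The helper names ever5 and first2dad_month are hypothetical, and this is not necessarily the most efficient approach.

```stata
* Sketch only: ever5 and first2dad_month are hypothetical helper names.
* Flag children whose sequence ever reaches dadstatus == 5
* (a missing dadstatus evaluates to 0 in the comparison).
bysort idchild: egen ever5 = max(dadstatus == 5)

* First month in which dadstatus == 2, per child (missing if never 2).
by idchild: egen first2dad_month = min(cond(dadstatus == 2, month, .))

* Months relative to the first status-2 month, only for children who never reach 5.
gen month_since_first2dad = month - first2dad_month if ever5 == 0
```

With the example data, this should give t=0 at month 7 for idchild 1 and leave the variable missing for idchild 2 and 3. The same pattern could presumably be repeated for the other statuses by changing the value in the cond() call.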