Time dummies to estimate the length of relationships

Chris Boulis

Join Date: Feb 2019

Posts: 355
#16

03 Mar 2020, 22:50

While the above code (which aims to measure the length of a relationship) achieves what I want most of the time, it fails in two cases:

1) When a person changes partner, the spell sequence and spell length variables continue to increase when they should restart.
2) When a person's response is missing, the spell restarts; in this case I want Stata to use the partner's response instead (and, if the partner (p_id) is the same, the spell should continue to increase).

This obviously reflects the limits of my knowledge, not the code. Code below. Help is appreciated.

Code:

bys id (wave): gen byte begin = inlist(marstat, 1, 2) & marstat != marstat[_n-1]
bys id (wave): replace begin = inlist(p_marstat, 1, 2) if marstat == .
bys id (wave): gen byte spell = sum(begin)
bys id spell (wave): gen byte end = _n == _N & inlist(marstat, 1, 2)
bys id (wave): replace end = inlist(p_marstat, 1, 2) if marstat == .
bys id spell (wave): gen seq = cond(spell, _n, .)
bys id (wave): replace seq = inlist(p_marstat, 1, 2) if marstat == .
bys id spell (wave): gen length = _N
bys id (wave): replace length = inlist(p_marstat, 1, 2) if marstat == .

The link in #15 (above) provides sample data. (Note: marstat==1 - married; marstat==2 - de facto).

Clyde Schechter

Join Date: Apr 2014
Posts: 29906

#17

04 Mar 2020, 10:55

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id p_id marstat p_marstat) byte wave
1001  1009 . 2  1
1001  1009 . 2  3
1001  1009 1 1  5
1001  1009 1 1  6
1001  1009 1 1  8
1001  1009 1 1  9
1001  1009 1 1 10
1001  1009 2 1 11
1001  1009 2 1 12
1001  1406 2 . 14
1074  1075 1 1  1
1074  1075 1 1  2
1074  1075 1 1  3
1074  1075 . 1  4
1074  1075 . 1  5
1074  1075 . 1  6
1074  1075 . 3  7
1075  1074 1 1  1
1075  1074 1 1  2
1075  1074 1 1  3
1075  1074 1 .  4
1075  1074 1 .  5
1075  1074 1 .  6
1075  1074 3 .  7
1075 12427 2 . 12
1075 12427 2 . 13
1075 12427 2 . 14
1075 12427 2 . 15
1075 12427 2 . 16
1075 12427 2 . 17
1075 12427 2 . 18
1188  1189 1 1  1
1188  1189 1 1  2
1188  1189 1 1  3
1188  1189 1 1  4
1188  1189 1 1  5
1188  1189 1 1  6
1188  1189 1 1  7
1188  1189 1 1  8
1188 11740 2 2 11
1188 11740 2 2 12
1188 11740 2 2 13
1188 17316 2 2 17
1188 17316 2 2 18
1191  1192 1 1  1
1191  1192 1 1  2
1191  1192 1 1  3
1191  1192 1 1  4
1191  1192 1 1  5
1191  1192 1 1  6
1191  1192 1 1  7
1191  1192 1 1  8
1191  1192 . 1  9
1191  1192 . 1 10
1191  1192 . 1 11
1191  1192 . 1 12
1192  1191 1 1  1
1192  1191 1 1  2
1192  1191 1 1  3
1192  1191 1 1  4
1192  1191 1 1  5
1192  1191 1 1  6
1192  1191 1 1  7
1192  1191 1 1  8
1192  1191 1 .  9
1192  1191 1 . 10
1192  1191 1 . 11
1192  1191 1 . 12
end

//  VERIFY THAT PARTNERS NEVER ACTIVELY DISAGREE ABOUT WHETHER
//  THEY ARE IN A RELATIONSHIP
assert inlist(marstat, 1, 2) == inlist(p_marstat, 1, 2) if !missing(marstat, p_marstat)

//  IDENTIFY OBSERVATIONS WHERE PERSON IS MARRIED OR DEFACTO
gen byte in_relationship = inlist(marstat, 1, 2) if !missing(marstat)
replace in_relationship = inlist(p_marstat, 1, 2) if missing(in_relationship)

//  IDENTIFY SPELLS OF THAT STATUS WITH SAME PARTNER
by id p_id (wave), sort: gen int spell_num = sum(in_relationship != in_relationship[_n-1])
replace spell_num = . if !in_relationship

//  CALCULATE DURATION OF ALL SPELLS OF A PAIRING
by id p_id: egen total_duration = count(spell_num)
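
To make the running-sum step concrete, here is a minimal sketch on made-up data (the four observations are invented for illustration and are not taken from the poster's data set). It applies the same commands as above and checks the result with -assert-.

```stata
* Toy data: one person, one partner, four waves (hypothetical values)
clear
input byte(id p_id wave in_relationship)
1 2 1 1
1 2 2 1
1 2 3 0
1 2 4 1
end

* A new run starts whenever the status differs from the previous observation.
* In the first observation of each group, in_relationship[_n-1] is missing,
* so the comparison is true and the running sum starts at 1.
by id p_id (wave), sort: gen int spell_num = sum(in_relationship != in_relationship[_n-1])
replace spell_num = . if !in_relationship

* Runs are numbered 1, 2, 3; run 2 (not in a relationship) is set to missing
assert spell_num == 1 in 1/2
assert missing(spell_num) in 3
assert spell_num == 3 in 4

* count() tallies nonmissing values, i.e. observed waves spent in a relationship
by id p_id: egen total_duration = count(spell_num)
assert total_duration == 3
```

Two things stand out in this sketch: the spell numbers of relationship runs need not be consecutive (1 and 3 here), and total_duration counts observed waves rather than elapsed waves, so a skipped wave adds nothing to it.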

I notice that in this data you have redundant data on couples: each couple appears once with id = A and p_id = B, and then (sometimes) again with id = B and p_id = A. You should consider eliminating that redundancy by verifying that the two parties never actively disagree about their status, and then keeping only one of the paired observations.


Chris Boulis

Join Date: Feb 2019

Posts: 355
#18

10 Mar 2020, 20:47

That seems to have solved the problem Clyde Schechter - Thank you very much!

I agree with your comment:

You should consider eliminating that redundancy by verifying that the two parties never actively disagree about their status, and then keeping only one of the paired observations.

The challenge is to remove repeated observations when they are the same for both id and p_id, unless either id/p_id have additional relationships. Based on sample data in #17, id_1075, 1188 p_1074 and p_1189 all appear to have had one relationship and as such could be dropped. My first stab at the code is:

Code:

bys id: drop if marstat==p_marstat & spell_no==1

I appreciate your help, regards, Chris

Last edited by Chris Boulis; 10 Mar 2020, 20:54.

Chris Boulis

Join Date: Feb 2019

Posts: 355
#19

14 Mar 2020, 20:43

Clyde Schechter. Why does "begin" fail (#16)? I would like to use this variable for survival analysis so appreciate help.

Clyde Schechter

Join Date: Apr 2014

Posts: 29906
#20

14 Mar 2020, 21:05

Your code for variable begin has three commands:

Code:

bys id (wave): gen byte begin = inlist(marstat, 1, 2) & marstat != marstat[_n-1]
bys id (wave): replace begin = inlist(p_marstat, 1, 2) if marstat == .
bys id (wave): gen byte spell = sum(begin)

The problem is that it fails to take p_marstat into account correctly. It looks as though it does, but it goes wrong if p_marstat is 1 or 2 in the preceding observation. Here's how I would do this:

Code:

gen byte is_a_couple = max(inlist(marstat, 1, 2), inlist(p_marstat, 1, 2))
by id (wave), sort: gen byte begin = is_a_couple & is_a_couple[_n-1] != 1
by id (wave): gen spell = sum(begin)

The first command creates a new variable, is_a_couple, which is 1 if either of the two persons says they are in a couple. We have already verified earlier in the code that they never explicitly disagree about that, though sometimes one or the other is missing. The second command says that a relationship begins when is_a_couple is 1 but the previous observation's value of is_a_couple is not. The third command is the usual calculation of spells by a running sum.
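
As an illustration of how these three commands behave, here is a small sketch on invented data (one hypothetical person observed in four waves; the values are assumptions chosen to exercise the missing-response case).

```stata
* Toy data (hypothetical): own response missing in wave 1, separated in wave 3
clear
input byte(id wave marstat p_marstat)
1 1 . 2
1 2 1 1
1 3 3 .
1 4 2 2
end

* inlist() returns 0 for a missing argument, so max() is 1 if either side reports 1 or 2
gen byte is_a_couple = max(inlist(marstat, 1, 2), inlist(p_marstat, 1, 2))
by id (wave), sort: gen byte begin = is_a_couple & is_a_couple[_n-1] != 1
by id (wave): gen spell = sum(begin)

assert is_a_couple == 1 in 1     // partner's response fills the gap
assert is_a_couple == 0 in 3
assert begin == 1 in 1           // is_a_couple[0] is missing, which is != 1
assert begin == 0 in 2/3
assert begin == 1 in 4           // new spell after the separated wave
assert spell == 1 in 1/3
assert spell == 2 in 4
```

Note that in this sketch spell stays at 1 in wave 3 even though the person is not in a couple there, and that, because the grouping is by id alone, a change of p_id while is_a_couple remains 1 would not set begin to 1.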

Chris Boulis

Join Date: Feb 2019

Posts: 355
#21

15 Mar 2020, 19:27

Thanks Clyde Schechter. However, 'begin' does not pick up when id changes p_id, as happens for 1075 & 1188 in #17. In such situations both id & p_id agree they are in a couple, but it's a different couple for id, and I haven't been able to figure out a solution. Your help is appreciated as always.

Clyde Schechter

Join Date: Apr 2014

Posts: 29906
#22

15 Mar 2020, 19:45

So, if I understand correctly what you want, you simply want a variable begin that marks the start of each spell of relationship. In that case, rather than trying to debug your code, I would say just go back to the code in #17 (which creates spell_num) and then follow that with

Code:

by id p_id spell_num (wave), sort: gen byte begin = (_n == 1) & !missing(spell_num)

Chris Boulis

Join Date: Feb 2019

Posts: 355
#23

15 Mar 2020, 23:35

Nice solution. I see, so to ensure we only keep the count of a spell to the same pair, we sort by both id and p_id and spell_num by wave. Thanks for your help Clyde Schechter.

Chris Boulis

Join Date: Feb 2019

Posts: 355
#24

26 Mar 2020, 05:06

Hi Clyde Schechter. I have found that my 'end' variable (code below) incorrectly shows '==1' for all ids in the last wave of the survey in which they participated (often the last published wave of data). I thought I could fix this by coding end==0 unless there is a change of relationship (in which case it would be followed with begin==1). I appreciate any help on how to address this.

Code:

bys xwaveid hhpxid spell (wave): gen byte end = _n == _N & inlist(mrcurr, 1, 2)
replace end = inlist(p_mrcurr, 1, 2) if mrcurr == .

Example data in #17 should still be ok to test this.

Chris Boulis

Join Date: Feb 2019

Posts: 355
#25

29 Mar 2020, 19:50

Hi Clyde Schechter. Even after looking over your code here and elsewhere, and Nick Cox's help files, I am still struggling with (1) missings and (2) the last wave of data. Rule: I want end = 1 if begin[_n+1] == 1 for the same couple. Hence end = 1 only if id or p_id changes partner.

Code:

bys id p_id spell (wave): gen byte end = begin[_n+1] & min(marstat, p_marstat)
replace end = inlist(p_marstat, 1, 2) if marstat == .

In theory the replace command should address the missings problem, but it doesn't. I used min() to deal with the (infrequent) case where one partner says they are in a couple (marstat==1 or 2) but the other says they are separated (marstat==3). Help very much appreciated.

Clyde Schechter

Join Date: Apr 2014

Posts: 29906
#26

29 Mar 2020, 20:54

I don't understand what you are trying to do. Your first command in #25 doesn't make sense to me. What is the -& min(marstat, p_marstat)- part about? Unless marstat and p_marstat are both missing values (which, at least in your example, in #17 never happens) the minimum of those two will always be at least 1, and hence evaluate as true. So it adds nothing at all to the logic. I'm not sure what it was intended to add, but it doesn't actually add anything.

If you are just trying to identify the end of a spell, it's very simple:

Code:

by id p_id spell (wave): gen byte end = (_n == _N)

By the way spells are calculated in the code, in any given spell it is always exactly the same partners.
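
A minimal sketch of this on invented data, with one hypothetical person who changes partner between waves 2 and 4. It assumes the spell variable is spell_num as built in #17 and uses -bysort- so that the data need not already be in order.

```stata
* Toy data (hypothetical): partner 2 in waves 1-2, partner 3 in waves 4-5
clear
input byte(id p_id wave in_relationship)
1 2 1 1
1 2 2 1
1 3 4 1
1 3 5 1
end

* Spell numbering as in #17: it restarts within each id/p_id pairing
by id p_id (wave), sort: gen int spell_num = sum(in_relationship != in_relationship[_n-1])
replace spell_num = . if !in_relationship

* First and last observation of each spell within a pairing
bysort id p_id spell_num (wave): gen byte begin = (_n == 1) & !missing(spell_num)
bysort id p_id spell_num (wave): gen byte end = (_n == _N)

sort id wave
assert begin == 1 & end == 0 in 1
assert begin == 0 & end == 1 in 2
assert begin == 1 & end == 0 in 3
assert begin == 0 & end == 1 in 4
```

Because p_id is part of the grouping, the change of partner starts a new spell. In this sketch the last observed wave (wave 5) is also flagged with end == 1, even though no separation is observed there.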

Chris Boulis

Join Date: Feb 2019

Posts: 355
#27

29 Mar 2020, 21:56

Thank you Clyde Schechter for your quick response. Did you mean for "by" to be "bys" as it didn't run as 'by'? Error r(5) "not sorted". I included my code FYI, though my rule (#25) explained what I was trying to do.

Rule: I want end = 1 if begin[_n+1] == 1 for the same couple. Hence end = 1 only if id or p_id changes partner.

That is, end is 1 only if a relationship ends. Yes, I was originally using this code, but it gave me the issues I have been grappling with in #25 regarding (1) missings and (2) the last wave of data.

Can you suggest code to solve these issues, please?

In survival analysis this latter issue will show up as right-censored data I believe so I need 'end' to work accurately as it will be my failure variable. I think that means I also have an issue with the first wave where I understand my data will show up as left-truncated (that is, already in a relationship in wave 1). Thank you in advance.

Last edited by Chris Boulis; 29 Mar 2020, 22:15.

Clyde Schechter

Join Date: Apr 2014

Posts: 29906
#28

29 Mar 2020, 22:24

Yes, I did mean -bys-. In that example data and code that I was working with the data were already properly sorted, so the command worked for me, but if you have done other things, then, yes, you need to sort again.

I don't know what you mean by "issues" with missings. Please explain what is going wrong. Show examples.

As for censoring, then I think what you want is:

Code:

// CHANGE VALUE OF END TO 0 IN FINAL SPELL OF A PARTNERSHIP IF
// THEY ARE STILL TOGETHER.
by id p_id (wave), sort: replace end = 0 if _n == _N & in_relationship == 1 // THIS VARIABLE WAS CREATED IN #17

Last edited by Clyde Schechter; 29 Mar 2020, 22:28.

Chris Boulis

Join Date: Feb 2019
Posts: 355

#29

30 Mar 2020, 01:23

Hi thank you Clyde Schechter. Ok, here's some data based on code in #26 & #28 (and throughout this thread). I no longer have the issue with missing (where id is missing, end=1), but as you can see 'end' should '=1' in each change of relationship, but doesn't.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(id p_id) byte(wave marstat p_marstat begin end)
179  180  1 2 . 1 0
179  180  3 2 . 0 0
179  180  5 1 1 0 0
179  180  6 1 1 0 0
179  180  8 1 1 0 0
179  180  9 1 1 0 0
179  180 10 1 1 0 0
179  180 11 1 2 0 0
179  180 12 1 2 0 0
180  179  1 . 2 1 0
180  179  3 . 2 0 0
180  179  5 1 1 0 0
180  179  6 1 1 0 0
180  179  8 1 1 0 0
180  179  9 1 1 0 0
180  179 10 1 1 0 0
180  179 11 2 1 0 0
180  179 12 2 1 0 0
180 146 14 2 . 1 0
186  864  8 2 . 1 0
186  864  9 2 . 0 0
186  864 10 2 . 0 0
186  864 11 1 . 0 0
186  864 12 1 . 0 0
186  864 13 1 . 0 0
186  864 14 1 . 0 0
186  864 15 1 . 0 0
186  864 16 1 . 0 0
186  864 17 1 . 0 0
186  864 18 1 . 0 0
188  189  1 1 1 1 0
188  189  2 1 1 0 0
188  189  3 1 1 0 0
188  189  4 1 1 0 0
188  189  5 1 1 0 0
188  189  6 1 1 0 0
188  189  7 1 1 0 0
188  189  8 1 1 0 0
188 740 11 2 2 1 0
188 740 12 2 2 0 0
188 740 13 2 2 0 0
188 131 17 2 2 1 0
188 131 18 2 2 0 0
189  188  1 1 1 1 0
189  188  2 1 1 0 0
189  188  3 1 1 0 0
189  188  4 1 1 0 0
189  188  5 1 1 0 0
189  188  6 1 1 0 0
189  188  7 1 1 0 0
189  188  8 1 1 0 0
116  279  2 2 2 1 0
116  279  3 2 2 0 0
116  279  4 2 2 0 0
116  279  5 1 1 0 0
116  279  6 1 1 0 0
116  279  7 1 1 0 0
116  279  8 1 1 0 0
116  279  9 1 1 0 0
116  279 10 1 1 0 0
116 1888 18 2 2 1 0
end

On the duplication issue you raised in #17, where p_id also shows up as id: my rule for this code is that if both members of a couple agree on status and have only one relationship in the data, then drop either one. But if id or p_id has another relationship, drop the data for the one that doesn't. Based on this data, drop id 179 & 189 - do you agree? If so, would the code in #18 work?


Clyde Schechter

Join Date: Apr 2014
Posts: 29906

#30

30 Mar 2020, 11:46

I see. I think this will do it:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(id p_id) byte(wave marstat p_marstat)
179  180  1 2 .
179  180  3 2 .
179  180  5 1 1
179  180  6 1 1
179  180  8 1 1
179  180  9 1 1
179  180 10 1 1
179  180 11 1 2
179  180 12 1 2
180  179  1 . 2
180  179  3 . 2
180  179  5 1 1
180  179  6 1 1
180  179  8 1 1
180  179  9 1 1
180  179 10 1 1
180  179 11 2 1
180  179 12 2 1
180  146 14 2 .
186  864  8 2 .
186  864  9 2 .
186  864 10 2 .
186  864 11 1 .
186  864 12 1 .
186  864 13 1 .
186  864 14 1 .
186  864 15 1 .
186  864 16 1 .
186  864 17 1 .
186  864 18 1 .
188  189  1 1 1
188  189  2 1 1
188  189  3 1 1
188  189  4 1 1
188  189  5 1 1
188  189  6 1 1
188  189  7 1 1
188  189  8 1 1
188  740 11 2 2
188  740 12 2 2
188  740 13 2 2
188  131 17 2 2
188  131 18 2 2
189  188  1 1 1
189  188  2 1 1
189  188  3 1 1
189  188  4 1 1
189  188  5 1 1
189  188  6 1 1
189  188  7 1 1
189  188  8 1 1
116  279  2 2 2
116  279  3 2 2
116  279  4 2 2
116  279  5 1 1
116  279  6 1 1
116  279  7 1 1
116  279  8 1 1
116  279  9 1 1
116  279 10 1 1
116 1888 18 2 2
end



//  VERIFY THAT PARTNERS NEVER ACTIVELY DISAGREE ABOUT WHETHER
//  THEY ARE IN A RELATIONSHIP
assert inlist(marstat, 1, 2) == inlist(p_marstat, 1, 2) if !missing(marstat, p_marstat)

//  IDENTIFY OBSERVATIONS WHERE PERSON IS MARRIED OR DEFACTO
gen byte in_relationship = inlist(marstat, 1, 2) if !missing(marstat)
replace in_relationship = inlist(p_marstat, 1, 2) if missing(in_relationship)

//  IDENTIFY SPELLS OF THAT STATUS WITH SAME PARTNER
by id p_id (wave), sort: gen int spell_num = sum(in_relationship != in_relationship[_n-1])
replace spell_num = . if !in_relationship

//  CALCULATE DURATION OF ALL SPELLS OF A PAIRING
by id p_id: egen total_duration = count(spell_num)

by id p_id spell (wave), sort: gen byte begin = (_n == 1)
by id (wave), sort: gen byte end = (p_id[_n+1] != p_id) & _n < _N
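
To illustrate the last command only, here is a short sketch on invented data (one hypothetical person, partner 2 followed by partner 3). It shows that end is set on the last observation before a change of partner and stays 0 on the person's final observation.

```stata
* Toy data (hypothetical): partner changes between wave 2 and wave 4
clear
input byte(id p_id wave)
1 2 1
1 2 2
1 3 4
1 3 5
end

* end = 1 when the next observation for this id has a different partner;
* the _n < _N condition keeps the person's final observation at 0
by id (wave), sort: gen byte end = (p_id[_n+1] != p_id) & _n < _N

assert end == 0 in 1
assert end == 1 in 2
assert end == 0 in 3/4
```

As far as this sketch goes, the rule keys on a change of p_id, so a separation that is not followed by a new partner in the data would not be flagged by it.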

Concerning the duplication: I would not work with a data set that is primarily about couples and has each couple listed twice, with different orders of the partners. At best it makes data management more complicated and wastes memory. At worst it opens up the potential for inconsistent information about the same partnership. So before we even get to the stage where your -dataex- begins, I would have gone through the data, verified that the information provided by the partners is consistent, resolved the inconsistency in some way where it is not, and then kept only one of the sets of data. Something along these lines:

Code:

gen byte in_relationship = inlist(marstat, 1, 2) if !missing(marstat)
replace in_relationship = inlist(p_marstat, 1, 2) if missing(in_relationship)
drop marstat p_marstat
gen partner1 = min(id, p_id)
gen partner2 = max(id, p_id)
by partner1 partner2 wave (in_relationship), sort: assert inlist(in_relationship, in_relationship[1], .)
by partner1 partner2 wave (in_relationship): replace in_relationship = in_relationship[1]
keep partner1 partner2 wave in_relationship
duplicates drop

That would leave you with one record per wave per pairing, with enforced consistency on the in_relationship value, but tolerating missing values of marstat or p_marstat. I think this is a better way to go.
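
A small sketch of these steps on invented data: one hypothetical couple, recorded from both sides in two waves, with one response missing in wave 1. As an assumption, the first command is restricted to nonmissing marstat (as in #17) so that the partner's response can fill the gap.

```stata
* Toy data (hypothetical): couple 1/2 listed twice per wave, once from each side
clear
input byte(id p_id wave marstat p_marstat)
1 2 1 2 .
2 1 1 . 2
1 2 2 1 1
2 1 2 1 1
end

gen byte in_relationship = inlist(marstat, 1, 2) if !missing(marstat)
replace in_relationship = inlist(p_marstat, 1, 2) if missing(in_relationship)
drop marstat p_marstat

* Order-free couple identifier: the smaller id always goes first
gen partner1 = min(id, p_id)
gen partner2 = max(id, p_id)

* Within a couple-wave, all nonmissing reports must agree
by partner1 partner2 wave (in_relationship), sort: assert inlist(in_relationship, in_relationship[1], .)
by partner1 partner2 wave (in_relationship): replace in_relationship = in_relationship[1]
keep partner1 partner2 wave in_relationship
duplicates drop

* One record per couple per wave remains
assert _N == 2
isid partner1 partner2 wave
assert in_relationship == 1
```

The min()/max() pair is what makes the two orderings of a couple collapse to the same key, so that -duplicates drop- can remove the redundant copy.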
