Time dummies to estimate the length of relationships

Chris Boulis

Join Date: Feb 2019
Posts: 355

#46

17 Nov 2020, 18:56

Hi Clyde Schechter. I was going to use 'seq' as my analysis time variable; however, after -stset-ing, Stata identified 732 missing values for seq.

Code:

                id:  couple
     failure event:  end == 1
obs. time interval:  (seq[_n-1], seq]
 enter on or after:  begin==1
 exit on or before:  time .

------------------------------------------------------------------------------
     82,438  total observations
        732  event time missing (seq>=.)                        PROBABLE ERROR
      8,249  observations end on or before enter()
------------------------------------------------------------------------------
     73,457  observations remaining, representing
      8,233  subjects
        619  failures in multiple-failure-per-subject data
     73,597  total analysis time at risk and under observation
                                                at risk from t =         0
                                     earliest observed entry t =         1
                                          last observed exit t =        18

The problem appears to occur whenever 'in_relationship' == 0, which in turn sets both 'spell_num' and 'seq' to missing. This happens when at least one partner in a couple selects a category other than 'de facto' or 'married' as their marital status. The most significant case, however, is when BOTH partners do not answer the marital status question, which appears to account for 602 of the 732 missing values.

Two key points:
- couple (50 51) answered de facto, but in wave 10, id-50 states "divorced" - this triggers the code to identify the 'end' of a 'spell', which is not the case - this is an issue as I use 'end' as the 'failure' variable ('seq' continues to count the sequence correctly).
- couple (12 24) didn't answer the marital status question. Given this group accounts for the largest share of the issue, do you see an issue if I remove them from my analysis?

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(id p_id) byte(wave in_relationship) int spell_num byte seq float length byte(begin end mrcurr1 mrcurr2)
17  18  5 1 1  5  6 0 0 . 1
17  18  6 1 1  6  6 0 1 . 1
17  18  7 0 .  .  6 1 0 . 3
12  24  1 1 1  1  2 1 0 1 1
12  24  4 1 1  2  2 0 1 1 1
12  24  6 0 .  .  2 1 0 . .
12  24  7 0 .  .  2 0 0 . .
12  24  8 0 .  .  2 0 0 . .
12  24  9 0 .  .  2 0 0 . .
12  24 10 0 .  .  2 0 0 . .
12  24 11 0 .  .  2 0 0 . .
12  24 12 0 .  .  2 0 0 . .
50 51  6 1 1  6 11 0 0 2 2
50 51  7 1 1  7 11 0 0 . 2
50 51  8 1 1  8 11 0 1 . 2
50 51 10 0 .  . 11 1 0 4 2
50 51 11 1 3 10 11 1 0 2 2
50 51 12 1 3 11 11 0 0 2 4
50 51 13 1 3 12 11 0 0 2 3
end

Given these issues, do you think 'seq' is suitable as the analysis time variable (stset seq,) or would 'wave' or 'couple wave' (where couple is group(id p_id)) be better?

Code:

bys couple (wave): gen byte cwave = _n

Your thoughts are appreciated.

Last edited by Chris Boulis; 17 Nov 2020, 19:02.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 29906
#47

17 Nov 2020, 19:20

This doesn't sound like a question I can advise you about. Your "time" variable is intended to measure the duration of a couple's relationship. Your problem, as you describe it, arises primarily when the people in the couple do not indicate whether they are still in a relationship, or perhaps if they give discordant answers. Whether to treat this situation as a continuing relationship, an ended relationship, or unanalyzable (which would justify dropping the observation) is really a substantive question which is not in my domain of knowledge. You have to answer this question based on your understanding of the underlying science and what your research questions are. If you don't feel comfortable doing that yourself, you should consult with others who have expertise in this field. It's really not a statistical question.
1 like
Comment
Chris Boulis

Join Date: Feb 2019

Posts: 355
#48

17 Nov 2020, 21:01

Thank you for your reply Clyde Schechter. I understand your point and I appreciate your comments. I think it's appropriate to drop the relevant observations as they are few and problematic.

Will this code ensure I only drop the obs for those couples where both partners did not respond to the marital status question?

Code:

bys id p_id (wave): drop if missing(mrcurr1, mrcurr2)

where mrcurr1 is the male partner's marital status and mrcurr2 is female partner's marital status.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29906
#49

17 Nov 2020, 21:31

No, that will drop the observation if either one doesn't respond. What you want is:

Code:

drop if missing(mrcurr1) & missing(mrcurr2)

(And you don't need the -bys id p_id (wave)- part for this one--doing it by couple makes no difference in this case.)
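
As a minimal sketch of the difference between the two conditions, the following uses a made-up four-row dataset (one row per response pattern, not taken from the thread's data). With these rows the first count should be 3 and the second 1.

```stata
* Made-up data: one row per possible response pattern
clear
input byte(mrcurr1 mrcurr2)
1 1
. 1
1 .
. .
end

* missing(a, b) is true when ANY argument is missing
count if missing(mrcurr1, mrcurr2)

* this condition is true only when BOTH are missing
count if missing(mrcurr1) & missing(mrcurr2)
```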
1 like
Comment
Chris Boulis

Join Date: Feb 2019

Posts: 355
#50

18 Nov 2020, 16:45

Thank you Clyde Schechter. After posting, I was wondering if it would need "&". Ok, that makes sense as I'm making it a condition that this is missing for both in a couple. As always your advice is very much appreciated.

Lastly, I would like to remove those couples that only appear in the panel for one or two waves as this is too brief to be of value to my research. Could I remove them with this code?

Code:

bys id p_id (wave): drop if _N <= 2

Last edited by Chris Boulis; 18 Nov 2020, 16:56.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29906
#51

18 Nov 2020, 17:47

Yes.
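
A small sketch of why this works, on made-up data with one two-wave couple and one three-wave couple: under -by-, _N is the number of observations in the group and is the same on every row, so all rows of a short couple are dropped together. Note that _N counts the rows present when the command runs, so if it is run after other observations have been dropped, only the surviving waves are counted.

```stata
* Made-up data: couple (1 2) has two waves, couple (3 4) has three
clear
input long(id p_id) byte wave
1 2 1
1 2 2
3 4 1
3 4 2
3 4 3
end

* _N is constant within each couple, so whole couples are dropped
bys id p_id (wave): drop if _N <= 2
list, sepby(id p_id)
```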
1 like
Comment
Chris Boulis

Join Date: Feb 2019

Posts: 355
#52

18 Nov 2020, 17:55

Thank you Clyde Schechter
Comment
Chris Boulis

Join Date: Feb 2019

Posts: 355
#53

25 Nov 2020, 22:20

Hi Clyde Schechter. I have just identified an issue that is impacting on the code for end/begin.

In the following data for couple no.2392, you can see 'begin==1' five times and 'end==1' three times when we should only see 'begin==1' in wave 1 and 'end==1' should not occur as they do not experience failure in the survey period (18 waves), making it right-censored. (code in #32)

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(id p_id) float couple byte(wave cwave begin end mrcurr1 mrcurr2)
153 159 2392  1  1 1 1 1 1
153 159 2392  2  2 1 0 . .
153 159 2392  3  3 1 0 . 1
153 159 2392  4  4 0 1 1 1
153 159 2392  5  5 0 0 . .
153 159 2392  6  6 0 0 . .
153 159 2392  7  7 0 0 . .
153 159 2392  8  8 0 0 . .
153 159 2392  9  9 1 0 . 1
153 159 2392 10 10 0 1 . 1
153 159 2392 12 11 0 0 . .
153 159 2392 13 12 1 0 . 1
153 159 2392 15 13 0 0 1 1
153 159 2392 16 14 0 0 . 1
end

I believe the trigger for these errors occurs when both in the couple do not report their marital status (mrcurr1 mrcurr2), which occurs in wave 2, 5, 6, 7, 8 & 12. As a result, 'end==1' in wave 1 and 'begin==1' in wave 2. 'begin==1' in wave 3 due to a response by one of the couple in wave 3. The missing responses in wave 5 triggers 'end==1' in wave 4. As neither respond until wave 9, only then does 'begin==1'. The non-response in wave 12 triggers 'end==1' in wave 10 (wave 11 is missing), with 'begin==1' in wave 13.

An issue also arises if both report a marital status that is neither 'de facto' nor 'married' (i.e. !inlist(mrcurr1, 1, 2)). For example, id-105 selected mrcurr==6 "single and not de facto" even though 'de facto' was entered in the waves before and after.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(id p_id) float couple byte(wave cwave begin end mrcurr1 mrcurr2)
105 708 3  7 1 1 1 2 2
105 708 3 14 2 1 0 6 2
105 708 3 15 3 1 0 2 2
end

I think this may be addressed without touching the code in #32. I think that missing values (and responses other than inlist(mrcurr, 1, 2)) could be replaced with previous values, unless one of the two obtains a new partner. Would this work?

Code:

bys couple (wave): replace mrcurr1 = mrcurr1[_n-1] if missing(mrcurr1) | !inlist(mrcurr1, 1, 2)
bys couple (wave): replace mrcurr2 = mrcurr2[_n-1] if missing(mrcurr2) | !inlist(mrcurr2, 1, 2)

where couple = couple id

Code:

egen couple = group(id p_id) if !missing(id, p_id), label

I really appreciate your thoughts/comments.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29906
#54

27 Nov 2020, 10:27

Well, as I have said in many contexts, missing data is a problem for which there is no good solution, only not-so-bad, bad, and terrible ones. The decision how to deal with missing data is best made through thinking through the process in the real world that generates the data and generates the missingness. Your area of research is one in which I have no expertise. I can only offer a layman's perspective guided by whatever wisdom my age confers upon me.

Why are people not reporting their relationship status? Some of this has nothing to do with relationships themselves. Perhaps on some versions of the questionnaire it is located inconveniently or is hard to see and easy to skip over by accident. Sometimes people are just distracted while doing a survey, interrupt it, and then resume in the "wrong place." But relationships are difficult to define. Sure, legally defined relationship status like marriage is clear cut: you are married if you have gone through the necessary legal requirements, and you remain so until either your spouse dies or a court decrees your marriage ended. But you are also asking about marriage-like relationships, which people can enter and leave pretty much at will. People might still be cohabiting but feel like things are coming to an end between them. Or the other way around. And such relationships can appear to dissolve, only to resume again some time later, perhaps more permanently the next time, but perhaps not. One might be having "an affair" that the other may or may not know about. There is a reason that social media sites offer "it's complicated" as a response option to questions about relationship status!

I think your proposal to carry forward the previous response is one reasonable approach. But it should be done with the understanding that it will get it right much of the time, but will also certainly result in some degree of misclassification. Another approach might be to simply drop observations where neither partner reports a status, and, if the two partners report conflicting status, always go with "together", or always go with "not together." Actually, depending on how much time and effort you are able to devote to this project, I would probably attempt all three of these approaches (and maybe even think of a few others) and see whether your main conclusions are robust to changes in these treatments of missing or conflicting information.
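
The three treatments described above could be set up side by side along the following lines. This is only a sketch on made-up data: the variable names partnered1, partnered2, inrel_any, inrel_all and inrel_cf are hypothetical, and it assumes, as elsewhere in this thread, that codes 1 and 2 of mrcurr1/mrcurr2 mean married or de facto.

```stata
* Made-up couple: wave 2 has no answers, wave 3 has conflicting answers
clear
input long(id p_id) byte(wave mrcurr1 mrcurr2)
1 2 1 2 2
1 2 2 . .
1 2 3 4 2
1 2 4 2 2
end

* each partner's own report: 1 = married/de facto, 0 = other, . = no answer
gen byte partnered1 = inlist(mrcurr1, 1, 2) if !missing(mrcurr1)
gen byte partnered2 = inlist(mrcurr2, 1, 2) if !missing(mrcurr2)

* conflicts resolved as "together"; max() ignores a missing argument
* and returns missing only when both arguments are missing
gen byte inrel_any = max(partnered1, partnered2)

* conflicts resolved as "not together"; min() treats missing the same way
gen byte inrel_all = min(partnered1, partnered2)

* carry the previous status forward when neither partner answered
* (this variant starts from inrel_any, which is itself an assumption)
gen byte inrel_cf = inrel_any
bys id p_id (wave): replace inrel_cf = inrel_cf[_n-1] if missing(inrel_cf) & _n > 1

list, sepby(id p_id)
```

For the two variants without carry-forward, the rows where neither partner answered remain missing and can be dropped with -drop if missing(inrel_any)- before building the spell variables. The main analysis can then be rerun once per coding to see whether the conclusions change.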

I don't know what other information is available in your survey data, but it might also be the case that the status for missing responses can be (probabilistically) inferred from the responses to other survey items. In that case, multiple imputation might be a reasonable approach.

This is probably not a very satisfying answer. If I had expertise in this domain, I might be able to provide more tailored, specific advice. And a good literature review and discussion with somebody else in your field might well be fruitful here. But even then, I imagine that there is no one best solution to this dilemma and that you would need to try at least a few approaches to see if the conclusions really depend on the specific way you handle this problem. With luck, they will not. But if they do, that is an important limitation to your findings that you should discuss whenever you present your research.
1 like
Comment
Chris Boulis

Join Date: Feb 2019

Posts: 355
#55

30 Nov 2020, 16:25

Thank you Clyde Schechter. I really appreciate your very helpful advice and feedback. I will take on board your suggestions regarding approach. Kind regards, Chris.
Comment
Chris Boulis

Join Date: Feb 2019

Posts: 355
#56

05 Dec 2020, 02:24

Hi Clyde Schechter. Can you advise how I could test that "end==1" occurs only due to relationship failure and not for other reasons, such as missing waves or where the reported marital status is !inlist(mrcurr1, 1, 2)? I tried using -browse- (but this doesn't give me what I want), e.g.

Code:

br id p_id wave end mrcurr1 mrcurr2 if end[_N] >= 1 & p_id != p_id[_n-1]

Also, I am trying to figure out how to treat couples that are not present in all waves in my effort to count the duration of a given relationship. At present I have a sequence variable:

Code:

bys id p_id (wave): gen byte seq = _n if !missing(spell_num)

and "total duration_this_pair" variable

Code:

bys id p_id spell_num: gen spell_duration = wave[_N] - wave[1] + 1
by id p_id: egen total_duration_this_pair = total(cond(spell_num != spell_num[_n-1] & in_relationship, spell_duration, .))

The latter gives negative values and seq is at times not reliable (e.g. with missing waves, and when a person reunites with a former partner after having a brief relationship with another person, it continues the count rather than restarting it). I appreciate your help. Kind regards, Chris

Last edited by Chris Boulis; 05 Dec 2020, 02:31.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29906
#57

05 Dec 2020, 10:04

It has been a while since we dealt with this aspect of your data and I no longer remember much about that. I no longer recall how variables like end and spell_num were generated.
Comment

Chris Boulis

Join Date: Feb 2019
Posts: 355

#58

05 Dec 2020, 18:27

Yes of course. My apologies Clyde Schechter. Here's the code for these variables:

Code:

* in a relationship
gen byte in_relationship  = inlist(mrcurr1, 1, 2) if !missing(mrcurr1) 
replace in_relationship  = inlist(mrcurr2, 1, 2) if missing(in_relationship)

* spell number
bys id p_id (wave): gen int spell_num = sum(in_relationship != in_relationship [_n-1])
replace spell_num = . if !in_relationship 

* end
bys id (wave): gen byte end = ((p_id[_n+1] != p_id) | (spell_num[_n+1] != spell_num))& _n < _N

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 29906
#59

05 Dec 2020, 20:39

OK, thanks. So, I think what you probably want to do to see if the end variable is working the way you want is:

Code:

sort id p_id wave
browse id p_id wave mrcurr1 mrcurr2 if inlist(1, end[_n-1], end, end[_n+1])

That will show you every observation where end = 1 along with the observations immediately preceding and following. That should enable you to inspect the data.
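
One possible variation, sketched on made-up data: because end was created with -bys id (wave)- in #58, the neighbouring rows can also be picked out within each person, so that the row before or after never belongs to a different id. The flag name near_end is hypothetical, and -list- is used here in place of -browse- only so the result prints in the Results window.

```stata
* Made-up data: person 1 changes partner after wave 3, person 7 does not
clear
input long(id p_id) byte(wave end)
1 2 1 0
1 2 2 0
1 2 3 1
1 5 4 0
1 5 5 0
7 8 1 0
7 8 2 0
end

* under -by-, [_n-1] and [_n+1] are missing beyond the person's own rows,
* so the flag never looks across persons
bys id (wave): gen byte near_end = inlist(1, end[_n-1], end, end[_n+1])

list id p_id wave end if near_end, sepby(id)
```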
1 like
Comment

Chris Boulis

Join Date: Feb 2019
Posts: 355

#60

06 Dec 2020, 04:16

Thank you Clyde Schechter. That is very helpful.

Regarding the two variables noted in #56, I would like to measure the length of time a couple is together using either "total_duration_this_pair" OR "seq". The former is rarely correct; seq is mostly correct, but continues a count with an old partner after reuniting, even if having a new partner during their 'separation'. I would appreciate it if you can identify a fix to either of these to ensure an accurate count of the total time a couple is together.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(id p_id) byte(wave mrcurr1 mrcurr2 begin end) float(spell_duration total_duration_this_pair) byte seq
100001  100002  1 1 1 1 0  0  0  1
100001  100002  2 1 1 0 0  0  0  2
100001  100002  3 1 1 0 0  0  0  3
100001  100002  4 1 1 0 0  0  0  4
100003  100004  1 1 1 1 0  0  0  1
100003  100004  2 1 1 0 0  0  0  2
100003  100004  3 1 1 0 0  0  0  3
100003  100004  4 1 1 0 0  0  0  4
100006 1000842 10 2 2 1 0 -2 -2  1
100006 1000842 11 2 2 0 0 -2 -2  2
100006 1000842 12 2 2 0 0 -2 -2  3
100006 1000842 13 2 2 0 0 -2 -2  4
100006 1000842 14 2 2 0 0 -2 -2  5
100006 1000842 15 1 1 0 0 -2 -2  6
100006 1000842 16 1 1 0 0 -2 -2  7
100006 1000842 17 1 1 0 0 -2 -2  8
100006 1000842 18 1 1 0 0 -2 -2  9
100007 1106359 11 2 2 1 0  1  1  1
100008  100009  1 1 1 1 0  0  0  1
100008  100009  2 1 1 0 0  0  0  2
100008  100009  3 1 1 0 0  0  0  3
100008  100009  4 1 1 0 0  0  0  4
100008  100009  5 1 1 0 0  0  0  5
100008  100009  6 1 1 0 0  0  0  6
100008  100009  7 1 1 0 0  0  0  7
100011  600231  6 2 2 1 0 -2 -2  1
100011  600231  7 2 2 0 0 -2 -2  2
100011  600231  8 2 2 0 0 -2 -2  3
100011  600231  9 2 2 0 0 -2 -2  4
100016  200179  2 2 2 1 0  3  3  1
100016  200179  3 2 2 0 0  3  3  2
100016  200179  4 2 2 0 0  3  3  3
100016  200179  5 1 1 0 0  3  3  4
100016  200179  6 1 1 0 0  3  3  5
100016  200179  7 1 1 0 0  3  3  6
100016  200179  8 1 1 0 0  3  3  7
100016  200179  9 1 1 0 0  3  3  8
100016  200179 10 1 1 0 1  3  3  9
100016 1800788 18 2 2 1 0  1  1  1
end

Thank you in advance.
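
One likely source of the negative durations is that -bys id p_id spell_num:- does not fix the order of waves within a spell, so wave[1] is not necessarily the earliest wave. The sketch below, on made-up data, sorts by wave inside each spell, restarts seq at every new spell, and counts each spell once when totalling. The helper name first_in_spell is hypothetical. It assumes spell_num already separates the spells of interest; a reunion with no intervening rows for the pair would still need a separate rule, and whether waves missing inside a spell should count toward the duration remains a substantive choice.

```stata
* Made-up pair: spell 1 covers waves 1-2, spell 3 covers waves 4-6
clear
input long(id p_id) byte(wave spell_num)
1 2 1 1
1 2 2 1
1 2 3 .
1 2 4 3
1 2 5 3
1 2 6 3
end

* sort by wave inside each spell so wave[1] is the first wave, wave[_N] the last
bys id p_id spell_num (wave): gen spell_duration = wave[_N] - wave[1] + 1 if !missing(spell_num)

* restart the within-spell counter at 1 for every new spell
by id p_id spell_num: gen byte seq = _n if !missing(spell_num)

* tag one row per spell (rows with missing spell_num are not tagged),
* then add up the tagged durations for the pair
egen byte first_in_spell = tag(id p_id spell_num)
bys id p_id: egen total_duration_this_pair = total(first_in_spell * spell_duration)

list, sepby(id p_id)
```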
