Time dummies to estimate the length of relationships

Chris Boulis

Join Date: Feb 2019

Posts: 355
#1

09 Feb 2020, 04:47

I am working with 18 waves of panel data (HILDA) and in each wave, respondents state their marital status (details below). I want to estimate the length of relationships for those in my data. I am not sure how to do this, but understand I should use time dummies.

I have had help with code which uses dummies to identify a change in relationship status, such as from single to married. I have similar code for other relationship changes (i.e. single to de facto, de facto to married, de facto to separated, married to separated, separated to divorced, etc). However, this code is clumsy and long-winded, and I feel there is a more direct way to write it.

I would appreciate help to track a relationship from start to finish to be able to measure the length of relationships (in years).

Code:

sort id wave
cap drop droppout
gen droppout = mi(marstat) // marital status
cap drop if_*
gen if_sin_to_mar = 0 if droppout == 0
bys id (wave): replace if_sin_to_mar = 1 if marstat==1 & 1.marstat== 6 & 1.mrcurr != . & mrcurr != .
bys id (wave): egen if_sin_to_mar_N = sum(if_sin_to_mar) if droppout == 0
// not sure if this helps?
gen timeline_sin_mar = .
replace timeline_sin_mar = 0 if if_sin_to_mar == 1
bys id (wave): replace timeline_sin_mar = l.timeline_sin_mar + 1 if timeline_sin_mar == .
bys id (nwave): replace timeline_sin_mar = l.timeline_sin_mar - 1 if timeline_sin_mar == .

Below is the detail on the marital status variable: marstat
Clyde Schechter

Join Date: Apr 2014

Posts: 29899
#2

09 Feb 2020, 13:35

What you are calling relationships are, if I understand correctly, spells during which variable marstat does not change. You can do this very simply with:

Code:

by id (wave), sort: gen int spell_num = sum(marstat != marstat[_n-1])
by id spell_num (wave), sort: gen duration = _N

This will calculate the duration of all "relationships" each person undergoes, in units of wave. If you only want this for relationships like married and defacto, simply replace the duration with missing value when the marstat variable is not 1 or 2.
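To see what those two lines produce, here is a minimal sketch on made-up data. The id, waves and values are invented for illustration, and I am assuming marstat is coded 1 = married, 2 = de facto and 6 = single, as the examples in this thread suggest.

```stata
* Invented example: one person who is single for 2 waves,
* de facto for 2 waves, then married for 3 waves.
clear
input long id byte(wave marstat)
1 1 6
1 2 6
1 3 2
1 4 2
1 5 1
1 6 1
1 7 1
end

* A new spell starts whenever marstat differs from the previous wave
by id (wave), sort: gen int spell_num = sum(marstat != marstat[_n-1])

* Duration of each spell = number of observed waves in it
by id spell_num (wave), sort: gen duration = _N

* Keep durations only for married (1) or de facto (2) spells
replace duration = . if !inlist(marstat, 1, 2)

list, sepby(id spell_num) noobs
```

Note that under this definition the de facto and married waves are separate spells, because marstat changes between them.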

If you want the duration in units of time instead of in units of wave, you need some variable that gives the date of each wave. If that variable is called wave_date, then the code is:

Code:

by id spell_num (wave), sort: gen duration_time = wave_date[_N] - wave_date[1]
Chris Boulis

Join Date: Feb 2019

Posts: 355
#3

09 Feb 2020, 18:05

Thank you very much Clyde Schechter.

A 'spell' where marstat does not change works for those that move from 'single to married' or 'single to de facto', but I also want to calculate the duration of those that move from 'single to de facto' then later marry (de facto to married). And also those couples that separate and later reunite after a "break" - as such, duration would end at separation, and continue if the couple reunites.

My guess is that the code will need to include the IDs for both members of a couple - respondent (id) and partner (p_id) to correctly identify these movements (my apologies for not including this earlier). To give you a feel for the data, I provide the following of respondent #1008, who has three relationships (and 'missings' in between).

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(id p_id) byte(wave marstat p_marstat)
1008  1089  1 1 1
1008  1089  2 1 1
1008  1089  3 1 1
1008  1089  4 1 1
1008  1089  5 1 1
1008  1089  6 1 1
1008  1089  7 1 1
1008  1089  8 1 1
1008 11040 11 2 2
1008 11040 12 2 2
1008 11040 13 2 2
1008 17016 17 2 2
1008 17016 18 2 2
end
Clyde Schechter

Join Date: Apr 2014

Posts: 29899
#4

09 Feb 2020, 19:45

So, what are you saying here? Do you want this to count as three relationships of duration 8, 3, and 2? Or as a single relationship of duration 13 waves? Or is it a single relationship with duration 18 waves (starting at 1 and ending at 18)?

I don't see how the partner variables are relevant here: if they are married it should be 1 for both of them, and if they are defacto, it should be 2 for both of them. It shouldn't really contribute any new information. If they are not the same, that has to be a data error, right?

Last edited by Clyde Schechter; 09 Feb 2020, 20:12.
Chris Boulis

Join Date: Feb 2019

Posts: 355
#5

09 Feb 2020, 22:46

Hi Clyde Schechter. Each time the respondent changes partner reflects a new relationship. In this example in #3, the respondent had three relationships and it should count as such. My understanding is the code in #2 will count that as three separate relationships - the first with a duration of 8 years, the second of 3 years and the third of 2 years. I am not sure if it will capture ongoing relationships that include changes in marital status as noted in #3.

For example,
(1) where a couple moves from single to de facto and then de facto to married (same couple IDs) and
(2) to capture the duration of couples that are married, separate, then get back together again (not counting the time apart).

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(xwaveid hhpxid) byte(wave mrcurr p_mrcurr)
1188 1189  1 2 2
1188 1189  2 2 2
1188 1189  3 2 2
1188 1189  4 2 2
1188 1189  5 1 1
1188 1189  6 1 1
1188 1189  7 1 1
1188 1189  8 1 1
1188 1189  9 1 1
1188 1189 10 1 1
1188 1189 11 1 1
1188 1189 12 1 1
1188 1189 13 1 1
end

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(xwaveid hhpxid) byte(wave mrcurr p_mrcurr)
1191 1192  1 1 1
1191 1192  2 1 1
1191 1192  3 1 1
1191 1192  4 1 1
1191 1192  5 1 1
1191 1192  6 3 3
1191 1192  7 3 3
1191 1192  8 3 3
1191 1192  9 1 1
1191 1192 10 1 1
1191 1192 11 1 1
1191 1192 12 1 1
end

I see your point regarding the value of the partner IDs and you are correct. If the respondent or partner were out of step it could be due to one thinking they were 'on a break' but the other could have moved on and be in a de facto relationship. But I guess that may be a rare case.

Last edited by Chris Boulis; 09 Feb 2020, 22:52.
Clyde Schechter

Join Date: Apr 2014

Posts: 29899
#6

10 Feb 2020, 13:16

I'm sorry, but I still don't understand how you want to handle these various situations. And your latest data confuses me still more: what happened to the marstat variable? And what do the variables xwaveid, mrcurr, and p_mrcurr represent? They have not been in your earlier data sets, and you have not explained what they are.
Chris Boulis

Join Date: Feb 2019

Posts: 355
#7

10 Feb 2020, 19:55

Hi Clyde Schechter. My apologies - see new example data below. My understanding of the code in #2 is that it won't capture relationships where there are changes in marital status. How could the code accommodate the following scenarios?

For example,
(Data #1) where couples are de facto then later on get married - I want to count this as a single relationship or a single 'spell' (sum duration where marstat==2 & marstat==1 where their IDs do not change) and
(Data #2) where couples separate (de facto or married) and then reunite (in the future) - I want to count their total time together (sum where marstat==1 in the first 'spell' & marstat==1 in the second 'spell' - ignoring the time where marstat==3).

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(id p_id) byte(wave marstat p_marstat)
1008 1089  1 2 2
1008 1089  2 2 2
1008 1089  3 2 2
1008 1089  4 2 2
1008 1089  5 2 2
1008 1089  6 1 1
1008 1089  7 1 1
1008 1089  8 1 1
1008 1089  9 1 1
1008 1089 10 1 1
1008 1089 11 1 1
1008 1089 12 1 1
1008 1089 13 1 1
end

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(id p_id) byte(wave marstat p_marstat)
1091 1192  1 1 1
1091 1192  2 1 1
1091 1192  3 1 1
1091 1192  4 1 1
1091 1192  5 1 1
1091 1192  6 1 1
1091 1192  7 1 1
1091 1192  8 3 3
1091 1192  9 3 3
1091 1192 10 1 1
1091 1192 11 1 1
1091 1192 12 1 1
end

Last edited by Chris Boulis; 10 Feb 2020, 20:03.

Clyde Schechter

Join Date: Apr 2014
Posts: 29899
#8

10 Feb 2020, 21:00

Now I get it. I believe the following code does what you want. Note that it calculates duration in units of wave, not years.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(id p_id) byte(wave marstat p_marstat)
1008  1089  1 2 2
1008  1089  2 2 2
1008  1089  3 2 2
1008  1089  4 2 2
1008  1089  5 2 2
1008  1089  6 1 1
1008  1089  7 1 1
1008  1089  8 1 1
1008  1089  9 1 1
1008  1089  10 1 1
1008  1089  11 1 1
1008  1089  12 1 1
1008  1089  13 1 1
1091  1192  1 1 1
1091  1192  2 1 1
1091  1192  3 1 1
1091  1192  4 1 1
1091  1192  5 1 1
1091  1192  6 1 1
1091  1192  7 1 1
1091  1192  8 3 3
1091  1192  9 3 3
1091  1192 10 1 1
1091  1192 11 1 1
1091  1192 12 1 1
end

//  IDENTIFY OBSERVATIONS WHERE PERSON IS MARRIED OR DEFACTO
gen byte in_relationship = inlist(marstat, 1, 2)

//  VERIFY THAT THE PARTNER AGREES THEY ARE IN RELATIONSHIP
assert in_relationship == inlist(p_marstat, 1, 2)

//  IDENTIFY SPELLS OF THAT STATUS WITH SAME PARTNER
by id p_id (wave), sort: gen int spell_num = sum(in_relationship != in_relationship[_n-1])
replace spell_num = . if !in_relationship

//  CALCULATE DURATION OF ALL SPELLS OF A PAIRING
by id p_id: egen total_duration = count(spell_num)

Moreover, if the same id has, at different times, relationships with different partners, the code will calculate these separately: the calculated duration is an attribute of the id p_id pair.

I realize you are just concerned that it would be odd, but I am more concerned that it reflects a data error if the id and partner do not agree on their marstat, at least to the extent of agreeing on whether they are in a relationship. So I have included an -assert- command to verify that this is so. If there are exceptions, the code will halt with an error message and you will need to identify the offending observations and somehow reconcile their differences (pun intended).
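If the -assert- does fail, one way to see the offending observations is to flag them before asserting. This is only a minimal sketch on invented data; the variable name disagree is my own and is not part of the code above.

```stata
* Invented example: wave 1 has a missing marstat for the respondent,
* wave 3 has partners who disagree about being in a relationship.
clear
input long(id p_id) byte(wave marstat p_marstat)
180 179 1 . 2
180 179 2 1 1
180 179 3 1 3
end

gen byte in_relationship = inlist(marstat, 1, 2)

* Flag observations that would make the -assert- fail.
* A missing marstat counts as "not in a relationship" here,
* because inlist(., 1, 2) evaluates to 0.
gen byte disagree = in_relationship != inlist(p_marstat, 1, 2)

count if disagree
list id p_id wave marstat p_marstat if disagree, noobs
```

Listing the flagged rows shows whether the contradictions come mostly from missing values or from genuine disagreement, which may affect how you choose to reconcile them.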

Chris Boulis

Join Date: Feb 2019

Posts: 355
#9

11 Feb 2020, 02:33

Awesome Clyde Schechter. So glad you got what I meant. I'll play with this in the morning and let you know how I go. Thank you!!
Chris Boulis

Join Date: Feb 2019

Posts: 355
#10

11 Feb 2020, 19:54

Hi Clyde Schechter. Thanks again for your help with this. Just to check my understanding is correct for the following:

Code:

by id p_id (wave), sort: gen int spell_num = sum(in_relationship != in_relationship[_n-1])
replace spell_num = . if !in_relationship

calculates a relationship. If I "tab spell_num" (see below), would I interpret this as the frequencies of the number of couples that have between 1 and 9 relationships? That 95.72% of couples have had only one relationship (during the life of the survey period)?

Code:

by id p_id: egen total_duration = count(spell_num)

provides the length of each relationship. "tab total_duration" provides a table of the frequencies of the duration of relationships (see below). For example, 5.62% of couples (represented in the table) have a relationship that lasts 1 year. Is that interpretation correct?

I ran the code and Stata reported:

Code:

. assert in_relationship == inlist(p_mrcurr, 1, 2)
17,294 contradictions in 176,564 observations
assertion is false
r(9);

This is likely because of inconsistencies in the reported marstat entries of the partners or where one partner's entry to marstat is 'missing'. The following data example highlights both of these:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long(id p_id) byte(wave marstat p_marstat)
180 179  1 . 2
180 179  3 . 2
180 179  5 1 1
180 179  6 1 1
180 179  8 1 1
180 179  9 1 1
180 179 10 1 1
180 179 11 2 1
180 179 12 2 1
end

When one partner does not respond to the marstat question, do you think it would be better to (1) exclude all missing responses or (2) assume that the partner that responded did so correctly - I think that (2) is low risk and would avoid losing many observations. If both partners do not respond, then the responses should be excluded. Further, in cases where each partner enters different marstat types, I believe it best to "reconcile their differences", by excluding these observations. Can we adjust this line to do so?

Code:

assert in_relationship == inlist(p_marstat, 1, 2)

With respect to the data sample, the duration should include sole responses by partner 179 in waves 1 and 3, but exclude the responses in waves 11-12 as they have different marstat types. To that end, this couple's relationship lasts 10 waves (I assume no change in missing waves).

Last edited by Chris Boulis; 11 Feb 2020, 19:57.
Clyde Schechter

Join Date: Apr 2014

Posts: 29899
#11

11 Feb 2020, 20:40

Code:
by id p_id (wave), sort: gen int spell_num = sum(in_relationship != in_relationship[_n-1])
replace spell_num = . if !in_relationship

calculates a relationship. If I "tab spell_num" (see below), would I interpret this as the frequencies of the number of couples that have between 1 and 9 relationships? That 95.72% of couples have had only one relationship (during the life of the survey period)?

No, that's not correct. The first command creates spell_num as an increasing variable, that starts as 1 in the first wave, and grows each time being in or out of a relationship changes. So if some pair were in a relationship in waves 1 and 2, then out from waves 3 through 5, and then back in at 6, that relationship (broken up over time) is covered by both spells 1 and 3. The second command changes the 2 to missing value because there we have a spell of not being in a relationship. But if you tab spell_num, the 3 shows up in the tabulation even though there were not three relationships: just three consecutive time periods, of which 2 were in relationship and 1 was not.
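The scenario just described can be reproduced on a small invented data set (one hypothetical pair, in a relationship in waves 1-2 and 6, out in waves 3-5):

```stata
* Invented example matching the description above
clear
input long(id p_id) byte(wave in_relationship)
1 2 1 1
1 2 2 1
1 2 3 0
1 2 4 0
1 2 5 0
1 2 6 1
end

* Running count of changes in status: 1 1 2 2 2 3
by id p_id (wave), sort: gen int spell_num = sum(in_relationship != in_relationship[_n-1])

* Blank out the "not in a relationship" spell: 1 1 . . . 3
replace spell_num = . if !in_relationship

* Observed waves the pair spent together, over all their spells
by id p_id: egen total_duration = count(spell_num)

list, noobs
```

Here -tab spell_num- would show the values 1 and 3, even though this is a single pair with a single (interrupted) relationship, and total_duration is 3 because only the three waves spent together are counted.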

You did not originally ask for code that would enable you to tabulate how many people had how many relationships, and this code does not address that question, nor is there a simple way to answer that question based on what this code does. To get that, I would do something different:

Code:

by in_relationship id (p_id), sort: gen partner_num = sum(p_id != p_id[_n-1]) if in_relationship
by in_relationship id: egen number_of_relationships = partner_num[_N] if in_relationship
by id (number_of_relationships), sort: replace number_of_relationships = number_of_relationships[1]
egen id_flag = tag(id)
tab number_of_relationships if id_flag

Code:
by id p_id: egen total_duration = count(spell_num)

provides the length of each relationship. "tab total_duration" provides a table of the frequencies of the duration of relationships (see below). For example, 5.62% of couples (represented in the table) have a relationship that lasts 1 year. Is that interpretation correct?

No, that's not right. The reason is that if a couple has a relationship that lasts 5 waves, that shows up in 5 observations and gets counted 5 times in the tabulation, whereas if another couple has a relationship that lasts only 2 waves, that only shows up in 2 observations and gets counted only twice in the tabulation. So you need to flag couples and condition that tabulation on that:

Code:

egen couple_flag = tag(id p_id)
tab total_duration if couple_flag

When one partner does not respond to the marstat question, do you think it would be better to (1) exclude all missing responses or (2) assume that the partner that responded did so correctly - I think that (2) is low risk and would avoid losing many observations. If both partners do not respond, then the responses should be excluded. Further, in cases where each partner enters different marstat types, I believe it best to "reconcile their differences", by excluding these observations.

I can't really comment. The appropriateness of this way of dealing with these discrepancies depends on the meaning you attach to these variables and the way you will use them in analysis. To some extent it also depends on having some expertise in understanding marital relationships. I can certainly say that this is one possible good approach to the problem; but it's not the only possible good approach and I am in no position to decide which approach is best.

Assuming you want to use this approach, replace that -assert- command with:

Code:

drop if missing(marstat) & missing(p_marstat)
drop if (in_relationship != inlist(p_marstat, 1, 2)) & !missing(marstat) & !missing(p_marstat)
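One thing to check with this approach: in_relationship was built from marstat alone, so observations where only the partner responded stay in the data but still have in_relationship == 0 and would not add to the duration. The following is only a sketch of one possible way to handle that, assuming you want to take the partner's answer when the respondent's is missing (your option 2). The extra -replace- line is my assumption about what you intend, not something already in the earlier code.

```stata
* The sample from #10: marstat is missing in waves 1 and 3
clear
input long(id p_id) byte(wave marstat p_marstat)
180 179  1 . 2
180 179  3 . 2
180 179  5 1 1
180 179  6 1 1
180 179  8 1 1
180 179  9 1 1
180 179 10 1 1
180 179 11 2 1
180 179 12 2 1
end

gen byte in_relationship = inlist(marstat, 1, 2)

* ASSUMPTION: when only the partner responded, trust the partner
replace in_relationship = inlist(p_marstat, 1, 2) if missing(marstat) & !missing(p_marstat)

* Drop rows with no information, and rows where the partners disagree
drop if missing(marstat) & missing(p_marstat)
drop if (in_relationship != inlist(p_marstat, 1, 2)) & !missing(marstat) & !missing(p_marstat)

by id p_id (wave), sort: gen int spell_num = sum(in_relationship != in_relationship[_n-1])
replace spell_num = . if !in_relationship
by id p_id: egen total_duration = count(spell_num)

list, noobs
```

On this sample nothing is dropped and total_duration is 9, the number of observed waves in which at least one partner reports a relationship and neither contradicts it. Waves that are absent from the data are still not counted.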

With respect to the data sample, the duration should include sole responses by partner 179 in waves 1 and 3, but exclude the responses in waves 11-12 as they have different marstat types. To that end, this couple's relationship lasts 10 waves (I assume no change in missing waves).

No, that's different from what you just said. In waves 11 and 12, they have different values of marstat, but they both agree that they are in a relationship. They just differ over whether it is marriage or defacto.

And the code I have written would not include missing years. That would be a rather different calculation.

Please make final decisions on what you want to calculate and post back, and then we'll write some code once and for all. But if you keep changing the problem....
Chris Boulis

Join Date: Feb 2019

Posts: 355
#12

11 Feb 2020, 22:27

Thank you very much Clyde Schechter. My apologies for the misunderstanding. I am very appreciative of your help and you have answered my questions and more. I did not ask for code to count the number of relationships (I only mentioned counting them in my interpretation of the code above), but I am grateful for it, as I will use this code for another problem.

In running the following code, Stata provided the following output:

Code:

// NUMBER OF RELATIONSHIPS PER PERSON
. by in_relationship id (p_id), sort: gen partner_num = sum(id != p_id[_n-1]) if in_relationship
(10,627 missing values generated)

. by in_relationship id: egen number_of_relationships = partner_num[_N] if in_relationship
unknown egen function partner_num[_N]()
r(133);

Have I missed something?

Code:

tab total_duration if couple_flag

One takeaway from this output (below) is that about 17.3% of relationships last 1 year, while about 12% last 18 (or more) years.

Code:

No, that's different from what you just said. In waves 11 and 12, they have different values of marstat, but they both agree that they are in a relationship. They just differ over whether it is marriage or defacto.

Yes, my apologies, you are correct. It is a physical count. I was second-guessing why those in a couple would enter their marital status differently, but I guess this is not relevant to the fact that they agree they are in a relationship.

Again, a BIG thank you for your help.
Clyde Schechter

Join Date: Apr 2014

Posts: 29899
#13

16 Feb 2020, 12:16

In running the following code, Stata provided the following output:
Code:
// NUMBER OF RELATIONSHIPS PER PERSON
. by in_relationship id (p_id), sort: gen partner_num = sum(id != p_id[_n-1]) if in_relationship
(10,627 missing values generated)

. by in_relationship id: egen number_of_relationships = partner_num[_N] if in_relationship
unknown egen function partner_num[_N]()
r(133);

Have I missed something?

No, you haven't missed anything. That's my error. Where it says -egen-, it should say -gen-. I'm sorry about that.
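For reference, here is the block from #11 with that one change applied, run on a small invented data set (the ids and values are made up for illustration). Note that the first line compares p_id with p_id[_n-1], as in #11, not id with p_id[_n-1] as in the output quoted above.

```stata
* Invented example: id 1008 has two partners and one single wave,
* id 2000 has one partner.
clear
input long(id p_id) byte(wave marstat)
1008  1089 1 1
1008  1089 2 1
1008  1089 3 1
1008 11040 4 2
1008 11040 5 2
1008     . 6 6
2000  2001 1 2
2000  2001 2 2
end

gen byte in_relationship = inlist(marstat, 1, 2)

* Running count of distinct partners within each person's in-relationship rows
by in_relationship id (p_id), sort: gen partner_num = sum(p_id != p_id[_n-1]) if in_relationship

* -gen-, not -egen-: take the last (largest) running count
by in_relationship id: gen number_of_relationships = partner_num[_N] if in_relationship

* Spread the result to all of the person's observations
by id (number_of_relationships), sort: replace number_of_relationships = number_of_relationships[1]

egen id_flag = tag(id)
tab number_of_relationships if id_flag
```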
Chris Boulis

Join Date: Feb 2019

Posts: 355
#14

17 Feb 2020, 01:30

No worries Clyde Schechter. Thanks for clarifying.
Chris Boulis

Join Date: Feb 2019

Posts: 355
#15

28 Feb 2020, 18:55

Your advice is very much appreciated, thanks Clyde Schechter. After receiving your help, and being new to coding spells, I read everything I could find about them (much of it written by Nick Cox, in particular "Identifying Spells" (2007) and others (2002, 2015)) and was able to draft code to do most of what I wanted. However, there were a couple of instances where this code (which I developed from your code and from Nick Cox's articles) failed. Given these are probably rarer cases, I decided to begin a new thread so that others who also use both respondent and partner observations in panel data can find it if they meet similar difficulties. The new thread can be found here: https://www.statalist.org/forums/for...-in-panel-data. Thank you again.