Problem with Group-based trajectory modeling used with individuals who were born in different years

Lorenzo Belli


20 Jan 2025, 08:53

Hello everyone,
This is my first post in the community. I'm using Stata 18 on Windows 10.

I’m starting a thread because I need help with Group-Based Trajectory Modeling (GBTM; Nagin, 2005). For simplicity, I will start by stating the question I need help with, and then I will provide additional information about the dataset to help you assist me better.

Question:
I am using GBTM with a polynomial function of the victims' age, but the victims reached the same age in different calendar years. How can I specify in the model that the victims were born in different years?

Objective of my analysis:
I want to use GBTM to identify whether there are distinct trajectories of IPV victimization across the lifespans of women victims. The timeframe I aim to investigate runs from age 14 to the victim’s age at the time of the “selection crime” (the IPV crime that occurred during the selection period, January 1, 2010 to December 31, 2014, described below).

I plan to use the number of victimizations experienced at each age as the dependent variable and the victim’s age as the independent variable.

Some information about the dataset:
The dataset I’m working with is unique and was specifically compiled for my project. It includes:
a) a complete cohort of individuals convicted of at least one intimate partner violence (IPV) crime between January 1, 2010, and December 31, 2014 (referred to as the “selection period”) in Catalonia (Spain), amounting to approximately 7,000 offenders;
b) a full cohort of victims of these crimes, but only if they were granted a protection order at any point in their lives, totaling approximately 4,000 survivors.

In this dataset, IPV is defined as any domestic violence or gender-based violence crime involving a partner or ex-partner. The dataset predominantly consists of male IPV offenders (approximately 95%) and female survivors.

The data spans a complete criminal and victimological history from each individual’s earliest recorded experiences until March 31, 2019. It also includes some basic socio-demographic characteristics such as gender, age, nationality, place of residence, and postal code.

The selection period is important because the sample is representative of that time frame.

Dataset structure:
Below, you will find more detailed information about the variables. I have also attached an image, but the same information appears in the dataex output below. Because the data are extremely sensitive, both the image and the dataset you have access to are a replica with simulated data resembling the original.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(id s_birth_year s_age_at_sc v35 t35 v36 t36 v37 t37 v38 t38 v39 t39 v40 t40)
 1 1965 50 0 35 0 36 0 37 0 38 0 39 1 40
 2 1964 35 0 35 . . . . . . . . . .
 3 1969 41 0 35 0 36 0 37 0 38 0 39 0 40
 4 1973 44 0 35 0 36 0 37 0 38 0 39 1 40
 5 1985 39 0 35 0 36 0 37 0 38 0 39 . .
 6 1945 36 0 35 1 36 . . . . . . . .
 7 1969 35 1 35 . . . . . . . . . .
 8 1972 53 0 35 0 36 0 37 0 38 0 39 0 40
 9 1965 42 0 35 0 36 0 37 0 38 0 39 0 40
10 1982 51 0 35 0 36 0 37 0 38 0 39 0 40
11 1977 49 0 35 0 36 0 37 0 38 0 39 0 40
12 1976 35 0 35 . . . . . . . . . .
end
label var id "Victim's ID"
label var s_birth_year "Victim's birth year"
label var s_age_at_sc "Victim's age at Selection Crime (i.e., first IPV between jan 1 2010- dec 31 2014"
label var v35 "Number of victimisation(s) suffered at age 't35'"
label var t35 "Age at which the number of victimizations at 'v35' was suffered."

id is the victim’s ID.

s_birth_year is the victim’s birth year.

s_age_at_sc is the victim’s age at the selection crime.

v35 is a count variable indicating the number of IPV crimes experienced at age 35; v36 indicates the number of IPV crimes at age 36; and so on. [0 means no IPV suffered at that age; 1 means one IPV; and so on]

t35 indicates that the victim was 35 years old at the time of v35; and so on.

Missing values in both the v* and t* variables indicate that the victim had not reached that specific age by the time of the selection crime. For example, Victim 6 was 36 years old at the time of the selection crime and was not observed afterward.
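The missing-data rule described above can be checked with a few assertions. This is only a minimal sketch: it assumes the wide layout of the simulated dataex example (ages 35 to 40 only) and would need a wider age range for the real data.

```stata
* Sketch: check the missing-value pattern in the simulated example (ages 35-40).
forvalues a = 35/40 {
    * v and t should be missing (or non-missing) together
    assert missing(v`a') == missing(t`a')
    * a recorded age should match the age in the variable name
    assert t`a' == `a' if !missing(t`a')
    * nothing should be recorded beyond the age at the selection crime
    assert missing(t`a') if `a' > s_age_at_sc
}
```

If any assertion fails, Stata stops and reports how many observations contradict it.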

Typically, this method of analysis has been used on samples of individuals who were approximately the same age. However, this is not the case with my dataset. This means that victims in my sample reach the same age in different calendar years. For instance, Victims 1 and 4 both experienced one IPV at age 40 but were born in 1965 and 1973, respectively.
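To make the age-versus-calendar-year issue concrete, here is a minimal sketch that derives the approximate calendar year in which each age was reached. It assumes the wide layout of the simulated example; the y* variable names are hypothetical and chosen so that they are not picked up by the v* and t* wildcards.

```stata
* Sketch: approximate calendar year at each age = birth year + age.
* y35-y40 are hypothetical names; a missing t* yields a missing y*.
forvalues a = 35/40 {
    generate y`a' = s_birth_year + t`a'
}

* Victims 1 and 4 both have an IPV at age 40, but in different years
list id s_birth_year v40 y40 if inlist(id, 1, 4)
```

For Victims 1 and 4 this gives 1965 + 40 = 2005 and 1973 + 40 = 2013, so the same point on the age axis sits eight calendar years apart.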

As far as I understand, GBTM allows for observing individuals who were the same age in different years (see Section 7.6, "Testing for Cohort Effects", on page 134 of Group-Based Modeling of Development, Nagin, 2005), but I’m not sure how to apply this.

When I run the command, my results appear significant:

Code:

traj, var(v*) indep(t*) model(zip) order(2 3 3 2)

[What you see above is the command I used on my real dataset, so it does not produce significant results when applied as is to the replicated dataset I have provided.]

Conclusion:
So, to return to the question posed at the beginning of the post:
is the command as written above sufficient, or should I add options to account for the victims having been born in different years?
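For reference, the traj plugin has two covariate options that might be relevant here: risk(), for covariates that predict group membership, and tcov(), for covariates that shift the trajectories themselves. The sketch below shows how birth year could be passed to each. It is written under several assumptions: that these options are an appropriate way to handle birth cohort (this is not confirmed against Nagin's Section 7.6), that the wide layout of the simulated example applies, and that centering at 1970 is reasonable. The names c_birth and cb35-cb40 are hypothetical.

```stata
* Sketch only: two possible ways to bring birth year into the model.
* c_birth and cb35-cb40 are hypothetical names; 1970 is an arbitrary centering value.
generate c_birth = s_birth_year - 1970

* (a) Birth year as a predictor of trajectory-group membership
traj, var(v35-v40) indep(t35-t40) model(zip) order(2 3 3 2) risk(c_birth)

* (b) Birth year as a covariate shifting the trajectory within each group.
*     tcov() expects one variable per measurement occasion, so the
*     time-stable birth year is copied once per age.
forvalues a = 35/40 {
    generate cb`a' = c_birth
}
traj, var(v35-v40) indep(t35-t40) model(zip) order(2 3 3 2) tcov(cb35-cb40)
```

With only 12 simulated observations a four-group model is not expected to be estimable, so this only illustrates the syntax.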

I hope I have explained myself clearly. If anything is unclear, please do not hesitate to ask me for further clarifications. Thank you very much in advance!
Tags: GBTM, longitudinal, trajectories
