How to specify a model with linear splines to capture distinct trends by region?

Myriam Pean

Join Date: Apr 2024

Posts: 22
#1

24 Mar 2025, 10:10

Dear Statalist users,

Could you help me with this issue, please? I am working on a Poisson model to analyze trends in two regions. My goal is to estimate distinct slopes for each region, with a break in 1980 to capture a change in trend.

I used the following specification:

poisson _d c.AgeCourant c.periode1_RegionA c.periode1_RegionB ///
    c.periode2_RegionA c.periode2_RegionB i.Niveduc i.Revenu i.parité ///
    if _st==1 & Anneecal <= 1988, exposure(TempsÀRisque) irr allbaselevels

What I want to achieve:
Have a specific slope for each region before 1980

Have a specific slope for each region after 1980

Capture a break in 1980 to see if there is a distinct change in trend for each region

Problem:

I want to make sure my model is correctly specified. Currently, I have defined the linear splines as follows:

gen periode1_Ontario = (Traitement == 0) * (Anneecal - 1960) * (Anneecal < 1980)
gen periode1_Quebec = (Traitement == 1) * (Anneecal - 1960) * (Anneecal < 1980)
gen periode2_Ontario = (Traitement == 0) * (Anneecal - 1980) * (Anneecal >= 1980)
gen periode2_Quebec = (Traitement == 1) * (Anneecal - 1980) * (Anneecal >= 1980)

Is this specification correct to obtain distinct slopes before and after 1980 for each region? If not, what would be the best way to parameterize this?
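For reference, here is a sketch of the linear predictor that the four -gen- statements above imply when read literally. The symbols (lambda for the incidence rate, t for Anneecal, beta for the four spline coefficients) are labels chosen here for illustration and are not part of the original post.

```latex
\log \lambda \;=\; \dots \;+\; \sum_{r \,\in\, \{\text{Ontario},\,\text{Quebec}\}}
  \mathbf{1}[\text{region}=r]\,
  \Bigl[\, \beta_{1r}\,(t-1960)\,\mathbf{1}[t<1980]
        \;+\; \beta_{2r}\,(t-1980)\,\mathbf{1}[t\ge 1980] \,\Bigr]
```

Under this reading, beta_{1r} is the pre-1980 slope and beta_{2r} the post-1980 slope of the log incidence rate in region r, so the -irr- output would show per-year rate ratios. Note that the time contribution is 19*beta_{1r} in 1979 and 0 in 1980, and there is no separate level term for region or era, so each segment is anchored at its own origin (1960 or 1980).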

Thank you very much for your help! 😊
Clyde Schechter

Join Date: Apr 2014

Posts: 29948
#2

24 Mar 2025, 10:25

While you could do it the way you are showing, unless you are using a very old version of Stata, there is a much simpler approach available.

Code:

gen byte era = (Anneecal >= 1980) & !missing(Anneecal)
poisson _d c.AgeCourant i.Traitement##i.era##c.Anneecal i.Niveduc i.Revenu i.parité ///
    if _st==1 & Anneecal <= 1988, exposure(TempsÀRisque) irr allbaselevels
margins era#Traitement, dydx(Anneecal) predict(ir)

The output of the -margins- command will show you the slope of the incidence rate over time separately in each of the four combinations of treatment group and pre-post 1980 eras.

For more information, read -help fvvarlist- and https://www3.nd.edu/~rwilliam/stats/Margins01.pdf.

Added Note: In the code you provided it seems that there is a correspondence between Quebec and Traitement == 1 vs Ontario and Traitement == 0. If that is not true of your situation, then your proposed code does not make sense to me, and I also cannot be sure if the code I propose here is correct, as I would not know how to handle the effects of Province and Treatment if they are separate things.

Last edited by Clyde Schechter; 24 Mar 2025, 10:28.
Myriam Pean

Join Date: Apr 2024

Posts: 22
#3

25 Mar 2025, 09:13

Thank you for your response and for suggesting a simpler approach! I really appreciate it.

I confirm that this correspondence is true: Quebec and Traitement == 1 vs Ontario and Traitement == 0

I have a question: In the approach you proposed, would it be possible for the slope values to appear in the equation results table? One of the provinces serves as the baseline, which is why I initially proceeded the way I did. However, my issue is that with my approach, the graphical output is incorrect.

Also, just to clarify, I am using Stata version 17.

Would you be able to suggest something for the graphical output as well?

Looking forward to your insights!

Thanks,

Myriam
Clyde Schechter

Join Date: Apr 2014

Posts: 29948
#4

25 Mar 2025, 09:33

I'm not sure what you mean by the "equation results table." If you are referring to the table of coefficients (and standard errors and test statistics) produced by -poisson-, the slopes you are looking for will not appear there directly. Rather, those slopes would have to be calculated, using -lincom- for various combinations of the "main effect" coefficients and the two- and three-way interaction terms. That's tedious and error-prone work--better to let -margins- do it for you.
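As a sketch of the algebra behind those -lincom- combinations, assume the model from #2 (i.Traitement##i.era##c.Anneecal) with Ontario (Traitement == 0) and the pre-1980 era (era == 0) as the base levels. The gamma symbols are hypothetical labels for the Anneecal coefficient and its interactions, not names that Stata prints; T stands for Traitement, E for era, and t for Anneecal.

```latex
\begin{aligned}
\log \lambda &= \dots + \bigl(\gamma_0 + \gamma_T\,T + \gamma_E\,E + \gamma_{TE}\,T E\bigr)\,t \\
\text{slope}(T=0,\,E=0) &= \gamma_0 && \text{(Ontario, before 1980)} \\
\text{slope}(T=1,\,E=0) &= \gamma_0 + \gamma_T && \text{(Quebec, before 1980)} \\
\text{slope}(T=0,\,E=1) &= \gamma_0 + \gamma_E && \text{(Ontario, 1980 onward)} \\
\text{slope}(T=1,\,E=1) &= \gamma_0 + \gamma_T + \gamma_E + \gamma_{TE} && \text{(Quebec, 1980 onward)}
\end{aligned}
```

These are slopes of the log incidence rate, so exponentiating each sum gives a per-year rate ratio. The -margins, dydx(Anneecal) predict(ir)- output is on a different scale: it reports the slope of the incidence rate itself, which is the log-scale slope multiplied by the predicted rate, so the two sets of numbers should not be expected to match.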

You don't say what kind of graph you are trying to plot. And it isn't clear to me what you might want. The four slopes are just constant numbers and there isn't really any interesting graph I can think of based on them. Perhaps you want to graph not the four slopes, but rather four curves giving the incidence rate as a function of Anneecal in the four combinations of Traitement and era. If so:

Code:

margins era#Traitement, predict(ir) at(Anneecal = (1972(1)1988))
marginsplot, xdimension(Anneecal)

For the -at()- option in the -margins- command, substitute the actual range of years you are interested in. I can tell from #1 that your data of interest terminates in calendar year 1988, but there is no indication there of how early the starting year of interest is. You may or may not like the aesthetics of the default -marginsplot-. You can tailor the appearance to your liking with almost any of the options available for Stata -graph twoway- plots.

If that is not actually the graph you want, please describe clearly what you are looking for.
Myriam Pean

Join Date: Apr 2024

Posts: 22
#5

25 Mar 2025, 10:50

M. Clyde,

Thank you very much for your response! I completely understand what you explained regarding the calculation of the slopes using -lincom- and the benefit of using -margins-.

As for the graph, what would suit me better is to have four segments of lines showing the evolution of fertility for each province: two before 1980 (one for Quebec and one for Ontario) and two after 1980, with a visible break at 1980. The idea is to see the distinct evolution (in terms of slopes) of each province during these time segments.

Thanks again for your help!

Best regards,
Clyde Schechter

Join Date: Apr 2014

Posts: 29948
#6

25 Mar 2025, 11:39

As for the graph, what would suit me better is to have four segments of lines showing the evolution of fertility for each province: two before 1980 (one for Quebec and one for Ontario) and two after 1980, with a visible break at 1980. The idea is to see the distinct evolution (in terms of slopes) of each province during these time segments.

I think I'm missing something. To me this sounds like what the code I suggested in #4 will produce.
Myriam Pean

Join Date: Apr 2024

Posts: 22
#7

25 Mar 2025, 11:51

Thank you for your suggestion! However, the key difference with the code you proposed is that, in your approach, the four line segments extend from 1972 to 1988. What I am aiming for, instead, is to have two line segments extending from 1972 to 1979 for both provinces, with a break at 1980, and then two additional line segments extending from 1981 to 1988 for each province.

I hope this clarifies my request. Thanks again for your help!
Clyde Schechter

Join Date: Apr 2014

Posts: 29948
#8

25 Mar 2025, 12:27

OK. So, this is a little more complicated. Run the -margins- command as in #4, but add a -saving()- option which saves the -margins- results in a Stata data set. Next -use- that data set just saved. Open the browser so you can see the data clearly. Identify the variables that contain the values of the incidence rate (it will probably be called _margin), the values of Anneecal (I suspect it will be called _at2, but I'm not sure of that), era (I think it will be _m2) and Traitement (I suspect it will be _at1, but perhaps something else.) Then you can do something like this:

Code:

keep if (_at2 > 1980 & _m2 == 1) | (_at2 < 1980 & _m2 == 0)
egen which_line = group(_m1 _m2)
gen `c(obs_t)' obs_no = _n
reshape wide _margin, i(obs_no) j(which_line)
graph twoway line _margin* _at2, sort

As I say, you will need to carefully check which variable in the -margins- output data set corresponds to the variables needed for your graph. And, frankly, rather than coding the graph command using the names you find there, it probably makes more sense to -rename- those to incidence_rate, Anneecal, era, and Traitement and then use those names in the graph command. Finally, you can tailor the appearance of the graph with whatever -twoway- options you like.
Myriam Pean

Join Date: Apr 2024

Posts: 22
#9

27 Mar 2025, 13:21

M. Clyde,

I sincerely appreciate your valuable assistance. Thank you so much. Your explanation is very helpful, and as you anticipated, I now realize that the variable names differ in the dataset. For example, era corresponds to _eta4 instead of _m2.

I have attached the data below for reference. Could you clarify what the variable _m1 represents? I understand that _m2 corresponds to Traitement, but I would like to be sure about _m1 before proceeding further.

Many thanks again for your time and support.

Attached Files
Clyde Schechter

Join Date: Apr 2014

Posts: 29948
#10

27 Mar 2025, 14:45

I'm pretty sure that _m1 corresponds to the variable era. _at4, to me, looks like Anneecal.