Restricted cubic splines with cumulative average method

Nerea Becerra

Join Date: Mar 2019
Posts: 14

Restricted cubic splines with cumulative average method

12 Mar 2019, 09:27

Dear all,
I am trying to conduct a restricted cubic spline to assess the potential non-linear association between a continuous exposure variable and an incident event.
The point is that I am using the cumulative average method suggested by Walter Willett et al., in " Dietary Fat and coronary heart disease: a comparison of approaches for adjusting for total energy intake and modelling repeated dietary measurements".
For that purpose, after calculate the cumulative average consumption of my exposure variable, I have to reshape the dataset from wide to long format. After that, I run the mkspline stata code as well as the xblc command.
when I plot the results, the graph does not look as I thought. Actually the shape is a little bit wird.
However, if I do it only with the baseline exposure (without the cumulative average method) and I do not reshape the databse (because I do not need it), the graph that I obtain is pretty good.

Could some one tell me if we cannot run the restricted cubic spline with a database in long format (which is the format that the cumulative method needs)?.

Following you can see the the stata do-file that I am using for that purpose. You will see how the graph looks like.

Code:

clear
input float(id evento naranja0 naranja1 naranja2 naranja3 tiempo)
 1 0 200 225 200 150 1.5
 2 0 120 200 150 120 7.5
 3 1 150 125 200   . 1.6
 4 0 120 200 150 120 6.9
 5 1 120   . 150 120   3
 6 0 200 250 150 120 7.3
 7 1 200 125 300 120   4
 8 1 250 300 300 120   5
 9 0 120 200 150 120 6.7
10 0 120   . 150 120 6.9
11 0 120 225 200 150 1.5
12 0 120   . 150 120   7
13 1 150 125 200   . 1.6
14 0 120 200 150 120 6.9
15 1 120   . 150 120   3
16 0 200 250 150 120 7.3
17 1 200 125 300 120   4
18 1 250   . 300 120   5
end

Code:

*generate variable fup*
gen fup0=tiempo
replace fup0=1 if tiempo>=1

foreach x of numlist 1/3 {
    gen fup`x'=tiempo if tiempo>`x' & tiempo<`x'+1
    replace fup`x'=`x'+1 if tiempo>=`x'+1
    replace fup`x'=. if tiempo<=`x'
        }
replace fup3=tiempo if tiempo>3

sum fup0-fup3

* Event in each visit.

gen d0=0
replace d0=1 if evento==1 & tiempo<=1


foreach x of numlist 1/3 {
    gen d`x'=0
    replace d`x'=1 if evento==1 & tiempo>`x' & tiempo<=`x'+1
    replace d`x'=. if tiempo<=`x'
        }
        
        
            
*cumulative*
gen pru_naranja0= naranja0
egen pru_naranja1=rowmean(naranja0 naranja1)
egen pru_naranja2=rowmean(naranja0 naranja1 naranja2)
egen pru_naranja3=rowmean(naranja0 naranja1 naranja2 naranja3)


foreach x of numlist 0/3 {
    xtile t_prunaranja`x'=pru_naranja`x', nq(3)
    tab t_prunaranja`x'
        }        


*******************************************************************************
Restricted cubic splines
*******************************************************************************
foreach n of numlist 0/3 {
mkspline pru_naranja_sp`n' = pru_naranja`n', cubic nknots(3) displayknots
mat knots = r(knots)
}
 

rename pru_naranja_sp01 pru_naranja_1sp0
rename pru_naranja_sp02 pru_naranja_2sp0
rename pru_naranja_sp11 pru_naranja_1sp1
rename pru_naranja_sp12 pru_naranja_2sp1
rename pru_naranja_sp21 pru_naranja_1sp2
rename pru_naranja_sp22 pru_naranja_2sp2
rename pru_naranja_sp31 pru_naranja_1sp3
rename pru_naranja_sp32 pru_naranja_2sp3


*********************************************************************************************************
reshaping the database
*********************************************************************************************************
reshape long d fup  pru_naranja pru_naranja_1sp pru_naranja_2sp, i(id) j(year)
  
drop if fup ==.

rename pru_naranja_1sp  pru_naranja_sp1
rename pru_naranja_2sp  pru_naranja_sp2

**************************************
stset dataset
*************************************
stset fup, id(id) failure(d==1)
codebook  pru_naranja_sp*, c
stcox pru_naranja_sp1 pru_naranja_sp2
testparm pru_naranja_sp1 pru_naranja_sp2 // Is orange predicting the hazard ratios of diabetes?
testparm pru_naranja_sp2 // Is a spline model for orange predicting hazard ratios of diabetes better compared to a simpler linear-responsemodel? The p-value is small, so the answer is yes.There is evidence of strong departure from linearity (P-value < 0.001).


levelsof pru_naranja, local(levels)
xblc pru_naranja_sp1 pru_naranja_sp2, cov(pru_naranja) at(`r(levels)')  ref(150)  eform gen(ptorange hr Lb Ub)


* graph the result
twoway (line hr ptorange, sort) (line Lb Ub ptorange, sort lc(black black) lp(- -)), ///
legend(off) yscale(log) ylab(0.25 .5 1 2 4 8 16 32) ///
xtitle(Oranges) ytitle(Hazard ratio) yline(1) ///
title(Effect of Oranges) subtitle(Adjusted for treatment)

Dr. Nerea Becerra

Tags: None

Nick Cox

Join Date: Mar 2014

Posts: 35438
#2

12 Mar 2019, 10:57

Glancing through I note that you're overwriting

Code:

mat knots = r(knots)

every time around a loop. So you will exit the loop with only the last set of knots saved. You may want something more like

Code:

mat knots`n' = r(knots)

Footnotes:

Did you know that you can write (e.g.)

Code:

forval n = 0/3 {

instead of

Code:

foreach n of numlist 0/3 {

and

Code:

gen fup0 = max(tiempo, 1)

instead of

Code:

gen fup0=tiempo replace fup0=1 if tiempo>=1

Check out

Code:

help rename groups

for more flexible use of

Code:

rename
Comment
Nerea Becerra

Join Date: Mar 2019

Posts: 14
#3

13 Mar 2019, 05:36

Thank you Nick for your prompte reply.
You are right, there was a mistake in the loop regarding mat knots. However, these matrix are not used with the xlbc command.
If you look at the stata synta, here we only use the variables generated from the mkspline as well as the original variable used for this purpose.

Code:

xblc pru_naranja_sp1 pru_naranja_sp2, cov(pru_naranja) at(`r(levels)') ref(150) eform gen(ptorange hr Lb Ub)

At the end, the graph looks like

As you can see the shape does not look like a restricted cubic spline. There are pronounced peaks. However, if I run the same analysis but taking into account only the baseline exposure rather than the cumulative average, and therefore, without reshaping the database,

Code:

mkspline naranja_sp0 = naranja0, cubic nknots(3) displayknots mat knots = r(knots) stset tiempo, id(id) failure(evento==1) stcox naranja_sp01 naranja_sp02 levelsof naranja0, local(levels) xblc naranja_sp*, cov(naranja0) at(`r(levels)') ref(150) eform gen(ptorange hr Lb Ub) twoway (line hr ptorange, sort) (line Lb Ub ptorange, sort lc(black black) lp(- -)), /// legend(off) yscale(log) ylab(0.25 .5 1 2 4 8 16 32) /// xtitle(Oranges) ytitle(Hazard ratio) yline(1) /// title(Effect of Oranges) subtitle(Adjusted for treatment)

The obtained graph looks much better:

I do not know where is the mistake... Actually, if we use the precitnl command instead of the xblc, the obtained predicted HR and 95%CI are exactly the same.

Code:

predictnl

Maybe the restricted cubic splines cannot be used with the cumulative average method.
Comment

Announcement

Restricted cubic splines with cumulative average method

Comment

Comment