Calculating age-adjusted hormone levels from a cubic spline analysis

Theresa Gildner

Join Date: Oct 2018

Posts: 9
#1

Calculating age-adjusted hormone levels from a cubic spline analysis

12 Oct 2018, 13:23

Hi everyone! I'm relatively new to Stata (and the forum) and am having some trouble figuring out a new statistical analysis. I am looking at male testosterone level variation in relation to age and BMI. I collected six samples from each participant (one in the morning and one in the evening for three days to capture how testosterone levels change throughout the day) and I used multilevel models to account for within-individual and between-individual variation (and calculate daily change in testosterone and total daily production). However, I ran into a problem when I added age and BMI into the models. Age and testosterone levels do not exhibit a linear relationship. Testosterone is low in young men, peaks in adulthood, and declines in older men. To account for this non-linear relationship, I used the mkspline command to calculate restricted cubic splines for each testosterone sample. For example (where "LogT_t1" is the log transformed concentration value of the first testosterone sample):

mkspline Agesp = Age, cubic displayknots nknots(3)
regress LogT_t1 Agesp*

I would now like to use these regression equations to generate an age-adjusted value for each of the six testosterone samples, which I can then plug into my multilevel models and examine the association between the age-adjusted testosterone parameters and BMI. Any recommendations for how to do this? Thank you.
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

12 Oct 2018, 14:37

Rather than calculating an adjusted logT level for each observation and then using that in your next analysis, why not just stick with logT as your outcome variable in the multilevel model and include the Agesp* variables as predictors?
Comment
Theresa Gildner

Join Date: Oct 2018

Posts: 9
#3

12 Oct 2018, 19:19

Thank you for the suggestion, that seems like a good solution. How would I account for an age-by-BMI interaction in that model though? Would I just enter an age*BMI interaction term (the original age variable) into the model to see if it accounts for any additional variation beyond what is already in the model? Thanks for your help!
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

12 Oct 2018, 20:43

Would I just enter an age*BMI interaction term (the original age variable) into the model to see if it accounts for any additional variation beyond what is already in the model?

No, definitely not. You would include c.bmi##c.(Agesp*). Interaction terms must always be built out of the same constituent effects that appear by themselves in the model. The best way to assure you always get that right is to avoid the # interaction in favor of the ## interaction. By letting Stata expand that for you automatically, you will always be guaranteed perfect consistency: nothing gets left out, and nothing gets miscoded. So

Code:

regress logT_t1 c.BMI##c.(Agesp*)

will do the trick.

I should also point out that any tests that you subsequently do involving Age must always include all of the Agesp* variables, too. So if you want to then test whether the interaction significantly contributes to the model, it would be

Code:

testparm c.BMI#c.(Agesp*)

Last edited by Clyde Schechter; 12 Oct 2018, 20:47.
Comment
Theresa Gildner

Join Date: Oct 2018

Posts: 9
#5

13 Oct 2018, 13:11

Wonderful, thanks so much! That works great in the models. One last question. Do you have any recommendations for how to compare model fit between different multi-level models when using multiply imputed data? Specifically, I'm trying to determine whether the null, fixed-effects, or random slopes model best fit the data. Normally I would store the model estimates and use likelihood ratio tests, but those commands don't work when using multiply imputed data. I've spent some time looking online (including on this forum), but all the code I've found so far doesn't seem to work. Thank you very much for your help, I really appreciate it!
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#6

13 Oct 2018, 13:38

I don't know how to do this kind of model selection with multiply imputed data. There are several forum members who are more knowledgeable about multiple imputation than I am and perhaps one of them can point the way.
Comment
Theresa Gildner

Join Date: Oct 2018

Posts: 9
#7

13 Oct 2018, 17:02

Sounds good, hopefully one of them can weigh in. Whichever model is the best fit, the fixed effects and random slopes models provide the same basic results (with slightly different coefficients). It looks like the interaction between age and BMI is not significant, but BMI and both of the age terms resulting from the spline analysis are significant. I'm a little unclear how to interpret the main effects of the two age coefficients. Does it have to do with where the participant ages fall relative to where the knots are located? Thank you again for your time, especially on the weekend! You've been incredibly helpful!
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#8

13 Oct 2018, 17:44

Well, cubic splines are complicated and there is no direct simple interpretation of the coefficients. It's a bit as if you had a high degree polynomial in age. The coefficients of each individual term, age, age^2, age^3, etc. are not individually interpretable. Similar considerations apply to the spline variables: they are not meaningful in their own right. Their coefficients can differ in sign. They're just not directly intepretable.
Comment
Theresa Gildner

Join Date: Oct 2018

Posts: 9
#9

13 Oct 2018, 18:27

That makes sense. However, I found an error in my code and just re-ran the models. It turns out there is a significant interaction between BMI and each of the two age terms (which actually makes more sense to me, based on what researchers know about age influences BMI, and both age and BMI seem to influence testosterone). If I was interpreting an interaction between two linear terms I would use simple effects analysis to plot the interaction using conditional values of age (±1 SD from the mean age) and BMI (±1 SD from the mean BMI). But I'm not sure how to approach interpreting this age-BMI interaction from these analyses. Do you have any suggestions? Thank you again!!!
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#10

14 Oct 2018, 15:22

So this is a bit difficult. With just a linear age term you would pick an interesting set of values of age, and then use those in the -at()- option of -margins- and then run -marginsplot-. But you can't do that here because age isn't actually part of the model. So what you need to do is pick values of age that are interesting and are also instantiated in your data set. Then from the dataset you can find out the values of the Agesp* variables that correspond to those values of age. Then use those in -at()- options in your -margins- command. Note, by the way, that you do not want to have -margins- at crossed values of the different Agesp* variables: most of those combinations will not correspond to any value of age at all. So you do something like this (illustrated with the auto.dta)

Code:

sysuse auto, clear mkspline mpgsp = mpg, nknots(3) cubic regress price i.foreign##c.(mpgsp*) levelsof mpg foreach n of numlist 15 20 25 30 { local atspec`n' foreach v of varlist mpgsp* { summ `v' if mpg == `n' local atspec`n' `atspec`n'' `v' = `r(mean)' } } tempfile margins margins foreign, at(`atspec15') at(`atspec20') at(`atspec25') at(`atspec30') /// saving(`margins') use `margins', clear graph twoway line _statistic _at2, by(_m1)

If you want to graph this, -marginsplot- will not be helpful here because it does not recognize that mpgsp* are related to each other and to mpg. So you would have to add a -saving()- option to -margins-. Then you can open that data set, identify the variable that contains the predicted values, and then use -graph twoway-. Your horizontal access variable can be the first spline variable, because the first variable in a cubic spline is always equal to the variable that it is derived from. So Agesp1 == age. The code above illustrates the approach. You may want to use a different approach to the graph, and the `margins' tempfile has everything in it you need: it's just a matter of crafting the -graph- commands to get the specific display you want.
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4464
#11

14 Oct 2018, 19:26

I have not been following this; however, one "solution" may be the use of Royston's -mcp- command; I do note that the article does not discuss how to use mcp with restricted cubic splines (except cursorily); however, on p. 524 of the SJ, article (see the help file for help in finding the article which is freely downloadable from the Stata web site), Patrick Royston says "I am happy to provide an example file on request suggesting how to produce the spline model equivalent of figure 7" (see the article to see the figure)
1 like
Comment
Theresa Gildner

Join Date: Oct 2018

Posts: 9
#12

15 Oct 2018, 11:39

Thank you for the suggestions. I've been working on adapting the suggested code (#10), and the first part works great:

Code:

mi estimate: mixed LogT_t Time_t c.BMI##c.(Agesp*) || PID: levelsof Age foreach n of numlist 12 23 35 52 67 { local atspec`n' foreach v of varlist Agesp* { summ `v' if Age == `n' local atspec`n' `atspec`n'' `v' = `r(mean)' } }

However, I keep getting errors on the second part:

Code:

tempfile margins margins BMI, at(`atspec12') at(`atspec23') at(`atspec35') at(`atspec52') at(`atspec67') saving(`margins') use `margins', clear graph twoway line _statistic _at2, by(_m1)

Specifically, the error reads: "last estimates not found r(301)". Based on a suggestion I saw online, I included the -post- command in the MI syntax:

Code:

mi estimate, post: mixed LogT_t Time_t c.BMI##c.(Agesp*) || PID:

And that gave me a different error when I tried to use -margins- :
"Warning: cannot perform check for estimable functions.
e(sample) does not identify the estimation sample
r(322);"

Any suggestions for how to get the margins command to work? Thank you for your help!
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#13

15 Oct 2018, 12:45

You have introduced a big new complication here with multiple imputation.

-margins- does not work after multiple imputation. There is an -mimrgns- command, written by daniel klein available from SSC, that does. Dan himself expresses uncertainty whether it is in fact legitimate to do this, but has made it available. I don't know whether it supports a -saving()- option. I think it does, but, if not, that would complicate matters because -marginsplot- will not give you the kind of graph you need: you actually need to use the dataset that -margins- saves along with -graph twoway-. If you go this route, do read -help mimrgns- carefully before proceeding.
Comment
daniel klein

Join Date: Mar 2014

Posts: 3850
#14

15 Oct 2018, 13:53

As far as I know, margins' saving() option is (still) undocumented. Anyway, I do not allow the saving() option with mimrgns. mimrgns returns its results in r() from which the required values could probably be extracted to create a graph. Also, Ben Jann's coefplot (SSC) might be an alternative to marginsplot, if the latter cannot be used.

However even thoughI have not followed this in detail, it seems Theresa is combining a couple of things that are either not documented or supported directly and there might be good reasons why. Clyde's suggested solution looks good to me but I have not given it a lot of thought. As Clyde mentions, the imputations are likely to make things more complicated, not just from a technical point of view. For example, how does Theresa create the splines? Does she do so for each imputed dataset? I believe she should. Does she account for the interactions during the imputation process? I am sure that she has to. Also, the loop for getting the values for the Agesp* variables completely ignores the imputations. If the mean is all Theresa wants, this might almost be fine but she probably wants to exclude the non-imputed (original) dataset from the calculations. Also, she needs to make sure to have her data stored in an mi style that allows summarize to work correctly. There are probably more issues.

Best
Daniel

Last edited by daniel klein; 15 Oct 2018, 14:07. Reason: deleted potentially misleading comment
Comment
Theresa Gildner

Join Date: Oct 2018

Posts: 9
#15

15 Oct 2018, 15:38

Thanks for the feedback! I will continue to look into other options, but am leaning toward performing these analyses on my original dataset and no longer using multiple imputation. There seems to be too much complexity here to produce reliable and interpretable results. I appreciate all your suggestions, I've learned so much!
Comment

Announcement

Calculating age-adjusted hormone levels from a cubic spline analysis

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment