Standardizing longitudinal data according to a baseline reference norm

Stephanie G Schrempft

Join Date: Jul 2020

Posts: 5
#1

24 Jul 2020, 02:43

Hello,

I am examining the longitudinal trajectories of various biomarkers, and I would like to standardize the biomarkers at each time point according to a baseline reference norm (formed from young, healthy participants) by sex. Thus far, I have used the syntax below. However, the resulting ‘z scores’ are not true z scores, as they do not have a mean of 0 and an SD of 1. How can I standardize in this way and obtain true z scores? Any thoughts would be much appreciated.

*generate norm means and standard deviations for females
foreach var of varlist biomarker0 biomarker0 biomarker0 biomarker0 {
sum `var' if age < 45 & gender == 0 & disease0 == 0
return list
gen nm_`var' = r(mean)
gen nsd_`var' = r(sd)
}

*generate norm means and standard deviations for males
foreach var of varlist biomarker0 biomarker0 biomarker0 biomarker0 {
sum `var' if age < 45 & gender == 1 & disease0 == 0
return list
replace nm_`var' = r(mean) if gender == 1
replace nsd_`var' = r(sd) if gender == 1
}

* generate zscores for biomarkers at baseline
foreach var of varlist biomarker0 biomarker0 biomarker0 biomarker0 {
gen z`var' = (`var'- nm_`var')/nsd_`var'
}

* generate zscores for biomarkers at follow-up 1 and follow-up 2
gen zbiomarker1 = (biomarker1 - nm_biomarker0) / nsd_biomarker0
gen zbiomarker2 = (biomarker2 - nm_biomarker0) / nsd_biomarker0
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

24 Jul 2020, 11:18

It sounds to me like you are trying to do the impossible. If you use the means and standard deviations of a reference group (young healthy participants) to do the standardization, it would be just a coincidence if you got a standard normal distribution when applying that outside the reference group.

Perhaps if you explained why you want to do this in the first place, we could figure out what approach is most sensible. Usually standardizing variables just makes it difficult or impossible to interpret your analytic results, regardless of how you do it. So what are you actually trying to accomplish here? What is it about these particular biomarkers that makes you want to, in some way, standardize them?
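
To illustrate the point about reference-group standardization, here is a minimal sketch on simulated data. The variable names (young_healthy, biomarker, z_ref) and the simulated distributions are assumptions made up for the illustration, not anything from the original data. The scores have mean 0 and SD 1 inside the reference group by construction, but not in the full sample.

```stata
* Sketch with simulated data; all variable names are hypothetical
clear
set seed 12345
set obs 1000

* first 300 observations play the role of the young, healthy reference group
gen byte young_healthy = _n <= 300

* assume the biomarker is higher and more variable outside the reference group
gen double biomarker = cond(young_healthy, rnormal(50, 5), rnormal(60, 8))

* norm mean and SD are taken from the reference group only
quietly summarize biomarker if young_healthy
gen double z_ref = (biomarker - r(mean)) / r(sd)

* within the reference group: mean 0 and SD 1 by construction
summarize z_ref if young_healthy

* full sample: nothing forces the mean to be 0 or the SD to be 1
summarize z_ref
```

A score standardized this way reads as "distance from the young, healthy norm in reference-group SD units", which is a different thing from a z score of the whole sample.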
Stephanie G Schrempft

Join Date: Jul 2020

Posts: 5
#3

26 Jul 2020, 13:33

Dear Clyde,

Thank you very much for your response. I am trying to create a longitudinal index of physiological ageing from multiple age-related biomarkers. I am trying to replicate an approach that first standardized each biomarker to have the same scale (mean = 0, sd = 1, based on their distribution at baseline when participants were all aged 26 years); then used mixed effects growth modeling to save the individual slopes for each biomarker. These slopes were then aggregated into an individual physiological ‘ageing’ score. The difference with my data is that the sample is diverse at baseline (in terms of age and comorbidities), therefore I am trying to standardize the biomarkers according to a baseline norm (formed from a young, healthy subsample at baseline). Any thoughts or suggestions would be much appreciated.

Stephanie
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

26 Jul 2020, 15:48

You probably won't like this, but my thought is that you shouldn't do this. If your sample is diverse in age and comorbidities, it seems a priori unlikely that a model that was originally applied to healthy 26-year-olds will give results similar to the findings of that paper you are trying to (sort of) replicate, even if you found some good way to standardize the biomarkers. Even major determinants of health outcomes have different magnitudes of effect at different ages, or in the presence of some comorbidities. It would be a surprise if these biomarkers behaved any differently in that respect.

I would probably proceed by ignoring the previous work and starting with a fresh approach. I don't see what is gained by standardizing the biomarkers at all in order to create an index out of them. You can do that just as well with the unstandardized versions, and that has the advantage that you will actually be able to explain what your index means and how it weights each biomarker. Actually, I'm not really sure what the point of combining them into a single index is, but maybe I'm being too glib in saying that. In any case, you don't need to standardize anything to do growth modeling. Just do it. Make sure you include some kind of reasonable adjustments for the effects of age and comorbidities, and consider including lots of interactions (assuming your data sample is large enough to support them). That probably requires some in-depth graphical exploration of the data before you plunge into the growth modeling.
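
As a rough sketch of growth modeling on an unstandardized biomarker, the code below simulates long-format data and fits a random-intercept, random-slope model with an age adjustment and an age-by-time interaction. Everything here is an assumption for illustration: the variable names (id, wave, age0, biomarker), the three-wave layout, and the simulated effect sizes. Comorbidity terms would be added in the same way as age0.

```stata
* Sketch with simulated data; all variable names are hypothetical
clear
set seed 42
set obs 200
gen id = _n
gen double age0 = runiform(30, 70)
* person-specific deviations in level (u0) and in rate of change (u1)
gen double u0 = rnormal(0, 2)
gen double u1 = rnormal(0, 0.5)

* three waves per person, coded 0, 1, 2
expand 3
bysort id: gen wave = _n - 1
gen double biomarker = 50 + 0.2*age0 + u0 + (1 + u1)*wave + rnormal(0, 1)

* growth model in the biomarker's own units, adjusted for baseline age
mixed biomarker c.wave##c.age0 || id: wave, covariance(unstructured)

* predicted random effects: the slope on wave comes first, then the intercept
predict double re_slope re_int, reffects

* person-specific slope = fixed part (depends on age0) + random part
gen double slope_i = _b[wave] + _b[c.wave#c.age0]*age0 + re_slope
summarize slope_i
```

The saved slope_i is expressed in biomarker units per wave, so it can be reported and interpreted directly.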
Stephanie G Schrempft

Join Date: Jul 2020

Posts: 5
#5

27 Jul 2020, 04:25

Dear Clyde,

Thanks again for getting back to me. Would the proportion of maximum scaling method (POMS; [(observed − minimum)/(maximum − minimum)]) be an acceptable alternative to z-score standardization? Each biomarker would then range from 0 (= minimum possible) to 1 (= maximum possible). Although the standardization/normalization isn’t needed for the growth modeling, it would be helpful for graphs.
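
For concreteness, the POMS formula above could be computed along these lines. This is only a sketch on simulated data: the variable names are hypothetical, and it uses the observed sample minimum and maximum, which is an assumption (a theoretical minimum and maximum could be substituted if they are known).

```stata
* Sketch with simulated data; variable names are hypothetical
clear
set seed 2020
set obs 200
gen double biomarker = rnormal(120, 15)

* POMS = (observed - minimum) / (maximum - minimum),
* here using the observed sample minimum and maximum
quietly summarize biomarker
gen double poms = (biomarker - r(min)) / (r(max) - r(min))

* the rescaled variable runs from 0 to 1
summarize poms
```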

Best wishes,

Stephanie
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#6

27 Jul 2020, 12:43

Well, not really. The problem with standardization is that the mean and standard deviation vary from sample to sample, so nobody but the person doing the analysis knows what the relationship between the standardized score and the actual value is. The same problem occurs with POMS: only the person doing the analysis knows what the maximum and minimum values are. That said, the POMS is probably not as bad as standardization because it's pretty understandable, compared to a standard deviation (which is really incomprehensible to people without statistical training).

If the goal is to get the values into similar ranges for the purposes of graphing, why not just rescale some of the measurements by some suitable power of 10? Even a school child can understand that! Or, less desirable in my view, and only if it does not overly distort the relationships you are trying to show in the graphs, use a logarithmic scale on the graph axes.
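
A sketch of both options, using the auto dataset that ships with Stata as a stand-in. Here price and mpg merely play the role of two measurements on very different scales; the new variable name price_k is made up for the example.

```stata
* Sketch using the auto dataset shipped with Stata as a stand-in
sysuse auto, clear

* price is in the thousands and mpg in the tens:
* dividing price by 10^3 puts both in a similar range
gen double price_k = price / 1000
label variable price_k "Price (thousands of USD)"

* both measurements now fit comfortably on one axis
twoway (scatter price_k weight) (scatter mpg weight)

* alternative: keep the original units and use a logarithmic y axis
twoway (scatter price weight) (scatter mpg weight), yscale(log)
```

With the power-of-10 rescaling, the axis label states the units, so a reader can recover the original values at a glance.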

By the way, it is the norm in this community to use our real first and last names as our username, to promote collegiality and professionalism. The Forum software does not permit you to edit your user name once your account is established. However, if you click on CONTACT US in the lower right corner, you can message the administrator to make the change for you. Thanks in advance.
Stephanie G Schrempft

Join Date: Jul 2020

Posts: 5
#7

27 Jul 2020, 14:15

Thank you, Clyde.

If I don't use standardization before aggregating the biomarkers, what else could I do to obtain a unit-weighted composite?

I will contact the forum administrator to change my name.

Best wishes,

Stephanie (Schrempft)
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#8

27 Jul 2020, 14:29

Well, I'm not sure quite what you're looking for. If the purpose of forming the composite is to get a measure that combines the ability of each biomarker to predict some single outcome, I would regress the outcome on the biomarkers and then use the coefficients of the regression as weights in a weighted average. If the purpose of the composite is to get a measure that captures the commonality among the biomarkers, I would do either a principal components or factor analysis of the biomarkers and use that. If by a unit-weighted composite you mean one that "counts all the biomarkers equally," that is a fantasy: only if the biomarkers had a common scale of measurement could such an index exist. Standardization and POMS can give you the illusion, but not the reality, of counting each biomarker equally.
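
The first two options might be sketched as follows, again with the auto dataset as a stand-in. Treating price as the outcome and mpg, weight, and length as the "biomarkers" is purely an assumption for illustration, as are the names index_reg and index_pc1. The first index is a coefficient-weighted sum rather than a normalized weighted average.

```stata
* Sketch using the auto dataset as a stand-in:
* price = outcome; mpg, weight, length = hypothetical "biomarkers"
sysuse auto, clear

* option 1: weights taken from a regression of the outcome on the biomarkers
regress price mpg weight length
gen double index_reg = _b[mpg]*mpg + _b[weight]*weight + _b[length]*length

* option 2: commonality among the biomarkers via the first principal component
pca mpg weight length, components(1)
predict double index_pc1, score

summarize index_reg index_pc1
```

Note that pca analyzes the correlation matrix by default, which standardizes the variables internally; the covariance option works with the variables in their original units instead.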