Dose-response models

Janine Stubbs

Join Date: May 2021

Posts: 34
#1

Dose-response models

21 May 2024, 23:10

Dear Statalisters,

Despite searching, I haven't found Stata routines for modelling and graphing models of dose-response data.

In my case, the data is laboratory derived with a continuous outcome (cell viability). Across the concentration gradient of the exposure chemical, the data displays an s-shaped curve with an inflection point (EC50) with upper and lower asymptotes.

Additionally, the data structure is hierarchical, with random effects for batch required.

My aim would be to calculate point estimates and the uncertainty around EC50s (concentration to affect 50% drop in viability) or slope around the inflection point for different drugs and exposure times.

Presumably this is standard stuff, so I'm a bit baffled about not having found any regression methods.

Thankyou,

Janine
Tags: None
Joseph Coveney

Join Date: Apr 2014

Posts: 4352
#2

23 May 2024, 01:16

Originally posted by Janine Stubbs View Post

Despite searching, I haven't found Stata routines for modelling and graphing models of dose-response data.

In my case, the data is laboratory derived with a continuous outcome (cell viability). Across the concentration gradient of the exposure chemical, the data displays an s-shaped curve with an inflection point (EC50) with upper and lower asymptotes.

You can use Stata's nl command for fitting these kinds of model. There are a few canned logistic-type functions that ship with the command (see its help file), but I think that you're looking for a conventional Emax functional relationship, which I believe isn't among the canned functions, but which is easily enough implemented with a substitutable expression.

I show an example below. (Begin at the "Begin here" comment; the first part just generates an artificial dataset for illustration. The illustration includes both graphing and modeling of dose-response data.) From your description, I guess that the cell viability ranges from zero (maximum concentration of the exposure chemical, assuming that it's toxic and not an essential nutrient) to 100 (percent, in the absence of exposure to the noxious chemical). In the illustration, I've recast the outcome as cell nonviability for convenience. Do-file and log-file are attached if you're interested.

Additionally, the data structure is hierarchical, with random effects for batch required.

You can use the Stata command menl to fit a nonlinear mixed-effects regression model to the same functional relationship. I include that in the illustration below. Due to the nature of the response variable (cell viability), the two asymptotes in your case are zero and 100, and so I've left these as model constants in the illustration. Because the Hill coefficient reflects the fundamental nature of the molecular interaction, which is unlikely to be subject to the vagaries of batch-to-batch variation in the particulars, I've left that parameter as fixed-effect in the mixed-effect model, and modeled only the batch potency (inflection point, EC50) as random. This nonlinear mixed-effect model is included in the attached files so that you can compare its results to those of the unpooled (individual nonlinear regressions) and population-averaged model.

My aim would be to calculate point estimates and the uncertainty around EC50s (concentration to affect 50% drop in viability) or slope around the inflection point for different drugs and exposure times.

I'm not sure what you mean by "batch" and random effects for it in this case, unless you're trying to model variation in cell cultures or interday variation in the laboratory. You're liable to have difficulty discerning batch-to-batch variation amid all of the experimental error associated with biological phenomena and their measurement, and so you might need to resort to fitting a Bayesian model with a regularizing prior on the batch variation parameter that helps avoid its estimate from collapsing to zero, especially if you have only relatively few batches. I haven't illustrated it below, but you can use bayesmh for this. Expression of the basic nonlinear regression (Emax) model is analogous to that for the other methods that are illustrated.

Presumably this is standard stuff, so I'm a bit baffled about not having found any regression methods.

It isn't all that standard, for example, even with the so-called conventional Emax model (which isn't the only approach to conducting dose-response analyses), you still need to decide the particulars, for example, whether the error model is additive or proportional (or exponential), whether to model the logarithm of the Hill coefficient (in order to keep its estimate positive), how random effects enter.

Code:

version 18.0 clear * // seedem set seed 2064187623 // Batches quietly set obs 35 generate byte bid = _n generate double ec50 = rnormal(3.5, 0.3) quietly expand 6 bysort bid: generate byte cnc = _n generate double cnv = rnormal(100 * cnc^3 / (cnc^3 + ec50^3), sqrt(10)) * * Begin here * // Illustration of graphing local bid = runiformint(1, 35) quietly nl (cnv = 100 * cnc^{h} / (cnc^{h} + {EC50}^{h})) /// if bid == `bid', initial(h 1 EC50 1) predict double cnv_hat, yhat graph twoway /// line cnv_hat cnc if bid == `bid', lcolor(black) || /// scatter cnv cnc if bid == `bid', mcolor(black) mfcolor(white) /// msize(medium) /// title(Batch `bid', size(small) ring(0) position(10)) scheme(s2color) /// ytitle(Cell Nonviability) ylabel( , angle(horizontal) nogrid) /// xtitle(Exposure) xline(`=_b[/EC50]', lcolor(black) lpattern(dash)) /// legend(off) quietly graph export dose-response.png // Individual (no pooling) frame create Results double(h EC50) forvalues bid = 1/35 { quietly nl (cnv = 100 * cnc^{h} / (cnc^{h} + {EC50}^{h})) /// if bid == `bid', initial(h 1 EC50 1) frame post Results (_b[/h]) (_b[/EC50]) } frame Results: ci mean h EC50 // Population-average modeling nl (cnv = 100 * cnc^{h} / (cnc^{h} + {EC50}^{h})), vce(cluster bid) /// initial(h 1 EC50 1) nolog // Mixed-effects modeling menl cnv = (100 * cnc^{h} / (cnc^{h} + ({ec50} + {EC50[bid]})^{h})), /// initial(h 1 ec50 1) nolog exit

Attached Files

dose-response.smcl (5.2 KB, 1 view)

dose-response.do (1.5 KB, 1 view)
Comment
Janine Stubbs

Join Date: May 2021

Posts: 34
#3

23 May 2024, 23:03

Dear Joseph,

Thankyou for this most helpful reply! I'd like to ask a bunch of questions. The most urgent one is: what is the parameter h and is a re-expression of the Hill equation possible to expose the parameter describing the slope around the EC50?

The same question is posed more thoroughly and with respect to other dose-response relationships in the attached document.

In addition, may I please ask two further questions:

I see that going with defaults for initial parameter estimates (i.e. zero) leads to non-sensical parameter estimates in the output. The option ,initial(h 1 EC50 1) suggests that a guesstimate is required prior to running the regression. Why is this necessary?

The “population-average modelling” includes the the ,vce(cluster bid) option, allowing for intragroup correlation. How does this differ from menl, mixed effects modelling? The results are extremely similar.

FYI, yes, you've understood the nature of the experimental system very well. I did not describe 'batch" carefully. In essence the variable captures variation in the preparation of the stock solutions and cells (rather than variation in the batch number of the drug). There is no interest to estimate these effects, but it seems necessary and prudent to take this into account when estimating the fixed effects as observations are grouped by this variable. I hope this makes sense.

Kind regards,

Janine

Attached Files

Qus around dose response equation.docx (16.4 KB, 1 view)
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4352
#4

24 May 2024, 05:25

Originally posted by Janine Stubbs View Post

. . . what is the parameter h and is a re-expression of the Hill equation possible to expose the parameter describing the slope around the EC50?

h is the Hill coefficient. You can say that it is the parameter describing the slope around EC50.

The same question is posed more thoroughly and with respect to other dose-response relationships in the attached document.

Running the following code might help elucidate the relation between the Emax model (rearranged Hill equation) and the canned "symmetric sigmoid shape" three-parameter logistic model that ships with nl.

Code:

version 18.0 clear * quietly set obs 20 generate int dos = _n * 5 generate double rsp = 100 * dos^2 / (dos^2 + 20^2) // Emax model (with E0 = 0) nl (rsp = {Emax} * dos^{h} / (dos^{h} + {ED50}^{h})), initial(h 1 ED50 1) nolog // three-parameter log-logistic function ("LL3" in some parlances) generate double ldo = log(dos) nl log3 rsp ldo, nolog display in smcl as text "ED50 = " as result exp(_b[b3]) exit

Let me know if that doesn't clear things up.

. . . a guesstimate is required prior to running the regression. Why is this necessary?

Nonlinear regression models are notoriously finicky. Many of them in biology (think multicompartment pharmacokinetics models) are even only partially identified.

The “population-average modelling” includes the the ,vce(cluster bid) option, allowing for intragroup correlation. How does this differ from menl, mixed effects modelling? The results are extremely similar.

What I called population average is actually what I think economists call a pooled regression with cluster-robust standard errors. Population-average (or "marginal") models usually have attenuated regression coefficients compared to their corresponding individual-specific (mixed-effects) counterparts. There's a good discussion of the difference in Anders Skrondal & Sophia Rabe-Hesketh, Generalized Latent Variable Modeling Multilevel, Longitudinal, and Structural Equation Models (Chapman & Hall, 2004) if my recollection is good.

I have to admit that I was a little surprised at the nearly identical regression coefficients obtained in my illustrative example. Better behaved than I expected.

. . . you've understood the nature of the experimental system very well. I did not describe 'batch" carefully. In essence the variable captures variation in the preparation of the stock solutions and cells (rather than variation in the batch number of the drug). There is no interest to estimate these effects, but it seems necessary and prudent to take this into account when estimating the fixed effects as observations are grouped by this variable.

Yeah, I was thinking that if the drug's chemistry is well characterized and is produced under good manufacturing controls, then you'll never see any batch-to-batch variation at all in a typical bioassay. Likewise, if the laboratory (academic or industrial) has established adequate operating procedures and quality controls for its cell cultures and their reagents, then you'll be hard pressed to statistically discriminate much variation there, too.
Comment
Janine Stubbs

Join Date: May 2021

Posts: 34
#5

28 May 2024, 23:35

Joseph,

That is a clever illustration. Thankyou very much indeed!

Why do you say the log3 sigmoidal logistic model is canned, if it's still available within nl?

I'm still vague about the distinction between the A. pooled regression with cluster-robust standard errors/ population-average (or "marginal") models and B. mixed-effects models, as the latter by default report population-average estimates. If anyone could weigh in on this question that was first asked in #3, that would be great!

Really loving this post; learning a lot.

Janine
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4352
#6

29 May 2024, 00:50

Originally posted by Janine Stubbs View Post

Why do you say the log3 sigmoidal logistic model is canned, if it's still available within nl?

Canned in the sense of prepared in advance for repeated use with little or no change.

I'm still vague about the distinction between the A. pooled regression with cluster-robust standard errors/ population-average (or "marginal") models and B. mixed-effects models. . .

The following code illustrates what I was referring to.

Code:

version 18.0 clear * // seedem set seed 939352126 quietly sysuse auto summarize rep78, meanonly quietly replace rep78 = runiformint(r(min), r(max)) if missing(rep78) * * "Pooled" w/cluster-robust standard errors * glm foreign c.mpg, family(binomial) link(logit) vce(cluster rep78) nolog // Alternatively xtgee foreign c.mpg, family(binomial) link(logit) /// i(rep78) corr(independent) vce(robust) nolog * * "Population average" * xtgee foreign c.mpg, family(binomial) link(logit) /// i(rep78) corr(exchangeable) nolog * * "Individual-specific" (mixed-effects, random-effect) * xtlogit foreign c.mpg, i(rep78) nolog exit

You'll see attenuation of the regression coefficients (with nonlinear models) in comparing those of the population-average regression model to those of the individual-specific regression model.
Comment

Janine Stubbs

Join Date: May 2021
Posts: 34

04 Jun 2024, 23:36

Hi Joseph,

Got it; canned as in canned laughter!

Here are dataex data and some questions below.

Thankyou!

Janine

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long batch byte plate float time byte age float(conc ln_conc cs)
2  2 1 0 .0001 -9.2103405 2.8872
2  2 1 0 .0001 -9.2103405 5.6114
2  2 2 0 .0001 -9.2103405 9.6326
2  2 1 0 .0001 -9.2103405 5.1084
2  2 2 0 .0001 -9.2103405 5.8042
2  2 2 0 .0001 -9.2103405 8.6286
2  2 1 0   .75  -.2876821 6.8126
2  2 1 0  .075  -2.590267 3.9272
2  2 1 0  .075  -2.590267 3.5548
2  2 2 0   7.5   2.014903  6.455
2  2 1 0   7.5   2.014903 4.4118
2  2 1 0   .75  -.2876821 3.9334
2  2 1 0  7500   8.922658  .2276
2  2 1 0    75   4.317488 2.5274
2  2 1 0    75   4.317488 5.2902
2  2 2 0  .075  -2.590267 6.1958
2  2 2 0  7500   8.922658  .2444
2  2 1 0    75   4.317488 4.1722
2  2 1 0   7.5   2.014903  5.407
2  2 1 0   7.5   2.014903 3.1662
2  2 2 0   7.5   2.014903  6.623
2  2 1 0   .75  -.2876821 6.3172
2  2 2 0   750   6.620073 3.5702
2  2 1 0   750   6.620073 3.1378
2  2 1 0   750   6.620073 2.1236
2  2 2 0  .075  -2.590267 7.9884
2  2 2 0   .75  -.2876821 8.5102
2  2 1 0   750   6.620073 2.5704
2  2 2 0   750   6.620073 1.9254
2  2 2 0   7.5   2.014903 7.3484
2  2 2 0   .75  -.2876821 8.9272
2  2 2 0    75   4.317488 4.3276
2  2 2 0  .075  -2.590267 6.9172
2  2 2 0   750   6.620073 2.8238
2  2 2 0    75   4.317488 6.9014
2  2 2 0   .75  -.2876821  8.342
2  2 2 0    75   4.317488   7.55
2  2 1 0  .075  -2.590267 5.2232
3  3 1 0 .0001 -9.2103405 6.8408
3  3 2 0 .0001 -9.2103405 9.8692
3  3 1 0 .0001 -9.2103405 6.9262
3  3 2 0 .0001 -9.2103405 9.6204
3  3 1 0 .0001 -9.2103405 6.5412
3  3 2 0 .0001 -9.2103405 9.0106
3  3 1 0    75   4.317488 6.4142
3  3 1 0   750   6.620073 3.2694
3  3 1 0  .075  -2.590267 7.3872
3  3 2 0  7500   8.922658  .2442
3  3 2 0   .75  -.2876821 8.3012
3  3 2 0    75   4.317488 7.1134
3  3 1 0    75   4.317488 7.6314
3  3 1 0   7.5   2.014903 6.9052
3  3 2 0    75   4.317488  9.312
3  3 2 0   .75  -.2876821 8.6296
3  3 1 0   750   6.620073  2.783
3  3 2 0    75   4.317488 8.8604
3  3 1 0  .075  -2.590267 7.2914
3  3 1 0    75   4.317488 5.2584
3  3 2 0   750   6.620073  .4814
3  3 2 0   750   6.620073  .5442
3  3 1 0  .075  -2.590267 6.4574
3  3 1 0  7500   8.922658    .21
3  3 1 0   7.5   2.014903 7.1974
3  3 1 0   .75  -.2876821 6.6828
3  3 2 0   750   6.620073  .9712
3  3 2 0  .075  -2.590267 9.2694
3  3 1 0   750   6.620073 2.2728
3  3 1 0   .75  -.2876821 6.1138
3  3 1 0   .75  -.2876821 6.6494
3  3 2 0   7.5   2.014903 9.1328
3  3 2 0  .075  -2.590267  8.687
3  3 2 0   7.5   2.014903 8.5504
3  3 2 0   .75  -.2876821 8.2946
3  3 1 0   7.5   2.014903  5.601
3  3 2 0   7.5   2.014903 8.2558
3  3 2 0  .075  -2.590267 8.6214
2 14 1 1 .0001 -9.2103405 7.0544
2 14 1 1 .0001 -9.2103405  6.367
2 14 1 1 .0001 -9.2103405 7.3408
2 14 2 1 .0001 -9.2103405 9.3294
2 14 2 1 .0001 -9.2103405 9.1564
2 14 2 1 .0001 -9.2103405 9.3146
2 14 2 1   .75  -.2876821 8.3918
2 14 1 1  7500   8.922658  .3988
2 14 2 1  7500   8.922658  .2788
2 14 1 1  .075  -2.590267 5.9108
2 14 1 1   .75  -.2876821 4.6318
2 14 2 1   7.5   2.014903 7.9966
2 14 2 1   750   6.620073  .9506
2 14 2 1    75   4.317488 5.6838
2 14 2 1  .075  -2.590267 9.4186
2 14 1 1    75   4.317488 5.9646
2 14 1 1   7.5   2.014903 6.9404
2 14 1 1  .075  -2.590267 5.8926
2 14 1 1  7500   8.922658  .3944
2 14 2 1  .075  -2.590267 9.3422
2 14 1 1   750   6.620073  2.404
2 14 1 1    75   4.317488 5.7592
2 14 2 1  .075  -2.590267 9.5426
2 14 2 1  7500   8.922658  .2708
end
label values batch run
label values time timelabel

// Question 1: why does this code not work?
replace conc = 0.0075 if conc == 0.0001

// Question 2: How do I restrict the display of the EC50 on the graph to one or two decimal places?
scatter cs ln_conc
nl (cs = {Emax} * conc^{h} / (conc^{h} + {EC50}^{h})) , initial(h 0.5 EC50 100)
predict pr, yhat
scalar EC50 = ln(_b[/EC50])

line pr ln_conc , lcolor(black) sort || ///
scatter cs ln_conc , mcolor(black) mfcolor(white) msize(medium) ///
ytitle("cell survival") ylabel( , angle(horizontal) nogrid) xtitle("log(conc), ug/ml") ///
xline(`=EC50', lcolor(black) lpattern(dash)) ///
title(EC50 `=EC50', size(small) ring(0) position(12)) scheme(s2color) ///
legend(off)

// Question 2: How can I include a covariate, for example age?

// Question 3: Assuming this data were hierarchically structured as follows; batch at the highest level, then plate and finally time
// how would this be incorporated in this example (the Stata documentation is difficult to follow)

/* Question 4:
Having plotted up some pretty graphs, I'm convinced the Hill equation y = 100 * x^{h} / (x^{h} + {EC50}^{h}
is mathematically equivalent to the -log3- equation y = 100 / 1 + exp({h} * (log(x) - log({EC50})))
Letting Emin = a and Emax = b, the Emin parameter appears as shown:
y = a + ((b-a) / 1 + exp({h} * (log(x) - log({EC50}))))
Where would b appear in the Hill equation if we relaxed the assumption that Emin = 0?
*/

Comment

Joseph Coveney

Join Date: Apr 2014

Posts: 4352
#8

05 Jun 2024, 01:21

Originally posted by Janine Stubbs View Post

// Question 1: why does this code not work?
replace conc = 0.0075 if conc == 0.0001

This comes up on the list from time to time. I think that there's an FAQ on it by now. Try

Code:

replace conc = 0.0075 if float(conc) == float(0.0001)

// Question 2: How do I restrict the display of the EC50 on the graph to one or two decimal places?

Try

Code:

. . . scalar define EC50 = ln(_b[/EC50]) local EC50 : display %3.1f EC50 graph twoway /// line pr ln_conc , lcolor(black) sort || /// scatter cs ln_conc , mcolor(black) mfcolor(white) msize(medium) /// ytitle("cell survival") ylabel( , angle(horizontal) nogrid) xtitle("log(conc), µg/ml") /// xline(`=EC50', lcolor(black) lpattern(dash)) /// title(EC50 `EC50', size(small) ring(0) position(12)) scheme(s2color) /// legend(off)

// Question 2: How can I include a covariate, for example age?

With nl, you have to manually create the linear combination, for example, to include age as a covariate for EC50, try something like the following.

Code:

nl (cs = {Emax} * conc^{h} / (conc^{h} + ({EC50} + age * {EC50age})^{h})) , initial(h 0.5 EC50 100)

// Question 3: Assuming this data were hierarchically structured as follows; batch at the highest level, then plate and finally time
// how would this be incorporated in this example (the Stata documentation is difficult to follow)

You wouldn't use nl, but rather menl. I recommend first digging into that command's user manual entry.

/* Question 4:
Letting Emin = a and Emax = b, the Emin parameter appears as shown:
y = a + ((b-a) / 1 + exp({h} * (log(x) - log({EC50}))))
Where would b appear in the Hill equation if we relaxed the assumption that Emin = 0?

Do you mean where would a (i.e., Emin) appear in the Hill equation? Here's how I'd write it.

Code:

nl (cs = {Emin} + ({Emax} - {Emin}) * conc^{h} / (conc^{h} + {EC50}^{h})), /// initial(Emin 0.1 Emax 100 h 1 EC50 100)
Comment

Announcement