Back-transforming predicted/fitted values in OLS

Dan Palmer

Join Date: Jul 2018

Posts: 71
#1

16 Oct 2018, 14:58

I've run an OLS regression using a natural log-transformed dependent variable; no predictor variables were transformed. However, I need to be able to express predicted values for individual cases in terms of the original scale of measurement. Can back-transforming the model's predicted values (as given by the predict xb command) be achieved by simple exponentiation of the predicted values? That is, can my desired back-transformation of predicted values be realized as follows:

Code:

gen xb_original_scale = exp(xb)

Some sources (such as this) seem to suggest that this approach is correct, while others (such as this) seem to indicate that things are more complicated.

Last edited by Dan Palmer; 16 Oct 2018, 15:01.
Clyde Schechter

Join Date: Apr 2014

Posts: 29962
#2

16 Oct 2018, 18:58

If the variation in xb is very small, then just back transforming can be reasonable. But that is not typically the case, and when it is, you probably had no benefit in using a log-transformed model in the first place and you would be better off redoing the analysis without it.

In the more typical case where the variation in xb is appreciable, back transforming gives biased estimates. If you actually need predictions in the original metric of your outcome variable, you should not do a log-linear regression. Instead, go back and do the analysis using -glm- with a log link function. Then -predict, mu- will give you unbiased predictions of the outcome in its untransformed original metric.
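A minimal sketch of the -glm- route described above. The auto dataset shipped with Stata stands in for the real data, and the gamma family and the predictors are assumptions made for illustration; nothing in this thread specifies them.

```stata
* Illustration only: auto.dta stands in for the real data, and
* family(gamma) is an assumed choice, not one given in this thread.
sysuse auto, clear

* Model E[price | x] on the original scale through a log link
glm price mpg weight, family(gamma) link(log)

* Predicted mean of price in its original units; no back-transformation needed
predict double price_hat, mu

* Compare observed and predicted values on the same scale
summarize price price_hat
```

Because the log link models the log of the mean rather than the mean of the log, the predictions come out directly in the units of the outcome.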
Dimitriy V. Masterov

Join Date: Mar 2014

Posts: 609
#3

16 Oct 2018, 19:08

You can take a look at the advice here on Duan smearing.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17678
#4

17 Oct 2018, 00:34

Dan:
see also: https://www.ncbi.nlm.nih.gov/pubmed/15685641

Kind regards,
Carlo
(Stata 19.0)
Dan Palmer

Join Date: Jul 2018

Posts: 71
#5

18 Oct 2018, 09:27

Thanks to all for your suggestions. In reviewing Gould's (2011) blog entry on the topic, as well as other materials, it sounds like I have three main options:
1. OLS with back-transformation of predicted values, for which Gould suggests this approach:

Code:

predict yhat
replace yhat = exp(yhat)
replace yhat = yhat*exp(e(rmse)^2/2)

2. Poisson regression with robust standard errors

3. GLM with log link

All values of my DV are greater than zero and present a classic positive skew. In this case, it is not clear to me that there is any appreciable difference between options #2 and #3 above. I will continue to investigate a bit to determine whether #1 is justified or if #2/#3 are needed.
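Whether #2 and #3 differ depends on the family chosen for -glm-. With family(poisson) and link(log), -glm- fits the same model as -poisson-, so the predictions should agree up to numerical tolerance. With another family (gamma is used here as an assumed example) the mean function is the same but the observations are weighted differently, so the estimates will generally differ. A rough sketch, with the auto dataset and the predictors as stand-ins:

```stata
* Illustration only: auto.dta and the predictors are stand-ins.
sysuse auto, clear

* Option #2: Poisson regression with robust standard errors
poisson price mpg weight, vce(robust)
predict double yhat_pois, n

* Option #3 with an assumed Poisson family: same model as -poisson-
glm price mpg weight, family(poisson) link(log) vce(robust)
predict double yhat_glm, mu

* The two sets of predictions should differ only by numerical noise
generate double absdiff = abs(yhat_pois - yhat_glm)
summarize absdiff

* Option #3 with an assumed gamma family: same mean function, different weighting
glm price mpg weight, family(gamma) link(log) vce(robust)
predict double yhat_gamma, mu

summarize yhat_pois yhat_glm yhat_gamma
```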

Last edited by Dan Palmer; 18 Oct 2018, 10:12.
Joao Santos Silva

Join Date: Apr 2014

Posts: 3001
#6

18 Oct 2018, 13:05

Dear Dan Palmer,

#1 is valid only under homoskedasticity. #2 is likely to be preferred to #3 because it partially accounts for heteroskedasticity and has been shown to be quite reliable.
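The role of homoskedasticity can be seen from a short derivation. The notation (x, beta, u, sigma) is introduced here for illustration, and normality of the error is an assumption of this sketch, not something stated above.

```latex
% Log-linear model fitted by OLS
\ln y = x'\beta + u, \qquad E[u \mid x] = 0

% Mean of y on the original scale
E[y \mid x] = e^{x'\beta} \, E[e^{u} \mid x]

% Assumed: u | x ~ N(0, \sigma^2) with constant \sigma^2 (homoskedasticity)
E[e^{u} \mid x] = e^{\sigma^2/2}
\quad\Longrightarrow\quad
E[y \mid x] = \exp\!\left(x'\beta + \tfrac{\sigma^2}{2}\right)

% If instead Var(u | x) = \sigma^2(x) depends on x
E[y \mid x] = \exp\!\left(x'\beta + \tfrac{\sigma^2(x)}{2}\right)
```

In the last case the correction factor changes with x, so multiplying every prediction by the single constant exp(e(rmse)^2/2) no longer recovers the conditional mean.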

Best wishes,

Joao
Dan Palmer

Join Date: Jul 2018

Posts: 71
#7

22 Oct 2018, 12:30

Can anyone speak to the correctness of step three in Gould's three-part back-transformation process, as outlined in post #5 above? I'm having difficulty locating an original source that supports this specific correction. For instance, in the much-referenced piece by Cox, Warburton, Armstrong, and Holliday (2008), the authors suggest the following correction:

While these back-transformations are clearly related, they are not the same. Beyond not knowing which of these methods is preferred, I'm also wondering whether the latter is available in Stata in a straightforward way, perhaps as a scalar or system macro.
Andrea Discacciati

Join Date: Feb 2016

Posts: 194
#8

22 Oct 2018, 13:10

Dan, I see no difference between (1) in #5 and #7... both are telling you to calculate predicted values as

Code:

exp(xb + s^2/2)

where xb is the linear predictor and s is the RMSE. Or am I missing something?

Last edited by Andrea Discacciati; 22 Oct 2018, 13:14.
Nick Cox

Join Date: Mar 2014

Posts: 35449
#9

22 Oct 2018, 13:12

That reference seems familiar somehow, but I doubt that even its authors would presume that giving journal etc. is dispensable.

More crucially, the paper gives other methods too.

Rather than asking for oracular opinions on what's right, perhaps you should explain your model specification, both deterministic and stochastic parts. The solution depends on the problem.
Dan Palmer

Join Date: Jul 2018

Posts: 71
#10

22 Oct 2018, 14:51

Andrea - Thank you for clarifying the article excerpt. It wasn't clear to me that the term "variance" was to be implemented as RMSE explicitly, as opposed to some other measure of variance. This is the bridge I needed someone to help me cross.

Nick - Sorry for the incomplete citation. Here it is in full: Cox, N.J., Warburton, J., Armstrong, A., & Holliday, V.J. (2008). Fitting concentration and load rating curves with generalized linear models. Earth Surface Processes and Landforms, 33: 25-39.
Dimitriy V. Masterov

Join Date: Mar 2014

Posts: 609
#11

25 Oct 2018, 10:49

Take a look at Christopher Baum's -levpredict- post-regress command for an automated approach and these literature references:
Cameron, A.C., Trivedi, P.K., 2009. Microeconometrics Using Stata. Stata Press.

Duan, N., 1983. Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78:605-610.

Nichols, A., 2010. Regression for nonnegative skewed dependent variables. BOS'10 Stata Conference. Accessible from http://repec.org/bost10/nichols_boston2010.pdf

The default is the E[exp(u)] = exp(0.5 sigma^2) method. The Duan option gives you the one I suggested above, which makes weaker assumptions.
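For anyone who wants to see the two corrections without installing anything, here is a hand-rolled sketch using only official commands. It does not call -levpredict-; the auto dataset, the predictors, and all variable names are stand-ins chosen for illustration.

```stata
* Illustration only: auto.dta, the predictors, and the variable names are stand-ins.
sysuse auto, clear
generate double lnprice = ln(price)

* OLS on the log scale
regress lnprice mpg weight
predict double xb, xb
predict double res, residuals

* Naive back-transformation: typically underestimates the mean
generate double price_naive = exp(xb)

* Normal-theory correction: E[exp(u)] = exp(0.5*sigma^2), with RMSE as sigma
generate double price_norm = exp(xb + e(rmse)^2/2)

* Duan smearing: estimate E[exp(u)] by the sample mean of exp(residual)
generate double expres = exp(res)
summarize expres, meanonly
scalar smear = r(mean)
generate double price_duan = exp(xb) * smear

summarize price price_naive price_norm price_duan
```

The smearing factor replaces the normality assumption with an average over the observed residuals, but it is still a single constant applied to every case, so it does not address heteroskedasticity.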