  • Back-transforming predicted/fitted values in OLS

    I've run an OLS regression using a natural log-transformed dependent variable; no predictor variables were transformed. However, I need to be able to express predicted values for individual cases in terms of the original scale of measurement. Can back-transforming the model's predicted values (as given by the predict xb command) be achieved by simple exponentiation of the predicted values? That is, can my desired back-transformation of predicted values be realized as follows:

    Code:
    gen xb_original_scale = exp(xb)

    Some sources (such as this) seem to suggest that this approach is correct, while others (such as this) seem to indicate that things are more complicated.
    Last edited by Dan Palmer; 16 Oct 2018, 15:01.

  • #2
    If the variation in xb is very small, then just back transforming can be reasonable. But that is not typically the case, and when it is, you probably had no benefit in using a log-transformed model in the first place and you would be better off redoing the analysis without it.

    In the more typical case where the variation in xb is appreciable, back transforming gives biased estimates. If you actually need predictions in the original metric of your outcome variable, you should not do a log-linear regression. Instead, go back and do the analysis using -glm- with a log link function. Then -predict, mu- will give you unbiased predictions of the outcome in its untransformed original metric.
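    A minimal sketch of that suggestion, using Stata's bundled auto dataset as a stand-in for the poster's data. The variables (price, mpg, weight) and the gamma family are illustrative assumptions; the post only specifies a log link.

    ```stata
    * Illustrative data only: the auto dataset is a stand-in, not the poster's data
    sysuse auto, clear

    * Log-linear OLS followed by naive back-transformation, for comparison
    generate ln_price = ln(price)
    regress ln_price mpg weight
    predict ln_hat, xb
    generate price_naive = exp(ln_hat)

    * GLM with a log link models E[price | x] directly;
    * family(gamma) is an assumed choice, not something the post prescribes
    glm price mpg weight, family(gamma) link(log)
    predict price_glm, mu

    * Compare the means of the observed and predicted values
    summarize price price_naive price_glm
    ```

    In positively skewed data the mean of price_naive will typically fall below the mean of price, which is the bias described above.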

    • #3
      You can take a look at the advice here on Duan smearing.
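      A hand-rolled sketch of Duan's smearing estimator, assuming the auto dataset and the variable names below as illustrative stand-ins. The smearing factor is the sample mean of the exponentiated log-scale residuals; it does not require normal errors, but it does assume the errors have the same distribution for every observation.

      ```stata
      * Illustrative data only: the auto dataset is a stand-in
      sysuse auto, clear
      generate ln_price = ln(price)

      * Fit the log-linear model and keep the linear prediction and residuals
      regress ln_price mpg weight
      predict ln_hat, xb
      predict res, residuals

      * Smearing factor: mean of exp(residual) over the estimation sample
      generate exp_res = exp(res)
      summarize exp_res, meanonly
      scalar smear = r(mean)

      * Back-transformed prediction on the original scale
      generate price_duan = exp(ln_hat) * scalar(smear)
      display "Duan smearing factor = " scalar(smear)
      ```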

      • #4
        Dan:
        see also: https://www.ncbi.nlm.nih.gov/pubmed/15685641
        Kind regards,
        Carlo
        (Stata 19.0)

        • #5
          Thanks to all for your suggestions. In reviewing Gould's (2011) blog entry on the topic, as well as other materials, it sounds like I have three main options:
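          • OLS with back-transformation of predicted values, for which Gould suggests this approach:
          Code:
          predict yhat                          // linear prediction on the log scale
          replace yhat = exp(yhat)              // back-transform to the original scale
          replace yhat = yhat*exp(e(rmse)^2/2)  // correction assuming normal, homoskedastic errors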
          • Poisson regression with robust standard errors
          • GLM with log link
          All values of my DV are greater than zero and present a classic positive skew. In this case, it is not clear to me that there is any appreciable difference between options #2 and #3 above. I will continue to investigate a bit to determine whether #1 is justified or if #2/#3 are needed.
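          The three options can be put side by side in a short sketch. The auto dataset, the variable names, and the gamma family in option 3 are illustrative assumptions, not part of the original post. Note that -glm- with family(poisson) link(log) would reproduce the coefficients from -poisson-, so options 2 and 3 differ only when another family is chosen.

          ```stata
          * Illustrative data only: the auto dataset is a stand-in
          sysuse auto, clear

          * Option 1: OLS on the log scale, then back-transform with the normal-theory correction
          generate ln_price = ln(price)
          regress ln_price mpg weight
          predict yhat_ols
          replace yhat_ols = exp(yhat_ols) * exp(e(rmse)^2/2)

          * Option 2: Poisson regression with robust standard errors
          poisson price mpg weight, vce(robust)
          predict yhat_pois, n

          * Option 3: GLM with a log link (gamma family assumed here)
          glm price mpg weight, family(gamma) link(log) vce(robust)
          predict yhat_glm, mu

          * Compare predictions on the original scale
          summarize price yhat_ols yhat_pois yhat_glm
          ```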




          Last edited by Dan Palmer; 18 Oct 2018, 10:12.

          • #6
            Dear Dan Palmer,

            #1 is valid only under homoskedasticity. #2 is likely to be preferred to #3 because it partially accounts for heteroskedasticity and has been shown to be quite reliable.

            Best wishes,

            Joao
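            One way to probe the homoskedasticity condition for option #1 is a Breusch-Pagan test on the log-scale regression. This is only a sketch on the auto dataset with assumed variable names, and a non-significant test does not by itself establish that the correction is adequate.

            ```stata
            * Illustrative data only: the auto dataset is a stand-in
            sysuse auto, clear
            generate ln_price = ln(price)
            regress ln_price mpg weight

            * Breusch-Pagan / Cook-Weisberg test; H0: constant error variance
            estat hettest
            display "p-value = " r(p)
            ```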

            • #7
              Can anyone speak to the correctness of step three in Gould's three-part back-transformation process, as outlined in post #5 above? I'm having difficulty locating an original source that supports this specific correction. For instance, in the much-referenced piece by Cox, Warburton, Armstrong, and Holliday (2007), the authors suggest the following correction:


              [Image: Capture.JPG, an excerpt from the cited article showing the authors' correction]



              While these back-transformations clearly are related, they are not the same. Beyond not knowing which of these methods is preferred, I'm also wondering whether the latter is available in Stata in a straightforward way, perhaps as a scalar or system macro.

              • #8
                Dan, I see no difference between (1) in #5 and #7... both are telling you to calculate predicted values as

                Code:
                exp(xb + s^2/2)
                where xb is the linear predictor and s is the RMSE. Or am I missing something?
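                For reference, a short derivation of where this expression comes from, assuming the log-scale errors are normal with constant variance. The symbols u and \sigma are notation introduced here for illustration; in practice \sigma is replaced by its estimate s, the RMSE.

                ```latex
                \ln y = x\beta + u, \qquad u \sim N(0, \sigma^2)
                \quad\Longrightarrow\quad
                E[y \mid x] = e^{x\beta}\, E\!\left[e^{u}\right]
                            = e^{x\beta}\, e^{\sigma^2/2}
                            = \exp\!\left(x\beta + \tfrac{\sigma^2}{2}\right)
                ```

                Multiplying exp(xb) by exp(s^2/2), as in #5, and evaluating exp(xb + s^2/2) directly are therefore the same calculation.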
                Last edited by Andrea Discacciati; 22 Oct 2018, 13:14.

                • #9
                  That reference seems familiar somehow, but I doubt that even its authors would presume that giving the journal etc. is dispensable.

                  More crucially, the paper gives other methods too.

                  Rather than asking for oracular opinions on what's right, perhaps you should explain your model specification, both deterministic and stochastic parts. The solution depends on the problem.

                  • #10
                    Andrea - Thank you for clarifying the article excerpt. It wasn't clear to me that the term "variance" was to be implemented as RMSE explicitly, as opposed to some other measure of variance. This is the bridge I needed someone to help me cross.

                    Nick - Sorry for the incomplete citation. Here it is in full: Cox, N.J., Warburton, J., Armstrong, A., & Holliday, V.J. (2008). Fitting concentration and load rating curves with generalized linear models. Earth Surface Processes and Landforms, 33: 25-39.

                    • #11
                      Take a look at Christopher Baum's -levpredict- post-regress command for an automated approach and these literature references:
                      • Cameron, A.C., Trivedi, P., 2009. Microeconometrics Using Stata. Stata Press.
                      • Duan, N., 1983. Smearing estimate: A nonparametric retransformation method. Journal of the American Statistical Association 78:605-610.
                      • Nichols, A., 2010. Regression for nonnegative skewed dependent variables. BOS'10 Stata Conference. Accessible from http://repec.org/bost10/nichols_boston2010.pdf
                      The default is the E[exp(u)] = exp(0.5 sigma^2) method, which assumes normal errors. The Duan option gives you the one I suggested above, which makes weaker assumptions.
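                      A sketch of how the two variants might be called after -regress-. The syntax shown (a new variable name, plus a duan option) is assumed from the description above and should be confirmed with -help levpredict- after installing the package from SSC; the auto dataset and variable names are illustrative stand-ins.

                      ```stata
                      * One-time install (requires internet access): ssc install levpredict
                      * Illustrative data only: the auto dataset is a stand-in
                      sysuse auto, clear
                      generate ln_price = ln(price)
                      regress ln_price mpg weight

                      * Default: normal-theory correction, exp(0.5*sigma^2) (assumed syntax)
                      levpredict price_norm

                      * Duan smearing variant (assumed syntax)
                      levpredict price_duan, duan

                      summarize price price_norm price_duan
                      ```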
