  • Exponential Regression: nl vs. reg

    I would like to fit an exponential decay function: y=A*exp(b*x)

    I thought the best method would be to use the "nl" command, as in:
    nl (y={A}*(exp({b}*x)))

    But an alternative method should be to take the log of y first and then run a simple linear regression.
    Because
    ln(y)=ln(A*exp(b*x))
    ln(y)=ln(A)+ln(exp(b*x))
    ln(y)=ln(A)+b*x

    Therefore, using
    gen ln_y=ln(y)
    reg ln_y x

    should get me the same result.
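A minimal sketch of what the log-linear route computes, written in Python only for illustration (in Stata it is the -gen ln_y=ln(y)- / -reg ln_y x- pair above; the function name fit_log_ols is hypothetical). The slope of ln(y) on x estimates b and exp of the intercept estimates A. On noise-free data this recovers the parameters exactly, so any disagreement with -nl- can only come from how each method treats the error term.

```python
import math


def fit_log_ols(x, y):
    """OLS of ln(y) on x; returns (A_hat, b_hat) with A_hat = exp(intercept)."""
    if any(v <= 0 for v in y):
        raise ValueError("ln(y) is undefined for y <= 0")
    ly = [math.log(v) for v in y]                 # gen ln_y = ln(y)
    n = len(x)
    mx, my = sum(x) / n, sum(ly) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (li - my) for xi, li in zip(x, ly))
    b_hat = sxy / sxx                             # coefficient on x in reg ln_y x
    return math.exp(my - b_hat * mx), b_hat       # exp(_cons), slope


# Noise-free decay with illustrative values A = 2, b = -0.5
x = [0.5 * i for i in range(11)]
y = [2.0 * math.exp(-0.5 * xi) for xi in x]
A_hat, b_hat = fit_log_ols(x, y)
print(round(A_hat, 6), round(b_hat, 6))  # 2.0 -0.5
```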

    But I don't get the same estimates of "b" under both approaches.

    I am not very familiar with the "nl" command. So maybe I am not using it correctly?

    Thanks

  • #2
    Nothing says you should get the same result. The log of the expectation is not the expectation of the log. You need to start with a model that includes an error term, say, a multiplicative error term in the exponential model. If that error is independent of x, then the two methods are both consistent for b, but they won't give the same estimates. Without independence they have different probability limits. You need to decide what you're willing to assume.
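A small simulation of the independent-error case, as a Python sketch (the seed, sample size and the values A = 2, b = -0.5, error spread 0.5 are arbitrary assumptions). The error is multiplicative, u = exp(z) with z ~ N(0, 0.25) independent of x. The least-squares fit of the levels uses a hand-rolled damped Gauss-Newton loop standing in for -nl-; it is not necessarily the algorithm -nl- uses. Both slopes should land near -0.5 without being equal. Since E[ln u] = 0 but E[u] = exp(0.125), the log route should put A near 2 while the levels fit should put it near 2*exp(0.125), about 2.27: the log of the expectation is not the expectation of the log.

```python
import math
import random


def fit_log_ols(x, y):
    """OLS of ln(y) on x; returns (A, b) with A = exp(intercept)."""
    ly = [math.log(v) for v in y]
    n = len(x)
    mx, my = sum(x) / n, sum(ly) / n
    b = (sum((xi - mx) * (li - my) for xi, li in zip(x, ly))
         / sum((xi - mx) ** 2 for xi in x))
    return math.exp(my - b * mx), b


def fit_nls_exp(x, y, A, b, max_iter=100, tol=1e-10):
    """Damped Gauss-Newton for min over (A, b) of sum (y - A*exp(b*x))^2."""
    def ssr(A, b):
        return sum((yi - A * math.exp(b * xi)) ** 2 for xi, yi in zip(x, y))

    for _ in range(max_iter):
        jaa = jab = jbb = ga = gb = 0.0
        for xi, yi in zip(x, y):
            g = math.exp(b * xi)
            e = yi - A * g                  # additive residual
            da, db = g, A * xi * g          # derivatives of the mean
            jaa += da * da
            jab += da * db
            jbb += db * db
            ga += da * e
            gb += db * e
        det = jaa * jbb - jab * jab
        sa = (jbb * ga - jab * gb) / det
        sb = (jaa * gb - jab * ga) / det
        old, t = ssr(A, b), 1.0
        new = ssr(A + sa, b + sb)
        while new > old and t > 1e-8:       # halve the step until SSR falls
            t /= 2
            new = ssr(A + t * sa, b + t * sb)
        if new > old:
            break                           # no further decrease possible
        A, b = A + t * sa, b + t * sb
        if abs(t * sa) + abs(t * sb) < tol * (1 + abs(A) + abs(b)):
            break
    return A, b


rng = random.Random(12345)
n = 20000
x = [rng.uniform(0.0, 5.0) for _ in range(n)]
# y = A*exp(b*x)*u with A = 2, b = -0.5, u = exp(z), z ~ N(0, 0.5^2) independent of x
y = [2.0 * math.exp(-0.5 * xi) * math.exp(rng.gauss(0.0, 0.5)) for xi in x]

A_ols, b_ols = fit_log_ols(x, y)
A_nls, b_nls = fit_nls_exp(x, y, A_ols, b_ols)   # log-OLS estimates as start values
print(f"reg ln_y x : A = {A_ols:.3f}, b = {b_ols:.4f}")
print(f"least sq.  : A = {A_nls:.3f}, b = {b_nls:.4f}")
```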


    • #3
      The two models are not equivalent because you have overlooked the role of error terms. The actual equation that -nl- estimates is

      y = {A} * exp({b}*x) + e

      And -nl- finds the values of A and b that minimize the sum of e^2.
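A minimal sketch of that minimisation, written in Python as a plain damped Gauss-Newton loop. This is an assumption for illustration, not necessarily the algorithm -nl- uses internally, and the function name is hypothetical. It also shows why start values matter for this model: at A = 0 the derivative of the mean with respect to b is zero for every observation, so the step cannot be computed.

```python
import math


def fit_nls_exp(x, y, A, b, max_iter=200, tol=1e-10):
    """Minimise the sum of e^2, e = y - A*exp(b*x), by damped Gauss-Newton.

    A, b are start values; A must be nonzero, because with A = 0 the
    derivative of the mean with respect to b is zero everywhere.
    """
    def ssr(A, b):
        return sum((yi - A * math.exp(b * xi)) ** 2 for xi, yi in zip(x, y))

    for _ in range(max_iter):
        # Normal equations J'J * step = J'e for the two parameters
        jaa = jab = jbb = ga = gb = 0.0
        for xi, yi in zip(x, y):
            g = math.exp(b * xi)
            e = yi - A * g                  # the additive error term
            da, db = g, A * xi * g          # d mean / dA, d mean / db
            jaa += da * da
            jab += da * db
            jbb += db * db
            ga += da * e
            gb += db * e
        det = jaa * jbb - jab * jab
        if det == 0.0:
            raise ValueError("singular Jacobian; try other start values")
        sa = (jbb * ga - jab * gb) / det
        sb = (jaa * gb - jab * ga) / det
        # Step halving: only accept steps that lower the sum of squares
        old, t = ssr(A, b), 1.0
        new = ssr(A + sa, b + sb)
        while new > old and t > 1e-8:
            t /= 2
            new = ssr(A + t * sa, b + t * sb)
        if new > old:
            break
        A, b = A + t * sa, b + t * sb
        if abs(t * sa) + abs(t * sb) < tol * (1 + abs(A) + abs(b)):
            break
    return A, b


x = [0.5 * i for i in range(11)]
y = [2.0 * math.exp(-0.5 * xi) for xi in x]
A_hat, b_hat = fit_nls_exp(x, y, A=1.0, b=-0.2)
print(round(A_hat, 6), round(b_hat, 6))
```

On these noise-free data the fit should return A close to 2 and b close to -0.5. Calling it with A=0.0 raises the ValueError instead.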

      Crucially, the error term is additive. So when you take logarithms, you don't get

      ln_y = ln(A) + b*x + e. You get ln_y = ln(A*exp(b*x)+e), which does not simplify in closed form.

      The model you are estimating using -reg ln_y x- would be equivalent to the original y = {A} * exp({b}*x) * e, with a multiplicative error.


      • #4
        Dear Kevin,

        There are few discussions in which I would have anything to add after Jeff and Clyde contributed, but I guess this is one of those very rare cases.

        As Jeff pointed out, the two methods will generally lead to different estimates, which in general do not even have the same probability limit. So, you need to decide whether you are interested in learning about the effects of x on (the conditional mean of) y or on (the conditional mean of) ln(y).

        In case you want to learn about the conditional expectation of y given x, you have to estimate the model in its multiplicative form. However, using -nl- may not be the best option because there is ample evidence that the non-linear least squares estimator can be very inefficient in this context. A much safer approach is to use Poisson regression with robust standard errors, as advocated here (see also here). Whether estimating a model in logs or in levels makes a material difference is very much an empirical question, but there are well-known cases where the difference can be substantial.
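A sketch of the Poisson pseudo-maximum-likelihood estimator recommended here, in Python for illustration; in Stata one would typically run something like -poisson y x, vce(robust)-. The function name and the Newton iteration with step halving are assumptions for the sketch. y only has to be nonnegative, not a count. The standard errors are the heteroskedasticity-robust sandwich form without any finite-sample adjustment, so they may differ slightly from Stata's output.

```python
import math


def fit_poisson_pml(x, y, max_iter=100, tol=1e-10):
    """Poisson pseudo-ML for E[y|x] = exp(c + b*x); y >= 0, not necessarily integer.

    Returns (c, b, se_c, se_b) with heteroskedasticity-robust (sandwich)
    standard errors, without any finite-sample adjustment.
    """
    def qll(c, b):  # Poisson quasi-log-likelihood, dropping terms free of (c, b)
        return sum(yi * (c + b * xi) - math.exp(c + b * xi) for xi, yi in zip(x, y))

    def moments(c, b):
        """Score s, information H = sum mu*z*z', and M = sum (y-mu)^2*z*z', z = (1, x)."""
        s0 = s1 = h00 = h01 = h11 = m00 = m01 = m11 = 0.0
        for xi, yi in zip(x, y):
            mu = math.exp(c + b * xi)
            r = yi - mu
            s0 += r
            s1 += r * xi
            h00 += mu
            h01 += mu * xi
            h11 += mu * xi * xi
            m00 += r * r
            m01 += r * r * xi
            m11 += r * r * xi * xi
        return (s0, s1), (h00, h01, h11), (m00, m01, m11)

    c, b = math.log(sum(y) / len(y)), 0.0      # start from the constant-only fit
    for _ in range(max_iter):
        (s0, s1), (h00, h01, h11), _ = moments(c, b)
        det = h00 * h11 - h01 * h01
        dc = (h11 * s0 - h01 * s1) / det        # Newton step H^{-1} s
        db = (h00 * s1 - h01 * s0) / det
        old, t = qll(c, b), 1.0
        while qll(c + t * dc, b + t * db) < old and t > 1e-8:
            t /= 2                              # step halving on the concave objective
        c, b = c + t * dc, b + t * db
        if abs(t * dc) + abs(t * db) < tol:
            break

    _, (h00, h01, h11), (m00, m01, m11) = moments(c, b)
    det = h00 * h11 - h01 * h01
    i00, i01, i11 = h11 / det, -h01 / det, h00 / det   # H^{-1}
    v00 = i00 * (i00 * m00 + i01 * m01) + i01 * (i00 * m01 + i01 * m11)
    v11 = i01 * (i01 * m00 + i11 * m01) + i11 * (i01 * m01 + i11 * m11)
    return c, b, math.sqrt(v00), math.sqrt(v11)


# Illustrative data: exponential decay times a +/-10% alternating disturbance
x = [0.25 * i for i in range(41)]
y = [2.0 * math.exp(-0.5 * xi) * (1.1 if i % 2 == 0 else 0.9) for i, xi in enumerate(x)]
c, b, se_c, se_b = fit_poisson_pml(x, y)
print(f"A = exp(c) = {math.exp(c):.3f}, b = {b:.4f}, robust se(b) = {se_b:.4f}")
```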

        Finally, Clyde's discussion of the two models is not entirely correct. Indeed, what -nl- estimates is

        y = {A}*exp({b}*x) + e

        but this model can also be written as a model with a multiplicative error because

        y = {A}*exp({b}*x) + e = {A}*exp({b}*x)*u

        with u = 1 + e/({A}*exp({b}*x)).

        So, your model in logs will be ln_y = ln(A) + b*x + ln(u). The problem is that in general the conditional expectation of ln(u) is not a constant, and therefore OLS is likely to be inconsistent for b. The paper I mentioned above contains a detailed discussion of this problem.
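A small numerical check of this point in Python. The values A = 2, b = -0.5 and a symmetric additive error e = +0.1 or -0.1 with equal probability are assumptions for the sketch. The expectations are computed exactly rather than simulated. The error has mean zero, is homoskedastic and is independent of x, yet E[ln u | x] = 0.5*ln(1 - c^2/m(x)^2) with m(x) = A*exp(b*x) falls as x grows, and the OLS slope of ln(y) on x is pulled away from b.

```python
import math

A, b, c = 2.0, -0.5, 0.1           # y = A*exp(b*x) + e, e = +c or -c with prob 1/2
xs = [0.5 * i for i in range(9)]   # x = 0, 0.5, ..., 4 (keeps A*exp(b*x) - c > 0)

# E[ln u | x] with u = 1 + e / (A*exp(b*x)), averaging over e = +c and e = -c
cond_mean_ln_u = []
for x in xs:
    m = A * math.exp(b * x)
    cond_mean_ln_u.append(0.5 * (math.log(1 + c / m) + math.log(1 - c / m)))

# "Population" data: both error values at every x, then OLS of ln(y) on x
pts = [(x, math.log(A * math.exp(b * x) + e)) for x in xs for e in (c, -c)]
n = len(pts)
mx = sum(px for px, _ in pts) / n
my = sum(py for _, py in pts) / n
slope = (sum((px - mx) * (py - my) for px, py in pts)
         / sum((px - mx) ** 2 for px, _ in pts))

for x, g in zip(xs, cond_mean_ln_u):
    print(f"x = {x:.1f}: E[ln u | x] = {g:.5f}")
print(f"OLS slope of ln(y) on x: {slope:.4f} (true b = {b})")
```

With these values the slope comes out a little below -0.5. The distortion grows as the error becomes large relative to the mean A*exp(b*x).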

        All the best,

        Joao


        • #5
          Thank you Jeff, Clyde and Joao.

          I am working with population density gradients. My intuition had been to model the effect of x (distance) on the conditional mean of y (population density). But it seems the standard practice in my field is to model ln(y) and that I wrote the initial equation wrong.
          This paper (page 16) writes the formula as equivalent to y={A}*exp({b}x + e), with the error inside the exponential term, so that
          y={A}*exp({b}x + e) = {A}*exp({b}x)*exp(e)
          ln(y)=ln{A} + ln(exp({b}x)) + ln(exp(e))
          ln(y)=ln{A} + {b}x + e

          This is just for the summary statistics part of my paper, so I would prefer not to delve into a mathematical debate on this issue with the existing literature, but do you see any problems with that formulation?

          Thanks again.
          Kevin


          • #6
            Dear Kevin,

            The problem with that formulation is that A and b are parameters of the conditional expectation of ln(y) given x, but in general are not parameters of the conditional expectation of y given x. Suppose e is heteroskedastic; then E[y|x] = {A}*exp({b}x) * E[exp(e)|x], which is not equal to {A}*exp({b}x) because E[exp(e)|x] is a function of x.

            As Jeff pointed out above, you need to decide which of the two conditional expectations is of interest. In case you really care about E[y|x], then the model in logs may be misleading.
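A simulation sketch of the heteroskedasticity point, in Python. The variance function Var(e|x) = 0.2*x, the seed and the sample size are assumptions chosen for illustration. With e | x ~ N(0, 0.2x), E[exp(e)|x] = exp(0.1x), so E[y|x] = 2*exp(-0.4x) even though E[ln y|x] = ln 2 - 0.5x. OLS on logs should recover the -0.5 of the log model, while a Poisson pseudo-ML fit of the levels should come out near -0.4.

```python
import math
import random


def fit_log_ols_slope(x, y):
    """Slope from OLS of ln(y) on x."""
    ly = [math.log(v) for v in y]
    n = len(x)
    mx, my = sum(x) / n, sum(ly) / n
    return (sum((xi - mx) * (li - my) for xi, li in zip(x, ly))
            / sum((xi - mx) ** 2 for xi in x))


def fit_poisson_pml(x, y, max_iter=100, tol=1e-10):
    """Poisson pseudo-ML (Newton with step halving) for E[y|x] = exp(c + b*x)."""
    def qll(c, b):
        return sum(yi * (c + b * xi) - math.exp(c + b * xi) for xi, yi in zip(x, y))

    c, b = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        s0 = s1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = math.exp(c + b * xi)
            s0 += yi - mu
            s1 += (yi - mu) * xi
            h00 += mu
            h01 += mu * xi
            h11 += mu * xi * xi
        det = h00 * h11 - h01 * h01
        dc = (h11 * s0 - h01 * s1) / det
        db = (h00 * s1 - h01 * s0) / det
        old, t = qll(c, b), 1.0
        while qll(c + t * dc, b + t * db) < old and t > 1e-8:
            t /= 2
        c, b = c + t * dc, b + t * db
        if abs(t * dc) + abs(t * db) < tol:
            break
    return c, b


rng = random.Random(2024)
n = 30000
x = [rng.uniform(0.0, 5.0) for _ in range(n)]
# ln y = ln 2 - 0.5*x + e with e | x ~ N(0, 0.2*x): heteroskedastic in x
y = [2.0 * math.exp(-0.5 * xi + rng.gauss(0.0, math.sqrt(0.2 * xi))) for xi in x]

b_ols = fit_log_ols_slope(x, y)          # targets the slope of E[ln y | x]: -0.5
c_ppml, b_ppml = fit_poisson_pml(x, y)   # targets the slope of ln E[y | x]: -0.4
print(f"OLS on ln(y): b = {b_ols:.4f}")
print(f"Poisson PML : b = {b_ppml:.4f}")
```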

            All the best,

            Joao


            • #7
              Kevin: My 1992 IER paper, "Some Alternatives to the Box-Cox Regression Model," also discusses what Joao has highlighted above: that a model of a conditional mean for a nonnegative response can be written with an additive or a multiplicative error. The two forms are equivalent unless one imposes some extra assumption, such as that the error is independent of the covariates. I would prefer the exponential model estimated using the Poisson or gamma quasi-MLEs.
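A sketch of the gamma quasi-MLE for the exponential mean, in Python for illustration. In Stata this kind of fit is usually run through -glm-, e.g. -glm y x, family(gamma) link(log) vce(robust)-. The function name is hypothetical. This sketch assumes y > 0 because it starts from the log-linear fit. With a log link and a variance proportional to mu^2, the IRLS weights are all 1. Each Fisher-scoring step is therefore plain OLS of the working response z = eta + (y - mu)/mu on x. Its fixed point solves the gamma score equations sum((y/mu - 1)*(1, x)) = 0.

```python
import math


def fit_gamma_qmle(x, y, max_iter=100, tol=1e-10):
    """Gamma quasi-MLE with log link: E[y|x] = exp(c + b*x), assuming y > 0."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)

    def ols(z):
        """Unweighted OLS of z on x: the IRLS step, since all weights equal 1."""
        mz = sum(z) / n
        slope = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx
        return mz - slope * mx, slope

    c, b = ols([math.log(yi) for yi in y])   # start from the log-linear fit
    for _ in range(max_iter):
        z = []
        for xi, yi in zip(x, y):
            eta = c + b * xi
            mu = math.exp(eta)
            z.append(eta + (yi - mu) / mu)   # working response for the log link
        c_new, b_new = ols(z)
        done = abs(c_new - c) + abs(b_new - b) < tol
        c, b = c_new, b_new
        if done:
            break
    return c, b


# Illustrative data: exponential decay times a +/-10% alternating disturbance
x = [0.25 * i for i in range(41)]
y = [2.0 * math.exp(-0.5 * xi) * (1.1 if i % 2 == 0 else 0.9) for i, xi in enumerate(x)]
c, b = fit_gamma_qmle(x, y)
print(f"A = exp(c) = {math.exp(c):.3f}, b = {b:.4f}")
```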


              • #8

                Hi. I am new to Stata, so kindly forgive my naivety. I am trying to run an exponential fit and can't figure out the code for it. Kindly help. Thanks.


                • #9
                  If you look at the help for -nl-, you will see several examples of "common" models, including the exponential; if this is not what you are looking for, please read the FAQ and then clarify your question.


                  • #10
                    Another possibility: if what you are trying to do is fit an exponential time-to-failure model, see -help streg-.
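In Stata that model is fit with -streg- and the exponential distribution (something like -streg, distribution(exponential)- after -stset-). The Python sketch below is an illustration only, with a hypothetical function name and made-up data. It shows the closed-form maximum-likelihood estimate in the simplest case, a constant hazard with no covariates and right-censored follow-up times.

```python
def exponential_mle(times, failed):
    """Constant-hazard (exponential) MLE from right-censored data, no covariates.

    times  : follow-up time for each subject
    failed : 1 if the failure was observed, 0 if the subject was censored
    The log-likelihood sum(d_i*ln(lam) - lam*t_i) is maximised at
    lam = (number of failures) / (total time at risk).
    """
    d = sum(failed)
    total_time = sum(times)
    if d == 0:
        raise ValueError("no observed failures: the rate MLE is 0")
    return d / total_time, total_time / d   # hazard rate, mean time to failure


rate, mean_time = exponential_mle([2.0, 3.0, 5.0, 7.0, 10.0], [1, 1, 0, 1, 0])
print(round(rate, 4), mean_time)  # 0.1111 9.0
```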
