
  • Log transformation of negative profits. Solutions?

    Hi all,

    I want to use the logarithm of a firm's profits as a dependent variable (see data example). However, many firms have negative values, so I would get many missing values after the log transformation. I know that one potential solution is to add a constant before taking logs, but some firms have huge negative profits, so I do not see that as feasible. Do you have other solutions, or am I obliged to pay the price of using a log transformation?

    Thanks in advance for your help.

    Best,
    Kevin

    Code:
    clear
    input firm year double profits_heures
    3 2002 -10.93949
    3 2002 -10.93949
    3 2002 -10.93949
    3 2002 -10.93949
    8 2006    .279665
    8 2006    .279665
    8 2006    .279665
    8 2006    .279665
    8 2007   .9127844
    8 2007   .9127844
    8 2007   .9127844
    8 2007   .9127844
    8 2007   .9127844
    8 2008 -2.7366899
    8 2008 -2.7366899
    8 2008 -2.7366899
    8 2008 -2.7366899
    8 2008 -2.7366899
    8 2008 -2.7366899
    8 2010  -.54344109
    8 2010  -.54344109
    8 2010  -.54344109
    8 2010  -.54344109
    end
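    To make the cost concrete, here is a minimal Python sketch (Python only so that it runs standalone; the list simply retypes the `profits_heures` values above, and `safe_log` is a hypothetical helper). It counts how many observations a plain log would turn into missing values, since Stata's log of a zero or negative number is missing:

```python
import math

# profits_heures values retyped from the data example above (23 rows)
profits = ([-10.93949] * 4 + [0.279665] * 4 + [0.9127844] * 5
           + [-2.7366899] * 6 + [-0.54344109] * 4)

def safe_log(y):
    """Natural log, or None (standing in for a missing value) when y <= 0."""
    return math.log(y) if y > 0 else None

logged = [safe_log(y) for y in profits]
n_missing = sum(v is None for v in logged)
print(f"{n_missing} of {len(profits)} observations lost")
```

    Here more than half of the example rows would be dropped, which is the problem the question describes.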

  • #2
    Why do you want to use a log transformation for this variable? What purpose do you hope to accomplish by doing that? Whatever the purpose, if it is legitimate at all, there is surely a better way to achieve it: it is hard to imagine a variable less suitable for log transformation. So say what the purpose is, and perhaps a suitable solution will be found.



    • #3
      Clyde Schechter asks a fair question. What to do here is much debated, on Statalist and more generally, and it sometimes seems that nobody much likes anybody else's favoured solutions. Here are some:

      0. Leaving your outcome untransformed. Only trying it will show what virtues and vices this has.

      1. If you think your mean function (the mean outcome as a function of predictors) is positive, which could be plausible if negative values, even though sometimes very large in magnitude, are in a small minority, then generalized linear models with a logarithmic link might work adequately.
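      A minimal sketch of why a log link can cope with negative outcomes, assuming a Poisson-type quasi-likelihood with log link (one common GLM choice, not necessarily the one meant above), fitted by Newton's method in plain Python. The estimating equations sum of (1, x_i) * (y_i - exp(b0 + b1 * x_i)) = 0 remain well defined when some y_i are negative; only the fitted mean has to be positive. The function `fit_loglink` and the toy data are hypothetical; in practice you would use your software's GLM routine.

```python
import math

def fit_loglink(x, y, iters=50, tol=1e-10):
    """Fit E[y | x] = exp(b0 + b1 * x) by solving the Poisson quasi-likelihood
    score equations sum (1, x_i) * (y_i - mu_i) = 0 with Newton steps.
    Individual y_i may be negative; the mean of y must be positive."""
    ybar = sum(y) / len(y)
    if ybar <= 0:
        raise ValueError("mean of y must be positive for a log link")
    b0, b1 = math.log(ybar), 0.0  # start from the intercept-only solution
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # score vector
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # information matrix: sum of mu_i * (1, x_i)(1, x_i)'
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H d = g
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

x = [0, 1, 2, 3, 4, 5]
y = [-1.0, 2.0, 3.0, 5.0, 8.0, 12.0]  # toy data with one negative outcome
b0, b1 = fit_loglink(x, y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

      The only hard requirement checked here is a positive mean, which matches the condition stated in #1.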

      2. The so-called neglog transformation T(y) = sign(y) * log(1 + abs(y)) has the merits that

      * it preserves sign, as T(y) is negative, zero, or positive exactly as y is negative, zero, or positive

      * it behaves like y for y near 0, like log y for y >> 0, and like -log(-y) for y << 0

      * it is likely to reduce problems with appreciable skewness, tail weight, or outliers.

      However, it does not find universal favour, as it appears arbitrary to critics (though "fit for purpose" otherwise).
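      A short Python sketch of neglog and its inverse, sign(t) * (exp(abs(t)) - 1), applied to the distinct values in the data example; the function names are illustrative only. `math.log1p` and `math.expm1` keep the calculation accurate near zero, where T(y) behaves like y.

```python
import math

def neglog(y):
    """Neglog transform T(y) = sign(y) * log(1 + |y|)."""
    return math.copysign(math.log1p(abs(y)), y)

def neglog_inverse(t):
    """Inverse of neglog: y = sign(t) * (exp(|t|) - 1)."""
    return math.copysign(math.expm1(abs(t)), t)

# Distinct profits_heures values from the data example
values = [-10.93949, -2.7366899, -0.54344109, 0.279665, 0.9127844]
for y in values:
    print(f"y = {y:11.5f}  T(y) = {neglog(y):8.4f}")
```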

      3. T(y) = asinh(y) or more generally asinh(k y) has some family resemblance to #2, and comments tend to be similar. (In some literature it is known as IHS, short for inverse hyperbolic sine, an abbreviation likely to bemuse or puzzle those who know other uses for it.) Note that the choice of k > 0 is crucial, and defaulting to k = 1 is also a choice.
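      To see why k matters, here is a minimal Python sketch of asinh(k y) and its inverse sinh(t) / k (the names `ihs` and `ihs_inverse` are illustrative). Near zero asinh(k y) is close to k y, while for large k y it is close to log(2 k y), so k decides which values fall in the roughly linear part and which in the roughly logarithmic part. The k values below are arbitrary.

```python
import math

def ihs(y, k=1.0):
    """Inverse hyperbolic sine transform asinh(k * y); k > 0 must be chosen."""
    if k <= 0:
        raise ValueError("k must be positive")
    return math.asinh(k * y)

def ihs_inverse(t, k=1.0):
    """Back-transform to the original scale: y = sinh(t) / k."""
    return math.sinh(t) / k

# Effect of k on one value from the data example
for k in (0.1, 1.0, 10.0):
    print(f"k = {k:5.1f}: asinh(k * -10.93949) = {ihs(-10.93949, k):.3f}")
```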

      Further comments on #2 and #3. Although getting some desired shape for the marginal distribution of T(y), or that of y, is not the first priority in choosing a method, it is not irrelevant either. I would always

      * plot T(y) versus y for the outcome data to get a sense of whether it behaves sensibly

      * look especially carefully at residuals from any model predicting T(y) from your X variables

      * invert predictions using the inverse function, to get back to the original scale.
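      A rough text stand-in for those checks, as a Python sketch: it tabulates neglog(y) and asinh(y) against y for the distinct values in the data example (a plot would show the same thing better), then back-transforms one prediction. The fitted value `t_hat` is hypothetical, not a result from any model.

```python
import math

def neglog(y):
    """Neglog transform T(y) = sign(y) * log(1 + |y|)."""
    return math.copysign(math.log1p(abs(y)), y)

def neglog_inverse(t):
    """Inverse of neglog: y = sign(t) * (exp(|t|) - 1)."""
    return math.copysign(math.expm1(abs(t)), t)

# Distinct profits_heures values from the data example, in increasing order
values = sorted({-10.93949, -2.7366899, -0.54344109, 0.279665, 0.9127844})

# Each row: y, neglog(y), asinh(y); both transforms should be increasing in y
print(f"{'y':>12} {'neglog':>10} {'asinh':>10}")
for y in values:
    print(f"{y:12.5f} {neglog(y):10.4f} {math.asinh(y):10.4f}")

# A prediction on the transformed scale, back-transformed to profits
t_hat = -1.2  # hypothetical fitted value of neglog(profits_heures)
y_hat = neglog_inverse(t_hat)
print(f"prediction {t_hat} on the neglog scale -> {y_hat:.4f} in profits")
```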

      4. A two-part model predicting profit or loss as binary and magnitude of profit or loss as a non-negative outcome. No experience with this myself, but it's a well-used model in some fields. Even simpler is the possibility that profit and loss define subsets which deserve, or even demand, quite different models.
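      A minimal Python sketch of the bookkeeping behind a two-part model, assuming zero is grouped with profits. One part is the probability of a loss; the other is the mean non-negative magnitude within each group. The overall mean is recovered as P(profit) * E[y | profit] - P(loss) * E[|y| | loss]. With covariates each part would be its own regression (for example a binary model for the loss indicator); here the parts are plain sample means on the thread's data, and `two_part_summary` is a hypothetical name.

```python
# profits_heures values retyped from the data example (23 rows)
profits = ([-10.93949] * 4 + [0.279665] * 4 + [0.9127844] * 5
           + [-2.7366899] * 6 + [-0.54344109] * 4)

def two_part_summary(y):
    """Split y into a loss indicator and a non-negative magnitude per group."""
    losses = [-v for v in y if v < 0]   # magnitudes of losses
    gains = [v for v in y if v >= 0]    # profits (zero counted here)
    n = len(y)
    p_loss = len(losses) / n
    mean_loss = sum(losses) / len(losses) if losses else 0.0
    mean_gain = sum(gains) / len(gains) if gains else 0.0
    # recombine the two parts into the overall mean of y
    combined = (1 - p_loss) * mean_gain - p_loss * mean_loss
    return p_loss, mean_gain, mean_loss, combined

p_loss, mean_gain, mean_loss, combined = two_part_summary(profits)
print(f"P(loss) = {p_loss:.3f}, mean profit = {mean_gain:.3f}, "
      f"mean loss = {mean_loss:.3f}, implied mean = {combined:.3f}")
```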

      5. I've left until last log(y + c) where c is large enough to make all logarithms positive. I rate this easily the worst solution as ad hoc in the worst sense. A plot of log(y + c) versus y is again essential to see what the transform does, and it's often deeply unsatisfactory.
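      A short Python sketch of what log(y + c) does to the distinct values in the data example, for two arbitrary choices of c. With c = 1 - min(y), the smallest value maps to log 1 = 0 and is stretched far from the rest; with a large c, log(y + c) is close to log c + y / c, so the "log" transform is nearly just a linear rescaling of y. Either way the result hinges on an arbitrary c. `shifted_log` is a hypothetical helper.

```python
import math

values = [-10.93949, -2.7366899, -0.54344109, 0.279665, 0.9127844]

def shifted_log(y, c):
    """log(y + c); requires y + c > 0 for every value."""
    if y + c <= 0:
        raise ValueError("c too small: y + c must be positive")
    return math.log(y + c)

y_min = min(values)
for c in (1 - y_min, 100.0):
    t = [shifted_log(v, c) for v in values]
    print(f"c = {c:8.3f}:", [round(v, 4) for v in t])
```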

      Simple to state, but harder to satisfy, is the requirement that a good approach not only "works" with your data but also allows relating your results to those of other studies in a relevant literature. The chicken-and-egg problem is that you would often need to re-do other studies using a particular method to be able to compare. "Do what others have done" is mixed counsel: the persistence of #5 as a suggestion implies to me that many researchers are not thinking hard enough about what they have done. In particular, plotting the transformation and the results are often neglected steps, especially, it seems, in some branches of economics.



      • #4
        I should add (before, say, John Mullahy does) that the 1 in sign(y) * log(1 + abs(y)) is not an innocent neutral choice. If the units are, say, million USD, 1 means something quite different from what it means if the units are USD. And so on.
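        A quick numerical illustration in Python, using made-up amounts: the same profits measured in million USD and in USD give neglog values that are not related by a constant shift, because the 1 matters relative to the size of |y|. For large |y|, rescaling by a factor s adds roughly sign(y) * log s, so the shift even has opposite signs for losses and profits.

```python
import math

def neglog(y):
    """Neglog transform T(y) = sign(y) * log(1 + |y|)."""
    return math.copysign(math.log1p(abs(y)), y)

profits_musd = [-10.0, -0.5, 0.5, 10.0]         # hypothetical, in million USD
profits_usd = [v * 1e6 for v in profits_musd]   # the same amounts in USD

shifts = []
for m, u in zip(profits_musd, profits_usd):
    shift = neglog(u) - neglog(m)   # would be constant if units did not matter
    shifts.append(shift)
    print(f"{m:6.1f} MUSD: neglog = {neglog(m):7.3f}; "
          f"in USD: {neglog(u):7.3f}; shift = {shift:7.3f}")
```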



        • #5
          [Image: transformation algorithm.png]


          It seems Nick Cox knows where I stand on such matters.

          (I should emphasize that my comment here is meant to be a generic one, not one focused specifically on the issue raised in #1 of the present thread.)
          Last edited by John Mullahy; 26 Jul 2023, 12:34.

