
  • Natural logarithm of a variable having negative values

    One of my variables is the natural logarithm of sales growth (Lnsalesgrowth). But sales growth can be negative, and then Lnsalesgrowth is not defined. I have more than 1,500 firms, so I am dropping the observations for which it is undefined. Is this the right way to handle this, or is there an alternative?

  • #2
    Pranshu:
    the first question to pose is: why take logs at all?
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Strictly, logarithms of negative numbers are defined, but as complex numbers, and that's not useful statistically, at least here.

      See e.g. https://math.stackexchange.com/quest...egative-number

      But we know what you mean. If the motive is to tame a skewed distribution, the so-called neglog transformation, namely sign(x) ln(1 + |x|), coded as


      Code:
      sign(x) * log1p(abs(x))
      can be useful, as can
      Code:
      asinh(x)
      (economists in particular are prone to call this IHS, for inverse hyperbolic sine)

      as can the cube root, meaning
      Code:
       sign(x) * abs(x)^(1/3)
      All these transformations have two things in common:

      1. they preserve sign;

      2. they pull in the tails, relatively speaking.

      In short, omitting negative values is quite the wrong way to do it. Use the data as they come, or use a transformation fit for purpose.
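A minimal Stata sketch of the three transformations above, assuming a numeric variable named x; the five values are made up purely for illustration. log1p(), asinh(), sign() and abs() are built-in Stata functions.

```stata
* Small illustrative dataset: hypothetical values of x with both signs
clear
input double x
-100
-1
0
1
100
end

* neglog: sign(x) * ln(1 + |x|)
generate double neglog = sign(x) * log1p(abs(x))

* inverse hyperbolic sine (IHS)
generate double ihs = asinh(x)

* signed cube root; abs() avoids raising a negative number to a fractional power
generate double cuberoot = sign(x) * abs(x)^(1/3)

list, clean
```

All three leave zero at zero and keep the ordering of the original values, which is what makes them usable where ln() is not.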



      • #4
        Sales growth in my dataset ranges from -378,289 to 488,741. To reduce this variation, I am considering taking the natural log of this variable. It is used as a proxy for investment opportunities in the existing literature, so I am following the literature.
        Last edited by Pranshu Tripathi; 21 Oct 2022, 08:50.
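To see how much the sign-preserving transformations from #3 compress this range, the sketch below applies neglog and IHS to the two reported endpoints only (the variable name salesgrowth is an assumption). Both map the endpoints to roughly ±13 to ±14, whereas ln() would return missing for the negative one.

```stata
* The two endpoints of the sales-growth range reported above
clear
input double salesgrowth
-378289
488741
end

* Sign-preserving transformations from #3
generate double neglog = sign(salesgrowth) * log1p(abs(salesgrowth))
generate double ihs    = asinh(salesgrowth)

list, clean
```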



        • #5
          Pranshu:
          if you're going to run a regression with this variable as the regressand, why not consider -glm- with a log link and gamma family?
          Kind regards,
          Carlo
          (Stata 19.0)
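A sketch of the suggested model, using Stata's shipped auto dataset with price standing in for a positive outcome (the thread's own data are not available). With family(gamma) link(log), -glm- models the log of the conditional mean, so fitted values come back on the raw scale without any retransformation step.

```stata
* Illustration only: price from the auto data stands in for a positive outcome
sysuse auto, clear

* Gamma family with log link: ln E(price | mpg, weight) is linear in the predictors
glm price mpg weight, family(gamma) link(log)

* Fitted conditional means, already on the original scale of price
predict double mu_hat, mu

* Average marginal effects on the original scale
margins, dydx(mpg weight)
```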



          • #6
            I'd like to see the distribution, not just the range. @Carlo Lazzaro's excellent idea does carry with it a presumption that the mean function is positive despite any negative values in the data.
            Last edited by Nick Cox; 21 Oct 2022, 09:20.
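One way to look at the whole distribution with standard commands. Since the thread's data are not available, this sketch simulates a hypothetical, heavy-tailed variable named salesgrowth; the seed, sample size and distribution are arbitrary assumptions.

```stata
* Simulated stand-in for a sales-growth variable (hypothetical, not real data)
clear
set seed 2022
set obs 1500
generate double salesgrowth = rnormal(0.05, 1) * exp(rnormal(0, 1.5))

* Quantiles, skewness and kurtosis, not just the minimum and maximum
summarize salesgrowth, detail

* How many values are negative?
count if salesgrowth < 0

* Shape on a sign-preserving scale
generate double ihs_sg = asinh(salesgrowth)
histogram ihs_sg
```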



            • #7
              Nick is obviously correct.
              My preference for -gml- with a log link and gamma family comes from several dreadful experiences with healthcare cost data that were logged and then back-transformed via Duan's smearing estimate (https://www.jstor.org/stable/2288126), with disappointing results when compared with the raw scale.
              This (painful) issue is well covered in https://www.stata.com/bookstore/heal...cs-using-stata, pages 96-99.
              Kind regards,
              Carlo
              (Stata 19.0)
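The contrast described above can be reproduced in outline: OLS on ln(y), back-transformed with Duan's smearing factor (the mean of the exponentiated residuals), next to a gamma GLM with log link fitted on the raw scale. This is a sketch on Stata's auto data, with price as a stand-in outcome; it is not the healthcare cost example referred to above.

```stata
sysuse auto, clear

* Route 1: OLS on the log scale
generate double lnprice = ln(price)
regress lnprice mpg weight
predict double xb, xb
predict double uhat, residuals

* Duan's smearing factor: mean of exp(residual)
generate double expu = exp(uhat)
summarize expu, meanonly
scalar smear = r(mean)

* exp(xb) alone estimates a conditional median-type quantity; scale it up
generate double price_smear = exp(xb) * scalar(smear)

* Route 2: gamma GLM with log link, fitted on the raw scale
glm price mpg weight, family(gamma) link(log)
predict double price_glm, mu

* Compare the two sets of fitted values with the observed outcome
summarize price price_smear price_glm
```

The smearing route is only as good as the assumption that the residual distribution does not change with the predictors; the GLM route sidesteps retransformation altogether.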



              • #8
                See also "The inverse hyperbolic sine transformation and retransformed marginal effects" by Ed Norton, The Stata Journal (2022) 22, Number 3, pp. 702–712, DOI: 10.1177/1536867X221124553. Examples with respect to health care costs, I recall.

                Comment


                • #9
                  Mr. Lazzaro and Mr. Cox, thanks for your input. I am new to research, so I was not aware of GML. I will explore it and come back if any questions arise.

                  Thanks and Regards



                  • #10
                    Carlo Lazzaro meant GLM -- generalized linear models.



                    • #11
                      Thanks, Mr. Stephen Jenkins. I will go through it.



                      • #12
                        Nick is obviously right again.
                        I'm progressively losing all the letters on my keyboard and sometimes typing the right letter is just probabilistic (time to buy a more decent keyboard!).
                        Kind regards,
                        Carlo
                        (Stata 19.0)



                        • #13
                          Hey, I forgot to mention that the variable discussed above is an independent variable.



                          • #14
                            There might be other reasons for GLMs.



                            • #15
                              Pranshu:
                              if the variable you mentioned is a predictor, I would leave it in its original metric, especially if logging produces a remarkable reduction of the original sample.
                              Kind regards,
                              Carlo
                              (Stata 19.0)
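A sketch of the sample-loss point, on hypothetical simulated data (1,500 observations, predictor symmetric around zero, so roughly half the values are negative). ln() returns missing for zero or negative arguments, and -regress- drops those observations without further warning.

```stata
* Hypothetical simulated data: 1,500 firms, predictor takes both signs
clear
set seed 12345
set obs 1500
generate double salesgrowth = rnormal(0, 1)
generate double y = 1 + 0.5*salesgrowth + rnormal()

* ln() returns missing (.) for zero or negative arguments
generate double lnsg = ln(salesgrowth)
count if missing(lnsg)
display "share of observations lost by logging: " %5.3f r(N)/_N

* Predictor in its original metric: every observation is used
regress y salesgrowth

* Logged predictor: observations with salesgrowth <= 0 drop out
regress y lnsg
```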
