  • #16
    Thanks for the clarification in #13, above; given that, you might be interested in reading Choi, G., et al. (2022), "Log-transformation of independent variables: must we?", Epidemiology, 33(6): 843-853.

    • #17
      Predictors with outlying values have corresponding leverage that may or may not be desirable. Without more information than supplied -- for example, I asked to know more about the distribution in an earlier post, but nothing more has been said -- the only easy advice is to compare results with and without transformations.

      • #18
        Mr. Nick,
        I am sorry. I had to leave my work for some reason. I am now back on it, so let me restate the issue.
        Following the literature, I have a predictor named logsalesgrowth, which is defined as the log of the change in sales (current year sales - previous year sales).
        I have read somewhere that the minimum value should be added inside the log function, i.e. log(x + minvalue), to avoid getting undefined values.
        I want to know whether this is the right way to do it, or whether there is a better alternative.
        The following images show the distribution of the change in sales (without the log).

        [Figure: Graph.jpg, distribution of the change in sales]

        [Figure: Graph1.jpg, distribution of the change in sales]


        • #19
          Depends. I used to regard log(x + 1) as a terrible fudge, but if values of x are >= 0 it often works well enough, and the fact that this function is like x for small x and like log x for large positive x is quite often useful as hybrid behaviour.

          See https://stats.stackexchange.com/ques...vs-fitted-plot for a dataset in which this transformation worked beautifully.

          But, but, but: log (x + c) where c is some data-driven constant is a million miles from that. In your case c needs to be about 1 million and how is it to be chosen without arbitrariness? (Note that your stated recipe leaves the minimum at 0 whereas it must be positive. That may sound trivial but is not.)

          No, as I already suggested neglog (a generalisation of log(x + 1)) should work as you wish to pull in outliers symmetrically and respect sign. asinh() and cube root are also candidates. I don't sense much of a relationship in your data but that is another story.
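
As a minimal sketch of how these three candidates could be created as new variables, assuming a numeric variable x that holds the change in sales (the data values and all variable names here are made up for illustration):

```stata
clear
* Made-up values spanning both signs and several orders of magnitude
input double x
-1000000
-1000
-1
0
1
1000
1000000
end

* neglog: log(1 + |x|) with the sign of x restored
generate double neglog_x = sign(x) * ln(1 + abs(x))

* inverse hyperbolic sine, defined for any real x
generate double asinh_x = asinh(x)

* cube root, computed on |x| so that negative values are handled
generate double cuberoot_x = sign(x) * abs(x)^(1/3)

list
```

All three map 0 to 0 and treat positive and negative values symmetrically, so nothing is lost to missing values.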

          • #20
            Here is a portfolio of graphs showing various transformations devised for variables of any sign.

            neglog and asinh are sisters under the skin, unsurprisingly given their definitions.

            cube root is a milder transformation

            All of these treat arguments of varying sign symmetrically and so respect the sign.

            log(x + a lot) will stretch your large negative outliers further away while pulling in your large positive outliers. It's likely to be by far the worst choice.

            Code:
            * Use the s1color graph scheme
            set scheme s1color

            * Shared options: x-axis labels as powers of 10, range -10^6 to 10^6, thin reference lines at 0
            local opts xla(0 -5e5 "-5 x 10{sup:5}" 5e5 "5 x 10{sup:5}" -1e6 "-10{sup:6}" 1e6 "10{sup:6}")
            local opts `opts' ra(-1e6 1e6) yli(0, lc(gs8) lw(vthin)) xli(0, lc(gs8) lw(vthin))

            * One panel per transformation
            twoway function sign(x) * log(1 + abs(x)), `opts' ytitle(neglog(x)) name(G1, replace)
            twoway function asinh(x), `opts' ytitle(asinh(x)) name(G2, replace)
            twoway function sign(x) * abs(x)^(1/3), `opts' ytitle(cube root of x) name(G3, replace)
            twoway function log(x + 1e6), `opts' ytitle(log (x + 10{sup:6})) name(G4, replace)

            * Show the four panels together
            graph combine G1 G2 G3 G4
            [Figure: compare_transforms.png, combined graph of neglog(x), asinh(x), cube root of x and log(x + 10^6)]

            • #21
              Naturally the real problem may be upstream -- that this was never a good measure of growth anyway. I have to say that I don't understand where it comes from.

              • #22
                Dear Pranshu Tripathi,

                Nick has already provided great advice and, as he suggests, I think the problem is upstream. I do not think it makes sense to model the log of (current year sales - previous year sales); to my mind it would be more natural to model the growth of sales defined as (current year sales / previous year sales) or the log of that if you prefer (which is the difference between the log of sales in the 2 years rather than the log of the difference). This is a more meaningful quantity and has no negative values.
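
A minimal sketch of that definition in Stata, assuming panel data with one row per firm and year. The variable names firm, year and sales are hypothetical and the values are made up:

```stata
clear
* Made-up panel: two firms, three years each
input firm year sales
1 2019 100
1 2020 120
1 2021 90
2 2019 50
2 2020 50
2 2021 75
end

* Declare the panel so that the lag operator L. works within firm
xtset firm year

* Growth as a ratio: current year sales / previous year sales
generate double growth_ratio = sales / L.sales

* Log growth: difference of logs, not log of the difference
generate double log_growth = ln(sales) - ln(L.sales)

list, sepby(firm)
```

The first year of each firm is missing because there is no previous year. A fall in sales gives a ratio below 1 and a negative log growth, with no undefined values as long as sales are positive in both years.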

                Best wishes,

                Joao

                • #23
                  I trust that everyone is agreed that if sales growth can be negative, or even zero, as it surely can, then

                  1. There is an extreme option of ignoring observations with zero or negative values altogether, which amounts to re-defining the research problem!

                  2. Taking logarithms is out of the question because the logarithm of zero is not defined and the logarithm of negative values is not defined in a way that helps here.
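
To make the two points above concrete, here is a minimal sketch with made-up values, assuming a variable called dsales (a hypothetical name) that holds the change in sales. In Stata, ln() returns missing for zero or negative arguments, so those observations silently drop out of any model that uses the logged variable:

```stata
clear
* Made-up changes in sales: one negative, one zero, two positive
input double dsales
-250
0
40
1200
end

* How many observations cannot be logged?
count if dsales <= 0

* ln() of zero or a negative value is returned as missing
generate double ln_dsales = ln(dsales)
list

* The same observations are now missing
count if missing(ln_dsales)
```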

                  Where I am still puzzled is that the graphs in #18 look surprisingly symmetrical in terms of the patterns of extreme high and low values. However, this may be an illusion caused by coincidence and the fact that some positive skewness overall is rather swamped visually by the need to show the entire range.

                  Presumably #18 shows data in currency units but even so the distribution seems surprising to me without a fuller story. Perhaps there is a mixture of very many small enterprises that aren't going to change their sales much with some much larger enterprises that might.

                  Also, we all know that enterprises can go out of business, so does the data include any such?

                  • #24
                    Mr. Nick
                    Thanks for your detailed reply. These firms are all non-financial listed firms on the Bombay Stock Exchange, having complete 8-year data for all the variables.

                    • #25
                      Thanks for that detail, but I have no idea what that implies. How is growth as shown in #18 calculated? That's the question.

                      • #26
                        It is the change in the sales value (million) in the local currency.

                        • #27
                          Mr. Joao Santos Silva, thanks.
                          I will try this, but I have to find a reference for it.

                          • #28
                            Return defined as present / previous should be familiar to you as an economist. It is positive so long as both values are and taking the logarithm makes the distribution more symmetric as (0, 1) maps to negative numbers and (1, infinity) to positive numbers. 1 naturally maps to 0.

                            • #29
                              Thanks, Mr. Nick Cox. We generally use these for returns. But for absolute growth, I was not clear.
