
  • Inverse hyperbolic sine transformation

    Hi,

    It has been suggested to me that I use the inverse hyperbolic sine transformation instead of log(0 + 1).

    I have tried asinh for that.
    My question is, how do I interpret the coefficient? For instance, if the coefficient is 0.324, can I interpret it the same way as a log coefficient?

  • #2
    Dear Farzana Misha,

    Can you please tell us more about your data and why you want to transform it?

    Best wishes,

    Joao



    • #3
      My outcome variable is the log of income, and income includes a number of zeros, so it was suggested that I try the inverse hyperbolic sine transformation instead of log(0 + 1). I have never tried it before and was wondering how to do it, and ultimately whether I should do it at all.



      • #4
        EDIT: Crossed with #3 which confirms my guess.

        I guess that log(0 + 1) means here log(x + 1). If so, the rationale may be that you want to use logarithms on an outcome or response variable but are frustrated by some zeros in the data.

        What I guess that Joao Santos Silva is driving towards is that it is better to use some approach which (in generalized linear model jargon) is based on the use of a logarithmic link, where the assumption in essence is that mean outcome, conditional on the predictors, is positive, which is consistent with some of the values being zero.

        Otherwise in my view asinh deserves some consideration if you have an outcome or predictor that is variously negative, zero and positive with some extreme values in one or other tail (and it's not just a matter of small occasional zeros or a few freakish small negatives). Firm profit and loss can be a good example.

        But asinh can't be interpreted as if it were a logarithm because it isn't, although for large arguments there is some qualitative similarity.

        Knowing how the function is defined and how it behaves is essential. I have a fleeting memory of it being mentioned briefly late in my secondary school education and only rarely bumping into it since, except in this context.
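
        For reference, asinh(x) = log(x + sqrt(x^2 + 1)), so asinh(0) = 0, asinh(-x) = -asinh(x), and asinh(x) is close to log(2x) once x is large and positive. These properties can be checked directly in Stata (purely an illustration):

        Code:
        display asinh(5)                     // direct evaluation
        display ln(5 + sqrt(5^2 + 1))        // same value: asinh(x) = ln(x + sqrt(x^2 + 1))
        display asinh(1000) - ln(2*1000)     // near zero: asinh(x) is close to ln(2x) for large x
        display asinh(0)                     // exactly 0, whereas log(0) is undefined
        display asinh(-5)                    // defined for negatives: asinh(-x) = -asinh(x)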

        This graph may help, so long as the behaviour of log x as x approaches 0 from above is understood.

        Code:
        twoway function asinh(x), ra(-40 40)                         ///
            || function log(x), ra(0.01 40)                          ///
            legend(pos(11) ring(0) col(1) order(1 "asinh" 2 "log"))  ///
            ytitle(transformed) xtitle(argument) yla(, ang(h))       ///
            yli(0, lstyle(grid)) xli(0, lstyle(grid))



        Last edited by Nick Cox; 16 May 2020, 06:23.



        • #5
          Thanks a lot, Nick! Honestly, it helped a lot to understand the overall concept and why and when it can be utilized!



          • #6
            Thanks for the thanks, but watch out.

            The use of transformations evokes a range of reactions among experienced statisticians and data analysts, from those who will use them very willingly to those who (almost) never use them, on various grounds. I am nearer one end than the other, but people here might disagree with some or all of #4.

            The great merit of link functions rather than transformations of a response is getting predictions that relate to the scale of the response, which is what both researchers and practical people (should) care about.
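
            For example (a sketch with placeholder names y, lny, x1, x2): after a model with a log link the default prediction is already in the units of y, whereas after regressing log y the prediction is on the log scale, and simply exponentiating it does not in general give the conditional mean of y.

            Code:
            glm y x1 x2, family(gaussian) link(log) vce(robust)
            predict yhat_glm            // default is mu: predicted mean of y, on the original scale

            regress lny x1 x2
            predict lnyhat, xb          // linear prediction on the log scale; exp(lnyhat) is not E(y|x) in general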

            However, much comes from experience and I daresay that many people regard log income as quite as natural a scale as income.



            • #7
              Dear Farzana Misha,

              Thank you for the additional information. Nick's explanation above is very clear and, as he predicted, my suggestion is that you do not transform the data at all and instead estimate a model with an exponential conditional mean. You can do this simply by using Poisson regression, and in that case the interpretation is exactly as in a model where you take logs of the dependent variable, with the advantage that you do not have to drop the zeros. Using log(x + 1) or the inverse hyperbolic sine transformation will produce parameters that are very difficult to interpret in a meaningful way.
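
              For concreteness, a minimal sketch of that approach (income, x1, x2 are placeholder names); robust standard errors are used because only the conditional mean is being assumed:

              Code:
              poisson income x1 x2, vce(robust)
              * coefficients are semi-elasticities of E(income | x), read as with a logged outcome
              margins, dydx(*)            // average marginal effects, back on the income scale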

              Best wishes,

              Joao



              • #8
                Thanks a lot, guys! It's been very helpful (especially in terms of moral support!)



                • #9
                  Regarding interpretation of the asinh() transformation, you might be interested in the article by Bellemare & Wichman in the Oxford Bulletin of Economics and Statistics, 2020 (open access): https://onlinelibrary.wiley.com/doi/...111/obes.12325
                  I am definitely not disagreeing with the advice provided by Joao and Nick -- it's very good -- but the article may help enrich your learning experience, so to speak. (It would be more relevant to your case if the outcome variable included some negative values.)



                  • #10
                    Hello everyone, I have a question regarding the transformation of an independent variable X. Its distribution is right-skewed, with many small values and a few larger ones: the minimum is zero, the maximum is 32, and the mean is 0.13. Is there any transformation that could help in this case?



                    • #11
                      My answer to #10 is that there might be a helpful transform, but choosing one depends on information you don't give.

                      If there is just one independent variable (*) (I say predictor), then show us a scatter plot and tell us what model you have in mind, possibly but not necessarily regression. It may be that a transform will help.

                      In the general case the predictor you're focusing on is just one among others.

                      Either way, there is no principle that skewness in a predictor is problematic; otherwise almost all (0, 1) indicator predictors would qualify as problematic.

                      There is, or should be, in practice some inclination to consider transforming a predictor if using some transform, say T(), of predictor X_1 gets you closer to the desired functional form.

                      So perhaps

                      link(Y) = b_0 + b_1 T(X_1) + b_2 X_2 + b_3 X_3 + ...

                      is a better idea than

                      link(Y) = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + ...

                      where link() could be identity(), log(), logit(), and so on. The same applies to any other X, here X_2, X_3, ....
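
                      In Stata terms that might look something like this (a hypothetical sketch: y, x1, x2, x3 and the choice of asinh() as T() are placeholders, not a recommendation):

                      Code:
                      generate double T_x1 = asinh(x1)      // T() here is asinh(), purely for illustration
                      regress y T_x1 x2 x3                  // identity link
                      glm y T_x1 x2 x3, family(gaussian) link(log) vce(robust)    // same predictors, log link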

                      The idea that predictors should be symmetrically distributed (stronger version: normally distributed) seems to come from nowhere in particular but to be discernible almost everywhere as an impression that researchers have, over several different fields.

                      Does it come from any textbooks? If so, please give references. Or courses? If so, name your teachers and where they teach.

                      My guess is that it comes mostly from papers in some literatures making much use of transformations, imparting an idea that this is what should be done. So a meme feeds on attention and breeds, as memes do.

                      I'm more positive than many people here about transformations but suggest that the biggest deal is getting closer to the functional form of your model.

                      (*) A riff on talking about dependent and independent variables:

                      The positive case for doing this seems to be

                      P1. Many people learn these terms quite early in their education, in mathematics and/or science.

                      P2. This continues into many texts, courses and literatures.

                      The negative case -- implying abandoning or avoiding these terms -- seems to be

                      N1. The terms are overloaded in statistics, as senses of dependence and independence crop up in various contexts in (probability and) statistics.

                      N2. The words are so similar that many people just get them the wrong way round.

                      N3. There are just many more evocative terms available, to choose according to taste, tradition or tribal habit. Response, outcome, ... and explanatory variable, predictor, ... and many more. (I've no quarrel, naturally, with a preference for treating predictor as a term to denote Xb. Just choose a term and explain it or show what you mean with examples.)

