Inverse hyperbolic sine (IHS) transformation allowing for non-normality

Aarushi Dhingra

Join Date: Sep 2019

Posts: 40
#1


22 Nov 2021, 20:41

Dear user,

I want to run a basic Tobit regression. My outcome is a continuous medical cost variable, which I have transformed using the asinh() function in Stata; let's call the result IHS cost. Is there a user-written command that would let me estimate the following equation, which is similar to equation 11 in Brown, Greene, Harris & Taylor (2015)? Please see the link to the paper (https://ideas.repec.org/a/eee/ecmode...cp228-236.html).
Where y is the IHS cost. In particular, I would like to estimate the highlighted gamma parameter.
Is there a way I can compute this?

Thanks for your help!

Kind Regards,
Aarushi
Joao Santos Silva

Join Date: Apr 2014

Posts: 3011
#2

23 Nov 2021, 00:13

Dear Aarushi Dhingra,

I cannot see the gamma parameter you refer to, but my advice is that you ignore the Tobit with IHS transformation and just use Poisson regression with robust standard errors. That is a standard way of modelling medical costs and it is much more robust than what you are trying to do.

Best wishes,

Joao
Aarushi Dhingra

Join Date: Sep 2019

Posts: 40
#3

23 Nov 2021, 15:55

Dear Joao Santos Silva

Sorry about that, I tried to upload a picture of the transformation but it did not seem to go through. Please see the attached picture again.

I was under the impression that Poisson regression can only be used for binary or count data, not for continuous data; perhaps I am wrong?

What I am trying to do is similar to this post on the Stata forum: https://www.statalist.org/forums/for...rmation-by-mle

I am not sure whether the same approach carries over to Tobit.

Thanks for your help!

Aarushi
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

23 Nov 2021, 21:16

"I was under the impression that Poisson regression can only be used for binary or count data, not for continuous data; perhaps I am wrong?"

You are, indeed, wrong. While nobody is conspiring to limit knowledge of this to some select elite, it is one of the "best kept secrets" of statistics and is often the ideal solution for dealing with highly skew data. It is a particularly good solution to the problem that often arises whereby one is tempted to do a log transform of the outcome variable, but is deterred from doing so by the presence of zeroes or negative numbers.
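
To illustrate the zeroes problem mentioned above, here is a minimal sketch on simulated data. Everything in it (the variable names cost and x, the 30% share of zeroes, the coefficient values) is an assumption made up for illustration and does not come from the original poster's data.

```stata
* Simulated, purely illustrative data: nothing here comes from the thread
clear
set seed 12345
set obs 1000
generate x = rnormal()

* Hypothetical cost: zero for roughly 30% of observations, skewed otherwise
generate cost = cond(runiform() < 0.3, 0, exp(1 + 0.5*x + rnormal()))

* The log transform is undefined at zero, so ln_cost is missing there
generate ln_cost = ln(cost)
count if missing(ln_cost)

* OLS on the logged outcome silently drops every zero-cost observation
regress ln_cost x, vce(robust)

* Poisson regression uses the untransformed outcome and keeps all observations
poisson cost x, vce(robust)
```

Stata prints a note that the dependent variable has noninteger values when poisson is run on a continuous outcome; the model is still estimated.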
Joao Santos Silva

Join Date: Apr 2014

Posts: 3011
#5

24 Nov 2021, 00:15

Dear Aarushi Dhingra,

As Clyde noted above, Poisson regression can be used for any kind of non-negative data with no upper limit, and its use to model medical expenditures was pioneered by John Mullahy, who often contributes to this forum. The big advantage of Poisson regression over using the IHS transformation is that with Poisson regression the interpretation of the estimates is very simple, whereas with the IHS transformation the interpretation of the results is much less intuitive. Poisson regression is also much better if you want to use the results for prediction. Therefore, I reiterate that my recommendation is that you use Poisson regression with robust standard errors.
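
A minimal sketch of what the interpretation and prediction steps might look like in practice. It uses the auto dataset shipped with Stata, with price standing in for a medical cost variable; that substitution and the choice of regressors are assumptions made only so that the commands run as written.

```stata
* price is only a stand-in for a non-negative cost outcome (an assumption)
sysuse auto, clear
poisson price mpg weight i.foreign, vce(robust)

* Replay the results as exponentiated coefficients: exp(b) is the
* multiplicative change in expected price for a one-unit change in a regressor
poisson, irr

* Average marginal effects, expressed in the original units of price
margins, dydx(*)

* Fitted conditional means E[y|x] = exp(xb), again in the original units
predict price_hat, n

* With a constant in the model, the mean of the fitted values matches
* the sample mean of the outcome
summarize price price_hat
```

No retransformation step is needed here, which is the contrast with a model fitted to asinh(y) or ln(y).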

Best wishes,

Joao
John Mullahy

Join Date: Dec 2016

Posts: 751
#6

24 Nov 2021, 08:21

I would just add to Clyde's and Joao's helpful comments in #4 and #5 that it may be helpful to reflect on what feature(s) of your data you wish to model. While this is your decision and yours alone, in many instances what is desired is a robust estimate of the conditional mean of the outcome, E[y|x], and nothing more than that. (This is a point that has been emphasized in a number of contexts by Jeff Wooldridge.)

If so, then one key feature of E[y|x] when dealing with non-negative outcomes (like medical costs) is that it must be positive. The most straightforward (though certainly not the only) way to enforce this is to specify E[y|x]=exp(xb), and a straightforward and robust way to estimate such a model is to use Poisson regression. Importantly, this avoids any transformation of the outcome variable.
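
The reasoning can be written out in standard notation. This is a sketch: the symbol \beta for the coefficient vector (b above), the index i for observations, and x_j for the j-th regressor are labels introduced here.

```latex
\begin{align*}
E[y \mid x] &= \exp(x\beta) > 0 \quad \text{for any value of } x\beta, \\
\frac{\partial \ln E[y \mid x]}{\partial x_j} &= \beta_j, \\
\sum_{i=1}^{n} x_i' \left( y_i - \exp(x_i \hat{\beta}) \right) &= 0 .
\end{align*}
```

The first line shows that positivity holds automatically, and the second that each coefficient is a semi-elasticity of the conditional mean. The third line is the first-order condition solved by the Poisson estimator; its population counterpart holds whenever E[y|x]=exp(xb) is correctly specified, which is why y does not need to be a count or to follow a Poisson distribution.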

In some disciplines, working with Generalized Linear Models (GLMs) is perhaps more familiar. If so, then specifying a log link in the GLM context corresponds to the specification E[y|x]=exp(xb). If in addition one specifies a Poisson family, then this is equivalent to Poisson regression, i.e.

Code:

poisson y x1 x2, vce(robust)

is equivalent to

Code:

glm y x1 x2, fam(poisson) link(log) vce(robust)
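
One way to check the equivalence is to fit both models on the same data and put the estimates side by side. This is a sketch that assumes the auto dataset shipped with Stata, with price as a stand-in outcome; the stored-estimate names pois and glmfit are arbitrary.

```stata
* price is only a stand-in for a non-negative outcome (an assumption)
sysuse auto, clear

poisson price mpg weight, vce(robust)
estimates store pois

glm price mpg weight, family(poisson) link(log) vce(robust)
estimates store glmfit

* Coefficients and standard errors from the two fits, side by side
estimates table pois glmfit, b(%9.6f) se(%9.6f)
```

The coefficient columns should agree up to convergence tolerance, since both commands maximize the same objective function.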
Aarushi Dhingra

Join Date: Sep 2019

Posts: 40
#7

26 Nov 2021, 16:51

Thanks for the advice Joao, Clyde, and John. Much appreciated.

Kind regards,
Aarushi
Yodefia Rahmad

Join Date: May 2022

Posts: 3
#8

14 Apr 2025, 03:24

Originally posted by Clyde Schechter

You are, indeed, wrong. While nobody is conspiring to limit knowledge of this to some select elite, it is one of the "best kept secrets" of statistics and is often the ideal solution for dealing with highly skew data. It is a particularly good solution to the problem that often arises whereby one is tempted to do a log transform of the outcome variable, but is deterred from doing so by the presence of zeroes or negative numbers.

Dear Clyde Schechter

This is a rather old thread, but I'm going to shoot my shot anyway: do you have any literature I can read about using Poisson regression for such (continuous) data? The general handbooks I've been referring to, such as Wooldridge's Econometric Analysis of Cross Section and Panel Data and Verbeek's A Guide to Modern Econometrics, only introduce the use of Poisson (and negative binomial) regression for count data.

Thanks!
Nick Cox

Join Date: Mar 2014

Posts: 35697
#9

14 Apr 2025, 03:49

https://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/ is relevant. In my reading it's still true that this point is well known but not widely explained.
Joao Santos Silva

Join Date: Apr 2014

Posts: 3011
#10

14 Apr 2025, 06:59

Dear Yodefia Rahmad,

Standard references for this include

Willard G. Manning and John Mullahy, 2001, Estimating log models: to transform or not to transform?, Journal of Health Economics, 20(4), pp. 461-494.

and

J.M.C. Santos Silva and Silvana Tenreyro, 2006, The Log of Gravity, The Review of Economics and Statistics, 88(4), pp. 641-658.

The 8th edition of Wooldridge's introductory textbook also covers the topic.

Best wishes,

Joao