Natural Logarithm of a variable having negative values

Pranshu Tripathi

Join Date: Oct 2022

Posts: 62
#1

Natural Logarithm of a variable having negative values

21 Oct 2022, 08:01

As a variable, I have the Natural Logarithm of sales growth (Lnsalesgrowth). But the sales growth can be negative, and then Lnsalesgrowth is not defined. I have more than 1500 firms. So, I am dropping those observations which are undefined. Is this the right way to handle this, or is there any alternative?
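
To make the question concrete, here is a minimal sketch on simulated data of how many observations are lost this way. The variable name salesgrowth and the simulated values are assumptions for illustration, not the actual dataset. In Stata, ln() returns missing for zero or negative arguments, so the loss can be counted before anything is dropped.

```stata
clear
set seed 12345
set obs 1000
* hypothetical stand-in for the real sales growth variable
generate double salesgrowth = rnormal(0.05, 0.5)
* ln() of a zero or negative argument is missing in Stata
generate double lnsalesgrowth = ln(salesgrowth)
* observations that would be lost by working with the log
count if missing(lnsalesgrowth)
count if salesgrowth <= 0
```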
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#2

21 Oct 2022, 08:10

Pranshu:
the first question to ask is: why take logs at all?

Kind regards,
Carlo
(Stata 19.0)
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35433
#3

21 Oct 2022, 08:42

Strictly, logarithms of negative numbers are defined, but as complex numbers, and that's not useful statistically, at least here.

See e.g. https://math.stackexchange.com/quest...egative-number

But we know what you mean. If the motive is to tame a skewed distribution, so-called neglog, namely sign(x) ln(1 + |x|)

Code:

sign(x) * log1p(abs(x))

can be useful as can

Code:

asinh(x)

(economists in particular are prone to call this IHS for inverse hyperbolic sine)

as can cube root, meaning

Code:

sign(x) * abs(x)^(1/3)

All these transformations have in common

1. preserving sign

2. pulling in tails relatively speaking.
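
A minimal sketch putting the three transformations side by side, on a handful of made-up values that span both signs and include zero (the variable names are just for illustration). It assumes a Stata version recent enough to have log1p(); otherwise ln(1 + abs(x)) should do the same job.

```stata
clear
set obs 7
* hypothetical values from -3000 to 3000, including zero
generate double x = (_n - 4) * 1000
* neglog: sign(x) ln(1 + |x|)
generate double neglog = sign(x) * log1p(abs(x))
* inverse hyperbolic sine
generate double ihs = asinh(x)
* signed cube root
generate double cuberoot = sign(x) * abs(x)^(1/3)
list, noobs
```

Each new variable keeps the sign of x and maps zero to zero, while large values of either sign are pulled in towards the middle.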

In short, omitting negative values is quite the wrong way to do it. Use the data as they come, or use a transformation fit for purpose.
3 likes
Comment
Pranshu Tripathi

Join Date: Oct 2022

Posts: 62
#4

21 Oct 2022, 08:47

The sales growth range is -378,289 to 488,741 in my dataset. To reduce this variation, I am considering using the natural log of this variable. It is a proxy for investment opportunity in the existing literature. So, I am following the literature.

Last edited by Pranshu Tripathi; 21 Oct 2022, 08:50.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#5

21 Oct 2022, 09:07

Pranshu:
if you're going to run a regression with this as the regressand, why not consider -glm- with a log link and gamma family?
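
A minimal sketch of that kind of model, using the auto dataset shipped with Stata as a stand-in. The variables price, mpg and weight are placeholders, not the sales data, and the outcome here is strictly positive, which is the situation the gamma family is meant for.

```stata
sysuse auto, clear
* log link, gamma family; price is strictly positive in this dataset
glm price mpg weight, family(gamma) link(log)
* predicted means are already on the original scale of price
predict double price_hat, mu
summarize price price_hat
```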

Kind regards,
Carlo
(Stata 19.0)
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35433
#6

21 Oct 2022, 09:14

I'd like to see the distribution not just the range. @Carlo Lazzaro's excellent idea does carry with it a presumption that the mean function is positive despite any negative values in the data.
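
A minimal sketch of what looking at the distribution could mean in practice, on simulated data (the variable salesgrowth and its heavy-tailed values are assumptions, not the real data). It reports the detailed summary, the number of negative values and selected percentiles rather than just the two extremes.

```stata
clear
set seed 2022
set obs 1500
* hypothetical heavy-tailed variable taking both signs
generate double salesgrowth = rt(3) * 1000
* moments, extreme values and standard percentiles
summarize salesgrowth, detail
* how many values are negative
count if salesgrowth < 0
* selected percentiles with confidence intervals
centile salesgrowth, centile(1 5 25 50 75 95 99)
```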

Last edited by Nick Cox; 21 Oct 2022, 09:20.
1 like
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#7

21 Oct 2022, 09:26

Nick is obviously correct.
My preference for -gml- with a log link and gamma family comes from several dreadful experiences with healthcare cost data logged and then back-transformed via Duan's smear (https://www.jstor.org/stable/2288126) with disappointing results when contrasted against their raw scale.
This (painful) issue is well covered in https://www.stata.com/bookstore/heal...cs-using-stata , pages 96-99.
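
A minimal sketch of the two routes being contrasted, again on the auto dataset as a stand-in for cost data (price, mpg and weight are placeholders). The smearing factor is computed as the mean of the exponentiated residuals from the regression on the log scale; treat this as an illustration under those assumptions rather than a recipe.

```stata
sysuse auto, clear
* route 1: OLS on the logged outcome, then back-transform
generate double lnprice = ln(price)
regress lnprice mpg weight
predict double xb, xb
predict double ehat, residuals
* Duan's smearing factor: mean of the exponentiated residuals
generate double exp_ehat = exp(ehat)
summarize exp_ehat, meanonly
generate double price_smear = exp(xb) * r(mean)
* route 2: GLM with log link, predictions directly on the raw scale
glm price mpg weight, family(gamma) link(log)
predict double price_glm, mu
* compare both sets of predictions with the observed outcome
summarize price price_smear price_glm
```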

Kind regards,
Carlo
(Stata 19.0)
1 like
Comment
Stephen Jenkins

Join Date: Apr 2014

Posts: 1425
#8

21 Oct 2022, 10:06

See also "The inverse hyperbolic sine transformation and retransformed marginal effects" by Ed Norton, The Stata Journal (2022) 22, Number 3, pp. 702–712, DOI: 10.1177/1536867X221124553. Examples with respect to health care costs, I recall.
3 likes
Comment
Pranshu Tripathi

Join Date: Oct 2022

Posts: 62
#9

21 Oct 2022, 10:08

Mr. Lazzaro and Mr. Cox, thanks for your input. I am new to research, so I am unaware of GML. I will explore this and will come back if any questions arise.

Thanks and Regards
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35433
#10

21 Oct 2022, 10:11

Carlo Lazzaro meant GLM -- generalized linear models.
Comment
Pranshu Tripathi

Join Date: Oct 2022

Posts: 62
#11

21 Oct 2022, 10:15

Thanks, Mr. Stephen Jenkins. I will go through it.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#12

21 Oct 2022, 10:16

Nick is obviously right again.
I'm progressively losing all the letters on my keyboard and sometimes typing the right letter is just probabilistic (time to buy a more decent keyboard!).

Kind regards,
Carlo
(Stata 19.0)
Comment
Pranshu Tripathi

Join Date: Oct 2022

Posts: 62
#13

21 Oct 2022, 14:18

Hey, I forgot to mention that the variable mentioned above is an independent variable.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35433
#14

21 Oct 2022, 14:21

There might be other reasons for GLMs.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#15

22 Oct 2022, 01:54

Pranshu:
if the variable you mentioned is a predictor, I would leave it in its original metric, especially if logging produces a remarkable reduction of the original sample.
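
A minimal sketch of that sample reduction, on simulated data (y, x and the coefficients are made up for illustration). The same regression is fitted with the predictor in its original metric and then logged, and the estimation sample sizes are compared.

```stata
clear
set seed 2022
set obs 1500
* hypothetical predictor taking both signs, and an outcome depending on it
generate double x = rt(3) * 1000
generate double y = 1 + 0.001 * x + rnormal()
* logged predictor is missing wherever x <= 0
generate double lnx = ln(x)
* original metric: all observations are used
regress y x
scalar n_raw = e(N)
* logged predictor: observations with x <= 0 drop out
regress y lnx
scalar n_log = e(N)
display "N, original metric: " n_raw "   N, logged: " n_log
```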

Kind regards,
Carlo
(Stata 19.0)
1 like
Comment
