  • Log transformation of a variable results in dropped observations

    After I transformed the values of a variable into (natural) logarithmic values, a large number of observations were dropped.

    Code:
    generate ln_excess_returns=ln(excess_returns)
    Why did this happen? What happens when the data is transformed into logarithmic values?
    I use Stata 14.1.
    Thank you very much.

  • #2
    What happens when the data is transformed into logarithmic values?
    Intentionally provocative (but no offense): you should know that if you decide to transform your data. At the very least, you should have a reason for doing so other than "everyone else does it".

    My guess is that you have zero and/or negative values in excess_returns. The logarithm is defined only for strictly positive values, so you end up with missing values - not with fewer observations! You are correct in stating that cases with missing values will be excluded from regression-type models, but you did not state that you were running those. I will assume you want something like

    Code:
    regress ln_excess_returns indepvars
    If so, please read Bill Gould's blog entry on the matter.
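
    If that guess is right, a quick check should confirm it (a sketch, reusing the variable names from your post; the two counts should agree):

    Code:
    count if excess_returns <= 0
    count if missing(ln_excess_returns) & !missing(excess_returns)
    The first counts nonpositive values of the original variable (missing values sort high in Stata, so they are not counted); the second counts the new missings created by ln().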

    Best
    Daniel



    • #3
      Daniel,
      thank you for the reply. The article was a great start. And yes, you are right, I don't quite know what I'm doing... but I'm learning.

      Yes, I want to run a multiple regression with "excess return" as the dependent variable. According to codebook, this variable has negative values even after I "log" it:

      range: [-4.1195941, 4.0271358]
      std. dev.: 1.44402
      There are about 3600 values in total...



      • #4
        Naturally. Any positive value less than 1 has a negative logarithm. That is expected, and not in itself a problem. Taking your lowest value, note that it corresponds to a return of about 0.016:

        Code:
        . di exp(-4.119594)
        .01625111
        I'd revise logarithms before you make use of them.
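
        For reference, what ln() does with small, zero, and negative arguments (a minimal sketch):

        Code:
        . di ln(.5)
        -.69314718
        . di ln(0)
        .
        . di ln(-1)
        .
        Values between 0 and 1 have negative logarithms; zero and negative arguments return missing.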

        The bigger deal is zero and negative values in the original. Some people use sign(x) * ln(1 + abs(x)) as a transformation that behaves like ln(x) for large positive x and like -ln(-x) for large negative x. That doesn't look friendly until you get to know it.
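
        A minimal sketch of that transformation in Stata (the new variable name is just for illustration):

        Code:
        generate neglog_excess_returns = sign(excess_returns) * ln(1 + abs(excess_returns))
        Unlike ln() alone, this is defined for zero and negative arguments, so the transformation itself creates no missing values.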
        Last edited by Nick Cox; 28 Apr 2016, 02:44.
