log transformation generating missing data

ruth matko

Join Date: Mar 2018

Posts: 15
#16

19 Jun 2018, 13:47

Ok again thank you all.

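DepVar is the log of the ratio of employed to unemployed native labour (the log-odds of native employment):

$$\text{DepVar} = \ln\left(\frac{y}{1-y}\right) = \ln\left(\frac{N}{P-N}\right)$$

where the employment rate of native workers is defined as y = N/P, with

N = employed native labour
P = total native workforce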
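A minimal Stata sketch of why this transformation generates missing data, using made-up values for hypothetical variables N and P. Since 1 - y = (P - N)/P, the ratio y/(1 - y) reduces to N/(P - N), so the log is undefined whenever no natives are employed (N = 0, giving ln(0)) or all are employed (N = P, giving division by zero). Stata returns missing in both cases.

```stata
* Hypothetical toy data: N = employed natives, P = total native workforce
clear
input N P
 0 50
10 50
25 50
40 50
50 50
end

generate double y = N/P                  // employment rate
generate double depvar = ln(y/(1 - y))   // log-odds; equals ln(N/(P - N))

* N == 0 gives ln(0) and N == P gives division by zero: both become missing
list N P y depvar
count if missing(depvar)
```

Any observation with an employment rate of exactly 0 or 1 drops out of a regression on DepVar for this reason.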

I tried some of your suggestions and they help to reduce the coefficients, but the coefficients are still too big, and I'm not sure how to interpret the results.

Best regards,
Ruth
ruth matko

Join Date: Mar 2018

Posts: 15
#17

20 Jun 2018, 04:37

Nick Cox, can you tell me the code for the quantile-normal plots you produced?

I know the command qnorm, but I cannot make my plots look like yours.

Thank you.
Nick Cox

Join Date: Mar 2014

Posts: 35221
#18

20 Jun 2018, 04:58

I only had the summarize results for 9 percentiles, so I was doing what I could with what I could see. The code was very ad hoc and I didn't keep it.
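Since the original code was not kept, the following is only a hypothetical reconstruction of the idea, not Nick's code: plot the nine percentiles that `summarize, detail` reports against the matching standard normal quantiles. It uses the auto data and `price` purely for illustration; the variable names `zq` and `pq` are made up.

```stata
sysuse auto, clear
quietly summarize price, detail

* Save the nine percentiles reported by -summarize, detail- before anything overwrites r()
local pcs 1 5 10 25 50 75 90 95 99
foreach pc of local pcs {
    local v`pc' = r(p`pc')
}

* Put standard normal quantiles and the matching percentiles in the first 9 rows
generate double zq = .
generate double pq = .
local i = 0
foreach pc of local pcs {
    local ++i
    replace zq = invnormal(`pc'/100) in `i'
    replace pq = `v`pc'' in `i'
}

twoway scatter pq zq in 1/9, xtitle("standard normal quantile") ytitle("percentile of price")
```

With only nine points this is a crude summary; the raw data give a much fuller picture.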

You can do better with your raw data using qnorm and then graph combine.
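A minimal sketch of that approach, using the auto data as a stand-in for your own variables; the graph names `qn_price` and so on are arbitrary.

```stata
sysuse auto, clear

* One quantile-normal plot per variable, each kept in memory under its own name
foreach v in price mpg weight {
    qnorm `v', name(qn_`v', replace)
}

* Show the saved graphs side by side
graph combine qn_price qn_mpg qn_weight, rows(1)
```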

See http://fmwww.bc.edu/repec/usug2016/cox_uksug16.pptx for an overview of quantile plotting in Stata.
Nick Cox

Join Date: Mar 2014

Posts: 35221
#19

20 Jun 2018, 06:54

Here is a relatively painless way to get quantile-normal plots side by side. You need to install multqplot and indeed qplot from the Stata Journal website first.

Code:

sysuse auto, clear
multqplot price mpg weight, trscale(invnormal(@)) xla(-2/2) xtitle("") combine(row(1) b1title(standard normal deviate) l1title("extremes, quartiles and median are labelled"))

As Yudi Pawitan emphasised (reference in the presentation linked in #18), a normal quantile plot shows much about a distribution even if the distribution is not remotely close to normal and normality never even entered your head.
ruth matko

Join Date: Mar 2018

Posts: 15
#20

20 Jun 2018, 09:05

Ok great. Thank you.
Jeanne Roche

Join Date: Jul 2021

Posts: 14
#21

09 Dec 2021, 08:04

Dear all,

I am encountering a similar issue. I want to log one of my independent variables, which is very skewed, but this variable contains many zeros, which are important for my analysis.
I like the option of taking the square root instead of the logarithm, as suggested by @Mike Lacy. Would you have a reference for this practice?

Also, an underlying question: should I worry much that my independent variable is skewed, or is that mainly a concern when the dependent variable is skewed?

Thanks a lot in advance for your help!
Best regards,
Jeanne

[I use Stata 16 for Mac]
Nick Cox

Join Date: Mar 2014

Posts: 35221
#22

09 Dec 2021, 09:50

#21

There is quite a big difference between transforming a response or outcome and transforming a predictor with a logarithm or similar transformation.

With a response or outcome there is often (many would say almost always) scope not to transform the response, but to use a model with (in generalized linear model jargon) a logarithmic link. That approach has many advantages. For one, a model in which the mean of y is exp(Xb) is compatible with some zero or negative outcomes, because the specification is about the mean function, not about every data point. Classically, a Poisson regression model certainly includes the idea that a count could be zero. Other distributions are also compatible with a logarithmic link.
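A minimal sketch on simulated data of the point about zeros; the seed, the sample size of 500 and the coefficients 0.5 and 1.5 are arbitrary assumptions. `ln(y)` is missing wherever the count is zero, whereas a Poisson GLM with log link uses every observation, because it models the mean exp(b0 + b1*x), not ln(y) itself.

```stata
clear
set seed 2021
set obs 500

* Simulated predictor and count outcome whose mean is exp(0.5 + 1.5*x)
generate double x = runiform()
generate y = rpoisson(exp(0.5 + 1.5*x))

* Transforming the outcome: every zero count turns into a missing value
generate double lny = ln(y)
count if y == 0
count if missing(lny)

* Modelling the mean with a log link instead keeps all 500 observations
glm y x, family(poisson) link(log) vce(robust)
```

The robust standard errors are a common choice when a log link is used with Poisson-type estimation but the outcome need not be Poisson distributed.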

As I understand it, asinh and neglog could be link functions for a GLM, as they are monotonic and differentiable, but I have not seen any work under either heading.
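This sketch does not fit either as a GLM link; it only shows that both transformations, unlike ln(x), are defined at zero and for negative values. Stata has a built-in asinh() function; neglog is computed by hand here as sign(x)*ln(1 + abs(x)), the usual definition. The toy values are arbitrary.

```stata
clear
input x
-100
-1
0
1
100
end

* asinh(x) = ln(x + sqrt(x^2 + 1)); built into Stata
generate double ash = asinh(x)

* neglog(x) = sign(x) * ln(1 + |x|)
generate double nlg = sign(x) * ln(1 + abs(x))

list
```

Both functions are odd (f(-x) = -f(x)) and pass through zero, so zeros in the data stay zeros.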

With a predictor, and contrary to an astonishingly widespread myth, there is no general presumption in modelling that a predictor follows any particular marginal distribution. (Against the particular myth that predictors should be normally distributed, it may be noted that indicator predictors with values, say, 0 and 1 fail spectacularly to meet that idea.) In practice there remains the question of whether b_j x_j or b_j T(x_j), for some transformation T(), is a better way of capturing a relationship that may be nonlinear. Or a transformation may help a little to tame skewness or subdue outliers in a predictor. Or "theory" may incline the researcher to take logarithms anyway.
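One way to weigh b_j x_j against b_j T(x_j) is simply to fit both and compare. A hypothetical sketch on the auto data, with mpg as the response and weight as the predictor, comparing the raw predictor with its square root and its logarithm via AIC and BIC. The variables, the model names and AIC/BIC as yardstick are all assumptions for illustration; with real data, plotting the fitted relationship matters at least as much.

```stata
sysuse auto, clear

* Candidate versions of the predictor (weight has no zeros; with zeros, sqrt() or asinh() would still work where ln() fails)
generate double sqrt_weight = sqrt(weight)
generate double ln_weight = ln(weight)

* Fit the same model with each version and store the results
regress mpg weight
estimates store m_raw
regress mpg sqrt_weight
estimates store m_sqrt
regress mpg ln_weight
estimates store m_log

* Compare information criteria (lower is better) across the three fits
estimates stats m_raw m_sqrt m_log
```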

Jeanne Roche

Join Date: Jul 2021

Posts: 14
#23

11 Dec 2021, 11:51

Dear Nick,

Thank you very much! This is very helpful. I also found that I could use log(x + 1) instead of log(x), and I am considering that as well.

Best regards,
Jeanne
Nick Cox

Join Date: Mar 2014

Posts: 35221
#24

11 Dec 2021, 12:11

That's just a special case of neglog, as defined in https://www.jstor.org/stable/3592674
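Spelled out, taking neglog in its usual form sign(x) ln(1 + |x|) (check the linked paper for the authors' exact statement), the special case is the restriction to non-negative x:

$$
\operatorname{neglog}(x) = \operatorname{sign}(x)\,\ln(1 + |x|)
\quad\Longrightarrow\quad
\operatorname{neglog}(x) = \ln(1 + x) \quad \text{for } x \ge 0,
$$

since sign(x) = 1 and |x| = x for x > 0, and both sides equal 0 at x = 0.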