  • #16
    Ok again thank you all.

    DepVar is the log of the ratio of the employment rate to the unemployment rate of native labour:

    DepVar = ln(y/(1-y)) = ln(N/(P-N))

    where the employment rate of native workers is defined as y = N/P

    N= native labour
    P = total native workforce
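
    In Stata terms, the definition above could be sketched as follows. The variable names N and P and the toy numbers are assumptions for illustration only, not the actual data, and N is taken here to be the number of employed native workers.

    ```stata
    clear
    * Toy data, purely illustrative
    input N P
    40 50
    90 100
    end
    * Employment rate of native workers
    generate double y = N / P
    * logit(y) is ln(y / (1 - y)), i.e. the log of the employed/unemployed ratio
    generate double depvar = logit(y)
    * The same quantity written directly in terms of N and P
    generate double depvar_check = ln(N / (P - N))
    list
    ```

    Both versions are missing whenever N is 0 or N equals P, so any such observations would drop out of a regression.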

    I tried some of your suggestions and they helped to reduce the coefficients, but the coefficients are still too big.

    And I'm not sure how to interpret the results.

    Best regards,
    Ruth




    • #17
      Nick Cox, can you tell me the code for the quantile normal plots you produced?

      I know the command qnorm, but I cannot make its output look like your plots.

      Thank you.

      • #18
        I only used the summarize results for 9 percentiles. So, I was doing what I could with what I could see. The code was very ad hoc and I didn't keep it.

        You can do better with your raw data using qnorm and then graph combine.

        See http://fmwww.bc.edu/repec/usug2016/cox_uksug16.pptx for an overview of quantile plotting in Stata.
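
        A minimal sketch of the qnorm plus graph combine route, using the auto dataset shipped with Stata as a stand-in for the raw data. The choice of variables and the graph names qn_* are assumptions for illustration; this is not the ad hoc code behind the earlier plots.

        ```stata
        sysuse auto, clear
        * One quantile-normal plot per variable, each stored under its own name
        foreach v of varlist price mpg weight {
            qnorm `v', name(qn_`v', replace) title("`v'")
        }
        * Show the stored graphs side by side in a single row
        graph combine qn_price qn_mpg qn_weight, rows(1)
        ```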

        • #19
          Here is a relatively painless way to get quantile-normal plots side by side. You need to install multqplot and indeed qplot from the Stata Journal website first.

          Code:
          sysuse auto, clear
          multqplot price mpg weight, trscale(invnormal(@)) xla(-2/2) xtitle("") combine(row(1) b1title(standard normal deviate) l1title("extremes, quartiles and median are labelled"))
          [Image: multqplot3.png, the graph produced by the code above]


          As Yudi Pawitan emphasised (reference in the presentation linked in #18), a normal quantile plot shows much about a distribution even if that distribution is not remotely close to normal and the idea of normality never even entered your head.

          • #20
            Ok great. Thank you.

            • #21
              Dear all,

              I am encountering a similar issue. I want to log one of my independent variables, which is very skewed, but this variable contains many zeros, which are important for my analysis.
              I like the option of taking the square root instead of the logarithm, as suggested by @Mike Lacy. Would you have a reference for this practice?

              Also, an underlying question is: should I worry a lot that my independent variable is skewed? Or is it mainly a concern if the dependent variable is skewed?

              Thanks a lot in advance for your help!
              Best regards,
              Jeanne

              [I use Stata 16 for Mac]

              • #22
                #21

                There is quite a big difference between transforming a response or outcome and transforming a predictor with a logarithm or similar transformation.

                With a response or outcome there is often (many would say almost always) scope not to transform the response, but to use a model with (in generalized linear model jargon) a logarithmic link. That approach has many advantages. For one, a model with mean function E(y) = exp(Xb) is compatible with some zero or negative outcomes, because the specification is about the mean function, not all the data. Classically a Poisson regression model certainly includes the idea that a count could be zero. Other distributions are also compatible with a logarithmic link.
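
                As a minimal sketch of the logarithmic link idea, using the auto dataset shipped with Stata: the choice of price and mpg, the Poisson family and the robust standard errors are all assumptions made for illustration, not a recommendation for any particular dataset.

                ```stata
                sysuse auto, clear
                * Mean function E(price | mpg) = exp(b0 + b1*mpg); price itself is not logged
                glm price mpg, family(poisson) link(log) vce(robust)
                scalar b_glm = _b[mpg]
                * The same point estimates via poisson with robust standard errors
                poisson price mpg, vce(robust)
                scalar b_poisson = _b[mpg]
                display b_glm, b_poisson
                ```

                Because the model is for the mean, observations with an outcome of exactly zero would stay in the estimation sample rather than being lost to ln(0).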

                As I understand it, asinh and neglog could be link functions for a GLM, as they are monotonic and differentiable, but I have not seen any work under either heading.

                With a predictor, and contrary to an astonishingly widespread myth, there is no general presumption in modelling that a predictor follows any particular marginal distribution. (Against the particular myth that predictors should be normally distributed, it may be noted that indicator predictors with values say 0 and 1 fail spectacularly to meet that idea.) In practice there remains the question of whether b_j x_j or b_j T(x_j) for some transformation T() is a better idea as a way of capturing a relationship that may be nonlinear. Or a transformation may help a little to tame skewness or subdue outliers in a predictor. Or "theory" may incline the researcher to taking logarithms anyway.





                • #23
                  Dear Nick,

                  Thank you very much! This is very helpful. I also found that I could use log(x+1) instead of log(x), and I am considering it as well.

                  Best regards,
                  Jeanne

                  • #24
                    That's just a special case of neglog, as defined in https://www.jstor.org/stable/3592674
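
                    To make the connection concrete, here is a small sketch assuming the usual definition neglog(x) = sign(x) * ln(1 + |x|), which reduces to ln(1 + x) whenever x >= 0. The toy values and variable names are assumptions for illustration; asinh is included only for comparison.

                    ```stata
                    clear
                    * Toy values, including a zero and a negative
                    input x
                    0
                    1
                    10
                    -3
                    end
                    * neglog is defined for zero and negative values too
                    generate double neglog_x = sign(x) * ln(1 + abs(x))
                    * ln(x + 1) is missing whenever x <= -1
                    generate double log1_x = ln(x + 1)
                    * Inverse hyperbolic sine, for comparison
                    generate double asinh_x = asinh(x)
                    list
                    ```

                    For x >= 0 the columns neglog_x and log1_x agree, and both map 0 to 0, so the zeros stay in the data.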
