  • Marginal effects with log-transformed dependent variables

    Dear Stata user,

    I have dollar amounts as my dependent variable. This variable has about 40% zeros and is positively skewed. I log-transformed it after adding 1 (i.e. new_outcome = ln(1 + old_outcome)). I ran a two-stage `churdle` model and generated marginal effects as recommended in the Stata manual. My main predictor of interest is a dichotomous variable (it is either 0 or 1). Because my outcome is log-transformed, I am a bit perplexed as to how to interpret the effect of my main variable of interest, summarized below. Would it be correct to say that when X_var equals 1, the dollar amount in the outcome variable is 67% higher?

    Code:
    margins, dydx(X_var)

                   Delta-method
                      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    X_var          .6697905   .2090674     3.07   0.002      .2314362    1.045223

    Any advice would be greatly appreciated. This seems like a rudimentary question but I have not been able to find a definitive answer.
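
    As a rough arithmetic check on the 67% reading (a sketch that assumes the 0.67 can be treated as a difference in the mean of ln(1 + y) between X_var=1 and X_var=0, which the hurdle structure may not strictly justify), a log-scale difference d corresponds to a proportional change of exp(d) - 1 rather than d itself:

    ```latex
    \[
    \exp(0.6697905) - 1 \approx 1.954 - 1 = 0.954
    \]
    ```

    Under that assumption the implied change is roughly 95% in 1 + y (not in y itself, and not in expected dollars), so "67% higher" would understate it. Reading d directly as a percentage is only a good approximation when d is small.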

  • #2
    Michael: A more straightforward approach, one that lets you interpret your findings as effects on dollars instead of effects on log-dollars, is to use the glm procedure. There is no need to treat zeros specially and no need to transform. The basic idea is that the log of the conditional mean is linear in covariates and parameters, i.e.
    Code:
    E[y|X_var]=exp(a+b*X_var)
    Here's a basic template:
    Code:
    glm y X_var, link(log) fam(gamma) robust
    margins, dydx(X_var)
    In this instance the interpretation of the margins result is the change in the expected value of y (conditional on X_var) due to a change in X_var.
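
    The template above can be tried end to end on made-up data. The following is only a minimal sketch: the simulated data, seed, sample size, and coefficient values are all assumptions for illustration and are not taken from the thread. It also writes the predictor as i.X_var, a choice not in the template, so that -margins- treats the 0/1 variable as categorical and reports a discrete difference in expected dollars.

    ```stata
    * Sketch only: simulated data stand in for the real dollar outcome.
    * All numbers below (seed, sample size, coefficients) are assumptions.
    clear
    set seed 12345
    set obs 2000
    generate byte X_var = runiform() < 0.5

    * True conditional mean follows E[y|X_var] = exp(a + b*X_var), a=1, b=0.5
    generate double mu = exp(1 + 0.5*X_var)

    * About 40% zeros; positive part is gamma, scaled so that E[y|X_var] = mu
    generate double y = cond(runiform() < 0.4, 0, rgamma(2, mu/1.2))

    * i.X_var marks the predictor as categorical, so margins reports the
    * discrete change E[y|X_var=1] - E[y|X_var=0] rather than a derivative
    glm y i.X_var, family(gamma) link(log) vce(robust)
    margins, dydx(X_var)

    * Predicted mean of y, in dollars, at each level of X_var
    margins X_var

    * Replay with exponentiated coefficients: exp(b) is the ratio of means
    glm, eform
    ```

    With a log link, exp(b) is the ratio E[y|X_var=1] / E[y|X_var=0], so the same fit gives both an absolute effect in dollars (from -margins-) and a relative effect (from the exponentiated coefficient).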


    • #3
      John,

      Thank you so much for your advice. I used the code you suggested with the untransformed dollar amount as my outcome variable, and the marginal effect of X_var is 6.30. Would it be correct to say in this case that when X_var=1, the expected value of y is higher by $6.30?

      If I may, I would like to ask a few other questions.
      • I have an intuitive understanding of how the `churdle` model works, but I have never used the `glm` approach you suggested. What source could I refer to for my own understanding, and cite in my paper, if I wanted to use the `glm` model?
      • If I were to continue with the `churdle` model, how should I interpret the marginal effect?
      • What criteria should I use to decide which model, `churdle` or `glm`, is the better choice for my data?
      Thank you once again for your help.


      • #4
        Michael:

        1. When using -margins- the reported results are averages over the sample of the partial derivative/difference. So if you had other covariates you would be averaging over them as well. You can instruct -margins- to evaluate the derivatives/differences at particular x-values if that is of interest to you, using the at(...) option, e.g.
        Code:
        glm y x1 x2, ...
        margins, dydx(*) at(x1=1 x2=7)
        2. This paper might be a reasonable starting point to learn about GLM in a context similar to the one you're studying: https://uwmadison.box.com/s/2e29jvgw...d03i3jqbj0o5oo

        3. I think the churdle default -margins- option, ystar, is interpreted the same way as with GLM. You might also want to take a look at -twopm-, a user-contributed command. Type "help twopm".

        4. As for "better choice", that really depends on what your objective is. One common objective might be superior out-of-sample prediction performance, in which case a cross-validation exercise might be informative. But "better" is really up to you to define.
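
        The cross-validation exercise mentioned in point 4 could look roughly like the following. This is a hedged sketch, not a recommended procedure: the data are simulated, and the number of folds, the seed, the variable names (fold, yhat_glm, yhat_ch), the choice of -churdle exponential- fitted to the untransformed outcome, and squared error as the loss are all assumptions made for illustration. Both models predict on the dollar scale so that their errors are comparable.

        ```stata
        * Sketch only: 5-fold cross-validation on simulated data (all values assumed)
        clear
        set seed 2024
        set obs 2000
        generate byte X_var = runiform() < 0.5
        generate double mu = exp(1 + 0.5*X_var)
        generate double y = cond(runiform() < 0.4, 0, rgamma(2, mu/1.2))

        * Randomly assign each observation to one of 5 folds
        generate double u = runiform()
        sort u
        generate byte fold = 1 + mod(_n, 5)

        generate double yhat_glm = .
        generate double yhat_ch  = .

        forvalues k = 1/5 {
            * Fit on the other four folds, predict E[y|x] for the held-out fold
            quietly glm y i.X_var if fold != `k', family(gamma) link(log)
            quietly predict double tmp if fold == `k', mu
            quietly replace yhat_glm = tmp if fold == `k'
            drop tmp

            quietly churdle exponential y i.X_var if fold != `k', select(i.X_var) ll(0)
            quietly predict double tmp if fold == `k', ystar
            quietly replace yhat_ch = tmp if fold == `k'
            drop tmp
        }

        * Compare out-of-sample squared prediction errors (lower mean is better)
        generate double sqerr_glm = (y - yhat_glm)^2
        generate double sqerr_ch  = (y - yhat_ch)^2
        summarize sqerr_glm sqerr_ch
        ```

        With only one binary predictor the two models may predict almost identically; differences are more likely to show up once additional covariates enter. A different loss (for example mean absolute error) could be substituted if it matches the objective better.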

        John


        • #5
          Thank you for your answer, John, it is very helpful and saved me a lot of time!
