  • Logarithm vs. gamma log link

    Dear Stata Experts,

    I am working with a variable that is highly skewed to the right (skewness 4.910018; kurtosis 37.58021; median .15802; mean .4333288).

    Because of that, I used the natural log of the variable. When I plotted the model with marginsplot, the y axis was partly negative, which does not make sense for this topic, and of course that scale is not interpretable.

    The code reads as follows:

    svy: reg lndepvar i.var1##i.var2
    margins, over(var1 var2) predict(xb)
    marginsplot


    The second version reads as follows:

    svy: glm depvar i.var1##i.var2, family(gamma) link(log)
    margins, over(var1 var2) predict(xb) vsquish
    marginsplot


    The problem is this: with the glm code, I get the same plot as with the first code run on the original, untransformed variable. The estimates, however, differ between the models; it is only the plots that match.
    Is there anything wrong with the margins command in my second code?
    What would you recommend? Should I use the natural log and give up the interpretable scale, or should I use the gamma log link?


    I would be very grateful for your help.
    Thank you so much!
    Last edited by anne jagdberg; 01 Mar 2022, 13:51.

  • #2
    With the -glm- model, if you want your -margins- results to be in the metric of depvar, use -predict(mu)-, not -predict(xb)-, in your -margins- command.

    As an aside, are you sure you want to use -over(var1 var2)- in your -margins- command? This produces predicted margins conditional on the values of var1 and var2, and each of the results is calculated using only the subset of the data with the values of var1 and var2 shown in the row stub of the output. In particular, these conditional predicted margins are not adjusted for the confounding effects of other variables. These are perfectly legitimate statistics, but are usually not what people want. Most of the time, people want predicted margins that are calculated from the entire data set and adjusted for the values of everything in the data set. To do that, it's -margins var1 var2, predict(mu) vsquish-.
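A minimal sketch of the -predict(xb)- versus -predict(mu)- point, assuming Stata's shipped nlsw88 data as a stand-in for the poster's survey data: wage (positive and right-skewed) plays the role of depvar, and married and collgrad play var1 and var2. The svy: prefix is dropped here because nlsw88 is not svyset.

```stata
* Stand-in data (assumption): wage is positive and right-skewed,
* like the poster's depvar; married and collgrad stand in for var1 and var2.
sysuse nlsw88, clear

* Gamma GLM with a log link and a saturated married-by-collgrad design
glm wage i.married##i.collgrad, family(gamma) link(log)

* predict(xb): margins of the linear predictor, i.e. on the log scale
margins married#collgrad, predict(xb)

* predict(mu): margins of expected wage, on the original scale
margins married#collgrad, predict(mu)
marginsplot
```

With a log link and no other covariates, each -predict(mu)- margin is exp() of the matching -predict(xb)- margin, so only the -predict(mu)- call puts the plot on the scale of the outcome.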



    • #3
      Thank you for your reply!

      Even if I use predict(mu), marginsplot still shows the same graph as when I do not log-transform the variable. Sorry, I don't understand :-(

      As for your second question: I am not sure. My variable var1 ranges from 16 to 30 and var2 from 1 to 5. I saw some code that did the same thing I am doing:
      margins, at(var1=(16(1)30) var2=(1(1)5)) vsquish

      Is there a difference between that and my code, margins, over(var1 var2) predict(xb) vsquish?

      Thank you very much!



      • #4
        If I use predict(mu) still the marginsplot shows me the same one as if I do not log transform it. Sorry I dont understand :-(
        I think you need to show the graphs, with the accompanying regression code, so we can see what you are referring to.

        My variable var1 ranges from 16 to 30 and var2 from 1 to 5. I saw a code which did the same what I am doing, this was:
        margins, at(var1=(16(1)30) var2=(1(1)5)) vsquish

        is there a difference to my code margins, over(var 1 var2) predict(xb) vsquish?
        Yes, there is a difference. But first, let's clarify something: are your variables var1 and var2 continuous or discrete? The code using var1##var2 implies they are discrete. If you intend them to be continuous, you have to tell Stata that by writing c.var1##c.var2. In an interaction term, any unprefixed variable is treated as discrete.
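To illustrate the prefix rule just described, here is a sketch using nlsw88 (assumption: age and collgrad are stand-ins, chosen only because age is an integer with a modest number of distinct values). Comparing e(df_m) across the three fits shows how many model parameters each specification uses.

```stata
* Stand-in data (assumption): age and collgrad only illustrate the prefixes.
sysuse nlsw88, clear

* c.age: age enters as a single continuous slope, plus its interaction
regress wage c.age##i.collgrad
display "model df with c.age:      " e(df_m)

* i.age: every distinct age gets its own indicator
regress wage i.age##i.collgrad
display "model df with i.age:      " e(df_m)

* No prefix inside ##: Stata treats age as categorical, same as i.age
regress wage age##collgrad
display "model df with no prefix:  " e(df_m)
```

With c.age the model has three parameters besides the constant; the i.age and unprefixed fits are the same, much larger, categorical model.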

        Assuming var1 and var2 are discrete
        Code:
        margins, at(var1=(16(1)30) var2=(1(1)5))
        is equivalent to
        margins var1#var2
        
        but, in general, different from
        
        margins, over(var1 var2)
        In a model that has no other variables, -margins, over(var1 var2)- gives the same results as the other two. But if there are any other variables in the regression, the results differ.
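A sketch of that distinction, again with nlsw88 as stand-in data (assumption: married and collgrad play var1 and var2, and ttl_exp is an extra covariate added only so that -over()- has something to diverge on).

```stata
* Stand-in data (assumption): married/collgrad = var1/var2, ttl_exp = extra covariate
sysuse nlsw88, clear
regress wage i.married##i.collgrad c.ttl_exp

* These two are the same: every observation is set to each cell in turn,
* and the predictions are averaged over the full estimation sample.
margins married#collgrad
margins, at(married=(0 1) collgrad=(0 1))

* This one is conditional: each cell's margin is averaged only over the
* observations that actually fall in that cell, so it is not adjusted
* for differences in ttl_exp between cells.
margins, over(married collgrad)
```

Here -margins, over(married collgrad)- reproduces the raw cell means of wage, because with OLS and a full set of cell indicators the fitted values in each cell average to that cell's observed mean; the first two commands instead hold the ttl_exp distribution fixed across cells.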



        • #5
          OK, thank you. I have attached the graphs:
          - with gamma link log
          - with regression and the original variable
          - with regression and the ln variable


          I would like the variables to be treated as categorical ones. Thank you for the question and for the information about writing c.var1!
          As you can see, there are no other variables.





          My codes look as follows:


          ********************************************************************************************
          ** gamma distribution
          svy: glm sports i.cohort i.AGE i.AGE#i.cohort if sports>0, family(gamma) link(log)
          margins, over(AGE cohort)
          marginsplot, title("Cohort, age, sports") xtitle("Age") ///
          ytitle("Sports (Gamma)") ///
          plot1opts(msymbol(D) lcolor(gs0) mcolor(gs0)) ///
          plot2opts(msymbol(X) lcolor(gs5) mcolor(gs5)) ///
          plot3opts(msymbol(O) lcolor(gs8) mcolor(gs8)) ///
          plot4opts(msymbol(T) lcolor(gs11) mcolor(gs11)) ///
          plot5opts(msymbol(S) lcolor(gs14) mcolor(gs14)) ///
          ci1opts(lcol(gs0)) ///
          ci2opts(lcol(gs5)) ///
          ci3opts(lcol(gs8)) ///
          ci4opts(lcol(gs11)) ///
          ci5opts(lcol(gs14))
          graph export gammalinkall_sports_across_age.tif, replace

          ** without log
          svy: reg sports i.cohort i.AGE i.AGE#i.cohort if sports>0
          margins, over(AGE cohort)
          marginsplot, title("Cohort, age, sports") xtitle("Age") ///
          ytitle("Sports") ///
          plot1opts(msymbol(D) lcolor(gs0) mcolor(gs0)) ///
          plot2opts(msymbol(X) lcolor(gs5) mcolor(gs5)) ///
          plot3opts(msymbol(O) lcolor(gs8) mcolor(gs8)) ///
          plot4opts(msymbol(T) lcolor(gs11) mcolor(gs11)) ///
          plot5opts(msymbol(S) lcolor(gs14) mcolor(gs14)) ///
          ci1opts(lcol(gs0)) ///
          ci2opts(lcol(gs5)) ///
          ci3opts(lcol(gs8)) ///
          ci4opts(lcol(gs11)) ///
          ci5opts(lcol(gs14))
          graph export all_sports_across_age.tif, replace
          */

          ** all ln sports
          svy: reg lnsports i.cohort i.AGE i.AGE#i.cohort if sports>0
          margins, over(AGE cohort)
          marginsplot, title("Cohort, age, ln sports") xtitle("Age") ///
          ytitle("Ln(sports)") ///
          plot1opts(msymbol(D) lcolor(gs0) mcolor(gs0)) ///
          plot2opts(msymbol(X) lcolor(gs5) mcolor(gs5)) ///
          plot3opts(msymbol(O) lcolor(gs8) mcolor(gs8)) ///
          plot4opts(msymbol(T) lcolor(gs11) mcolor(gs11)) ///
          plot5opts(msymbol(S) lcolor(gs14) mcolor(gs14)) ///
          ci1opts(lcol(gs0)) ///
          ci2opts(lcol(gs5)) ///
          ci3opts(lcol(gs8)) ///
          ci4opts(lcol(gs11)) ///
          ci5opts(lcol(gs14))
          graph export all_lnsports_across_age.tif, replace

          ********************************************************************************************




          • #6
            I have an addition. While trying various things to solve the problem, I found the following:

            glm sports i.cohort if drinker==1, family(gamma) link(log)
            margins cohort, atmeans

            glm sports i.cohort if drinker==1
            margins cohort, atmeans

            Both give the same margins, which I do not understand.
            Can anyone explain that?

            If I examine the "original" values (via mean sports, over(cohort)), the values are different, which I understand. What I do not understand is why the two commands above deliver the same results.


            This is similar to the problem above: with the gamma log link, the plots are the same as when I do not use the logarithm.

            Thank you!



            • #7
              It's the job of the generalized linear models you asked for to fit and report the means of the outcome for each category. A glm is a more or less fancy variation on

              mean outcome | predictors

              What differs is just what kind of uncertainty is expected around the mean function.


              Code:
              sysuse auto, clear
              
              glm mpg i.foreign
              predict raw
              
              glm mpg i.foreign, link(log)
              predict log
              
              glm mpg i.foreign, f(gamma)
              predict gamma
              
              tabdisp foreign, c(raw log gamma)
              
              ----------------------------------------------------------------------
              Car       |
              origin    | Predicted mean mpg  Predicted mean mpg  Predicted mean mpg
              ----------+-----------------------------------------------------------
               Domestic |           19.82692            19.82692            19.82692
                Foreign |           24.77273            24.77273            24.77274
              ----------------------------------------------------------------------
              There is a tiny amount of numeric noise there, but in principle all those means should be considered identical.
              Last edited by Nick Cox; 02 Mar 2022, 09:50.
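The same logic explains the plots in #5. A sketch with nlsw88 standing in for the poster's data (assumption: wage plays the role of sports, and married and collgrad play AGE and cohort): regress on ln(wage) estimates cell means of ln(wage), which back-transform to geometric means, whereas the gamma log-link glm estimates the log of the arithmetic cell means. That is why its -predict(mu)- plot matches regress on the untransformed variable.

```stata
* Stand-in data (assumption): wage plays the role of "sports"
sysuse nlsw88, clear
generate lnwage = ln(wage)

* OLS on ln(wage): each cell margin is the cell mean of ln(wage)
regress lnwage i.married##i.collgrad
margins married#collgrad
matrix A = r(b)

* Gamma GLM, log link: each xb margin is ln(cell mean of wage)
glm wage i.married##i.collgrad, family(gamma) link(log)
margins married#collgrad, predict(xb)
matrix B = r(b)

* Back-transformed, A gives geometric means and B gives arithmetic means
display "exp(OLS on ln wage), cell 0#0: " exp(A[1,1])
display "exp(GLM xb), cell 0#0:         " exp(B[1,1])
ameans wage if married==0 & collgrad==0
```

Because a geometric mean is below the arithmetic mean whenever values vary, the back-transformed log-OLS margins sit below the gamma GLM margins, and the two models answer different questions.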