Logarithm vs. gamma log link

anne jagdberg

Join Date: Feb 2018

Posts: 28
#1

01 Mar 2022, 12:39

Dear Stata Experts,

I am working with a variable that is highly skewed to the right (skewness 4.910018; kurtosis 37.58021; median .15802; mean .4333288).

Because of that, I took the natural log of the variable. When I plotted the model via marginsplot, the y scale was partly negative, which makes no sense for the topic, and the log scale is of course not directly interpretable.

The code reads as follows:

svy: reg lndepvar i.var1##i.var2
margins, over(var1 var2) predict(xb)
marginsplot

The second code reads as follows:

svy: glm depvar i.var1##i.var2, family(gamma) link(log)
margins, over(var1 var2) predict(xb) vsquish
marginsplot

The problem is this: with the glm code, I get the same plot as with the first code applied to the original variable, which is not log transformed. The estimates, however, differ between the models; it is just the plots that are the same.
Is there anything wrong with the margins command in my second code?
What would you recommend? Should I use the natural log and give up the interpretable scale, or should I use the gamma family with a log link?

Very happy for your help.
Thank you so much!!

Last edited by anne jagdberg; 01 Mar 2022, 12:51.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

01 Mar 2022, 12:50

With the -glm- model, if you want your -margins- results to be in the metric of depvar, use -predict(mu)-, not -predict(xb)-, in your -margins- command.
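As a minimal sketch of the difference between the two predictions, the example below uses the shipped auto.dta as a stand-in for the original data (which are not shown in this thread) and ignores the survey design. The matrix names b_xb and b_mu are made up for illustration.

```stata
* Sketch only: auto.dta stands in for the poster's data; no svy design is set.
sysuse auto, clear

* Gamma family with log link; foreign is treated as categorical
glm price i.foreign, family(gamma) link(log)

* Linear predictor: margins on the log scale (negative whenever the mean is below 1)
margins foreign, predict(xb)
matrix b_xb = r(b)

* Fitted mean: margins in the units of price
margins foreign, predict(mu)
matrix b_mu = r(b)

* With a log link, mu = exp(xb) for every observation
display exp(b_xb[1,1]) "  " b_mu[1,1]
```

The two displayed numbers agree here because foreign is the only predictor, so xb is constant within each category. With additional covariates, the average of exp(xb) is generally not the exponential of the average xb, which is one more reason to ask -margins- for mu directly.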

As an aside, are you sure you want to use -over(var1 var2)- in your -margins- command? This produces predicted margins conditional on the values of var1 and var2, and each of the results is calculated using only the subset of the data with the values of var1 and var2 shown in the row stub of the output. In particular, these conditional predicted margins are not adjusted for the confounding effects of other variables. These are perfectly legitimate statistics, but are usually not what people want. Most of the time, people want predicted margins that are calculated from the entire data set and adjusted for the values of everything in the data set. To do that, it's -margins var1 var2, predict(mu) vsquish-.
anne jagdberg

Join Date: Feb 2018

Posts: 28
#3

01 Mar 2022, 13:26

Thank you for your reply!

If I use predict(mu), the marginsplot still shows the same plot as when I do not log transform the variable. Sorry, I don't understand :-(

To your second question: I am not sure... My variable var1 ranges from 16 to 30 and var2 from 1 to 5. I saw some code that did the same thing I am doing:
margins, at(var1=(16(1)30) var2=(1(1)5)) vsquish

Is there a difference from my code, margins, over(var1 var2) predict(xb) vsquish?

Thank you very much!
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#4

01 Mar 2022, 14:06

If I use predict(mu), the marginsplot still shows the same plot as when I do not log transform the variable. Sorry, I don't understand :-(

I think you need to show the graphs, with the accompanying regression code, so we can see what you are referring to.

My variable var1 ranges from 16 to 30 and var2 from 1 to 5. I saw some code that did the same thing I am doing:
margins, at(var1=(16(1)30) var2=(1(1)5)) vsquish

Is there a difference from my code, margins, over(var1 var2) predict(xb) vsquish?

Yes, there is a difference. But first, let's clarify something: are your variables var1 and var2 continuous or discrete? Your code, i.var1##i.var2, treats them as discrete. If you intend them to be continuous, you have to tell Stata that by writing c.var1##c.var2. In an interaction term, any unprefixed variable is treated as discrete.

Assuming var1 and var2 are discrete,

Code:

margins, at(var1=(16(1)30) var2=(1(1)5))

is equivalent to

Code:

margins var1#var2

but, in general, different from

Code:

margins, over(var1 var2)

In a model that has no other variables, -margins, over(var1 var2)- gives the same results as the other two. But if there are any other variables in the regression, the results differ.
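A minimal sketch of that comparison, assuming the shipped nlsw88.dta as a stand-in dataset: race and married play the roles of var1 and var2, and tenure is a hypothetical extra covariate chosen only for illustration. The matrix names are likewise made up.

```stata
* Sketch only: nlsw88.dta is a stand-in; tenure is a hypothetical extra covariate.
sysuse nlsw88, clear

* (a) Nothing in the model but the two factors and their interaction
regress wage i.race##i.married
margins race#married
matrix b_sat_factor = r(b)
margins, over(race married)
matrix b_sat_over = r(b)
* Largest relative difference between the two sets of margins (near zero)
display mreldif(b_sat_factor, b_sat_over)

* (b) Same model plus one more covariate
regress wage i.race##i.married c.tenure
margins race#married
matrix b_adj_factor = r(b)
margins, over(race married)
matrix b_adj_over = r(b)
* Now the two sets of margins no longer agree
display mreldif(b_adj_factor, b_adj_over)
```

In (b), -margins race#married- averages predictions over the whole estimation sample, so every cell is evaluated at the same distribution of tenure, whereas -over()- uses only the observations in each cell and therefore inherits that cell's own tenure distribution.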
anne jagdberg

Join Date: Feb 2018

Posts: 28
#5

01 Mar 2022, 14:32

OK, thank you. I attach the graphs:
- with gamma family and log link
- with regression on the original variable
- with regression on the ln variable

I would like the variables to be treated as categorical. Thank you for your question and the information about writing c.var1!
As you can see, there are no other variables.

My codes look as follows:

********************************************************************************
** gamma distribution
svy: glm sports i.cohort i.AGE i.AGE#i.cohort if sports>0, family(gamma) link(log)
margins, over(AGE cohort)
marginsplot, title("Cohort, age, sports") xtitle("Age") ///
ytitle("Sports (Gamma)") ///
plot1opts(msymbol(D) lcolor(gs0) mcolor(gs0)) ///
plot2opts(msymbol(X) lcolor(gs5) mcolor(gs5)) ///
plot3opts(msymbol(O) lcolor(gs8) mcolor(gs8)) ///
plot4opts(msymbol(T) lcolor(gs11) mcolor(gs11)) ///
plot5opts(msymbol(S) lcolor(gs14) mcolor(gs14)) ///
ci1opts(lcol(gs0)) ///
ci2opts(lcol(gs5)) ///
ci3opts(lcol(gs8)) ///
ci4opts(lcol(gs11)) ///
ci5opts(lcol(gs14))
graph export gammalinkall_sports_across_age.tif, replace

** without log
svy: reg sports i.cohort i.AGE i.AGE#i.cohort if sports>0
margins, over(AGE cohort)
marginsplot, title("Cohort, age, sports") xtitle("Age") ///
ytitle("Sports") ///
plot1opts(msymbol(D) lcolor(gs0) mcolor(gs0)) ///
plot2opts(msymbol(X) lcolor(gs5) mcolor(gs5)) ///
plot3opts(msymbol(O) lcolor(gs8) mcolor(gs8)) ///
plot4opts(msymbol(T) lcolor(gs11) mcolor(gs11)) ///
plot5opts(msymbol(S) lcolor(gs14) mcolor(gs14)) ///
ci1opts(lcol(gs0)) ///
ci2opts(lcol(gs5)) ///
ci3opts(lcol(gs8)) ///
ci4opts(lcol(gs11)) ///
ci5opts(lcol(gs14))
graph export all_sports_across_age.tif, replace

** all ln sports
svy: reg lnsports i.cohort i.AGE i.AGE#i.cohort if sports>0
margins, over(AGE cohort)
marginsplot, title("Cohort, age, ln sports") xtitle("Age") ///
ytitle("Ln(sports)") ///
plot1opts(msymbol(D) lcolor(gs0) mcolor(gs0)) ///
plot2opts(msymbol(X) lcolor(gs5) mcolor(gs5)) ///
plot3opts(msymbol(O) lcolor(gs8) mcolor(gs8)) ///
plot4opts(msymbol(T) lcolor(gs11) mcolor(gs11)) ///
plot5opts(msymbol(S) lcolor(gs14) mcolor(gs14)) ///
ci1opts(lcol(gs0)) ///
ci2opts(lcol(gs5)) ///
ci3opts(lcol(gs8)) ///
ci4opts(lcol(gs11)) ///
ci5opts(lcol(gs14))
graph export all_lnsports_across_age.tif, replace

********************************************************************************

Attached Files
anne jagdberg

Join Date: Feb 2018

Posts: 28
#6

02 Mar 2022, 01:16

I have an addendum. While trying various things to solve the problem, I found the following.

glm sports i.cohort if drinker==1, family(gamma) link(log)
margins cohort, atmeans

glm sports i.cohort if drinker==1
margins cohort, atmeans

In both cases, margins delivers the same results, which I do not understand.
Can anyone explain that?

If I examine the "original" values (via mean sports, over(cohort)), the values are different. That part I understand; what I do not understand is why the two commands above deliver the same results.

This is similar to the problem above: with the gamma family and log link, the plots are the same as when I do not use the logarithm.

Thank you!

Nick Cox

Join Date: Mar 2014
Posts: 35698

02 Mar 2022, 08:08

It's the job of the generalized linear models you asked for to fit and report the means of the outcome for each category. A glm is a more or less fancy variation on

mean outcome | predictors

What differs is just what kind of uncertainty is expected around the mean function.

Code:

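sysuse auto, clear

* Gaussian family, identity link (the glm defaults)
glm mpg i.foreign
predict raw

* Gaussian family, log link
glm mpg i.foreign, link(log)
predict log

* Gamma family with its default (reciprocal) link
glm mpg i.foreign, f(gamma)
predict gamma

* predict after glm returns the fitted mean (mu) by default
tabdisp foreign, c(raw log gamma)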

----------------------------------------------------------------------
Car       |
origin    | Predicted mean mpg  Predicted mean mpg  Predicted mean mpg
----------+-----------------------------------------------------------
 Domestic |           19.82692            19.82692            19.82692
  Foreign |           24.77273            24.77273            24.77274
----------------------------------------------------------------------

There is a tiny amount of numeric noise there, but in principle all those means should be considered identical.
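One way to see why the means coincide is through the GLM estimating equations. What follows is a sketch of the standard argument, not something stated in the posts above; the notation is introduced here for illustration, with g the link function, V the variance function, and c indexing the cells defined by the categorical predictors.

```latex
% GLM estimating equation for coefficient j
\[
\sum_i \frac{y_i - \mu_i}{V(\mu_i)\, g'(\mu_i)}\, x_{ij} = 0,
\qquad g(\mu_i) = x_i^\top \beta .
\]

% Saturated model: every observation in cell c shares one mean \mu_c,
% so the factor involving V and g' is constant within the cell and cancels
\[
\frac{1}{V(\mu_c)\, g'(\mu_c)} \sum_{i \in c} (y_i - \mu_c) = 0
\quad\Longrightarrow\quad
\hat{\mu}_c = \bar{y}_c .
\]

% OLS on the logged outcome instead estimates the mean of the logs;
% back-transforming gives the geometric mean, which cannot exceed \bar{y}_c
\[
\exp\!\Big( \frac{1}{n_c} \sum_{i \in c} \ln y_i \Big) \le \bar{y}_c .
\]
```

So when the model contains only categorical predictors with all their interactions, as in i.AGE#i.cohort with no other covariates, any family and link reproduce the cell means, and only the standard errors change. With sampling weights the same argument gives the weighted cell mean. The regression on the logged outcome targets a different quantity, the mean of ln(y), which is why its plot looks different.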

Last edited by Nick Cox; 02 Mar 2022, 08:50.
