Interpretation of a linear (percentage) - log regression model

Jessica Berrett

Join Date: Sep 2019

Posts: 57
#1

30 May 2021, 14:25

To provide context, I am running a fixed effects regression model, assessing the relationship between the percentage an organization spends on overhead and how much it produces, measured by the number of houses built, as well as how much revenue it generates.

My independent variable is the overhead ratio, which is between 0 and 1. My dependent variable is the log of the total number of houses. The coefficient is .88 and significant. So, I take the exponent of .88, which I believe is 2.41, subtract 1 and multiply by 100, which gives me 141%. For the interpretation, I'm saying that a one-percentage-point increase in the overhead ratio equates to a 141% increase in the total number of houses built. However, this seems way too high.

I also use a second dependent variable, the log of total revenue. For this, I get a significant coefficient of .32. Again, I take the exponent of .32 and get 1.38. I then subtract 1 and multiply by 100, which gives me 37.71%. I interpret this as a one-percentage-point increase in the overhead ratio equates to a 37.71% increase in total revenue. Again, this seems way too high to me.

Am I doing the calculations and interpretations correctly? Many thanks in advance!
Ken Chui

Join Date: Aug 2014

Posts: 1058
#2

30 May 2021, 14:37

Originally posted by Jessica Berrett

My independent variable is the overhead ratio, which is between 0 and 1. My dependent variable is the log of the total number of houses. The coefficient is .88 and significant. So, I take the exponent of .88, which I believe is 2.41, subtract 1 and multiply by 100, which gives me 141%. For the interpretation, I'm saying that a one-percentage-point increase in the overhead ratio equates to a 141% increase in the total number of houses built. However, this seems way too high.

The problem lies in the "one-percentage-point" increase interpretation, which does not agree with the scale of the independent variable. The independent variable was entered on a 0 to 1 scale, not 0 to 100, so a one-unit increase is not one percentage point; it is the jump from nothing to all. For a one-percentage-point increase, the regression coefficient should first be multiplied by 0.01 (instead of 1) before taking the exponent.
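A quick numerical check of this rescaling, as a minimal Python sketch. The coefficients 0.88 and 0.32 come from the original post; the helper name `pct_change` is only illustrative, and the formula assumes a model of the form log(y) = b*x + other terms.

```python
import math

# Coefficients on overhead_ratio reported in the thread
b_houses = 0.88    # dependent variable: log(houses built)
b_revenue = 0.32   # dependent variable: log(total revenue)

def pct_change(coef, delta_x):
    """Exact percent change in y when x rises by delta_x, given log(y) = coef*x + ..."""
    return (math.exp(coef * delta_x) - 1) * 100

# A full one-unit change moves the ratio from 0 to 1 (nothing to all)
houses_full_unit = pct_change(b_houses, 1.0)
# One percentage point is a change of 0.01 on the 0-1 scale
houses_one_point = pct_change(b_houses, 0.01)
revenue_one_point = pct_change(b_revenue, 0.01)

print(f"houses,  +1.00 in ratio: {houses_full_unit:.2f}%")
print(f"houses,  +0.01 in ratio: {houses_one_point:.2f}%")
print(f"revenue, +0.01 in ratio: {revenue_one_point:.2f}%")
```

The full-unit change gives roughly 141%, while a one-percentage-point change gives roughly 0.88% for houses and 0.32% for revenue.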
Jessica Berrett

Join Date: Sep 2019

Posts: 57
#3

30 May 2021, 14:56

Hi Ken, I appreciate your quick response and definitely recognize the problem now.
However, to make sure I'm understanding this correctly, I first multiply .88*.01 = .0088. I take the exponent of .0088 = 1.0088. I subtract one and multiply by 100, which gives me .88%. So then I would say "A one-percentage-point increase in the overhead ratio equates to a .88% increase in the total number of houses." I was told that this is wrong as this would be the interpretation if I did not log the total number of houses.
Jessica Berrett

Join Date: Sep 2019

Posts: 57
#4

30 May 2021, 15:42

As I continue to read up on this topic, I'm finding some conflicting information.

When you have a 0-1 variable or a percentage, some say to multiply the coefficient by .01 first, whereas others say multiply the coefficient by 100 first.

Additionally, when you have a log value, some say to take the exponent, then subtract 1 and multiply by 100, whereas others say to simply take the exponent and this is the percent change.

Now, I'm definitely confused : /
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#5

30 May 2021, 16:33

Let's just calculate it directly and forget about confused formulas.

Your regression equation is overhead_ratio = 0.88*log(houses_built) + other terms. So let's see what happens when houses_built increases by 1 percent (not percentage point: houses_built is a count, not a percentage). houses_built transforms to 1.01*houses_built. log(1.01*houses_built) = log(1.01) + log(houses_built). So log(houses_built) increases by log(1.01) = 0.00995 (to 5 decimal places). If log(houses_built) has increased by 0.00995, then 0.88*log(houses_built) will increase by 0.88*0.00995 = 0.008756. So overhead_ratio increases by 0.008756. This is an absolute change, a difference, not a ratio or a percentage, nor percentage points. It is an absolute change in overhead_ratio of 0.008756 associated with a 1% increase in houses_built.

This is all derived from first principles. It relies only on knowing what a regression equation is and knowing that the logarithm of a product is the sum of the logarithms.
Jessica Berrett

Join Date: Sep 2019

Posts: 57
#6

30 May 2021, 17:10

Hi Clyde, I just want to clarify that the overhead is the independent variable, while houses built is the dependent variable. So my regression equation is log(houses_built) = 0.88*overhead_ratio. Can you help guide me through this interpretation?
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2168
#7

30 May 2021, 20:06

Hi Jessica: Here's how I would report it. Increase overhead_ratio by .1, which means a 10 percentage point increase -- if this is a reasonable increase to think about. Then log(houses_built) increases by .088. This means houses_built increases by about 8.8%. With this kind of magnitude you really don't need the more accurate calculation. If you increase overhead_ratio by .1 -- a one percentage point increase -- then houses_built increases by about .88%, just as you stated above.
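To see how close the quick reading (100 times the coefficient times the change in x) is to the exact calculation, here is a small Python sketch using the 0.88 coefficient from the thread. The variable names are illustrative.

```python
import math

b = 0.88  # coefficient on overhead_ratio in the log(houses_built) model

results = {}
for delta in (0.10, 0.01):  # 10 percentage points and 1 percentage point
    approx = 100 * b * delta                  # quick reading: change in log(y) times 100
    exact = 100 * (math.exp(b * delta) - 1)   # exact percent change in y
    results[delta] = (approx, exact)
    print(f"delta = {delta:.2f}: approx {approx:.2f}%, exact {exact:.2f}%")
```

The gap is about 0.4 percentage points for the 0.1 change and negligible for the 0.01 change.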
Jessica Berrett

Join Date: Sep 2019

Posts: 57
#8

30 May 2021, 20:17

Thank you Jeff, I really appreciate this. And just to clarify, do you mean that a .1 increase in the overhead ratio would be a 10 percentage point increase, whereas a .01 increase (in your comment above you have .1) would be a 1 percentage point increase?

What I wrote originally in my paper was "This means that a one-percentage-point increase in the overhead ratio equates to a 0.88% increase in the total number of houses. For example, an average organization in the sample has an overhead ratio of 19% and produces 11.76 houses. If it increased its overhead ratio by one percentage point, or to 20%, that would equate to 11.86 houses or a .88% increase."

What the reviewer said is that "This would make sense if the dependent variable was the number of houses produced. However, the dependent variable is the log of the number of houses produced." I've been driving myself crazy trying to figure out what I did wrong, but now I believe that the reviewer is actually incorrect.
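The worked example from the paper can be checked directly. This is a minimal Python sketch, assuming the model log(houses_built) = 0.88*overhead_ratio + other terms and holding everything else fixed; the variable names are illustrative.

```python
import math

b = 0.88                # coefficient from the thread
houses_before = 11.76   # average organization, overhead ratio 0.19
delta = 0.20 - 0.19     # one percentage point on the 0-1 scale

# The change happens on the log scale, so houses change by a multiplicative factor
log_change = b * delta
houses_after = houses_before * math.exp(log_change)
pct_increase = (houses_after / houses_before - 1) * 100

print(f"{houses_after:.2f} houses, {pct_increase:.2f}% increase")
```

Because the dependent variable is logged, the coefficient translates into a percent change in houses, which is what the quoted passage from the paper states.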

Last edited by Jessica Berrett; 30 May 2021, 20:24.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2168
#9

30 May 2021, 22:03

Jessica: You are correct, the reviewer is wrong.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#10

31 May 2021, 12:25

Re #6. Sorry, I misread your question. Disregard #5.

Kumari Gunjan

Join Date: Jun 2021
Posts: 18

#11

21 Jun 2021, 08:41

Hello everyone,
I have a similar problem, with a small change in interpretation.
I am trying to see the change in income of three employment categories (emp1, emp2, emp3) over 4 months. I have kept month 1 and emp3 as the base and am trying to see how, in successive months, the income of the various categories falls relative to month 1.
I have entered the code as follows:

Code:

 reg log_INCOME i.time##emp1 i.time##emp2

The result of this model is shown below.

Code:

                                               
log_INCOME    Coef.       Std. Err.   t        P>|t|   [95% Conf.  Interval]

time
  2           -.008156    .0251312    -0.32    0.746   -.0574135   .0411016
  3           -.1116136   .0251463    -4.44    0.000   -.1609008   -.0623264
  4           -1.06105    .0257911    -41.14   0.000   -1.111601   -1.010499

1.emp1        -.3679419   .0294475    -12.49   0.000   -.4256596   -.3102242

time#emp1
  2 1         -.0052179   .0417039    -0.13    0.900   -.0869584   .0765225
  3 1         -.0714011   .0417018    -1.71    0.087   -.1531374   .0103351
  4 1         .058207     .0426856    1.36     0.173   -.0254576   .1418716

1.emp2        -.3656286   .0242438    -15.08   0.000   -.4131469   -.3181104

time#emp2
  2 1         .0076022    .0342933    0.22     0.825   -.0596133   .0748177
  3 1         -.2466461   .0343435    -7.18    0.000   -.31396     -.1793323
  4 1         -.7617976   .0353724    -21.54   0.000   -.8311281   -.6924672

_cons         8.268434    .0177704    465.29   0.000   8.233604    8.303264

In my model, the coefficient of time shows the fall in income for the base employment category, i.e. emp3. Now, the issue is that the coefficient of month 4 is 1.06105. Following the formula (e^b - 1) for a dummy variable, I find that it is (e^1.06105 - 1) = 188%. I explained it as follows: relative to month 1, income falls by 188% for the emp3 category. However, when I do a similar exercise in Excel using the arithmetic mean of each month for each category, I find only a 33% fall in the income of the emp3 category in month 4. I also tried the geometric mean, but that still gives a 65% fall in income in month 4 for emp3.

So, my 1st question is how to interpret the coefficient of month in my model.
2nd, if you look at the Excel calculations in detail, you will find that for month 4 the percentage fall by arithmetic mean is smaller for the Emp3 category (33%) than for Emp1 (38%), but by geometric mean the percentage fall is larger for Emp3 than for Emp1.

Code:

       Emp1, A.M.            Emp1, G.M.            Emp2, A.M.            Emp2, G.M.            Emp3, A.M.            Emp3, G.M.
time   Income     fall %     Income     fall %     Income     fall %     Income     fall %     Income     fall %     Income     fall %
1      3950.088              2698.61               3410.845              2704.86               5060.638              3898.838
2      3864.538   0.31       2662.759   1.328499   3363.748   1.38       2703.362   0.055382   5044.986   0.31       3867.169   0.812268
3      3417.898   13.47      2247.283   16.72442   2674.907   21.58      1890.404   30.11084   4774.57    5.65       3487.081   10.56102
4      2436.332   38.32      989.9444   63.31651   1377.592   59.61      437.0106   83.8435    3387.815   33.06      1349.357   65.39079

Sorry for posting this long question, but I am tired of reading all the available material without finding a solution.
If the data are required, please tell me and I will post them. Thank you very much in advance.


Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#12

21 Jun 2021, 11:25

The coefficient of 4.time is not 1.06105, it's -1.06105. So, in the base category of emp, namely emp = 3, at time = 4, log income decreases by 1.06105 compared to time 1. If log income decreases by 1.06105, then income itself changes by a multiplicative factor of exp(-1.06105) = 0.346 (to 3 decimals). Since 1-0.346 = 0.654, you could say that income in employment category 3 declines by 65.4% at time 4 compared to time 1. Please note also that if these are observational data, this language should be slightly revised to eliminate any suggestions of causality.
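The same calculation as a short Python sketch, with the sign mistake from #11 shown for comparison. Only the coefficient comes from the thread; the variable names are illustrative.

```python
import math

b_time4 = -1.06105   # coefficient of 4.time (base category emp3)

factor = math.exp(b_time4)       # multiplicative change in income vs. month 1
pct_fall = (1 - factor) * 100    # percent decline

# Dropping the minus sign turns a decline into an apparent increase
wrong_sign = (math.exp(abs(b_time4)) - 1) * 100

print(f"factor {factor:.3f}, fall {pct_fall:.1f}%")
print(f"with the sign dropped: +{wrong_sign:.1f}%")
```

A fall can never exceed 100%, so a figure like 188% is a signal that the sign of the coefficient was lost somewhere.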
Kumari Gunjan

Join Date: Jun 2021

Posts: 18
#13

22 Jun 2021, 04:34

Originally posted by Kumari Gunjan

2nd, if you look at the Excel calculations in detail, you will find that for month 4 the percentage fall by arithmetic mean is smaller for the Emp3 category (33%) than for Emp1 (38%), but by geometric mean the percentage fall is larger for Emp3 than for Emp1.

Thank you very much, sir, for your quick reply. I would like to hear your comments on the 2nd part of my question. Why am I getting a different percentage fall from AM and GM? In addition, is there a model which uses the arithmetic mean rather than the geometric mean? I also ran a Poisson regression because there was a large number of zero incomes in month 4. The result is very similar to the arithmetic mean table. However, I am not sure whether I can take it or not?
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#14

22 Jun 2021, 09:37

I would like to hear your comments on the 2nd part of my question. Why am I getting a different percentage fall from AM and GM?

I don't know how to approach this question as there is really no reason to expect them to be the same. So I'll turn it around: why are you asking this question? What makes you think they shouldn't be different?

However, I am not sure whether I can take it or not?

Sorry, but I don't know what this means.
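On the AM versus GM question: a regression on log income compares means of log income, and the exponential of a mean of logs is a geometric mean. The Python sketch below combines the month 4 coefficients from #11 and compares the implied falls with both sets of columns in the Excel table. It assumes emp1 and emp2 are mutually exclusive dummies, so that the model fits each month-by-category cell separately; that is an assumption, not something stated in the thread.

```python
import math

# Month 4 coefficients from the regression output in #11
b_time4 = -1.06105           # 4.time (base category emp3)
b_time4_emp1 = 0.058207      # 4.time#1.emp1 interaction
b_time4_emp2 = -0.7617976    # 4.time#1.emp2 interaction

# Change in mean log income from month 1 to month 4, by category
log_change = {
    "emp1": b_time4 + b_time4_emp1,
    "emp2": b_time4 + b_time4_emp2,
    "emp3": b_time4,
}

# Month 1 and month 4 incomes from the Excel table in #11:
# (A.M. month 1, A.M. month 4, G.M. month 1, G.M. month 4)
table = {
    "emp1": (3950.088, 2436.332, 2698.61, 989.9444),
    "emp2": (3410.845, 1377.592, 2704.86, 437.0106),
    "emp3": (5060.638, 3387.815, 3898.838, 1349.357),
}

def pct_fall(before, after):
    """Percent decline from before to after."""
    return (1 - after / before) * 100

model_fall, am_fall, gm_fall = {}, {}, {}
for group, (am1, am4, gm1, gm4) in table.items():
    model_fall[group] = (1 - math.exp(log_change[group])) * 100
    am_fall[group] = pct_fall(am1, am4)
    gm_fall[group] = pct_fall(gm1, gm4)
    print(f"{group}: model {model_fall[group]:.2f}%, "
          f"G.M. {gm_fall[group]:.2f}%, A.M. {am_fall[group]:.2f}%")
```

Under that assumption the implied falls (about 63.3%, 83.8% and 65.4%) line up with the G.M. columns rather than the A.M. columns, which is consistent with the regression and the arithmetic-mean table giving different rankings for emp1 and emp3.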
Kumari Gunjan

Join Date: Jun 2021

Posts: 18
#15

22 Jun 2021, 13:49

why are you asking this question? What makes you think they shouldn't be different?

Sir, my basic motive was to see the percentage fall in income of the three employment categories relative to their income in month 1. The arithmetic mean calculations fit my theoretical hunch, as emp3 workers are relatively better off than emp1 workers, which is also clear from the Excel table. I used the log-linear regression model solely to get the percentage fall in income for each category. However, after running this regression, I realized that the fall in income is very large in magnitude and larger for emp3 than for emp1, which contrasts with the arithmetic mean results. Now I am confused about which result I should rely on.