  • Interpretation of a linear (percentage) - log regression model

    To provide context, I am running a fixed effects regression model assessing the relationship between the share of spending an organization devotes to overhead and how much it produces, measured as the number of houses built and as the revenue it generates.

    My independent variable is the overhead ratio, which is between 0 and 1. My dependent variable is the log of the total number of houses. The coefficient is .88 and significant. So, I take the exponent of .88, which I believe is 2.41, subtract 1 and multiply times 100, which gives me 141%. For the interpretation, I'm saying that a one-percentage-point increase in the overhead ratio equates to a 141% increase in the total number of houses built. However, this seems way too high.

    I also use a second dependent variable, the log of total revenue. For this, I get a significant coefficient of .32. Again, I take the exponent of .32 and get 1.38, subtract 1, and multiply by 100, which gives me 37.71%. I interpret this as a one-percentage-point increase in the overhead ratio equating to a 37.71% increase in total revenue. Again, this seems way too high to me.

    Am I doing the calculations and interpretations correctly? Many thanks in advance!

  • #2
    My independent variable is the overhead ratio, which is between 0 and 1. My dependent variable is the log of the total number of houses. The coefficient is .88 and significant. So, I take the exponent of .88, which I believe is 2.41, subtract 1 and multiply times 100, which gives me 141%. For the interpretation, I'm saying that a one-percentage-point increase in the overhead ratio equates to a 141% increase in the total number of houses built. However, this seems way too high.
    The problem lies in the "one-percentage-point" interpretation, which does not match the scale of the independent variable. The variable was entered on a 0 to 1 scale, not 0 to 100, so a one-unit increase is not one percentage point; it is the jump from no overhead to all overhead. For a one-percentage-point increase, the regression coefficient should first be multiplied by 0.01 (instead of 1) before exponentiating.
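
    To make the rescaling concrete, here is a minimal sketch in Python (not part of the original thread; the function name `pct_change_in_y` is purely illustrative). It assumes the model log(y) = b*x + other terms and returns the percent change in y implied by a given change in x.

```python
import math

def pct_change_in_y(b, delta_x):
    """Percent change in y implied by log(y) = b*x + ... when x rises by delta_x."""
    return 100 * (math.exp(b * delta_x) - 1)

b = 0.88  # coefficient on the overhead ratio reported in the first post

# Full one-unit change: the overhead ratio goes from 0 to 1
full_unit = pct_change_in_y(b, 1.0)

# One percentage point: the overhead ratio rises by 0.01
one_point = pct_change_in_y(b, 0.01)

print(f"0 -> 1 change:      {full_unit:.1f}%")
print(f"0.01 (1 pp) change: {one_point:.2f}%")
```

    The first figure reproduces the 141% from the original post; the second, roughly 0.88%, is what a one-percentage-point increase implies.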



    • #3
      Hi Ken, I appreciate your quick response and definitely recognize the problem now.
      However, to make sure I'm understanding this correctly: I first multiply .88*.01 = .0088. I take the exponent of .0088 = 1.0088. I subtract one and multiply by 100, which gives me .88%. So then I would say "A one-percentage-point increase in the overhead ratio equates to a .88% increase in the total number of houses." I was told that this is wrong, as this would be the interpretation if I had not logged the total number of houses.



      • #4
        As I continue to read up on this topic, I'm finding some conflicting information.

        When you have a 0-1 variable or a percentage, some say to multiply the coefficient by .01 first, whereas others say multiply the coefficient by 100 first.

        Additionally, when you have a log value some say to take the exponent followed by subtracting 1 and multiply times 100, whereas others say simply take the exponent and this is the percent change.

        Now, I'm definitely confused : /



        • #5
          Let's just calculate it directly and forget about confused formulas.

          Your regression equation is overhead_ratio = 0.88*log(houses_built) + other terms. So let's see what happens when houses_built increases by 1 percent (not percentage point: houses_built is a count, not a percentage). houses_built transforms to 1.01*houses_built. log(1.01*houses_built) = log(1.01) + log(houses_built). So log(houses_built) increases by log(1.01) = 0.00995 (to 5 decimal places). If log(houses_built) has increased by 0.00995, then 0.88*log(houses_built) will increase by 0.88*0.00995 = 0.008756. So overhead_ratio increases by 0.008756. This is an absolute change, a difference, not a ratio or a percentage, nor percentage points. It is an absolute change in overhead_ratio of 0.008756 associated with a 1% increase in houses_built.

          This is all derived from first principles. It relies only on knowing what a regression equation is and knowing that the logarithm of a product is the sum of the logarithms.



          • #6
            Hi Clyde, I just want to clarify that the overhead is the independent variable, while houses built is the dependent variable. So my regression equation is log(houses_built) = 0.88*overhead_ratio. Can you help guide me through this interpretation?



            • #7
              Hi Jessica: Here's how I would report it. Increase overhead_ratio by .1, which means a 10 percentage point increase -- if this is a reasonable increase to think about. Then log(houses_built) increases by .088. This means houses_built increases by about 8.8%. With this kind of magnitude you really don't need the more accurate calculation. If you increase overhead_ratio by .1 -- a one percentage point increase -- then houses_built increases by about .88%, just as you stated above.
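
              How close the shortcut 100*b*delta_x comes to the exact 100*(exp(b*delta_x) - 1) can be checked numerically. A rough Python sketch (not from the thread; the variable names and the 0.50 case are illustrative assumptions):

```python
import math

b = 0.88  # coefficient on overhead_ratio from the thread

rows = []
for delta_x in (0.01, 0.10, 0.50):
    approx = 100 * b * delta_x                  # shortcut: read b*delta_x as a proportion
    exact = 100 * (math.exp(b * delta_x) - 1)   # exact percent change in houses_built
    rows.append((delta_x, approx, exact))
    print(f"delta_x={delta_x:.2f}  approx={approx:.2f}%  exact={exact:.2f}%")
```

              The two agree closely for small changes in overhead_ratio and drift apart as b*delta_x grows, which is why the shortcut is usually reserved for small changes.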



              • #8
                Thank you Jeff, I really appreciate this. And just to clarify, do you mean that a .1 increase in the overhead ratio would be a 10-percentage-point increase, whereas a .01 increase (your comment above has .1) would be a 1-percentage-point increase?

                What I wrote originally in my paper was "This means that a one-percentage-point increase in the overhead ratio equates to a 0.88% increase in the total number of houses. For example, an average organization in the sample has an overhead ratio of 19% and produces 11.76 houses. If it increased its overhead ratio by one percentage point, or to 20%, that would equate to 11.86 houses or a .88% increase."

                What the reviewer said is that "This would make sense if the dependent variable was the number of houses produced. However, the dependent variable is the log of the number of houses produced." I've been driving myself crazy trying to figure out what I did wrong, but now I believe that the reviewer is actually incorrect.
                Last edited by Jessica Berrett; 30 May 2021, 20:24.
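
                The worked example quoted from the paper (19% to 20% overhead, starting from 11.76 houses) can be checked directly. A minimal Python sketch, assuming the fitted model log(houses_built) = 0.88*overhead_ratio + other terms with everything else held fixed (the variable names are illustrative):

```python
import math

b = 0.88                 # coefficient on overhead_ratio
houses_before = 11.76    # average houses at an overhead ratio of 0.19 (from the post)
delta_x = 0.20 - 0.19    # one percentage point on the 0-1 scale

# In a log-linear model, a change in x multiplies y by exp(b * delta_x)
houses_after = houses_before * math.exp(b * delta_x)
pct_change = 100 * (houses_after / houses_before - 1)

print(f"{houses_after:.2f} houses, {pct_change:.2f}% change")
```

                This gives about 11.86 houses and a 0.88% increase, matching the figures in the paper's wording.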



                • #9
                  Jessica: You are correct, the reviewer is wrong.



                  • #10
                    Re #6. Sorry, I misread your question. Disregard #5.



                    • #11
                      Hello everyone,
                      I have a similar problem, with a slight change in interpretation.
                      I am trying to see the change in income of three employment categories (emp1, emp2, emp3) over 4 months. I have kept month 1 and emp3 as the base, and I am trying to see how the income of each category falls in successive months relative to month 1.
                      I entered the code as follows:

                      Code:
                       reg log_INCOME i.time##emp1 i.time##emp2
                      The result of this model is shown below.

                      Code:
                                                                     
                        log_INCOME |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
                      -------------+------------------------------------------------------------------
                              time |
                                2  |   -.008156    .0251312    -0.32    0.746    -.0574135    .0411016
                                3  |  -.1116136    .0251463    -4.44    0.000    -.1609008   -.0623264
                                4  |   -1.06105    .0257911   -41.14    0.000    -1.111601   -1.010499
                                   |
                            1.emp1 |  -.3679419    .0294475   -12.49    0.000    -.4256596   -.3102242
                                   |
                         time#emp1 |
                              2 1  |  -.0052179    .0417039    -0.13    0.900    -.0869584    .0765225
                              3 1  |  -.0714011    .0417018    -1.71    0.087    -.1531374    .0103351
                              4 1  |    .058207    .0426856     1.36    0.173    -.0254576    .1418716
                                   |
                            1.emp2 |  -.3656286    .0242438   -15.08    0.000    -.4131469   -.3181104
                                   |
                         time#emp2 |
                              2 1  |   .0076022    .0342933     0.22    0.825    -.0596133    .0748177
                              3 1  |  -.2466461    .0343435    -7.18    0.000      -.31396   -.1793323
                              4 1  |  -.7617976    .0353724   -21.54    0.000    -.8311281   -.6924672
                                   |
                             _cons |   8.268434    .0177704   465.29    0.000     8.233604    8.303264
                      In my model, the coefficients of time show the fall in income for the base employment category, i.e. emp3. Now, the issue is that the coefficient of month 4 is 1.06105. Following the formula (e^b - 1) for a dummy variable, I find that (e^1.06105 - 1) = 188%. I interpreted this as follows: relative to month 1, income for the emp3 category falls by 188%. However, when I do a similar exercise in Excel using the arithmetic mean of each month for each category, I find only a 33% fall in the income of the emp3 category in month 4. Using the geometric mean instead, I get a 65% fall in income in month 4 for emp3.

                      So, my first question is how to interpret the coefficient of month in my model.
                      Second, if you look at the Excel calculations in detail, you will find that for month 4 the percentage fall by arithmetic mean is smaller for the emp3 category (33%) than for emp1 (38%), but by geometric mean the percentage fall is larger for emp3 than for emp1.

                      Code:
                       
                      time | Emp1                                           | Emp2                                           | Emp3
                           | Income (A.M.)  fall %  Income (G.M.)  fall %   | Income (A.M.)  fall %  Income (G.M.)  fall %   | Income (A.M.)  fall %  Income (G.M.)  fall %
                      1    | 3950.088               2698.61                 | 3410.845               2704.86                 | 5060.638               3898.838
                      2    | 3864.538       0.31    2662.759       1.328499 | 3363.748       1.38    2703.362       0.055382 | 5044.986       0.31    3867.169       0.812268
                      3    | 3417.898       13.47   2247.283       16.72442 | 2674.907       21.58   1890.404       30.11084 | 4774.57        5.65    3487.081       10.56102
                      4    | 2436.332       38.32   989.9444       63.31651 | 1377.592       59.61   437.0106       83.8435  | 3387.815       33.06   1349.357       65.39079
                      Sorry for posting such a long question, but I am tired of reading all the available material without finding a solution.
                      If the data are required, please tell me and I will post them as well. Thank you very much in advance.



                      • #12
                        The coefficient of 4.time is not 1.06105, it's -1.06105. So, in the base category of emp, namely emp = 3, at time = 4, log income decreases by 1.06105 compared to time 1. If log income decreases by 1.06105, then income itself changes by a multiplicative factor of exp(-1.06105) = 0.346 (to 3 decimals). Since 1-0.346 = 0.654, you could say that income in employment category 3 declines by 65.4% at time 4 compared to time 1. Please note also that if these are observational data, this language should be slightly revised to eliminate any suggestions of causality.
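
                        The same conversion can be applied to the other employment categories by adding the relevant interaction coefficient to the 4.time coefficient before exponentiating. A minimal Python sketch (not from the thread), using the coefficients in the output in #11 and assuming emp1 and emp2 are mutually exclusive dummies with emp3 as the base; the helper `pct_decline` is an illustrative name:

```python
import math

# Coefficients copied from the regression output in #11
time4 = -1.06105           # 4.time: month 4 vs month 1 for the base category (emp3)
time4_x_emp1 = 0.058207    # time#emp1, row "4 1"
time4_x_emp2 = -0.7617976  # time#emp2, row "4 1"

def pct_decline(log_change):
    """Percent decline in income implied by a change in log income."""
    return 100 * (1 - math.exp(log_change))

declines = {
    "emp3": pct_decline(time4),
    "emp1": pct_decline(time4 + time4_x_emp1),
    "emp2": pct_decline(time4 + time4_x_emp2),
}
for group, d in declines.items():
    print(f"{group}: {d:.1f}% decline at time 4 vs time 1")
```

                        The results (about 65.4%, 63.3% and 83.8%) line up with the geometric-mean "fall %" columns for month 4 in the table in #11, which is what one would expect from a regression on log income.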



                        • #13
                          Originally posted by Kumari Gunjan

                          Second, if you look at the Excel calculations in detail, you will find that for month 4 the percentage fall by arithmetic mean is smaller for the emp3 category (33%) than for emp1 (38%), but by geometric mean the percentage fall is larger for emp3 than for emp1.
                          Thank you very much, sir, for your quick reply. I would like to hear your comments on the 2nd part of my question. Why I am getting different percentage fall from AM and GM. In addition, is there a model that uses the arithmetic mean rather than the geometric mean? I also ran a Poisson regression because there was a large number of zero incomes in month 4. The result is very similar to the arithmetic mean table. However, I am not sure whether I can take it or not?



                          • #14
                            I would like to hear your comments on the 2nd part of my question. Why I am getting different percentage fall from AM and GM.
                            I don't know how to approach this question as there is really no reason to expect them to be the same. So I'll turn it around: why are you asking this question? What makes you think they shouldn't be different?

                            However, I am not sure whether I can take it or not?
                            Sorry, but I don't know what this means.
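
                            As a small illustration of why the two summaries need not agree, here is a Python sketch with made-up incomes (hypothetical numbers, not the poster's data). It also shows that regressing log income on a month dummy recovers the geometric-mean comparison, since the OLS coefficient on a single dummy is the difference in mean log income.

```python
import math
from statistics import mean, geometric_mean

# Hypothetical incomes for one group, made up purely for illustration
month1 = [1000, 2000, 4000]
month4 = [100, 1500, 4000]   # one worker's income collapses, the others barely move

am_fall = 100 * (1 - mean(month4) / mean(month1))
gm_fall = 100 * (1 - geometric_mean(month4) / geometric_mean(month1))

# OLS of log(income) on a month-4 dummy gives a coefficient equal to the
# difference in mean log income, so exp(coef) is the ratio of geometric means
coef = mean(math.log(v) for v in month4) - mean(math.log(v) for v in month1)
reg_fall = 100 * (1 - math.exp(coef))

print(f"A.M. fall: {am_fall:.1f}%")
print(f"G.M. fall: {gm_fall:.1f}%")
print(f"Fall implied by log regression: {reg_fall:.1f}%")
```

                            In this toy example the geometric mean falls far more than the arithmetic mean because it is much more sensitive to very low incomes, and the log-regression figure coincides with the geometric-mean figure.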



                            • #15
                              why are you asking this question? What makes you think they shouldn't be different?
                              Sir, my basic motive was to see the percentage fall in income of the three employment categories relative to their income in month 1. The arithmetic mean calculations fit my theoretical hunch, as emp3 workers are relatively better off than emp1 workers, which is also clear from the Excel table. I used the log-linear regression model solely to get the percentage fall in income for each category. However, after running this regression, I realized that the fall in income is very large in magnitude and larger for emp3 than for emp1, which contrasts with the arithmetic mean results. Now I am confused: which result should I rely on?
