  • #16
    Well, by setting all the variables to their mean values, you are stipulating specific values of all the predictors, and therefore also a specific expected value for Profits itself; you then repeat that with X incremented by 1. The difference between the two is interpretable as the expected change in Profits associated with a unit change in X, conditional on all variables being at their mean values, but it is not applicable to any other combination of values of the variables. So it is a number of very limited usefulness, since in most situations there are few or no entities that are actually at the mean values of all relevant attributes. Code for this, doing it in one step, would look something like this:

    Code:
    margins, dydx(X) expression(exp(predict())) atmeans



    • #17
      I have a natural-log-transformed dependent variable FDI_ln, and most of my explanatory variables are not log transformed and are expressed in percentages. I get extremely large coefficients for Education (the percentage of the population with a Bachelor's degree or higher), and I don't know how to explain them. Does it mean something is wrong? How do I explain it? The usual (exp(coeff)-1)*100 doesn't seem to make sense, and as noted above, it is only a good approximation when the coefficient is close to 0. What do I do with a large coefficient?

      Code:
      Variable |   OLSr         REr         FEb        FEwr     
      -------------+------------------------------------------------
            GDP_ln |    1.9155      2.0834      1.8106     12.1263  
                   |    0.1993      0.2291      0.2415      3.9066  
                PI |   -0.0000     -0.0000     -0.0001     -0.0001  
                   |    0.0000      0.0000      0.0000      0.0001  
               Imp |    1.2987      0.3188      1.9996    -15.5376  
                   |    1.6586      2.0627      2.1859      6.0715  
               Edu |   11.7774     11.0138     14.1486    -57.2353  
                   |    3.6502      5.1918      4.2088     25.7762  
               CIT |   11.1217     11.4127     12.0772     30.8919  
                   |    5.5386      8.0793      6.8739     19.2843  
                GC |   -0.1150     -0.1677     -0.0689     -0.2576  
                   |    0.1315      0.1483      0.2464      0.1440  
                TI |    0.0045      0.0066     -0.0026   (omitted)  
                   |    0.0104      0.0136      0.0163              
              Yr15 |   -0.1050     -0.1437      1.5076     -0.3081  
                   |    0.2841      0.2198      1.4452      0.2532  
              Yr16 |   -0.0845     -0.1406      2.1220     -0.0116  
                   |    0.3038      0.2731      1.3941      0.3591  
              Yr17 |   -0.2069     -0.1315     -3.2034      0.3304  
                   |    0.2945      0.2314      1.1091      0.4947  
              Yr18 |   -0.4070     -0.4667      0.7548     -0.1433  
                   |    0.3018      0.2521      1.1628      0.6919  
              Yr19 |   -1.0170     -1.0412     -0.3111     -0.5307  
                   |    0.3415      0.2984      1.0830      0.8870  
             _cons |  -19.8169    -21.7617    -18.2076    -1.2e+02  
                   |    1.9519      2.3618      2.5830     43.3154  
      -------------+------------------------------------------------
                 N |       206         206         206         206  
              r2_a |    0.7399                  0.8987      0.1617  
      --------------------------------------------------------------
                                                        legend: b/se



      • #18
        When you say Edu is a "percent," do you really mean a percent or is it a proportion? Even if it's the latter, the effect seems quite large. It would mean a 0.01 increase in Edu increases log(FDI) by about .118 in the first column -- about a 12% increase in FDI. And of course the flipping of signs in the last column is concerning.



        • #19
          Thank you very much for your reply. So stoked to get a reply from someone so famous, whose book I have been flipping through for the past 3 months trying to finish my thesis... My data for Edu, Imp, and CIT are all proportions. So is that the proper way to interpret my coefficient: a 0.01 increase in Edu increases log(FDI) by about .118 in the first column, i.e., about a 12% increase in FDI? Because I was trying to calculate (exp(coeff)-1)*100... Would you kindly point me to where I can look up more information about interpreting such a large coefficient? When I search, all tutorials seem to point at the (exp(coeff)-1)*100 rule for a log-transformed dependent variable. Indeed, the flipping of signs doesn't make much sense; it is a fixed-effects (within) model, and I guess my explanation is that even though GDP, Edu, and the other variables were going "strong" (positive) for FDI, FDI was still going down over time within the states?



          • #20
            You are always allowed to change your x by any amount, but you should think hard about the units of measurement. A change of 0.01 in Edu means a one percentage point increase. Then, take this change and multiply it by the coefficient to get the .118. If you want, you can exponentiate this value and subtract one: exp(.118) - 1 = .125, which is 12.5%. Equation (6.8) in my introductory econometrics text shows this for any change in x. For me, I'm happy with the approximation that .118 is about 12 percent, but not for much larger changes.
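The arithmetic in the reply above can be checked directly. A quick numeric sketch (Python used here purely for the arithmetic; the coefficient is the Edu estimate from the table in #17):

```python
import math

coeff = 11.7774   # OLS coefficient on Edu (a proportion) from the table in #17
delta_x = 0.01    # a one-percentage-point change in Edu

# Approximate effect on log(FDI): coefficient times the change in x
d_log = coeff * delta_x            # 0.117774 -- "about .118"

# Exact implied percentage change in FDI: (exp(b*dx) - 1) * 100
exact_pct = (math.exp(d_log) - 1) * 100

print(round(d_log, 3))      # 0.118
print(round(exact_pct, 1))  # 12.5
```

As noted in #20, the approximation (.118 ≈ 12%) and the exact value (12.5%) are close here because the log change is small; for much larger changes the gap widens.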



            • #21
              Originally posted by Clyde Schechter View Post
              Phillipe, you have to think about what your variables mean. When you transform them, or rescale them, the meaning changes.

              The marginal effect is always the estimate of the change in outcome variable associated with a 1-unit change in the predictor variable. When your variable is scaled as a proportion from 0 to 1, that means a change going all the way from 0 at the bottom to 1 at the top. When your variable is scaled from 0 to 100 as a percentage, then a unit change means a change from, say, 5% to 6%, or 87% to 88% or something like that. So a unit change when your variable is scaled as a proportion is going to have a much larger effect than a unit change when your variable is scaled as a percentage. So your calculations are correct in both instances. The change you calculated for the proportion model is not unreasonably large, given that in that situation a unit change means going all the way from the bottom to the top of the scale, not just a tiny 1/100th of the way.
              Hi Clyde. I am working on a correlated random effects model with a log-lin specification. Digital exports is the DV, an index of data restrictions is one of the IVs, and the other IVs are a mix of logged and linear variables. I was struggling with the interpretation of the coefficient of a linear independent variable when the dependent variable is logged. While searching, I came across this thread, where you have given extremely helpful responses. I have a follow-up question on your response quoted above.

              My question is: what if my IV is an index that is supposed to vary from 0 to 1, but all values in my sample are extremely close and differ only at the 3rd, 4th, or even 5th decimal place? How do I then interpret a coefficient of -180? By your suggestion I am getting a ~100% decline in the DV (digital exports) per unit change in the IV (index of data restrictions). Can I still interpret it as: if the IV moves from 0 to 1, the DV declines by ~100%? Additionally, this IV is time-invariant, so I only have between effects for it.
              Last edited by Shagufta Gupta; 02 Aug 2021, 12:02.



              • #22
                So there are two good reasons not to look at an effect as IV goes from 0 to 1 in this context.

                1. Theoretical reason: it is quite clear from what you describe that this is an extrapolation way beyond the range of the observed data. There is no good reason to believe that the relationships observed apply outside the range of the observed data.

                2. Practical reason: it is quite clear that a change from 0 to 1 is not a remotely plausible event in your context. So why would anyone want to know, or care about, the implications of something so fantastical?

                A very important question for you to think about is this: if this variable is supposed to range from 0 to 1, why, in your data, are the values so closely packed into a tiny piece of that range such that you need 5 or 6 decimal places to see any variation? That sounds like you are either gathering data from some highly restrictive, idiosyncratic subpopulation, or something went drastically wrong with the measurement process or data management. In short, I would not trust this variable at all. If it is correct data from an unusual population, it is fair to say that given the severely restricted range observed, this variable is unlikely to be reliably measured, and its validity as a measure of whatever construct it tries to assess is in serious question as well. If it is incorrect data, then, evidently, the solution is to fix the data errors.



                • #23
                  Thank you Clyde. So here is how my index is created: IV_k = average of DR_i over all i ≠ k, with n = 63.
                  DR is an index with possible values between 0 and 1, but the actual values range from .03 to .82 with a mean of .254. Since, for country k, IV_k is the average of all DR indices except its own, the values I get for IV_k are very close together. Basically, DR_i is an index that measures the restrictions country i places on trade. By creating my index IV_k, I am seeking a measure of the restrictions country k faces in the market.
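The leave-one-out construction described above, and why it compresses variation so severely, can be sketched numerically (Python, purely illustrative; the DR values below are invented within the .03 to .82 range mentioned):

```python
def leave_one_out_means(dr):
    """For each country k, average DR_i over all i != k."""
    total = sum(dr)
    n = len(dr)
    return [(total - dr_k) / (n - 1) for dr_k in dr]

# Toy example with 5 countries (the thread's index uses n = 63):
dr = [0.03, 0.10, 0.25, 0.50, 0.82]
iv = leave_one_out_means(dr)

# Algebraically, IV_k = S/(n-1) - DR_k/(n-1), where S is the grand sum.
# So across countries, the IV_k values vary only 1/(n-1) as much as the
# DR_i values themselves; with n = 63 the spread shrinks by a factor of
# 62, which is why the index is nearly constant.
print(max(dr) - min(dr))   # 0.79
print(max(iv) - min(iv))   # 0.1975 = 0.79 / (5 - 1)
```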



                  • #24
                    So in each country, IV_k is the average of 62 values of DR_i, and for any two countries, 61 of the 62 values are the same. That explains why there is so little variation in this variable. It is really unlikely that a variable constructed in that way will provide much explanatory power for anything. It is nearly a constant. To the extent that it does have explanatory power for something, because its variation is on such a narrow range, its coefficient will be large--but don't confuse large with important in this context.

                    If I were going to use this variable at all (which I would only do if the construct is crucial to the research question and there is absolutely no way to get a better measure of it), I would probably rescale it so that it runs from 0 to 1:

                    Code:
                    summ IV, meanonly
                    gen rescaled_IV = (IV-`r(min)')/(`r(max)' - `r(min)')
                    and use the rescaled version in the model. That way when you calculate a marginal effect, you will at least be getting something you can explain as the expected difference in outcome associated with a change in IV from lowest to highest observed value.

                    That said, if this construct is important to your modeling, I would really invest effort in finding a better way to measure it. Since this is not my field, I don't feel able to offer you concrete guidance about how to do that. There are numerous economists here, and one of them might have an idea for you. Alternatively, you might consult a colleague at your own institution who has experience with this. The one thought that occurs to me, though I have no idea whether this makes sense from an economic perspective, might be to use a weighted average, weighting each other country's DR_i by the amount of trade country K has with country i. That would at least increase the variation in the resulting variable, perhaps to the point of giving you a usable index. And to this non-economist's mind, it sounds like it would have some face validity as a sensible measure of the restrictions country K faces in the overall market. Again, though, not being an economist, I cannot say whether that makes economic sense or not--and economic sense will trump statistical sleight of hand every time.
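The trade-weighted variant suggested in the last paragraph could be sketched as follows (Python, a toy illustration only; the DR values and trade weights are invented):

```python
def trade_weighted_dr(dr, trade_k):
    """Weighted leave-one-out average: weight each other country's DR_i
    by country k's trade with country i (weight 0 for k itself)."""
    num = sum(w * d for w, d in zip(trade_k, dr))
    den = sum(trade_k)
    return num / den

# Three countries; computing the index faced by country 0.
dr = [0.20, 0.40, 0.60]    # restrictions each country imposes
trade_0 = [0.0, 1.0, 3.0]  # country 0's trade with each partner (own = 0)

iv_0 = trade_weighted_dr(dr, trade_0)  # (1*0.40 + 3*0.60) / 4 = 0.55
```

Because each country's weight vector differs, the resulting IV_k values vary across countries far more than the unweighted leave-one-out means do.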
                    Last edited by Clyde Schechter; 02 Aug 2021, 13:49.



                    • #25
                      Thank you Clyde for taking the time to help. I will try the two alternatives suggested by you. Really appreciate your advice.



                      • #26
                        Originally posted by Clyde Schechter View Post
                        So in each country, IV_k is the average of 62 values of DR_i, and for any two countries, 61 of the 62 values are the same. [...] economic sense will trump statistical sleight of hand every time.
                        Clyde, so I used a rescaled version of a GDP-weighted index and got a coefficient of -2.66. This is the between-effects coefficient in the correlated random effects model. Based on my understanding, the coefficient means that moving from the lowest index value (0) to the highest index value (1), the DV declines by 93%. Is my understanding correct?



                        • #27
                          Based on my understanding, the coefficient means that moving from lowest index value 0 to highest index value 1 the DV declines by 93%. Is my understanding correct?
                          Probably not. It means that the expected value of the DV associated with the highest index value (1) is 2.66 (in whatever units the DV is measured) lower than the expected value of the DV associated with the lowest index value (0). Whether that would translate into a 93% difference or some other figure cannot be determined from the information you have provided.

                          I'm going to guess that your claim of 93% arises because your DV is actually the natural logarithm of some other variable of interest and you are imputing a corresponding relative difference in that other variable. If that's what you have in mind, well, yes, exp(-2.66) is 0.07 to 2 decimal places, which means a 93% decline in this other variable of interest.
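Under that reading (logged DV), the back-of-the-envelope number checks out (Python, for the arithmetic only):

```python
import math

b = -2.66                  # between-effects coefficient on the rescaled index
ratio = math.exp(b)        # multiplicative change in the un-logged DV
pct_decline = (1 - ratio) * 100

print(round(ratio, 2))     # 0.07
print(round(pct_decline))  # 93
```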
