Fixed Effect panel data using log transformation for ease of interpretation

Julie Janssen

Join Date: Jan 2022

Posts: 8
#1

05 Jan 2022, 03:16

Dear all,

I am estimating a fixed-effects panel data model: 20 countries over a 15-year period.

DV: Gini-index

IV: FDI (a second model adds governmental quality as a moderator)

CONTROL: economic growth, education, unemployment

My question is: my supervisor advised me to create log variables for education and unemployment, just for ease of interpretation (they are not skewed at all). If I do so, do I only need to report them in my descriptive statistics (alongside their "normal" values), or do I also need to use them in the model equation in place of the originals (since he advised it for ease of interpretation)?

Any help is much appreciated!
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#2

05 Jan 2022, 06:40

Julie:
welcome to this forum.
Your supervisor recommended a linear-log model as far as the predictors are concerned (I assume that the regressand remains in its original metric).
You have to replace them in the right-hand side of your regression equation.
Conversely, I would leave them in their original metric in the descriptive statistics table and add a footnote reporting their ln values.
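A minimal sketch of such a linear-log fixed-effects specification. Because no data were posted in this thread, it uses Stata's shipped Grunfeld panel (loaded with webuse, which needs an internet connection) as a stand-in; treating invest, mvalue and kstock as the outcome, the main predictor and a logged control is purely an assumption for illustration.

```stata
* Stand-in data: Grunfeld panel (10 firms, 20 years); NOT the poster's data
webuse grunfeld, clear
xtset company year

* Log-transform a strictly positive predictor (ln() returns missing for values <= 0)
generate ln_kstock = ln(kstock)

* Linear-log fixed-effects model: outcome in levels, one predictor in logs
xtreg invest mvalue ln_kstock, fe

* A 1% increase in kstock is associated with a change of about _b[ln_kstock]/100 in invest
display _b[ln_kstock]/100

* Descriptive statistics reported in the original metric
summarize invest mvalue kstock
```

The logged variable replaces the original on the right-hand side, while the descriptive table still shows the variable in its original units.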

Kind regards,
Carlo
(Stata 19.0)
Julie Janssen

Join Date: Jan 2022

Posts: 8
#3

05 Jan 2022, 09:14

Thank you for your reply; I have already used a lot of the information you have posted on various forums!
Actually, I do not understand why converting these two variables into logs makes sense if it does not "improve" the results of my regression (I am not that into statistics, unfortunately). The only real difference is that the parameter estimates for the two control variables become more significant...
Would it still make sense to convert them into log variables?
Thank you!
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2168
#4

05 Jan 2022, 09:18

Is education an average education level for the people in the country? And is unem the unemployment rate? Using log(unem) is a bit unusual because a small change in log(unem) is roughly a percentage change in something that is already measured as a percent. Also, increasing education by one unit is one year, which has a natural interpretation. Using log(educ) means you're increasing average years of schooling by a certain percent. If they're just acting as controls then it probably doesn't matter much. For me, leaving them in their original form is more "natural," but that's mostly a matter of taste.
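The interpretation point can be written out. This is a sketch with generic symbols that are assumed here rather than taken from the thread: y_it for the Gini index, x_it for a control such as average years of schooling or the unemployment rate, and alpha_i for the country fixed effect.

```latex
\begin{align*}
\text{level:} \quad & y_{it} = \alpha_i + \beta\, x_{it} + u_{it}
  && \Rightarrow\ \Delta y = \beta\, \Delta x \\
\text{log:} \quad & y_{it} = \alpha_i + \gamma \ln(x_{it}) + u_{it}
  && \Rightarrow\ \Delta y \approx \frac{\gamma}{100}\,(\%\Delta x)
\end{align*}
```

For an unemployment rate, the percent change in x is a proportionate change in a rate: moving from 5% to 5.5% is a 10% change but only a 0.5 percentage-point change, which is why the logged version is harder to read.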
Julie Janssen

Join Date: Jan 2022

Posts: 8
#5

05 Jan 2022, 12:28

Education is measured in average years of schooling; unemployment is the % of the total labour force. Thank you: that makes a lot of sense to me!

Another question: in the model without my control variables, the parameter estimate of the moderator (FDI*GOV) is negative, whereas it is positive in the model including the control variables (economic growth, education and unemployment). Does this imply that I am using the wrong control variables? Only the parameter estimate of my control variable unemployment is significant... Sorry, I have a hard time understanding Stata and statistics... Once again: any help is much appreciated!

Last edited by Julie Janssen; 05 Jan 2022, 12:42.
Julie Janssen

Join Date: Jan 2022

Posts: 8
#6

06 Jan 2022, 03:01

One last question: I want to make a margins plot of GINI, FDI and the moderating role of governance, but when I use the command margins fdigov#fdi, Stata says:
fdigov: factor variables may not contain noninteger values
It seems that Stata treats FDI and GOV as factor variables. After looking at multiple forums and reading the manual, I changed the command to c.fdigov#fdi, but then Stata says:
only factor variables and their interactions are allowed
Then I tried egen newid = group(id), label
but then Stata says: factor variables may not contain noninteger values
I have watched many videos and read many forum threads, but nothing resolves this problem. Does this mean it is not possible to make a margins plot with my variables?
Any help is much appreciated!
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#7

06 Jan 2022, 03:55

Julie:
1) you ran regressions with different specifications: no wonder that the coefficients differ. See the literature in your research field to see what others did in the past when dealing with the very same research goal;
2) see Example 8: Margins of interactions, -margins- entry, Stata .pdf manual
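A minimal sketch for the margins question, run on Stata's shipped Grunfeld panel as a stand-in because no data were posted (the roles given to invest, mvalue and kstock are assumptions for illustration). Both error messages are consistent with how -margins- reads its variable list: variables named directly after margins are treated as factor (categorical) variables, so continuous variables go in at() instead. The interaction is also specified with factor-variable notation in the regression itself, since -margins- cannot tell that a hand-made product variable such as fdigov depends on fdi.

```stata
* Stand-in data: Grunfeld panel; NOT the poster's data
webuse grunfeld, clear
xtset company year

* c. marks variables as continuous; ## adds both main effects and their product
xtreg invest c.mvalue##c.kstock, fe

* Linear predictions over a grid of values of the two continuous variables
margins, at(mvalue=(1000(1000)5000) kstock=(100 500 1000))

* Put mvalue on the x-axis; kstock values become separate lines
marginsplot, xdimension(mvalue)

* How the slope of mvalue changes across values of the moderator kstock
margins, dydx(mvalue) at(kstock=(100 500 1000))
```

The grid values in at() are arbitrary picks inside the range of the stand-in data; with real data they would be chosen from the observed range of each variable.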

Kind regards,
Carlo
(Stata 19.0)
Julie Janssen

Join Date: Jan 2022

Posts: 8
#8

07 Jan 2022, 10:47

Thank you once again Carlo.
Does this mean that I cannot combine ratio variables (such as stock ratio / gdp) and continuous variables (such as average years of schooling and GINI index)? It seems like it since for all my ratio variables, Stata says that the skewness is 0.000, which does not correspond with the histograms I have plotted for the same variables. Or does this have something to do with the settings?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#9

07 Jan 2022, 10:53

Julie:
sorry, but I do not follow you on the following points:
1) usually, a ratio variable is continuous (just like average years of schooling and, except for some extreme scenarios, the GINI index, as far as I can see from today's access at https://data.worldbank.org/indicator/SI.POV.GINI);
2) the skewness of what?

Kind regards,
Carlo
(Stata 19.0)
Julie Janssen

Join Date: Jan 2022

Posts: 8
#10

07 Jan 2022, 11:54

Sorry, I'll clarify! I mean:
1. Does it matter that average years of schooling is measured in units of time, while economic growth (for instance) is measured in % per year? Or do you have to "let Stata know" that we're dealing with different units? From your answer it seems that this is not a problem?
2. The skewness you display in the descriptive statistics:
for GINI, economic growth and average years of schooling it has the values 0.009, 0.528 and 0.1, respectively. But for FDI, quality of governance and the log of financial development, Stata says that the skewness is 0.000. However, when I make a histogram of these variables, they do not seem perfectly distributed...
Sorry, I have a hard time getting a grip on how Stata works...
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#11

07 Jan 2022, 14:52

Julie:
as far as I can see the issues (with no quantitative or graphical support, though):
1) when interpreting, say, the Gini index as a coefficient you should keep in mind that its variations are expressed in percentage points;
2) if you have many observations, analytical tests on distribution moments may lead you astray. Visual inspection is, on average, more reliable in these instances.

Kind regards,
Carlo
(Stata 19.0)
Julie Janssen

Join Date: Jan 2022

Posts: 8
#12

08 Jan 2022, 03:31

Thanks once again... You are really helping me!
However, I do not have a lot of variables (max 300 per value). If visual inspection then shows that the variable is not normally distributed, but the skewness test gives a skewness value of 0.000, what is going wrong?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#13

08 Jan 2022, 04:14

Julie:
1) do you mean that you have about 300 observations for each variable?
2) "A normal distribution and any other symmetric distribution with finite third moment has a skewness of 0" (quoted from https://en.wikipedia.org/wiki/Skewness)

Kind regards,
Carlo
(Stata 19.0)
Julie Janssen

Join Date: Jan 2022

Posts: 8
#14

08 Jan 2022, 04:18

Yes, indeed: 300 observations for each variable.
For inflation, for instance, the minimum (rounded) is -2.5, the maximum is 44.4, the mean is 6.9, and the standard deviation is 6.9. However, sktest indicates a skewness of 0.000, which doesn't make any sense, I believe?

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17709

#15

08 Jan 2022, 04:52

Julie:
as you do not post any excerpt/example of the variable you're complaining about, guess-work is the only available tool from my side:

Code:

. corr2data A, n(300) means(6.9) sds(6.9)
(obs 300)

. sum

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
           A |        300         6.9         6.9  -9.816057   24.36199

. sum, d

                              A
-------------------------------------------------------------
      Percentiles      Smallest
 1%     -7.30485      -9.816057
 5%    -5.021238       -9.16327
10%    -2.424862      -7.631421       Obs                 300
25%     2.119836      -6.978278       Sum of wgt.         300

50%     6.830155                      Mean                6.9
                        Largest       Std. dev.           6.9
75%     11.63588       21.69975
90%     15.63859       22.25993       Variance          47.61
95%     19.09629       22.31519       Skewness       .0451692
99%     21.97984       24.36199       Kurtosis        2.61633

Skewness is approaching 0, as the tails of the distribution are similar:

Code:

. centile A, centile(2.5 97.5)

                                                          Binom. interp.  
    Variable |       Obs  Percentile    Centile        [95% conf. interval]
-------------+-------------------------------------------------------------
           A |       300        2.5   -6.203067       -7.531035   -5.205158
             |                 97.5    20.74849        19.52786    22.17383

. count if A<=-6.203067
  7

. count if A>= 20.74849
  7

.
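One possible explanation for the reported 0.000, offered as a guess since no output was posted: -sktest- does not display the skewness coefficient. Its Pr(skewness) column is the p-value of a test that skewness is zero, so 0.000 there would point to strong evidence of asymmetry, which would agree with the histograms. The coefficient itself is reported by -summarize, detail-. A sketch on Stata's shipped auto data, where price is only a stand-in for a right-skewed variable such as inflation:

```stata
* Stand-in data shipped with Stata; NOT the poster's data
sysuse auto, clear

* Skewness coefficient: reported by -summarize, detail- and saved in r(skewness)
summarize price, detail
display "skewness = " r(skewness)

* -sktest- reports p-values (Pr(skewness), Pr(kurtosis), joint test), not the coefficient
sktest price

* Visual check to set against both numbers
histogram price, normal
```

Comparing the Skewness line of -summarize, detail- with the Pr(skewness) column of -sktest- on the same variable shows which of the two numbers a table is actually reporting.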

Kind regards,
Carlo
(Stata 19.0)
