Insignificant variable results in Fixed Effects regression

Stijn Braams

Join Date: Apr 2021
Posts: 10

Insignificant variable results in Fixed Effects regression

06 May 2021, 08:00

Currently, I'm working on my thesis, doing a fixed effects regression on a dataset of 1,301 observations with 7 variables. I'm using Stata 16.0. The dependent variable is CO2 emissions per capita; the independent variables are government ideology (categorical; -1 right-wing; 0 center; 1 left-wing government), Herfindahl index (from 0 to 1), Polity 2 score (from -10 to 10), urban population (% of total population), trade openness (trade as % of total GDP), log of gdp_per_capita and gdp^2 (in millions). The Hausman test indicated that the fixed effects model was the most appropriate.

Following previous posts on the forum, I managed to structure the data and run an -xtreg, fe- regression with the following command, which gave the results below:

Code:

xtreg co2_per_capita execrlc herfgov polity2 urban_pop trade_open log_gdp gdp2, fe robust

Code:

Fixed-effects (within) regression               Number of obs     =        995
Group variable: panel_id                        Number of groups  =         41

R-sq:                                           Obs per group:
     within  = 0.4099                                         min =          1
     between = 0.5192                                         avg =       24.3
     overall = 0.4376                                         max =         44

                                                F(7,40)           =      48.18
corr(u_i, Xb)  = 0.2276                         Prob > F          =     0.0000

                              (Std. Err. adjusted for 41 clusters in panel_id)
------------------------------------------------------------------------------
             |               Robust
co2_per_ca~a |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     execrlc |   .0389004   .0468245     0.83   0.411    -.0557355    .1335363
     herfgov |   .0126363   .1187266     0.11   0.916    -.2273191    .2525917
     polity2 |  -.0325272   .0187234    -1.74   0.090    -.0703687    .0053142
   urban_pop |   .0361227   .0145646     2.48   0.017     .0066866    .0655589
  trade_open |   .0065079   .0039343     1.65   0.106    -.0014437    .0144595
gdp_per_ca~a |   .0000583   .0000341     1.71   0.095    -.0000106    .0001273
        gdp2 |  -.0005166   .0009303    -0.56   0.582    -.0023969    .0013636
       _cons |   -.363398   .7462756    -0.49   0.629    -1.871677    1.144881
-------------+----------------------------------------------------------------
     sigma_u |  1.5154859
     sigma_e |  .36628114
         rho |  .94480887   (fraction of variance due to u_i)

The summary of my variables:

Code:

   Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    panel_id |      1,301    163.6595    107.1012          6        377
     country |          0
        code |          0
        year |      1,301    1997.003    11.61816       1975       2018
co2_per_ca~a |        995    2.042711    1.634142       .041      6.496
-------------+---------------------------------------------------------
 co2_per_gdp |      1,215    .3726897    .2829472       .038       2.61
     execrlc |      1,301    .2221368    .8973476         -1          1
     herfgov |      1,301    .8013089    .2756591   .0743667          1
     polity2 |      1,301     4.68947    6.330302         -9         10
   urban_pop |      1,301    58.92987    20.57164      7.834     97.403
-------------+---------------------------------------------------------
  trade_open |      1,301    61.76426    30.59038   8.384615   152.5161
gdp_per_ca~a |      1,301    6184.989    8368.914   104.2722   38542.72
        gdp2 |      1,301     108.239    262.4872   .0108727   1485.541
        left |      1,301    .5380477     .498742          0          1
       right |      1,301    .3159108    .4650564          0          1
-------------+---------------------------------------------------------
     log_gdp |      1,301    7.901027    1.356007   4.647005   10.55952

My questions are the following:
1) For the GDP per capita^2, I had to divide the variable by 1,000,000 to get results from the regression. Is this normal?
2) My overall regression seems significant, whereas my variable of interest, government ideology (execrlc), is not. Are there ways to fix this? Should I use another regression method or include/exclude some variables?
3) Should I include interaction terms, or should I take the log of other variables (or undo the log of some)?

I'm quite uncertain about what to do, as I can't figure out what the next step should be. Hopefully, you can help me with this.


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#2

06 May 2021, 08:34

Stijn:
I cannot find anything sinister in your regression (provided that your set of predictors gives the fairest and truest view of the data generating process you're investigating).
You probably searched for turning points with linear and squared -gdp_per_capita-, but your results do not support any non-linear relationship with the regressand: hence, you can safely re-run your regression with the linear term only.
That said, the best way to create interactions and categorical variables with Stata is to rely on the wonderful capabilities of -fvvarlist- notation:

Code:

c.gdp_per_capita##c.gdp_per_capita

Your variable of interest is not significant: this is simply a matter of fact that does not make your regression good or bad.

Kind regards,
Carlo
(Stata 19.0)
Comment
Maarten Buis

Join Date: Mar 2014

Posts: 3426
#3

06 May 2021, 08:48

One thing to consider is that fixed effects models only use variability within units (I presume that is in your case countries) to identify the parameters. If most countries don't change much, then there isn't much information that can be used to identify the effects (i.e. large standard errors and unstable results). Given the variables names and the timeframe (max 44 years?) I would not be surprised that most countries are just too stable to reliably estimate a fixed effects model. Also, if just a few countries experience big changes, then they will dominate your estimates. Is that what you want? Could be, but it is also possible that it would be a bad thing. At the very least it is something to be aware of, to find out if that is the case, and who those influential countries are, and make a deliberate and open decision.
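A minimal sketch of how one might check this, assuming the data are declared as a panel on panel_id and year. The helper variables changed, n_changes and first are hypothetical names, not part of the original dataset:

Code:

* declare the panel structure (assumes one row per country-year)
xtset panel_id year

* split the variation in ideology into between-country and within-country parts
xtsum execrlc

* flag observations where ideology differs from the previous observed year
bysort panel_id (year): gen byte changed = (execrlc != execrlc[_n-1]) if _n > 1

* number of ideology changes per country
egen n_changes = total(changed), by(panel_id)

* one row per country: distribution of the number of changes
egen byte first = tag(panel_id)
tabulate n_changes if first

Countries with n_changes equal to 0 contribute nothing to the fixed effects estimate for execrlc, so this table gives a rough idea of how many countries actually identify the coefficient.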

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Comment
Stijn Braams

Join Date: Apr 2021

Posts: 10
#4

07 May 2021, 05:16

Thank you both for your fast responses.

Originally posted by Carlo Lazzaro

Stijn:
I cannot find anything sinister in your regression (provided that your set of predictors gives the fairest and truest view of the data generating process you're investigating).
You probably searched for turning points with linear and squared -gdp_per_capita-, but your results do not support any non-linear relationship with the regressand: hence, you can safely re-run your regression with the linear term only.
That said, the best way to create interactions and categorical variables with Stata is to rely on the wonderful capabilities of -fvvarlist- notation:

Code:

c.gdp_per_capita##c.gdp_per_capita

Your variable of interest is not significant: this is simply a matter of fact that does not make your regression good or bad.

I read the -fvvarlist- help page and looked up the videos recommended on that page. When I try c.gdp_per_capita##c.gdp_per_capita, however, it doesn't give the same results as when I enter gdp2 and gdp_per_capita on their own. Therefore, I used the following command and got these results:

Code:

xtreg co2_per_capita ib(2).execrlc herfgov polity2 urban_pop trade_open gdp2 log_gdp, fe robust

Code:

Fixed-effects (within) regression               Number of obs     =        995
Group variable: panel_id                        Number of groups  =         41

R-sq:                                           Obs per group:
     within  = 0.4365                                         min =          1
     between = 0.5187                                         avg =       24.3
     overall = 0.4516                                         max =         44

                                                F(8,40)           =      16.52
corr(u_i, Xb)  = 0.3363                         Prob > F          =     0.0000

                              (Std. Err. adjusted for 41 clusters in panel_id)
------------------------------------------------------------------------------
             |               Robust
co2_per_ca~a |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     execrlc |
      Right  |   .0423436   .0868627     0.49   0.629    -.1332125    .2178997
       Left  |  -.0409511    .091005    -0.45   0.655    -.2248791    .1429768
             |
     herfgov |  -.0086094   .1171134    -0.07   0.942    -.2453044    .2280856
     polity2 |  -.0316682   .0184227    -1.72   0.093    -.0689018    .0055654
   urban_pop |   .0200203   .0105151     1.90   0.064    -.0012315    .0412721
  trade_open |   .0061216   .0038347     1.60   0.118    -.0016285    .0138718
        gdp2 |   .0007584   .0004152     1.83   0.075    -.0000808    .0015975
     log_gdp |   .2954222   .0840796     3.51   0.001      .125491    .4653534
       _cons |   -1.50268   .7987724    -1.88   0.067    -3.117059    .1116992
-------------+----------------------------------------------------------------
     sigma_u |  1.5852772
     sigma_e |  .35812024
         rho |  .95144531   (fraction of variance due to u_i)
------------------------------------------------------------------------------

The results looked more significant than before, but I was wondering how I could explain the non-significance of my variable of interest. How could I explain that my model is significant but my variables aren't?

Originally posted by Maarten Buis

One thing to consider is that fixed effects models only use variability within units (I presume that is in your case countries) to identify the parameters. If most countries don't change much, then there isn't much information that can be used to identify the effects (i.e. large standard errors and unstable results). Given the variables names and the timeframe (max 44 years?) I would not be surprised that most countries are just too stable to reliably estimate a fixed effects model. Also, if just a few countries experience big changes, then they will dominate your estimates. Is that what you want? Could be, but it is also possible that it would be a bad thing. At the very least it is something to be aware of, to find out if that is the case, and who those influential countries are, and make a deliberate and open decision.

You are probably right. I looked at the data and saw that some of the countries didn't even change their government ideology over time (e.g. Belgium). What model do you think is more appropriate to use in this case?
Comment
Stijn Braams

Join Date: Apr 2021

Posts: 10
#5

07 May 2021, 08:13

I looked at it and discovered that I still had GDP2 = GDP2/1,000,000 in place. This is probably what caused the difference in results between c.gdp_per_capita##c.gdp_per_capita and GDP2. However, without dividing it by a million, the regression still does not show results.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17673
#6

07 May 2021, 09:05

Stijn:
first, I would double-check whether -GDP2-=-gdp_per_capita-^2.
That said, when the model is jointly significant but predictors are not, it may well be that you have a quasi-multicollinearity issue (basically, at least two variables are highly correlated and -xtreg- cannot partition their contributions to variation in regressand).
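One way to look at this is to test the related terms jointly rather than one at a time. The following is a sketch only: ideology is a hypothetical recode of execrlc (factor-variable notation does not accept negative values), and gdp2 is assumed to be the rescaled square discussed earlier in the thread.

Code:

* recode -1/0/1 to 1/2/3 so the variable can be used with i. notation
* (ideology is a hypothetical name: 1 = right, 2 = center, 3 = left)
gen byte ideology = execrlc + 2

* same specification as before, with center as the base category
xtreg co2_per_capita ib2.ideology herfgov polity2 urban_pop trade_open ///
    gdp_per_capita gdp2, fe vce(cluster panel_id)

* joint test of the two ideology indicators
testparm i.ideology

* joint test of the linear and squared GDP terms
test gdp_per_capita gdp2

If the GDP terms are jointly significant while neither is individually, that would be consistent with the two terms being too highly correlated for -xtreg- to separate their contributions.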

Kind regards,
Carlo
(Stata 19.0)
Comment
Stijn Braams

Join Date: Apr 2021

Posts: 10
#7

07 May 2021, 09:29

Sorry for bothering you once again, but I was wondering the following.

Originally posted by Carlo Lazzaro

Stijn:
first, I would double-check whether -GDP2-=-gdp_per_capita-^2.
That said, when the model is jointly significant but predictors are not, it may well be that you have a quasi-multicollinearity issue (basically, at least two variables are highly correlated and -xtreg- cannot partition their contributions to variation in regressand).

I have put the equation for GDP2 down below and it looks like that should not cause the issue.

Code:

gen gdp2 = (gdp_per_capita)^2

I've read up on quasi-multicollinearity and saw some other posts you made about this topic. I ran collinearity diagnostics using the -collin- command before doing the regression and came up with the following statistics:

Code:

  Collinearity Diagnostics

                            SQRT                   R-
      Variable       VIF     VIF   Tolerance     Squared
------------------------------------------------------
co2_per_capita      2.52    1.59    0.3964      0.6036
       execrlc      1.28    1.13    0.7806      0.2194
       herfgov      1.18    1.09    0.8463      0.1537
       polity2      1.40    1.18    0.7127      0.2873
     urban_pop      1.88    1.37    0.5325      0.4675
    trade_open      1.06    1.03    0.9429      0.0571
gdp_per_capita     11.40    3.38    0.0877      0.9123
          gdp2      7.37    2.72    0.1356      0.8644
------------------------------------------------------
      Mean VIF      3.51

                           Cond
        Eigenval          Index
---------------------------------
    1     6.1939          1.0000
    2     1.5129          2.0234
    3     0.5694          3.2982
    4     0.3352          4.2985
    5     0.1779          5.9012
    6     0.0911          8.2476
    7     0.0645          9.8022
    8     0.0377         12.8167
    9     0.0175         18.8161
---------------------------------
 Condition Number        18.8161
 Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
 Det(correlation matrix)    0.0261

In your post (https://www.statalist.org/forums/for...earity-and-vif) you said that any mean VIF above 1 is reason for concern. Mine is 3.51, so I seem to have quite a challenge. In this post (https://www.stata.com/statalist/arch.../msg01063.html) they stated that you should leave out the variables which cause the collinearity. Does that mean I should exclude gdp_per_capita and gdp2, as they are the main source of the collinearity?
Comment

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17673

07 May 2021, 11:50

Stijn:
thanks for your update.
What if you run -estat vce, corr- after -xtreg,fe-?

As far as your first question is concerned, in the following toy example the coefficients are identical regardless of whether I use -fvvarlist- notation or create the interaction myself:

Code:

. use "https://www.stata-press.com/data/r16/nlswork.dta"
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. g sqage=age^2
(24 missing values generated)

. xtreg ln_wage c.age##c.age, fe vce(cluster idcode)

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1087                                         min =          1
     between = 0.1006                                         avg =        6.1
     overall = 0.0865                                         max =         15

                                                F(2,4709)         =     507.42
corr(u_i, Xb)  = 0.0440                         Prob > F          =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0539076    .004307    12.52   0.000     .0454638    .0623515
             |
 c.age#c.age |  -.0005973    .000072    -8.30   0.000    -.0007384   -.0004562
             |
       _cons |    .639913   .0624195    10.25   0.000     .5175415    .7622845
-------------+----------------------------------------------------------------
     sigma_u |   .4039153
     sigma_e |  .30245467
         rho |  .64073314   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtreg ln_wage age sqage,  fe vce(cluster idcode)

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1087                                         min =          1
     between = 0.1006                                         avg =        6.1
     overall = 0.0865                                         max =         15

                                                F(2,4709)         =     507.42
corr(u_i, Xb)  = 0.0440                         Prob > F          =     0.0000

                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0539076    .004307    12.52   0.000     .0454638    .0623515
       sqage |  -.0005973    .000072    -8.30   0.000    -.0007384   -.0004562
       _cons |    .639913   .0624195    10.25   0.000     .5175415    .7622845
-------------+----------------------------------------------------------------
     sigma_u |   .4039153
     sigma_e |  .30245467
         rho |  .64073314   (fraction of variance due to u_i)
------------------------------------------------------------------------------

.

Last edited by Carlo Lazzaro; 07 May 2021, 11:55.

Kind regards,
Carlo
(Stata 19.0)

Comment

Stijn Braams

Join Date: Apr 2021

Posts: 10
#9

07 May 2021, 12:11

Originally posted by Carlo Lazzaro

Stijn:
thanks for your update.
What if you run -estat vce, corr- after -xtreg,fe-?

As far as your first question is concerned, in the following toy example the coefficients are identical regardless of whether I use -fvvarlist- notation or create the interaction myself:

When I run the correlation matrix I get the following results, indicating a high correlation of GDP per capita with the other variables (which does make sense in some cases):

Code:

Correlation matrix of coefficients of xtreg model

             |       1.        3.
        e(V) |  execrlc   execrlc   herfgov   polity2  gdp_pe~a      gdp2
-------------+------------------------------------------------------------
   1.execrlc |   1.0000
   3.execrlc |   0.9767    1.0000
     herfgov |  -0.0500   -0.0629    1.0000
     polity2 |   0.2317    0.2361    0.0923    1.0000
gdp_per_ca~a |  -0.6605   -0.5730   -0.1289   -0.4516    1.0000
        gdp2 |   0.4683    0.4170    0.0316    0.4029   -0.8564    1.0000
       _cons |  -0.6556   -0.7167   -0.5020   -0.2094    0.1908   -0.0937

             |
        e(V) |    _cons
-------------+----------
       _cons |   1.0000

1. execrlc = right-wing ideology, 3. execrlc = left-wing

About the interaction term: you're right. When I do it like that, I get the same results. However, I get no values for the F-value and Prob > F. When I divide the GDP per capita by a million, I get the results mentioned before. I have two questions regarding this:

1) Is it okay to divide GDP per capita squared by a million if I mention it in the paper, or will this cause biased results?
2) Apart from clustering the standard errors, is there another way to deal with multicollinearity?

Thank you in advance. You have been a big help so far.
Comment
Maarten Buis

Join Date: Mar 2014

Posts: 3426
#10

07 May 2021, 12:15

The political systems of countries like Belgium and Switzerland are weird, and I can imagine various measures of political ideology that would (incorrectly) show no change over time. So I would look at that measure in more detail. What does it exactly measure? Is that meaningful for all countries you wish to study? What are the alternatives? In all likelihood your measure of ideology does not measure what you think it measures.

If there isn't enough information present when only looking at changes within countries, then you need to also include differences between countries. That means a random effects model.
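As a sketch of how the two models could be compared, using the variable names from the first post (fe_fit and re_fit are hypothetical names for the stored estimates). Note that -hausman- works from the default, non-robust covariance matrices, so both models are fitted here without the robust option:

Code:

* fixed effects fit with the default VCE, stored for the test
xtreg co2_per_capita execrlc herfgov polity2 urban_pop trade_open log_gdp gdp2, fe
estimates store fe_fit

* random effects fit on the same specification
xtreg co2_per_capita execrlc herfgov polity2 urban_pop trade_open log_gdp gdp2, re
estimates store re_fit

* sigmamore bases both covariance matrices on the disturbance
* variance estimated from the efficient (random effects) model
hausman fe_fit re_fit, sigmamore

Comparing the execrlc coefficient and its standard error across the two fits shows how much of the precision gain comes from using between-country differences.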

As to your quadratic effect: I would use GDP per capita /1000, and its square. Do you think that a single dollar/euro/yen/krone/... increase would lead to a meaningful change in co2 emissions? Moreover, this may also help with the stability of your model. Of course you should use the factor variable notation.
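A sketch of what this could look like, where gdp_pc_k is a hypothetical name for the rescaled variable. Rescaling only changes the units: the coefficient on GDP in thousands is 1,000 times the coefficient on GDP in single units, and the coefficient on its square is 1,000,000 times as large, while the t statistics stay the same.

Code:

* GDP per capita in thousands (gdp_pc_k is a hypothetical name)
gen gdp_pc_k = gdp_per_capita / 1000

* c.gdp_pc_k##c.gdp_pc_k enters the linear term and its square in one step
xtreg co2_per_capita execrlc herfgov polity2 urban_pop trade_open ///
    c.gdp_pc_k##c.gdp_pc_k, fe vce(cluster panel_id)

* marginal effect of GDP at 5, 15 and 25 thousand per capita
margins, dydx(gdp_pc_k) at(gdp_pc_k = (5 15 25))

Because the square is specified with factor-variable notation, -margins- knows that the two terms move together and reports the effect of GDP at each chosen level.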

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Comment
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#11

07 May 2021, 20:57

I would first try taking the log of co2_per_capita and the same with your GDP variable. You can also include the square of the log (not the log of the square). Then you will estimate an elasticity. This won't solve the problem of little variation over time, but my guess is that it's a better starting point.
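A sketch of this specification, assuming log_gdp in the dataset is the natural log of gdp_per_capita (its minimum and maximum in the summary table match that). ln_co2 is a hypothetical name for the new dependent variable:

Code:

* log of the dependent variable (co2_per_capita is strictly positive in this sample)
gen ln_co2 = ln(co2_per_capita)

* log-log model with the square of the log, not the log of the square
xtreg ln_co2 execrlc herfgov polity2 urban_pop trade_open ///
    c.log_gdp##c.log_gdp, fe vce(cluster panel_id)

* elasticity of CO2 with respect to GDP at low, middle and high log GDP
margins, dydx(log_gdp) at(log_gdp = (6 8 10))

With the squared term included the elasticity is no longer a single number, so -margins- is used here to evaluate it at a few values inside the observed range of log_gdp.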
Comment
Stijn Braams

Join Date: Apr 2021

Posts: 10
#12

08 May 2021, 02:10

Originally posted by Maarten Buis

The political systems of countries like Belgium and Switzerland are weird, and I can imagine various measures of political ideology that would (incorrectly) show no change over time. So I would look at that measure in more detail. What does it exactly measure? Is that meaningful for all countries you wish to study? What are the alternatives? In all likelihood your measure of ideology does not measure what you think it measures.

If there isn't enough information present when only looking at changes within countries, then you need to also include differences between countries. That means a random effects model.

As to your quadratic effect: I would use GDP per capita /1000, and its square. Do you think that a single dollar/euro/yen/krone/... increase would lead to a meaningful change in co2 emissions? Moreover, this may also help with the stability of your model. Of course you should use the factor variable notation.

Government ideology is measured by party orientation with respect to economic policy, coded based on the description of the party in the sources, using the following criteria: Right: for parties that are defined as conservative, Christian democratic, or right-wing. Left: for parties that are defined as communist, socialist, social democratic, or left-wing. (Source: DPI2020). It has its flaws, but for government ideology it's probably the best way to come near the real values. I carried out the Hausman test and found that the fixed effects model was the most appropriate. However, could I argue in my thesis that, despite the result of the Hausman test, I went with RE because there is not enough variation within countries? Or should the Hausman test be leading in this?

Originally posted by Jeff Wooldridge

I would first try taking the log of co2_per_capita and the same with your GDP variable. You can also include the square of the log (not the log of the square). Then you will estimate an elasticity. This won't solve the problem of little variation over time, but my guess is that it's a better starting point.

I followed your advice and found that this might be the more appropriate way to use the variables. However, I have two questions regarding this:
1) What is the reason for taking the log of CO2 and GDP? Why is it more suitable in this case than not taking the log?
2) What do you mean by "estimate an elasticity"? What should I do with it?

Thank you both for the help!
Comment
Maarten Buis

Join Date: Mar 2014

Posts: 3426
#13

08 May 2021, 03:51

What party do you consider in each country? Those who form the government? That is not the right choice for each country. You have mentioned Belgium, where the political institutions result in extremely elaborate coalitions typically including many radically different flavours of political parties, so the average political orientation does not change much. Another such example is Switzerland, although the institutions, level of conflict, historical reasons, etc. are completely different.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Comment
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#14

08 May 2021, 08:06

Stijn: Taking the log of variables that have wide variation and are always strictly positive is a staple of empirical economics. Without the quadratic, you will get a coefficient, such as 0.341, which will tell you that a 1% increase in GDP per capita leads to a .341% increase in CO2 emissions per capita. To me, this makes more sense than whatever units of measurement your CO2 variable is in. Plus, it's free of global inflation effects on GDP. As a statistical matter, you will reduce the chance of outliers influencing the results, and the "traditional" assumptions of normality and homoskedasticity are usually closer to being true. I recommend you read Chapter 6 in my introductory econometrics book. Older versions are easy to find ....
Comment
