
  • Insignificant variable results in Fixed Effects regression

    Currently, I'm working on my thesis, running a fixed-effects regression on a dataset of 1,301 observations with 7 variables. I'm using Stata 16.0. The dependent variable is CO2 emissions per capita; the independent variables are government ideology (categorical: -1 right-wing, 0 center, 1 left-wing government), the Herfindahl index (from 0 to 1), the Polity 2 score (from -10 to 10), urban population (% of total population), trade openness (trade as % of GDP), the log of gdp_per_capita, and GDP per capita squared (gdp2, in millions). A Hausman test indicated that the fixed-effects model is the most appropriate.

    Following previous posts on the forum, I managed to structure the data and run an -xtreg, fe- regression with the following command, which gave these results:

    Code:
    xtreg co2_per_capita execrlc herfgov polity2 urban_pop trade_open log_gdp gdp2, fe robust
    Code:
    Fixed-effects (within) regression               Number of obs     =        995
    Group variable: panel_id                        Number of groups  =         41
    
    R-sq:                                           Obs per group:
         within  = 0.4099                                         min =          1
         between = 0.5192                                         avg =       24.3
         overall = 0.4376                                         max =         44
    
                                                    F(7,40)           =      48.18
    corr(u_i, Xb)  = 0.2276                         Prob > F          =     0.0000
    
                                  (Std. Err. adjusted for 41 clusters in panel_id)
    ------------------------------------------------------------------------------
                 |               Robust
    co2_per_ca~a |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         execrlc |   .0389004   .0468245     0.83   0.411    -.0557355    .1335363
         herfgov |   .0126363   .1187266     0.11   0.916    -.2273191    .2525917
         polity2 |  -.0325272   .0187234    -1.74   0.090    -.0703687    .0053142
       urban_pop |   .0361227   .0145646     2.48   0.017     .0066866    .0655589
      trade_open |   .0065079   .0039343     1.65   0.106    -.0014437    .0144595
    gdp_per_ca~a |   .0000583   .0000341     1.71   0.095    -.0000106    .0001273
            gdp2 |  -.0005166   .0009303    -0.56   0.582    -.0023969    .0013636
           _cons |   -.363398   .7462756    -0.49   0.629    -1.871677    1.144881
    -------------+----------------------------------------------------------------
         sigma_u |  1.5154859
         sigma_e |  .36628114
             rho |  .94480887   (fraction of variance due to u_i)
    The summary of my variables:
    Code:
       Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
        panel_id |      1,301    163.6595    107.1012          6        377
         country |          0
            code |          0
            year |      1,301    1997.003    11.61816       1975       2018
    co2_per_ca~a |        995    2.042711    1.634142       .041      6.496
    -------------+---------------------------------------------------------
     co2_per_gdp |      1,215    .3726897    .2829472       .038       2.61
         execrlc |      1,301    .2221368    .8973476         -1          1
         herfgov |      1,301    .8013089    .2756591   .0743667          1
         polity2 |      1,301     4.68947    6.330302         -9         10
       urban_pop |      1,301    58.92987    20.57164      7.834     97.403
    -------------+---------------------------------------------------------
      trade_open |      1,301    61.76426    30.59038   8.384615   152.5161
    gdp_per_ca~a |      1,301    6184.989    8368.914   104.2722   38542.72
            gdp2 |      1,301     108.239    262.4872   .0108727   1485.541
            left |      1,301    .5380477     .498742          0          1
           right |      1,301    .3159108    .4650564          0          1
    -------------+---------------------------------------------------------
         log_gdp |      1,301    7.901027    1.356007   4.647005   10.55952
    My questions are the following:
    1) For the GDP per capita^2, I had to divide the variable by 1,000,000 to get results from the regression. Is this normal?
    2) My overall regression is significant, whereas my variable of interest, government ideology (execrlc), is not. Are there ways to fix this? Should I use another regression method or include/exclude some variables?
    3) Should I include interaction terms, or should I log-transform other variables (or undo the log of some)?

    I'm quite uncertain about what to do, as I can't figure out what the next step should be or where to go. Hopefully, you can help me with this.

  • #2
    Stijn:
    I cannot find anything sinister in your regression (provided that your set of predictors gives the fairest and truest view of the data generating process you're investigating).
    You probably searched for turning points with linear and squared -gdp_per_capita-, but your results do not support any non-linear relationship with the regressand: hence, you can safely re-run your regression with the linear term only.
    That said, the best way to create interactions and categorical variables in Stata is to rely on the wonderful capabilities of -fvvarlist- notation:
    Code:
    c.gdp_per_capita##c.gdp_per_capita
    Your variable of interest is not significant: this is simply a matter of fact that does not make your regression good or bad.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      One thing to consider is that fixed-effects models only use variability within units (I presume that in your case those are countries) to identify the parameters. If most countries don't change much, then there isn't much information that can be used to identify the effects (i.e. large standard errors and unstable results). Given the variable names and the timeframe (max 44 years?), I would not be surprised if most countries are just too stable to reliably estimate a fixed-effects model. Also, if just a few countries experience big changes, then they will dominate your estimates. Is that what you want? Could be, but it is also possible that it would be a bad thing. At the very least, it is something to be aware of: find out whether that is the case and which countries are the influential ones, and make a deliberate and open decision.
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------
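To make the within-variation point concrete, here is a minimal sketch in Python, used only as a language-neutral illustration (in Stata, -xtsum- reports these quantities). It splits the variation of one variable into a between-country part and a within-country part. The three countries and their ideology codes are hypothetical, and the sketch uses population standard deviations, so its numbers will not match -xtsum- exactly.

```python
from statistics import mean, pstdev


def within_between_sd(panel):
    """Split the variation of one panel variable into between and within parts.

    panel maps a unit id (e.g. a country) to the list of its values over time.
    Population SDs are used; -xtsum- uses sample SDs (and adds the grand mean
    back to the within deviations, which does not change their SD).
    """
    all_values = [v for series in panel.values() for v in series]
    unit_means = {unit: mean(series) for unit, series in panel.items()}

    overall = pstdev(all_values)
    # Between: how much the country averages differ from one another.
    between = pstdev(list(unit_means.values()))
    # Within: how much each country moves around its own average.
    deviations = [v - unit_means[unit] for unit, series in panel.items() for v in series]
    within = pstdev(deviations)
    return overall, between, within


# Hypothetical ideology codes (-1 right, 1 left) for three countries.
panel = {
    "stable_left": [1, 1, 1, 1],
    "stable_right": [-1, -1, -1, -1],
    "switcher": [1, 1, -1, -1],
}
overall, between, within = within_between_sd(panel)
print(round(overall, 3), round(between, 3), round(within, 3))  # 1.0 0.816 0.577
```

Only the switching country contributes to the within SD, which is the only variation a fixed-effects model uses. With equal group sizes, the squared between and within parts add up to the overall variance.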



      • #4
        Thank you both for your fast responses.

        Replying to Carlo Lazzaro (#2):
        I read the -fvvarlist- page in the Stata documentation and looked up the videos recommended on that page. However, when I try c.gdp_per_capita##c.gdp_per_capita, I don't get the same results as when I enter gdp2 and gdp_per_capita on their own. Therefore, I used the following command and got these results:

        Code:
        xtreg co2_per_capita ib(2).execrlc herfgov polity2 urban_pop trade_open gdp2 log_gdp, fe robust
        Code:
        Fixed-effects (within) regression               Number of obs     =        995
        Group variable: panel_id                        Number of groups  =         41
        
        R-sq:                                           Obs per group:
             within  = 0.4365                                         min =          1
             between = 0.5187                                         avg =       24.3
             overall = 0.4516                                         max =         44
        
                                                        F(8,40)           =      16.52
        corr(u_i, Xb)  = 0.3363                         Prob > F          =     0.0000
        
                                      (Std. Err. adjusted for 41 clusters in panel_id)
        ------------------------------------------------------------------------------
                     |               Robust
        co2_per_ca~a |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
             execrlc |
              Right  |   .0423436   .0868627     0.49   0.629    -.1332125    .2178997
               Left  |  -.0409511    .091005    -0.45   0.655    -.2248791    .1429768
                     |
             herfgov |  -.0086094   .1171134    -0.07   0.942    -.2453044    .2280856
             polity2 |  -.0316682   .0184227    -1.72   0.093    -.0689018    .0055654
           urban_pop |   .0200203   .0105151     1.90   0.064    -.0012315    .0412721
          trade_open |   .0061216   .0038347     1.60   0.118    -.0016285    .0138718
                gdp2 |   .0007584   .0004152     1.83   0.075    -.0000808    .0015975
             log_gdp |   .2954222   .0840796     3.51   0.001      .125491    .4653534
               _cons |   -1.50268   .7987724    -1.88   0.067    -3.117059    .1116992
        -------------+----------------------------------------------------------------
             sigma_u |  1.5852772
             sigma_e |  .35812024
                 rho |  .95144531   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        The results look more significant than before, but I was wondering how I could explain the non-significance of my variable of interest. How can I explain that my model is jointly significant while my variable of interest is not?

        Replying to Maarten Buis (#3):
        You are probably right. I looked at the data and saw that some of the countries didn't change their government ideology at all over time (e.g. Belgium). Which model do you think is more appropriate in this case?
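The Belgium case can be illustrated with a one-regressor version of the within estimator. This is a minimal Python sketch, not what -xtreg, fe- literally runs, but based on the same demeaning idea: each country's own mean is subtracted from x and y, and the slope is computed on the demeaned data. The country labels and all numbers are hypothetical.

```python
def demean_by_unit(panel):
    """Within (fixed-effects) transformation: subtract each unit's own mean."""
    demeaned = {}
    for unit, series in panel.items():
        unit_mean = sum(series) / len(series)
        demeaned[unit] = [v - unit_mean for v in series]
    return demeaned


def within_slope(x_panel, y_panel):
    """One-regressor fixed-effects slope: sum(x_dm * y_dm) / sum(x_dm ** 2)."""
    x_dm = demean_by_unit(x_panel)
    y_dm = demean_by_unit(y_panel)
    num = sum(a * b for unit in x_dm for a, b in zip(x_dm[unit], y_dm[unit]))
    den = sum(a * a for unit in x_dm for a in x_dm[unit])
    if den == 0:
        raise ValueError("x never changes within any unit: slope not identified")
    return num / den


# Hypothetical data: 'constant' never changes ideology, 'switcher' does.
ideology = {"constant": [1, 1, 1, 1], "switcher": [-1, -1, 1, 1]}
co2 = {"constant": [2.0, 2.5, 3.0, 3.5], "switcher": [1.0, 1.2, 1.6, 1.8]}
slope = within_slope(ideology, co2)

# Changing the constant country's outcomes leaves the estimate untouched.
co2_changed = {"constant": [9.0, 0.5, 4.0, 7.5], "switcher": co2["switcher"]}
slope_changed = within_slope(ideology, co2_changed)
print(round(slope, 3), round(slope_changed, 3))  # 0.3 0.3
```

A country whose ideology never changes is demeaned to all zeros, so it contributes nothing to the within estimate of the ideology effect; that estimate rests entirely on the countries that switch.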



        • #5
          I looked into it and discovered that I still had gdp2 defined as gdp2/1,000,000. This is probably what caused the difference in results between c.gdp_per_capita##c.gdp_per_capita and gdp2. However, without dividing by a million, the regression results still do not show.



          • #6
            Stijn:
            first, I would double-check whether -gdp2- = -gdp_per_capita-^2.
            That said, when the model is jointly significant but the predictors are not, it may well be that you have a quasi-multicollinearity issue (basically, at least two variables are highly correlated and -xtreg- cannot partition their contributions to the variation in the regressand).
            Kind regards,
            Carlo
            (Stata 19.0)
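For reference, the VIF reported by collinearity diagnostics is 1/(1 - R²), where R² comes from regressing one predictor on all the others, and the tolerance is its reciprocal. A minimal Python sketch for the two-predictor case, where that R² is simply the squared correlation, using a hypothetical GDP-like variable and its square:

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def vif_from_r2(r2):
    """Variance inflation factor given the auxiliary-regression R-squared."""
    return 1.0 / (1.0 - r2)


# Hypothetical regressor and its square (two-predictor model).
x = [float(v) for v in range(1, 11)]
x_sq = [v * v for v in x]
r = pearson(x, x_sq)
vif = vif_from_r2(r ** 2)
print(round(r, 3), round(vif, 1))  # 0.975 19.9

# Cross-check with the gdp_per_capita line of the -collin- output in #7:
# R-squared 0.9123 -> tolerance 0.0877 -> VIF of about 11.40.
print(round(vif_from_r2(0.9123), 2))  # 11.4
```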



            • #7
              Sorry for bothering you once again, but I was wondering the following.

              Replying to Carlo Lazzaro (#6):
              Below is how I generated gdp2; it does not look like this should cause the issue.
              Code:
              gen gdp2 = (gdp_per_capita)^2
              I've read up on quasi-multicollinearity and saw some other posts you made about this topic. Before running the regression, I ran collinearity diagnostics with the -collin- command and obtained the following statistics:

              Code:
              Collinearity Diagnostics
              
                                        SQRT                 R-
                    Variable     VIF     VIF    Tolerance    Squared
              ------------------------------------------------------
              co2_per_capita    2.52    1.59       0.3964     0.6036
                     execrlc    1.28    1.13       0.7806     0.2194
                     herfgov    1.18    1.09       0.8463     0.1537
                     polity2    1.40    1.18       0.7127     0.2873
                   urban_pop    1.88    1.37       0.5325     0.4675
                  trade_open    1.06    1.03       0.9429     0.0571
              gdp_per_capita   11.40    3.38       0.0877     0.9123
                        gdp2    7.37    2.72       0.1356     0.8644
              ------------------------------------------------------
                    Mean VIF    3.51
              
                                         Cond
                      Eigenval          Index
              ---------------------------------
                  1     6.1939          1.0000
                  2     1.5129          2.0234
                  3     0.5694          3.2982
                  4     0.3352          4.2985
                  5     0.1779          5.9012
                  6     0.0911          8.2476
                  7     0.0645          9.8022
                  8     0.0377         12.8167
                  9     0.0175         18.8161
              ---------------------------------
               Condition Number        18.8161 
               Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
               Det(correlation matrix)    0.0261
              In your post (https://www.statalist.org/forums/for...earity-and-vif) you said that any mean VIF above 1 is a reason for concern. Mine is 3.51, so I seem to be facing quite a challenge. In this post (https://www.stata.com/statalist/arch.../msg01063.html) they stated that you should leave out the variables that cause the collinearity. Does that mean I should exclude gdp_per_capita and gdp2, as they are the main source of the collinearity?
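One alternative to dropping gdp_per_capita and gdp2 that is often suggested is to center the variable before squaring. A minimal Python sketch with hypothetical values shows that the correlation between a variable and its square is largely mechanical: it is high when all values are positive and vanishes for symmetrically centered values. Centering is only a reparametrization, so the fitted values and the joint test of the linear and squared terms are unaffected; only the meaning of the linear coefficient changes (it becomes the slope at the mean).

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


# Hypothetical, strictly positive regressor (think GDP per capita in thousands).
x = [float(v) for v in range(1, 11)]
x_mean = sum(x) / len(x)

raw_corr = pearson(x, [v * v for v in x])
centered = [v - x_mean for v in x]
centered_corr = pearson(centered, [v * v for v in centered])

print(round(raw_corr, 3))       # 0.975
print(round(centered_corr, 3))  # 0.0
```

With real, skewed data such as GDP per capita, the centered correlation will not be exactly zero, but it is typically much smaller than the raw one.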



              • #8
                Stijn:
                thanks for your update.
                What if you run -estat vce, corr- after -xtreg,fe-?

                As far as your first question is concerned, in the following toy example the coefficients are identical regardless of whether I use -fvvarlist- notation or create the interaction myself:
                Code:
                . use "https://www.stata-press.com/data/r16/nlswork.dta"
                (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
                
                . g sqage=age^2
                (24 missing values generated)
                
                . xtreg ln_wage c.age##c.age, fe vce(cluster idcode)
                
                Fixed-effects (within) regression               Number of obs     =     28,510
                Group variable: idcode                          Number of groups  =      4,710
                
                R-sq:                                           Obs per group:
                     within  = 0.1087                                         min =          1
                     between = 0.1006                                         avg =        6.1
                     overall = 0.0865                                         max =         15
                
                                                                F(2,4709)         =     507.42
                corr(u_i, Xb)  = 0.0440                         Prob > F          =     0.0000
                
                                             (Std. Err. adjusted for 4,710 clusters in idcode)
                ------------------------------------------------------------------------------
                             |               Robust
                     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                         age |   .0539076    .004307    12.52   0.000     .0454638    .0623515
                             |
                 c.age#c.age |  -.0005973    .000072    -8.30   0.000    -.0007384   -.0004562
                             |
                       _cons |    .639913   .0624195    10.25   0.000     .5175415    .7622845
                -------------+----------------------------------------------------------------
                     sigma_u |   .4039153
                     sigma_e |  .30245467
                         rho |  .64073314   (fraction of variance due to u_i)
                ------------------------------------------------------------------------------
                
                . xtreg ln_wage age sqage,  fe vce(cluster idcode)
                
                Fixed-effects (within) regression               Number of obs     =     28,510
                Group variable: idcode                          Number of groups  =      4,710
                
                R-sq:                                           Obs per group:
                     within  = 0.1087                                         min =          1
                     between = 0.1006                                         avg =        6.1
                     overall = 0.0865                                         max =         15
                
                                                                F(2,4709)         =     507.42
                corr(u_i, Xb)  = 0.0440                         Prob > F          =     0.0000
                
                                             (Std. Err. adjusted for 4,710 clusters in idcode)
                ------------------------------------------------------------------------------
                             |               Robust
                     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                         age |   .0539076    .004307    12.52   0.000     .0454638    .0623515
                       sqage |  -.0005973    .000072    -8.30   0.000    -.0007384   -.0004562
                       _cons |    .639913   .0624195    10.25   0.000     .5175415    .7622845
                -------------+----------------------------------------------------------------
                     sigma_u |   .4039153
                     sigma_e |  .30245467
                         rho |  .64073314   (fraction of variance due to u_i)
                ------------------------------------------------------------------------------
                
                .
                Last edited by Carlo Lazzaro; 07 May 2021, 11:55.
                Kind regards,
                Carlo
                (Stata 19.0)
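Since the toy example above estimates a quadratic in age, the implied turning point is -b1/(2*b2), the value at which b1*x + b2*x² stops rising (or falling). A minimal Python sketch using the two coefficients printed above; in Stata, -nlcom- can compute the same quantity together with a standard error.

```python
def turning_point(b_linear, b_square):
    """Value of x where b_linear*x + b_square*x**2 reaches its extremum."""
    if b_square == 0:
        raise ValueError("no turning point without a squared term")
    return -b_linear / (2.0 * b_square)


# Coefficients on age and c.age#c.age from the -xtreg- output above.
b_age, b_agesq = 0.0539076, -0.0005973
tp = turning_point(b_age, b_agesq)
kind = "maximum" if b_agesq < 0 else "minimum"
print(round(tp, 1), kind)  # 45.1 maximum
```

Whether such a turning point is meaningful depends on whether it falls inside the range of the observed data.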



                • #9
                  Replying to Carlo Lazzaro (#8):
                  When I run the correlation matrix, I get the following results, indicating that the GDP per capita coefficient is highly correlated with several of the other coefficients (which does make sense in some cases):

                  Code:
                  Correlation matrix of coefficients of xtreg model
                  
                               |        1.        3.                                        
                          e(V) |  execrlc   execrlc   herfgov   polity2  gdp_pe~a      gdp2 
                  -------------+------------------------------------------------------------
                     1.execrlc |   1.0000                                                   
                     3.execrlc |   0.9767    1.0000                                         
                       herfgov |  -0.0500   -0.0629    1.0000                               
                       polity2 |   0.2317    0.2361    0.0923    1.0000                     
                  gdp_per_ca~a |  -0.6605   -0.5730   -0.1289   -0.4516    1.0000           
                          gdp2 |   0.4683    0.4170    0.0316    0.4029   -0.8564    1.0000 
                         _cons |  -0.6556   -0.7167   -0.5020   -0.2094    0.1908   -0.0937 
                  
                               |          
                          e(V) |    _cons 
                  -------------+----------
                         _cons |   1.0000 
                  
                  1. execrlc = right-wing ideology, 3. execrlc = left-wing
                  About the interaction term: you're right. When I do it like that, I get the same results. However, I get no value for F and Prob > F. When I divide GDP per capita squared by a million, I get the results mentioned before. I have two questions regarding this:

                  1) Is it okay to divide GDP per capita squared by a million if I mention it in the paper, or will this cause biased results?
                  2) Apart from clustering the standard errors, is there another way to deal with multicollinearity?

                  Thank you in advance. You have been a big help so far.
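For readers wondering what -estat vce, corr- shows: it rescales the variance-covariance matrix of the coefficient estimates, e(V), into correlations, corr_ij = V_ij / sqrt(V_ii * V_jj). These are correlations between estimated coefficients, not between the variables themselves. A minimal Python sketch with a hypothetical 2x2 matrix:

```python
from math import sqrt


def vce_to_corr(V):
    """Convert a coefficient variance-covariance matrix into a correlation matrix."""
    k = len(V)
    return [[V[i][j] / sqrt(V[i][i] * V[j][j]) for j in range(k)] for i in range(k)]


# Hypothetical e(V) for two coefficients: variances 4 and 9, covariance -3.
V = [[4.0, -3.0],
     [-3.0, 9.0]]
for row in vce_to_corr(V):
    print([round(c, 2) for c in row])
# [1.0, -0.5]
# [-0.5, 1.0]
```

A strongly negative entry, such as the -0.86 between the gdp_per_capita and gdp2 coefficients in #9, suggests that the data have trouble telling the two effects apart, which is the quasi-multicollinearity issue raised earlier in the thread.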



                  • #10
                    The political systems of countries like Belgium and Switzerland are weird, and I can imagine various measures of political ideology that would (incorrectly) show no change over time. So I would look at that measure in more detail. What exactly does it measure? Is that meaningful for all countries you wish to study? What are the alternatives? In all likelihood your measure of ideology does not measure what you think it measures.

                    If there isn't enough information present when only looking at changes within countries, then you need to also include differences between countries. That means a random-effects model.

                    As to your quadratic effect: I would use GDP per capita /1000, and its square. Do you think that a single dollar/euro/yen/krone/... increase would lead to a meaningful change in co2 emissions? Moreover, this may also help with the stability of your model. Of course you should use the factor variable notation.
                    Maarten L. Buis
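To illustrate why dividing GDP per capita by 1,000 (equivalently, its square by 1,000,000, since (x/1000)² = x²/1,000,000) does not bias anything: rescaling a regressor by a constant rescales its coefficient and standard error by the inverse of that constant, so t-statistics and p-values stay the same. A minimal Python sketch with a one-regressor OLS fit on hypothetical data; the same invariance holds for each regressor in -xtreg, fe-, including with clustered standard errors.

```python
from math import sqrt


def simple_ols(x, y):
    """Slope and its t-statistic from an OLS fit of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = sqrt(ssr / (n - 2) / sxx)
    return slope, slope / se


# Hypothetical GDP per capita (in dollars) and CO2 per capita.
gdp_pc = [1000.0, 5000.0, 12000.0, 20000.0, 38000.0]
co2 = [0.3, 1.1, 2.0, 3.4, 5.9]

b_raw, t_raw = simple_ols([g ** 2 for g in gdp_pc], co2)
b_k, t_k = simple_ols([(g / 1000) ** 2 for g in gdp_pc], co2)

print(b_k / b_raw)  # about 1e6: the coefficient is a million times larger
print(t_raw, t_k)   # identical up to floating-point error
```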



                    • #11
                      I would first try taking the log of co2_per_capita and the same with your GDP variable. You can also include the square of the log (not the log of the square). Then you will estimate an elasticity. This won't solve the problem of little variation over time, but my guess is that it's a better starting point.
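The remark "the square of the log (not the log of the square)" has a simple reason: log(x²) = 2·log(x), so including it next to log(x) would be perfectly collinear, whereas (log x)² is a genuinely new regressor. With both log(x) and (log x)² in a log-log model, the elasticity is no longer a single number but b1 + 2·b2·log(x). A minimal Python sketch; the GDP values are illustrative and lie within the range of the summary table in the first post, and b1, b2 are hypothetical.

```python
import math

gdp_pc = [104.27, 850.0, 6185.0, 38542.72]  # illustrative values

log_x = [math.log(v) for v in gdp_pc]
log_of_square = [math.log(v ** 2) for v in gdp_pc]
square_of_log = [lx ** 2 for lx in log_x]

# log(x^2) is an exact multiple of log(x): perfectly collinear.
print([round(a / b, 6) for a, b in zip(log_of_square, log_x)])  # [2.0, 2.0, 2.0, 2.0]
# (log x)^2 / log(x) equals log(x) itself, which varies with x.
print([round(a / b, 2) for a, b in zip(square_of_log, log_x)])


def elasticity(x, b1, b2):
    """d ln(y) / d ln(x) for ln(y) = ... + b1*ln(x) + b2*ln(x)**2."""
    return b1 + 2 * b2 * math.log(x)


# Hypothetical coefficients, for illustration only.
print(round(elasticity(1000.0, 1.5, -0.05), 3))  # 0.809
```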



                      • #12
                        Replying to Maarten Buis (#10):
                        Government ideology is measured by party orientation with respect to economic policy, coded from the description of the party in the sources using the following criteria. Right: parties defined as conservative, Christian democratic, or right-wing. Left: parties defined as communist, socialist, social democratic, or left-wing (Source: DPI2020). It has its flaws, but for government ideology it's probably the best available approximation of the real values. I carried out the Hausman test and found that the fixed-effects model was the most appropriate. However, could I argue in my thesis that, despite the result of the Hausman test, I went with RE because there is not enough variation within countries? Or should the Hausman test be decisive here?
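For reference on what the Hausman test compares: for a single coefficient, the classic statistic is H = (b_FE - b_RE)² / (Var(b_FE) - Var(b_RE)), compared with a chi-squared distribution with 1 degree of freedom. A minimal Python sketch with hypothetical estimates (not taken from this thread). Note that this classic form assumes the RE estimator is fully efficient, which is hard to defend when cluster-robust standard errors are needed, as in the models above; in that case a robust variant of the test is usually recommended.

```python
import math


def hausman_one_coef(b_fe, se_fe, b_re, se_re):
    """Classic Hausman statistic for one coefficient and its chi2(1) p-value."""
    var_diff = se_fe ** 2 - se_re ** 2
    if var_diff <= 0:
        raise ValueError("Var(b_FE) - Var(b_RE) must be positive for this form")
    h = (b_fe - b_re) ** 2 / var_diff
    # Upper tail of chi2(1): P(Z**2 > h) = erfc(sqrt(h / 2)) for standard normal Z.
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value


# Hypothetical FE and RE estimates of the same coefficient.
h, p = hausman_one_coef(b_fe=0.30, se_fe=0.10, b_re=0.10, se_re=0.06)
print(round(h, 2), round(p, 4))  # 6.25 0.0124
```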

                        Replying to Jeff Wooldridge (#11):
                        I followed your advice and found that this might be the more appropriate way to use the variables. However, I have two questions regarding this:
                        1) What is the reason for taking the log of CO2 and GDP? Why is it more suitable in this case than not taking the log?
                        2) What do you mean by "estimate an elasticity"? What should I do with it?

                        Thank you both for the help!



                        • #13
                          Which party do you consider in each country? The ones that form the government? That is not the right choice for every country. You mentioned Belgium, where the political institutions result in extremely elaborate coalitions that typically include many radically different flavours of political parties, so the average political orientation does not change much. Another such example is Switzerland, although the institutions, level of conflict, historical reasons, etc. are completely different.
                          Maarten L. Buis



                          • #14
                            Stijn: Taking the log of variables that have wide variation and are always strictly positive is a staple of empirical economics. Without the quadratic, you will get a coefficient, such as 0.341, which will tell you that a 1% increase in GDP per capita leads to a .341% increase in CO2 emissions per capita. To me, this makes more sense than whatever units of measurement your CO2 variable is in. Plus, it's free of global inflation effects on GDP. As a statistical matter, you will reduce the chance of outliers influencing the results, and the "traditional" assumptions of normality and homoskedasticity are usually closer to being true. I recommend you read Chapter 6 in my introductory econometrics book. Older versions are easy to find ....
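A quick check of the interpretation above: in ln(y) = a + b·ln(x), raising x by p% multiplies y by (1 + p/100)^b. A minimal Python sketch using the illustrative coefficient 0.341 from the post; for small changes the result is close to b·p%, which is why the coefficient can be read directly as an elasticity.

```python
def pct_change_in_y(b, pct_change_in_x):
    """Exact % change in y implied by ln(y) = a + b*ln(x) when x rises by pct_change_in_x %."""
    return ((1 + pct_change_in_x / 100) ** b - 1) * 100


b = 0.341  # illustrative elasticity from the post above
print(round(pct_change_in_y(b, 1), 3))   # 0.34 (close to 0.341)
print(round(pct_change_in_y(b, 50), 1))  # 14.8 (vs. 0.341 * 50 = 17.05)
```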

