
  • Year and industry fixed effects

    Hi guys,

    In my regression model, I need to control for year and industry fixed effects.
    However, I'm not familiar with using this in a regression and the information on the internet does not help me any further.

    See the CODE below for an example of what my dataset looks like. Var1 is my dependent variable and var2 is my independent variable, which is a dummy variable. The others are all control variables, some of which are dummies. The first two columns are the fiscal year (fyear) and the CIK (identifier) number.

    I read that the code for including year-fixed effects is xtreg, but how does this work? Do I simply type xtreg instead of regress to conduct the regression analysis? And is the only difference between the two the year-fixed effects? Or do I need another code? I already 'created' panel data by successfully using the xtset command.

    Second, how does this work for industry fixed effects? I know that I need to have the SIC (industry) codes for the firms, but when I do, what code(s) do I need to type in order to get industry fixed effects, so that I will run the regression with both year- and industry-fixed effects?

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double fyear long CIK int var1 byte var2 float(logSIZE_w MTB_w) int AGE_IPO_w float(spiDummy LOGoneplusNBSEG_w LOGoneplusNGSEG_w ROA_w) byte(MAdummy SEOdummy DummyDLW)
    2015  827871 100 0  7.234498 11.401678 1 0 1.3862944  .6931472   .10729168 0 1 1
    2016  827871  99 0  7.103163  5.967246 2 1 1.3862944  .6931472    .1150776 0 0 1
    2017  827871 100 0  6.675983  3.273505 3 1 1.3862944  .6931472  .035843123 0 0 1
    2012  858339  96 0  6.765924 1.0428375 0 1  .6931472  .6931472  .004796609 0 1 1
    2013  858339 101 0  7.988432 1.1964637 1 1 1.3862944  .6931472  .029811475 0 1 1
    2014  858339 106 0  7.729757 1.2981538 2 1  2.772589  .6931472  .013939634 1 1 1
    2015  858339 101 0   7.04233  .9107052 3 0  2.772589  .6931472   .28276432 0 0 1
    2016  858339 104 0  7.130499  1.179099 4 1  2.772589  .6931472   .06878565 0 0 1
    2017  858339 104 0  9.065615  1.209964 5 1  .6931472  .6931472   .07283562 1 0 1
    2012  880177  90 1  6.189701 2.8622916 0 0  .6931472  .6931472  .015024709 0 1 1
    2013  880177  93 1  5.490572  1.738865 1 0  .6931472  .6931472   .02839049 0 0 1
    2015  883980  93 0  9.468133  1.310466 0 1  2.564949   1.94591   .01646152 0 1 1
    2016  883980  92 0  9.468133 1.2168345 1 1  2.564949   1.94591 .0030276144 0 0 1
    2017  883980  90 0  9.468133  1.194136 2 1  2.564949   1.94591 .0080108065 0 1 1
    2017  912766  90 0  7.840877 1.0732107 0 1  .6931472  2.772589   .02082115 0 1 1
    2014  921299 103 1  7.386668  3.840873 0 0 1.3862944  2.772589  .007237051 0 1 1
    2015  921299 108 0  7.543635  4.595307 1 0 1.3862944  2.772589   .10401749 0 0 1
    Last edited by Pepijn Peters; 15 Oct 2019, 05:40.

  • #2
    There are two steps involved in fixed-effects modeling. The first is to -xtset- the data. If you read -help xtset- you will see that -xtset- allows you to specify one or two variables. The first variable is mandatory and identifies the panel or group; in your case that would be the SIC code. This variable must be numeric. I have only limited experience with SIC codes: the ones I have used are numbers, but if some contain non-numeric characters, you must first -encode- them to create a new numeric variable that -xtset- will accept. The second variable in -xtset- is optional: it specifies a time variable. It is allowed only if the combination of the panel variable and the time variable uniquely identifies observations in the data set. That is, you cannot specify a time variable in -xtset- if any panel has more than one observation for the same year. As your example data does not contain an SIC code variable (or if it does, I can't recognize it), I don't know whether you have this problem. In any case, unless you will eventually need analyses requiring lags, leads, or autocorrelations, the second variable in -xtset- is unnecessary.
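A minimal sketch of the preparation described above, assuming a string SIC code variable. The variable names (`sic_str`, `sic`) and all values are hypothetical. `encode` builds the numeric panel variable, `isid` confirms that firm-year identifies each row, and `duplicates report` shows whether an SIC code appears more than once in a year, which is exactly what would make `xtset sic fyear` fail.

```stata
* Hypothetical firm-year rows; sic_str stands in for a string SIC code variable
clear
input str4 sic_str long CIK int fyear
"2834" 1001 2015
"2834" 1001 2016
"3674" 1002 2015
"3674" 1003 2015
end

* Turn the string code into a numeric panel variable (value labels keep the codes)
* If every code is purely numeric, -destring sic_str, generate(sic)- is an alternative
encode sic_str, generate(sic)

isid CIK fyear                  // errors out unless firm-year identifies each row
duplicates report sic fyear     // surplus copies mean -xtset sic fyear- would fail

xtset sic                       // panel variable only: no time variable
```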

    Once your data are -xtset-, you can then use any of Stata's -xt- commands, such as -xtreg-. When you use an -xt- command, fixed effects corresponding to the panel variable in your -xtset- command are automatically included in the model (but results for those effects are not shown in the output). The time variable, even if specified in -xtset-, is not automatically included in the model. If you want time fixed effects as well as industry, you will need to explicitly include the time variable in the model. So what we're looking at is something like this:

    Code:
    xtset sic_code_variable // AND POSSIBLY year
    xtreg outcome predictor_variables i.year, fe
    will do a regression that includes fixed effects for both SIC code and year. Results will be shown for the years, but not for the SIC codes.
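To see that `i.year`, not the time variable in `xtset`, is what adds the year effects, here is a sketch on simulated data (all names and numbers are made up). Industry fixed effects come from `xtset sic` plus `, fe`, year fixed effects from `i.year`, and the slope on `x` should match a plain `regress` with explicit industry and year dummies.

```stata
* Simulated data (hypothetical): 20 industries, 30 firm-years each, 2012-2017
clear
set seed 12345
set obs 600
generate int sic  = ceil(_n/30)
generate int year = 2012 + mod(_n, 6)
generate double x = rnormal()
generate double y = 1 + 0.5*x + sic/10 + (year - 2012)/2 + rnormal()

xtset sic                        // industry is the panel; year is not needed here
xtreg y x i.year, fe             // industry FE absorbed, year dummies reported
scalar b_xtreg = _b[x]

regress y x i.sic i.year         // same model with explicit dummies
scalar b_ols = _b[x]
display scalar(b_xtreg) - scalar(b_ols)   // difference is numerically zero
```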

    • #3
      Hi Clyde,

      Thanks for your comprehensive response; this really helps!

      Let me tell you a bit more about my procedure:

      I did the xtset already, on CIK and fyear. CIK is a unique company identifier. From what I have read about fixed effects, when I include firm fixed effects (based on CIK numbers), I do not need industry fixed effects (they are collinear).
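A sketch of the collinearity point on simulated data, where each hypothetical `firm` belongs to exactly one `industry`: once firm fixed effects are in the model, Stata notes the industry dummies as omitted and the slope on `x` is unchanged.

```stata
* Simulated panel (hypothetical): 80 firms observed 2013-2017,
* each firm belonging to exactly one of 8 industries
clear
set seed 2019
set obs 400
generate long firm     = ceil(_n/5)
generate int  year     = 2013 + mod(_n, 5)
generate int  industry = ceil(firm/10)
generate double x = rnormal()
generate double y = 0.5*x + industry/4 + (year - 2013)/3 + rnormal()

xtset firm year
xtreg y x i.industry i.year, fe   // industry dummies noted as omitted (collinear)
scalar b_with = _b[x]
xtreg y x i.year, fe              // same estimate without i.industry
scalar b_without = _b[x]
```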

      Now, I did the xtreg regression on my dependent, independent and control variables, and also included i.fyear at the end of the command (to include year fixed effects). I think what you say is that the firm fixed effects are automatically incorporated in the regression because I did the xtset on CIK numbers, right? The "number of groups" shown in the regression output is exactly the number of different CIK numbers in my sample, so I think this is true.

      Now what bothers me is that when I used the regress command, the output gave me a large, significant positive coefficient on the independent variable. When I use xtreg, the output gives me an insignificant negative coefficient. Is there anything I could do here?

      If you need an example, please ask.
      Last edited by Pepijn Peters; 16 Oct 2019, 00:47.

      • #4
        Pepijn:
        two comments about your last reply:
        1) as per the FAQ, please share what you typed and what Stata gave you back. It is worth more than tons of words: it is always difficult to explain qualitatively what is essentially quantitative;
        2) you're probably comparing two different regression models. What happens when, in -regress-, you add the -i.CIK- predictor to the right-hand side of your regression equation?
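Point 2 can be illustrated on simulated data (a hypothetical setup, not Pepijn's data) in which an unobserved firm effect `a` raises both `x` and `y` and the true within-firm slope is -0.5. Pooled `regress` then returns a positive slope, while `xtreg, fe` and `regress` with `i.firm` agree on a negative one, which is the kind of sign change described in #3.

```stata
* Simulated panel (hypothetical): firm effect a is correlated with x,
* and the true within-firm slope of x is -0.5
clear
set seed 1016
set obs 200
generate long firm = _n
generate double a = 3*rnormal()           // unobserved firm effect
expand 5
bysort firm: generate int year = 2012 + _n
generate double x = a + rnormal()
generate double y = -0.5*x + 2*a + rnormal()

regress y x i.year                         // pooled OLS: a is omitted, slope biased upward
scalar b_pooled = _b[x]

xtset firm year
xtreg y x i.year, fe                       // within estimator: close to -0.5
scalar b_fe = _b[x]

regress y x i.year i.firm                  // firm dummies: same slope as -xtreg, fe-
scalar b_lsdv = _b[x]
```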
        Kind regards,
        Carlo
        (Stata 19.0)

        • #5
          Hi Carlo,

          1. Please see below for the two outputs of the regressions. The first is the output when I use regress as the command (and include i.fyear) and the second is the output when I use xtreg as the command (and include i.fyear).

          Regress:
          Code:
                Source |       SS           df       MS      Number of obs   =     1,216
          -------------+----------------------------------   F(17, 1198)     =     20.16
                 Model |  17587.3082        17  1034.54754   Prob > F        =    0.0000
              Residual |  61489.7938     1,198  51.3270399   R-squared       =    0.2224
          -------------+----------------------------------   Adj R-squared   =    0.2114
                 Total |   79077.102     1,215  65.0840345   Root MSE        =    7.1643
          
          -----------------------------------------------------------------------------------
                 bogindex_w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          ------------------+----------------------------------------------------------------
                        egc |   1.455403   .6179286     2.36   0.019     .2430606    2.667746
                  logSIZE_w |   .0537797   .2313815     0.23   0.816    -.4001784    .5077378
                      MTB_w |  -.3560624   .1056944    -3.37   0.001    -.5634291   -.1486957
                    AGE_IPO |   .6235395   .2403697     2.59   0.010      .151947    1.095132
                   spiDummy |  -3.078925   .4488517    -6.86   0.000    -3.959548   -2.198302
          LOGoneplusNBSEG_w |  -.8499284   .5376904    -1.58   0.114    -1.904848    .2049912
          LOGoneplusNGSEG_w |  -3.351886   .3561815    -9.41   0.000    -4.050695   -2.653077
                      ROA_w |   8.455898   2.777385     3.04   0.002     3.006819    13.90498
                LogNITEMS_w |  -3.303121     4.2201    -0.78   0.434    -11.58273    4.976487
                    MAdummy |  -1.995824   1.146018    -1.74   0.082     -4.24425    .2526022
                   SEOdummy |   1.980199   .4816958     4.11   0.000     1.035138    2.925261
                   DummyDLW |   .6609418   1.343901     0.49   0.623     -1.97572    3.297604
                            |
                      fyear |
                      2013  |   5.618195   2.044632     2.75   0.006     1.606738    9.629652
                      2014  |   7.249985   1.967891     3.68   0.000     3.389088    11.11088
                      2015  |   8.278689   1.972914     4.20   0.000     4.407937    12.14944
                      2016  |   8.499862   1.990884     4.27   0.000     4.593855    12.40587
                      2017  |   8.921632   2.011748     4.43   0.000     4.974691    12.86857
                            |
                      _cons |   107.2165   23.36503     4.59   0.000     61.37559    153.0574
          -----------------------------------------------------------------------------------
          xtreg:
          Code:
          Random-effects GLS regression                   Number of obs     =      1,216
          Group variable: CIK                             Number of groups  =        386
          
          R-sq:                                           Obs per group:
               within  = 0.1565                                         min =          1
               between = 0.1395                                         avg =        3.2
               overall = 0.1243                                         max =          6
          
                                                          Wald chi2(17)     =     185.91
          corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
          
          -----------------------------------------------------------------------------------
                 bogindex_w |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          ------------------+----------------------------------------------------------------
                        egc |  -.3222823    .247269    -1.30   0.192    -.8069207     .162356
                  logSIZE_w |   .1447385   .1457261     0.99   0.321    -.1408795    .4303564
                      MTB_w |  -.1236751   .0516837    -2.39   0.017    -.2249733   -.0223768
                    AGE_IPO |  -.2454837   .2693566    -0.91   0.362    -.7734129    .2824455
                   spiDummy |  -.2519081   .1582281    -1.59   0.111    -.5620294    .0582132
          LOGoneplusNBSEG_w |  -.1903097   .3852702    -0.49   0.621    -.9454255    .5648061
          LOGoneplusNGSEG_w |   -1.30022   .3329519    -3.91   0.000    -1.952794   -.6476464
                      ROA_w |   .8183385   .9661106     0.85   0.397    -1.075203     2.71188
                LogNITEMS_w |  -3.484442   3.114454    -1.12   0.263     -9.58866    2.619776
                    MAdummy |   .1489838   .4720665     0.32   0.752    -.7762496    1.074217
                   SEOdummy |   .2076608   .1622771     1.28   0.201    -.1103966    .5257181
                   DummyDLW |   1.156666   2.549849     0.45   0.650    -3.840947    6.154278
                            |
                      fyear |
                      2013  |   2.675986   .7225257     3.70   0.000     1.259862    4.092111
                      2014  |     4.0907   .8469771     4.83   0.000     2.430655    5.750744
                      2015  |   4.976979   1.024558     4.86   0.000     2.968883    6.985075
                      2016  |   5.292592   1.234704     4.29   0.000     2.872617    7.712568
                      2017  |   6.174317   1.456429     4.24   0.000      3.31977    9.028865
                            |
                      _cons |   107.6912   17.48729     6.16   0.000     73.41677    141.9657
          ------------------+----------------------------------------------------------------
                    sigma_u |  6.6861248
                    sigma_e |  1.7548742
                        rho |  .93555179   (fraction of variance due to u_i)
          -----------------------------------------------------------------------------------
          What confuses me is that for xtreg, the output says "random-effects GLS regression". I thought this was a fixed effects regression?

          2. I get an error when I use i.CIK in the regression, namely 'matsize too small'.
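When `regress ... i.CIK` runs into the matsize limit, one built-in alternative is `areg`, which absorbs the firm dummies instead of estimating a coefficient for each one (raising the limit with `set matsize`, where the Stata flavor allows it, is the other route). A sketch on simulated data with 400 hypothetical firms; the slope on `x` should match `xtreg, fe`.

```stata
* Simulated panel (hypothetical): 400 firms, 2012-2017
clear
set seed 386
set obs 400
generate long firm = _n
generate double a = rnormal()             // firm effect
expand 6
bysort firm: generate int year = 2011 + _n
generate double x = a + rnormal()
generate double y = 0.3*x + a + rnormal()

xtset firm year
xtreg y x i.year, fe
scalar b_xtreg = _b[x]

areg y x i.year, absorb(firm)             // firm dummies absorbed, not reported
scalar b_areg = _b[x]
```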

          • #6
            Pepijn:
            you should type -xtreg, fe- to run a panel data regression with the -fe- specification; otherwise, Stata uses the -re- specification by default.
            That said, take a look at the following example and note that the coefficient on -age- is the same with -regress- (plus -i.idcode-) and -xtreg, fe-:
            Code:
            use "http://www.stata-press.com/data/r15/nlswork.dta"
            
            . xtset idcode year
                   panel variable:  idcode (unbalanced)
                    time variable:  year, 68 to 88, but with gaps
                            delta:  1 unit
            
            . xtreg ln_wage age if idcode<=4, fe
            
            Fixed-effects (within) regression               Number of obs     =         50
            Group variable: idcode                          Number of groups  =          4
            
            R-sq:                                           Obs per group:
                 within  = 0.0912                                         min =         11
                 between = 0.0122                                         avg =       12.5
                 overall = 0.0412                                         max =         15
            
                                                            F(1,45)           =       4.52
            corr(u_i, Xb)  = -0.1407                        Prob > F          =     0.0391
            
            ------------------------------------------------------------------------------
                 ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     age |   .0157496   .0074109     2.13   0.039     .0008233    .0306758
                   _cons |   1.345101   .2231985     6.03   0.000     .8955561    1.794646
            -------------+----------------------------------------------------------------
                 sigma_u |  .23522653
                 sigma_e |  .32804009
                     rho |  .33957838   (fraction of variance due to u_i)
            ------------------------------------------------------------------------------
            F test that all u_i=0: F(3, 45) = 6.62                       Prob > F = 0.0008
            
            . regress ln_wage age i.idcode if idcode<=4
            
                  Source |       SS           df       MS      Number of obs   =        50
            -------------+----------------------------------   F(4, 45)        =      5.66
                   Model |  2.43763115         4  .609407788   Prob > F        =    0.0009
                Residual |  4.84246358        45  .107610302   R-squared       =    0.3348
            -------------+----------------------------------   Adj R-squared   =    0.2757
                   Total |  7.28009473        49  .148573362   Root MSE        =    .32804
            
            ------------------------------------------------------------------------------
                 ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                     age |   .0157496   .0074109     2.13   0.039     .0008233    .0306758
                         |
                  idcode |
                      2  |  -.3681284   .1341267    -2.74   0.009    -.6382734   -.0979834
                      3  |  -.5323265   .1320694    -4.03   0.000    -.7983278   -.2663252
                      4  |  -.1479344   .1451202    -1.02   0.313    -.4402216    .1443527
                         |
                   _cons |   1.625695    .216915     7.49   0.000     1.188806    2.062585
            ------------------------------------------------------------------------------
            
            .
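The default can be checked directly. A sketch on simulated data (names hypothetical): `xtreg` without an option and `xtreg, re` give identical estimates, while `, fe` switches to the within estimator.

```stata
* Simulated panel (hypothetical): 100 units observed 8 times
clear
set seed 6
set obs 100
generate long id = _n
generate double u = rnormal()             // unit effect, unrelated to x here
expand 8
bysort id: generate int t = _n
generate double x = rnormal()
generate double y = 1 + 0.5*x + u + rnormal()

xtset id t
xtreg y x              // no option: header reads "Random-effects GLS regression"
scalar b_default = _b[x]
xtreg y x, re          // explicit -re-: identical estimates
scalar b_re = _b[x]
xtreg y x, fe          // header reads "Fixed-effects (within) regression"
scalar b_fe = _b[x]
```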
            Kind regards,
            Carlo
            (Stata 19.0)

            • #7
              Hi Carlo,

              Including the "fe" gives me:

              Code:
               xtreg bogindex_w egc logSIZE_w MTB_w AGE_IPO spiDummy LOGoneplusNBSEG_w LOGoneplusNGSEG_w ROA_w LogNITEMS_w MAdummy SEOdummy DummyDLW i.fyear, fe
              note: DummyDLW omitted because of collinearity
              
              Fixed-effects (within) regression               Number of obs     =      1,216
              Group variable: CIK                             Number of groups  =        386
              
              R-sq:                                           Obs per group:
                   within  = 0.1936                                         min =          1
                   between = 0.0082                                         avg =        3.2
                   overall = 0.0021                                         max =          6
              
                                                              F(16,814)         =      12.21
              corr(u_i, Xb)  = -0.2816                        Prob > F          =     0.0000
              
              -----------------------------------------------------------------------------------
                     bogindex_w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
              ------------------+----------------------------------------------------------------
                            egc |  -.3165931   .2392314    -1.32   0.186    -.7861763    .1529901
                      logSIZE_w |   .5634882    .155171     3.63   0.000     .2589057    .8680707
                          MTB_w |  -.1771036   .0518687    -3.41   0.001    -.2789158   -.0752914
                        AGE_IPO |   -1.15784   .8905745    -1.30   0.194    -2.905933    .5902535
                       spiDummy |  -.1867412   .1527634    -1.22   0.222    -.4865977    .1131153
              LOGoneplusNBSEG_w |   .1986967   .4001573     0.50   0.620    -.5867651    .9841585
              LOGoneplusNGSEG_w |   .6950278   .3947262     1.76   0.079    -.0797734    1.469829
                          ROA_w |   .1588677   .9316388     0.17   0.865     -1.66983    1.987565
                    LogNITEMS_w |   1.750145   3.300894     0.53   0.596    -4.729122    8.229412
                        MAdummy |   .4000297   .4592032     0.87   0.384    -.5013323    1.301392
                       SEOdummy |   .0732174    .156622     0.47   0.640    -.2342131    .3806479
                       DummyDLW |          0  (omitted)
                                |
                          fyear |
                          2013  |   1.845715   1.115048     1.66   0.098    -.3429941    4.034424
                          2014  |   4.060266   1.901095     2.14   0.033     .3286398    7.791891
                          2015  |    5.92071   2.760175     2.15   0.032     .5028104    11.33861
                          2016  |   7.040381   3.629492     1.94   0.053    -.0838861    14.16465
                          2017  |   8.677354   4.496175     1.93   0.054    -.1481095    17.50282
                                |
                          _cons |   74.34884   18.41622     4.04   0.000     38.19997    110.4977
              ------------------+----------------------------------------------------------------
                        sigma_u |  8.3776246
                        sigma_e |  1.7548742
                            rho |  .95796604   (fraction of variance due to u_i)
              -----------------------------------------------------------------------------------
              F test that all u_i=0: F(385, 814) = 49.76                   Prob > F = 0.0000
              The regress version does not work: when I include "i.CIK" I get an error that matsize is too small, so I cannot incorporate firm fixed effects with regress. But if what you showed also holds for my data, then this would be the 'final' output, right?

              When I do the univariate analysis using "regress bogindex_w egc", the coefficient is 2.44 (positive), and in this xtreg example the coefficient is -.42. Isn't this strange?

              Another point: the output says that DummyDLW is omitted because of collinearity. Why is this? When I run correlate on all my variables, DummyDLW is not correlated with any other variable by more than 0.1.
              Last edited by Pepijn Peters; 16 Oct 2019, 04:57.

              • #8
                Pepijn:
                - with so many panels (and hence so many -i.CIK- dummies), you need to increase your -matsize- to run an -fe- regression with -regress-: this is one of the reasons why -xtreg, fe- should be your first choice if you have a panel dataset and you want to run an -fe- regression;
                - if your -DummyDLW- does not vary within the same panel as the years go by (i.e., it is time-invariant), it is, as expected, omitted by the -fe- estimator.
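A quick way to check whether a variable is time-invariant within panels, sketched on a toy panel with hypothetical values. Pairwise correlations will not reveal this kind of collinearity, because it is with the full set of firm dummies rather than with any single regressor. `xtsum` reports a within standard deviation of 0 for such a variable, and the `bysort` line flags any panel in which it changes.

```stata
* Toy panel (hypothetical values): DummyDLW differs across firms,
* but never changes within a firm
clear
input long CIK int fyear byte DummyDLW
1001 2015 1
1001 2016 1
1001 2017 1
1002 2015 0
1002 2016 0
end

xtset CIK fyear
xtsum DummyDLW                    // "within" std. dev. of 0 => time-invariant

* 1 if DummyDLW takes more than one value within the firm, 0 otherwise
bysort CIK (DummyDLW): generate byte dlw_varies = DummyDLW[1] != DummyDLW[_N]
tabulate dlw_varies               // all 0 => -xtreg, fe- will omit DummyDLW
```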
                Kind regards,
                Carlo
                (Stata 19.0)

                • #9
                  Hi Carlo,

                  Thanks for your help! So is there anything to do about the non-significance of the independent variable (egc)? It is significant when I just regress bogindex_w on egc (univariate) or when I do not incorporate fixed effects. Or is this just the way it is?

                  • #10
                    Pepijn:
                    I'm personally really skeptical about univariate regressions whenever the main goal of a given research project should be achieved with a multiple regression.
                    That said, I would not compare the significance you got from the univariate and the multiple regression.
                    Besides, as the
                    Code:
                     
                     F test that all u_i=0: F(385, 814) = 49.76                   Prob > F = 0.0000
                    that appears as a footnote under your -xtreg, fe- output table reaches statistical significance (that is, there is evidence that panel-wise effects exist), I would kick out -regress-.
                    As an aside, I would check via -hausman- whether -fe- is actually the way to go with your dataset.
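A sketch of the `hausman` check on simulated data (hypothetical names) where the firm effect is correlated with `x`. The consistent `fe` estimates are stored first and the efficient `re` estimates second, as `hausman` expects; a small p-value speaks against the `re` specification.

```stata
* Simulated panel (hypothetical): firm effect a correlated with x
clear
set seed 335
set obs 200
generate long firm = _n
generate double a = 3*rnormal()
expand 5
bysort firm: generate int year = 2012 + _n
generate double x = a + rnormal()
generate double y = -0.5*x + 2*a + rnormal()

xtset firm year
quietly xtreg y x, fe
estimates store fe
quietly xtreg y x, re
estimates store re
hausman fe re          // consistent estimator first, efficient one second
```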
                    Kind regards,
                    Carlo
                    (Stata 19.0)

                    • #11
                      Dear Clyde Schechter,
                      As you pointed out,
                      "If you want time fixed effects as well as industry, you will need to explicitly include the time variable in the model...",
                      so in the regressions I have run so far, I used the following code:
                      Code:
                      xtset industry year     
                      xtreg depvar indepvars i.year, fe
                      Thereby including both industry fixed effects and year fixed effects.
                      However, I recently came across a term called two-way joint fixed effects. The sentence reads: "we introduce the province fixed effect and two-way joint fixed effects among year, province, and industry".
                      What does this mean? Also, how should the data be set in such cases (which panel variable, province or industry)? As I understand it, province is the biggest aggregation, containing industries and firms,
                      so should we run the following?
                      Code:
                      xtset province year
                      xtreg depvar indepvars i.year i.industry, fe
                      I am quite confused about this. Also, if we control for industry fixed effects, can we have provincial fixed effects as well? Is this the two-way joint fixed effects model?

                      • #12
                        Iai:
                        please note that if -fe- is actually the way to go with your dataset, all time-invariant predictors will be washed out by the -fe- machinery (and -i.industry- is a possible example).
                        As an aside, to retrieve multiple -fe- you may want to consider the community-contributed command -reghdfe-.
                        Finally, while I've never heard of two-way joint fe (but it may well be my fault), you can probably achieve it by grouping -province- and -industry- via the -egen- function -group()- and adding -i.year-.
                        Obviously, the more complex the model gets, the harder it is to explain/disseminate its results.
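A sketch of the grouping suggestion on simulated data (all names hypothetical). `egen ... group()` builds one identifier per province-industry cell, that identifier becomes the panel variable, and `i.year` adds the year effects. Because several observations share each cell and year, `xtset` is given no time variable. `areg` serves as a built-in cross-check. The `reghdfe` line is left commented out because the command is community-contributed and must be installed first; its exact syntax should be confirmed in `help reghdfe`.

```stata
* Simulated data (hypothetical): observations spread over 10 provinces and 8 industries
clear
set seed 367
set obs 1200
generate int province = 1 + floor(10*runiform())
generate int industry = 1 + floor(8*runiform())
generate int year     = 2010 + mod(_n, 8)
generate double x = rnormal()
generate double y = 0.3*x + province/5 + industry/4 + (year - 2010)/3 + rnormal()

egen long prov_ind = group(province industry)   // one id per province-industry cell
xtset prov_ind                                   // many rows per cell-year: no time variable
xtreg y x i.year, fe
scalar b_xt = _b[x]

areg y x i.year, absorb(prov_ind)                // built-in cross-check: same slope
scalar b_areg = _b[x]

* Community-contributed alternative (ssc install reghdfe), something like:
* reghdfe y x, absorb(province#industry year)
```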
                        Kind regards,
                        Carlo
                        (Stata 19.0)

                        • #13
                          "we introduce the province fixed effect and two-way joint fixed effects among year, province, and industry"
                          Authors can invent their own terminology. Without seeing the context it is not possible to answer. You provide no reference.

                          • #14
                            Thanks Carlo Lazzaro & Eric de Souza for the help. The command Carlo suggested, reghdfe, was known to me (though I have never used it), but I couldn't recall it here.
                            My sincere apologies to Eric de Souza for not quoting the reference. The paper is titled "Corporate governance quality and financial leverage: Evidence from China" (a recent paper: https://www.sciencedirect.com/science/article/pii/S1057521920302933?dgcid=raven_sd_search_email).

                            In the above paper, they have stated, "Although the year and industry fixed effects are included to control for unobserved time-variant or industry-variant influences, we do not focus on regional differences on financing. Such factors could also be time-variant at the same province and industry. To rule out such possibilities, we add other fixed effects in our baseline model: province fixed effects and two-way joint fixed effects among year, province, and industry".

                            Their table of fixed-effects specifications (Models 1 to 6) reads as follows:
                            Year FE: Yes in all six models
                            Industry FE: Yes in all six models
                            Province FE: Yes in four models
                            Province#Year FE: Yes in two models
                            Industry#Year FE: Yes in two models
                            I have a few questions based on the table:
                            First, can we include both Industry FE and Province FE, as in Model 2? If industry is time-invariant, won't it subsume the power of the Province FE? Or under what circumstances can we have both?

                            Second, model 3 requires the following codes right?

                            Code:
                            xtset province year   // I assume that we should set the panel at the highest aggregate level, here, province
                            egen province_year=group(province year)       // grouping province and year 
                            xtreg depvar indepvars i.industry i.province_year, fe
                            Any help in this regard will be extremely helpful
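A sketch, on simulated data with hypothetical names, of how such specifications might be written with built-in commands. Industries cut across provinces, so industry and province dummies can enter together. A province-by-year identifier from `egen ... group()` already covers the separate year and province effects, because each province-year cell lies within one province and one year; any year dummies added next to it would be collinear and reported as omitted. Here that identifier is absorbed directly with `areg` rather than set as an `xtset` panel with a time variable.

```stata
* Simulated data (hypothetical): 12 provinces, 9 industries, 2010-2017
clear
set seed 406
set obs 2000
generate int province = 1 + floor(12*runiform())
generate int industry = 1 + floor(9*runiform())
generate int year     = 2010 + floor(8*runiform())
generate double x = rnormal()
generate double y = 0.4*x + province/6 + industry/5 + (year - 2010)/4 + rnormal()

* Year, industry, and province FE together: neither set is nested in the other
regress y x i.year i.industry i.province

* Province#Year FE (covers year and province FE) plus industry FE
egen long prov_year = group(province year)
areg y x i.industry, absorb(prov_year)
scalar b_areg = _b[x]
regress y x i.industry i.prov_year        // same slope with explicit dummies
scalar b_lsdv = _b[x]

* Adding Industry#Year FE as well; some of its dummies are collinear with
* the province-year cells and are reported as omitted
egen long ind_year = group(industry year)
areg y x i.ind_year, absorb(prov_year)
```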

                            Incidentally, I also came across a paper on two-way fixed effects: "Two-way fixed effects estimators with heterogeneous treatment effects" by Clément de Chaisemartin and Xavier D'Haultfœuille.





                            • #15
                              You should add i.year
