
  • Fixed effects, robust standard errors and clustered standard errors

    Hi,

    I'm running a difference-in-differences analysis of returns with multiple interaction terms: three periods and one treatment group.
    I use 500 firms and 11 sectors (which I account for with i.sector). I also include five control variables, such as size and ROE.
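    As an illustration of the specification described above (simulated data, not the poster's): with only the group dummy, the two period dummies and their interactions, and no controls or sector dummies, the model is saturated, so each interaction coefficient equals the raw difference-in-differences of cell means. A minimal numpy sketch, in which the hypothetical arrays treat, crash and recovery stand in for Top20_ESG, Crash and Recovery:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical balanced panel: 200 firms observed in 3 periods
# (0 = pre-crisis, 1 = Crash, 2 = Recovery); firms 0-99 form the treated group.
n_firms = 200
firm = np.repeat(np.arange(n_firms), 3)
period = np.tile(np.arange(3), n_firms)
treat = (firm < 100).astype(float)
crash = (period == 1).astype(float)
recovery = (period == 2).astype(float)
y = (0.5 + 0.2 * treat - 1.0 * crash + 0.3 * recovery
     - 0.4 * treat * crash + 0.1 * treat * recovery
     + rng.normal(size=firm.size))

# Same regressors as the DiD specification, without controls or sector dummies.
X = np.column_stack([np.ones_like(y), treat, crash, recovery,
                     treat * crash, treat * recovery])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

def cell_mean(t, p):
    """Average outcome for treatment status t in period p."""
    return y[(treat == t) & (period == p)].mean()

did_crash = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
did_recovery = (cell_mean(1, 2) - cell_mean(1, 0)) - (cell_mean(0, 2) - cell_mean(0, 0))
print(beta[4], did_crash)      # the same number, up to rounding
print(beta[5], did_recovery)   # the same number, up to rounding
```

    Once controls and sector dummies enter the model, the interaction coefficients are conditional on them and this identity no longer holds exactly.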

    My questions relate to fixed effects and the choice of standard-error adjustment.

    1) The dataset shows heteroskedasticity, so as I understand it I should use clustered (by firm) or robust standard errors. Is that correct? And what is the difference?

    2) I ran one regression without fixed effects and one with fixed effects. When I add the fixed effects, Stata omits all my control variables, my sectors and my treatment variable because of collinearity. What is the explanation for this? And is the model still correct?

    Thank you so much in advance!

    Best,
    Guest
    Last edited by sladmin; 10 Jun 2021, 14:46. Reason: anonymize original poster

  • #2
    For example, the constants estimated by my fixed-effects model and by my model without fixed effects differ a lot. Why is that?



    • #3
      Fixed effects will remove time-invariant characteristics. I suggest you do some searches or look in a textbook for the basic econometric procedure of a fixed effects estimator (the Stata manual for xtreg will also be useful).

      Therefore, presumably the variables that are being dropped are time-invariant (i.e. they are the same across time for each unit)? If not, please share your Stata commands and some info on the dataset so we can see what is going on.
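      To see why time-invariant variables are dropped: the fixed-effects estimator subtracts each firm's own average from every variable, so a variable that never changes within a firm (a sector dummy or, presumably, a treatment-group indicator such as Top20_ESG) becomes a column of zeros and cannot be estimated. A small numpy sketch on hypothetical data; the within() helper is illustrative, not a Stata function:

```python
import numpy as np

rng = np.random.default_rng(1)
n_firms, n_periods = 50, 3
firm = np.repeat(np.arange(n_firms), n_periods)

# Hypothetical regressors: a sector dummy that is fixed for each firm
# (time-invariant) and a size measure that changes from period to period.
sector = rng.integers(0, 11, size=n_firms)[firm]
sector_dummy = (sector == 3).astype(float)
size = rng.normal(size=firm.size)

def within(v, groups):
    """Subtract each firm's own mean: the fixed-effects 'within' transformation."""
    means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - means[groups]

X_within = np.column_stack([within(sector_dummy, firm), within(size, firm)])
print(np.abs(X_within[:, 0]).max())      # the sector column is all zeros
print(np.linalg.matrix_rank(X_within))   # rank 1: only size can be estimated
```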

      In a pooled cross-section with heteroskedasticity you should use robust standard errors, which adjust the standard errors for the heteroskedasticity. With a panel dataset you are probably better off using standard errors clustered by firm: the errors of the same firm are likely to be correlated over time, and clustered standard errors allow for that within-firm correlation as well as for heteroskedasticity.
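      The difference between the two can be sketched with the sandwich formula (a numpy illustration on simulated data; the function names are hypothetical). Robust (HC) standard errors let each observation have its own error variance; clustered (CR) standard errors first add up the score contributions of all observations of the same firm, so they also allow arbitrary correlation within a firm. With one observation per cluster the two coincide. The sketch omits finite-sample factors: Stata's -regress- multiplies the robust variance by n/(n-k) and the clustered one by G/(G-1)*(n-1)/(n-k), where G is the number of clusters.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and residuals."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    return beta, y - X @ beta

def robust_se(X, resid):
    """HC0 sandwich: each observation contributes its own squared residual."""
    bread = np.linalg.inv(X.T @ X)
    meat = (X * resid[:, None] ** 2).T @ X
    return np.sqrt(np.diag(bread @ meat @ bread))

def cluster_se(X, resid, groups):
    """CR0 sandwich: scores are summed within each cluster (firm) first, so
    correlation between a firm's observations enters the middle term."""
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        score = X[groups == g].T @ resid[groups == g]
        meat += np.outer(score, score)
    return np.sqrt(np.diag(bread @ meat @ bread))

# Hypothetical panel: both the regressor and the error have a persistent
# firm component, so observations of the same firm are correlated.
rng = np.random.default_rng(2)
n_firms, n_periods = 200, 10
firm = np.repeat(np.arange(n_firms), n_periods)
x = rng.normal(size=n_firms)[firm] + 0.1 * rng.normal(size=firm.size)
u = rng.normal(size=n_firms)[firm] + rng.normal(size=firm.size)
y = 1.0 + 0.5 * x + u
X = np.column_stack([np.ones_like(x), x])

beta, resid = ols(X, y)
hc = robust_se(X, resid)
cr = cluster_se(X, resid, firm)
print(hc[1], cr[1])   # the clustered SE on x is noticeably larger here
```

      Ignoring within-firm correlation understates the uncertainty most when, as in this simulation, both the regressor and the error are persistent within firms.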

      A regression estimated using FE will differ from OLS (I assume that is the alternative you are talking about) because FE removes time-invariant characteristics. This is the whole benefit of using FE! Suppose we think that some firms have better managers and that this explains to some degree why the outcome variable is higher for such firms (say the outcome variable is profit and better-managed firms are more profitable). But we can't measure "management skills", so they become an omitted variable. If we estimate using OLS, and management skill is correlated with our variable of interest, then the coefficient on that variable is biased. If we assume that management skills don't change over time (not an innocuous assumption, to note), then we can estimate using fixed effects to strip out this time-invariant characteristic (along with any time-invariant variables included in the model, which is why those get dropped).
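      The management example, and the question in #2 about the constants, can be reproduced in a small simulation (numpy, hypothetical data). Pooled OLS picks up the omitted skill through x, while the within estimator recovers the true slope. For the constant we assume, based on our reading of the methods section of the [XT] xtreg manual, that -xtreg, fe- adds back the grand means, so its constant equals mean(y) - mean(x)*b_fe; because the slopes differ, the constants differ too.

```python
import numpy as np

rng = np.random.default_rng(3)
n_firms, n_periods = 500, 10
firm = np.repeat(np.arange(n_firms), n_periods)

# Hypothetical data: unobserved, time-invariant "management skill" raises both
# the regressor x and the outcome y, so it is an omitted variable for pooled OLS.
skill = rng.normal(size=n_firms)[firm]
x = 1.0 + skill + rng.normal(size=firm.size)
y = 3.0 + 1.0 * x + 2.0 * skill + rng.normal(size=firm.size)   # true slope = 1

def within(v, groups):
    """Subtract each firm's own mean (the fixed-effects transformation)."""
    means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - means[groups]

# Pooled OLS of y on x, with skill omitted.
X = np.column_stack([np.ones_like(x), x])
cons_ols, slope_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Fixed effects: OLS on firm-demeaned data; skill is demeaned away entirely.
xw, yw = within(x, firm), within(y, firm)
slope_fe = (xw @ yw) / (xw @ xw)
# Constant in the style assumed for -xtreg, fe-: grand means added back.
cons_fe = y.mean() - slope_fe * x.mean()

print(f"pooled OLS: cons={cons_ols:.2f} slope={slope_ols:.2f}")  # slope biased towards 2
print(f"FE:         cons={cons_fe:.2f} slope={slope_fe:.2f}")    # slope close to 1
```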

      Best,
      Rhys



      • #4
        If you have -xtset- your data
        Code:
        xtset firm period
        then -xtreg- calculates standard errors robust to heteroskedasticity and arbitrary within-firm correlation whether you type -xtreg, robust- or -xtreg, robust cluster(firm)-: both options give the same cluster-robust standard errors.

        For the rest of your questions show exactly what you typed, and exactly what Stata returned to you.



        • #5
          Guest:
          as an aside to others' helpful replies:
          1) under -xtreg- (I assume you're using this -xt- command) both -robust- and -cluster- options do the very same job (as they tell Stata to adopt a cluster-robust standard error);
          2) it is no surprise that regressions with different specifications give different results. It is also expected that the fixed-effects estimator wipes out all time-invariant variables (-sector- is a case in point, as firms rarely change industry over time).
          That said, I'm under the impression that you need to increase your knowledge of panel data regression (which is a demanding research field): just take a look at the -xtreg- entry in the Stata .pdf manual and the related references.
          Rhys' and Joro's advice of acting on the FAQ and sharing what you typed and what Stata gave you back (within CODE delimiters, please) should become a habit when you post on this forum. Thanks.
          Last edited by sladmin; 10 Jun 2021, 14:47. Reason: anonymize original poster
          Kind regards,
          Carlo
          (StataNow 18.5)



          • #6
            I made the following two regressions (first without fixed effects, and second with fixed effects):

            xtreg RawReturn Top20_ESG Crash Recovery 1.Top20_ESG#1.Crash 1.Top20_ESG#1.Recovery i.GICSectors LN_assets Leverage Liquidity MBV ROA if Not20_ESG != 1, vce(cluster CompanyNo)

            xtreg RawReturn Top20_ESG Crash Recovery 1.Top20_ESG#1.Crash 1.Top20_ESG#1.Recovery i.GICSectors LN_assets Leverage Liquidity MBV ROA if Not20_ESG!= 1, fe vce(cluster CompanyNo)

            I get the following output for the first model:
            [Two screenshots of the Stata output; images not reproduced]

            For the second model I get:

            [Two screenshots of the Stata output; images not reproduced]



            To check the OLS assumptions, I have done the following:


            Linearity, Random Sample & Zero Conditional Mean

            I run the following in Stata to test for linearity and zero conditional mean:

            reg RawReturn Top20_ESG Crash Recovery 1.Top20_ESG#1.Crash 1.Top20_ESG#1.Recovery i.GICSectors LN_assets Leverage Liquidity MBV ROA if Not20_ESG != 1
            Why can I only do this with the reg command and not xtreg? And is that fine?

            predict pred, xb
            predict resid, resid
            scatter resid pred
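            With thousands of observations, a raw residual-versus-fitted scatter (which Stata also draws with -rvfplot- after -regress-) can be hard to read. One common complement is to average the residuals within bins of the fitted values: under linearity and a zero conditional mean the bin means should hover around zero, while a systematic pattern points to misspecification. A numpy sketch on simulated data; the helper names and the decile binning are our own choices:

```python
import numpy as np

def binned_residual_means(fitted, resid, n_bins=10):
    """Mean residual within each decile of the fitted values. Under linearity
    and a zero conditional mean these should all be close to zero."""
    edges = np.quantile(fitted, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, fitted, side="right") - 1, 0, n_bins - 1)
    return np.array([resid[idx == b].mean() for b in range(n_bins)])

def fit(x, y):
    """Fitted values and residuals from a simple OLS line."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta, y - X @ beta

rng = np.random.default_rng(4)
x = rng.uniform(0, 3, size=5000)
noise = rng.normal(size=x.size)

good = binned_residual_means(*fit(x, 1 + 2 * x + noise))   # correctly specified
bad = binned_residual_means(*fit(x, x ** 2 + noise))       # curvature ignored
print(np.round(good, 2))   # all near zero
print(np.round(bad, 2))    # positive at both ends, negative in the middle
```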

            [Screenshot: scatter plot of the residuals against the fitted values; image not reproduced]




            As the figure above does not look like the examples I have seen online, how should I interpret it with respect to linearity and the zero conditional mean?

            For the zero conditional mean, could we simply argue that, according to XX, there should be no omitted variables, and that to avoid this bias we have included the relevant variables recognized in the literature?

            Regarding the random sample: as we look at the S&P 500 and use all of it, I would argue that this fulfills the random sample assumption. Is that right?

            Multicollinearity
            We test for it using the correlation matrix and the VIF values:
            corr RawReturn ESG_score E_score S_score G_score LN_assets Leverage Liquidity MBV ROA
            vif

            (The VIF output and the correlation matrix could not be uploaded because of the attachment limit.)

            As the output shows, no correlation is higher than 0.7 (the literature argues that some correlation is expected, but below 0.7 is fine) and all VIFs are below 10 (also a threshold from the literature). Hence, we see no multicollinearity.
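            For reference, the VIF that Stata's -vif- reports after -regress- is 1/(1 - R^2_j), where R^2_j comes from regressing regressor j on the other regressors. A numpy sketch with hypothetical data also shows why pairwise correlations alone can miss multicollinearity: a variable that is nearly the sum of three others has only modest correlation with each of them, yet a very large VIF.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing column j
    on all other columns plus a constant."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        tss = ((X[:, j] - X[:, j].mean()) ** 2).sum()
        out[j] = tss / (resid @ resid)   # equals 1 / (1 - R^2_j)
    return out

# Hypothetical regressors: x4 is almost the sum of x1, x2 and x3, yet each
# pairwise correlation with x4 is only about 0.58.
rng = np.random.default_rng(5)
x1, x2, x3 = rng.normal(size=(3, 2000))
x4 = x1 + x2 + x3 + 0.1 * rng.normal(size=2000)
X = np.column_stack([x1, x2, x3, x4])

corr = np.corrcoef(X, rowvar=False)
print(np.round(np.abs(corr - np.eye(4)).max(), 2))  # largest pairwise |r|, below 0.7
print(np.round(vif(X), 1))                          # VIFs far above 10
```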

            Heteroskedasticity
            From the plot above we can see heteroskedasticity from the horizontal lines, correct? However, this can also be tested in Stata as follows:

            reg RawReturn Top20_ESG Crash Recovery 1.Top20_ESG#1.Crash 1.Top20_ESG#1.Recovery i.GICSectors LN_assets Leverage Liquidity MBV ROA if Not20_ESG != 1
            estat hettest

            (The Breusch-Pagan test output could not be uploaded because of the attachment limit.)
            But the output was:
            Ho: Constant variance
            Variables: fitted values of RawReturn
            chi2 = 2289.25
            Prob > chi2 = 0.0000

            That is, there is heteroskedasticity, so we apply robust standard errors.
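            The statistic above can be sketched as follows, assuming it is the default Breusch-Pagan/Cook-Weisberg form of -estat hettest- (normal errors, fitted values as the only variance regressor): divide the squared OLS residuals by their mean, regress them on the fitted values, and take half the explained sum of squares, which is chi2(1) under constant variance. A numpy sketch on simulated data whose error spread grows with x; the function name is hypothetical:

```python
import math
import numpy as np

def breusch_pagan_fitted(y, X):
    """Breusch-Pagan / Cook-Weisberg test with the OLS fitted values as the
    single variance regressor, in the form that assumes normal errors.
    Returns the chi2(1) statistic and its p-value."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    g = resid ** 2 / (resid @ resid / len(y))    # squared residuals scaled by their mean
    Z = np.column_stack([np.ones_like(fitted), fitted])
    gamma, *_ = np.linalg.lstsq(Z, g, rcond=None)
    ess = ((Z @ gamma - g.mean()) ** 2).sum()    # explained sum of squares
    stat = float(ess / 2)
    p_value = math.erfc(math.sqrt(stat / 2))     # upper tail of chi2 with 1 df
    return stat, p_value

# Hypothetical data whose error spread grows with x.
rng = np.random.default_rng(6)
x = rng.uniform(1, 5, size=2000)
y = 1 + 2 * x + x * rng.normal(size=x.size)
X = np.column_stack([np.ones_like(x), x])
stat, p = breusch_pagan_fitted(y, X)
print(f"chi2(1) = {stat:.2f}, Prob > chi2 = {p:.4f}")
```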



            So the above covers the OLS assumptions, and thereby the random-effects model. As we also run the fixed-effects model, how do these assumptions differ? As we understand it, the heteroskedasticity check would not need to be included, but otherwise they should be the same.



            We run this regression model for the top ESG group (as above), but we also run it for a single component, the E score (environmental score), i.e., Top_E instead, which puts different companies in the top group (same dataset, though). In addition, we run it for abnormal returns instead of raw returns.
            In theory, we should run these tests for each model, correct?



            Thank you so much in advance!!! It is really appreciated.

            Best,
            Guest
            Last edited by sladmin; 10 Jun 2021, 14:47. Reason: anonymize original poster



            • #7
              Guest:
              1) you have 202 panels with up to 79 theoretical observations per panel.
              Hence, it does not make sense to test for heteroskedasticity and/or autocorrelation, as both are very likely to be present in your dataset: go straight to cluster or robust standard errors (which, under -xtreg-, are the same thing), as they take both heteroskedasticity and autocorrelation into account;
              2) your -fe- estimator wipes out all time-invariant variables: no surprise there;
              3) there's no gain in running -regress- on your panel dataset.
              Last edited by sladmin; 10 Jun 2021, 14:48. Reason: anonymize original poster
              Kind regards,
              Carlo
              (StataNow 18.5)



              • #8
                Hi,

                Thank you very much for your response.
                I have tried to read more online about panel data. I ran the Hausman test and found that FE is preferred over RE. However, this does not imply that FE is better than the "normal" OLS regression I ran. Is that correctly understood? So could I present both results in my paper, i.e., one for the "normal" OLS and one adding fixed effects?
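                The Hausman statistic itself is simple once both models have been estimated. A sketch of the classic form (numpy; the coefficient and variance numbers are hypothetical, not results from this thread): d = b_FE - b_RE over the coefficients both models estimate, and H = d'(V_FE - V_RE)^(-1) d, chi2 with as many degrees of freedom as coefficients compared. Two caveats: pooled OLS, like RE, assumes the firm effects are uncorrelated with the regressors, so a rejection casts doubt on pooled OLS as well; and the classic form assumes RE is fully efficient under the null, which does not hold with heteroskedastic or within-firm correlated errors.

```python
import numpy as np

def hausman(b_fe, b_re, V_fe, V_re):
    """Classic Hausman statistic H = d' (V_fe - V_re)^{-1} d, d = b_fe - b_re.
    Under H0 (firm effects uncorrelated with the regressors) H ~ chi2(len(d)).
    Only coefficients estimated by both models (time-varying regressors) go in."""
    d = np.atleast_1d(np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float))
    V = np.atleast_2d(np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float))
    return float(d @ np.linalg.solve(V, d))

# Hypothetical numbers for two coefficients (not the thread's results).
b_fe, b_re = [0.50, -1.20], [0.40, -1.10]
V_fe = [[0.0040, 0.0005], [0.0005, 0.0050]]
V_re = [[0.0020, 0.0003], [0.0003, 0.0030]]
print(hausman(b_fe, b_re, V_fe, V_re))
```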

                Thank you again.

                Best,
                Guest
                Last edited by sladmin; 10 Jun 2021, 14:48. Reason: anonymize original poster



                • #9
                  Guest,

                  FE and OLS estimation do completely different things. Neither is "better", but one will be more appropriate for your dataset. If you suspect time-invariant omitted variable bias, then that is the reason for estimating using FE. Almost by definition, this means that OLS will produce biased coefficients.
                  Some papers still present the OLS results (perhaps in the first column), before saying that these results are biased and that they prefer the FE results.

                  If you still aren't sure about the difference between OLS and FE, and the purpose of FE, then I urge you to do some more reading, because this is quite an important distinction.

                  Best,
                  Rhys
                  Last edited by sladmin; 10 Jun 2021, 14:48. Reason: anonymize original poster



                  • #10
                    Guest:
                    as an aside to Rhys' helpful hint, please note that your -regress- code in #6, lacking the -cluster- option, treats your observations as if they were independent (which is not the case with panel data).
                    My take is that you have one more reason to skip reporting -regress- results in your paper (like Rhys, my experience is that oftentimes "normal", i.e., cross-sectional, -regress- results are reported just to add afterwards that this is not the way to go with panel data, as they violate some relevant OLS assumptions).
                    Last edited by sladmin; 10 Jun 2021, 14:48. Reason: anonymize original poster
                    Kind regards,
                    Carlo
                    (StataNow 18.5)
