
  • OLS or fixed effects, or random effects

    Hi All,

    My question is very basic. If I am using regress y x i.industry i.year, vce(cluster firmid),

    does that count as fixed effects, random effects, or neither? If it is fixed effects, do I need to run a Hausman test first?

    Regards,
    Andy

  • #2
    -regress- includes neither fixed effects nor random effects. The only impact firmid has on that command is to cause the use of a cluster robust standard error calculation that recognizes correlation of errors within firmid. If you have panel data, or serial cross-sections, this is an incomplete way to handle the situation and you should be using either -xtreg, fe- or -xtreg, re-, at least initially. If, after examining the output of those, it seems that the fixed or random effects are all essentially zero, then you could go back to using -regress-. But only under those circumstances.
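    A minimal sketch of those commands, assuming the variable names (y, x, firmid, year) from the original post:

    Code:
    * declare the panel structure, then fit the two panel estimators
    xtset firmid year
    xtreg y x i.year, fe vce(cluster firmid)
    xtreg y x i.year, re vce(cluster firmid)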



    • #3
      I thought individual dummies also work as fixed effects. What do you mean by saying the effects are all zero?



      • #4
        Yes, but you do not have any dummies for firmid in your model, just industry and year. So you have fixed effects for industry and year, but clustering VCE on firmid, with no fixed effects for that. If firms are nested within industries, then you are trying to fit a three level model into a single level and all you've got is a jumble.

        As for the all zero part, at the bottom of -xt..., fe- model outputs there is a line that says F-test that all u_i are zero. If you do a random effects regression, then the last row of the output table is the estimate of rho. If that is close to zero, then the random effects are all very close to zero and you might go back to a single-level model with -regress-.
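        For example, to see those diagnostics (variable names assumed from the thread):

        Code:
        xtset firmid year
        xtreg y x, fe    // bottom of output: "F test that all u_i=0"
        xtreg y x, re    // last row of output: rho, the fraction of variance due to u_i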



        • #5
          Hello Andy,

          I would like to add an example and some comments to what Clyde mentioned.

          First the comments. If the assumptions of a random-effects model are satisfied, -regress-, -xtreg, fe-, -xtreg, re-, and -meglm- should all give you consistent parameter estimates. The difference is that -regress- and -xtreg, fe- are less efficient than the last two estimators. If the assumptions of the random-effects model are not satisfied, only -xtreg, fe- gives consistent estimates. I am focused in particular on the assumption that the time-invariant unobservables are unrelated to the covariates.

          Below is an example where the random effects model assumptions are satisfied.

          Code:
          . clear
          
          . set seed 111
          
          . set obs 1000
          number of observations (_N) was 0, now 1,000
          
          . generate id = _n
          
          . generate a  = rchi2(1) - 1
          
          . expand 10
          (9,000 observations created)
          
          . bysort id: generate time = _n
          
          . xtset id time
                 panel variable:  id (strongly balanced)
                  time variable:  time, 1 to 10
                          delta:  1 unit
          
          . generate x = rchi2(1)
          
          . generate e = rchi2(1) - 1
          
          . generate y = 1 + x + a + e
          
          . quietly regress y x, vce(cluster id)
          
          . estimates store reg
          
          . quietly xtreg y x
          
          . estimates store re
          
          . quietly meglm y x || id:
          
          . estimates store me
          
          . quietly xtreg y x, fe
          
          . estimates store fe
          
          . estimates table reg re me fe, se eq(1) drop(/var(_cons[id]) /var(e.y))
          
          ------------------------------------------------------------------
              Variable |    reg           re           me           fe      
          -------------+----------------------------------------------------
                     x |  1.0151222    1.0059463    1.0059473    1.0050269  
                       |  .01376745    .01058109    .01058064    .01062974  
                 _cons |  1.0868951    1.0960764    1.0960754    1.0969963  
                       |  .05154396    .05112642    .05109772    .01788983  
          ------------------------------------------------------------------
                                                                legend: b/se

          The true values of the constant and slope parameters are both equal to 1, and all four estimators recover that.

          What if the regressors are related to the time invariant unobservable? The output table at the end shows you that in that case only the fixed effects model gives consistent estimates.

          Code:
          . clear
          
          . set seed 111
          
          . set obs 1000
          number of observations (_N) was 0, now 1,000
          
          . generate id = _n
          
          . generate a  = rchi2(1) - 1
          
          . expand 10
          (9,000 observations created)
          
          . bysort id: generate time = _n
          
          . xtset id time
                 panel variable:  id (strongly balanced)
                  time variable:  time, 1 to 10
                          delta:  1 unit
          
          . generate x = rchi2(1) + 2*a
          
          . generate e = rchi2(1) - 1
          
          . generate y = 1 + x + a + e
          
          . quietly regress y x, vce(cluster id)
          
          . estimates store reg
          
          . quietly xtreg y x
          
          . estimates store re
          
          . quietly meglm y x || id:
          
          . estimates store me
          
          . quietly xtreg y x, fe
          
          . estimates store fe
          
          . estimates table reg re me fe, se eq(1) drop(/var(_cons[id]) /var(e.y))
          
          ------------------------------------------------------------------
              Variable |    reg           re           me           fe      
          -------------+----------------------------------------------------
                     x |   1.409196    1.3947018    1.3723182    1.0050269  
                       |  .01038688    .00503585    .00832905    .01062974  
                 _cons |  .63096136    .64764707    .67341488    1.0962392  
                       |  .01683822    .01817219    .02170664    .01888554  
          ------------------------------------------------------------------
                                                                legend: b/se



          • #6
            Clyde, many thanks for your answer. By dummies I meant including industry or year indicators through i.industry or i.year.

            How could I form a panel through industry and time? I think there is no way of doing it.

            The thing is, I want to control for industry and time, but I cannot include all three effects (firm as well). I could use vce(cluster firmid) in that regard. Is that not a reasonable approach?



            • #7
              Enrique, the only variable here that does not vary with time is likely to be industry; all the other x variables will change. Thanks for this example. I think that in OLS, when we use industry and time dummies, it is neither random effects nor fixed effects, but it does control for industry and time effects, right? And is it right to cluster by firm if we assume there is autocorrelation within firms?



              • #8
                If, and you don't clearly say whether it is or it isn't, the structure of your data is that you have repeated observations (year) on firms (firmid) and firms are nested in industries (industry), then you do not have panel data. You have multi-level data.

                Probably the least compromised approach you could take that leaves you working in the panel-data framework (more or less) would be to -xtset firmid year- and then -xtreg ...., vce(cluster industry)-, either as fe or re. This model can be estimated, and it properly accounts for correlation of errors within industry. It does not, however, incorporate industry-level effects.

                But really, if you can justify assuming that the error terms are independent of any time-invariant properties of the predictors in the model, you would be better off going to multi-level modeling.
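                A sketch of the two options, with the thread's variable names assumed:

                Code:
                * panel-data compromise: firm as the panel, clustering at the industry level
                xtset firmid year
                xtreg y x i.year, fe vce(cluster industry)

                * multi-level alternative: firms nested within industries
                mixed y x i.year || industry: || firmid: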



                • #9
                  As a rule of thumb, how could you even do -xtset- with industry and time, as there are many industries there are 10 industries, thus in 2007, an observation consistent of similar industry that does not vary, and so does for other years. Then, the best thing left is industry and time dummies, right?



                  • #10
                    "as there are many industries there are 10 industries, thus in 2007, an observation consistent of similar industry that does not vary, and so does for other years"
                    Sorry, but I don't understand this clause.

                    You can't do -xtset industry time- because you have multiple observations with the same values of industry and time (namely one for each firm in the industry).

                    But if you -xtset firmid- (or -xtset firmid year-) you can't use industry indicators (dummies). You can put them in your model, but they will be omitted because they are collinear with the firm-level effects. It is never possible to estimate the effects of things that are constant over time within panels in a fixed-effects panel regression.
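                    To illustrate (variable names assumed from the thread):

                    Code:
                    xtset firmid year
                    xtreg y x i.industry i.year, fe    // the i.industry terms are dropped as collinear with the firm fixed effects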

                    The bottom line is you don't have panel data. You have three-level data. If you force a two-level (panel, xt-type) analysis on the data, then something will have to be sacrificed. You have some latitude to decide which things you will sacrifice and which you will model faithfully. But you can't have it all. You just can't fit three levels into a two-level model without chopping something off.



                    • #11
                      Obviously you are right, I understand. But past studies that used industry and time together must either have focused on one industry per panel (which cannot be the case, as they have a lot of observations) or there was no way they actually formed a panel. Is it not more or less the same: by using industry and time dummies you control for industry and time effects, and by using the cluster option you take care of autocorrelation?

                      Many thanks, Clyde; you have been very kind.



                      • #12
                        And when we say panel, do we mean fixed effects? Or does fixed effects not necessarily mean panel?



                        • #13
                          Concerning the clause: I meant that, for example, there are 8 industries, and within these 8 there are ten entities in year 2007 for, say, the basic and financial industries. There is no way of forming a panel. I also meant: does panel directly imply fixed effects?



                          • #14
                            Can you please explain the studies that use industry and time fixed effects? Does it mean they are forming a panel? Because in each year, e.g. 2007, there are many industries. In that case, are the fixed effects obtained only by using dummies for industry and time?



                            • #15
                              The data structure you describe is not panel data. It is 3-level data. I do not follow the literature you are describing and, even if I did, a forum like this is not the place to do a review of it. Suffice it to say that when you have three level data, you cannot use panel-data analyses without introducing some kind of distortion and losing something along the way. You have several choices:

                              1. You can use a multi-level model. This means going to random effects rather than fixed effects, so consistent estimation is not guaranteed.

                              2. You can use panel estimators, setting the top level (industry) as the panel. In this case you can also estimate the middle-level effects (firm) by including indicator variables for them. But you get stuck in terms of a cluster-robust variance estimator: clustering at either the top or the mid level ignores the clustering in the other. Now, Sergio Correia's -reghdfe- allows you to specify multiple variables for vce() clustering and also absorb fixed effects for multiple variables. To be honest, I don't really know how this works out in terms of statistical validity, but this may be a way out of the problem. I actually think that the use of multiple fixed effects and multiple clusters here is only applicable when the effects involved are crossed, not nested, as, for example, panel and time effects. But I'm not really sure. If you are interested in pursuing this, you can get -reghdfe- from SSC and start with its help file.

                              3. You can use panel estimators, setting the mid level as the panel. In this case you cannot estimate top-level effects because they are constant within the panels, and, hence get omitted due to colinearity with fixed effects.

                              So it seems there is no ideal approach. As the saying goes: you pays your money and you takes your choice.
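                              A sketch of option 2 with -reghdfe-, assuming the thread's variable names (and with the caveat above about whether nested clusters are statistically valid here):

                              Code:
                              ssc install reghdfe
                              reghdfe y x, absorb(firmid year) vce(cluster firmid industry)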

