
  • Omitted because of collinearity with fixed effects

    Dear Stata community, I have a burning question. I am running a regression following the current international trade literature. However, whether I use reg or xtreg with fixed effects, some firms are omitted because of collinearity, and firm no. 1 was "dropped" to avoid the dummy variable trap. After reading many posts I still haven't found a clear answer to my problem, hence my questions:

    i) Is this common?
    ii) Is this something I can live with, given that I am mostly interested in the coefficient on status?
    iii) If not, is there anything I can do?


    reg lval status lage lsize iext i.year i.industry i.firm
    note: 3258.firm omitted because of collinearity
    note: 3381.firm omitted because of collinearity

          Source |       SS           df          MS   Number of obs   =      9400
    -------------+----------------------------------   F(1795, 7604)   =      9.87
           Model |  2783.47115      1795   1.5506803   Prob > F        =    0.0000
        Residual |  1194.87009      7604  .157137045   R-squared       =    0.6997
    -------------+----------------------------------   Adj R-squared   =    0.6288
           Total |  3978.34124      9399   .42327282   Root MSE        =    .39641

    ------------------------------------------------------------------------------
            lval |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          status |   .0219116   .0209441     1.05   0.296    -.0191446    .0629679
            lage |   .1520159   .0279789     5.43   0.000     .0971695    .2068623
           lsize |  -.1636843   .0185827    -8.81   0.000    -.2001115   -.1272571
            iext |   .0386521    .053303     0.73   0.468    -.0658364    .1431406

    Should you need more info, do not hesitate to let me know.

  • #2
    Guido:
    welcome to this forum.
    As per your questions:
    1) yes, it's pretty common (and there's nothing you can do but change your model specification);
    2) what you experienced is something that everybody dealing with econometrics lives with on a daily basis;
    3) as per 1), just change your model specification (if feasible).

    Finally, two closing-out comments:
    - -regress- seldom outperforms -xtreg- when it comes to panel data regression (see the -xtreg- entry in the Stata .pdf manual);
    - please post what you typed and what Stata gave you back via CODE delimiters (see the FAQ on this and other Statalist-related topics). Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)
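A minimal Python/NumPy sketch (not Stata code; the simulated data and every name in it are hypothetical) of why the two commands agree on the slope coefficients: OLS with one dummy per firm, as in -regress ... i.firm-, and OLS on data demeaned within each firm, the idea behind -xtreg, fe-, give the same slope by the Frisch-Waugh-Lovell theorem. The practical advantage of -xtreg, fe- is that it absorbs thousands of firm effects instead of estimating and printing them; fit statistics and some standard-error details differ between the two commands.

```python
import numpy as np

# Simulated panel (illustrative only): 50 firms observed for 8 years each.
rng = np.random.default_rng(0)
n_firms, n_years = 50, 8
firm = np.repeat(np.arange(n_firms), n_years)
alpha = rng.normal(size=n_firms)                   # unobserved firm effects
x = rng.normal(size=firm.size) + alpha[firm]       # regressor correlated with the firm effect
y = 0.5 * x + alpha[firm] + rng.normal(scale=0.1, size=firm.size)

# (1) LSDV: OLS of y on x plus one indicator per firm
#     (same fit as -regress y x i.firm-, parameterized without a separate constant).
D = (firm[:, None] == np.arange(n_firms)[None, :]).astype(float)
beta_lsdv = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)[0][0]

# (2) Within estimator: subtract firm means from y and x, then OLS without dummies.
def demean(v, groups):
    means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - means[groups]

xw, yw = demean(x, firm), demean(y, firm)
beta_within = (xw @ yw) / (xw @ xw)

print(round(beta_lsdv, 6), round(beta_within, 6))
```

Both numbers coincide (up to floating-point rounding) and are close to the simulated slope of 0.5.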



    • #3
      There are a few possibilities. If one or more of the variables status, lage, lsize, iext, or i.industry is a constant attribute of each firm, not changing from one observation of that firm to the next, then that variable will be collinear with the collection of i.firm indicators, and that will account for the collinearity. Stata has no choice but to omit at least one variable from any group of collinear variables; otherwise the model could not be identified. Stata's current approach is to retain variables listed earlier in the command and remove those listed later. Since you listed your firm indicators last, those are the ones being dropped to break the collinearity.
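To make the "keep earlier, drop later" rule concrete, here is a small Python/NumPy sketch. The function drop_collinear is a hypothetical, simplified stand-in for Stata's behaviour, not its actual algorithm: it walks through the columns in order and drops any column that does not raise the rank of the columns already kept. The 4-firm panel and the industry coding are made up for illustration.

```python
import numpy as np

def drop_collinear(columns):
    """Keep (name, column) pairs left to right; drop any column that adds no new rank.

    Simplified illustration of omitting collinear terms: earlier columns win.
    """
    kept, names = [], []
    for name, col in columns:
        candidate = np.column_stack(kept + [col])
        if np.linalg.matrix_rank(candidate) > len(kept):
            kept.append(col)
            names.append(name)
    return names

# Hypothetical panel: 4 firms x 3 years; firms 1-2 in industry 1, firms 3-4 in industry 2.
firm = np.repeat(np.arange(1, 5), 3)
const = np.ones(firm.size)
industry_dummy = [("2.industry", (firm >= 3).astype(float))]   # constant within each firm
firm_dummies = [(f"{f}.firm", (firm == f).astype(float)) for f in range(2, 5)]  # firm 1 = base

# Industry listed before firm: the last firm indicator is dropped.
print(drop_collinear([("_cons", const)] + industry_dummy + firm_dummies))
# Firm indicators listed first: the industry indicator is dropped instead.
print(drop_collinear([("_cons", const)] + firm_dummies + industry_dummy))
```

In this toy example each non-base industry whose firms never switch industry costs one firm indicator, which is consistent with notes such as "3258.firm omitted because of collinearity".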

      If you re-run the regression listing the i.firm variables first, Stata will drop something else, and those other variables will be the ones causing the problem. You then need to explore why this collinearity exists. If the variable is one that is expected to be constant for all observations of a given firm (industry sounds like a good candidate for this!), then the solution is to remove it from the model. If it turns out that the variable being dropped is one that should not be constant within firms, then you need to check whether you have data errors.
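The diagnostic suggested above, checking which regressors never change within a firm, can be sketched in plain Python. The data layout (one dict per firm-year observation) and the helper constant_within are hypothetical. In Stata itself, -xtsum- after -xtset- reports a within standard deviation for each variable, and a within standard deviation of zero points to the same problem.

```python
from collections import defaultdict

def constant_within(rows, group_key, variables):
    """Return the variables whose value never changes within any group (e.g. firm)."""
    values_seen = defaultdict(lambda: defaultdict(set))   # var -> group -> set of values
    for row in rows:
        for var in variables:
            values_seen[var][row[group_key]].add(row[var])
    return [var for var in variables
            if all(len(vals) == 1 for vals in values_seen[var].values())]

# Hypothetical firm-year observations.
rows = [
    {"firm": 1, "year": 2000, "industry": 5, "lsize": 2.1},
    {"firm": 1, "year": 2001, "industry": 5, "lsize": 2.3},
    {"firm": 2, "year": 2000, "industry": 20, "lsize": 1.7},
    {"firm": 2, "year": 2001, "industry": 20, "lsize": 1.9},
]
print(constant_within(rows, "firm", ["industry", "lsize", "year"]))
```

Here only industry is flagged, so it is the candidate for being absorbed by i.firm.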



      • #4
        Thank you both, Carlo and Clyde, for such precise and prompt replies. Yes, Clyde, you nailed it: it is due to i.industry, in particular industries no. 5 and no. 20. That's a bummer, because I am using the same model specification as most papers (including some that use exactly the same database and years as I do), which makes me wonder how much cooking is done behind the scenes, or whether they simply do not report this type of problem.

        In your experience and best judgment:
        a) Is it still useful to present the estimates of my main variables (status, lage, lsize and iext, with lval as the dependent variable) without going into detail about what happens with the fixed effects of i.year, i.industry and i.firm? I am reluctant to change the model specification because this one is so common in the literature that presenting a modified version would seem strange.

        b) Do these estimates (with the 2 industries omitted because of collinearity) still give valuable information, most importantly on the variable status (exporting (1) vs. non-exporting (0))?

        Once again, thank you for your patience in explaining in such detail the problems that we beginners face with econometrics.



        • #5
          As a non-economist, I don't feel qualified to answer questions about what is valuable or useful in econometrics. Perhaps Carlo will respond to that.

          I will use this opportunity, however, to make a few statistical and meta-statistical points.

          1. There is a lot of crap that gets published. Peer review is far from perfect.
          2. Methods sections in articles are typically skimpy and leave out much detail.
          3. It is possible that previous authors have encountered similar problems but didn't scrutinize their outputs carefully enough to notice what happened, or didn't appreciate what it meant.
          4. Could it be that some of the previous articles using this model used random rather than fixed effects for firm or industry? That would get around this problem. Moreover, what you actually have here is 3-level data, and by using fixed-effects regression you are shoehorning it into a 2-level model. Perhaps others have chosen to use a better model specification, at the price of not being able to guarantee consistent estimation of its parameters.
          5. I have often seen, in this situation, fixed-effects at the firm level, not including an industry effect in the model (because you can't), but with clustering of the standard errors at the industry level. Perhaps this is what you are seeing in the literature you've read.
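Point 5 (firm fixed effects, no industry effect, standard errors clustered by industry) can be sketched in Python/NumPy as follows. Everything here is an illustrative assumption: simulated data in which each firm stays in one industry, the hypothetical helper fe_with_cluster_se, and a CR1-style small-sample factor that need not match Stata's exactly. The firm effects are removed by within-firm demeaning, and the cluster-robust variance sums the score contributions industry by industry.

```python
import numpy as np

def within(v, groups):
    """Subtract group means: the demeaning that absorbs firm fixed effects."""
    means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - means[groups]

def fe_with_cluster_se(y, X, firm, cluster):
    """Firm-FE (within) slopes with standard errors clustered on `cluster`.

    Uses a CR1-style factor G/(G-1) * (N-1)/(N-K); Stata's adjustment for
    -xtreg, fe vce(cluster ...)- may differ in detail.
    """
    Xw = np.column_stack([within(X[:, j], firm) for j in range(X.shape[1])])
    yw = within(y, firm)
    bread = np.linalg.inv(Xw.T @ Xw)
    beta = bread @ (Xw.T @ yw)
    resid = yw - Xw @ beta
    k = X.shape[1]
    meat = np.zeros((k, k))
    groups = np.unique(cluster)
    for g in groups:
        score = Xw[cluster == g].T @ resid[cluster == g]   # summed x * residual in cluster g
        meat += np.outer(score, score)
    n, n_groups = len(y), len(groups)
    factor = n_groups / (n_groups - 1) * (n - 1) / (n - k)
    V = factor * (bread @ meat @ bread)
    return beta, np.sqrt(np.diag(V))

# Hypothetical panel: 200 firms, each in one of 20 industries, observed for 10 years.
rng = np.random.default_rng(1)
n_firms, n_years, n_ind = 200, 10, 20
firm = np.repeat(np.arange(n_firms), n_years)
year = np.tile(np.arange(n_years), n_firms)
industry = firm % n_ind
x = rng.normal(size=firm.size)
industry_year_shock = rng.normal(size=(n_ind, n_years))[industry, year]  # shared within industry
y = 0.3 * x + rng.normal(size=n_firms)[firm] + industry_year_shock + rng.normal(size=firm.size)

beta, se = fe_with_cluster_se(y, x[:, None], firm, industry)
print(beta, se)
```

No industry indicators appear in the regression itself, since with firms nested in industries they would be absorbed by the firm effects; industry enters only through the clustering.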



          • #6
            Guido:
            some asides to Clyde's helpful comments (which I share):
            1) I don't get what you mean by not going into detail about the fixed effects. In my opinion, the issue is methodological: do you intend to run a fixed-effects panel data regression (by the way, have you switched to -xtreg-?), or are other approaches equally feasible? What's the standard model specification in your research field for the kind of research you're engaged in?
            2) collinearity is a matter of fact. If you are confident that your analysis is based on the same dataset used in previous articles that did not report your very same problem, perhaps their authors used a different specification (as Clyde said) or hid the shortcomings you're experiencing.

            I would also ask your supervisor's opinion on the whole matter.

            That said, as general advice, your chances of getting more positive replies depend on posting what you typed and what Stata gave you back within CODE delimiters (see the FAQ on this). Thanks.
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Hello, is there a way to force Stata to drop a specific variable within a group of factor variables, instead of dropping the first one? Let's say i.industry is composed of 20 industry sectors and I would like Stata to drop industry number 5 (to avoid the dummy variable trap). Is that possible?



              • #8
                Yes. Instead of specifying i.industry, specify it as ib5.industry. Do read -help fvvarlist- to gain a thorough understanding of factor-variable notation.
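The effect of choosing the base level can be seen in a small Python/NumPy sketch with hypothetical industry codes and outcomes. The helper indicators() mimics the columns that i.industry (lowest level as base, Stata's default) and ib5.industry would construct. The coefficients change meaning, becoming contrasts with industry 5, but the fitted values are the same, because either set of indicators plus a constant spans the same space.

```python
import numpy as np

def indicators(codes, base):
    """0/1 columns for every level of `codes` except `base` (the idea behind ib#.var)."""
    codes = np.asarray(codes)
    all_levels = np.unique(codes)
    levels = [lvl for lvl in all_levels if lvl != base]
    if len(levels) == len(all_levels):
        raise ValueError(f"base level {base} does not occur in the data")
    names = [f"{lvl}.industry" for lvl in levels]
    return names, np.column_stack([(codes == lvl).astype(float) for lvl in levels])

# Hypothetical data: industry codes and an outcome.
industry = [1, 5, 20, 5, 1, 20]
y = np.array([1.0, 2.0, 3.5, 2.4, 0.8, 3.1])

def fit(base):
    names, D = indicators(industry, base)
    X = np.column_stack([np.ones(len(industry)), D])
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    return names, X @ coef

names_1, fitted_1 = fit(base=1)   # like i.industry
names_5, fitted_5 = fit(base=5)   # like ib5.industry
print(names_1, names_5)
print(np.allclose(fitted_1, fitted_5))
```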



                • #9
                  Thank you very much Clyde!



                  • #10
                    Hello again Stata community.
                    I have a strange question that many people at my university have not been able to answer, so I am posting it online to reach out for help.
                    I am running a fixed-effects regression with clustered (robust) standard errors and the factor variables time and industry (i.time i.industry). A friend suggested that I use i.time#i.industry to check how the two factor variables interact. I did so, and the command looks like this: xtreg lsas status lwage lage lage2 lsize iext innovate i.year#i.industry, fe cluster(firm)

                    Then I ran the regression using i.time#i.industry together with i.time and i.industry, like this: xtreg lsas status lwage lage lage2 lsize iext innovate i.year i.industry i.year#i.industry, fe cluster(firm)

                    What strikes me is that the results are practically the same (coefficients and t statistics), except for some minor changes in the constant for some samples of firms.

                    Does anybody know what the reason might be?

                    Thank you very much!



                    • #11
                      The answer is precisely the title of this thread: collinearity with fixed effects. Industry is constant over time for any given firm, so the i.industry indicators are collinear with the firm-level fixed effects. And i.year is, of course, the time fixed effect. So even though you think you have added i.industry and i.year to the model, in fact only i.year has been added. Evidently, the year indicators don't have much effect on the other estimates, suggesting that those yearly shocks are either very modest, or perhaps that the other model variables' distributions don't vary much from year to year.



                      • #12
                        Hello Clyde,

                        Thank you for your answer, but I am not able to follow it. Let me explain. When I add only i.industry to the fixed-effects regression, the estimates change substantially, and when I add only i.year they also change substantially. By that I mean that some variables that are non-significant in the fixed-effects regression without factor variables become highly significant once either factor variable is added. This is because, for every year, I deflate with an industry-specific price index to obtain constant prices by industry, and also because a firm's industrial sector may change during the 26-year period I am examining, due to mergers, acquisitions or changes of activity.

                        My question concerns i.time#i.industry, because I do not fully understand what it represents. When I use it (without i.year and i.industry), the results again differ substantially from all the previous ones.

                        However, when I use all of them (i.year i.industry i.time#i.industry) in the same regression, the results are almost identical to those obtained with only i.time#i.industry, and I don't understand why.

                        Does anyone know the reason?
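The linear algebra behind this question can be checked in a small Python/NumPy sketch (hypothetical data, not a reproduction of the -xtreg- output). A full set of year-by-industry cell indicators already spans every year indicator and every industry indicator, since summing the cells of one year reproduces that year's dummy. Adding i.year and i.industry next to i.year#i.industry therefore adds no rank: the redundant columns have to be omitted, the fitted values do not change, and at most the choice of omitted base categories, and with it the constant, may shift.

```python
import numpy as np

def dummies(codes):
    """One 0/1 column per distinct value of `codes` (no base level dropped)."""
    return np.column_stack([(codes == lvl).astype(float) for lvl in np.unique(codes)])

# Hypothetical data: 3 years x 2 industries, 2 observations per year-industry cell.
year = np.repeat([2000, 2001, 2002], 4)
industry = np.tile([1, 1, 2, 2], 3)
rng = np.random.default_rng(2)
y = rng.normal(size=year.size)

cells = dummies(year * 100 + industry)                       # like i.year#i.industry
mains = np.column_stack([dummies(year), dummies(industry)])  # like i.year i.industry
both = np.column_stack([cells, mains])

# The main effects add no rank to the full set of cell indicators.
print(np.linalg.matrix_rank(cells), np.linalg.matrix_rank(both))

def fitted(X):
    # lstsq copes with the rank deficiency; fitted values depend only on the column space.
    return X @ np.linalg.lstsq(X, y, rcond=None)[0]

print(np.allclose(fitted(cells), fitted(both)))
```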



                        • #13
                          I don't know how to explain it any more clearly than I did before. I hope somebody else will try.



                          • #14
                            Hello everybody,

                            I have exactly the same problem:

                            NIM - dependent var
                            Treat PostPeriod Treat_PostPeriod - DiD method
                            Banksize Depositratio RWA Liquidity Creditrisk GDP INF HHI - independent var

                            where my baseline reg looks like:

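                            $$ y_{ijt} = \alpha + \beta_1\,\mathrm{Treatment}_{it} + \beta_2\,\mathrm{PostPeriod}_{t} + \beta_3\,(\mathrm{Treatment}_{it} \times \mathrm{PostPeriod}_{t}) + \beta_4\,\mathrm{factor1}_{ijt} + \cdots + \beta_n\,\mathrm{factorN}_{ijt} + \delta_t + \theta_i + \varepsilon_{ijt} $$

                            where $\delta_t$ and $\theta_i$ stand for year and bank fixed effects.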

                            But when I run:

                            reg NIM Treat PostPeriod Treat_PostPeriod Banksize Depositratio RWA Liquidity Creditrisk GDP INF HHI i.Year i.id, cluster (Companyname)

                            I get :

                            note: 2019.Year omitted because of collinearity.
                            note: 99.id omitted because of collinearity.

                            Even after reading this discussion, I still do not really understand why, in my case, 2019 and bank number 99 are omitted, since according to Clyde: "If one or more of the variables is a constant attribute of each firm, not changing from one observation of that firm to the next, then that variable will be colinear with the collection of i.year or i.id indicators, then that will account for colinearity."

                            But after checking the dataset, bank 99 has different values for all of its attributes in each year, and the 2019 data differ from the 2018 data for all banks.

                            Could you please explain this to me as well? And do you think this is a huge mistake, such that my regression would not provide valuable information on the topic studied?

                            Thank you very much for your answers.



                            • #15
                              You should just have your treatment variable, which is the interaction Treat_PostPeriod, when you use TWFE. The variable Treatment only has an i subscript, right? So it is redundant. And the PostPeriod dummy is subsumed by the year effects. So use two-way FE with just the interaction. The coefficient on the interaction is the estimated treatment effect.
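A Python/NumPy sketch of this point on a hypothetical 6-bank, 4-year panel, assuming Treat is fixed per bank and PostPeriod is the same for all banks in a given year. Treat is then a combination of the bank indicators and PostPeriod a combination of the year indicators, so the full specification has two redundant columns. Because Treat and PostPeriod are listed before i.Year and i.id, Stata keeps them and omits one year and one bank indicator instead, which is presumably why 2019.Year and 99.id were dropped. With only the interaction and the two sets of fixed effects, the design is full rank and the interaction coefficient recovers the simulated effect.

```python
import numpy as np

def dummies(codes):
    """0/1 columns for every level except the first (the default base level)."""
    return np.column_stack([(codes == lvl).astype(float) for lvl in np.unique(codes)[1:]])

# Hypothetical balanced panel: 6 banks x 4 years; banks 0-2 treated, post period from year 2.
rng = np.random.default_rng(3)
n_banks, n_years = 6, 4
bank = np.repeat(np.arange(n_banks), n_years)
year = np.tile(np.arange(n_years), n_banks)
treat = (bank < 3).astype(float)   # constant within bank
post = (year >= 2).astype(float)   # identical for all banks in a given year
treat_post = treat * post
y = (1.5 * treat_post + rng.normal(size=n_banks)[bank]
     + rng.normal(size=n_years)[year] + rng.normal(scale=0.05, size=bank.size))

const = np.ones(bank.size)
fe = np.column_stack([dummies(year), dummies(bank)])   # like i.Year i.id

X_full = np.column_stack([const, treat, post, treat_post, fe])
X_twfe = np.column_stack([const, treat_post, fe])

# Columns minus rank = number of terms that must be omitted.
print(X_full.shape[1] - np.linalg.matrix_rank(X_full))   # 2 redundant columns
print(X_twfe.shape[1] - np.linalg.matrix_rank(X_twfe))   # 0: nothing to omit

beta = np.linalg.lstsq(X_twfe, y, rcond=None)[0]
print(beta[1])   # estimated treatment effect (simulated true value: 1.5)
```

Under these assumptions the omitted indicators carry no information beyond Treat and PostPeriod, so the coefficient on Treat_PostPeriod does not depend on which redundant column is dropped.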
