  • xtreg with dummy variable interaction term, variable omitted because of collinearity

    Hello!

    I have read many posts about variables omitted because of collinearity but still can't solve my problem. I have panel data from 2010-2019, and I want to run xtreg with firm and year fixed effects. The regression model looks like this:
    Code:
    xtreg F.ind_adj_subsidy i.v_first##i.H_subsidy controls i.year, fe vce(robust)
    where the DV is the industry-adjusted value of subsidy in year t+1. The two main IVs are v_first (=1 if the firm is in a certain status for the first time) and H_subsidy (=1 if the firm's subsidy is above the industry median in a given year); both are dummies. When I run the regression, Stata tells me v_first and the interaction term v_first*H_subsidy are omitted because of collinearity. I think this is probably because there are too few observations with v_first = 1 in the sample?
    Code:
    xttab v_first
    
                      Overall             Between            Within
    v_first   |    Freq.  Percent      Freq.  Percent        Percent
    ----------+-----------------------------------------------------
            0 |   20613     98.46      3177     95.78          98.41
            1 |     323      1.54       323      9.74          58.96
    ----------+-----------------------------------------------------
        Total |   20936    100.00      3500    105.52          94.77
    
    xttab H_subsidy
    
                      Overall             Between            Within
    H_subsidy |    Freq.  Percent      Freq.  Percent        Percent
    ----------+-----------------------------------------------------
            0 |   10953     52.32      2556     77.06          68.98
            1 |    9983     47.68      2441     73.59          63.66
    ----------+-----------------------------------------------------
        Total |   20936    100.00      4997    150.65          66.38
    So I wonder:
    1. What causes the collinearity problem, if it is possible to tell?
    2. If I do need v_first in the regression, how can I solve the collinearity problem?

    Thanks a lot for any suggestions!

  • #2
    Flora:
    the most likely issue is perfect collinearity with the fixed effect.
    As usual, more positive replies are conditional on posting the -xtreg- outcome table, too (as per FAQ). Thanks.
    Kind regards,
    Carlo
    (StataNow 18.5)
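
A minimal sketch of how this diagnosis could be checked, run on simulated data rather than the poster's. The variable names (id, d_fixed, d_vary, y) and the panel dimensions are assumptions made for illustration. The idea is that a regressor with zero within-firm variation is absorbed by the firm fixed effects, and -xtsum- reports that within variation directly.

```stata
* Toy panel (hypothetical data, not the poster's): 100 firms, 5 years each
clear
set seed 2024
set obs 100
generate id = _n
* Time-invariant dummy: constant within each firm
generate byte d_fixed = mod(id, 2)
expand 5
bysort id: generate year = 2009 + _n
xtset id year
* Dummy that alternates between 0 and 1 within each firm
generate byte d_vary = mod(id + year, 2)
generate y = rnormal()

* A within standard deviation of 0 means the firm fixed effects absorb the variable
xtsum d_fixed d_vary

* d_fixed should be dropped for collinearity; d_vary should keep a coefficient
xtreg y i.d_fixed i.d_vary, fe
```

Running -xtsum- on the actual regressors, restricted to the estimation sample with -if e(sample)- after -xtreg-, would be the analogous check on real data.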

    • #3
      Hi Carlo,

      Thanks for your reply. Sorry, I should have posted the results earlier. c1 to c5 are five control variables, and the sample period should be 2010-2018; I wrote it wrong in post #1.

      Code:
      xtreg F.ind_adj_subsidy i.v_first##i.H_subsidy c1 c2 c3 c4 c5 i.Year, fe vce(robust)
      note: 0.v_first omitted because of collinearity
      note: 0.v_first#1.H_subsidy omitted because of collinearity
      
      Fixed-effects (within) regression               Number of obs     =     16,743
      Group variable: Symbol                          Number of groups  =      2,935
      
      R-sq:                                           Obs per group:
           within  = 0.1313                                         min =          1
           between = 0.4523                                         avg =        5.7
           overall = 0.3494                                         max =          9
      
                                                      F(14,2934)        =      68.25
      corr(u_i, Xb)  = 0.1723                         Prob > F          =     0.0000
      
                                        (Std. Err. adjusted for 2,935 clusters in Symbol)
      -----------------------------------------------------------------------------------
                        |               Robust
      F.ind_adj_subsidy |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      ------------------+----------------------------------------------------------------
              0.v_first |          0  (omitted)
            1.H_subsidy |   .2732701   .0193838    14.10   0.000     .2352628    .3112773
                        |
      v_first#H_subsidy |
                   0 1  |          0  (omitted)
                        |
                     c1 |    .508838   .0268067    18.98   0.000     .4562762    .5613998
                     c2 |   .0018179   .0009602     1.89   0.058    -.0000647    .0037006
                     c3 |   .0136683   .0030876     4.43   0.000     .0076142    .0197225
                     c4 |  -.1914263   .0785727    -2.44   0.015    -.3454896   -.0373631
                     c5 |  -.0053233   .0080127    -0.66   0.507    -.0210344    .0103878
                        |
                   Year |
                  2011  |  -.0062278   .0271842    -0.23   0.819    -.0595299    .0470743
                  2012  |  -.0300008   .0279894    -1.07   0.284    -.0848816    .0248801
                  2013  |   -.070874   .0306638    -2.31   0.021    -.1309989   -.0107492
                  2014  |  -.0796062   .0330454    -2.41   0.016    -.1444007   -.0148116
                  2015  |  -.1659715   .0357867    -4.64   0.000    -.2361411   -.0958018
                  2016  |  -.1803758   .0370477    -4.87   0.000     -.253018   -.1077336
                  2017  |  -.2200118   .0381543    -5.77   0.000    -.2948238   -.1451998
                  2018  |  -.2418005   .0404973    -5.97   0.000    -.3212065   -.1623944
                        |
                  _cons |  -4.044752   .1992964   -20.30   0.000    -4.435527   -3.653977
      ------------------+----------------------------------------------------------------
                sigma_u |  .73048084
                sigma_e |  .67816143
                    rho |  .53709056   (fraction of variance due to u_i)
      -----------------------------------------------------------------------------------

      • #4
        Flora:
        thanks for posting your -xtreg,fe- outcome table, too.
        Now things are clearer:
        1) -0.v_first- is perfectly collinear with the fixed effect and, as such, is omitted;
        2) your NxT = 20,936, but the -xtreg,fe- outcome table reports 16,743 observations. It may well be that the small proportion of observations belonging to the -1.v_first- level is omitted due to casewise deletion (Stata's default approach, which drops every observation with a missing value in at least one variable);
        3) hence, no coefficient for -1.v_first#1.H_subsidy-.

        That said:
        1) you can avoid interacting those two variables;
        2) I would also check whether the regression is correctly specified.
        Kind regards,
        Carlo
        (StataNow 18.5)
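
The casewise-deletion explanation in point 2 can be checked by comparing the dummy's distribution inside and outside the estimation sample. Below is a minimal sketch on simulated data. The names (id, d, x, y) are hypothetical, and the rule that the dummy equals 1 only in each firm's last observed year is purely an assumption chosen to reproduce the symptom, not a claim about the poster's data.

```stata
* Toy panel (hypothetical data): 100 firms, 5 years each
clear
set seed 2024
set obs 100
generate id = _n
expand 5
bysort id: generate year = 2009 + _n
xtset id year
generate y = rnormal()
generate x = rnormal()
* Assumption for illustration: the dummy is 1 only in each firm's last year
bysort id (year): generate byte d = (_n == _N)

* F.y is missing in the last year, so casewise deletion removes every d == 1 row
xtreg F.y i.d x, fe

* Distribution of the dummy among the observations actually used
tabulate d if e(sample), missing

* How many d == 1 rows were lost, and how many of them lack the lead of y
count if d == 1 & !e(sample)
count if d == 1 & missing(F.y)
```

On real data the same -tabulate ... if e(sample)- and -count- commands, applied to v_first after the -xtreg- call, would show whether any v_first = 1 observations survive and which variable's missing values remove them.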

        • #5
          Hi Carlo,

          Thanks for your reply and suggestions. I can probably partition the sample into above/below industry-median subsidy and run the regressions separately, so I can avoid interacting v_first and H_subsidy.

          • #6
            Flora:
            you can also stay with -xtreg- and avoid interacting the two variables.
            Otherwise, you can plug in the -above_below_median_industry_subsidy- predictor and live with it, still going -xtreg- (running separate regressions should be the very last resort, as you may lose interesting pieces of information this way).
            Kind regards,
            Carlo
            (StataNow 18.5)
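
A minimal sketch of the pooled specification without the interaction, again on simulated data. The names (id, d, h, x, y), the coefficients used to generate y, and the panel dimensions are all assumptions for illustration. Both dummies enter as main effects alongside firm and year fixed effects, so the full sample is used in a single regression.

```stata
* Toy panel (hypothetical data): 100 firms, 6 years each
clear
set seed 2024
set obs 100
generate id = _n
expand 6
bysort id: generate year = 2009 + _n
xtset id year
* Both dummies vary within firm in this simulated example
generate byte d = runiform() < 0.3
generate byte h = runiform() < 0.5
generate x = rnormal()
generate y = 0.5*d + 0.3*h + x + rnormal()

* Pooled model: main effects only (no d#h interaction), firm and year fixed effects
xtreg F.y i.d i.h x i.year, fe vce(robust)

* Confirm that both levels of each dummy are present in the estimation sample
tabulate d h if e(sample)
```

Keeping one regression means the controls and year effects are estimated on all usable observations, which is the information a split-sample approach would give up.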

            • #7
              I see, thanks for your advice!
