  • Multiple omitted variables in dummy variable regression

    Dear experts,

    I have a question about omitted variables in a dummy variable regression. I have a panel dataset covering the years 1990-2021 (variable YEAR), and I run a dummy variable regression using
    Code:
    xi: reg <depvar> <control variables> i.YEAR
    (with the constant). I get the following output:
    Code:
    i.YEAR _IYEAR_1990-2021 (naturally coded; _IYEAR_1990 omitted)
    note: _IYEAR_2021 omitted because of collinearity.
    My understanding is that Stata omits the 1990 dummy to avoid the dummy variable trap, and that the coefficients of the other year dummies are relative to 1990. However, the 2021 dummy is also omitted. Is there any way to avoid omitting the 2021 dummy? I am interested in its coefficient.

    Or do I need to drop the 2021 observations and the corresponding dummy variable? (I tried this, and then only one dummy variable, 1990, is dropped.)

    Thanks in advance,
    Monit
    Last edited by Monit Singh; 14 Dec 2022, 10:16.
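A minimal sketch of how the base category works with factor-variable notation, using Stata's nlswork example dataset as a stand-in for the PRICE/YEAR data in this thread (an assumption for illustration). With a constant in the model, one year has to serve as the base; -ib88.- picks a different base, and -ibn.- combined with -noconstant- reports a coefficient for every year. These are reparameterizations of the same model, so none of them can recover a year indicator that Stata drops because it is collinear with another regressor.

```stata
* Sketch: choosing the base year for a set of year indicators
* (nlswork stands in for the thread's data; swap in PRICE and YEAR)
webuse nlswork, clear

* Default: the lowest year (68) is the omitted base; _cons is the 1968 mean
regress ln_wage i.year
scalar diff88 = _b[88.year]          // 1988 mean minus 1968 mean

* Same fit, different base: 1988 is now the omitted category
regress ln_wage ib88.year

* No base at all: drop the constant so every year gets its own coefficient
regress ln_wage ibn.year, noconstant
display _b[88.year] - _b[68.year]    // reproduces diff88 from the first model
```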

  • #2
    Monit:
    welcome to this forum.
    In order to increase your chances of getting helpful replies, please follow the FAQ and share what you typed and what Stata gave you back via CODE delimiters. Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Thanks, Carlo Lazzaro.
      I have added the Stata input and output to my first post, inside CODE blocks.

      Best regards,
      Monit



      • #4
        Monit:
        1) why use -xi:- when factor-variable notation (-fvvarlist-) is available for creating categorical variables and interactions?;
        2) why use -regress- as your first choice when dealing with a panel dataset (see -xtreg- instead)?;
        3) you did not share the -regress- output table.
        Kind regards,
        Carlo
        (Stata 19.0)
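The difference between the two notations in point 1 can be sketched as follows, again with nlswork standing in for the thread's data (an assumption). Both commands fit the same model; -xi:- additionally leaves _Iyear_* indicator variables in the dataset, while factor variables are created on the fly.

```stata
* Sketch: legacy -xi:- prefix versus factor-variable notation
* (nlswork stands in for the thread's data)
webuse nlswork, clear

* Legacy: -xi- creates indicator variables _Iyear_69 ... _Iyear_88 in the dataset
xi: regress ln_wage tenure i.year
scalar b88_xi = _b[_Iyear_88]

* Factor variables: same model, no new variables created
regress ln_wage tenure i.year
scalar b88_fv = _b[88.year]

* Interactions: ## gives main effects plus interaction (# gives the interaction only)
regress ln_wage c.tenure##i.race
```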



        • #5
          Carlo Lazzaro


          1. I am still new to Stata. I read about -fvvarlist-, and indeed -xi- has been largely superseded. Thanks for the tip.

          2. I wanted to run the model with fixed effects. Since I am including time dummies, I ran a plain regression. Am I doing something wrong here?

          3. This is how I now run the regression:

          Code:
          reg PRICE EOD i.YEAR
          and here is the output table
          Code:
          ------------------------------------------------------------------------------
                 PRICE | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                   EOD |  -.0005088   .0000577    -8.83   0.000    -.0006218   -.0003958
                       |
                  YEAR |
                 1991  |  -.8508183   1.456398    -0.58   0.559    -3.706304    2.004667
                 1992  |  -.6453494   1.434586    -0.45   0.653     -3.45807    2.167371
                 1993  |  -.9500783   1.414972    -0.67   0.502    -3.724342    1.824185
                 1994  |  -1.011667   1.411299    -0.72   0.474     -3.77873    1.755395
                 1995  |  -1.358915     1.4007    -0.97   0.332    -4.105197    1.387366
                 1996  |  -1.506129   1.387488    -1.09   0.278    -4.226506    1.214247
                 1997  |  -1.486535   1.394036    -1.07   0.286     -4.21975    1.246679
                 1998  |  -1.130941    1.38121    -0.82   0.413    -3.839009    1.577127
                 1999  |  -1.224752   1.378139    -0.89   0.374    -3.926799    1.477295
                 2000  |  -1.291036   1.372278    -0.94   0.347    -3.981592    1.399519
                 2001  |  -.6593383   1.366535    -0.48   0.629    -3.338634    2.019958
                 2002  |  -.0615555      1.361    -0.05   0.964    -2.729999    2.606888
                 2003  |   .1383801   1.353216     0.10   0.919    -2.514802    2.791562
                 2004  |   .3530488   1.348486     0.26   0.793     -2.29086    2.996957
                 2005  |   .1191028   1.309813     0.09   0.928    -2.448981    2.687186
                 2006  |   .4247897   1.308845     0.32   0.746    -2.141396    2.990975
                 2007  |   .8595577   1.310509     0.66   0.512     -1.70989    3.429005
                 2008  |   1.086087   1.309408     0.83   0.407    -1.481202    3.653377
                 2009  |   .8326992   1.306012     0.64   0.524    -1.727933    3.393331
                 2010  |   1.077802     1.3082     0.82   0.410     -1.48712    3.642723
                 2011  |   1.317392    1.31247     1.00   0.316    -1.255901    3.890684
                 2012  |   1.580894   1.315624     1.20   0.230    -.9985838    4.160371
                 2013  |   1.416578   1.314432     1.08   0.281    -1.160562    3.993717
                 2014  |   1.647931   1.314443     1.25   0.210    -.9292296    4.225093
                 2015  |   1.485555   1.311599     1.13   0.257     -1.08603     4.05714
                 2016  |   1.297858   1.313196     0.99   0.323     -1.27686    3.872575
                 2017  |   1.646627   1.318895     1.25   0.212    -.9392623    4.232517
                 2018  |   2.093446   1.320044     1.59   0.113    -.4946966    4.681588
                 2019  |   2.134703   1.318116     1.62   0.105    -.4496602    4.719066
                 2020  |   2.654675   1.320301     2.01   0.044     .0660282    5.243323
                       |
                 _cons |    6.36834   1.039087     6.13   0.000     4.331055    8.405625
          ------------------------------------------------------------------------------
          My understanding is that 1990 is omitted because it is the base year, so the year coefficients are relative to 1990 (whose level is captured by _cons). But, as you can see, the coefficient for 2021 is also missing, and I am not sure what happened to it or how to interpret it.

          Thanks in advance.

          Best regards,
          Monit
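A sketch of how the constant and the year coefficients can be read when the model has one continuous regressor, with nlswork's tenure playing the role of EOD (an assumption for illustration). Under this specification _cons is the predicted outcome in the base year when the continuous regressor equals zero, and each year coefficient is the shift relative to the base year at a given value of that regressor; -lincom- combines them.

```stata
* Sketch: reading the constant and year coefficients with one continuous regressor
* (nlswork stands in for the thread's data; tenure plays the role of EOD)
webuse nlswork, clear
regress ln_wage tenure i.year

* _cons: predicted ln_wage in the base year (1968) at tenure = 0
* 88.year: shift between 1988 and 1968, holding tenure fixed
lincom _cons + 88.year          // predicted ln_wage in 1988 at tenure = 0

* Comparing two non-base years does not involve _cons at all
lincom 88.year - 78.year
```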



          • #6
            The most likely explanation is that the variable EOD itself defines a subset of the years. For example, if EOD distinguishes 2007 and before from 2008 and after, then EOD will be collinear with the year indicators, so either EOD or one of the year indicators must be omitted to identify the model. In short, it is mathematically impossible to estimate both an EOD effect and an effect for every year (even after removing the base category for i.YEAR) in the same model. If you are not sure whether this is what is happening in your data, run -tab YEAR EOD if e(sample)-. If every year's observations belong exclusively to one of the values of EOD, then this is what is going on.

            If not, please post back showing example data, and use the -dataex- command to do that. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
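A sketch, on hypothetical simulated data, of the situation Clyde describes in its most extreme form: EOD takes a single value per year, shared by every unit (as a market-wide variable would). Every name and number below is made up for illustration. Because the posted EOD values look continuous, a full -tab YEAR EOD- could be very long, so the last two lines show one alternative check, assuming the same variable layout.

```stata
* Hypothetical toy panel: 20 units observed 1990-2021, where eod takes
* one value per year, shared by every unit (an assumption for illustration)
clear
set seed 12345
set obs 20
generate id = _n
expand 32
bysort id: generate year = 1989 + _n

* One random eod value per year, copied to all units in that year
bysort year: generate eod = 1000*runiform() if _n == 1
by year: replace eod = eod[1]
generate price = 5 - 0.0005*eod + rnormal()

* eod is a linear combination of the constant and the year indicators,
* so -regress- must omit one extra term and flags it as collinear
regress price eod i.year
display e(df_m)                 // 31 rather than 32 model terms

* Diagnostic for a continuous eod: does it ever vary within a year?
bysort year (eod): generate byte varies = eod[1] != eod[_N]
count if varies                 // 0 means eod never varies within a year
```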



            • #7
              Monit:
              as an aside to Clyde's helpful advice:
              1) you state that you're dealing with a panel dataset, but I cannot see the panel identifier in your regression equation;
              2) the most efficient way to code a panel data regression with -fe- specification is:
              Code:
              xtset panelid timevar
              xtreg <depvar> <indepvars> i.timevar, fe
              3) if needed, and if you have 30 or more panels, you may want to consider non-default standard errors (see the -vce(robust)- or -vce(cluster panelid)- options). They do the very same job under -xtreg- (whereas their effects differ under -regress-).
              Last edited by Carlo Lazzaro; 14 Dec 2022, 13:59.
              Kind regards,
              Carlo
              (Stata 19.0)
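A sketch of Carlo's point 3 using nlswork as a stand-in (an assumption). With -xtreg, fe-, Stata treats -vce(robust)- as clustering on the panel variable, so the two standard errors below should be identical.

```stata
* Sketch: with -xtreg, fe-, vce(robust) is treated as vce(cluster panelvar)
* (nlswork stands in for the thread's data)
webuse nlswork, clear
xtset idcode year

* Cluster-robust standard errors, clustering on the panel identifier
xtreg ln_wage tenure ttl_exp i.year, fe vce(cluster idcode)
scalar se_cluster = _se[tenure]

* vce(robust): under -xtreg, fe- this clusters on idcode as well
xtreg ln_wage tenure ttl_exp i.year, fe vce(robust)
scalar se_robust = _se[tenure]

scalar list se_cluster se_robust
```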



              • #8
                Thanks for the helpful comments, Clyde Schechter.
                Indeed, the issue was with my data, and after fixing it I can now get the 2021 coefficient as well. Thanks for drawing my attention to it. May I also ask a follow-up question about Carlo Lazzaro's comment?

                I was under the impression that FE regression and LSDV (i.e., OLS with dummy variables) are the same. So if I do

                Code:
                reg PRICE EOD i.YEAR
                or

                Code:
                xtset _NR YEAR, yearly
                xtreg PRICE EOD i.YEAR, fe vce(cluster _NR)
                Aren't these two the same? Or is there a difference between the two estimations?

                Below is an excerpt of my data (_NR is the panel identifier):

                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input int(_NR YEAR) double(PRICE EOD)
                2 1992 23.27596153374564  200.852219772323
                2 1993 28.00942102194265 367.2792250775811
                2 1994 16.32684291273761 586.4163396442615
                2 1995 17.85800219311415 750.6044491788258
                2 1996 17.21755478692065 1009.977274838206
                2 1997 13.29635341840615 717.3800477456734
                end
                format %ty YEAR
                Thanks in advance for your help.

                Regards,
                Monit



                • #9
                  Monit:
                  1) my previous chunk of code should have been:
                  Code:
                  xtreg depvar indepvars i.timevar, fe
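                  2) the two approaches give back the same results for the shared coefficients, provided that -regress- also includes the panel identifier as a set of indicators (-i.idcode- in the following toy example; your -reg PRICE EOD i.YEAR- lacks -i._NR-):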
                  Code:
                  . use "https://www.stata-press.com/data/r17/nlswork.dta"
                  (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
                  
                  . regress ln_wage c.age##c.age i.year i.idcode if idcode<=3
                  
                        Source |       SS           df       MS      Number of obs   =        39
                  -------------+----------------------------------   F(18, 20)       =      4.86
                         Model |  4.21278813        18  .234043785   Prob > F        =    0.0005
                      Residual |  .962950828        20  .048147541   R-squared       =    0.8139
                  -------------+----------------------------------   Adj R-squared   =    0.6465
                         Total |  5.17573896        38  .136203657   Root MSE        =    .21943
                  
                  ------------------------------------------------------------------------------
                       ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                           age |   .0773019   .2865219     0.27   0.790    -.5203723    .6749761
                               |
                   c.age#c.age |  -.0045583   .0012212    -3.73   0.001    -.0071057    -.002011
                               |
                          year |
                           69  |   .3367906   .4335876     0.78   0.446    -.5676572    1.241238
                           70  |   .2089384   .6771373     0.31   0.761    -1.203545    1.621422
                           71  |   .3144116   .9610926     0.33   0.747    -1.690392    2.319216
                           72  |   .5888124   1.253657     0.47   0.644     -2.02627    3.203894
                           73  |   .8912873   1.550825     0.57   0.572    -2.343676    4.126251
                           75  |   1.246958   2.152898     0.58   0.569    -3.243908    5.737823
                           77  |   1.560689   2.761762     0.57   0.578    -4.200247    7.321624
                           78  |   1.941522   3.068213     0.63   0.534    -4.458659    8.341703
                           80  |    2.34498   3.684737     0.64   0.532    -5.341247    10.03121
                           82  |   2.698954   4.315145     0.63   0.539     -6.30228    11.70019
                           83  |   2.994437   4.618087     0.65   0.524    -6.638723     12.6276
                           85  |   3.538578   5.245889     0.67   0.508    -7.404154    14.48131
                           87  |   3.965153   5.878139     0.67   0.508    -8.296429    16.22674
                           88  |    4.40786   6.407149     0.69   0.499    -8.957218    17.77294
                               |
                        idcode |
                            2  |  -.4183815   .0918256    -4.56   0.000    -.6099263   -.2268366
                            3  |   .6579353   1.834332     0.36   0.724    -3.168414    4.484284
                               |
                         _cons |   1.341224   4.651269     0.29   0.776    -8.361153     11.0436
                  ------------------------------------------------------------------------------
                  
                  . xtset idcode year
                  
                  Panel variable: idcode (unbalanced)
                   Time variable: year, 68 to 88, but with gaps
                           Delta: 1 unit
                  
                  . xtreg ln_wage c.age##c.age i.year if idcode<=3, fe
                  
                  Fixed-effects (within) regression               Number of obs     =         39
                  Group variable: idcode                          Number of groups  =          3
                  
                  R-squared:                                      Obs per group:
                       Within  = 0.7404                                         min =         12
                       Between = 0.4068                                         avg =       13.0
                       Overall = 0.4014                                         max =         15
                  
                                                                  F(16,20)          =       3.57
                  corr(u_i, Xb) = -0.8560                         Prob > F          =     0.0042
                  
                  ------------------------------------------------------------------------------
                       ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                           age |   .0773019   .2865219     0.27   0.790    -.5203723    .6749761
                               |
                   c.age#c.age |  -.0045583   .0012212    -3.73   0.001    -.0071057    -.002011
                               |
                          year |
                           69  |   .3367906   .4335876     0.78   0.446    -.5676572    1.241238
                           70  |   .2089384   .6771373     0.31   0.761    -1.203545    1.621422
                           71  |   .3144116   .9610926     0.33   0.747    -1.690392    2.319216
                           72  |   .5888124   1.253657     0.47   0.644     -2.02627    3.203894
                           73  |   .8912873   1.550825     0.57   0.572    -2.343676    4.126251
                           75  |   1.246958   2.152898     0.58   0.569    -3.243908    5.737823
                           77  |   1.560689   2.761762     0.57   0.578    -4.200247    7.321624
                           78  |   1.941522   3.068213     0.63   0.534    -4.458659    8.341703
                           80  |    2.34498   3.684737     0.64   0.532    -5.341247    10.03121
                           82  |   2.698954   4.315145     0.63   0.539     -6.30228    11.70019
                           83  |   2.994437   4.618087     0.65   0.524    -6.638723     12.6276
                           85  |   3.538578   5.245889     0.67   0.508    -7.404154    14.48131
                           87  |   3.965153   5.878139     0.67   0.508    -8.296429    16.22674
                           88  |    4.40786   6.407149     0.69   0.499    -8.957218    17.77294
                               |
                         _cons |   1.465543   5.342682     0.27   0.787    -9.679096    12.61018
                  -------------+----------------------------------------------------------------
                       sigma_u |  .54258328
                       sigma_e |  .21942548
                           rho |  .85944136   (fraction of variance due to u_i)
                  ------------------------------------------------------------------------------
                  F test that all u_i=0: F(2, 20) = 10.43                      Prob > F = 0.0008
                  
                  .
                  That said, if you have a panel dataset with a continuous regressand, your first choice should be -xtreg, fe-.
                  Kind regards,
                  Carlo
                  (Stata 19.0)
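A sketch of the FE/LSDV comparison on a small nlswork subset (the subset and regressors are illustrative assumptions). The within estimator, OLS with one indicator per panel, and -areg- with the panels absorbed give the same slope estimates; the constant is defined differently across commands, as the two _cons values in Carlo's output show. Dropping -i.idcode- from -regress- (as in -reg PRICE EOD i.YEAR-, which has no -i._NR-) gives pooled OLS instead, so it is generally not the same as -xtreg, fe-.

```stata
* Sketch: within (FE) estimator versus LSDV on a subset of nlswork
webuse nlswork, clear
keep if idcode <= 100        // small subset so -regress- with panel dummies stays quick
xtset idcode year

* (1) Within estimator
xtreg ln_wage tenure ttl_exp i.year, fe
scalar b_within = _b[tenure]

* (2) LSDV: plain OLS with one indicator per panel
regress ln_wage tenure ttl_exp i.year i.idcode
scalar b_lsdv = _b[tenure]

* (3) LSDV with the panel indicators absorbed rather than reported
areg ln_wage tenure ttl_exp i.year, absorb(idcode)
scalar b_areg = _b[tenure]

* (4) Not LSDV: without i.idcode the panel effects are ignored (pooled OLS)
regress ln_wage tenure ttl_exp i.year
scalar b_pooled = _b[tenure]

scalar list b_within b_lsdv b_areg b_pooled
```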

