
  • Omitted variable bias

    In my linear regression I get a note saying that a specific independent variable was "omitted because of collinearity". This is a continuous variable that represents the number of years of work experience. Does anyone know how I can fix this?

  • #2
    Marleen:
    perfect collinearity arises when a predictor is an exact linear combination of other predictors (in the simplest case, two predictors carry exactly the same information).
    As the main aim of regression is disentangling the contribution of each predictor (adjusted for the remaining ones) to explaining the variation in the regressand, perfect collinearity makes this impossible: that's why Stata omits one of the variables by default.
    As per above, the omitted variable should have a partner in crime that escaped omission and was retained on the right-hand side of the regression equation: in sum, if you want the omitted variable to be retained among the predictors, you should change your regression specification.
    Kind regards,
    Carlo
    (Stata 19.0)
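To make "perfect collinearity" concrete, here is a small Python sketch (not Stata code, and not how `regress` detects collinearity internally) that computes the exact rank of a design matrix. The experience values are made up. Entering the same years-of-experience column twice leaves the rank unchanged, so the model has one more coefficient than the data can identify, and one column has to go.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank of a small matrix via Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # this column adds no new direction
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical years of work experience for five respondents.
experience = [2, 5, 11, 20, 7]

X_ok = [[1, e] for e in experience]      # constant + experience
X_dup = [[1, e, e] for e in experience]  # constant + experience entered twice

print(matrix_rank(X_ok), "of", len(X_ok[0]), "columns")    # 2 of 2 columns
print(matrix_rank(X_dup), "of", len(X_dup[0]), "columns")  # 2 of 3 columns
```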



    • #3
      Hi Marleen,

      It would probably help if you posted the Stata command you are using. If Stata is omitting something due to collinearity, then this cannot be "fixed": a regression cannot produce a coefficient for a variable that is perfectly collinear with other predictors. However, you can change which of the two variables Stata omits. If you use the i.var prefix, you can change it to ib#.var, where # is the level to use as the base, to change which variable is omitted. However, I suspect your code isn't written like this if you have a continuous variable?

      Correction: when I say "which var is omitted", in the i.var specification I mean which dummy (e.g. male/female) is used as the baseline.

      Best Rhys
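A toy Python illustration of the baseline point (the variable `sex` and its 1/2 coding are hypothetical). With a dummy for every level, the dummies sum to the constant column, which is perfect collinearity; leaving one level out avoids it. The helper mirrors the idea behind `i.var` (lowest level as base by default) and `ib#.var` (a chosen level as base), not Stata's actual implementation.

```python
def indicators(values, base=None):
    """Return (levels kept, 0/1 rows) for a categorical variable, leaving out `base`."""
    levels = sorted(set(values))
    if base is None:
        base = levels[0]  # like Stata's default: the lowest level is the base
    kept = [lv for lv in levels if lv != base]
    rows = [[1 if v == lv else 0 for lv in kept] for v in values]
    return kept, rows


sex = [1, 2, 2, 1, 2]  # hypothetical coding: 1 = male, 2 = female

# A dummy for every level: each row sums to 1, i.e. it reproduces the constant.
levels = sorted(set(sex))
all_dummies = [[1 if v == lv else 0 for lv in levels] for v in sex]
print(all(sum(row) == 1 for row in all_dummies))  # True: collinear with the constant

kept, X = indicators(sex)            # analogous to i.sex   (base level 1)
print(kept, X)                       # [2] [[0], [1], [1], [0], [1]]
kept2, X2 = indicators(sex, base=2)  # analogous to ib2.sex (base level 2)
print(kept2, X2)                     # [1] [[1], [0], [0], [1], [0]]
```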



      • #4
        Originally posted by Carlo Lazzaro
        Thank you Carlo. I will try to change my regression specification.



        • #5
          Originally posted by Rhys Williams
          Hi Rhys,

          The Stata command that I'm using is the following:

          regress t3pjobsa i.tc3g45a_a i.tc3g26f_a i.tc3g26g_a i.tc3g26k_a i.tc3g45c_a i.tc3g29e i.tc3g29f i.tc3g29i i.tc3g29j (i.t3pautc i.t3pauts i.t3pautb i.t3pautp i.t3pauti)##c.tc3g04b c.t3pwload##tc3g05 (i.tc3g07*)##i.tc3g03_a i.tc3g01 i.pragegr i.tc3g03_a tc3g04b i.tc3g05 i.schloc i.tc3g12 i.nenrstud, vce(robust)

          The note I'm getting concerns the independent variable tc3g04b. I can't use the i.var prefix, because tc3g04b is a continuous variable, not a categorical one.



          • #6
            Hi Marleen,

            You seem to be interacting a set of variables with tc3g04b using the double-hash (##) syntax, which includes both the main effects and the interactions. Therefore I think the second appearance of tc3g04b is superfluous.

            To confirm this, change the ## to a single #; you should find that the note disappears.

            Best Rhys
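The double-hash point can be sketched with a toy term expander in Python. It only handles a single two-way term and treats names as plain strings, so it illustrates the documented rule (A##B means A, B and A#B; A#B means the interaction only) rather than parsing a real Stata varlist. The names come from the command in #5, reduced to one autonomy item; the standalone control is written as c.tc3g04b because a bare variable outside an interaction is treated as continuous.

```python
def expand(term):
    """Toy rule for one two-way term: 'A##B' -> A, B, A#B; 'A#B' stays a single term."""
    if "##" in term:
        a, b = term.split("##")
        return [a, b, f"{a}#{b}"]
    return [term]


def duplicated_terms(spec):
    """Terms that appear more than once after expansion."""
    seen, dups = set(), []
    for t in (x for term in spec for x in expand(term)):
        if t in seen:
            dups.append(t)
        seen.add(t)
    return dups


# The relevant part of the command in #5, reduced to one autonomy item.
with_double = ["i.t3pautc##c.tc3g04b", "c.tc3g04b"]
with_single = ["i.t3pautc#c.tc3g04b", "c.tc3g04b"]

print(duplicated_terms(with_double))  # ['c.tc3g04b']
print(duplicated_terms(with_single))  # []
```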



            • #7
              Originally posted by Rhys Williams
              The note disappeared when using a single # instead of a double one, thank you!!

              My linear regression is the following:

              regress t3pjobsa i.tc3g45a_a i.tc3g26f_a i.tc3g26g_a i.tc3g26k_a i.tc3g45c_a i.tc3g29e i.tc3g29f i.tc3g29i i.tc3g29j (i.t3pautc i.t3pauts i.t3pautb i.t3pautp i.t3pauti)#c.tc3g04b c.t3pwload##tc3g05 (i.tc3g07*)##i.tc3g03_a i.tc3g01 i.pragegr i.tc3g03_a tc3g04b i.tc3g05 i.schloc i.tc3g12 i.nenrstud, vce(robust)

              The second appearance of tc3g04b is my control variable.

              I'm wondering whether I'm missing out on some important data now that I'm using a single # instead of a double one?



              • #8
                When you used a double hash, you were basically including the control variable twice. By using a single # you have created interaction terms between i.t3pautc (etc.) and c.tc3g04b, plus a separate control for tc3g04b.

                By using ## you included c.tc3g04b once through the interaction expansion AND once more as a separate control, so the variable entered twice (which is why one copy was omitted).

                So your earlier Stata command was misspecified, but the results were unaffected: Stata detected the duplicate and omitted it.

                You can read more about the ## notation in the factor-variable documentation if this seems confusing.

                Best,
                Rhys
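On the question in #7 about missing something, a toy Python sketch can list the distinct terms each version of the command ends up with. It treats terms as plain strings, handles only one two-way term, and keeps just one autonomy item, so it is an illustration, not how Stata parses a varlist. The ## version with the extra control collapses to the same terms as ## without it, which is why the earlier results were unaffected. The # version, however, no longer contains the main effect i.t3pautc; in the full command the same would seem to apply to i.t3pauts, i.t3pautb, i.t3pautp and i.t3pauti, which appear nowhere else. If those main effects are wanted, keeping ## and dropping the standalone tc3g04b looks like the more natural fix.

```python
def expand(term):
    """Toy rule: 'A##B' -> A, B, A#B; anything else is kept as a single term."""
    if "##" in term:
        a, b = term.split("##")
        return [a, b, f"{a}#{b}"]
    return [term]


def model_terms(spec):
    """Distinct terms in order of appearance; a repeated term is kept only once."""
    terms = []
    for term in spec:
        for t in expand(term):
            if t not in terms:
                terms.append(t)
    return terms


double_plus_control = ["i.t3pautc##c.tc3g04b", "c.tc3g04b"]  # form used in #5
single_plus_control = ["i.t3pautc#c.tc3g04b", "c.tc3g04b"]   # form used in #7
double_only = ["i.t3pautc##c.tc3g04b"]                       # ## without the extra control

for spec in (double_plus_control, single_plus_control, double_only):
    print(model_terms(spec))
```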



                • #9
                  Originally posted by Rhys Williams
                  Hi Rhys,

                  Thank you for your help!

                  The above regression was for Belgium. If I run the same regression for Estonia, I get the note 'omitted because of collinearity' for tc3g05_a with question 7. Using a single # instead of a double one does not change this here. Any idea why?

                  My Stata command was:

                  regress t3pjobsa i.tc3g45a_a i.tc3g26f_a i.tc3g26g_a i.tc3g26k_a i.tc3g45c_a i.tc3g29e i.tc3g29f i.tc3g29i i.tc3g29j (i.t3pautc i.t3pauts i.t3pautb i.t3pautp i.t3pauti)#c.tc3g04b c.t3pwload##tc3g05 (i.tc3g07*)##i.tc3g03_a i.tc3g01 i.pragegr i.tc3g03_a tc3g04b i.tc3g05 i.schloc i.tc3g12 i.nenrstud , vce(robust)



                  • #10
                    Marleen:
                    without further details on your code, it seems that you have too many predictors: no wonder you ran into perfect collinearity.
                    As an aside: why not also post what Stata gave you back (within CODE delimiters, please)? Thanks.
                    Kind regards,
                    Carlo
                    (Stata 19.0)
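One way to see how quickly a specification like this grows is to count coefficients. For a factor with k levels, i.var adds k - 1 coefficients; a two-way interaction i.a##i.b with every cell observed adds (ka - 1) + (kb - 1) + (ka - 1)(kb - 1) = ka*kb - 1. A small Python sketch with made-up level counts:

```python
def n_coef_factor(k):
    """Coefficients for i.var with k levels (one level is the base)."""
    return k - 1


def n_coef_interaction(ka, kb):
    """Coefficients for i.a##i.b when every cell is observed: main effects plus interaction."""
    return (ka - 1) + (kb - 1) + (ka - 1) * (kb - 1)


# Hypothetical: a 4-level item interacted with a 2-level variable.
print(n_coef_factor(4))          # 3
print(n_coef_interaction(4, 2))  # 7  (= 4 * 2 - 1)
```

Every additional item entered this way adds its own block of coefficients, which all have to be estimated from the same observations.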



                    • #11
                      Originally posted by Marleen Yaramis
                      Again, it looks like you have included a variable twice (i.tc3g03_a, which (i.tc3g07*)##i.tc3g03_a already includes). You are misspecifying the regression; you shouldn't include it twice. Does the following work?

                      Code:
                      regress t3pjobsa i.tc3g45a_a i.tc3g26f_a i.tc3g26g_a i.tc3g26k_a i.tc3g45c_a i.tc3g29e i.tc3g29f i.tc3g29i i.tc3g29j (i.t3pautc i.t3pauts i.t3pautb i.t3pautp i.t3pauti)#c.tc3g04b c.t3pwload##tc3g05 (i.tc3g07*)##i.tc3g03_a i.tc3g01 i.pragegr tc3g04b i.tc3g05 i.schloc i.tc3g12 i.nenrstud , vce(robust)
                      If not, as Carlo Lazzaro says, please post your Stata output so we can see what is happening.

                      Best,
                      Rhys



                      • #12
                        regress t3pjobsa i.tc3g45a_a i.tc3g26f_a i.tc3g26g_a i.tc3g26k_a i.tc3g45c_a i.tc3g29e i.tc3g29f i.tc3g29i i.tc3g29j (i.t3pautc i.t3pauts i.t3pautb i.t3pautp i.t3pauti)#c.tc3g04b c.t3pwload##tc3g05 (i.tc3g07*)##i.tc3g03_a i.tc3g01 i.pragegr i.tc3g03_a tc3g04b i.tc3g05 i.schloc i.tc3g12 i.nenrstud , vce(robust)
                        tc3g05 was included twice. The first term means "c.t3pwload i.tc3g05 c.t3pwload#tc3g05" and the second term is again "i.tc3g05".

                        Perhaps step back and try the following simpler examples/exercises. Compare the outputs carefully:

                        Code:
                        reg t3pjobsa i.tc3g45a_a
                        reg t3pjobsa i.tc3g45a_a i.tc3g45a_a // duplicated predictors
                        reg t3pjobsa i.tc3g45a_a i.tc3g26f_a
                        reg t3pjobsa i.tc3g45a_a i.tc3g26f_a i.tc3g45a_a#i.tc3g26f_a
                        reg t3pjobsa i.tc3g45a_a##i.tc3g26f_a // equivalent to the one above
                        reg t3pjobsa i.tc3g45a_a i.tc3g26f_a i.tc3g45a_a##i.tc3g26f_a // duplicates created
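Ken's point can also be checked mechanically with a toy Python expander. It is a sketch under simplifying assumptions: only two-way terms, the wildcard group (i.tc3g07*) replaced by a single item i.tc3g07c, and Stata's defaults imitated by giving a bare variable an i. prefix inside an interaction and a c. prefix outside one.

```python
def with_prefix(var, default):
    # Stata variable names cannot contain '.', so a dot means a prefix is already there.
    return var if "." in var else default + var


def expand(term):
    """Toy two-way expansion: 'A##B' -> A, B, A#B; 'A#B' -> A#B; a bare name -> c.name."""
    if "##" in term:
        a, b = (with_prefix(v, "i.") for v in term.split("##"))
        return [a, b, f"{a}#{b}"]
    if "#" in term:
        a, b = (with_prefix(v, "i.") for v in term.split("#"))
        return [f"{a}#{b}"]
    return [with_prefix(term, "c.")]


def duplicated_terms(spec):
    """Terms that appear more than once after expansion."""
    seen, dups = set(), []
    for t in (x for term in spec for x in expand(term)):
        if t in seen:
            dups.append(t)
        seen.add(t)
    return dups


# Reduced version of the command in #9, keeping only the terms that matter here.
spec = ["c.t3pwload##tc3g05", "i.tc3g07c##i.tc3g03_a", "i.tc3g03_a", "tc3g04b", "i.tc3g05"]
print(duplicated_terms(spec))  # ['i.tc3g03_a', 'i.tc3g05']
```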



                        • #13
                          Originally posted by Ken Chui
                          I'm sorry, but I made a mistake. It is the following variables (shown in bold in my original post) that cause a problem in my model:

                          regress t3pjobsa i.tc3g45a_a i.tc3g26f_a i.tc3g26g_a i.tc3g26k_a i.tc3g45c_a i.tc3g29e i.tc3g29f i.tc3g29i i.tc3g29j (i.t3pautc i.t3pauts i.t3pautb i.t3pautp i.t3pauti)#c.tc3g04b c.t3pwload##tc3g05 (i.tc3g07*)##i.tc3g03_a i.tc3g01 i.pragegr i.tc3g03_a tc3g04b i.tc3g05 i.schloc i.tc3g12 i.nenrstud, vce(robust)

                          The notes that Stata gives me are the following:

                          note: 2.tc3g07c#0b.tc3g03_a identifies no observations in the sample
                          note: 2.tc3g07c#1.tc3g03_a omitted because of collinearity
                          note: 2.tc3g07i#1.tc3g03_a omitted because of collinearity
                          note: 2.tc3g07j#1.tc3g03_a omitted because of collinearity
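The first two notes may be connected. A hedged Python sketch with invented rows shows the mechanism: if no observation has tc3g07c = 2 and tc3g03_a = 0, then every observation with tc3g07c = 2 has tc3g03_a = 1, so the dummy for 2.tc3g07c#1.tc3g03_a is identical to the main-effect dummy for 2.tc3g07c, and one of them must be omitted. Whether this is what happens in the Estonian data is an assumption; a cross-tabulation of tc3g07c by tc3g03_a would show it.

```python
# Hypothetical rows: (tc3g07c, tc3g03_a)
rows = [(1, 0), (1, 1), (2, 1), (1, 0), (2, 1), (1, 1)]

main_2 = [1 if a == 2 else 0 for a, b in rows]              # 2.tc3g07c
cell_2_0 = [1 if (a, b) == (2, 0) else 0 for a, b in rows]  # 2.tc3g07c#0.tc3g03_a
cell_2_1 = [1 if (a, b) == (2, 1) else 0 for a, b in rows]  # 2.tc3g07c#1.tc3g03_a

print(sum(cell_2_0))       # 0: the (2, 0) cell "identifies no observations"
print(cell_2_1 == main_2)  # True: identical columns, so one of them is omitted
```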



                          • #14
                            That'd depend on what the tc3g07* variables are. It could be that everyone picked all "0" or all "1" in i, j, and one more option, and that caused i and j to be redundant and omitted.

                            In any case, if you start seeing notes like these, chances are your data do not have the variability, or even the sample size, for this highly complex model. It may be necessary to step back from the regression and check whether some of these categorical variables can be collapsed into a smaller number of groups, or whether some variables can be dropped, etc.
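Checking for empty cells and collapsing categories can be sketched in a few lines of Python (hypothetical data: a 4-point item crossed with a 0/1 variable; the 1-2 vs 3-4 grouping is only an example). In Stata itself, a two-way tabulate of the variables involved gives the same cell counts.

```python
from collections import Counter
from itertools import product


def empty_cells(pairs):
    """Combinations of observed levels that contain no observations."""
    counts = Counter(pairs)
    a_levels = sorted({a for a, _ in pairs})
    b_levels = sorted({b for _, b in pairs})
    return [cell for cell in product(a_levels, b_levels) if counts[cell] == 0]


# Hypothetical 4-point item (first value) crossed with a 0/1 variable (second value).
pairs = [(1, 0), (1, 1), (2, 0), (2, 1), (3, 0), (3, 1), (4, 1)]
print(empty_cells(pairs))      # [(4, 0)]

# Collapse the item into two groups (1-2 and 3-4); the grouping is illustrative only.
collapse = {1: 1, 2: 1, 3: 2, 4: 2}
collapsed = [(collapse[a], b) for a, b in pairs]
print(empty_cells(collapsed))  # []
```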



                            • #15
                              Marleen:
                              setting technicalities aside for a while, it seems that you're asking too much of your data.
                              I would recommend narrowing the zoom of your camera and going for a more parsimonious model: regression is not a machine gun spitting out coefficients (all of which must, supposedly, be statistically significant), but simply a way to give a fair and true view of the data-generating process that you're investigating.
                              Kind regards,
                              Carlo
                              (Stata 19.0)

