  • Why do we not worry about omitted variable bias in the first stage of a 2SLS? What if they change the outcome of the estimates?

    This question has been bothering me for quite a while. The reason why it bothered me is the following:

    When adding an (exogenous) variable to my regression/structural equation, I noticed that, although it hardly affected my dependent variable in the second stage, it had quite a large effect on my first stage (the IV).
    Although theory suggests that only the second stage matters (because the IV has to be uncorrelated with u in the second stage), it felt odd to remove this variable again, because it appeared to be important in the first stage and also affected my IV estimate.

    It made me wonder: if essentially no one discusses the variables in the first stage (e.g. Levitt 1997; Wooldridge, Introductory Econometrics (fifth edition); Wooldridge, Econometric Analysis of Cross Section and Panel Data (2010)), what do I do with the knowledge that it affects my estimates?
    In particular, it appears to me that removing this variable again can have a big impact on the first-stage fitted values that I carry over to the second stage.

    Any enlightening comments would be very much appreciated.


  • #2
    Without seeing exactly what you have done, it is hard to speculate regarding what has happened.

    For a starter, you need to include your relevant exogenous controls both in your first and second stage.



    • #3
      Joro Kolev Thank you for your comment. That I know, and that is what I did. My question is about an exogenous variable that has hardly any effect in the second stage but appears to be important in the first stage (so it appears in both stages). Theory would suggest removing it, because apparently it does not belong in the structural equation. What I am interested in is the deeper logic of why it does not matter that this variable is important in the first stage. I hope that clarifies things a little bit.

      Maybe I asked the question a bit better here: https://stats.stackexchange.com/ques...-of-an-iv-2sls
      Last edited by Tom Kisters; 06 Apr 2021, 02:48.



      • #4
        Dear Tom Kisters,

        Indeed, your question on Stack Exchange is much clearer. I think the key is to understand the different nature of the two steps. The second stage is a structural model: you want to estimate a particular structural parameter, and you need to find an estimator that identifies it. The first stage is just a linear projection, and the parameters there are defined in a way that makes the errors uncorrelated with the regressors. So, there is no omitted variables problem in the first stage because we define the parameters as a function of the included regressors only. Now, in your case, it looks as if there is a variable that does not matter for the second stage but matters for the first stage; that sounds like an instrument, right? Maybe you should treat it as such?

        Best wishes,

        Joao
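
A small simulation can make the linear-projection argument in #4 concrete. The sketch below is illustrative only: the data-generating process, the coefficient values, and the names z, w and u are assumptions made for this example and do not come from the thread. It leaves an exogenous variable w out of the first stage even though w strongly affects x and is correlated with the instrument z. The first-stage coefficient on z changes, but the first-stage residual is still orthogonal to z by construction, and the 2SLS estimate should stay close to the true value.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def simulate(n=200_000, beta=0.5, seed=0):
    """Hypothetical data: z is the instrument, w affects x only, u confounds x and y."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                 # instrument
    w = 0.5 * z + rng.normal(size=n)       # exogenous, correlated with z, not in y
    u = rng.normal(size=n)                 # unobserved confounder
    x = 1.0 * z + 2.0 * w + u + rng.normal(size=n)
    y = beta * x + u + rng.normal(size=n)  # structural equation, true beta = 0.5
    return z, w, x, y

z, w, x, y = simulate()
const = np.ones_like(z)

# "Short" first stage: w is omitted on purpose.
Z_short = np.column_stack([const, z])
pi_short = ols(Z_short, x)
resid_short = x - Z_short @ pi_short
# Projection residuals are orthogonal to the included regressors by construction.
orthogonality = abs(np.mean(z * resid_short))

# "Long" first stage, for comparison of the coefficient on z only.
pi_long = ols(np.column_stack([const, z, w]), x)

# Second stage using fitted values from the short first stage.
x_hat = Z_short @ pi_short
beta_2sls = ols(np.column_stack([const, x_hat]), y)[1]

# Plain OLS of y on x, which is contaminated by the confounder u.
beta_ols = ols(np.column_stack([const, x]), y)[1]

print("first-stage coefficient on z (short, long):", pi_short[1], pi_long[1])
print("2SLS estimate:", beta_2sls, " OLS estimate:", beta_ols)
```

Under these assumed parameters, the short first stage should give a coefficient on z near 2 rather than 1, because it is a projection coefficient and not a causal effect, while the 2SLS estimate should still be near 0.5 and OLS near 0.6.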



        • #5
          Joao Santos Silva Thank you very much for your explanation! (I have a small remaining question below.)

          "Now, in your case, it looks as if there is a variable that does not matter for the second stage but matters for the first stage; that sounds like an instrument, right? Maybe you should treat it as such?"
          Very good point, haha.

          If I could ask one small additional thing (and my apologies if the question only arises because, although I believe you, I still do not properly understand): suppose that, purely as a thought experiment, I leave this (potential instrumental) variable where it is. Would this alter anything?
          As an example, because it explains variation in the first stage, I could imagine that the fitted portion of the first stage rises and the residuals fall. Is this automatically balanced in the second stage, a bit like old-fashioned scales?
          Is there no way that it could, for example, strengthen the instrument? Or do the opposite and create noise?

          My apologies, perhaps knowing that it does not matter should be enough, but my overly curious brain just does not want to let it go. Now that I have a chance, I cannot leave it and have to ask haha.



          • #6
            Dear Tom Kisters,

            I am not sure I understand what you mean, but including a potential instrument as an explanatory variable may reduce the efficiency of your estimator; it will not make it inconsistent.

            Best wishes,

            Joao
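
The efficiency point in #6 can be illustrated with a small Monte Carlo experiment. This is a sketch under assumed values: the sample size, the coefficients, and the names z1 and z2 are hypothetical. Both z1 and z2 affect x and neither enters the true equation for y. One estimator uses both as excluded instruments; the other keeps z2 as a control in the structural equation, so only z1 identifies the coefficient on x.

```python
import numpy as np

def tsls(y, X, Z):
    """2SLS: project the columns of X on Z, then regress y on the projections."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

def one_draw(rng, n=500, beta=0.5):
    """One simulated sample; returns the coefficient on x from both specifications."""
    z1 = rng.normal(size=n)
    z2 = rng.normal(size=n)
    u = rng.normal(size=n)                       # unobserved confounder
    x = z1 + 2.0 * z2 + u + rng.normal(size=n)   # z2 matters a lot for x
    y = beta * x + u + rng.normal(size=n)        # z2 has no direct effect on y
    const = np.ones(n)
    Z = np.column_stack([const, z1, z2])         # all exogenous variables
    # z2 treated as an excluded instrument.
    b_instr = tsls(y, np.column_stack([const, x]), Z)[1]
    # z2 kept as a control in the structural equation.
    b_ctrl = tsls(y, np.column_stack([const, x, z2]), Z)[1]
    return b_instr, b_ctrl

rng = np.random.default_rng(1)
draws = np.array([one_draw(rng) for _ in range(500)])
mean_instr, mean_ctrl = draws.mean(axis=0)
sd_instr, sd_ctrl = draws.std(axis=0)

print("z2 as instrument: mean", mean_instr, "sd", sd_instr)
print("z2 as control:    mean", mean_ctrl, "sd", sd_ctrl)
```

In this assumed setup both estimators should be centred near the true value of 0.5, while the one that keeps z2 as a control should show the larger spread across replications.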



            • #7
              Joao Santos Silva That answers my question, thank you very much!



              • #8
                Joao Santos Silva Dear Professor,

                You have helped me a lot with this question, but I am noticing that there is still one thing lingering in my mind. I thought of an example to make my issue clear.

                I want to estimate the effect of income on life satisfaction.

                I have an instrument for income, which is quite random, but not completely: Winning a small lottery. Although winning a lottery is quite random, whether you win is obviously dependent on whether you bought a ticket, and whether you bought a ticket is again dependent on a range of factors.
                For the sake of the argument (and because I could not come up with a better example), let's assume that the reasons for buying a ticket are unrelated to life satisfaction (so that the IV requirements are not violated). For example, you need to be eligible to buy a ticket, based on whether your age is odd or even and on other exogenous factors.

                Now let's say that I assume to know how to control for the factors that influence whether someone buys a ticket (because I ASSUME to know which they are, I do not actually know).

                Now, to finally get to the point: I want to control for this non-randomness to make my IV random (which brings me full circle).

                My initial thought was to add an odd-age dummy to the structural equation. But I would do so only so that it appears in the first stage (it has no actual use for explaining life satisfaction). You, however, already explained to me that there is no bias in the first stage, so this makes no sense.

                So what do I do then?

                As alternatives, I looked into a Heckman correction, but that seems to apply only to sample selection and not to treatments. So I looked into propensity score matching, but I have not really been able to work out how I could use that either.

                I hope this question did not test your patience too much, and I would be extremely interested in your reply.




                EDIT: I just realised that if the reason for buying a lottery ticket is random, then there would be no problem in the first place. My apologies for not being able to come up with a better example.

                Is that maybe the answer to my own question? If the reasons for the self-selection bias are exogenous, then it is not an issue in the first place?

                EDIT II: And if it is related to the dependent variable, then it should obviously have been in the structural equation in the first place.
                Last edited by Tom Kisters; 26 May 2021, 05:09.



                • #9
                  Joao Santos Silva

                  "So, there is no omitted variables problem in the first stage because we define the parameters as a function of the included regressors only."
                  Does this basically mean that we first construct our second stage (with all necessary controls) and from the second stage we construct our first stage (with the controls from the second stage)?

                  "if there is a variable that does not matter for the second stage but matters for the first stage; that sounds like an instrument, right?"
                  Not necessarily, right? After all, the instrument (Z) can have a direct effect on Y (Z->Y), while Z can only be a valid instrument if it has an indirect impact on Y via X (Z->X->Y).
                  Last edited by Michael Schuster; 02 Apr 2022, 08:22.



                  • #10
                    Originally posted by Michael Schuster:
                    Not necessarily, right? After all, the instrument (Z) can have a direct effect on Y (Z->Y), while Z can only be a valid instrument if it has an indirect impact on Y via X (Z->X->Y).
                    If Z has a direct effect on Y, then it matters for the second stage, contradicting the assumption in the statement you are questioning. Joao's statement stands.
                    https://www.kripfganz.de/stata/
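
The distinction drawn in #10 can be checked numerically. The sketch below assumes a hypothetical data-generating process in which z has a direct effect on y (coefficient 0.3) and a second variable q is a valid instrument; all names and values are assumptions for illustration. Wrongly excluding z from the structural equation and using it as an instrument should pull the estimate away from the true 0.5, whereas keeping z as a control and relying on q alone should recover it.

```python
import numpy as np

def tsls(y, X, Z):
    """2SLS: project the columns of X on Z, then regress y on the projections."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]

rng = np.random.default_rng(2)
n = 200_000
z = rng.normal(size=n)                 # affects x AND has a direct effect on y
q = rng.normal(size=n)                 # valid instrument: affects y only through x
u = rng.normal(size=n)                 # unobserved confounder
x = z + q + u + rng.normal(size=n)
y = 0.5 * x + 0.3 * z + u + rng.normal(size=n)   # true coefficient on x is 0.5

const = np.ones(n)
Z = np.column_stack([const, z, q])

# z wrongly treated as an excluded instrument (it is left out of the y equation).
b_excluded = tsls(y, np.column_stack([const, x]), Z)[1]

# z kept as a control in the structural equation; only q is excluded.
coef_control = tsls(y, np.column_stack([const, x, z]), Z)
b_control = coef_control[1]

print("z used as instrument:", b_excluded)
print("z used as control:   ", b_control, "(coefficient on z:", coef_control[2], ")")
```

With these assumed values the misspecified estimate should settle near 0.65 instead of 0.5, which is the sense in which a variable with a direct effect "matters for the second stage".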



                    • #11
                      Sebastian Kripfganz
                      You mean it matters as a control in the second stage, right? Or maybe you could elaborate more on this?

                      My question relates to something else. Suppose I have already found an instrument and now I want to form my first stage. However, I have identified other variables that have an influence on my X variable but (probably) no influence on Y. Therefore, I do not include them in my second stage. But do I have to include them in my first stage?



                      • #12
                        If you include those other variables in the first stage as well, but not in the second stage, they would also be instruments. You can include them but do not have to. Including them could potentially increase the efficiency of the estimator. The estimator is still consistent if you do not include them.
                        https://www.kripfganz.de/stata/



                        • #13
                          Dear Michael Schuster,

                          Sebastian is correct. If your additional variables probably do not influence Y, that means you are not sure about their influence, and in that case they need to be included in the second stage.
                          Note that in 2SLS, the first stage is traditionally just a computational device to estimate the second stage. So, we typically define the second stage, and the first stage follows from that.

                          Best wishes,

                          Joao
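
The remark in #13 that the first stage is a computational device can be seen by comparing the explicit two-step calculation with the usual closed-form 2SLS expression, (X'P_Z X)^{-1} X'P_Z y, where P_Z is the projection onto the exogenous variables. The following is a minimal sketch on simulated data; the variable names (c for a control, z for an excluded instrument) and all coefficients are assumptions for illustration. Once the structural equation and the instrument list are fixed, the first stage is determined: each regressor is projected on all exogenous variables.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000
c = rng.normal(size=n)                 # exogenous control, appears in both stages
z = rng.normal(size=n)                 # excluded instrument
u = rng.normal(size=n)                 # unobserved confounder
x = 0.8 * z + 0.5 * c + u + rng.normal(size=n)
y = 1.0 + 0.5 * x + 0.3 * c + u + rng.normal(size=n)

const = np.ones(n)
X = np.column_stack([const, x, c])     # second-stage regressors
Z = np.column_stack([const, z, c])     # all exogenous variables

# Route 1: explicit two steps (project X on Z, then regress y on the fit).
first_stage = np.linalg.lstsq(Z, X, rcond=None)[0]
X_hat = Z @ first_stage
beta_two_step = np.linalg.lstsq(X_hat, y, rcond=None)[0]

# Route 2: closed form, written without building the n-by-n matrix P_Z.
ZtZ_inv_ZtX = np.linalg.solve(Z.T @ Z, Z.T @ X)
ZtZ_inv_Zty = np.linalg.solve(Z.T @ Z, Z.T @ y)
beta_closed = np.linalg.solve(X.T @ Z @ ZtZ_inv_ZtX, X.T @ Z @ ZtZ_inv_Zty)

print("two-step:   ", beta_two_step)
print("closed form:", beta_closed)
```

The two routes give the same point estimates up to floating-point error. Standard errors are a separate matter: those reported by a naive second-stage OLS on the fitted values are not the correct 2SLS standard errors, which is one reason to use a dedicated IV routine in practice.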

