  • Can I use a time-invariant instrumental variable with country fixed effects in a pooled cross-sectional analysis?

    Can I use a time-invariant instrumental variable while controlling for country dummies in a pooled cross-sectional analysis? Note: both the instrument and the endogenous variable are country-specific, and the dependent variable is a firm-level variable.
    I know that time-invariant variables are "absorbed" by the fixed-effects approach. However, this does not seem to be the case when such a variable is used as an instrument in a pooled cross-sectional analysis. The output shows a coefficient for the instrument in the first-stage regression and gives the expected results for the endogenous variable in the second stage. I have seen several papers that follow a similar approach (time-invariant instrument + fixed effects + pooled cross-sectional dataset), but I could not find any explanation of why this is a correct approach.
    Furthermore, does it really make sense to cluster standard errors at the country level while including country fixed effects?
    Any explanation and references to papers will be appreciated. Thanks.

  • #2
    To get the best responses from Statalist, it is usually best to post a minimal working example or at least a code fragment. See the FAQ. But from what you say, my guess is that you are not using -xtivreg- or the community-contributed -ivreghdfe-, but are instead modeling the fixed effects using the LSDV approach. That is, you are explicitly including a set of dummy variables in your structural equation, but not including them among your instruments. This seems wrong to me, because by assumption the fixed effects are exogenous and thus should be included as instruments in the first-stage regression. But I and others would have a better idea of what's going on if you post more information.

    Yes, it can make sense to cluster the standard errors in the presence of country fixed effects. The fixed effect sweeps out the unobserved, country-specific heterogeneity, but does not address other possible sources of heteroskedasticity or within-country correlation in the errors. Note that we use fixed effects to remove the bias in the estimated coefficients, while we cluster standard errors to remove "bias" in the standard errors of the coefficients.
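A minimal sketch of this distinction on simulated data. Everything here is made up for illustration (the variable names ccode, x, y, the sample size and the coefficient 0.5), and the linear model is a stand-in rather than the poster's actual specification. It fits the same regression with country dummies twice, once with conventional and once with country-clustered standard errors.

```stata
* Simulated firm-level data: 20 hypothetical countries with 20 firms each
clear
set seed 2024
set obs 400
generate ccode = ceil(_n/20)
generate x = rnormal()
generate y = 0.5*x + rnormal()

* Country dummies with conventional standard errors
regress y x i.ccode
scalar b_plain = _b[x]

* Same model, standard errors clustered by country
regress y x i.ccode, vce(cluster ccode)
scalar b_cluster = _b[x]

* The point estimate on x is the same in both fits; only the standard errors change
display "difference in coefficients = " scalar(b_plain) - scalar(b_cluster)
```

The country dummies enter the coefficient estimates, while vce(cluster ccode) changes only how their sampling variability is estimated.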


    • #3
      Many thanks for your reply. I am now clear about the second question, but could you please explain a little more about my first question? Below is the code I am using. The dataset is pooled cross-sectional, with a time-invariant instrumental variable and country fixed effects.
      Is this a correct approach?

      I know that time-invariant variables are "absorbed" by the fixed-effects approach. However, this does not seem to be the case when such a variable is used as an instrument in a pooled cross-sectional analysis. The output shows a coefficient for the instrument in the first-stage regression and gives the expected results for the endogenous variable in the second stage. I have seen several papers that follow a similar approach (time-invariant instrument + fixed effects + pooled cross-sectional dataset), but I could not find any explanation of why this is a correct approach.

      ivprobit y (x1=z) x2 x3 i.isic i.ccode, vce(cluster ccode)


      • #4
        Your problem poses several challenges: (1) you believe that the variable x1 is endogenous and should be instrumented with the variable z; (2) you have more than one fixed effect, one for ccode (country code?) and one for isic (SIC code?); (3) you believe that disturbance terms are correlated within ccodes (countries?), but apparently you are not concerned about possible correlation among the disturbance terms within the same -isic- (SIC category); (4) your dependent variable, "y", is binary.

        In my view, of these problems, the last one to worry about should be the fact that your dependent variable is binary. If I were faced with your problem, I would begin by modeling your binary outcome variable, y, using a linear probability model. See if you can get to a model you like by using -ivreg2- or, since you have more than one fixed effect, using Sergio Correia's -ivreghdfe- (which first demeans the variables and then applies -ivreg2-). For example, what does the following command yield?

        Code:
        ivreghdfe y (x1=z) x2 x3, absorb(i.isic i.ccode) first
        Many econometricians consider the linear probability model sufficient for causal inference, especially when endogeneity or fixed effects are involved. As I remember, that's because the theorems about the consistency of IV estimators and fixed effects apply most directly to linear models, and only with strong additional assumptions and caveats to non-linear models. Only after you believe you have satisfactorily modeled -y- with a linear probability model, should you attempt to address the binary nature of -y-. And if I were a reviewer of your paper, I would ask you to compare the estimates from a linear probability model to any model that uses a probit or logit link function to accommodate the binary nature of your dependent variable. Your audience will probably want to see both models and might be more convinced by the linear probability results than by any probit or logit accommodation.

        Of course, a big problem with linear probability models is that predictions or forecasts from the fitted model may produce estimated values of y outside the unit interval. If you are doing post-estimation simulations based, for example, on a hypothesized change in one of your exogenous variables, you might run into this problem.

        If you do decide to move on from your linear probability model to a model that accommodates the binary nature of your dependent variable, -ivprobit- might not get you where you want to go. Alternatives could be -gsem- or -cmp-. To use -gsem-, you will have to explicitly model endogeneity, perhaps following the advice of Paul Allison posted on his blog here or the discussion of endogeneity in Alan Acock's 2013 book beginning on page 95 (reviewed here by Richard Williams). On the other hand, because your model is recursive and fully observed, David Roodman's program -cmp- should work. It allows you to use a probit link function, instrument the endogenous variable, cluster the residuals and include the two LSDV fixed effects, i.ccode and i.isic.

        Incidentally, if any Statalist participants know of references other than Paul Allison's blog that explicitly compare and contrast Structural Equation approaches to endogeneity with the econometrics literature on instrumental variables, I would be grateful for other references on this topic.


        • #5
          Thank you very much for your detailed answer, Mead Over. I will read all the material you mentioned and check my results with the models you suggested. Thank you, sir.


          • #6
            Also, though I haven't tried it, perhaps -eprobit- would work for you. See the Stata blog here.


            • #7
              I don't understand how this could work. I agree with:
              the time-invariant variables will be "absorbed" by the fixed-effects approach
              But I don't understand:
              However, this is not the case when used as instrument in pooled cross-sectional analysis.
              What is the basis for this statement?
              If you include an instrument that is collinear with the country fixed effects, a Stata program might drop one of the country dummies because of collinearity, rather than dropping the time-invariant instrument. But at that point the only unique, identifying variation in the instrument would be the same as that in the dropped dummy: it would effectively be a country dummy with a different name, wouldn't it?

              --David


              • #8
                I am really grateful for your reply, David Roodman.
                However, this is not the case when used as instrument in pooled cross-sectional analysis
                What I mean by this statement is that Stata does not drop the instrument in the first-stage regression even though it is a time-invariant variable. So do you mean that using a time-invariant instrument is still valid in this case, because it is a country dummy that is dropped from the regression rather than the instrument? I would be really grateful for a little more explanation.


                • #9
                  No, David's point is that the time-invariant instrument does not help at all. Stata just happened to drop one of the country dummies; it could equally have dropped the time-invariant instrument instead, without affecting the results. In your case, the time-invariant instrument simply takes over the role of the dropped country dummy. The first-stage predicted values are the same whether you drop the time-invariant instrument or a country dummy (from the first stage), and hence the second-stage results are the same as well. The second stage should actually be underidentified. You should see that one of the coefficients is omitted. Again, Stata might arbitrarily omit the coefficient of one of the country dummies instead of the coefficient of the endogenous regressor, but that does not mean that the latter has a meaningful interpretation.
                  https://www.kripfganz.de/stata/
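The collinearity described here can be checked directly. The following is a minimal sketch on simulated data, where the names ccode, u, z, z_mean and z_within and the sample size are invented for illustration and are not the original poster's variables. It builds an instrument that is constant within country and shows that the country dummies reproduce it exactly.

```stata
* Simulated data: 10 hypothetical countries with 20 firms each
clear
set seed 12345
set obs 200
generate ccode = ceil(_n/20)

* A country-level instrument: the country mean of a random draw,
* so z takes a single value for every firm in a given country
generate u = rnormal()
bysort ccode: egen z = mean(u)

* Regressing z on the country dummies should give a perfect fit
regress z i.ccode
display "R-squared = " e(r2)

* Equivalent check: after country demeaning, nothing is left of z
bysort ccode: egen z_mean = mean(z)
generate z_within = z - z_mean
summarize z_within
```

If the regression fits perfectly and z_within is zero throughout, z adds no information beyond the country dummies, so any first-stage coefficient that software reports for it depends only on which collinear term was dropped.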


                  • #10
                    Sebastian Kripfganz, very well explained, sir. Understood. Thank you so much.


                    • #11
                      This is why we can't let Stata do identification analysis for us. It's clear that a time-constant IV cannot be used in fixed effects, so one shouldn't try. As Sebastian noted, Stata will drop collinear variables, but not always the one that it should. Whenever one does fixed effects manually, this can happen.
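The argument can be written out in a line or two. The notation below is an assumption introduced for illustration and does not appear in the thread: firm $i$ in country $c$, endogenous regressor $x_{ic}$, country-level instrument $z_c$, country effect $\alpha_c$, and first-stage error $v_{ic}$.

```latex
% Illustrative notation (not from the thread): firm i, country c.
% The within (country-demeaning) transformation wipes out z_c:
\[
  \ddot{z}_{ic} \;=\; z_{c} - \bar{z}_{c} \;=\; z_{c} - z_{c} \;=\; 0 .
\]
% Equivalently, in a first stage with country effects,
\[
  x_{ic} \;=\; \pi z_{c} + \alpha_{c} + v_{ic}
         \;=\; \tilde{\alpha}_{c} + v_{ic},
  \qquad \tilde{\alpha}_{c} \equiv \alpha_{c} + \pi z_{c},
\]
% so \pi cannot be separated from the country effects.
```

Since only $\tilde{\alpha}_c$ is identified, any value of $\pi$ fits the first stage equally well once $\alpha_c$ is adjusted to match, which is why a reported coefficient on $z_c$ carries no information.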


                      • #12
                        Thank you very much, Jeff Wooldridge and the other members, for the clarification and helpful suggestions.


                        • #13
                          Hi everyone, I have a similar question. Can I use a time-invariant instrument (that is, student ability as predicted by teachers at age 7) to predict test scores at age 16? This is a panel dataset, but unfortunately the ability reported by teachers is only included in the first wave of the cohort study. I am instrumenting private school attendance, which varies across waves. At the moment this is the code I am running: xtivreg zmath (private = abilitycat) gender i.FathersSocialClass i.FathersEducation i.MothersEducation i.FathersInterest i.MothersInterest classize freemeals, vce(cluster id). I would also be interested in trying random effects later on. Many thanks in advance.
