
  • 2SLS with Poisson first stage, and summation of predicted first-stage values

    Hi there,

    I realize questions like this come up a lot, but I couldn't find answers that suited exactly what's going on for me.

    I've got an instrumental variables setup with two continuous instruments Z1 and Z2, and a few exogenous variables W, in the first stage. I'm instrumenting for a count variable, and in the interest of precision, want the first stage to be a Poisson regression.

    Now, between the first stage and the second stage, I need to sum up the predicted values from the first stage, because my model in the second stage is at an aggregated level vis-a-vis the first. In particular, the first stage instruments for a sort of trade flow between each pair of states, so it's at the level of the source state, destination state, and year. In the second stage, I want to estimate the impact of the total flow into the destination state on an outcome variable.

    What I've been trying so far, based on a combination of earlier statalist posts--especially this one--is something like the following:

    Code:
    xtset src_des_num year // sets panel, where panel variable is source-destination combination
    xtpoisson flow `stage1_covars' i.year, fe vce(robust) // list of stage1_covars has been defined elsewhere and includes the two instruments
    predict flow_hat // get the Poisson-estimated values
    
    * Now I need to collapse to the destination state level
    collapse (sum) flow_hat /// total flow into state
                  (max) log_gdp_des log_pc_des officer_rate_dest /// these are constant within destination state and year
                  (mean) log_gdp_src log_pc_src norm_score_source officer_rate_source, /// these need to be averaged over source states or they don't make sense in the second stage
                  by(dest_state year) // final dataset is at destination state-year level
    
    merge 1:1 dest_state year using "[outcome dataset]", nogen
    
    * Set the new panel
    egen dest_state_num = group(dest_state)
    xtset dest_state_num year
    
    * Make new variable list
    local stage2_covars log_gdp_des log_pc_des officer_rate_dest log_gdp_src log_pc_src norm_score_source officer_rate_source
    
    * Follow instructions from statalist post
    ivregress 2sls log_homic_rate `stage2_covars' i.year i.dest_state (flow = flow_hat), vce(cluster dest_state)
    At this point I get the following error:

    Code:
    flow_hat included in both endogenous and excluded exogenous variable lists
    r(498);
    What am I doing wrong? I'm pretty sure/I've read that I can just do a linear first stage instead, and the normal 2SLS will get the job done. I also remember reading somewhere though (can't find the link) that there's more precision/efficiency/something if you estimate the first stage in its "natural" non-linear way. (Again, the endogenous variable, trade flow, is a count variable.)

    I really appreciate any and all help you could offer.

    Best,
    Isaac

  • #2
    Dear Isaac,

    I am not familiar with some of the things you are doing so I do not know what is causing the error. Anyway, I would start by seeing if the same happens if you use -ivreg2- (available from SSC).

    However, I think you need to think whether you really want to do the "first stage" using FE. The predictions you get with the FE estimator do not include the fixed effects and therefore this may not be a good instrument. Also, wouldn't it be preferable to run the Poisson regression on the aggregate data rather than aggregating the predictions?

    Best wishes,

    Joao


    • #3
      João,

      Thanks for your response. Actually, your 2006 gravity paper with Tenreyro was instrumental (no pun intended) in this formulation, so this is a small world indeed!

      I'm not sure I follow what you're saying about FE. Are you sure Stata does not include the fixed effects? I'm looking at -help xtpoisson postestimation-, under predict -> RE/FE statistic, and it seems like there's an option to assume the fixed effect is zero, but by default that doesn't hold. Please correct me if I'm wrong on this. I would like to include source-destination fixed effects in the first stage, because they capture important time-invariant pairwise factors--especially, distance, contiguity, and remoteness. I could control for them separately, but this method would capture those factors and more, so it seems preferable. Again, please do correct me if that's not the case.

      As for running the Poisson regression on the aggregate data, the issue with that is that the outcome variable is not at the source-destination-year level, but at the destination-year level only. Thus if I were to run the analysis without having aggregated the data, there'd be 50 repeated outcome observations for each destination-year. Does that make sense? Or are you asking something different?

      Update: I think I see what you're asking. I can't run the Poisson regression on the aggregate flow data because the flows are (assumed to be) endogenous.

      Let me know what you think.

      Very best,
      Isaac


      • #4
        Update: I figured it out. The issue was that I had forgotten to include the variable flow in the -collapse- command, so it got dropped. When I later referred to flow as the endogenous variable in the -ivregress- command, Stata read it as an abbreviation of flow_hat, and thought I was instrumenting for flow_hat using flow_hat. I corrected the problem and now things are working out. (João, I'm still curious about your opinion on the other questions.)

        Thanks everybody.

        Best,
        Isaac
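
        A minimal sketch of the fix described in #4, reusing the variable names from #1. The only change to the original -collapse- is that the observed flow is summed alongside the fitted flow. The -set varabbrev off- line is an optional safeguard that was not part of the original post: with abbreviation switched off, Stata should reject flow outright when no variable with exactly that name exists, instead of silently matching flow_hat.

        ```stata
        * Optional safeguard (an addition, not in the original do-file):
        * stop Stata from resolving -flow- to -flow_hat- when -flow- is missing
        set varabbrev off

        * Keep the observed flow next to the fitted flow when collapsing
        collapse (sum) flow flow_hat /// observed and predicted total flow into state
                 (max) log_gdp_des log_pc_des officer_rate_dest /// constant within destination state and year
                 (mean) log_gdp_src log_pc_src norm_score_source officer_rate_source, /// averaged over source states
                 by(dest_state year)
        ```

        With both flow and flow_hat present after the collapse, the -ivregress- call from #1 refers to two distinct variables.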


        • #5
          Dear Isaac,

          I am glad you sorted it out. About the other things:

          - I believe that by default, after -xtpoisson- with FE, predict just gives you the linear index without the fixed effects. So, this is not a good predictor of the number of events.

          - You are doing the Poisson regression to get a good instrument for flow, but in your 2SLS regression flow enters at the more aggregate level. So, you need a good predictor for the aggregate flow and the best way to get that is to do the appropriate non-linear regression at the aggregate level.

          Best wishes,

          Joao
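
          The first point can be checked directly. The sketch below assumes the pair-level data and the `stage1_covars' local from #1 are in memory. The new variable names (flow_xb, flow_nu0, flow_hat_fe) are made up for illustration. According to -help xtpoisson postestimation-, xb is the default statistic after the FE estimator and nu0 gives the predicted number of events with the fixed effect set to zero, so neither one includes the estimated pair effects. The second half shows one possible way to obtain fitted counts that do include them, by entering the pair effects as explicit dummies in -poisson-. This is an assumption about what would be useful here, not something proposed in the thread.

          ```stata
          xtset src_des_num year
          xtpoisson flow `stage1_covars' i.year, fe vce(robust)
          predict flow_xb, xb      // linear index only, no pair effect
          predict flow_nu0, nu0    // predicted count with the pair effect set to zero

          * Possible alternative: Poisson with explicit pair dummies.
          * Can be slow with many pairs, and pairs whose flow is zero in every
          * year carry no information about their own effect.
          poisson flow `stage1_covars' i.year i.src_des_num, vce(cluster src_des_num)
          predict flow_hat_fe, n   // predicted count including the estimated pair effects
          ```

          The slope estimates from the dummy-variable Poisson regression should match those from -xtpoisson, fe-. Each pair effect is still estimated from only the few years available for that pair.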


          • #6
            - I'll check on the -xtpoisson- postestimation thing. Assuming you're right though, is there a way to get predictions including the fixed effects?

            - My assumption was that if the pair-level trade flow is instrumented for, and thus the predicted pair-level flow is exogenous with respect to the outcome variable, then the sum of pair-level flows for a given destination state is also exogenous with respect to the outcome variable. Is that not the case? In general, when there's an exogenous independent variable, linear transformations of the variable should be fine to use as well, shouldn't they?
            The reasons I'd like to do the first stage at the pair level rather than the aggregate level are that a) then I can include pair fixed effects; b) if I do it at the aggregate level I'm forced to use the average of all other states' policies to instrument for the trade flow, whereas at the pair level I can instrument by each source state's policies.

            Again, assuming the right chain of causality holds, is there a problem with this? Intuitively, I'm assuming that for each destination state, regulations in each source state affect the destination state's outcome variable only through these regulations' effect on the trade flow between the two states. It's okay if different states react to each other's regulations by changing their own regulations, because for the data I'm looking at, each pair's trade flow is still ultimately determined only by those two states' regulations. (For the data I'm looking at, that's a fairly safe assumption.)

            Very curious to hear your thoughts.

            Best,
            Isaac


            • #7
              Dear Isaac,

              I think there are two problems here:

              - One problem is that what you are doing does not look like the best way to get an instrument for the aggregate flow. In principle you get a better (stronger) instrument if you run the regression at the aggregate level than if you aggregate the fitted values.

              - As I said, I do not think that using a FE Poisson regression to obtain the instruments is a good idea. Even if you get the fitted values including the estimates of the FE there is the problem that the estimated FE are obtained only with T observations. This may affect the properties of the instrument.

              Best wishes,

              Joao


              • #8
                Joao,

                Doing the first stage at the aggregate level drops the number of observations from 12,250 (49 source states * 50 destination states * 5 years) down to 250 (50 destination states * 5 years). Also, at the aggregate level, as I mentioned above, I have to use the average of the source state variables, rather than have an individual observation for each one. The upshot is that the (admittedly preliminary) first-stage results are much weaker at the aggregate level--unless I'm misunderstanding what you mean by "strength". (I'm going by the first stage's F stat, R-squared, and t-stats, which are all higher at the disaggregated level.) Is there an in-between solution--running it at the finer level, and then regressing the total flow on the individual flows, perhaps? Is that doable?

                As for the fixed effects, fair enough, that makes sense.

                (I should also mention that all this is not for a project that I intend to complete right now: It's for a grant proposal I'm writing, and the actual project will include more variables and more years of data. Right now I would just like to be able to see, at least in a preliminary way, whether the first stage is significant, so that I can include and qualify that in the proposal.)

                Best,
                Isaac


                • #9
                  Dear Isaac,

                  Your Poisson regression is not the 1st stage in the 2SLS, it is a preliminary step to construct an instrument. What matters for the strength of the instrument is not what you find in the Poisson regression but in the true first stage. My guess is that you will get a stronger instrument in the true first stage if your Poisson regression is done at the aggregate level because that is the level that matters for the 1st stage.

                  Best wishes,

                  Joao
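
                  A rough sketch of the suggestion above, under several assumptions. The data are assumed to be already collapsed to the destination state-year level with the observed total flow kept as flow, and the `stage2_covars' local from #1 is assumed to be defined in the same do-file. z1_bar and z2_bar are hypothetical names for destination-year averages of the two instruments, and flow_hat_agg is likewise a made-up name. The pooled Poisson specification (no fixed effects, in line with the concern raised in #7) is one possible choice, not one spelled out in the thread.

                  ```stata
                  * Preliminary step: aggregate-level Poisson to construct the instrument
                  * (z1_bar, z2_bar are hypothetical aggregated instruments)
                  poisson flow z1_bar z2_bar `stage2_covars' i.year, vce(cluster dest_state_num)
                  predict flow_hat_agg, n   // predicted total inflow per destination state-year

                  * 2SLS proper: the fitted aggregate flow is the excluded instrument
                  ivregress 2sls log_homic_rate `stage2_covars' i.year i.dest_state_num ///
                      (flow = flow_hat_agg), vce(cluster dest_state_num)
                  ```

                  Here the Poisson regression only builds the instrument. The true first stage is the linear regression of flow on flow_hat_agg and the exogenous variables that -ivregress- runs internally.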


                  • #10
                    Joao,

                    Hmmm, okay, but does that still hold if I don't use Poisson at all? That is, let's say instead of using Poisson regression to construct an instrument, I simply get the fitted flows linearly, using -xtreg- or -reg-. (This way, it actually would be a "true first stage," correct?) And let's say I find that these linear results are stronger at the disaggregated level. Then does it mean the instrument is stronger when I don't disaggregate?

                    Best,
                    Isaac


                    • #11
                      Isaac,

                      The strength of the instrument can only be measured at the level you estimate: the aggregate level.

                      All best wishes,

                      Joao
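
                      One way to look at instrument strength at the aggregate level is sketched below. It assumes the destination state-year dataset from #1, with both flow and flow_hat kept in the -collapse- (see #4) and the `stage2_covars' local defined. -estat firststage- is the standard postestimation command after -ivregress 2sls-. The hand-written regression underneath is the same first stage spelled out, added here for illustration.

                      ```stata
                      * 2SLS at the destination state-year level
                      ivregress 2sls log_homic_rate `stage2_covars' i.year i.dest_state_num ///
                          (flow = flow_hat), vce(cluster dest_state_num)
                      estat firststage          // first-stage statistics for the excluded instrument

                      * The same first stage written out by hand
                      regress flow flow_hat `stage2_covars' i.year i.dest_state_num, ///
                          vce(cluster dest_state_num)
                      test flow_hat             // relevance of the constructed instrument
                      ```

                      The F statistic, R-squared, and t-statistics from the pair-level regression in #1 do not enter this calculation. Only the relationship between flow and the instrument in the 250 aggregate observations does.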
