Hi there,
I realize questions like this come up a lot, but I couldn't find answers that suited exactly what's going on for me.
I've got an instrumental variables setup with two continuous instruments Z1 and Z2, and a few exogenous variables W, in the first stage. I'm instrumenting for a count variable and, in the interest of precision, I want the first stage to be a Poisson regression.
Now, between the first stage and the second stage, I need to sum up the predicted values from the first stage, because my model in the second stage is at an aggregated level vis-a-vis the first. In particular, the first stage instruments for a sort of trade flow between each pair of states, so it's at the level of the source state, destination state, and year. In the second stage, I want to estimate the impact of the total flow into the destination state on an outcome variable.
What I've been trying so far, based on a combination of earlier statalist posts--especially this one--is something like the following:
Code:
xtset src_des_num year // sets panel, where panel variable is source-destination combination
xtpoisson flow `stage1_covars' i.year, fe vce(robust) // list of stage1_covars has been defined elsewhere and includes the two instruments
predict flow_hat // get the Poisson-estimated values

* Now I need to collapse to the destination state level
collapse (sum) flow_hat /// total flow into state
    (max) log_gdp_des log_pc_des officer_rate_dest /// these are constant within destination state and year
    (mean) log_gdp_src log_pc_src norm_score_source officer_rate_source, /// these need to be averaged over source states or they don't make sense in the second stage
    by(dest_state year) // final dataset is at destination state-year level
merge 1:1 dest_state year using "[outcome dataset]", nogen

* Set the new panel
egen dest_state_num = group(dest_state)
xtset dest_state_num year

* Make new variable list
local stage2_covars log_gdp_des log_pc_des officer_rate_dest log_gdp_src log_pc_src norm_score_source officer_rate_source

* Follow instructions from statalist post
ivregress 2sls log_homic_rate `stage2_covars' i.year i.dest_state (flow = flow_hat), vce(cluster dest_state)
At this point I get the following error:
Code:
flow_hat included in both endogenous and excluded exogenous variable lists
r(498);
What am I doing wrong? I'm pretty sure/I've read that I can just do a linear first stage instead, and the normal 2SLS will get the job done. I also remember reading somewhere though (can't find the link) that there's more precision/efficiency/something if you estimate the first stage in its "natural" non-linear way. (Again, the endogenous variable, trade flow, is a count variable.)
I really appreciate any and all help you could offer.
Best,
Isaac