This is a continuation of this post, where I was in the early stages of developing a forward selection algorithm for my fdid command. Here is a relevant subset of code I wish to focus on.
My code estimates as many final DID models as there are control units. The first iteration estimates (in this case) 16 submodels.. It selects the control that maximizes the pre-intervention R-squared. The R-squared for this single control DID model is the first candidate DID model. The second iteration includes the first selected unit in DID regression, and finds the next unit which maximizes the pre-intervention R-square. These two units now comprise the second candidate DID model. ... and so on. The output of the code is below, showing each potential candidate model and the R-Squared Statistics of each of them.
We can see that the second iteration has the highest R-squared statistic of the N0 control units (16 in this case). My question here is this: is there any way to keep track of what the optimal control group is (where optimal again is defined as the candidate model with the highest pre-treatment R-squared)? That way, at the end of this loop, we would be left with (say)
, as it is the best of the candidate DID models. The reason I am asking is mainly for efficiency reasons, as the way I currently get around this problem involves another iteration of looping. And... before I send my paper off to Stata Journal, I wish to have everything optimized as much as possible. How might I do this?
Code:
clear * webuse set "https://raw.githubusercontent.com/jgreathouse9/FDIDTutorial/main" webuse basque, clear keep id year gdp cls // Forward DID algorithm reshape wide gdp, j(id) i(year) qui: tsset year loc time_format: di r(tsfmt) order gdpcap5, a(year) loc interdate= 1975 qui ds loc temp: word 1 of `r(varlist)' loc time: disp "`temp'" loc t: word 2 of `r(varlist)' loc treated_unit: disp "`t'" local strings `r(varlist)' local trtime `treated_unit' `time' local predictors: list strings- trtime loc N0: word count `predictors' // Set U is a now empty set. It denotes the order of all units whose values, // when added to DID maximize the pre-period r-squared. local U // void // We use the mean of the y control units as our predictor in DID regression. // cfp= counterfactual predictions, used to calculate the R2/fit metrics. tempvar ym cfp tss // Here is the placeholder for max r2. tempname max_r2 // Forward Selection Algorithm ... qui summarize `treated_unit' local mean_observed = r(mean) * Calculate the Total Sum of Squares (TSS) qui generate double `tss' = (`treated_unit' - `mean_observed')^2 if `time' < `interdate' qui summarize `tss' local TSS = r(sum) qui while ("`predictors'" != "") { scalar `max_r2' = -99999999999 foreach var of local predictors { // Drops these, as we need them for each R2 calculation cap drop rss { // We take the mean of each element of set U and each new predictor. egen `ym' = rowmean(`U' `var') // The coefficient for the control average has to be 1. constraint 1 `ym' = 1 qui cnsreg `treated_unit' `ym' if `time' < `interdate' , constraint(1) // We predict our counterfactual qui predict `cfp' if e(sample) // Now we calculate the pre-intervention R2 statistic. * Calculate the Residual Sum of Squares (RSS) qui generate double rss = (`treated_unit' - `cfp')^2 if e(sample) qui summarize rss local RSS = r(sum) loc r2 = 1 - (`RSS' / `TSS') if `r2' > scalar(`max_r2') { scalar `max_r2' = `r2' local new_U `var' } // Here we determine which unit's values maximize the r2. drop `ym' drop `cfp' // We get rid of these now as they've served their purpose. } } local U `U' `new_U' // We add the newly selected unit to U. noi di "`U':" scalar(`max_r2') // and get rid of it from the predictors list. local predictors : list predictors - new_U }
Code:
gdpcap10:.99730678 gdpcap10 gdpcap2:.99800302 gdpcap10 gdpcap2 gdpcap17:.99792807 gdpcap10 gdpcap2 gdpcap17 gdpcap16:.99779755 gdpcap10 gdpcap2 gdpcap17 gdpcap16 gdpcap3:.99754814 gdpcap10 gdpcap2 gdpcap17 gdpcap16 gdpcap3 gdpcap14:.99728229 gdpcap10 gdpcap2 gdpcap17 gdpcap16 gdpcap3 gdpcap14 gdpcap15:.99707867 gdpcap10 gdpcap2 gdpcap17 gdpcap16 gdpcap3 gdpcap14 gdpcap15 gdpcap7:.99676251 gdpcap10 gdpcap2 gdpcap17 gdpcap16 gdpcap3 gdpcap14 gdpcap15 gdpcap7 gdpcap4:.99692392 gdpcap10 gdpcap2 gdpcap17 gdpcap16 gdpcap3 gdpcap14 gdpcap15 gdpcap7 gdpcap4 gdpcap8:.9968791 gdpcap10 gdpcap2 gdpcap17 gdpcap16 gdpcap3 gdpcap14 gdpcap15 gdpcap7 gdpcap4 gdpcap8 gdpcap11:.99658634 gdpcap10 gdpcap2 gdpcap17 gdpcap16 gdpcap3 gdpcap14 gdpcap15 gdpcap7 gdpcap4 gdpcap8 gdpcap11 gdpcap1:.99599144 gdpcap10 gdpcap2 gdpcap17 gdpcap16 gdpcap3 gdpcap14 gdpcap15 gdpcap7 gdpcap4 gdpcap8 gdpcap11 gdpcap1 gdpcap6:.99544239 gdpcap10 gdpcap2 gdpcap17 gdpcap16 gdpcap3 gdpcap14 gdpcap15 gdpcap7 gdpcap4 gdpcap8 gdpcap11 gdpcap1 gdpcap6 gdpcap9:.99478396 gdpcap10 gdpcap2 gdpcap17 gdpcap16 gdpcap3 gdpcap14 gdpcap15 gdpcap7 gdpcap4 gdpcap8 gdpcap11 gdpcap1 gdpcap6 gdpcap9 gdpcap13:.99409127 gdpcap10 gdpcap2 gdpcap17 gdpcap16 gdpcap3 gdpcap14 gdpcap15 gdpcap7 gdpcap4 gdpcap8 gdpcap11 gdpcap1 gdpcap6 gdpcap9 gdpcap13 gdpcap12:.99279694
Code:
loc bestmodel gdpcap10 gdpcap2
Comment