I'm writing a synthetic control estimator using LASSO (SCUL) based on the original R code (which we thankfully don't need to get into very much at all). The goal of course is to predict the counterfactual for a given policy. Here I use the pre-intervention outcomes of donor-pool units as predictors of the pre-intervention outcomes for the treated unit. I impute the counterfactual via prediction, and graph the observed versus potential outcome over time, much as we currently do in synthetic control research. The sub-routine of interest in my command is the est_lasso routine, which uses the user-written cvlasso command from the lassopack
I'd like to generate prediction/confidence intervals and graph these as part of the Differences_1 graph. The utility of this is that such intervals allow myself and others to see how uncertain the estimates themselves are. However, doing this would demand a standard error, which doesn't appear to be available in this context.
Does anyone have any immediate reaction as to how I might do this?
Let's look at my full program (don't worry, I install all user written commands I use and so on, so it should work the first time).
with the code of interest here being the very last routine. Again, all I'd like to do here is simply generate uncertainty estimates for the counterfactual.
The code which runs the estimator itself for Proposition 99's dataset (publicly available from GitHub) is
I presume there's a way to do this. Any idea how I'd begin?
I'd like to generate prediction/confidence intervals and graph these as part of the Differences_1 graph. The utility of this is that such intervals allow myself and others to see how uncertain the estimates themselves are. However, doing this would demand a standard error, which doesn't appear to be available in this context.
Does anyone have any immediate reaction as to how I might do this?
Let's look at my full program (don't worry, I install all user written commands I use and so on, so it should work the first time).
Code:
***************************************************** version 14 // Stata SE set more off ***************************************************** * Programmer: Jared A. Greathouse * Institution: Georgia State University * Contact: [email protected] * Created on : Jan 2, 2022 * Contents: 1. Purpose * 2. Program Versions ***************************************************** * 1. Purpose /* */ * 2. Program cap prog drop scul // Drops previous iterations of the program *! SCUL v1.0.0, Jared Greathouse, 1/2/22 prog scul graph close _all graph drop _all * Installs needed commands loc package st0594 gr0034 dm0042_3 foreach x of loc package { // begin foreach qui: cap which cvlasso if _rc { // if command is missing qui: net inst `x'.pkg, replace } // ends if } // ends foreach loc comm gtools labvars schemepack foreach x of loc comm { // begin foreach qui: cap which `x' if _rc { // if command is missing qui: ssc inst `x', replace } // ends if } // ends foreach qui: xtset gl time: disp "`r(timevar)'" gl panel: disp "`r(panelvar)'" syntax anything, /// TReated(varname) /// We need a treatment variable as 0 1 ahead(numlist min=1 max=1 >=1 int) // Number of forecasting periods. gettoken depvar anything: anything local y_lab: variable label `depvar' gl outlab: disp "`y_lab'" /********************************************************** * Pre-Processing* **********************************************************/ preserve numcheck, unit($panel) time($time) depvar(`depvar') // Routine 1 treatprocess, treat(`treated') time($time) unit($panel) // Routine 2 data_org, unit($panel) time($time) depvar(`depvar') treat(`treated') // Routine 3 /********************************************************** * Estimation* **********************************************************/ est_lasso, time($time) h(`ahead') restore end /********************************************************** *Section 1: Data Setup **********************************************************/ cap prog drop numcheck // Subroutine 1.1 prog numcheck syntax, unit(varname) time(varname) depvar(varname) /*######################################################### * Section 1.1: Extract panel vars Before SCM can be done, we need panel data. Along with the R package, I'm checking that our main vairables of interest, that is, our panel variables and outcomes are all: a) Numeric b) Non-missing and c) Non-Constant *########################################################*/ cls di as txt "{hline}" di as txt "Algorithm: Synthetic LASSO" di as txt "{hline}" di as txt "First Step: Data Setup" di as txt "{hline}" di as txt "Checking that setup variables make sense." tempvar obs_count by `unit' (`time'), sort: gen `obs_count' = _N qui su `obs_count' qui drop if `obs_count' < `r(max)' foreach v of var `unit' `time' `depvar' { cap { conf numeric v `v', ex as !mi(`v') qui: su `v' as r(sd) ~= 0 } } if !_rc { disp as res "Setup successful. All variables `unit' (ID), `time' (Time) and `depvar' (Outcome) pass." di as txt "" disp as res "All are numeric, not missing and non-constant." } else if _rc { disp as err "All variables `unit' (ID), `time' (Time) and `depvar' must be numeric, not missing and non-constant." exit 498 } end cap prog drop treatprocess // Subroutine 1.2 prog treatprocess syntax, treat(varname) time(varname) unit(varname) /*######################################################### * Section 1.2: Check Treatment Variable Before SCM can be done, we need the treatment to make sense. Making sense is simply the treatment being non-missing and being either 0 or 1 *########################################################*/ di as txt "{hline}" di as txt "Checking that the treatment makes sense..." di as txt "{hline}" cap { cap as !mi(`treat') & inlist(`treat',0,1) qui: su `time' if `treat' ==1 gl int_date = r(min) loc last_date = r(max) su `unit' if `treat' ==1, mean loc treat_id: disp `r(mean)' } if !_rc { su $panel if `treat' ==1, mean loc clab: label ($panel) `r(mean)' gl treat_lab: disp "`clab'" qui: levelsof $panel if `treat' == 0 & $time > $int_date, l(labs) local lab : value label $panel foreach l of local labs { local all `all' `: label `lab' `l'', } loc controls: display "`all'" disp as res "The intervention variable `treat' passes. Continue." di as txt "" disp as res "Intervention is measured between $int_date to `last_date'" qui: distinct `unit' if `treat' == 0 loc dp_num = r(ndistinct) - 1 as `dp_num' > 2 di as res "{hline}" di as res "Treated Unit: $treat_lab" di as txt "" di as res "Control Units: `dp_num' total donor pool units" di as txt "" di as res "Specifically: `controls'" } end cap prog drop data_org // Subroutine 1.3 prog data_org syntax, time(varname) depvar(varname) unit(varname) treat(varname) /*######################################################### * Section 1.3: Reorganizing the Data into Matrix Form We need a wide dataset to do what we want. *########################################################*/ di as txt "{hline}" di as txt "Reshaping Data..." su `unit' if `treat' ==1, mean loc treat_id: disp `r(mean)' keep `unit' `time' `depvar' qui: greshape wide `depvar', j(`unit') i(`time') di as txt "Done" qui: tsset `time' order `depvar'`treat_id', a(`time') end /********************************************************** *Section 2: Estimating Causal Effects **********************************************************/ cap prog drop est_lasso // Subroutine 2.1 prog est_lasso syntax, time(varname) h(numlist min=1 max=1 >=1 int) di as txt "{hline}" di as txt "Second Step: Estimation" di as txt "{hline}" qui: ds loc temp: word 1 of `r(varlist)' loc time: disp "`temp'" loc t: word 2 of `r(varlist)' loc treated_unit: disp "`t'" loc a: word 3 of `r(varlist)' loc donor_one: disp "`a'" local nwords : word count `r(varlist)' loc b: word `nwords' of `r(varlist)' loc last_donor: disp "`b'" di as txt "Estimating... This could take a while..." cvlasso `treated_unit' `donor_one'-`last_donor' if `time' < $int_date, lse lglmnet roll h(`h') qui{ cap drop cf predict double cf, lopt } qui: keep `time' `treated_unit' cf g relative = `time'- $int_date g diff = `treated_unit'- cf sa "scul_$treat_lab", replace qui: esize unpaired `treated_unit' == cf if relative < 0, cohensd loc D: disp float(`r(d)') qui: su diff if relative > 0, mean loc ATT: disp %9.4g `r(mean)' tw /// (line `treated_unit' `time', lcol(black)) /// Real Outcomes (line cf `time', lcol("237 41 57")), /// Potential Outcomes xli($int_date, lcol("152 152 152")) /// scheme(white_tableau) /// legend(order(1 "$treat_lab" 2 "SCUL $treat_lab") /// color(black) fcolor(white) region(fcolor(white)) position(6) rows(1)) /// yti("$outlab") /// ylab(, noticks nogrid) xlab(, noticks nogrid) name("Differences_1") qui su relative loc time_max = `r(max)' qui su diff loc effmin = `r(min)' loc effmax = `r(max)' twoway (line diff relative, lcol(black)), xli(-1, lpat(solid) lcol(black)) yli(0, lcol(red)) /// ylab(, noticks nogrid) /// xlab(, noticks nogrid) /// scheme(white_tableau) /// legend(off) /// yti("Treatment Effect") /// xti("Relative Event Time, $treat_lab") name("Differences_2") /// note("Cohen's D: `D'" "ATT: `ATT'", position(3)) macro drop treat_lab outlab int_date end
The code which runs the estimator itself for Proposition 99's dataset (publicly available from GitHub) is
Code:
import delim "https://raw.githubusercontent.com/synth-inference/synthdid/master/data/california_prop99.csv", clear egen id = group(state) // makes a unique ID cap which labmask if _rc { qui net inst gr0034.pkg, replace } labmask id, values(state) xtset id year, y // We now have yearly panel data lab var packspercapita "Packs Sold per 100k" cls scul packspercapita, tr(treated) ahead(5)