Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • 95% Confidence/Prediction Intervals for CVLASSO

    I'm writing a synthetic control estimator using LASSO (SCUL) based on the original R code (which we thankfully don't need to get into very much at all). The goal of course is to predict the counterfactual for a given policy. Here I use the pre-intervention outcomes of donor-pool units as predictors of the pre-intervention outcomes for the treated unit. I impute the counterfactual via prediction, and graph the observed versus potential outcome over time, much as we currently do in synthetic control research. The sub-routine of interest in my command is the est_lasso routine, which uses the user-written cvlasso command from the lassopack

    I'd like to generate prediction/confidence intervals and graph these as part of the Differences_1 graph. The utility of this is that such intervals allow myself and others to see how uncertain the estimates themselves are. However, doing this would demand a standard error, which doesn't appear to be available in this context.

    Does anyone have any immediate reaction as to how I might do this?

    Let's look at my full program (don't worry, I install all user written commands I use and so on, so it should work the first time).
    Code:
    *****************************************************
    version 14 // Stata SE
    set more off
    *****************************************************
    
    * Programmer: Jared A. Greathouse
    
    * Institution:    Georgia State University
    
    * Contact:         [email protected]
    
    * Created on : Jan 2, 2022
    
    * Contents: 1. Purpose
    
    *  2. Program Versions
    
    *****************************************************
    
    * 1. Purpose
    
    /* */
    
    
    * 2. Program
    
    cap prog drop scul // Drops previous iterations of the program
    
    *! SCUL v1.0.0, Jared Greathouse, 1/2/22
    prog scul
        graph close _all
        graph drop _all
        
        * Installs needed commands
        loc package st0594 gr0034 dm0042_3
        
        foreach x of loc package { // begin foreach
    
            qui: cap which cvlasso
        
                if _rc { // if command is missing
    
                qui: net inst `x'.pkg, replace
            
                } // ends if
        } // ends foreach
        
        loc comm gtools labvars schemepack
        
        foreach x of loc comm { // begin foreach
    
            qui: cap which `x'
        
                if _rc { // if command is missing
    
                qui: ssc inst `x', replace
            
                } // ends if
        } // ends foreach
        
    qui: xtset
    gl time: disp "`r(timevar)'"
    
    gl panel: disp "`r(panelvar)'"
    
    
        syntax anything, ///
            TReated(varname) /// We need a treatment variable as 0 1
            ahead(numlist min=1 max=1 >=1 int) // Number of forecasting periods. 
            
    gettoken depvar anything: anything
    
    local y_lab: variable label `depvar'
    
    gl outlab: disp "`y_lab'"
    
            
    /**********************************************************
    
        
        
        * Pre-Processing*
    
    
       
    **********************************************************/
    preserve
    
    numcheck, unit($panel) time($time) depvar(`depvar') // Routine 1
    
    treatprocess, treat(`treated') time($time) unit($panel) // Routine 2
    
    data_org, unit($panel) time($time) depvar(`depvar') treat(`treated') // Routine 3
    
    
    
    /**********************************************************
    
        
        
        * Estimation*
    
    
       
    **********************************************************/
    
    
    est_lasso, time($time) h(`ahead')
    
    
    restore
    end
    
    
    /**********************************************************
    
        *Section 1: Data Setup
        
    **********************************************************/
    
    cap prog drop numcheck // Subroutine 1.1
    prog numcheck
        
    syntax, unit(varname) time(varname) depvar(varname)
        
            
    /*#########################################################
    
        * Section 1.1: Extract panel vars
    
        Before SCM can be done, we need panel data.
        
        
        Along with the R package, I'm checking that
        our main vairables of interest, that is,
        our panel variables and outcomes are all:
        
        a) Numeric
        b) Non-missing and
        c) Non-Constant
        
    *########################################################*/
    cls
    di as txt "{hline}"
    di as txt "Algorithm: Synthetic LASSO"
    di as txt "{hline}"
    di as txt "First Step: Data Setup"
    di as txt "{hline}"
    di as txt "Checking that setup variables make sense."
    
    tempvar obs_count
    by `unit' (`time'), sort: gen `obs_count' = _N
    qui su `obs_count'
    
    qui drop if `obs_count' < `r(max)'
    
    
        foreach v of var `unit' `time' `depvar' {
        cap {    
            conf numeric v `v', ex
            
            as !mi(`v')
            
            qui: su `v'
            
            as r(sd) ~= 0
        }
        }
        if !_rc {
            
            
            disp as res "Setup successful. All variables `unit' (ID), `time' (Time) and `depvar' (Outcome) pass."
            di as txt ""
            disp as res "All are numeric, not missing and non-constant."
        }
        
        else if _rc {
            
            
            
            disp as err "All variables `unit' (ID), `time' (Time) and `depvar' must be numeric, not missing and non-constant."
            exit 498
        }
    
    end
            
    cap prog drop treatprocess // Subroutine 1.2
    prog treatprocess
        
    syntax, treat(varname) time(varname) unit(varname)
    
    /*#########################################################
    
        * Section 1.2: Check Treatment Variable
    
        Before SCM can be done, we need the treatment to make sense.
        
        
        Making sense is simply the treatment being non-missing
        and being either 0 or 1
    *########################################################*/
    
    di as txt "{hline}"
    di as txt "Checking that the treatment makes sense..."
    di as txt "{hline}"
    
    cap {
    cap as !mi(`treat') & inlist(`treat',0,1)
    
    qui: su `time' if `treat' ==1
    
    gl int_date = r(min)
    loc last_date = r(max)
    
    su `unit' if `treat' ==1, mean
    
    loc treat_id: disp `r(mean)'
    }
    
        if !_rc {
            
            su $panel if `treat' ==1, mean
            loc clab: label ($panel) `r(mean)'
            gl treat_lab: disp "`clab'"
            
            
            qui: levelsof $panel if `treat' == 0 & $time > $int_date, l(labs)
    
            local lab : value label $panel
    
            foreach l of local labs {
                local all `all' `: label `lab' `l'',
            }
    
            loc controls: display "`all'"
            
            
            disp as res "The intervention variable `treat' passes. Continue."        
            di as txt ""
            disp as res "Intervention is measured between $int_date to `last_date'"
            
            qui: distinct `unit' if `treat' == 0
            
            loc dp_num = r(ndistinct) - 1
            
            as `dp_num' > 2
            di as res "{hline}"
            di as res "Treated Unit: $treat_lab"
            di as txt ""
            di as res "Control Units: `dp_num' total donor pool units"
            di as txt ""
            di as res "Specifically: `controls'"
    
        }
    
    
    
    end
    
    cap prog drop data_org // Subroutine 1.3
    prog data_org
        
    syntax, time(varname) depvar(varname) unit(varname) treat(varname)
    
    /*#########################################################
    
        * Section 1.3: Reorganizing the Data into Matrix Form
    
        We need a wide dataset to do what we want.
    *########################################################*/
    
    di as txt "{hline}"
    di as txt "Reshaping Data..."
    
    su `unit' if `treat' ==1, mean
    
    loc treat_id: disp `r(mean)'
    
    
    keep `unit' `time' `depvar'
    
    qui: greshape wide `depvar', j(`unit') i(`time')
    di as txt "Done"
    qui: tsset `time'
    
    order `depvar'`treat_id', a(`time')
    
    end
    
    
    /**********************************************************
    
        *Section 2: Estimating Causal Effects
        
    **********************************************************/
    
    cap prog drop est_lasso // Subroutine 2.1
    prog est_lasso
        
    syntax, time(varname) h(numlist min=1 max=1 >=1 int)
    di as txt "{hline}"
    di as txt "Second Step: Estimation"
    di as txt "{hline}"
    
    
    qui: ds
    
    loc temp: word 1 of `r(varlist)'
    
    loc time: disp "`temp'"
    
    loc t: word 2 of `r(varlist)'
    
    loc treated_unit: disp "`t'"
    
    loc a: word 3 of `r(varlist)'
    
    loc donor_one: disp "`a'"
    
    local nwords :  word count `r(varlist)'
    
    loc b: word `nwords' of `r(varlist)'
    
    loc last_donor: disp "`b'"
    
    di as txt "Estimating... This could take a while..."
    
    cvlasso `treated_unit' `donor_one'-`last_donor' if `time' < $int_date, lse lglmnet roll h(`h')
    
    qui{
    cap drop cf
    predict double cf, lopt
    }
    
    qui: keep `time' `treated_unit' cf
    
    g relative = `time'- $int_date
    
    g diff = `treated_unit'- cf
    
    sa "scul_$treat_lab", replace
    
    qui: esize unpaired `treated_unit' == cf if relative < 0, cohensd
    
    loc D: disp float(`r(d)')
    
    qui: su diff if relative > 0, mean
    
    loc ATT: disp %9.4g `r(mean)'
    
    
    tw ///
        (line `treated_unit' `time', lcol(black)) /// Real Outcomes
        (line cf `time', lcol("237 41 57")), /// Potential Outcomes
            xli($int_date, lcol("152 152 152")) ///
            scheme(white_tableau) ///
            legend(order(1 "$treat_lab" 2 "SCUL $treat_lab") ///
            color(black) fcolor(white) region(fcolor(white)) position(6) rows(1)) ///
            yti("$outlab") ///
            ylab(, noticks nogrid) xlab(, noticks nogrid) name("Differences_1")
        
    qui su relative
    
    loc time_max = `r(max)'
    
    qui su diff
    
    loc effmin = `r(min)'
    
    loc effmax = `r(max)'
        
    twoway (line diff relative, lcol(black)), xli(-1, lpat(solid) lcol(black)) yli(0, lcol(red)) ///
        ylab(, noticks nogrid) ///
        xlab(, noticks nogrid) ///
        scheme(white_tableau) ///
        legend(off) ///
        yti("Treatment Effect") ///
        xti("Relative Event Time, $treat_lab") name("Differences_2") ///
        note("Cohen's D: `D'" "ATT: `ATT'", position(3)) 
    
    macro drop treat_lab outlab int_date
    end
    with the code of interest here being the very last routine. Again, all I'd like to do here is simply generate uncertainty estimates for the counterfactual.

    The code which runs the estimator itself for Proposition 99's dataset (publicly available from GitHub) is
    Code:
    import delim "https://raw.githubusercontent.com/synth-inference/synthdid/master/data/california_prop99.csv", clear
    
    egen id = group(state) // makes a unique ID
    
    
    cap which labmask
    if _rc {
        
        qui net inst gr0034.pkg, replace
    }
    labmask id, values(state)
    
    xtset id year, y // We now have yearly panel data
    
    lab var packspercapita "Packs Sold per 100k"
    cls
    
    scul packspercapita, tr(treated) ahead(5)
    I presume there's a way to do this. Any idea how I'd begin?
Working...
X