
  • 'SCUL': module to implement regularized synthetic control (using LASSO) estimators for single and multiple-treated unit settings

    Thanks to Kit Baum, my first command scul is now available on SSC! Before I go over a little of what it can do, I want to thank Andrew Musau, FernandoRios, Bjarte Aagnes, daniel klein, Damian Clarke, and many others who provided technical or substantive feedback while I was writing it.

    SCUL stands for Synthetic Controls Using LASSO, but the name is now something of a misnomer: the command in fact fits a generalized class of elastic net models for causal inference. Let's look at some examples, beginning with the classic 2003 Basque Country study, where we ask how terrorist attacks in the Basque Country affected GDP per capita. Note that, in addition to the dependencies I check for, you must have tabstatmat installed to get the most out of scul.
    Code:
    ssc inst scul, replace
    ssc inst tabstatmat, replace // dependency mentioned above
    
    webuse set http://fmwww.bc.edu/repec/bocode/s/
     
    webuse scul_basque, clear
     
    cls
     
    scul gdp, ahead(3) /// cross-validation forecast horizon
        trdate(1975) /// treatment date
        trunit(5) /// treated unit (the Basque Country)
        lamb(lopt) /// CV-optimal lambda
        obscol(black) /// color of the observed series
        cfcol(red) /// color of the counterfactual
        legpos(4)
    Compared to the original SCM method, SCUL gets (marginally) better pre-intervention fit and, more importantly, needs no covariates to obtain it. It also returns a very similar average treatment effect and even selects the same donors as the original synthetic control estimator, meaning we can do causal inference in settings where we do not adjust for additional predictors.
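    To check the donor-selection claim yourself, you can inspect the results scul leaves behind after estimation (a minimal sketch; r(donors) and r(Weights) are the returns shown later in this thread):
    Code:
    return list // r(donors) holds the selected donor IDs
    mat list r(Weights) // their LASSO coefficients, i.e. the weights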

    What about settings where we both lack additional predictors and have no easy comparison unit of interest? The original Prop 99 case study using SCM compared California to 38 states that did not pass an anti-tobacco program. However, we could argue (as I do) that this is a little absurd in this instance: California is humongous, and would be the 7th largest economy in the world were it a nation of its own. What if, instead of comparing it to states, we compare it to the mainland divisions of the United States, which are comparable to California in population and/or size?
    Code:
    webuse scul_p99_region, clear    
    cls
    scul cigsale, ///
        ahead(1)  ///
        trdate(1989) ///
        trunit(3) ///
        lamb(lopt) ///
        obscol(black) ///
        cfcol(blue) ///
        legpos(7) q(1) cv(adaptive)
    We see that the pre-intervention fit is 1.64 and the treatment effect is -21.4. The original fit was about 1.75, and the original analysis, which adjusted for 4 covariate predictors of smoking rates per capita, returned a treatment effect of around -19. As above, we obtain very similar pre-intervention fit and a similar effect, even though I don't use (and don't need to use) additional covariate predictors of smoking rates.

    scul also works when we have multiple treated units treated at multiple points in time, something which has only recently been addressed in the SCM literature. Note that the treatment must be "once treated, always treated." To those of us who live in the United States and face high gas prices, this case study may amuse you: Georgia, Connecticut, and Maryland all passed gas tax holidays recently. To see how these tax holidays affected prices, we do
    Code:
    webuse set http://fmwww.bc.edu/repec/bocode/g/
    
    
    webuse Gas_Holiday, clear
    
    loc int_time = td(24mar2022)
    
    // td(18mar2022)  MD // td(24mar2022) GA // td(02apr2022) CT 
    cls
    scul regular, ///
        ahead(28)  ///
        trdate(`int_time') ///
        trunit(11) ///
        lamb(lopt) ///
        obscol(black) ///
        cfcol(red) ///
        legpos(7) ///
        before(28) after(28) ///
        multi tr(treat) ///
        donadj(et) ///
        intname("Gas Holiday") ///
        rellab(-28(7)28) cv(adaptive)
    This generates the average treatment effect on the treated for all treated units, balanced in event time one month before and after each gas tax holiday.
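    To check that event-time balance yourself, here is a minimal sketch (assuming the data are xtset; treat is the treatment indicator passed to tr() above):
    Code:
    qui xtset
    loc pv `r(panelvar)'
    loc tv `r(timevar)'
    bys `pv': egen first_tr = min(cond(treat, `tv', .))
    g rel_time = `tv' - first_tr // event time; 0 = first treated period
    tab rel_time if inrange(rel_time, -28, 28)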


    This is a little of what it can do. Please do post here if you notice any bugs or have suggestions, and please ask any questions you might have (I've already noticed a few bugs!). Happy causal inferenc-ing.

  • #2
    Congratulations Jared! I can't wait to find a dataset on which to apply scul.



    • #3
      Thanks Jared for this addition.

      Using the following simple toy dataset

      Code:
      clear
      input byte(y3_1 s t d)
      10  1  1 0
      10  1  2 0
      10  1  3 0
      10  1  4 0
      10  1  5 0
      10  1  6 0
      13  1  7 1
      13  1  8 1
      13  1  9 1
      13  1 10 1
      10  2  1 0
      10  2  2 0
      10  2  3 0
      10  2  4 0
      10  2  5 0
      10  2  6 0
      10  2  7 1
      10  2  8 1
      10  2  9 1
      10  2 10 1
      10  3  1 0
      10  3  2 0
      10  3  3 0
      10  3  4 0
      10  3  5 0
      10  3  6 0
      10  3  7 1
      10  3  8 1
      10  3  9 1
      10  3 10 1
      10  4  1 0
      10  4  2 0
      10  4  3 0
      10  4  4 0
      10  4  5 0
      10  4  6 0
      10  4  7 0
      10  4  8 0
      10  4  9 0
      10  4 10 0
      10  5  1 0
      10  5  2 0
      10  5  3 0
      10  5  4 0
      10  5  5 0
      10  5  6 0
      10  5  7 0
      10  5  8 0
      10  5  9 0
      10  5 10 0
      10  6  1 0
      10  6  2 0
      10  6  3 0
      10  6  4 0
      10  6  5 0
      10  6  6 0
      10  6  7 0
      10  6  8 0
      10  6  9 0
      10  6 10 0
      10  7  1 0
      10  7  2 0
      10  7  3 0
      10  7  4 0
      10  7  5 0
      10  7  6 0
      10  7  7 0
      10  7  8 0
      10  7  9 0
      10  7 10 0
      10  8  1 0
      10  8  2 0
      10  8  3 0
      10  8  4 0
      10  8  5 0
      10  8  6 0
      10  8  7 0
      10  8  8 0
      10  8  9 0
      10  8 10 0
      10  9  1 0
      10  9  2 0
      10  9  3 0
      10  9  4 0
      10  9  5 0
      10  9  6 0
      10  9  7 0
      10  9  8 0
      10  9  9 0
      10  9 10 0
      10 10  1 0
      10 10  2 0
      10 10  3 0
      10 10  4 0
      10 10  5 0
      10 10  6 0
      10 10  7 0
      10 10  8 0
      10 10  9 0
      10 10 10 0
      end
      sdid (from SSC) reports as expected:

      Code:
      . sdid y3_1 s t d, vce(bootstrap)
      Bootstrap replications (50). This may take some time.
      ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
      ..................................................     50
      
      
      Synthetic Difference-in-Differences Estimator
      
      -----------------------------------------------------------------------------
              y3_1 |     ATT     Std. Err.     t      P>|t|    [95% Conf. Interval]
      -------------+---------------------------------------------------------------
         treatment |   1.00000    0.87317     1.15    0.252    -0.71138     2.71138
      -----------------------------------------------------------------------------
      95% CIs and p-values are based on Large-Sample approximations.
      Refer to Arkhangelsky et al., (2020) for theoretical derivations.
      However, scul (from SSC) does not:


      Code:
      . scul y3_1, ahead(3) trdate(7) trunit(1) lamb(lopt)
      -------------------------------------------------------------------------------------------------------------------------------------------------------
      Algorithm: Synthetic LASSO, Single Unit Treated
      -------------------------------------------------------------------------------------------------------------------------------------------------------
      First Step: Data Setup
      -------------------------------------------------------------------------------------------------------------------------------------------------------
      Checking that setup variables make sense.
      Setup successful!! All variables s (ID), t (Time) and y3_1 (Outcome) pass.
      
      All are numeric, not missing and non-constant.
      -------------------------------------------------------------------------------------------------------------------------------------------------------
      Making our treatment variable...
      -------------------------------------------------------------------------------------------------------------------------------------------------------
      2 invalid name
      3 invalid name
      4 invalid name
      5 invalid name
      6 invalid name
      7 invalid name
      8 invalid name
      9 invalid name
      10 invalid name
      
      Treatment is measured from        7 to       10
      -------------------------------------------------------------------------------------------------------------------------------------------------------
                     Treated Unit: 1
      -------------------------------------------------------------------------------------------------------------------------------------------------------
      
                     Control Units: 9 total donor pool units
      -------------------------------------------------------------------------------------------------------------------------------------------------------
      
                     Specifically: , , , , , , , , ,
      -------------------------------------------------------------------------------------------------------------------------------------------------------
      Second Step: Data Reorganizing
      -------------------------------------------------------------------------------------------------------------------------------------------------------
      Reshaping...
      Done!
      
      -------------------------------------------------------------------------------------------------------------------------------------------------------
      Third Step: Estimation
      -------------------------------------------------------------------------------------------------------------------------------------------------------
      Optimizing with LASSO... This could take quite a while...
      
      
      >1 invalid name
      r(198);
      Am I using it properly?

      Luis
      Stata 17.1



      • #4
        You have multiple treated units, but this is not the biggest problem. We have 10 time periods here. Not good: the model can't learn the pre-intervention DGP at all, not a chance, and there's no conceivable way it'll walk forward 3 steps in any way that makes sense. I suspect LASSO can't even begin to estimate the model because the pre-period is as good as nonexistent for all practical purposes. Let me offer a different simulation, if I may (note I just came up with this 2 minutes ago).

        Code:
        clear*
        
        set obs 100
        
        egen id = seq(), f(1) t(20)
        
        expand 10 // 5 obs per id * 10 = 50 time periods each
        
        qbys id : g time = _n
        
        qui su time
        
        loc time_periods = r(max)
        
        
        loc int_date = `time_periods'/2
        
        set seed 1000
        
        qbys id: g y = .3*mod(time,`time_periods'+1)-mod(time,10)*sin(time/_pi)+mod(time,10)*cos(time/_pi)
        
        replace y = y+rnormal(0, 2) if id !=10 // donors, noise
        
        xtset id time, g
        cls
        
        scul y, ahead(5) trdate(`int_date') ///
        trunit(10) lamb(lopt) scheme(gg_tableau) ///
        cv(adaptive) obscol(black) ///
        cfcol(blue) legpos(5)
        
        g treat = cond(id == 10 & time >= `int_date', 1, 0) // same intervention date as passed to trdate() above
        
        
        sdid y id time treat, vce(placebo)
        Here, we actually have a decent pre-intervention period. I recommend a pre-intervention period of at least 10 periods, ideally more. SCUL is predicated on cross-validation, time-series cross-validation to be precise, so any pre-period with fewer than 10 observations is almost guaranteed not to work, simply because of the mechanics of the estimator itself. In your toy data we have what, 7? The example above simulates a better scenario. There's no treatment effect here, so ideally any estimator we use should not detect one. SCUL gets an ATT closer to 0; SDID here gets an ATT of -0.88709. Still pretty close, not the end of the world.
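        To make the cross-validation point concrete, here is a purely illustrative sketch of what rolling-origin (time-series) cross-validation must do (not scul's internal code; the window sizes are made up): each fold trains on an initial stretch of the pre-period and validates on the next ahead() steps, so a short pre-period leaves almost nothing to split.
        Code:
        loc T_pre 25 // hypothetical pre-period length
        loc h 5      // hypothetical forecast horizon, as in ahead(5)
        forval origin = 10/`=`T_pre'-`h'' {
            di "fold: train on t = 1/`origin', validate on t = " ///
                `origin'+1 "/" `origin'+`h'
        }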

        I emphasize much more in my paper on the subject (which I'll gladly send you if you want) that for scul to work, many time periods are needed. In the examples above we have 20/19 pre-intervention periods, and far more in the gas holiday dataset, i.e., much more realistic applications for this kind of causal analysis. Luis Pecht



        • #5
          Consider a slightly different application. Say we want to predict the sales of Walmart departments over a time period. Note that this is much like a simulation, where no intervention is introduced (as far as I know), and thus our estimator should not imply any practical effect size.
          Code:
          
          import delim "https://raw.githubusercontent.com/SethiNik/Walmart-Store-sales-Forecasting/master/data/train.csv", clear
          
          cls
          
          
          keep if inrange(store,1,5)
          
          g week = wofd(date(date, "YMD")), a(store)
          
          drop date
          
          rename week date
          
          
          format %tw date
          
          tempvar dup storename id2
          
          duplicates tag store dep date, generate(`dup')
          
          drop if `dup'==1
          
          g `storename' = "Department "
          
          egen id = group(store dept)
          
          egen `id2' = concat(`storename' id)
          
          replace weekly = weekly /1000
          
          
          // labvars is on SSC; labmask is part of the labutil package on SSC
          labvars we date "Weekly Sales (in Thousands)" "Week"
          
          labmask id, values(`id2')
          
          xtset id date, w
          
          cls
          
          
          
          loc int_time = tw(2010w30)
          loc unit = 5
          cls
          scul weekly, ///
                  ahead(3) ///
                  trdate(`int_time') ///
                  trunit(`unit') ///
                  lambda(lopt) ///
                  intname(Treatment) ///
                  cv(adaptive) ///
                  q(1) obscol("0 123 139") ///
              cfcol("255 132 116") ///
              scheme(white_tableau)
          Note my use of labmask and labvars. Anyway, I picked the 5th department in the sample as the target unit. We construct the counterfactual prediction from 316 donors, and scul selects a very sparse set of them; in fact, it selects 3. The predictions are imperfect, but if I were the store manager and the store's economist came to me with predictions like this, I'd keep them on the team. scul predicts the sales to within a few thousand dollars of error, which is pretty negligible since these places do 80 to 90 grand a week. It even predicts seasonal fluctuations (e.g., Christmas) and subtle seasonal variations pretty well, even though it hasn't seen these data yet. The point I'm trying to make is that so long as you have suitably similar donors in the donor pool and a decent pre-intervention period, your predictions will (in general; crazy things happen sometimes) be pretty sensible.
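          If you want to put a number on that error, here is a hedged sketch (the saved file name and its contents are assumptions, based on the West Germany example later in this thread, where scul saves scul_<treated unit>.dta with the counterfactual in cf):
          Code:
          preserve
          keep if id == 5 // the treated department
          mer 1:1 date using "scul_Department 5", keepusing(cf) nogen
          g abs_err = abs(weekly - cf) // observed vs. counterfactual, in thousands
          su abs_err if date >= tw(2010w30) // post-period prediction error
          restore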



          • #6
            Jared Greathouse, I attempted to use scul. It looks cool.

            However, I have the following questions:
            (1). If I wanted to use the pre-treatment values of the endogenous variable, how would I specify it? Or does scul natively use the entire pre-treatment values of the depvar?
            (2). I noticed that after the command, there are no results saved in any matrix or scalar, nor is the data in memory changed to contain them. How can I access the resulting data, e.g., the treated and synthetic series or the effect (the difference between the two)?
            -- I ask particularly because I might use the results as a robustness check for the classic synth results and want to plot both on a single graph to visually depict the differences/similarities, or, indeed, because I might want to modify the look of the graph components, say the confidence bands.

            (3). Minor issue: the option "squerr" (as written in the help) seems to have a typo... It should be "sqerr". I hope the next version will rectify this.
            Last edited by Joe Zonda; 10 Aug 2022, 03:40.



            • #7
              I'll respond in detail when I'm home; I just got back to Atlanta from Puerto Rico (super long day yesterday!!!!), so I'll return to this post in an hour.



              • #8
                Alright, will wait.



                • #9
                  I attempted to use scul. It looks cool.
                  Thank you! Okay, let's get cracking. Consider the West Germany example.
                  Code:
                  cd E:\Test\sculex
                  
                  u "http://fmwww.bc.edu/repec/bocode/s/scul_Reunification.dta", clear
                  cls
                  
                  scul gdp, ///
                      ahead(5) ///
                      trdate(1990) ///
                      trunit(7) ///
                      scheme(gg_tableau) ///
                      lambda(lopt)
                      
                  loc sculmse: di %6.3f e(MSE)
                      
                  conf f "scul_West Germany.dta"
                  
                  cls
                  
                  tempfile adh_germany
                  
                  allsynth gdp gdp(1981/1990) ///
                      infrate(1981/1990) ///
                      trade(1981/1990) ///
                      industry(1981/1990) ///
                      schooling(1980/1985) ///
                      invest60(1980/1985) ///
                      invest70(1980/1985) ///
                      invest80(1980/1985), ///
                          trperiod(1990) ///
                          trunit(7) ///
                          nested allopt fig keep(`adh_germany')
                  
                  loc scmmse: di %6.3f e(RMSPE)[1,1]
                  
                  u `adh_germany', clear
                  
                  drop _Co_Number _W_Weight
                  
                  rename (_time _Y_s _Y_treated) (year cf_adh real)
                  
                  sa `adh_germany', replace
                  u `adh_germany', clear
                  
                  
                  qui mer 1:1 year using "scul_West Germany", keepusing(cf) nogen
                  
                  labvars real cf_adh cf year "West Germany" "ADH (2015)" "SCUL West Germany" "Year"
                  
                  cls
                  tsset year, y
                  
                  twoway (tsline real, lcolor(black) lwidth(thick)) ///
                  (tsline cf_adh, lcolor(blue)) ///
                  (tsline cf, lcolor(red)), ///
                      legend(ring(0) pos(9) region(fcol(none))) ///
                      tli(1990, lwidth(thick) lcol(gs11) lpat(solid)) ///
                      yti(GDP per Capita) xsize(4) ysize(4) ///
                      caption("SCUL MSE: `sculmse', ADH MSE: `scmmse'.")
                  I made a new directory, and the
                  Code:
                  conf f "scul_West Germany.dta"
                  command checks that the file is in your working directory (which it should be, post-estimation). Every single time you run scul, you'll have a graph and a dataset to work with; the dataset lands in your current working directory (there's no situation where this should not happen: you always get a dataset in return). Next I use normal SCM, and after a little more manipulation I merge everything into one file so that we can compare the effect sizes produced by each estimator. My graph gives their pre-intervention RMSPEs (I could do the ATT, but let's keep it simple). Both perform pretty well. ADH's SC gets better pre-fit, but I don't really mind, since mine gets only slightly worse fit (even though SCUL uses outcome data alone) and a very similar treatment effect, as we see from the lines. You can use the return list and ereturn list commands to get things like the ATT for SCUL (as well as its upper and lower bounds) and the number of donors (as well as the specific donor IDs), amongst other things cvlasso provides and useful things I code for.
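                  For instance, to see everything that's stored (e(MSE) is already used in the code above; the exact names of the other saved results are best discovered this way):
                  Code:
                  ereturn list // estimation results, e.g. e(MSE)
                  return list  // returned results, e.g. the donor list
                  di "Pre-period MSE: " %6.3f e(MSE)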

                  To directly answer you: the entire pre-period is specified by default. Now I'll say something that might get me in trouble, but I'll say it because I believe it's the truth: specifying lags is stupid, and it's ALWAYS been stupid in SCM studies. To be clear, I'm not saying you or the authors are stupid; I mean the concept is ill-posed, at best. Why? Even if there's a mechanical reason for including lags of the outcome (which there is, as normal SCM matches on averages), my biggest issue is that it's all arbitrary. Take the GDP lags above, or those for investment. Why did I specify those? I don't know; ADH never say why, that's just what they used. If an alien from Proxima Centauri D came to me and asked "Why use 1989/1990 and not 1977/1988 for GDP?", I'd go "Yeah, why not? It's not like there's some objective formula for choosing where we lag what!" SCUL is intended to take away this arbitrariness. No specification of lags, no silliness: you could choose 20 different lag specifications, especially as the time series increases in length, and there'd be no metric you could use to justify the choice. So no, this'll never be a feature you need to work with. Look at some newer SCM methods; they take a similar approach to the one I take here: no lags, just lots of machine-learning ninjutsu, to simplify slightly.

                  Also, thanks for pointing out the typo. If anyone notices further bugs or has suggestions, I'd like to hear of them. I already have some pretty good quality-of-life edits for SCUL in mind, so this sort of feedback is great. Joe Zonda



                  • #10
                    Perfect, got the data saved in the working directory. I didn't check the working directory at first because nowhere in the help (current version) does it say that the data is saved there, so I was a little confused. Also, I appreciate the clarification that scul indeed uses the entire pre-treatment depvar as the predictors.



                    • #11
                      Thanks to Dr. Baum, I've updated scul pursuant to some popular demands. Before I give examples, I'll go over the changes I made. Firstly, I've committed the trunit and trdate options to the flames and now require users to specify a treatment variable. Looking back at it, I should've done this all along, but that's water under the bridge now. An additional feature I've added is the ability to specify in-time placebos.
                      Code:
                      // Prop 99 Division
                      loc int_time = 1989
                      
                      u "http://fmwww.bc.edu/repec/bocode/s/scul_p99_region", clear
                      qui xtset
                      local lbl: value label `r(panelvar)'
                      
                      loc unit ="California":`lbl'
                      qui xtset
                      g treat = cond(`r(panelvar)'==`unit' & `r(timevar)' >= `int_time',1,0)
                      cls
                      
                      scul cigsale, ///
                              ahead(1)  ///
                              treated(treat) ///
                              obscol(black) ///
                              cfcol(blue) ///
                              legpos(7) cv(adaptive)
                      Much cleaner than before: much easier to follow, and much more consistent with the multiple-treatment setup.

                      In-time placebos:
                      Code:
                      cls
                      //West Germany
                      u "http://fmwww.bc.edu/repec/bocode/s/scul_Reunification.dta", clear
                      loc int_time = 1990
                      cls
                      qui xtset
                      local lbl: value label `r(panelvar)'
                      
                      
                      loc unit ="West Germany":`lbl'
                      
                      g treat = cond(`r(panelvar)'==`unit' & `r(timevar)' >=`int_time',1,0)
                      
                      // ssc inst labvars
                      labvars gdp treat "GDP per Capita" "Reunification"
                      
                      scul gdp, ///
                              tr(treat) ///
                              ahead(8)  ///
                              cfcol(red) obscol(black) cv(adaptive) ///
                              legpos(9) plat times(1/2)
                      Okay, that about does it for now. As usual, any questions, comments, suggestions, or otherwise: you know where to find me.



                      • #12
                        The working paper that goes along with the command is here: http://ssrn.com/abstract=4196189.

                        All comments are welcome. I know the reviewers will have some!



                        • #13
                          Dear Jared Greathouse,
                          I had hoped for a Stata adaptation of the SCUL package by Hollingsworth and Wing, and I am using your scul command on a project; it works perfectly for my application. However, I noticed something I do not quite understand. I need the donor weights matrix, and I have been able to retrieve it as long as I avoid including covariates. Once I include them (and I have good reasons for doing so), the weights matrix no longer shows up in the output, and even the
                          Code:
                          return list
                          command returns empty. I tried to dig into the ado file and noticed that the matrix is built only when the covariates vector is empty. I am not a pro Stata user, so I simply tried to comment out the
                          Code:
                          if "`covs'" == "" {}
                          part, without any success in getting the matrix back in the return list.

                          Thus, I was wondering:
                          1. Is there a reason why the weights matrix is not shown when covariates are included? (I am assuming that the program still computes the weights, as it would otherwise be impossible to estimate the counterfactual, but I might be missing something.)
                          2. If there is no reason for the matrix not to be built, how can I possibly get it?
                          Potentially, I could use the weights I get from the estimation without covariates, but in a different subset of my data the absence of covariates creates some problems: there the command is not able to compute the weights without controls, and the synthetic control is a flat line.

                          Thank you in advance for any help you could give me. I add below a reproducible example and the code I have been using.

                          A dataset extract:

                          Code:
                          * Example generated by -dataex-. For more info, type help dataex
                          clear
                          input str9 ccode int year float(treated ln_TRD_SPX5_to_TOT ln_INTRNL_SRV_TRD ln_TRADE_tot ln_wdi_gdpcap_ipo)
                          "AUS" 1995 0  .1384325  5.174122  4.982551  9.925627
                          "AUS" 1996 0 .13957375  5.274744  5.080733  9.999756
                          "AUS" 1997 0 .14753668  5.351512  5.153032 10.070953
                          "AUS" 1998 0 .15133615  5.333697  5.111801  9.974849
                          "AUS" 1999 0 .16142426  5.353427  5.113552  9.937875
                          "AUS" 2000 0 .16747344  5.408334  5.190883  9.992175
                          "AUS" 2001 0 .16156487  5.362988  5.152009  9.887493
                          "AUS" 2002 0  .1574709  5.407364    5.1311  9.917991
                          "AUS" 2003 0  .1591129  5.596806   5.25467 10.073522
                          "AUS" 2004 0 .15953654  5.809136  5.470931 10.335952
                          "AUS" 2005 0  .1531978   5.96485  5.678297 10.447633
                          "AUS" 2006 0  .1509239  6.058424  5.804719  10.50703
                          "AUS" 2007 0  .1535222  6.163985  5.952317  10.62193
                          "AUS" 2008 0 .14661182  6.298547  6.123363  10.81336
                          "AUS" 2009 0  .1466771   6.32982  6.107611 10.664558
                          "AUS" 2010 0 .14623652  6.412998  6.177568 10.861598
                          "AUS" 2011 0 .13916384  6.600785  6.420983  11.04448
                          "AUS" 2012 0 .13603862  6.700502  6.507198 11.127935
                          "AUS" 2013 0 .13873579  6.701013  6.493268 11.129607
                          "AUS" 2014 0 .14271712  6.689445  6.451045 11.043153
                          "AUS" 2015 0  .1567767  6.607912  6.324774  10.94573
                          "AUS" 2016 0  .1576011  6.555122   6.26787 10.817307
                          "AUS" 2017 0 .15448692  6.616724  6.358147 10.895575
                          "AUS" 2018 0  .1507788   6.62519  6.410253 10.954464
                          "IND" 1995 0 .15564086 4.3476915 4.4625335  5.925934
                          "IND" 1996 0 .15428096 4.3895707 4.4779887  5.992908
                          "IND" 1997 0 .15537778  4.488587  4.586501  6.030442
                          "IND" 1998 0 .15498887  4.501211  4.638839  6.024681
                          "IND" 1999 0  .1541304  4.629128 4.7406073  6.091223
                          "IND" 2000 0  .1509985  4.678391  4.859101  6.093648
                          "IND" 2001 0 .15280584  4.701492   4.85148   6.11127
                          "IND" 2002 0  .1539343  4.771554  5.016175  6.152401
                          "IND" 2003 0 .14965594  4.939606  5.208447  6.300499
                          "IND" 2004 0 .16004042  5.156138  5.568575   6.43792
                          "IND" 2005 0 .15291157  5.319071   5.84431  6.567389
                          "IND" 2006 0 .15628107  5.468663  6.070767  6.688372
                          "IND" 2007 0 .14934537  5.699933  6.303335   6.93121
                          "IND" 2008 0 .14633222  5.782196  6.521214  6.902244
                          "IND" 2009 0  .1540964  5.807845  6.399886  7.000913
                          "IND" 2010 0 .15291445  6.042837  6.708336   7.20907
                          "IND" 2011 0 .13843356  6.193277   6.92636  7.279734
                          "IND" 2012 0 .13775297  6.193655  6.954993  7.268933
                          "IND" 2013 0 .15850705  6.228973  6.949116  7.271744
                          "IND" 2014 0 .16727883  6.313966  6.913279  7.352995
                          "IND" 2015 0  .1909515  6.397838  6.814438  7.372227
                          "IND" 2016 0 .19866353  6.470286  6.832092  7.447332
                          "IND" 2017 0 .20114757  6.620467  6.983977  7.580173
                          "IND" 2018 0  .1968486  6.679762  7.109052  7.588515
                          "MYS" 1995 0  .3569256 4.1728168  5.046846  8.390749
                          "MYS" 1996 0  .3533147 4.2978773  5.116061 8.4920435
                          "MYS" 1997 0  .3577668  4.293026  5.131904  8.456873
                          "MYS" 1998 0   .334369  3.952908  4.888626  8.104653
                          "MYS" 1999 0  .3361156  4.060217  5.018161  8.171702
                          "MYS" 2000 0  .3385835 4.2624655  5.180812 8.3159485
                          "MYS" 2001 0 .33297285 4.2255073  5.091779  8.279474
                          "MYS" 2002 0  .3222238 4.2930083  5.148199  8.337613
                          "MYS" 2003 0   .314206   4.37716  5.204141    8.4019
                          "MYS" 2004 0  .3054753  4.502559  5.399849  8.502149
                          "MYS" 2005 0  .2956621 4.6255727  5.506574  8.619357
                          "MYS" 2006 0  .2746339 4.6583357  5.599312  8.722273
                          "MYS" 2007 0 .29187807  4.859227  5.746563  8.874159
                          "MYS" 2008 0 .28374293  5.023409   5.88381  9.029345
                          "MYS" 2009 0  .3029237   4.94321  5.670465  8.877505
                          "MYS" 2010 0  .2985265   5.13579  5.875208 9.0916815
                          "MYS" 2011 0 .28820822  5.260337  6.012776   9.23116
                          "MYS" 2012 0 .28830087  5.287704  6.027023 9.2688465
                          "MYS" 2013 0  .2855973  5.299576  6.022099  9.280678
                          "MYS" 2014 0 .28037688  5.271785  6.038914  9.309864
                          "MYS" 2015 0 .28866565  5.231025  5.868577  9.179941
                          "MYS" 2016 0 .29238778  5.252415   5.83251  9.164992
                          "MYS" 2017 0 .29309598  5.310029  5.941815  9.208419
                          "MYS" 2018 0 .28922772  5.442111  6.033988  9.312451
                          "NZL" 1995 1 .13985094  3.318149   3.58293  9.764307
                          "NZL" 1996 1 .13970208  3.433826 3.6540446   9.84137
                          "NZL" 1997 1 .14314899  3.453132  3.655085  9.768538
                          "NZL" 1998 1 .14839382  3.299552  3.512569  9.598283
                          "NZL" 1999 1 .15019757  3.358965  3.610118  9.637125
                          "NZL" 2000 1 .14879969  3.267285  3.619312  9.520916
                          "NZL" 2001 1 .14492182 3.2511106  3.588703  9.538482
                          "NZL" 2002 1 .14821795  3.406447  3.658946    9.7336
                          "NZL" 2003 1 .15246516 3.7052276  3.868251  9.994913
                          "NZL" 2004 1 .15166484  3.913582 4.0814347  10.14334
                          "NZL" 2005 1  .1524581 4.0329814  4.173696 10.231066
                          "NZL" 2006 1 .14950614  3.998353   4.17048 10.190754
                          "NZL" 2007 1 .15151624  4.181711  4.359285  10.38841
                          "NZL" 2008 1 .15201586  4.199509  4.425223   10.3499
                          "NZL" 2009 1 .16136934   4.12489 4.1769133 10.247445
                          "NZL" 2010 1 .15837157   4.28898  4.421518 10.424593
                          "NZL" 2011 1 .15882704  4.413127 4.5804214 10.555516
                          "NZL" 2012 1  .1590551 4.4653664 4.5875773 10.595994
                          "NZL" 2013 1 .15589607 4.5282817 4.6475043 10.668435
                          "NZL" 2014 1  .1609025  4.589844  4.677648 10.704904
                          "NZL" 2015 1 .16965294 4.4673977  4.545743  10.56183
                          "NZL" 2016 1  .1716074  4.525423  4.559406 10.598113
                          "NZL" 2017 1 .16725087 4.5958996 4.6739635 10.667233
                          "NZL" 2018 1 .16832873  4.613315  4.717971 10.674786
                          "PHL" 1995 0 .14371745  3.447858 4.0787807    7.1093
                          "PHL" 1996 0 .25633624  3.682708  4.302178  7.196766
                          "PHL" 1997 0 .28799555  3.705791 4.4487867  7.166351
                          "PHL" 1998 0  .3064651  3.498944  4.177185   6.90876
                          end
                          The code I have been using:

                          Code:
                          encode ccode, gen(ccode_id)
                          xtset ccode_id year
                          
                          qui levelsof YearAdequacy if ccode == "NZL", local(adqtreatment)
                          qui xtset
                          local lbl: value label `r(panelvar)'     // Store name of variable
                          loc unit ="NZL":`lbl'                     // (Gets value assigned to the variable I am interested into: in this case, the one assigned to NZL!)
                          loc plac ="AUS":`lbl'                     // (Gets value assigned to the placebo unit: in this case, AUS!)
                          di "Variable: `r(panelvar)' - Unit: `unit' - Treatment Year: `adqtreatment' - Placebo: `plac'"
                          generate treat = cond(`r(panelvar)'==`unit' & `r(timevar)' >=`adqtreatment',1,0)
                          
                          // HERE I GET THE DONOR WEIGHTS MATRIX
                          
                          scul     ln_TRADE_serv, ///
                                  ahead(5) times(3 5) treat(treat)  ///
                                  legpos(11)  obscol(black) cfcol(blue) 
                          
                          return list
                          /*
                          here the output from return list
                          macros:
                                       r(donors) : "2,3,5,6,7"
                          
                          matrices:
                                      r(Weights) :  5 x 1
                          */
                          
                          mat list r(Weights)
                          
                          // HERE I DO NOT GET THE DONOR WEIGHTS MATRIX
                          
                          scul     ln_TRADE_serv, ///
                                  covs(ln_TRD_SPX5_to_TOT ln_INTRNL_SRV_TRD ln_TRADE_tot ln_wdi_gdpcap_ipo) ///
                                  ahead(5) times(3 5) treat(treat)  ///
                                  legpos(11)  obscol(black) cfcol(blue) 
                          
                          mat list r(Weights) // r(111) matrix r(Weights) not found
                          
                          return list // returns empty, not even the r(donors) macro



                          • #14
                            I'll be super honest: I won't include additional predictors in SCUL when it's finally ready for the Stata Journal. There's usually no real need for them. I explicitly designed it not to return unit weights when the user specifies predictors... because then I'd need to do bilevel optimization, and I just don't wanna do that.


                            I'm not saying it's impossible to do the bilevel optimization; I'm just saying, if I'm honest, that I lack the Mata skills to do it!

                            The weights are just the LASSO regression coefficients, so they're still there to impute the counterfactual; I just didn't have scul report them.
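                            To see the idea outside of scul, here's a minimal sketch using official lasso (illustration only, with made-up variable names; scul itself builds on cvlasso from lassopack): with the panel reshaped wide so each donor is a column, the donor coefficients are the weights and the counterfactual is just the fitted value.
                            Code:
                            * y1 = treated unit, y2-y20 = donors, pre = 1 pre-treatment
                            lasso linear y1 y2-y20 if pre == 1, selection(cv)
                            lassocoef, display(coef) // the nonzero donor "weights"
                            predict double cf, xb    // counterfactual for every period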


                            The SC is a flat line? I'm sorry, could you describe your case more? This is most unusual.
