Implementing an Iterative Loop While Updating What We're Looping Over

Jared Greathouse

Join Date: Sep 2021
Posts: 2170

Implementing an Iterative Loop While Updating What We're Looping Over

11 Jul 2024, 06:19

Hey everyone. Suppose we're interested in implementing an algorithm which selects the "optimal" set of controls for a certain treated unit for policy analysis. Take the dataset below (by the way I'm working with Stata 17)

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(hongkong australia newzealand austria canada denmark)
 .062   .04048913   .04724391  -.01308351   .01006395  -.01229182
 .059   .03785692   .03875869 -.007580798   .02126387 -.003092842
 .058   .02250948   .08991753  .000542671  .018919427 -.007764421
 .062   .02874655   .06975085  .001180751   .02531683 -.004048589
 .079   .03399039   .06019911   .02551085   .04356715    .0310944
 .068   .03791937   .06255518  .019941313   .05022538      .06428
 .046   .05228941   .04292477  .017087875   .06512183   .04595546
 .052  .031070895   .04760897  .023035197   .06733068   .05516641
 .037  .008696091  .022149924  .025292696    .0509212   .04805718
 .029  .006773674  .023302693  .021849955   .03152506  .011953605
 .012   .00302829   .03045487  .018319173  .018179957   .02080968
 .015  .010981606   .03069211   .01345693  .015165864  .008303516
 .025   .03818205   .04089288  .015387368  .007820651  .010101924
 .036   .03452006   .03584749  .017335817  .011510062   .03085883
 .047   .03667319   .03766511  .013595447   .02166007   .04011321
 .059   .03898745  .025593406  .004195063   .02470353   .02569367
 .058  .025748033  .005592703 -.001534697  .035775363  .029461836
 .072   .05186257  .034571752 -.002021203   .03868772    .0398559
 .061   .05928891   .03353216    .0161481   .03672911   .01764238
 .014   .06342068  .018211482  .017984418    .0380428   .03139775
-.032   .06243018  .018747104   .03047369  .033733726  .033021096
-.061   .03949186 -.017352613  .032317627  .028866187 -.010453694
-.081   .04318146 -.019380176   .02719671    .0189473   .02242801
-.065  .035846256   .02692859   .02283163   .02248398   .01477213
-.029       .0409   .03889057   .02242066   .03785257  .003396192
 .005   .03857499   .05541473  .028941363   .04818999  .029012986
 .039   .02692009   .06664848   .03839999  .063958384  .010058184
 .083   .03297373   .04101127   .03883952   .06480881  .026324544
 .107   .03959653   .06054202  .036271237   .06725128   .03857407
 .075   .04687658  .034375284  .030074896   .07295554  .030016804
 .076  .024391215    .0207593  .016178379  .065640524   .03717389
 .063  .005838784   .03254855  .011539886   .05321779  .036737546
 .027  .000732203  .015890202  .005441154   .04056079   .01424755
 .015 -.002025588   .05496934 -.005665725  .007522337  .007846387
-.001  .028212607  .036681067 -.004335946 -.017121905   .01250594
-.017   .03998247   .04987184 -.000616657 -.014710952  -.00049339
 -.01   .03867529   .05044868  .003416942 -.011813972  -.00188065
 .005   .04083484  .011498967  .010456725   .01313668   .01491368
 .028  .032215476   .04828115   .01278433    .0296833  .002654192
 .048   .03258446  .015707554  .010514015   .03784149 -.001912172
 .041   .02694338   .02053765  .007017912   .03301423  .005125017
-.009   .02464757    .0400604  .008199689  .015919374  -.01868332
 .038   .03993078   .04220926  .005746135  .025793217 -.005911494
 .047   .05529072   .05431711  .008809593   .02063149  .017165432
 .077   .05989317  .066612795  .013646583   .02710546   .02884403
  .12   .05748491   .06818556  .017229607   .04896003   .03726172
 .066   .04655641   .05111898   .02444402   .05043372  .036128163
 .079  .030096613   .04093498   .02429197   .04781632   .03432884
 .062   .03160237   .01501046   .02370696   .03985261  .020981267
 .071   .04588263   .02387351    .0257305   .03162305   .05209525
 .081   .04553443  .011341385  .026474627   .03574734   .04374102
 .069   .05498263  .008914476  .032616325    .0503335   .02875244
  .09   .04806678  .018647296   .03832044   .04947587   .04931609
 .062   .02698179 -.009260945  .035103742   .04119911   .03880127
 .064  .032730877  .011500795   .03722008  .031677015   .04183601
 .066   .03857545  .036755715   .03898238    .0200051   .02980916
 .055    .0580129   .03946248  .036197655   .03071206  .033133514
 .062   .05951871   .05829326   .03257025   .03982709 -.007168933
 .068   .05664859   .05114701   .03155845   .03474158  .013516975
 .069   .04582468   .04590492   .01909501   .03812844   .02379412
 .073  .027523303  .031214973  .017430725   .02921731 -.005199719
end

In our case, hongkong is our treated unit and the other columns are our control group. The way the first step of the algorithm proceeds is as follows:

Code:

clear *



* Example generated by -dataex-. For more info, type help dataex
clear
input float(hongkong australia newzealand austria canada denmark)
 .062   .04048913   .04724391  -.01308351   .01006395  -.01229182
 .059   .03785692   .03875869 -.007580798   .02126387 -.003092842
 .058   .02250948   .08991753  .000542671  .018919427 -.007764421
 .062   .02874655   .06975085  .001180751   .02531683 -.004048589
 .079   .03399039   .06019911   .02551085   .04356715    .0310944
 .068   .03791937   .06255518  .019941313   .05022538      .06428
 .046   .05228941   .04292477  .017087875   .06512183   .04595546
 .052  .031070895   .04760897  .023035197   .06733068   .05516641
 .037  .008696091  .022149924  .025292696    .0509212   .04805718
 .029  .006773674  .023302693  .021849955   .03152506  .011953605
 .012   .00302829   .03045487  .018319173  .018179957   .02080968
 .015  .010981606   .03069211   .01345693  .015165864  .008303516
 .025   .03818205   .04089288  .015387368  .007820651  .010101924
 .036   .03452006   .03584749  .017335817  .011510062   .03085883
 .047   .03667319   .03766511  .013595447   .02166007   .04011321
 .059   .03898745  .025593406  .004195063   .02470353   .02569367
 .058  .025748033  .005592703 -.001534697  .035775363  .029461836
 .072   .05186257  .034571752 -.002021203   .03868772    .0398559
 .061   .05928891   .03353216    .0161481   .03672911   .01764238
 .014   .06342068  .018211482  .017984418    .0380428   .03139775
-.032   .06243018  .018747104   .03047369  .033733726  .033021096
-.061   .03949186 -.017352613  .032317627  .028866187 -.010453694
-.081   .04318146 -.019380176   .02719671    .0189473   .02242801
-.065  .035846256   .02692859   .02283163   .02248398   .01477213
-.029       .0409   .03889057   .02242066   .03785257  .003396192
 .005   .03857499   .05541473  .028941363   .04818999  .029012986
 .039   .02692009   .06664848   .03839999  .063958384  .010058184
 .083   .03297373   .04101127   .03883952   .06480881  .026324544
 .107   .03959653   .06054202  .036271237   .06725128   .03857407
 .075   .04687658  .034375284  .030074896   .07295554  .030016804
 .076  .024391215    .0207593  .016178379  .065640524   .03717389
 .063  .005838784   .03254855  .011539886   .05321779  .036737546
 .027  .000732203  .015890202  .005441154   .04056079   .01424755
 .015 -.002025588   .05496934 -.005665725  .007522337  .007846387
-.001  .028212607  .036681067 -.004335946 -.017121905   .01250594
-.017   .03998247   .04987184 -.000616657 -.014710952  -.00049339
 -.01   .03867529   .05044868  .003416942 -.011813972  -.00188065
 .005   .04083484  .011498967  .010456725   .01313668   .01491368
 .028  .032215476   .04828115   .01278433    .0296833  .002654192
 .048   .03258446  .015707554  .010514015   .03784149 -.001912172
 .041   .02694338   .02053765  .007017912   .03301423  .005125017
-.009   .02464757    .0400604  .008199689  .015919374  -.01868332
 .038   .03993078   .04220926  .005746135  .025793217 -.005911494
 .047   .05529072   .05431711  .008809593   .02063149  .017165432
 .077   .05989317  .066612795  .013646583   .02710546   .02884403
  .12   .05748491   .06818556  .017229607   .04896003   .03726172
 .066   .04655641   .05111898   .02444402   .05043372  .036128163
 .079  .030096613   .04093498   .02429197   .04781632   .03432884
 .062   .03160237   .01501046   .02370696   .03985261  .020981267
 .071   .04588263   .02387351    .0257305   .03162305   .05209525
 .081   .04553443  .011341385  .026474627   .03574734   .04374102
 .069   .05498263  .008914476  .032616325    .0503335   .02875244
  .09   .04806678  .018647296   .03832044   .04947587   .04931609
 .062   .02698179 -.009260945  .035103742   .04119911   .03880127
 .064  .032730877  .011500795   .03722008  .031677015   .04183601
 .066   .03857545  .036755715   .03898238    .0200051   .02980916
 .055    .0580129   .03946248  .036197655   .03071206  .033133514
 .062   .05951871   .05829326   .03257025   .03982709 -.007168933
 .068   .05664859   .05114701   .03155845   .03474158  .013516975
 .069   .04582468   .04590492   .01909501   .03812844   .02379412
 .073  .027523303  .031214973  .017430725   .02921731 -.005199719
end



cls

* Creates the r-squared frame indexed to each individual unit

mkf rsquare

* We only need one row

frame rsquare: set obs 1

* Our time variable

g time = _n, b(hongkong)

* Gets the list of variable names

qui ds

** Our time column

loc temp: word 1 of `r(varlist)'

loc time: disp "`temp'"

** Our treated unit

loc t: word 2 of `r(varlist)'

loc treated_unit: disp "`t'"

loc a: word 3 of `r(varlist)'

* Our first control unit

loc donor_one: disp "`a'"


local nwords :  word count `r(varlist)'

loc b: word `nwords' of `r(varlist)'

* Our last control unit

loc last_donor: disp "`b'"

*** Step 1: Initial Selection Loop

/* We begin by looping over our controls in regression. */

qui foreach i of var `donor_one'-`last_donor' {
cap drop cfp

constraint define 1 `i' = 1


qui cnsreg `treated_unit' `i' if `time' < 45, constraint(1)

// Calculating our rsquared statistic for the i-th model

qui predict cfp if e(sample) 
qui corr `treated_unit' cfp if e(sample)

frame rsquare: g `i' = r(rho)^2

cap drop cfp


}


** Step 1b: now we select the unit with the highest r-squared statistic

frame rsquare {

qui ds
loc donors `r(varlist)'
qui egen max_value = rowmax(*)

qui gen max_var = ""

* Loop through each column and update max_var
qui foreach var of varlist `donors' {
    replace max_var = "`var'" if `var' == max_value
}

loc colmax : di max_var[1]

loc U: di "`colmax'"


di "First selected unit is `U'"

// In this case it's canada

}

frame drop rsquare
cls

We begin by looping over our control units in DID regression (outcome only). We then calculate the predicted counterfactual for this pre-intervention period and calculate the R-squared statistic. We save the r2 to a separate frame. The first unit to be selected is the one which maximizes the r-squared statistic, which we may access in the rsquare frame. We then put this unit up to the front of our control group, and proceed.

Now, using the new control group (which for now consists of just canada), we take the average of canada and the other control units individually, estimate difference-in-differences with each of the remaining control units, and then find the unit which maximizes the r-squared statistic. In this case, it's New Zealand.

Code:

// Step 2: DID Step

*******
*******

order `U', a(`treated_unit')

mkf rsquare

frame rsquare: set obs 1

loc newdonors : list donors - U

di "`newdonors'"


local nwords :  word count `newdonors'

loc temp: word 1 of `newdonors' // Time

loc donor_one: disp "`temp'"


loc last_donor: word `nwords' of `newdonors'
di "The last donor in the set is `last_donor'"

/* In this step, we loop through the REMAINING donors.
In this case, australia and austria. */


foreach i of var `donor_one'-`last_donor' {

// These must be created each time

cap drop cfp
cap drop ym

* We take the average of controls, using the U selected group and the new donor `v'
egen ym = rowmean(`U' `i')

constraint define 1 ym = 1

qui cnsreg `treated_unit' ym if `time' < 45, constraint(1)

qui predict cfp if e(sample) 

qui corr `treated_unit' cfp if e(sample)

qui frame rsquare: g `i' = r(rho)^2

cap drop cfp

    if `i'==`last_donor' {
        frame rsquare {
        cap drop max_value
        cap drop max_var
        qui ds
        loc donors `r(varlist)'
        egen max_value = rowmax(*)

        gen max_var = ""

        * Loop through each column and update max_var
        qui foreach var of varlist `donors' {
            replace max_var = "`var'" if `var' == max_value
        }

        loc colmax : di max_var[1]
        di "Our next optimal donor is: `colmax'"
        local selectedunit `U' `colmax'
        loc newdonors : list donors - selectedunit
        di "Here are our new controls: `newdonors'"
        }
            
            
    }


}

Here's where my question lies: Now, we must implement step 2 for the remaining 3 controls (australia austria denmark). That is, we must see which of these three, when added to canada and new zealand, maximizes the r2, adding it to the macro/set U. Then, we must see which of the 2 maximize the r2 statistic, and then we simply add the final control unit to set U, such that we have all units included in the set "U". How might I do this? Or at the very least, what might a good starting point be? My initial reaction is to use a while loop which continues to loop until some condition is fulfilled. Maybe, I should, at the end of the loop for step 2, check how many words there are in the `newdonors'. Once there's 0 words in `newdonors', this means that there are no more control units and then the while loop can conclude. Is that a reasonable starting point?

Tags: None

daniel klein

Join Date: Mar 2014
Posts: 3848

11 Jul 2024, 08:06

Not sure I understood all of that. Try

Code:

local treated_unit hongkong

local U // void

local predictors australia newzealand austria canada denmark

tempvar ym cfp

tempname max_r2

while ("`predictors'" != "") {
    
    scalar `max_r2' = 0
    
    foreach var of local predictors {
        
        quietly {
            
            egen `ym' = rowmean(`U' `var')
            
            constraint 1 `ym' = 1
            
            cnsreg `treated_unit' `ym' if _n < 45 , constraint(1)
            
            predict `cfp' if e(sample)
            correlate `treated_unit' `cfp' if e(sample)
            
            if r(rho)^2 > scalar(`max_r2') {
                
                scalar `max_r2' = r(rho)^2
                local new_U `var'
                
            }
            
            drop `ym'
            drop `cfp'
            
        }
        
    }
    
    local U `U' `new_U'
    
    local predictors : list predictors - new_U

}

display "`U'"

Comment

Jared Greathouse

Join Date: Sep 2021
Posts: 2170

11 Jul 2024, 08:30

Thank you! Sorry for the sort of confusing explanation. Here is the solution I devised. It also comports with my Python code:

Code:

clear *

import delim "https://raw.githubusercontent.com/jgreathouse9/GSUmetricspolicy/main/data/GDP.csv", clear

//drop china
cls

* Creates the r-squared frame indexed to each individual unit
mkf rsquare

* We only need one row
frame rsquare: set obs 1

* Our time variable
gen time = _n


* Gets the list of variable names
qui ds

* Our treated unit
local treated_unit hongkong

* Get control units list
local control_units australia austria canada denmark finland france germany italy japan korea mexico netherlands newzealand norway switzerland unitedkingdom unitedstates singapore philippines indonesia malaysia thailand taiwan china

* Initial selection loop
foreach i of varlist `control_units'{
    cap drop cfp
cap drop ym
cap drop tss rss
    constraint define 1 `i' = 1
    qui cnsreg `treated_unit' `i' if time < 45, constraint(1)
    qui predict cfp if e(sample)
    * Calculate the mean of observed values
    qui summarize `treated_unit'
    local mean_observed = r(mean)

    * Calculate the Total Sum of Squares (TSS)
    qui generate double tss = (`treated_unit' - `mean_observed')^2
    qui summarize tss
    local TSS = r(sum)

    * Calculate the Residual Sum of Squares (RSS)
    qui generate double rss = (`treated_unit' - cfp)^2
    qui summarize rss
    local RSS = r(sum)

    * Calculate and display R-squared
    loc r2 = 1 - (`RSS' / `TSS')
    frame rsquare: gen `i' = `r2'
    cap drop cfp
}


frame rsquare {

qui ds
loc donors `r(varlist)'
qui egen max_value = rowmax(*)

qui gen max_var = ""

* Loop through each column and update max_var
qui foreach var of varlist `donors' {
    replace max_var = "`var'" if `var' == max_value
}

loc optimal_control_1 : di max_var[1]

}

* Initialize U and D
local U `optimal_control_1'
local D: list control_units - U

* Iterate through the remaining control units
qui while `"`D'"' != "" {
    local next_control ""
    local max_r2 -1
    
    foreach j of local D {
        local current_set `U' `j'
        cap drop cfp
    cap drop ym
    cap drop tss rss
    egen ym = rowmean(`U' `j')
    //di "`U' `j'"
        constraint define 1 ym = 1
        qui cnsreg `treated_unit' ym if time < 45, constraint(1)
        qui predict cfp if e(sample)
    * Calculate the mean of observed values
    qui summarize `treated_unit'
    local mean_observed = r(mean)

    * Calculate the Total Sum of Squares (TSS)
    qui generate double tss = (`treated_unit' - `mean_observed')^2
    qui summarize tss
    local TSS = r(sum)

    * Calculate the Residual Sum of Squares (RSS)
    qui generate double rss = (`treated_unit' - cfp)^2
    qui summarize rss
    local RSS = r(sum)

    * Calculate and display R-squared
    loc r2 = 1 - (`RSS' / `TSS')
        if `r2' > `max_r2' {
            local next_control `j'
            local max_r2 `r2'
        }
        cap drop cfp
    }
    
    * Add the best control unit to U
    local U `U' `next_control'
    
    * Remove the selected unit from D
    local D: list D - next_control

}
drop ym
order time, first
order `U', a(hongkong)
cls

* Initialize an empty local to build the variable list
local varlist
local best_r2 = -1
local best_model = ""
* Loop through each variable in the list
foreach x of local U {
* Drop previous variables
    cap drop cf 
    cap drop ymean 
    cap drop tss
    cap drop rss
    * Add the current variable to the list
    local varlist `varlist' `x'
    
    * Display the current variable list (for debugging purposes)
    //display "`varlist'"
 
    
    * Generate the row mean for the current set of variables
    egen ymean = rowmean(`varlist')
    
    * Define the constraint
    constraint define 1 ymean = 1
    
    * Run the constrained regression
    qui cnsreg `treated_unit' ymean if time < 45, constraints(1)
    
    qui predict cf if e(sample)
    
    * Calculate the mean of observed values
    qui summarize `treated_unit'
    local mean_observed = r(mean)

    * Calculate the Total Sum of Squares (TSS)
    qui generate double tss = (`treated_unit' - `mean_observed')^2
    qui summarize tss
    local TSS = r(sum)

    * Calculate the Residual Sum of Squares (RSS)
    qui generate double rss = (`treated_unit' - cf)^2
    qui summarize rss
    local RSS = r(sum)

    * Calculate and display R-squared
    loc r2 = 1 - (`RSS' / `TSS')
    
    * Check if the current R-squared is the highest
    if (`r2' > `best_r2') {
        local best_r2 = `r2'
        local best_model = "`varlist'"
    }
}

* Display the best model and its R-squared value
di "Model with the highest R2 is with variables `best_model' having R2 = `best_r2'"

keep time `treated_unit' `best_model'

egen ymean = rowmean(`best_model')

* Define the constraint
constraint define 1 ymean = 1

* Run the constrained regression
qui cnsreg `treated_unit' ymean if time < 45, constraints(1)

qui predict cf

keep time `treated_unit' cf

twoway (connected hongkong time, mcolor(black) msymbol(diamond) lcolor(black) lwidth(medthick) connect(direct)) (connected cf time, mcolor(red) msymbol(triangle) lcolor(red) lwidth(medium)), legend(order(1 "Hong Kong" 2 "FDID Hong Kong") ring(0))

It basically selects the best DID model (using r2 as the determining factor)

Comment

daniel klein

Join Date: Mar 2014

Posts: 3848
#4

11 Jul 2024, 08:37

Out of curiosity, does this rather complex code produce results different from the code I posted in #2? It seems as if the whole frame and variable ordering in the datasets is unnecessary overhead.
1 like
Comment

Jared Greathouse

Join Date: Sep 2021
Posts: 2170

11 Jul 2024, 09:03

yeah this worked! The only issue that I see, for some reason, is the r-squared statistic. When I run the full example,

Code:

clear *

import delim "https://raw.githubusercontent.com/jgreathouse9/GSUmetricspolicy/main/data/GDP.csv", clear

//drop china
cls

* Creates the r-squared frame indexed to each individual unit
mkf rsquare

* We only need one row
frame rsquare: set obs 1

* Our time variable
gen time = _n

* Our treated unit
local treated_unit hongkong

* Get control units list
local predictors australia austria canada denmark finland france germany italy japan korea mexico netherlands newzealand norway switzerland unitedkingdom unitedstates singapore philippines indonesia malaysia thailand taiwan china

local treated_unit hongkong

local U // void

//local predictors australia newzealand austria canada denmark

tempvar ym cfp

tempname max_r2

while ("`predictors'" != "") {
    
    scalar `max_r2' = 0
    
    foreach var of local predictors {
        
        quietly {
            
            egen `ym' = rowmean(`U' `var')
            
            constraint 1 `ym' = 1
            
            cnsreg `treated_unit' `ym' if _n < 45 , constraint(1)
            
            predict `cfp' if e(sample)
            correlate `treated_unit' `cfp' if e(sample)
            
            if r(rho)^2 > scalar(`max_r2') {
                
                scalar `max_r2' = r(rho)^2
                local new_U `var'
                
            }
            
            drop `ym'
            drop `cfp'
            
        }
        
    }
    
    local U `U' `new_U'
    
    local predictors : list predictors - new_U

}

display "`U'"

I get "malaysia newzealand norway thailand singapore mexico japan china philippines australia korea indonesia unitedstates unitedkingdom canada switzerland taiwan denmark germany netherlands italy austria france finland", but that's not the right answer (in the sense that it does not match the order of the Python code). This has to do, apparently, with the calculation of the R squared statistic. When I calculate R squared a different way

Code:

clear *

import delim "https://raw.githubusercontent.com/jgreathouse9/GSUmetricspolicy/main/data/GDP.csv", clear

//drop china
cls

* Creates the r-squared frame indexed to each individual unit
mkf rsquare

* We only need one row
frame rsquare: set obs 1

* Our time variable
gen time = _n

* Our treated unit
local treated_unit hongkong

* Get control units list
local predictors australia austria canada denmark finland france germany italy japan korea mexico netherlands newzealand norway switzerland unitedkingdom unitedstates singapore philippines indonesia malaysia thailand taiwan china

local treated_unit hongkong

local U // void

//local predictors australia newzealand austria canada denmark

tempvar ym cfp

tempname max_r2

while ("`predictors'" != "") {
    
    scalar `max_r2' = 0
    
    foreach var of local predictors {
        cap drop rss
    cap drop tss
        
        quietly {
            
            egen `ym' = rowmean(`U' `var')
            
            constraint 1 `ym' = 1
            
            cnsreg `treated_unit' `ym' if _n < 45 , constraint(1)
            
            predict `cfp' if e(sample)
        
            * Calculate the mean of observed values
        qui summarize `treated_unit'
        local mean_observed = r(mean)

        * Calculate the Total Sum of Squares (TSS)
        qui generate double tss = (`treated_unit' - `mean_observed')^2
        qui summarize tss
        local TSS = r(sum)

        * Calculate the Residual Sum of Squares (RSS)
        qui generate double rss = (`treated_unit' - `cfp')^2
        qui summarize rss
        local RSS = r(sum)

        * Calculate and display R-squared
        loc r2 = 1 - (`RSS' / `TSS')
        
            
            if `r2' > scalar(`max_r2') {
                
                scalar `max_r2' = `r2'
                local new_U `var'
                
            }
            
            drop `ym'
            drop `cfp'
            
        }
        
    }
    
    local U `U' `new_U'
    
    local predictors : list predictors - new_U

}

display "`U'"

I get the answer that matches the original matlab code and my Python code.

The order matters in this instance because, per my rather messy code above, we will iteratively estimate DD over each of these donors in optimal order of assignment. That is to say, we begin with Phillipines, then we add Singapore, then Thailand... until the end. In this case, that's 24 models. We select the model of these 24 with the best R2, and use that as our final model. So, your code did what I attempted, but for some reason the R2 produces different values compared to this one. The correct answer, in this instance, is the first 9 optimal controls

Comment

daniel klein

Join Date: Mar 2014

Posts: 3848
#6

11 Jul 2024, 12:23

I did not want to imply that order is irrelevant. I just did (and do) not see the need to order within the loop(s). Likewise, the values of treated_unit never change so you should not recalculate TSS repeatedly within the loop; and you certainly do not want to do it within the inner loop. Also, you only need one call to summarize if you

Code:

summarize `treated_unit' scalar TSS = r(Var) * (r(N)-1)

Note that I use a scalar not a local to maintain precision. I don't think that is necessary for what you are doing but I noticed that you used double precision for the variables. You do not do that with predict, though.

One more thing: Do you really need the e(sample) restriction inside the loop? Shouldn't the regressions (and sums of squares) not all be based on the same observations?
Comment
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#7

11 Jul 2024, 14:52

Likewise, the values of treated_unit never change so you should not recalculate TSS repeatedly within the loop; and you certainly do not want to do it within the inner loop... Note that I use a scalar not a local to maintain precision.

Oh yeah that's pretty true. And yes, the scalar method does maintain precision, so I'll change this then.

The e(sample) restriction simply means we are calculating for only the sample we just estimated with right? In this case, we are fitting the model in the pre-period only (or, t < 44). If I didn't use e(sample), that would be predicting the post-intervention counterfactual, right?
Comment
daniel klein

Join Date: Mar 2014

Posts: 3848
#8

11 Jul 2024, 16:29

Originally posted by Jared Greathouse View Post

The e(sample) restriction simply means we are calculating for only the sample we just estimated with right? In this case, we are fitting the model in the pre-period only (or, t < 44). If I didn't use e(sample), that would be predicting the post-intervention counterfactual, right?

I see. But then you should estimate the TSS and RSS on that sample, too, right? Also, the period is constant over all regressions so I would start with

Code:

mark sample if time < 45

Then, to make sure that all regressions include the same non-missing observations, I would add

Code:

markout sample hongkong australia austria canada ...

After all, you don't want to compare r-squared values for regressions based on different samples because of missing values, do you?

With my variable sample (which could be a temporary variable if needed) in place, I would

Code:

preserve keep if sample ...

and then start the algorithm on the reduced dataset, which should also result in slight speed gains compared to repeatedly evaluating the if expression. The cost is preserve, of course.

EDIT: I just realized that the RSS is calculated for the sample only, because the predictions are restricted to e(sample), leading to out-of-sample missing values for the differences in rss that summarize will then ignore. But that's pretty implicit code. Also, TSS is still based on the whole (wrong) sample.

Last edited by daniel klein; 11 Jul 2024, 16:36.
Comment

Announcement

Implementing an Iterative Loop While Updating What We're Looping Over

Comment

Comment

Comment

Comment

Comment

Comment

Comment