I am interested in writing a program to make model selection in Abadie, Diamond and Hainmueller's -synth- command more principled. Synthetic control chooses a set of optimal weights that create a synthetic version of the treatment unit in the pre- and post-treatment periods, using only the pre-treatment data series for matching. But there is presently no guidance on which variables to use for matching, and in my experience model selection is where the potential for p-hacking arises. Different models may have different post-treatment dynamics, but the reader sees only the one model that the author chose, often without any discussion of how that model was ultimately selected. If anything, the choice is based on an eyeball test. I want to create an automatic procedure that does two things:
1) Choose the model with the lowest pre-treatment root mean squared prediction error
2) Calculate a distribution of test statistics from models that are slight perturbations of the main preferred model, to see how robust that model is to small variations in the specification
To do either of these, I need to write a program that will (a) create all combinations and (b) loop through those combinations, all within the -synth- syntax. I am having trouble creating the entire combination sequence. Here's my example. Assume the treatment happens in period T, and there are two pre-treatment periods. Assume one matching covariate, Y. That yields four possible matching terms: Y, Y(1), Y(2) and Y(1&2). The candidate models are then [Y], [Y Y(1)], [Y Y(2)], [Y Y(1&2)], and so on.
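One way the combination step could be sketched, assuming "all combinations" means every non-empty subset of the four terms: count from 1 to 2^k - 1 and treat each integer as a bit pattern that switches terms on or off. The local names (preds, spec, spec1, spec2, ...) are made up for illustration and are not part of -synth-.

```stata
* Hypothetical predictor terms, taken from the example above
local preds "Y Y(1) Y(2) Y(1&2)"
local k : word count `preds'
local nmodels = 2^`k' - 1        // number of non-empty subsets

forvalues m = 1/`nmodels' {
    local spec ""
    forvalues j = 1/`k' {
        * include term j when bit (j-1) of m is set
        if mod(floor(`m'/2^(`j'-1)), 2) == 1 {
            local term : word `j' of `preds'
            local spec "`spec' `term'"
        }
    }
    * drop the leading blank and store the specification as spec1, spec2, ...
    local spec : list retokenize spec
    local spec`m' "`spec'"
    display "model `m': `spec'"
}
```

With four terms this gives 2^4 - 1 = 15 specifications. If Y should appear in every model, the subsets that omit it could be skipped inside the loop.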
The synth syntax is fairly straightforward. It's:
. synth outcome depvar, stuff
where depvar can be entered for certain years, for a certain combination of years, or alone. What I need is a loop that estimates many synth models, swapping in a different depvar specification for each model from the combinations previously generated. I'm having trouble figuring out how to create the combinations in the first place, so that I can then use them in a foreach loop over all the models, and wanted to get some advice.
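A minimal sketch of the looping step, under several assumptions: the panel has already been declared with tsset, the treated unit is unit 1, the treatment period is 3, and -synth- leaves the pre-treatment RMSPE in e(RMSPE) as a 1x1 matrix (worth confirming with ereturn list after a run). The variable name outcome follows the syntax line above. The three specifications are typed out by hand here for illustration, and the local names are hypothetical.

```stata
* Assumes the data are already tsset, e.g. tsset unit_id period
* Hypothetical setup: treated unit 1, treatment in period 3
local spec1 "Y"
local spec2 "Y Y(1)"
local spec3 "Y Y(2)"
local nspec 3

local best_rmspe .
local best_spec ""
forvalues s = 1/`nspec' {
    * capture so that one failed specification does not stop the loop
    capture noisily synth outcome `spec`s'', trunit(1) trperiod(3)
    if _rc continue
    * assumption: e(RMSPE) is a 1x1 matrix holding the pre-treatment RMSPE
    matrix R = e(RMSPE)
    local rmspe = R[1,1]
    display "spec `s' (`spec`s''): RMSPE = `rmspe'"
    * missing (.) compares as larger than any number, so the first fit always wins
    if `rmspe' < `best_rmspe' {
        local best_rmspe `rmspe'
        local best_spec "`spec`s''"
    }
}
display "lowest pre-treatment RMSPE: `best_rmspe' from [`best_spec']"
```

Saving each RMSPE with postfile, instead of only tracking the minimum, would also give the distribution across specifications needed for the second goal.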
Apologies for the length of the post. I wanted to be as thorough as possible.