Good morning Stata-friends,
I am currently puzzling over a rather complicated setup and could not find a matching thread after an hour of searching.
My issue is the following:
- I have time series data of stock returns and numerous control variables
- I want to conduct a number of regressions
- However, I would like to split the sample, i.e. run my code for different sample periods
- Within each sample period, I then run several regressions with different control variables (so I have a nested loop)
- For this, I reload my data file (use file.dta, clear) in every iteration of the sample-period loop, drop the periods I do not need, and use only the remaining period for the regressions.
- This would not be too difficult, BUT there is one big problem: one of the control variables is missing for all observations within sample1. So when I run my loop, it always breaks on reaching that all-missing series (e.g. volume_5), because Stata does not run a regression when a regressor has no non-missing observations.
- What I would like to do is exclude the missing variable from every regression in sample1.
- But all my control variables are stored in a local "controlvars" (and I would like to avoid deleting anything manually), so I do not know how to exclude the missing variable volume_5 from this local only when I am running sample1.
1st level of loop: Sample splits (5year sample 1, 5year sample 2, sample 3)
2nd level of loop: Different sets of control variables (some control variables occur in all regressions [e.g. "lagged_return"], some other variables only occur once per sample period [e.g. "volume_1" - "volume_20"])
So each regression looks like this:
Code:
reg return lagged_return other_vars volumei
The overall code structure is similar to this one (but far more complicated; in reality I have another level of nested loop, but this does not matter here):
Code:
use data.dta, clear

*1st level of loop - conduct all regressions for various sample periods
forvalues nsample = 1/3 {

    *Reload the full data so that each sample period can be created from it
    use data.dta, clear

    *Generate dummy for sample
    gen sample1 = 1 if date < td(01jan2010)
    gen sample2 = 1 if date >= td(01jan2010)
    gen sample3 = 1 // full sample

    *Only use sample period
    drop if sample`nsample' != 1

    *Define control variables (store in locals, as there are many variables)
    *Store low-level alternations of volume
    local volume_1 volume_1a volume_1b volume_1c
    local volume_2 volume_2a volume_2b volume_2c

    *Store higher level volume variable set
    local volume `volume_1' `volume_2' `volume_3'-`volume_20'
    local other_vars any_varlist

    *2nd level of loop - regress return on a set of variables and one variable volume(i) per iteration
    foreach volume_i of local volume {
        local controlvars `volume_i' `other_vars' lagged_return
        reg return `controlvars'
    }
}
This works perfectly fine if I only include sample2 or sample3.
But using sample1 is impossible, because volume_5 is completely missing in sample1. So during the fifth regression, my code always breaks [error r(2000), no observations].
I tried everything I could think of to get rid of volume_5 when nsample is 1.
So far, nothing has worked.
These are my attempts; hopefully someone can come up with something that works. I also tried them at different locations within the code, without success.
Code:
*Tried to overwrite the local for volume_5 with blanks if I am in sample1
if "`nsample'" == 1 local volume_5 // type mismatch r(109)
if `nsample' == 1 local volume_5 // r(2000)
local volume_5 if "`nsample'" == "1" // r(2000)
local volume_5 if "`nsample'" == 1 // r(2000)

*Tried to drop the variables related to volume_5 if I am in sample1
drop volume5 if `nsample' = 1 // variable volume_5 not found r(111)
Is there anything I could do?
I am getting a little desperate about this.
I would highly appreciate any feedback.
Best regards,
Carlos
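A minimal sketch of one possible workaround for the setup described above, not a tested fix for the real data: instead of editing the local by hand, check inside the inner loop whether the current volume variable has any non-missing observations in the current sample, and skip it if it has none. The toy data set, the tempfile name `toy`, and the two stand-in variables volume_1 and volume_2 are assumptions made only so the example runs on its own; here volume_2 plays the role of the all-missing volume_5.

```stata
* Toy data standing in for the real file (all names and values are hypothetical)
clear
set obs 100
set seed 12345
gen date = td(01jan2005) + _n * 30
gen return = rnormal()
gen lagged_return = rnormal()
gen volume_1 = rnormal()
gen volume_2 = rnormal()
* volume_2 is entirely missing before 2010, like volume_5 in sample1
replace volume_2 = . if date < td(01jan2010)
tempfile toy
save `toy'

forvalues nsample = 1/3 {
    * Reload the full data, then keep only the current sample period
    use `toy', clear
    if `nsample' == 1 keep if date < td(01jan2010)
    if `nsample' == 2 keep if date >= td(01jan2010)

    local volume volume_1 volume_2
    foreach volume_i of local volume {
        * Count usable observations of this regressor in the current sample
        quietly count if !missing(`volume_i')
        if r(N) == 0 {
            display as text "sample `nsample': skipping `volume_i' (all missing)"
            continue
        }
        regress return `volume_i' lagged_return
    }
}
```

The check uses the data rather than the sample number, so it should also cover any other variable that happens to be empty in some period. An alternative might be to prefix the regression with `capture noisily` and inspect `_rc`, at the cost of also passing over unrelated errors.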