Hi,
I need some help with how to accomplish the following in Stata.
I have panel data with observations for more than 100 firms (panels) over a number of years. Conceptually, I would like to run a univariate pooled OLS regression for each of two subsamples defined by a median split on a given variable. I then want to obtain the R2 of the two regressions and calculate the difference between them. Next, I want to obtain a simulated distribution of differences in R2 based on bootstrapped regressions. In other words, I want to use the bootstrap command in Stata to create pseudo sample splits and produce a distribution of differences in R2 between the subsample regressions. The goal is to test whether the actual difference in R2 differs from the simulated distribution of differences in R2.
To get the median split I have created a dummy that equals 1 for observations above the median value and 0 for observations at or below the median. The regression I want to run looks like this: reg y x if dummy==1, cluster(Panel_ID), and the same again with dummy==0. These two regressions will be used to obtain the actual difference in R2 between the subsamples. Additionally, I want to obtain a simulated distribution of differences in R2 from the bootstrap without replacement. I was thinking of using something like the following for the bootstrap command: bootstrap r2=e(r2), reps(2000) *size(=n from original regression of each subsample)*: reg y x, cluster(Panel_ID).
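To make the setup concrete, here is a minimal sketch of the median split and the two subsample regressions described above. The variable name splitvar is a placeholder for the variable I split on, and the scalar names are only illustrative; y, x, dummy and Panel_ID are as in my description.

```stata
* Median split: dummy = 1 above the median, 0 at or below it.
* splitvar is a placeholder name for the splitting variable.
summarize splitvar, detail
generate byte dummy = splitvar > r(p50) if !missing(splitvar)

* Pooled OLS on the above-median subsample, SEs clustered by firm.
regress y x if dummy == 1, vce(cluster Panel_ID)
scalar r2_high = e(r2)

* Same regression on the at-or-below-median subsample.
regress y x if dummy == 0, vce(cluster Panel_ID)
scalar r2_low = e(r2)

* Actual (observed) difference in R2 between the two subsamples.
scalar r2_diff = r2_high - r2_low
display "Actual difference in R2: " r2_diff
```

Note that clustering changes the standard errors but not e(r2), so the difference in R2 would be the same without the cluster option.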
Based on the actual difference in R2 and the simulated distribution of differences in R2, I want to test whether the actual difference in R2 is different from the bootstrapped distribution of differences in R2 using a bootstrap test.
Below I have summarized my questions/problems:
(1) How can I save the R2 of each initial regression and calculate the difference in R2 (=actual difference in R2)?
(2) How can I create a simulated distribution of differences in R2 between pseudo sample split subsamples? For the bootstrap command I want to set *size()* to the n observed in each of the subsample regressions performed earlier (the two should be approximately equal, so I want my pseudo subsamples to have the same size()). Any suggestions for setting size() are welcome. Furthermore, I want to save the obtained differences in R2 for testing (without losing my original dataset).
(3) Lastly, I want to test whether the actual difference in R2 is different from the simulated distribution of differences in R2 using a bootstrap test. How can I conduct such a test?
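For questions (2) and (3), one direction I have considered is sketched below. It is only a sketch under assumptions: it swaps the bootstrap command for Stata's permute command, which reshuffles dummy across observations so that each pseudo split keeps the original subsample sizes, which seems close to what I mean by resampling without replacement. The program name r2diff, the file name r2diff_sims and the seed are hypothetical choices, and the reshuffle here is at the observation level, so it ignores the panel structure.

```stata
* Hypothetical helper: returns the difference in R2 between the
* dummy==1 and dummy==0 regressions for whatever dummy currently holds.
capture program drop r2diff
program define r2diff, rclass
    tempname r2_hi r2_lo
    quietly regress y x if dummy == 1
    scalar `r2_hi' = e(r2)
    quietly regress y x if dummy == 0
    scalar `r2_lo' = e(r2)
    return scalar diff = `r2_hi' - `r2_lo'
end

* Reshuffle dummy 2000 times, recompute the difference each time,
* and save the replicates to r2diff_sims.dta (name is an assumption).
permute dummy diff = r(diff), reps(2000) seed(12345) ///
    saving(r2diff_sims, replace): r2diff

* Inspect the simulated differences, then return to the original data.
preserve
use r2diff_sims, clear
summarize diff, detail
restore
```

As far as I understand, permute also reports how many of the reshuffled differences are at least as extreme as the observed one, together with the implied p-value, which might serve as the test in (3). I am not sure whether this is an acceptable substitute for the bootstrap test I describe.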
Any help and suggestions would be much appreciated.
Many thanks, Ali