Hello everyone
I need help with a research project in which I am using Stata for the first time, so I do not yet know it well. I hope my question is not too long; I have been struggling with this for some time. I would like to reproduce the results of Brav et al. (https://www.sciencedirect.com/scienc...26tMaSI7sdaczn ), so that I can be sure I have proceeded correctly before extending the analysis with additional variables.
I am concerned with Table 7, Panels B and C, of the paper. I have already restricted the dataset according to the sample requirements given there. For Panel B this leaves almost 72,000 observations in total (so a dataex example is not practical). In the next step I estimate two parameters by OLS: I regress the change in dividends per share (deltadvpsx) on the prior year's dividends per share (prior_year_dvpsx_f, coefficient beta1) and on earnings per share (earning, coefficient beta2).
Here are my steps so far for Panel B; Panel C differs somewhat, but not by much:
Code:
egen firmid = group(gvkey)
egen timeid = group(fyear)
duplicates report firmid timeid
duplicates tag firmid timeid, gen(isdup)
drop if isdup
tsset firmid timeid
sum fyear, d
return list

// Generate 3 new variables
gen prior_year_dvpsx_f = L.dvpsx_f
gen earning = epspx
gen deltadvpsx = dvpsx_f - prior_year_dvpsx_f

// Drop observations with missing values
drop if earning==.
drop if dvpsx_f==.

// Generate the 3 subperiods
generate fy = .
replace fy=1 if fyear <= 1964
replace fy=2 if fyear > 1964 & fyear <= 1983
replace fy=3 if fyear > 1983 & fyear <= 2002

// Count the number of observations per firm in each subperiod
bysort firmid: egen counter1=count(firmid) if fyear <= 1964
bysort firmid: egen counter2=count(firmid) if fyear > 1964 & fyear <= 1983
bysort firmid: egen counter3=count(firmid) if fyear > 1983 & fyear <= 2002

// Drop firms that do not have data for every year of the subperiod
drop if fy == 1 & counter1 < 15
drop if fy == 2 & counter2 < 19
drop if fy == 3 & counter3 < 19

// Summarize the observations in each subperiod
sum counter1 if fy == 1 & counter1==15, d
sum counter2 if fy == 2 & counter2==19, d
sum counter3 if fy == 3 & counter3==19, d

// Regressions
reg deltadvpsx prior_year_dvpsx_f earning if fy==1
reg deltadvpsx prior_year_dvpsx_f earning if fy==2
reg deltadvpsx prior_year_dvpsx_f earning if fy==3
This is where the problem begins, and I have several questions. First, my estimates for the first two subperiods are somewhat low. My estimated earnings coefficient (beta2) also seems too high, or beta1 too low: after converting the regression results, I get a higher value for beta2 than for beta1, which should not happen according to the paper. The third and last subperiod, however, is completely off: there the coefficient on prior_year_dvpsx_f (beta1) is much larger in magnitude than in the first two subperiods, as the third regression in the output below shows. According to Brav et al. this should not be the case.
So I ask myself: what mistake have I made that prevents me from reproducing the results, especially for the third subperiod? Should I be using a pooled OLS regression or a cross-sectional OLS regression, as mentioned in the description of the table? If it is the latter, how do I do that? This confuses me a great deal, because my number of observations corresponds quite closely to that in the paper.
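The summary statistics asked about further down (median, SD, and percentiles of SOA and TP) only exist if there is one estimate per firm, so one possible reading, offered as an assumption rather than as the paper's confirmed method, is that the regression is run separately for each firm within a subperiod. A minimal sketch with statsby on simulated data follows; the toy panel and the result names b1, b2, adjr2 and nobs are hypothetical, while the regression variables reuse the names from the code above.

```stata
* Hypothetical toy panel: 30 firms with 19 yearly observations each.
* The data are simulated, not taken from Compustat.
clear
set seed 2024
set obs 30
gen firmid = _n
expand 19
gen fy = 3                                   // pretend every row is in subperiod 3
gen prior_year_dvpsx_f = runiform(0, 2)
gen earning = rnormal(2, 0.5)
gen deltadvpsx = 0.1 - 0.3*prior_year_dvpsx_f + 0.1*earning + rnormal(0, 0.05)

* One regression per firm and subperiod; the data in memory are replaced
* by one row per (fy, firmid) holding the saved estimates.
statsby b1=_b[prior_year_dvpsx_f] b2=_b[earning] adjr2=e(r2_a) nobs=e(N), ///
    by(fy firmid) clear: regress deltadvpsx prior_year_dvpsx_f earning

* Definitions as stated in the question: SOA = -beta1, TP = -beta2/beta1
gen soa = -b1
gen tp  = -b2/b1
list firmid soa tp adjr2 in 1/5
```

The `///` line continuation works in a do-file but not when typed interactively. On the real data, statsby would run after the sample restrictions, and `clear` replaces the dataset in memory, so saving it first (or wrapping the call in preserve/restore) is advisable.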
Code:
. reg deltadvpsx prior_year_dvpsx_f earning if fy==1

      Source |       SS           df       MS      Number of obs   =     7,210
-------------+----------------------------------   F(2, 7207)      =   1530.15
       Model |  713.729123         2  356.864562   Prob > F        =    0.0000
    Residual |  1680.83347     7,207  .233222349   R-squared       =    0.2981
-------------+----------------------------------   Adj R-squared   =    0.2979
       Total |  2394.56259     7,209  .332162934   Root MSE        =    .48293

------------------------------------------------------------------------------------
        deltadvpsx | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------------+----------------------------------------------------------------
prior_year_dvpsx_f |  -.3149017   .0058851   -53.51   0.000    -.3264382   -.3033653
           earning |   .1047265   .0025142    41.65   0.000     .0997979    .1096552
             _cons |   .1120818   .0101159    11.08   0.000     .0922517    .1319118
------------------------------------------------------------------------------------

. reg deltadvpsx prior_year_dvpsx_f earning if fy==2

      Source |       SS           df       MS      Number of obs   =    32,144
-------------+----------------------------------   F(2, 32141)     =   3662.12
       Model |  1016.22144         2  508.110722   Prob > F        =    0.0000
    Residual |  4459.48791    32,141   .13874764   R-squared       =    0.1856
-------------+----------------------------------   Adj R-squared   =    0.1855
       Total |  5475.70936    32,143  .170354645   Root MSE        =    .37249

------------------------------------------------------------------------------------
        deltadvpsx | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------------+----------------------------------------------------------------
prior_year_dvpsx_f |  -.2169661   .0028232   -76.85   0.000    -.2224997   -.2114324
           earning |   .0737496   .0010292    71.66   0.000     .0717322    .0757669
             _cons |   .0330613   .0029897    11.06   0.000     .0272013    .0389212
------------------------------------------------------------------------------------

. reg deltadvpsx prior_year_dvpsx_f earning if fy==3

      Source |       SS           df       MS      Number of obs   =    33,464
-------------+----------------------------------   F(2, 33461)     =   8296.23
       Model |  56096.8386         2  28048.4193   Prob > F        =    0.0000
    Residual |  113127.127    33,461  3.38086511   R-squared       =    0.3315
-------------+----------------------------------   Adj R-squared   =    0.3315
       Total |  169223.966    33,463  5.05704707   Root MSE        =    1.8387

------------------------------------------------------------------------------------
        deltadvpsx | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------------+----------------------------------------------------------------
prior_year_dvpsx_f |   -.658861   .0051149  -128.81   0.000    -.6688864   -.6488356
           earning |   7.94e-07   8.72e-06     0.09   0.927    -.0000163    .0000179
             _cons |   .3111938   .0103593    30.04   0.000     .2908892    .3314985
------------------------------------------------------------------------------------
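In the third regression the coefficient on earning is essentially zero, and the total mean square (5.06) and root MSE (1.84) are far larger than in the first two subperiods. One possible explanation, offered only as a guess, is that a few extreme values dominate the fit. The sketch below uses simulated data (the variables x and y and all numbers are hypothetical) to show how a single extreme value of a regressor can flatten its slope and how to inspect the tails.

```stata
* Simulated illustration; nothing here comes from the actual dataset.
clear
set seed 12345
set obs 500
gen x = rnormal(2, 1)
gen y = 0.1*x + rnormal(0, 0.2)

regress y x
scalar b_clean = _b[x]                       // slope without any outlier

* Contaminate one observation with an extreme regressor value
replace x = 100000 in 1
summarize x, detail                          // percentiles and largest values
regress y x
scalar b_contaminated = _b[x]                // slope is pulled towards zero

display "slope without outlier: " b_clean
display "slope with outlier:    " b_contaminated

* List the most extreme observations
gsort -x
list x y in 1/5
```

On the actual data, the analogous first check would be `summarize earning dvpsx_f if fy==3, detail`. Whether the paper trims or winsorizes such values is not something this sketch assumes.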
Also, what difference does it make if I add vce(robust) to my regression? I saw a big change in my F-statistic and t-statistics, but what does this change mean?
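As general background, vce(robust) leaves the coefficient estimates unchanged and replaces the classical standard errors with heteroskedasticity-robust (Huber/White sandwich) ones, so the t-statistics, F-statistic, p-values and confidence intervals change while the point estimates do not. A small sketch on Stata's bundled auto dataset (not the data from this post; the scalar and matrix names are illustrative) makes this visible:

```stata
* Uses the example dataset shipped with Stata, purely for illustration.
sysuse auto, clear

regress price weight mpg
matrix b_ols  = e(b)                         // coefficients, classical fit
scalar se_ols = _se[weight]

regress price weight mpg, vce(robust)
matrix b_rob  = e(b)                         // coefficients, robust fit
scalar se_rob = _se[weight]

* Coefficients agree; only the standard errors differ
display "relative difference in coefficients: " mreldif(b_ols, b_rob)
display "SE(weight), classical: " se_ols
display "SE(weight), robust:    " se_rob
```

A large gap between the two sets of standard errors is usually read as a sign that the error variance is not constant across observations.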
This brings me to my second question: converting my regression results into a table that corresponds to the one in the paper. How can I create a table that shows, for each of the three subperiods, the years covered and the SD, median, and 25th and 75th percentiles of SOA, TP, and adjusted R-squared, where SOA is -beta1 and TP is -beta2/beta1?
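A table of this kind could be sketched with tabstat, assuming a dataset with one row per firm and subperiod that already holds the estimated coefficients. Everything in the sketch is simulated: the variable names b1, b2 and adjr2 are hypothetical, and the subperiod labels simply mirror the cut-offs used in the code above.

```stata
* Simulated firm-level estimates: 3 subperiods with 100 hypothetical firms each.
clear
set seed 7
set obs 300
gen fy    = ceil(_n/100)
gen b1    = -runiform(0.1, 0.9)
gen b2    = runiform(0.02, 0.2)
gen adjr2 = runiform(0, 0.8)

* Definitions as stated in the question
gen soa = -b1
gen tp  = -b2/b1

label define fylbl 1 "<=1964" 2 "1965-1983" 3 "1984-2002"
label values fy fylbl

* One block per subperiod, one row per variable, statistics in columns
tabstat soa tp adjr2, by(fy) statistics(n mean median sd p25 p75) ///
    columns(statistics) longstub nototal

* The same numbers as a dataset (shown for SOA only), e.g. for export
collapse (count) n=soa (mean) soa_mean=soa (median) soa_med=soa ///
    (sd) soa_sd=soa (p25) soa_p25=soa (p75) soa_p75=soa, by(fy)
list
```

tabstat only displays the table, while collapse turns the statistics into a dataset that can be exported or formatted further.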
It would be very helpful to get answers that move me forward, so that I can finally continue with the investigation.
Greetings
Steffen