Hi,
I have aggregate time-series trading data for two distinct groups of investors (group A and group B), and I want to estimate the effect of a variable X_t on their daily trading volume VOL_t. The groups were constructed using a criterion that depends, to some extent, on a daily trading signal. The signal is represented by a dummy variable D_t that equals 1 if the signal is observed on day t and 0 otherwise. Group A investors trade on that signal relatively frequently, whereas group B investors' trading is essentially random with respect to it.
My hypothesis is that X_t has a positive effect on VOL_t only for group A investors and only on signal days (D_t = 1). It should have no effect on VOL_t for group A investors when D_t = 0, and no (or at least a smaller) effect on VOL_t for group B investors.
I am having trouble constructing a model for this.
My attempt was to fit a regression model for VOL_t of group A and a model for VOL_t of group B of the form:
VOL_t = a + b1 * D_t + b2 * X_t + b3 * (D_t * X_t) + e_t
where (D_t * X_t) is the interaction term between X_t and D_t.
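To make explicit how I read the coefficients, here is a minimal Python sketch (standard library only). Because the model includes D_t both on its own and interacted with X_t, its OLS point estimates should coincide with two separate straight-line fits, one on D_t = 0 days and one on D_t = 1 days: b2 is the slope on non-signal days and b2 + b3 is the slope on signal days. The function names and the toy numbers are hypothetical, not my actual data, and the sketch covers only the coefficients, not the robust standard errors.

```python
from statistics import mean


def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def interaction_coefficients(vol, d, x):
    """Recover a, b1, b2, b3 of VOL = a + b1*D + b2*X + b3*D*X
    from the two subsample lines (D == 0 and D == 1)."""
    x0 = [xi for xi, di in zip(x, d) if di == 0]
    y0 = [vi for vi, di in zip(vol, d) if di == 0]
    x1 = [xi for xi, di in zip(x, d) if di == 1]
    y1 = [vi for vi, di in zip(vol, d) if di == 1]
    a0, s0 = ols_line(x0, y0)  # non-signal days: intercept a, slope b2
    a1, s1 = ols_line(x1, y1)  # signal days: intercept a + b1, slope b2 + b3
    return {"a": a0, "b1": a1 - a0, "b2": s0, "b3": s1 - s0}


# Hypothetical, noise-free toy data: X has no effect when D = 0
# and a slope of 0.5 when D = 1.
x = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
d = [0, 0, 0, 0, 1, 1, 1, 1]
vol = [1.0 if di == 0 else 2.0 + 0.5 * xi for xi, di in zip(x, d)]

coef = interaction_coefficients(vol, d, x)
print(coef)
```

Under this reading, b3 is the difference between the two slopes, and the effect of X_t on signal days is b2 + b3 rather than b3 alone.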
If I estimate this model separately for group A and group B trading volume, I get the results hypothesized above.
However, I am not sure whether this is a good specification, because VOL_t is correlated with D_t by construction for group A, even though the correlation is quite low: corr(VOL_t, D_t) = 0.04 for group A and −0.01 for group B.
Results for group A investors:
Code:
             |               Robust
       VOL_A |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           D |    .776241   .0581441    13.35   0.000     .6619864    .8904956
           X |   .0012703   .0045436     0.28   0.780     -.007658    .0101986
   D_times_X |   .1556122   .0475769     3.27   0.001     .0621225    .2491019
Results for group B investors:
Code:
             |               Robust
       VOL_B |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           D |  -.1899324   .0453358    -4.19   0.000    -.2790081   -.1008567
           X |   .0048146   .0095859     0.50   0.616    -.0140198     .023649
   D_times_X |   .0717573   .0398264     1.80   0.072    -.0064935    .1500082
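For reference, the slope of VOL_t on X_t on signal days implied by each table is b2 + b3. The short Python sketch below just adds up the reported point estimates; the dictionary layout and variable names are my own. It says nothing about significance, since a standard error for b2 + b3 would also need the covariance between the two estimates, which the tables do not show.

```python
# Point estimates copied from the two regression tables above.
reported = {
    "A": {"b2": 0.0012703, "b3": 0.1556122},
    "B": {"b2": 0.0048146, "b3": 0.0717573},
}

# Implied slope of VOL on X when D = 1 is b2 + b3 for each group.
signal_day_slope = {g: c["b2"] + c["b3"] for g, c in reported.items()}

for group, slope in signal_day_slope.items():
    print(f"group {group}: slope of VOL on X when D = 1 is {slope:.4f}")
# group A: slope of VOL on X when D = 1 is 0.1569
# group B: slope of VOL on X when D = 1 is 0.0766
```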
So my question is: can I interpret a positive and significant coefficient b3 in the regression for group A as evidence that trading volume is higher when X_t is higher on days with D_t = 1? Or could this result simply be spurious because of the relation between VOL_t and D_t for group A?
If it could be spurious, how could I improve the design of the regression model?
Thanks in advance for any input!