Dear Statalisters,
I am trying to replicate a study that examines the effect of online review sentiment on product sales. I chose the Arellano-Bond estimator because my model includes an autoregressive term, but I have run into some complications and would appreciate any help.
- In "How to do xtabond2", David Roodman recommends applying the estimator to "small T, large N" panels. My panel is quite unbalanced: originally I have 3,252 observations on 721 groups, and each group can have up to 24 observations. When I run the regression, however, the output reports only 919 observations from 179 groups, with an average of 5.13 observations per group. Does my case still fit that criterion?
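One rough way to see where the observations go is to count the rows on which the dependent variable and the one-week lags of all regressors are non-missing. This is a sketch under stated assumptions: it assumes the panel is declared with id and week (the group and time variables shown in the output), the indicator name usable is hypothetical, and the estimator may discard further rows because the differenced equation and the instruments need earlier lags as well.

```stata
* Rough check of how many rows survive the one-week lags (a sketch only).
* Assumes id and week identify the panel, as in the xtabond2 output.
xtset id week
xtdescribe

* usable is a hypothetical helper: 1 if y and every lagged regressor exist.
* After xtset, L.x is missing whenever week t-1 is absent for that id,
* so gaps in the weekly series also remove rows.
generate byte usable = !missing(ln_sales_rank, L.ln_sales_rank, ///
    L.positive_reviews, L.negative_reviews, L.neutral_reviews, ///
    L.price_week, L.num_reviews_week, L.average_review_length, ///
    L.average_rating, L.variance, L.verified_purchase_reviewer_ratio, ///
    L.real_name_reviewer_ratio)

* Rows and panel pattern that remain once the lags are required
count if usable
xtdescribe if usable
```

Comparing this count with the 919 observations reported by xtabond2 gives a rough idea of how much of the loss comes from missing weeks rather than from the estimator's own lag requirements.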
positive_reviews, negative_reviews, and neutral_reviews are the independent variables of interest; the others are controls. I want to lag all right-hand-side variables by one period to see how review characteristics affect sales in the next period. That said, I am not sure about my specification. For instance, I know that l.ln_sales_rank, the lagged value of the dependent variable, should go inside the gmm() option, since it is a predetermined regressor: it is correlated with the error term at t-1. I am less sure about the other regressors. Ignoring the time lag, many of them could be endogenous; average_rating, for example, may be driven by some omitted variable (not captured even by the fixed effects and time dummies) that also affects sales. Since the regressors enter with a lag, should I treat them as predetermined, or is it safe to treat them as exogenous? I am also concerned about the number of instruments my model already has, which would grow further if I moved more variables into gmmstyle.
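For concreteness, the predetermined alternative described above would move some lagged regressors out of iv() and into their own gmm() group. The sketch below does this for the three sentiment variables and average_rating purely as an illustration. The choice of variables and the lag(1 3) window are assumptions, not recommendations (given the AR(2) result, deeper starting lags may be needed, as was done for the lagged dependent variable), and collapse is used so the instrument count does not balloon.

```stata
* Sketch: treat selected lagged regressors as predetermined (GMM-style)
* instead of strictly exogenous (iv-style). Variable choice and lag limits
* are illustrative assumptions only.
xi: xtabond2 ln_sales_rank ///
    l.(ln_sales_rank positive_reviews negative_reviews neutral_reviews ///
       price_week num_reviews_week average_review_length ///
       average_rating variance verified_purchase_reviewer_ratio ///
       real_name_reviewer_ratio) ///
    i.week, ///
    gmm(l.ln_sales_rank, lag(3 7)) ///
    gmm(l.positive_reviews l.negative_reviews l.neutral_reviews ///
        l.average_rating, lag(1 3) collapse) ///
    iv(l.price_week l.num_reviews_week l.variance l.average_review_length ///
       l.verified_purchase_reviewer_ratio l.real_name_reviewer_ratio i.week) ///
    twostep robust artests(3)

* Instrument count versus number of groups after the change
display "instruments = " e(j) ", groups = " e(N_g)
```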
I also considered using the collapse sub-option to reduce the number of instruments, but then some regressors lose significance: num_reviews_week and average_review_length, the only significant ones in the output below, become statistically insignificant.
Code:
xi: xtabond2 ln_sales_rank ///
    l.(ln_sales_rank positive_reviews negative_reviews neutral_reviews ///
       price_week num_reviews_week average_review_length ///
       average_rating variance verified_purchase_reviewer_ratio ///
       real_name_reviewer_ratio) ///
    i.week, ///
    gmm(l.ln_sales_rank, lag(3 7)) ///
    iv(l.positive_reviews l.negative_reviews l.neutral_reviews l.price_week ///
       l.num_reviews_week l.average_rating l.variance l.average_review_length ///
       l.verified_purchase_reviewer_ratio l.real_name_reviewer_ratio i.week) ///
    twostep robust artests(3)
Running the regression, I get this output:
Code:
Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =       919
Time variable : week                            Number of groups   =       179
Number of instruments = 143                     Obs per group: min =         1
Wald chi2(33)  =  33112.98                                     avg =      5.13
Prob > chi2    =     0.000                                     max =        23
--------------------------------------------------------------------------------------------------
                                 |              Corrected
                   ln_sales_rank |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------------------+----------------------------------------------------------------
                   ln_sales_rank |
                             L1. |   .8100065   .0779161    10.40   0.000     .6572938    .9627192
                                 |
                positive_reviews |
                             L1. |   .0019072   .0017885     1.07   0.286    -.0015982    .0054127
                                 |
                negative_reviews |
                             L1. |  -.0020933   .0053439    -0.39   0.695    -.0125671    .0083805
                                 |
                 neutral_reviews |
                             L1. |   -.003801    .007102    -0.54   0.593    -.0177206    .0101186
                                 |
                      price_week |
                             L1. |  -.0003462    .000278    -1.25   0.213     -.000891    .0001987
                                 |
                num_reviews_week |
                             L1. |   -.000059   .0000237    -2.48   0.013    -.0001055   -.0000125
                                 |
           average_review_length |
                             L1. |  -.0000695   .0000349    -1.99   0.046    -.0001379   -1.09e-06
                                 |
                  average_rating |
                             L1. |  -.0610533   .0324566    -1.88   0.060     -.124667    .0025604
                                 |
                        variance |
                             L1. |   .0061693   .0192085     0.32   0.748    -.0314786    .0438173
                                 |
verified_purchase_reviewer_ratio |
                             L1. |  -.1352071   .1256495    -1.08   0.282    -.3814757    .1110614
                                 |
        real_name_reviewer_ratio |
                             L1. |   .0627493   .1463514     0.43   0.668    -.2240942    .3495928
                                 |
                        _Iweek_2 |   .1604324   .1593392     1.01   0.314    -.1518666    .4727315
                        _Iweek_3 |  -.2233768   .1548639    -1.44   0.149    -.5269045     .080151
                        _Iweek_4 |  -.0554688   .1272416    -0.44   0.663    -.3048577    .1939201
                        _Iweek_5 |  -.0671169   .1248162    -0.54   0.591    -.3117522    .1775183
                        _Iweek_6 |  -.0353557   .1631954    -0.22   0.828    -.3552127    .2845014
                        _Iweek_7 |  -.0212126   .1426943    -0.15   0.882    -.3008883    .2584632
                        _Iweek_8 |  -.1200616   .1199549    -1.00   0.317    -.3551689    .1150457
                       _Iweek_10 |  -.2037809   .1445179    -1.41   0.159    -.4870308    .0794689
                       _Iweek_11 |  -.0786615   .1284169    -0.61   0.540    -.3303539     .173031
                       _Iweek_12 |  -.1797863   .1432378    -1.26   0.209    -.4605272    .1009546
                       _Iweek_13 |  -.1344445   .1179098    -1.14   0.254    -.3655435    .0966545
                       _Iweek_14 |  -.2300001   .1329529    -1.73   0.084    -.4905831    .0305828
                       _Iweek_15 |  -.2334663   .1317367    -1.77   0.076    -.4916656     .024733
                       _Iweek_16 |  -.1267967   .1259974    -1.01   0.314     -.373747    .1201537
                       _Iweek_17 |  -.0585595   .1324213    -0.44   0.658    -.3181004    .2009815
                       _Iweek_18 |  -.0994895   .1148128    -0.87   0.386    -.3245184    .1255395
                       _Iweek_19 |   -.093771   .1442427    -0.65   0.516    -.3764814    .1889394
                       _Iweek_20 |  -.2194562    .128235    -1.71   0.087    -.4707921    .0318798
                       _Iweek_21 |  -.0435126   .1226647    -0.35   0.723     -.283931    .1969059
                       _Iweek_22 |  -.0810823   .1454678    -0.56   0.577    -.3661941    .2040294
                       _Iweek_23 |  -.1068008   .1159472    -0.92   0.357    -.3340532    .1204516
                       _Iweek_24 |   -.248748    .129242    -1.92   0.054    -.5020577    .0045616
                           _cons |   1.468944    .480055     3.06   0.002     .5280532    2.409834
--------------------------------------------------------------------------------------------------
Instruments for first differences equation
  Standard
    D.(L.positive_reviews L.negative_reviews L.neutral_reviews L.price_week
    L.num_reviews_week L.average_rating L.variance L.average_review_length
    L.verified_purchase_reviewer_ratio L.real_name_reviewer_ratio _Iweek_2
    _Iweek_3 _Iweek_4 _Iweek_5 _Iweek_6 _Iweek_7 _Iweek_8 _Iweek_9 _Iweek_10
    _Iweek_11 _Iweek_12 _Iweek_13 _Iweek_14 _Iweek_15 _Iweek_16 _Iweek_17
    _Iweek_18 _Iweek_19 _Iweek_20 _Iweek_21 _Iweek_22 _Iweek_23 _Iweek_24)
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(3/7).L.ln_sales_rank
Instruments for levels equation
  Standard
    L.positive_reviews L.negative_reviews L.neutral_reviews L.price_week
    L.num_reviews_week L.average_rating L.variance L.average_review_length
    L.verified_purchase_reviewer_ratio L.real_name_reviewer_ratio _Iweek_2
    _Iweek_3 _Iweek_4 _Iweek_5 _Iweek_6 _Iweek_7 _Iweek_8 _Iweek_9 _Iweek_10
    _Iweek_11 _Iweek_12 _Iweek_13 _Iweek_14 _Iweek_15 _Iweek_16 _Iweek_17
    _Iweek_18 _Iweek_19 _Iweek_20 _Iweek_21 _Iweek_22 _Iweek_23 _Iweek_24
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    DL2.L.ln_sales_rank
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -3.65  Pr > z =  0.000
Arellano-Bond test for AR(2) in first differences: z =   2.45  Pr > z =  0.014
Arellano-Bond test for AR(3) in first differences: z =  -0.54  Pr > z =  0.589
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(109)  = 115.52  Prob > chi2 =  0.316
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(109)  =  71.92  Prob > chi2 =  0.998
  (Robust, but weakened by many instruments.)

Difference-in-Hansen tests of exogeneity of instrument subsets:
  GMM instruments for levels
    Hansen test excluding group:     chi2(89)   =  68.23  Prob > chi2 =  0.950
    Difference (null H = exogenous): chi2(20)   =   3.69  Prob > chi2 =  1.000
  iv(L.positive_reviews L.negative_reviews L.neutral_reviews L.price_week
    L.num_reviews_week L.average_rating L.variance L.average_review_length
    L.verified_purchase_reviewer_ratio L.real_name_reviewer_ratio _Iweek_2
    _Iweek_3 _Iweek_4 _Iweek_5 _Iweek_6 _Iweek_7 _Iweek_8 _Iweek_9 _Iweek_10
    _Iweek_11 _Iweek_12 _Iweek_13 _Iweek_14 _Iweek_15 _Iweek_16 _Iweek_17
    _Iweek_18 _Iweek_19 _Iweek_20 _Iweek_21 _Iweek_22 _Iweek_23 _Iweek_24)
    Hansen test excluding group:     chi2(77)   =  38.13  Prob > chi2 =  1.000
    Difference (null H = exogenous): chi2(32)   =  33.79  Prob > chi2 =  0.381
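To see how the results above depend on the instrument set and on the two-step weighting, the same model can be re-estimated with the collapse sub-option and as one-step robust GMM, then tabulated side by side. This is a sketch: the stored-estimate names full, collapsed and onestep and the locals rhs and ivs are hypothetical, and it assumes xtabond2 leaves its instrument count in e(j).

```stata
* Sketch: compare the two-step specification with a collapsed-instrument
* version and a one-step robust version. Stored names are hypothetical.
local rhs l.(ln_sales_rank positive_reviews negative_reviews ///
    neutral_reviews price_week num_reviews_week average_review_length ///
    average_rating variance verified_purchase_reviewer_ratio ///
    real_name_reviewer_ratio) i.week
local ivs l.positive_reviews l.negative_reviews l.neutral_reviews ///
    l.price_week l.num_reviews_week l.average_rating l.variance ///
    l.average_review_length l.verified_purchase_reviewer_ratio ///
    l.real_name_reviewer_ratio i.week

* 1) Two-step, uncollapsed (the command in the question)
xi: xtabond2 ln_sales_rank `rhs', gmm(l.ln_sales_rank, lag(3 7)) ///
    iv(`ivs') twostep robust artests(3)
estimates store full

* 2) Two-step, collapsed GMM instruments for the lagged dependent variable
xi: xtabond2 ln_sales_rank `rhs', gmm(l.ln_sales_rank, lag(3 7) collapse) ///
    iv(`ivs') twostep robust artests(3)
estimates store collapsed

* 3) One-step robust (twostep omitted), same collapsed instrument set
xi: xtabond2 ln_sales_rank `rhs', gmm(l.ln_sales_rank, lag(3 7) collapse) ///
    iv(`ivs') robust artests(3)
estimates store onestep

* Coefficients, standard errors, sample size and instrument count
estimates table full collapsed onestep, b(%9.4f) se stats(N j)
```

Keeping the iv() list fixed across the three runs means any change in significance can be attributed to the GMM instrument set or the weighting step alone.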
- Also, is it appropriate to use two-step GMM with this number of observations?
- My last question concerns the xtdpd command. Since the null hypothesis of no second-order autocorrelation is rejected (AR(2) test, Pr > z = 0.014), would it be better to use xtdpd instead of xtabond2 to preserve observations, given that xtdpd allows for autocorrelation?
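For the xtdpd question, a rough translation of the xtabond2 command might look like the sketch below. The instrument mapping is an assumption: L(3/7).L.ln_sales_rank in the output corresponds to lags 4 to 8 of ln_sales_rank in dgmmiv(), and DL2.L.ln_sales_rank to the third lag of its first difference in lgmmiv(). It has not been checked to reproduce the xtabond2 estimates; the point is only to make the two commands comparable on e(N) and the coefficients.

```stata
* Sketch: approximate xtdpd counterpart of the xtabond2 model (unverified).
xtset id week
xi: xtdpd L(0/1).ln_sales_rank ///
    L.(positive_reviews negative_reviews neutral_reviews price_week ///
       num_reviews_week average_review_length average_rating variance ///
       verified_purchase_reviewer_ratio real_name_reviewer_ratio) i.week, ///
    dgmmiv(ln_sales_rank, lagrange(4 8)) /// difference eq.: lags 4-8 of y
    lgmmiv(ln_sales_rank, lag(3)) ///        level eq.: 3rd lag of D.y
    iv(L.(positive_reviews negative_reviews neutral_reviews price_week ///
       num_reviews_week average_review_length average_rating variance ///
       verified_purchase_reviewer_ratio real_name_reviewer_ratio) i.week) ///
    twostep vce(robust)

* Sample size, to compare with the 919 observations from xtabond2
display "observations used: " e(N)

* Arellano-Bond serial-correlation tests up to order 3
estat abond, artests(3)
```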
Thanks for your time,
Mohammadreza