I'm using Stata 17 and, based on a post about testing for serial correlation in panel data after differencing, I think I've discovered an important bug in how the differencing operator is handled. First, I know that factor notation is not allowed with differencing. Can someone from Stata explain why? There is no reason to exclude it, and I suspect this is partly the source of the misunderstanding some people have about whether it is okay to difference dummy variables in an equation. (Answer: Yes, because differencing with panel data is often done for estimating an equation that starts in levels.) It would be a big improvement in Stata 18 to simply difference anything that appears in D.(), whether it is an interaction of continuous variables, discrete variables, or combinations. And something like i.year should be allowed, too.
But not allowing factor notation is not the same as a bug. A real bug is that Stata drops interaction terms among continuous variables when using differencing if one of the variables doesn't change across time. Here's my Stata output, using airfare.dta that comes with my MIT Press book:
Note that fixed effects has no trouble with c.concen#c.ldist_dm, but differencing drops this term. The mistake stems from redefining the difference of the interaction as the interaction of the differences. What should appear is the interaction between D.concen and ldist_dm, but Stata changes it to cD.concen#cD.ldist_dm. Why is Stata doing this? The variable ldist_dm doesn't change across time but concen does, so their interaction varies over time and I can easily include it in the levels equation. I know how to fix this by using D.() differently, but it shouldn't need "fixing" because there's nothing wrong with the differencing command that I did use. Stata should not be changing my model. For the same reason, Stata should allow things like i.x1#c.x2 and simply difference the interaction itself, rather than differencing each variable and then forming the interaction. It shouldn't matter whether only one of x1 and x2 changes across time.
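To spell out the algebra behind this, here is a short sketch. The notation is mine, not Stata's: $d_i$ stands for ldist_dm (constant over time within route $i$), $\theta_t$ for the year effects, and $a_i$ for the unobserved effect.

```latex
% Levels equation with the interaction (d_i = ldist_dm, time-constant):
\[
  \mathit{lfare}_{it} = \theta_t + \beta_1\,\mathit{concen}_{it}
      + \beta_2\,\mathit{concen}_{it}\, d_i + a_i + u_{it}
\]
% Differencing the interaction term itself gives a regressor that varies:
\[
  \Delta(\mathit{concen}_{it}\, d_i)
      = \mathit{concen}_{it}\, d_i - \mathit{concen}_{i,t-1}\, d_i
      = d_i\,\Delta\mathit{concen}_{it}
\]
% The interaction of the differences is identically zero, since \Delta d_i = 0:
\[
  \Delta\mathit{concen}_{it}\cdot\Delta d_i = \Delta\mathit{concen}_{it}\cdot 0 = 0
\]
```

So the differenced equation should contain $d_i\,\Delta\mathit{concen}_{it}$, which is not collinear with anything, whereas the term Stata constructs is a column of zeros and is necessarily omitted.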
Code:
. sum ldist

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       ldist |      4,596    6.696482    .6593177   4.553877   7.909857

. gen ldist_dm = ldist - r(mean)

. xtreg lfare concen c.concen#c.ldist_dm y98 y99 y00, fe vce(cluster id)

Fixed-effects (within) regression               Number of obs     =      4,596
Group variable: id                              Number of groups  =      1,149

R-squared:                                      Obs per group:
     Within  = 0.1429                                         min =          4
     Between = 0.3048                                         avg =        4.0
     Overall = 0.2411                                         max =          4

                                                F(5,1148)         =     104.09
corr(u_i, Xb) = -0.6841                         Prob > F          =     0.0000

                                       (Std. err. adjusted for 1,149 clusters in id)
-------------------------------------------------------------------------------------
                    |               Robust
              lfare | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
--------------------+----------------------------------------------------------------
             concen |   .1661329   .0484029     3.43   0.001     .0711647     .261101
                    |
c.concen#c.ldist_dm |  -.2498619   .0828545    -3.02   0.003    -.4124252   -.0872987
                    |
                y98 |   .0230874   .0041459     5.57   0.000      .014953    .0312218
                y99 |   .0355923   .0051452     6.92   0.000     .0254972    .0456874
                y00 |   .0975745   .0054655    17.85   0.000     .0868511    .1082979
              _cons |    4.93797   .0317998   155.28   0.000     4.875578    5.000362
--------------------+----------------------------------------------------------------
            sigma_u |  .50598297
            sigma_e |  .10605257
                rho |  .95791776   (fraction of variance due to u_i)
-------------------------------------------------------------------------------------

. reg D.(lfare concen c.concen#c.ldist_dm y98 y99 y00), nocons vce(cluster id)
note: cD.concen#cD.ldist_dm omitted because of collinearity.

Linear regression                               Number of obs     =      3,447
                                                F(4, 1148)        =     118.18
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0952
                                                Root MSE          =     .12508

                                         (Std. err. adjusted for 1,149 clusters in id)
---------------------------------------------------------------------------------------
                      |               Robust
              D.lfare | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
----------------------+----------------------------------------------------------------
               concen |
                  D1. |   .1759764   .0430367     4.09   0.000     .0915371    .2604158
                      |
cD.concen#cD.ldist_dm |          0  (omitted)
                      |
                  y98 |
                  D1. |   .0227692   .0041573     5.48   0.000     .0146124     .030926
                      |
                  y99 |
                  D1. |   .0364365    .005153     7.07   0.000      .026326    .0465469
                      |
                  y00 |
                  D1. |   .0978497   .0055468    17.64   0.000     .0869666    .1087328
---------------------------------------------------------------------------------------
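For anyone who hits the same omitted term, one possible workaround is sketched below. It is not necessarily the only way to do it: build the product in levels as an ordinary variable, then let D.() difference it like anything else. The sketch assumes the data are still xtset as in the runs above and that ldist_dm has already been generated; the name concen_ldist is made up for illustration.

```stata
* Assumes airfare.dta is in memory, xtset as for the xtreg run above,
* and ldist_dm already exists. concen_ldist is a hypothetical variable name.
gen concen_ldist = concen*ldist_dm

* D.() now differences the product itself. Because ldist_dm is constant
* within id, D.concen_ldist equals ldist_dm times D.concen.
reg D.(lfare concen concen_ldist y98 y99 y00), nocons vce(cluster id)
```

Because the interaction is now a plain variable, Stata has no factor-variable term to rewrite, so the differenced regressor matches the one implied by the levels equation.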