  • Regression gives different results depending on the order of the independent variables

    I am running what is essentially a difference-in-differences regression on a large dataset with a lot of fixed effects. Bizarrely (maddeningly, even), I get a slightly different coefficient on my main treatment effect depending on the order in which I provide the list of independent variables. The regression has individual-level data, with county fixed effects, month fixed effects, and state-specific trends. The variable "treated" is equal to 1 if a state-level policy has gone into effect in the person's state as of the current month. There is a separate set of trends for New York City, as NYC implemented its own policy.

    Here's my code and output:

    Code:
    . local conditions if month<=695 & state_cd <= 56 & birthyear_last!=.
    
    . qui xtreg success_14 treated i.month  i.state#c.open_dt 1.NYC#c.open_dt i.state#c.open_dt#c.open_dt 1.NYC#c.open_dt#c.open_dt i.state#c.open_dt#c.open_dt#c.open_dt 1.NYC#c.open_dt#c.open_dt#c.open_dt sc RACE Hispanic EDUCATION Median_income i.bankrank birthyear_last `conditions' , fe vce(cluster state_cd)
    
    . disp _b[treated]
    .0064373
    
    . qui xtreg success_14 treated  sc RACE Hispanic EDUCATION Median_income i.bankrank i.month  i.state#c.open_dt 1.NYC#c.open_dt i.state#c.open_dt#c.open_dt 1.NYC#c.open_dt#c.open_dt i.state#c.open_dt#c.open_dt#c.open_dt 1.NYC#c.open_dt#c.open_dt#c.open_dt  birthyear_last `conditions' , fe vce(cluster state_cd)
    
    . disp _b[treated]
    .00803661
    I realize those are long regression commands, but if you look closely you'll see that they both have the same list of variables, just in a different order. Both versions drop a few factor levels of i.month for collinearity, but they both drop the same ones. I get the same result using areg instead of xtreg. The regressions each take more than an hour to run, so trying different variations is cumbersome. The problem doesn't replicate if I use a random 0.5% subsample of my data. I'm running out of ideas here--anyone know what's going on? Really I just want to know which version is more likely to be the "right" coefficient.

    I'm running Stata/MP 15.1 on a Linux server with Red Hat 6.

  • #2
    My wild guess is that the local macro is being seen in one instance but not the other. Please confirm that the number of observations used is exactly the same.
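    For instance, a minimal check to run right after each regression:
    Code:
    * confirm both orderings use the identical estimation sample
    display e(N)
    count if e(sample)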

    • #3
      Originally posted by Nick Cox
      My wild guess is that the local macro is being seen in one instance but not the other. Please confirm that the number of observations used is exactly the same.
      I did check that--I forgot to add it to the initial post. Same overall N, and the same result for
      Code:
      tab treated if e(sample)
      as well.

      • #4
        Hi Ryan,
        Perhaps the problem lies in which variables are omitted first. My guess is that many of the coefficients differ between the two orderings, but that overall the model has the same goodness-of-fit statistics (F stat and R2's).
        So, are the same variables omitted in both models? You could store both sets of estimates and compare them side by side, as in the sketch below.
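        A minimal sketch, assuming each version is stored immediately after it runs:
        Code:
        * after running the first ordering:
        estimates store order1
        * after running the second ordering:
        estimates store order2
        * compare the treatment coefficients side by side
        estimates table order1 order2, keep(treated) b(%12.8f)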
        Fernando

        • #5
          Hi Fernando,

          Your guess is partially correct--I was trying to reduce the number of things to look at in my question, but many of the coefficients are slightly different between the two versions. However, it does look like the goodness-of-fit measures are slightly different (I hadn't paid much attention to that before). xtreg reports an R2 for within, between, and overall; while the within and overall R2 are the same, the version that gives me ~0.008 for the treatment effect has a between R2 of 0.0261, and the version that gives ~0.0064 has a between R2 of 0.0356.

          F stats are reported as missing in both--not sure if that's indicative of a problem, or just a sign that the time fixed effects and trends are collectively strong predictors.

          • #6
            Before I can say more about your problem, do you mind showing the results you get from the two regressions you describe above? That way, it will be somewhat easier to spot anything odd.

            • #7
              Sure--I've attached the full output from each as a .tex file, since it is quite long.

              • #8
                Hi Ryan,
                After looking at your results, three thoughts come to mind.
                1. I think you are over-fitting your model. While I understand why you want to include so many dummies and interactions, they may add more noise than signal to your model.
                2. You have too few clusters. Admittedly, there is some debate about when to cluster. But because you are clustering, the number of clusters determines the degrees of freedom in your model. You have a couple hundred explanatory variables and only 50 clusters--and that is without counting the fixed effects, which also reduce your effective degrees of freedom. That is why the F-statistic is not showing up.
                3. I saw that many of your variables have very small coefficients, like -1e-10. That makes me think that what you are experiencing is a precision problem. If you convert ALL your variables to double precision, you may solve it; see the sketch below.
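                A minimal sketch of one way to do that, assuming the affected variables are currently stored as float:
                Code:
                * promote every float variable to double
                ds, has(type float)
                foreach v of varlist `r(varlist)' {
                    recast double `v'
                }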
                HTH
                Fernando

                • #9
                  Hi Fernando,

                  Thanks for taking a look, but your diagnosis doesn't sound right to me. As I noted in the original post, the problem doesn't replicate when I use a subsample of my data, which doesn't fit with either overfitting or precision being at issue. Also, doesn't Stata automatically convert everything to double precision when it does internal calculations? Or is that just for collapse and similar commands?

                  On clusters, I'm pretty sure this is correct (and in any case, that would only affect standard errors). My treatment is assigned at the state level, and this is a case where even Abadie, Athey, Imbens, and Wooldridge would say that clustering is correct and appropriate. Generally speaking, 50 clusters is plenty to avoid SEs that are too small, per Bertrand et al. and Cameron et al.

                  • #10
                    Ah, yes, on the precision point, see here, point 1.5 (https://blog.stata.com/2012/04/02/th...-to-precision/)

                    • #11
                      Do you get the same results if you run the exact same model multiple times? I have sometimes had issues where I get slightly different results when rerunning a do-file with a complicated model. Usually this can be "solved" by putting a sort command right before the regression; the sort needs to be on a variable or combination of variables that uniquely identifies each observation (see the sketch below). However, it's a sign of a model that is having computational issues. In that case neither coefficient is really the right one, and you're better off figuring out what's causing the instability in your estimates.
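                      A minimal sketch, where person_id is a hypothetical identifier that, together with month, uniquely identifies observations:
                      Code:
                      * verify the sort key is unique, then impose a stable sort order
                      isid person_id month
                      sort person_id month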

                      • #12
                        You can probably get some improvement by using double instead of float, as mentioned in #8 above, especially because you have quadratic terms in your interactions. For example, there are differences in the ANOVA table for the small dataset posted in this Statalist thread recently just from switching the response variable from the default float to double (output below, do-file attached). And that dataset has fewer than 100 observations, and that model has no continuous-variable quadratic interaction terms. (As an aside, for that reason, despite the official pronouncements on the precision needs of most data, I suspect that most Stata users would benefit from
                        Code:
                        set type double, permanently
                        immediately after installation of the software, and be done with it. I don't fault StataCorp for the design decisions it made in 1985 in consideration of the hardware constraints that prevailed then, but I do recommend that Stata 16 change the default data storage type to something more modern, if not eliminate float altogether.)

                        You could also get some benefit by rescaling your predictors so that they are all of order 1, or so.
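                        For example, a sketch using open_dt, one candidate given that it is a date with large values:
                        Code:
                        * standardize a large-valued predictor so it is of order 1
                        summarize open_dt
                        generate double open_dt_std = (open_dt - r(mean)) / r(sd)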

                        But I think that your problem reflects underflow in the accumulations involved in computing over the nearly four million observations, and arises from the pathological model that you're trying to fit to the dataset. You have a model that warns corr(u_i, Xb) = -1.0000 and has a rho of 1.0000. You have a model with more than 100 regression coefficients in the neighborhood of 1e-6 to 1e-12, nearly all of which are highly "statistically significant". I'm not sure whether this reflects the scaling problem or is just noise, but if you still get these kinds of magnitudes after rescaling (or if your predictors are already scaled), I recommend simplifying your model until you get more down-to-earth diagnostics in the regression table's header and footer.

                        Code:
                        . quietly input str8 id str19 date byte(site day run) float t3

                        . anova t3 site / day|site / run|day|site

                                                 Number of obs =         96    R-squared     =  0.7390
                                                 Root MSE      =    .035074    Adj R-squared =  0.4835

                                          Source | Partial SS         df         MS        F    Prob>F
                                    -------------+----------------------------------------------------
                                           Model |  .16721602         47   .00355779      2.89  0.0002
                                                 |
                                            site |  .06881893          2   .03440947      8.99  0.0015
                                        day|site |  .08037207         21   .00382724
                                    -------------+----------------------------------------------------
                                        day|site |  .08037207         21   .00382724      5.10  0.0001
                                    run|day|site |  .01802502         24   .00075104
                                    -------------+----------------------------------------------------
                                                 |
                                        Residual |  .05905011         48   .00123021
                                    -------------+----------------------------------------------------
                                           Total |  .22626613         95   .00238175

                        . drop _all

                        . quietly input str8 id str19 date byte(site day run) double t3

                        . anova t3 site / day|site / run|day|site

                                                 Number of obs =         96    R-squared     =  0.7390
                                                 Root MSE      =    .035074    Adj R-squared =  0.4835

                                          Source | Partial SS         df         MS        F    Prob>F
                                    -------------+----------------------------------------------------
                                           Model |  .16721563         47   .00355778      2.89  0.0002
                                                 |
                                            site |  .06881875          2   .03440938      8.99  0.0015
                                        day|site |  .08037188         21   .00382723
                                    -------------+----------------------------------------------------
                                        day|site |  .08037188         21   .00382723      5.10  0.0001
                                    run|day|site |    .018025         24   .00075104
                                    -------------+----------------------------------------------------
                                                 |
                                        Residual |      .05905         48   .00123021
                                    -------------+----------------------------------------------------
                                           Total |  .22626563         95   .00238174

                        • #13
                          One more thing to consider: you might want to center the variables that are involved in the quadratic interaction terms. With a long series such as you seem to have, you could be running into near collinearity between the main effect and the corresponding quadratic term, which will degrade precision in the intermediate computations as the condition number worsens. A sketch follows.
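                          A minimal sketch, using open_dt from the model above:
                          Code:
                          * center the trend variable, then build the polynomial terms from it
                          summarize open_dt, meanonly
                          generate double open_dt_c  = open_dt - r(mean)
                          generate double open_dt_c2 = open_dt_c^2
                          generate double open_dt_c3 = open_dt_c^3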

                          • #14
                            Sarah, Joseph, thanks for your thoughts. I think there's something to both of your suggestions--some new facts from things I left running overnight:
                            • The coefficients from xtreg do change very slightly when the exact same model is run more than once in a row. The differences are much smaller than the difference between the two variable orders (they show up in the 5th significant digit rather than the 1st), but clearly there's some degree of instability here. It's not clear whether this is precision per se--the most likely culprit, open_dt, is stored as an int, not a float (it's a date). Worth trying, perhaps, but see the next point.
                            • I tried running the regressions using the user-written reghdfe, and also saved my data, ported it over to R, and ran the regression using the felm() function from the lfe package. Both variable orders in both programs give a coefficient of .00883879, with no variation across multiple runs of reghdfe (at least to as many digits as Stata displayed). A sketch of such a reghdfe call appears after this list.
                            So it seems like a coefficient closer to 0.008 is more "right" here (and, perhaps, exactly 0.0088 is "right"). I'd still like to get a better idea of what's causing this. The fact that reghdfe doesn't have consistency problems is hard to square with a precision or underflow problem (of course, I may also have a precision and underflow problem). The issue with just using reghdfe is that it won't allow more variables than clusters, so I either need to figure out a way to get the same coefficient from xtreg/areg, or convince myself that clustering by, say, county is adequate (pretty sure it's not, but...).
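                            For concreteness, a hypothetical sketch of what such a reghdfe call might look like for the model in #1 -- the precomputed trend terms, the county panel variable, and the exact absorb() split are assumptions, not the code actually run:
                            Code:
                            * precompute polynomial trend terms as doubles
                            generate double open_dt2 = open_dt^2
                            generate double open_dt3 = open_dt^3

                            * absorb the fixed effects and state trends; keep the NYC trends as regressors
                            reghdfe success_14 treated sc RACE Hispanic EDUCATION Median_income birthyear_last ///
                                1.NYC#c.open_dt 1.NYC#c.open_dt2 1.NYC#c.open_dt3 `conditions', ///
                                absorb(county i.month i.bankrank i.state#c.open_dt i.state#c.open_dt2 i.state#c.open_dt3) ///
                                vce(cluster state_cd)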

                            • #15
                              One more tidbit. With the insight that something seems to be unstable, possibly due to collinearity, I tried dropping the NYC-specific trend, on the theory that it was potentially overkill (since there's already a NY state trend)--and now the treatment coefficient is consistent regardless of the variable order with xtreg. reghdfe gives a very slightly different treatment coefficient, but they are identical out to 5 significant digits and I'm comfortable chalking that up to differences in the underlying algorithms.

                              I can only guess that this issue has something to do with the internal test for multicollinearity, but I remain puzzled.
