I have a huge financial dataset (a panel) from an order book. The dataset contains three variables, and there is a theory about how these variables should behave. For simplicity, let me call them var1, var2 and var3. The theory says

Code:
var1 - var2 + var3 = const

but the theory is silent about what the constant is. Now, I performed two sets of commands; the results differ, and I do not understand either of them.

First. Since I have a theory, I generated

Code:
gen res = var1 - var2 + var3

and plotted the resulting variable (in fact I subtracted the mean, but that is not important at the moment):

[histogram of the demeaned res; image not shown]

Now, this looks like a very good normal distribution, and it still does if I increase the bin width of the histogram. Nevertheless, a formal Kolmogorov–Smirnov test rejects normality, but this is due to the fact that I have about 16 million observations.

Second. On the other hand, if I run a regression

Code:
regress var1 var2 var3

I get into trouble. The result is

Code:
        var1 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        var2 |  -.0483928   .0001229  -393.61   0.000    -.0486337   -.0481518
        var3 |   .5579093   .0002678  2083.23   0.000     .5573844    .5584342
       _cons |  -5.389484   .0013784 -3909.99   0.000    -5.392186   -5.386783

which yields a different relation than the one from the theory above. Performing the usual post-regression diagnostics shows that homoskedasticity does not seem to hold. Instead of a formal test, I plotted the residuals again, this time as

Code:
gen res2 = var1 + 0.0483928*var2 - 0.55790*var3

and obtained a picture which is definitely not normal:

[histogram of res2; image not shown]

In my opinion, my first approach is sufficient because I get a convincing result. My coauthor says that the literature follows the second path and will not accept my first approach. Is there anybody who understands what I am talking about and what is wrong here?
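On the Kolmogorov–Smirnov point: with millions of observations, the test rejects departures from normality far too small to be visible in a histogram. A minimal sketch in Python (simulated data, not the original Stata workflow; the contamination level is an arbitrary choice of mine):

```python
# With n in the millions, a Kolmogorov-Smirnov test rejects normality
# for deviations that a histogram cannot show. Illustrated on a sample
# that is 95% N(0,1) and 5% N(0,2) -- visually still bell-shaped.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 1_000_000  # large, in the spirit of the ~16 million observations

x = np.where(rng.random(n) < 0.95,
             rng.normal(0.0, 1.0, n),
             rng.normal(0.0, 2.0, n))

# Standardize and test against the standard normal
z = (x - x.mean()) / x.std()
stat, pvalue = stats.kstest(z, "norm")
print(f"KS statistic = {stat:.4f}, p-value = {pvalue:.1e}")
```

At this sample size the test detects departures far below practical relevance, which is the usual argument for judging approximate normality from plots rather than from formal tests.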
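On the regression point: the theory var1 - var2 + var3 = const implies coefficients of +1 on var2 and -1 on var3 in a regression of var1 on the other two, so the disagreement can be tested formally rather than read off a histogram. A minimal numpy sketch on simulated data where the theory holds by construction (the simulation and noise level are my assumptions, not the actual dataset):

```python
# Simulate data satisfying var1 = const + var2 - var3 + noise, run OLS,
# and test the theoretical restriction H0: b_var2 = 1, b_var3 = -1
# with a Wald/F test.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
var2 = rng.normal(0.0, 1.0, n)
var3 = rng.normal(0.0, 1.0, n)
const = 2.0                              # the theory is silent about this value
var1 = const + var2 - var3 + rng.normal(0.0, 0.1, n)

# OLS of var1 on [1, var2, var3]
X = np.column_stack([np.ones(n), var2, var3])
b, *_ = np.linalg.lstsq(X, var1, rcond=None)

# F test of R b = q  (rows encode b_var2 = 1 and b_var3 = -1)
R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
q = np.array([1.0, -1.0])
resid = var1 - X @ b
s2 = resid @ resid / (n - X.shape[1])    # residual variance estimate
V = s2 * np.linalg.inv(X.T @ X)          # covariance matrix of b
diff = R @ b - q
F = diff @ np.linalg.solve(R @ V @ R.T, diff) / R.shape[0]
print(f"b = {np.round(b, 3)}, F = {F:.2f}")  # F stays small when H0 holds
```

In Stata, the analogous step after `regress var1 var2 var3` would be a Wald test of the same restriction, e.g. `test (var2 = 1) (var3 = -1)`.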