I am trying to make the point that when we use big data, traditional pricing variables such as credit score matter less. So I run two separate regressions: one where big data is used, and one without. I expect the R2 to be smaller in the big-data case, because the traditional variables explain less of my outcome variable (say, the interest rate on a loan). But strangely, I get a larger R2. Even stranger: when I run the regressions and plot the residuals (via predict res, residuals), the big-data regression has a higher residual standard deviation. How is this possible? Wouldn't a larger R2 imply a lower standard deviation of the residuals? Am I missing something here?
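For concreteness, note that R2 = 1 - Var(residuals)/Var(y), so a larger R2 forces a smaller residual SD only if Var(y) is the same in both regressions. When the two regressions run on different samples (or different outcome variables), both R2 and the residual SD can be larger at the same time. Here is a minimal sketch of that arithmetic in Python with synthetic data (Python rather than Stata, purely for illustration; the variable names and coefficients are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Sample A: low outcome variance, modest fit.
x_a = rng.normal(size=n)
y_a = 0.5 * x_a + rng.normal(scale=1.0, size=n)   # Var(y_a) ~= 1.25

# Sample B: much higher outcome variance, better fit.
x_b = rng.normal(size=n)
y_b = 3.0 * x_b + rng.normal(scale=1.5, size=n)   # Var(y_b) ~= 11.25

def r2_and_resid_sd(x, y):
    """OLS with a constant; return (R2, SD of residuals)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - resid.var() / y.var()
    return r2, resid.std()

r2_a, sd_a = r2_and_resid_sd(x_a, y_a)
r2_b, sd_b = r2_and_resid_sd(x_b, y_b)

# Regression B has BOTH the larger R2 (~0.8 vs ~0.2) and the larger
# residual SD (~1.5 vs ~1.0), because Var(y_b) >> Var(y_a).
```

So the combination in the question is only contradictory if both regressions use the exact same sample and the exact same dependent variable; otherwise the total variance of y differs and the two statistics need not move together.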