  • Odd bias in lpoly regression line

    While toying with a simulation, I noticed that lpoly was systematically failing to recognize the true linear relationship between two variables. This is visually obvious when the lpoly line is compared to a scatterplot or to the lowess line, as I demonstrate in a simpler example below. The problem must stem from lpoly's default settings, because localp, which I understand to be a particular setup of lpoly, does not suffer from the same problem.

    Can anyone explain which default settings within lpoly are causing this systematic bias? I want to be sure that I avoid this problem when using lpoly in the future. Thank you!

    Code:
    clear
    set obs 10000
    set seed 12345
    
    gen larea = rnormal(-1.5, 1.2)       /* original variable */
    gen error = rnormal(0, .4)           /* error to be added around larea */
    
    gen larea_me_big = -.3 + .8*larea + error /* larea is tilted, and noise is added */
    gen larea_me_sm = -.15 + .9*larea + error /* smaller tilt, same noise */
    gen larea_me_rnd = larea + error          /* only noise is added, no tilt */
    
    /* In each of the plots below, lpoly is missing the correct and obvious linear
    relationship between the noisy variable and the original variable, larea */
    two (scatter larea_me_big larea, mcolor(gray*.5) msize(tiny)) ///
        (lowess larea_me_big larea, lcolor(blue)) ///
        (lpoly larea_me_big larea, lcolor(green)) ///
        (line larea larea, lcolor(black)), ///
        legend(order(1 "data" 2 "lowess fit" 3 "lpoly fit" 4 "45 degrees"))
    
    two (scatter larea_me_sm larea, mcolor(gray*.5) msize(tiny)) ///
        (lowess larea_me_sm larea, lcolor(blue)) ///
        (lpoly larea_me_sm larea, lcolor(green)) ///
        (line larea larea, lcolor(black)), ///
        legend(order(1 "data" 2 "lowess fit" 3 "lpoly fit" 4 "45 degrees"))
        
    two (scatter larea_me_rnd larea, mcolor(gray*.5) msize(tiny)) ///
        (lowess larea_me_rnd larea, lcolor(blue)) ///
        (lpoly larea_me_rnd larea, lcolor(green)) ///
        (line larea larea, lcolor(black)), ///
        legend(order(1 "data" 2 "lowess fit" 3 "lpoly fit" 4 "45 degrees"))
        
    /* yet localp DOES recognize the correct linear relationship */
    localp larea_me_rnd larea

  • #2
    Good news: nothing odd here, and all of it is documented. lpoly has default degree(0), so it is not even trying to fit linear trends locally. In contrast, localp (from SSC, as you are asked to explain) has default degree(1), so it really is.

    FWIW, I sometimes think the lpoly defaults were chosen to oblige users to think about what they want (a feature, really), as they often don't produce "nice" results for me. Hence localp, although that is not especially smart; it just formalizes some of the author's experiences.
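To illustrate the difference between the two defaults, here is a rough Python analogue (not lpoly or localp themselves; the kernel, bandwidth, helper name local_poly, and all variable names are my own choices), fitting a kernel-weighted polynomial of each degree at a single point in the sparse left tail of data simulated as in #1:

```python
import numpy as np

rng = np.random.default_rng(12345)
n = 10_000
x = rng.normal(-1.5, 1.2, n)       # mirrors: gen larea = rnormal(-1.5, 1.2)
y = x + rng.normal(0.0, 0.4, n)    # true relationship is y = x, plus noise

def local_poly(x0, x, y, h, degree):
    """Kernel-weighted least-squares polynomial fit at x0 (Gaussian kernel)."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    X = np.vander(x - x0, degree + 1, increasing=True)  # columns (x - x0)^j
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta[0]                  # intercept = fitted value at x0

x0, h = -3.5, 0.5                   # evaluation point in the sparse left tail
fit0 = local_poly(x0, x, y, h, degree=0)   # local mean: like lpoly's default
fit1 = local_poly(x0, x, y, h, degree=1)   # local linear: like localp's default

print(f"truth {x0:.3f}   degree 0: {fit0:.3f}   degree 1: {fit1:.3f}")
```

The degree-0 fit lands noticeably above the 45-degree line in the left tail, while the degree-1 fit tracks it, matching the pattern in the plots above.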



    • #3
      Thanks so much; I'm embarrassed that I didn't understand the default properly. I do know that degree zero gives "local-mean smoothing." I understood this, clearly incorrectly, to be the local (kernel-smoothed) mean of y over x. But that is what degree(1) seems to give. (I've now tried both.)

      Can you possibly point me towards a resource that explains minimization under degree(0)? I understand the lpoly minimization problem to be something along the lines of

      Sum_i^N K([x_i - x_0]/h) * [y_i - alpha - beta*(x_i - x_0)^d]^2

      So I see that d=1 gives local OLS, and d=0 would minimize

      Sum_i^N K([x_i - x_0]/h) * [y_i - alpha - beta]^2

      But I don't see why such a minimization would result in the over-estimation/under-estimation in the plot below...

      Code:
      clear
      set obs 10000
      set seed 12345
      
      gen larea = rnormal(-1.5, 1.2)       /* original variable */
      gen error = rnormal(0, .4)           /* error to be added around larea */
      gen larea_error = larea + error         /* add noise */
      
      gen larea_bins = round(larea, .1) /* bins for x-variable */
      bysort larea_bins: egen larea_error_mns=mean(larea_error)
      
      two (scatter larea_error larea, msize(small) mcolor(eltblue)) ///
          (scatter larea larea, msize(small) mcolor(gray)) ///
          (lpoly larea_error larea, deg(0) lcolor(orange)) ///
          (lpoly larea_error larea, deg(1) lcolor(pink)) ///
          (scatter larea_error_mns larea, msize(tiny) mcolor(yellow)), ///
          legend(order(1 "variable with error" 2 "original variable" ///
              3 "lpoly deg 0" 4 "lpoly deg 1" ///
              5 "variable with error meaned by bins"))
      Apologies for the follow-on, and thanks for your time!



      • #4
        No beta term in the second equation, I think. See [R] lpoly for a formal statement.
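To spell that out: with the beta term dropped, the degree-0 objective Sum_i K([x_i - x_0]/h) * (y_i - alpha)^2 has a closed-form minimizer, the kernel-weighted mean alpha = Sum_i K_i y_i / Sum_i K_i (the Nadaraya-Watson estimator). A quick numerical check of that claim, in Python rather than Stata (all names here are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(-1.5, 1.2, 500)
y = x + rng.normal(0.0, 0.4, 500)

x0, h = -3.0, 0.5
w = np.exp(-0.5 * ((x - x0) / h) ** 2)      # Gaussian kernel weights K_i

# Closed-form minimiser of sum_i K_i * (y_i - a)^2: the weighted mean.
a_closed = np.sum(w * y) / np.sum(w)

# Brute-force confirmation: grid-search a over the range of y.
grid = np.linspace(y.min(), y.max(), 4001)
sse = (w[:, None] * (y[:, None] - grid[None, :]) ** 2).sum(axis=0)
a_grid = grid[np.argmin(sse)]

print(f"weighted mean {a_closed:.4f}   grid minimiser {a_grid:.4f}")
```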



        • #5
          Thanks, Nick. I'm sorry to add to this chain after such a long pause, but I'm still unclear on what is happening under degree(0). The minimizing equation looks like it should be taking the kernel-weighted mean of y within the bandwidth, and indeed the help for lpolyci says: "The default is degree(0), meaning local-mean smoothing." However, in the example I posted directly above, you can see that lpoly with degree(0) is NOT capturing the variable mean. Not in the least: the line it produces actually falls above all points on the left-hand side of the graph and below all points on the right-hand side. Can you explain what is going on here?
          Last edited by Leah Bevis; 16 Apr 2018, 21:12.
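One way to see what is happening: under degree(0) the fit at x_0 is the kernel-weighted mean of y near x_0, and the x values inside any window are not centred on x_0; they cluster toward the density peak of x (around -1.5 in this simulation). Because y tracks x, that mean is pulled toward the peak: upward in the left tail and downward in the right tail, which is exactly the above/below pattern described in #5. A rough numerical sketch of that pull, in Python rather than Stata (all names are mine, and this is an analogue rather than lpoly itself):

```python
import numpy as np

rng = np.random.default_rng(12345)
x = rng.normal(-1.5, 1.2, 10_000)
y = x + rng.normal(0.0, 0.4, 10_000)    # true curve is the 45-degree line

def local_mean(x0, h=0.5):
    """Degree-0 fit at x0: the Gaussian-kernel-weighted mean of y."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

left, right = -3.5, 0.5                 # points either side of the mode (-1.5)
print(f"left tail:  fit - truth = {local_mean(left) - left:+.3f}")   # positive
print(f"right tail: fit - truth = {local_mean(right) - right:+.3f}") # negative
```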
