How to deal with non-linear relationships in linear models

Evelyn Mare

Join Date: Sep 2017

Posts: 26
#1

28 Jan 2019, 08:40

Dear all,

I estimated a linear regression model on multiply imputed data. I regress income on a range of explanatory variables, among them a competence indicator. I assume that the effect of this competence differs between men and women, so I included an interaction effect. My basic model syntax is

Code:

mi est, post: reg income i.sex##c.competences controls, vce(robust)

Now I assume that competences per se have a non-linear relationship with income (moving from high to very high competences should not raise income as much). To examine that, I interacted the competence with itself:

Code:

mi est, post: reg income i.sex c.competences##c.competences controls, vce(robust)

This interaction is significant, so the competence should have a non-linear effect.

I am now a bit unsure how to deal with this relationship. Thus far I have come across two different approaches:

1.) Include this interaction alongside my main interaction of interest in the model

Code:

mi est, post: reg income i.sex##c.competences c.competences##c.competences controls, vce(robust)

2.) Explicitly model the non-linear relationship. Here I came across the command

Code:

nl

but I am not sure what to do with my main interaction of interest (gender x competences) then.

Does anybody have a good tip here? I would be really grateful.

Cheery, Evelyn
daniel klein

Join Date: Mar 2014

Posts: 3824
#2

28 Jan 2019, 08:50

I think your description implies the following model

Code:

regress income i.sex##c.competences##c.competences controls

that is, a three-way-interaction (including all lower-order terms).

Note that you must account for that (or any other tested) interaction during the imputation process, too; otherwise, the estimated non-linearities and interactions are biased toward zero.

Edit:

I do not believe that you need nl. The latter is actually for models that are non-linear in parameters. Linear regression is linear in parameters, not necessarily in predictors.
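A minimal sketch of what "linear in parameters" means in practice, using simulated data. The variable names mirror the thread; the sample size, the coding of sex, and all coefficients in the data-generating step are made-up assumptions, not Evelyn's data. The squared term enters through factor-variable notation, so plain regress handles it, and testparm jointly tests the terms that involve the square.

```stata
* Simulated data; every number below is an illustrative assumption
clear
set seed 2019
set obs 1000
generate byte sex = 1 + (runiform() < 0.5)     // assumed coding: 1 = men, 2 = women
generate double competences = rnormal(0, 1)
generate double income = 20 + 3*competences - 0.8*competences^2 ///
    - 2*(sex == 2) + 1*(sex == 2)*competences + rnormal(0, 2)

* Full factorial: main effects, all two-way terms, and the three-way term
regress income i.sex##c.competences##c.competences, vce(robust)

* Joint Wald test of the two terms that involve squared competences
testparm c.competences#c.competences i.sex#c.competences#c.competences
```

The model is still estimated by ordinary least squares; only the predictors are transformed, which is why nl is not needed here.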

Best
Daniel

Last edited by daniel klein; 28 Jan 2019, 08:55.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#3

28 Jan 2019, 09:07

I gather there is an alternative approach to this problem, instead of adding interaction terms. Please read this text.

Best regards,

Marcos
Evelyn Mare

Join Date: Sep 2017

Posts: 26
#4

28 Jan 2019, 10:16

Dear Daniel & Marcos,

thank you so much for your help!

Two questions, if I may:

Thank you, Daniel! This would then imply that the non-linearities differ between men and women, right? I tested that with unimputed data and the three-way interaction is not significant. However, in the three-way model all interactions lose their significance, even the two-way conditional main effect of c.competences##c.competences. I assume that this might also be due to sample size; I "only" have 979 cases with complete information (1,500 in total after imputation). So I am not sure if the insignificance is due to "low power", or if it really isn't there and that "solves" my problem of non-linearity (even though I think that this shouldn't be the case)?

Thank you, too, Marcos! If I understood that correctly, then this would imply the "just another variable" approach. So if I stick with the three-way interaction, this would mean that I do this for all of the interaction terms (one three-way variable and two two-way variables for the conditional main effects), right?

Thank you so much,

Evelyn
daniel klein

Join Date: Mar 2014

Posts: 3824
#5

28 Jan 2019, 11:56

Originally posted by Evelyn Mare

However, in the three-way model all interactions lose their significance, even the two-way conditional main effect of c.competences##c.competences.

In such models, you are typically only interested in the significance test of the highest order term. Power might well be an issue here. You do not say a lot about your research questions. It sounds, however, as if you are testing a theory here. If there is a theory, that theory should guide you. I would also try and visualize the (estimated) relationships.
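One way to visualize the estimated relationships is margins followed by marginsplot. This is only a sketch on simulated, complete (non-imputed) data; the data-generating values and the grid of competence values in at() are assumptions chosen for illustration.

```stata
* Simulated data; every number below is an illustrative assumption
clear
set seed 2019
set obs 1000
generate byte sex = 1 + (runiform() < 0.5)     // assumed coding: 1 = men, 2 = women
generate double competences = rnormal(0, 1)
generate double income = 20 + 3*competences - 0.8*competences^2 ///
    - 2*(sex == 2) + 1*(sex == 2)*competences + rnormal(0, 2)

regress income i.sex##c.competences##c.competences, vce(robust)

* Predicted income for each sex over a grid of competence values
margins sex, at(competences = (-2(0.5)2))

* One predicted curve per sex; drop noci to see the confidence intervals
marginsplot, noci
```

If the two curves are roughly parallel and equally bent, the higher-order interaction terms add little; if they diverge or bend differently, the plot shows where.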

Originally posted by Evelyn Mare

So I am not sure if the insignificance is due to "low power", or if it really isn't there

If by "it" you are referring to the non-linear relationship, then, from a statistical point of view, the latter cannot be shown. Remember: absence of evidence is not evidence of absence, meaning that failing to reject the null by no means makes the null "true". Again, if your theory states that there is a non-linear relationship, you might want to keep it in the model anyway.

Originally posted by Evelyn Mare

So if I stick with the three-way interaction, this would mean that I do this for all of the interaction terms (one three-way variable and two two-way variables for the conditional main effects), right?

In general, yes. However, if your sex variable has no missing values (as is often the case), you should try to separate your imputation models (and perhaps the analysis models) by sex. That will allow all coefficients to vary between the sexes. Note that if you go with the just-another-variable approach, visualizing the relationships with imputed data will be hard. You might want to see if the (in my view simpler) passive approach really produces results that make a difference.
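A rough sketch of imputing separately by sex and then letting the analysis model build the square and the interactions from the imputed values (the passive route). Everything here is simulated: the 20% missingness, the imputation model with income as the only predictor, and the number of imputations are assumptions for illustration, not a recommendation for the real data.

```stata
* Simulated data; every number below is an illustrative assumption
clear
set seed 2019
set obs 1000
generate byte sex = 1 + (runiform() < 0.5)     // assumed coding: 1 = men, 2 = women
generate double competences = rnormal(0, 1)
generate double income = 20 + 3*competences - 0.8*competences^2 ///
    - 2*(sex == 2) + 1*(sex == 2)*competences + rnormal(0, 2)
replace competences = . if runiform() < 0.2    // about 20% missing completely at random

mi set wide
mi register imputed competences
mi register regular income sex

* Separate imputation models for men and women via by()
mi impute regress competences income, add(5) by(sex) rseed(2019)

* The factor-variable operators recompute the square and the interactions
* from the imputed competences within each imputed data set
mi estimate: regress income i.sex##c.competences##c.competences, vce(robust)
```

Under the just-another-variable approach, the squared and interaction terms would instead be generated before imputation, registered as imputed, and imputed like any other variable.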

Best
Daniel

Last edited by daniel klein; 28 Jan 2019, 12:01.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#6

29 Jan 2019, 12:55

Originally posted by Evelyn Mare

Thank you, too, Marcos! If I understood that correctly, then this would imply the "just another variable" approach. So if I stick with the three-way interaction, this would mean that I do this for all of the interaction terms (one three-way variable and two two-way variables for the conditional main effects), right?

Daniel gave an insightful reply to this question. As a matter of fact, as Daniel underlined, the article I shared points directly to the "by" strategy. Hopefully that helps.

Best regards,

Marcos
Evelyn Mare

Join Date: Sep 2017

Posts: 26
#7

30 Jan 2019, 06:19

Thank you both!

Dear Daniel, thank you! That is exactly what I am struggling with at the moment. From my theoretical perspective, I "purely" assumed that the measured competence would 'pay off' differently for men than for women, i.e. a two-way interaction, which the models and plots indicated. I did this using linear regression models without accounting for any non-linearities. Then I realized that the competence per se might not have a linear effect on income, which I examined by estimating the two-way interaction between competence and competence, and this was significant as well. In the three-way interaction model all interactions lose their significance and I am not sure how to interpret that or what to do with that finding.

Do you think it is plausible if I take from this that...:
- pay offs differ between men and women (theoretical argument, two-way interaction between competences x sex)
- while the effect of competences per se is non-linear (methodological argument, two-way interaction between competences x competences)
- this non-linearity does not seem to differ between men and women (insignificant three-way interaction?)

Or would you take something else from this or specify the models entirely different? I am really sorry to come to you with this again.

Thank you Marcos, I will try the by option.

Thank you both,

Evelyn Mare
daniel klein

Join Date: Mar 2014

Posts: 3824
#8

30 Jan 2019, 07:22

Originally posted by Evelyn Mare

Do you think it is plausible if I take from this that...:
- pay offs differ between men and women (theoretical argument, two-way interaction between competences x sex)
- while the effect of competences per se is non-linear (methodological argument, two-way interaction between competences x competences)
- this non-linearity does not seem to differ between men and women (insignificant three-way interaction?)

Probably not. I guess you either have a (linear) interaction effect or you have a non-linear main effect. I do not think you can have both without the three-way interaction. Note that if the main effects are not linear, then you might observe a spurious interaction effect. Patrick Royston and Willi Sauerbrei had a presentation on this at the German Stata Users Group Meeting a couple of years ago. To be honest, I never really got around to digging deeper into this. Scrolling through the presentation again, I ask myself how we can know what is "true": the non-linear relationship or the interaction effect? That is, if a true non-linear main effect can produce spurious interactions, is the reverse also possible? Can a true interaction effect produce spurious non-linearity?
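The first direction (a non-linear main effect producing a spurious interaction) can be illustrated with a small simulation. This is a sketch under strong assumptions that are not taken from the thread: the true income curve is quadratic and identical for both sexes, and women's competences are centred 1.5 units higher than men's, so the two groups sit on different parts of the same curve.

```stata
* Simulated data; every number below is an illustrative assumption
clear
set seed 2019
set obs 5000
generate byte sex = 1 + (runiform() < 0.5)     // assumed coding: 1 = men, 2 = women

* Assumption: the competence distribution is shifted upwards for women
generate double competences = rnormal(0, 1) + 1.5*(sex == 2)

* True model: one common curve, no sex effect and no interaction at all
generate double income = 20 + 2*competences - 0.5*competences^2 + rnormal(0, 1)

* Misspecified model: linear main effect plus sex-by-competences interaction
regress income i.sex##c.competences
scalar b_linear = _b[2.sex#c.competences]

* Adding the squared term lets the model absorb the curvature
regress income i.sex##c.competences c.competences#c.competences
scalar b_curved = _b[2.sex#c.competences]

display "interaction coefficient, linear specification: " b_linear
display "interaction coefficient, with squared term:    " b_curved
```

In this setup the linear specification reports a sizeable interaction that is purely an artefact of the omitted curvature, and it shrinks toward zero once the square is included. The simulation does not address the reverse question.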

Anyway, I think I would definitely want to visualize those relationships to get a better feeling for what is going on.

Best
Daniel
Evelyn Mare

Join Date: Sep 2017

Posts: 26
#9

04 Feb 2019, 08:31

Dear Daniel,

thank you so much, that was super helpful and I really liked the slides!!

I spent some time trying to assess what kind of relationship I would theoretically expect and I googled around a bit to see if there is a way to determine the "true" nature of the relationship. I also plotted the data. I am still a bit unsure/confused about what might be going on and I thought I would post again in case you are willing/have time to help me shed some light on this (if not, that's fine, too).

-from a theoretical point:
in prior literature I didn't find any argument why the effect of the competence on income should be non-linear, and in the studies that I know of (or can currently recall) it is always modelled in a linear way. However, I find it intuitively plausible that the competence should have a logarithmic effect, because competence differences among people with high competences shouldn't make them that much more productive, in contrast to competence differences between people with low & medium competences. I also found one illustration where an author drew a logarithmic function for this competence, but didn't explain or elaborate on this shape and gave arguments for the effectiveness of higher competences.

To add to the problem, from a theoretical point I am also not sure if I would expect the same non-linearity for men and women. My general argument is that women might get less income if competences are low but eventually might reach the income of men. About the shape of this slope I am unsure and could imagine multiple scenarios: linear relationships that just start at different points (lower starting point and then steeper slope, still linear), but also linear for men and curvilinear for women, or curvilinear for both?

-examining the relationship empirically:
here I didn't make as much progress. I just found the helpful slides from Richard Williams and followed his reasoning on 'incremental F tests to test whether polynomial terms belong in the model'.

I then (somewhat mindlessly) followed his instructions with

Code:

gen x2 = competences^2
gen x3 = competences^3
gen x4 = competences^4
nestreg: reg income competences (x2 x3 x4)

And from how I read the output, the polynomial terms improve the model fit only minimally:

Code:

                 Block  Residual                     Change
Block         F     df        df   Pr > F       R2    in R2
    1   1249.47      1      5621   0.0000   0.1819
    2     44.52      3      5618   0.0000   0.2009   0.0190

- plots
I am also a bit unsure about the plots.
The general plot seems a bit inconclusive (at least to me); here I just made a scatter plot of competence and income.

When I do this separately by gender, I could imagine a linear relationship if I squint really hard?? I also tried the curvefit command and, at least compared to the logarithmic function, the linear line seems to fit the distribution better, even though here, too, I find the distribution itself a bit inconclusive.

Code:

curvefit income competences if sex==1, f(1 2)
curvefit income competences if sex==2, f(1 2)

[Figure: curvefit plot, men]

[Figure: curvefit plot, women]
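As an alternative to the user-written curvefit, a linear fit, a quadratic fit and a lowess smoother can be overlaid on the scatter using only official twoway commands, one panel per sex. The sketch below runs on simulated data; the data-generating values, value labels and marker styling are assumptions for illustration.

```stata
* Simulated data; every number below is an illustrative assumption
clear
set seed 2019
set obs 1000
generate byte sex = 1 + (runiform() < 0.5)     // assumed coding: 1 = men, 2 = women
generate double competences = rnormal(0, 1)
generate double income = 20 + 3*competences - 0.8*competences^2 ///
    - 2*(sex == 2) + 1*(sex == 2)*competences + rnormal(0, 2)

label define sexlbl 1 "Men" 2 "Women"
label values sex sexlbl

* Scatter with linear, quadratic and lowess fits, separately for each sex
twoway (scatter income competences, msize(tiny) mcolor(gs10)) ///
       (lfit income competences)                               ///
       (qfit income competences)                               ///
       (lowess income competences),                            ///
       by(sex) legend(order(2 "Linear" 3 "Quadratic" 4 "Lowess"))
```

Where the lowess curve tracks the straight line closely, a linear term is probably adequate in that range; where it follows the quadratic fit instead, the curvature is visible in the raw data.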

So I'm confused/unsure what to do next. I am sorry to spam you with this...

Thank you,

Evelyn

Last edited by Evelyn Mare; 04 Feb 2019, 08:37.