  • A variable becomes insignificant when I add new variables

    Hello,

    I ran a probit regression with 8 predictor variables, which I divided into 4 models. The first model contains 2 predictor variables, x1 and x2. The second model adds a new variable, x3. The third model adds three variables: x4, x5, and x6. Finally, the last model adds 2 more variables, x7 and x8, thus the final model includes all 8 variables.
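    In Stata terms, the sequence looks roughly like this (y stands in for the outcome variable):

    Code:
    probit y x1 x2                        // Model 1
    probit y x1 x2 x3                     // Model 2
    probit y x1 x2 x3 x4 x5 x6            // Model 3
    probit y x1 x2 x3 x4 x5 x6 x7 x8      // Model 4 (full model)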

    When I add x4, x5, and x6, x3 loses some significance, with its p-value increasing from 0.000 to 0.062. Then, in the final model, when I add x7 and x8, x3 loses its significance completely, with a p-value of 0.243.

    All VIFs are below 2, indicating no serious multicollinearity among the predictors. I am trying to understand how to explain these changes.
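    (VIFs depend only on the predictors, so one common way to obtain them for a probit specification is to refit the same right-hand side with regress:)

    Code:
    regress y x1 x2 x3 x4 x5 x6 x7 x8     // linear refit of the full specification
    estat vif                             // variance inflation factors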

    I'd really appreciate your help.
    Thank you,

  • #2
    My guess is that you're addressing omitted variable bias: adding the new variables removes correlation between the included regressors and the error term. It may be that x3 to x8 are acting as confounders in your model.
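    A toy simulation (made-up data, purely to illustrate the mechanism) shows how an omitted confounder can make a variable look significant until the confounder is added:

    Code:
    clear
    set obs 5000
    set seed 12345
    gen x3 = rnormal()                       // variable of interest
    gen x4 = 0.6*x3 + rnormal()              // confounder, correlated with x3
    gen y  = rbinomial(1, normal(0.5*x4))    // outcome depends on x4 only
    probit y x3        // x3 looks "significant" because it proxies for x4
    probit y x3 x4     // once x4 is included, x3's effect shrinks toward zero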

    • #3
      Originally posted by Maxence Morlet View Post
      My guess is that you're addressing omitted variable bias: adding the new variables removes correlation between the included regressors and the error term. It may be that x3 to x8 are acting as confounders in your model.
      Thank you for your answer.

      How can I address that, if I may ask? I mean, what are the solutions when I have confounders?

      • #4
        Precisely what you've been doing: include them in your model to reduce confounding variation. Explanatory variables do not have to be significant. It is also possible that your explanatory variables of interest simply do not have a causal effect on your outcome.

        • #5
          Originally posted by Maxence Morlet View Post
          Precisely what you've been doing: include them in your model to reduce confounding variation. Explanatory variables do not have to be significant. It is also possible that your explanatory variables of interest simply do not have a causal effect on your outcome.
          Okay. Thank you for your help. It is much appreciated!

          • #6
            You may also consider using Stata's margins command to explore how your key explanatory variable's association with the outcome differs at various levels of the other predictors. This can help you understand the model results and better explain what might be going on. If you have not read it already, I always suggest reading Richard Williams's excellent Stata Journal article on margins.
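            For instance, something along these lines after the full model (y stands in for your outcome, and the at() values are just placeholders):

            Code:
            probit y x1 x2 x3 x4 x5 x6 x7 x8
            margins, dydx(x3)                    // average marginal effect of x3
            margins, dydx(x3) at(x7=(0 1 2))     // AME of x3 at selected values of x7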

            • #7
              Originally posted by Erik Ruzek View Post
              You may also consider using Stata's margins command to explore how your key explanatory variable's association with the outcome differs at various levels of the other predictors. This can help you understand the model results and better explain what might be going on. If you have not read it already, I always suggest reading Richard Williams's excellent Stata Journal article on margins.
              Thank you for the reference!
              Indeed, I've been using margins to interpret the average marginal effects, but I'm still encountering some issues. I'll look into the article.

              • #8
                I am a huge fan of this paper by Trent Mize et al.:

                https://journals.sagepub.com/doi/ful...81175019852763

                It has a lot of citations now, and I bet it would get zillions more if they publicly released easy-to-use ado-files. The code can be found on Mize's personal website.

                Abstract

                Many research questions involve comparing predictions or effects across multiple models. For example, it may be of interest whether an independent variable’s effect changes after adding variables to a model. Or, it could be important to compare a variable’s effect on different outcomes or across different types of models. When doing this, marginal effects are a useful method for quantifying effects because they are in the natural metric of the dependent variable and they avoid identification problems when comparing regression coefficients across logit and probit models. Despite advances that make it possible to compute marginal effects for almost any model, there is no general method for comparing these effects across models. In this article, the authors provide a general framework for comparing predictions and marginal effects across models using seemingly unrelated estimation to combine estimates from multiple models, which allows tests of the equality of predictions and effects across models. The authors illustrate their method to compare nested models, to compare effects on different dependent or independent variables, to compare results from different samples or groups within one sample, and to assess results from different types of models.
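                Not their code, but a minimal sketch of the general idea in Stata (storing the nested probits, combining them with suest, and comparing x3 across models), assuming the outcome is called y:

                Code:
                probit y x1 x2 x3
                estimates store m1
                probit y x1 x2 x3 x4 x5 x6
                estimates store m2
                suest m1 m2                    // combine the models for cross-model tests
                test [m1_y]x3 = [m2_y]x3       // coefficient comparison across models
                * Mize et al.'s ado-files go further and compare marginal effects
                * (via margins on the combined estimates), which avoids the
                * identification problems mentioned in the abstract.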
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
                StataNow Version: 18.5 MP (2 processor)

                EMAIL: [email protected]
                WWW: https://www3.nd.edu/~rwilliam

                • #9
                  One other point I will add. When you say the effect of a variable is statistically insignificant, that actually means its DIRECT effect is insignificant. That does not mean it has NO causal effect. It could still have an INDIRECT causal effect.

                  For example, suppose that race affects education and education affects income, but there is no direct effect of race on income. This would mean that there are racial differences in income (hence race and income are correlated). But, these are due to the fact that race affects education which in turn affects income.

                  Put another way, education is the MECHANISM by which race affects income. In this case, you’d be making a huge mistake if you argued race had nothing to do with income.

                  You may also want to check out this recent article of mine:

                  https://www.sciencedirect.com/scienc...49089X22001132
                  Abstract

                  Social scientists are often interested in seeing how the estimated effects of variables change once other variables are controlled for. For example, a simple analysis may reveal that income differs by race – but why does it differ? To answer such a question, a researcher might estimate a model where race is the only independent variable, and then add variables such as education to subsequent models. If the original estimated effect of race declines, this may be because race affects education, which in turn affects income. What is not universally realized is that the interpretation of such nested models can be problematic when logit or probit techniques are employed with binary dependent variables. Naïve comparisons of coefficients between models can indicate differences where none exist, hide differences that do exist, and even show differences in the opposite direction of what actually exists. We discuss why problems occur and illustrate their potential consequences. Proposed solutions, such as Linear Probability Models, Y-standardization, the Karlson/Holm/Breen method, and marginal effects, are explained and evaluated.
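                  One way to put numbers on the direct/indirect distinction above is the Karlson/Holm/Breen decomposition mentioned in the abstract, available as the user-written khb command. A minimal sketch with hypothetical variable names (highincome as a binary outcome, black as the key variable, educ as the mediator):

                  Code:
                  ssc install khb                       // user-written command (Kohler, Karlson & Holm)
                  khb probit highincome black || educ   // total, direct, and indirect (via educ) effects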
                  -------------------------------------------
                  Richard Williams, Notre Dame Dept of Sociology
                  StataNow Version: 18.5 MP (2 processor)

                  EMAIL: [email protected]
                  WWW: https://www3.nd.edu/~rwilliam

                  • #10
                    Originally posted by Richard Williams View Post
                    One other point I will add. When you say the effect of a variable is statistically insignificant, that actually means its DIRECT effect is insignificant. That does not mean it has NO causal effect. It could still have an INDIRECT causal effect.

                    For example, suppose that race affects education and education affects income, but there is no direct effect of race on income. This would mean that there are racial differences in income (hence race and income are correlated). But, these are due to the fact that race affects education which in turn affects income.

                    Put another way, education is the MECHANISM by which race affects income. In this case, you’d be making a huge mistake if you argued race had nothing to do with income.

                    You may also want to check out this recent article of mine:

                    https://www.sciencedirect.com/scienc...49089X22001132
                    Abstract

                    Social scientists are often interested in seeing how the estimated effects of variables change once other variables are controlled for. For example, a simple analysis may reveal that income differs by race – but why does it differ? To answer such a question, a researcher might estimate a model where race is the only independent variable, and then add variables such as education to subsequent models. If the original estimated effect of race declines, this may be because race affects education, which in turn affects income. What is not universally realized is that the interpretation of such nested models can be problematic when logit or probit techniques are employed with binary dependent variables. Naïve comparisons of coefficients between models can indicate differences where none exist, hide differences that do exist, and even show differences in the opposite direction of what actually exists. We discuss why problems occur and illustrate their potential consequences. Proposed solutions, such as Linear Probability Models, Y-standardization, the Karlson/Holm/Breen method, and marginal effects, are explained and evaluated.
                    Thank you so much for the references and your answer! I will look into the indirect effect.
