  • Coefficient changes sign after adding a new variable

    Hello,

    I ran a probit regression on each of my three independent variables separately and found a positive coefficient for each variable (0.148***, 0.314*** and 0.204***). But when I run a regression on the three variables together, the first variable changes sign (-0.0831*, 0.246*** and 0.155***). I tested for multicollinearity but did not find any strong correlation (the highest correlation is 0.542). I also checked the VIFs, and all values are less than 2.
    Does this still mean that I have a multicollinearity problem?

    I don't really think that the change of sign makes any conceptual sense. At least, I can't see it!


    To account for this, I aggregated the variables into one unique variable and I found a coefficient=0.353***. Can I do this?

    Thank you,

  • #2
    It does not require an extremely high correlation between any of the variables for this to happen, just a non-zero correlation. And, for that matter, a correlation of 0.542 is actually rather large. The whole point of adding covariates to a model is to correct for omitted variable bias. If that had no effect on the coefficients of the other variables, there would never be any reason to do it. Evidently there is enough correlation among these three variables that, when used in isolation, the first variable "takes credit" for part of the effects of the other two variables with which it shares variance.

    No, you do not have a multicollinearity problem. A multicollinearity problem refers to the situation where correlated independent variables end up having very wide confidence intervals because the data cannot effectively distinguish between their effects. While you do not show full output, the fact that all of your coefficients are carrying "significance stars" suggests that the confidence intervals are narrow enough for most purposes. What you have is simply a bona fide problem of omitted variable bias in the separate variable analysis. Alternatively, the separate variable analyses may be valid and the multi-variable model invalid if the first variable is a collider on the paths from the second or third variable to the outcome variable. Since you don't say what these variables are, I cannot even guess which way it goes.
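
    As a minimal sketch of that comparison (the names y, x1, x2 and x3 are placeholders assumed for illustration, not the poster's actual variables), one could fit the separate and joint models and put the coefficients and standard errors side by side. Coefficients that shrink or change sign while the standard errors stay modest would be consistent with omitted variable bias rather than a multicollinearity problem:

    ```stata
    * Placeholder names (assumed): y is the binary outcome, x1-x3 the predictors.
    correlate x1 x2 x3

    * One-predictor models.
    quietly probit y x1
    estimates store m1
    quietly probit y x2
    estimates store m2
    quietly probit y x3
    estimates store m3

    * Joint model.
    quietly probit y x1 x2 x3
    estimates store mall

    * Coefficients, standard errors and sample sizes side by side.
    estimates table m1 m2 m3 mall, b(%9.3f) se(%9.3f) stats(N)
    ```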

    I aggregated the variables into one unique variable and I found a coefficient=0.353***. Can I do this?
    Evidently you can; you just did. The question is whether this is a sensible, meaningful thing to do. That depends on what these variables are, how you went about combining them into one, and whether that process makes sense for those constructs. Can you assign a simple name to the aggregated variable that appropriately describes what it means, what it represents in the real world? That would be my first test.
    Last edited by Clyde Schechter; 11 Nov 2022, 17:16.

    • #3
      In addition to Clyde Schechter's excellent points, I add that if you want help with interpretation you need to tell us what the variables represent. Even then it might be that only specialists can help out much with a story and, yet further, they may want a better idea of your variables' marginal and joint distributions.

      With proper names supplied, a plot of

      Code:
      graph matrix x1 x2 x3 y
      might help. As y is binary further graphs are often needed, such as smooths of y against each of the predictors (in your terms, independent variables).
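
      A hedged sketch of such smooths, reusing the placeholder names from the graph matrix call above (the graph names s1, s2 and s3 are arbitrary labels chosen here): with a 0/1 outcome, each lowess curve can be read as a smoothed proportion of y = 1 across the range of a predictor.

      ```stata
      * Placeholder names (assumed), as in the graph matrix call above.
      * Each command draws y against one predictor with a lowess smooth.
      lowess y x1, name(s1, replace)
      lowess y x2, name(s2, replace)
      lowess y x3, name(s3, replace)

      * Show the three panels together.
      graph combine s1 s2 s3
      ```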

      Member Jeff Wooldridge has a good section on multicollinearity in his introductory text, largely to the effect that it is usually a needless worry. In most observational set-ups, some moderate correlations between predictors are entirely expected and it is the job of a model to adjust for them.
      Last edited by Nick Cox; 11 Nov 2022, 19:03.

      • #4
        My guess is that suppressor effects are present. Since we don’t know what the vars are we can’t say whether suppressor effects are theoretically plausible in this case.

        A less obvious possibility — are the cases being analyzed the same in each regression? If not, that could contribute to inconsistencies in results.
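
        One way to rule this out, sketched with placeholder names (y, x1, x2, x3 and the marker variable insample are all assumptions for illustration), is to mark the cases used in the joint model and refit the one-predictor models on exactly those cases:

        ```stata
        * Placeholder names (assumed): y is the binary outcome, x1-x3 the predictors.
        probit y x1 x2 x3

        * e(sample) is 1 for observations used by the most recent estimation.
        generate byte insample = e(sample)

        * Refit the one-predictor models on the same cases.
        probit y x1 if insample
        probit y x2 if insample
        probit y x3 if insample
        ```

        If the coefficients still change sign with the sample held fixed, differing cases are not the explanation.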

        There are ways to formally test whether you can add three variables together. I wouldn’t combine them without doing a formal test.
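
        The post does not name a specific test, so the following is only one possible reading, offered as an assumption: summing the variables forces them to share a single coefficient, and a Wald test of equal coefficients in the joint model checks that restriction. The names y, x1, x2 and x3 are placeholders.

        ```stata
        * Placeholder names (assumed): y is the binary outcome, x1-x3 the predictors.
        probit y x1 x2 x3

        * H0: the three coefficients are equal, which is what a simple sum imposes.
        test x1 = x2 = x3
        ```

        A small p-value would suggest that the summed variable hides real differences between the three effects.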

        Theoretically and substantively, it can be quite interesting if you have suppressor effects. On the surface it looks like the effect of a variable is positive, when in reality you see it is negative when proper controls are applied. Unless the sign flip is totally unreasonable, I’d probably find the sign flip exciting.
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        StataNow Version: 19.5 MP (2 processor)

        EMAIL: [email protected]
        WWW: https://www3.nd.edu/~rwilliam

        • #5
          Clyde Schechter Nick Cox Richard Williams Thank you for replying!

          My first variable (the one that changes its sign) is called PERCEPTION. It is a sum of 5 variables, each measured on a 7-point Likert scale. PERCEPTION captures the participants' feelings towards environmental issues (whether they are sad about the ecological crisis, whether they feel that it is a serious matter, etc.).

          My second variable is called BEHAVIOUR. It is a sum of 4 variables, each measured on a 7-point Likert scale. BEHAVIOUR captures whether the participants actually act to protect the environment (recycling, efforts to protect natural resources, etc.).

          My third and last variable is called POLITICAL. It is a sum of 2 variables, each measured on a 7-point Likert scale. POLITICAL captures whether the participants have a political orientation towards environmental issues (green vote, assuming that protecting the environment is more important than job creation).

          What I noticed is this: when I add BEHAVIOUR to PERCEPTION, the latter changes its sign from positive to negative but becomes non-significant (-0.0144). If I add POLITICAL to PERCEPTION, the latter keeps its positive sign but is also non-significant (0.0092). And when I put the three variables together, PERCEPTION is negative and becomes significant (-0.0831*).


          The aggregated variable (of PERCEPTION, BEHAVIOUR and POLITICAL) would be called PREFERENCES. It would describe the general environmental preferences of the participants. I would rather not use this alternative.
          And this is the plot:
          [Image: Sans titre.png]


          Thank you,
          Last edited by Serena Menny; 12 Nov 2022, 05:05.

          • #6
            t
            Last edited by Serena Menny; 12 Nov 2022, 05:06.

            • #7
              Perception and behavior appear to be fighting for market share. The correlation only has to be moderate for that to happen.

              • #8
                It hardly surprises me that the effect of perception declines once behavior is controlled. Perception affects behavior, and behavior affects green investment.

                I also wonder how you conceptually separate behavior from green investment. Isn't green investment just another form of behavior? Why does it get to be the dependent variable while other behaviors are independent?

                I'm mildly surprised that perception goes mildly negative, but it isn't stunning given how these variables are inter-related.

                • #9
                  Originally posted by Nick Cox
                  Perception and behavior appear to be fighting for market share. The correlation only has to be moderate for that to happen.
                  Nick Cox Thank you for this!

                  Richard Williams I separated them because I wanted to explore whether being sustainable in day-to-day life affects the decision to invest more responsibly towards the environment. That is, whether "green" behaviour (in non-financial matters) translates into making "green" financial decisions as well.

                  That is what I tried to do with perception as well (whether feeling sad for the environment makes the participant go green in their financial decisions).

                  So to interpret the change in the sign, I would just say that this is due to the fact that perception affects behaviour? Is there any way I can account for the suppression effect?

                  Thank you,

                  Last edited by Serena Menny; 12 Nov 2022, 09:05.

                  • #10
                    Is there any way I can account for the suppression effect?
                    I think all you need to do is paraphrase the first line of Richard Williams' #8. That's the long and short of it.

                    • #11
                      Clyde Schechter Okay. Thank you all so much!

                      • #12
                        On a related note, comparing logit and probit coefficients across nested models is problematic. I have a paper on this forthcoming in Social Science Research called "Comparing Logit & Probit Coefficients Between Nested Models".

                        Until Dec. 23, 2022, the preprint is available for free at

                        https://authors.elsevier.com/a/1g0vh,17RoZLRi

                        If you are reading this 10 years from now and can't directly access the published version, an earlier working paper version is at

                        https://papers.ssrn.com/sol3/papers....act_id=4105726

                        Abstract:

                        Social scientists are often interested in seeing how the estimated effects of variables change once other variables are controlled for. For example, a simple analysis may reveal that income differs by race – but why does it differ? To answer such a question, a researcher might estimate a model where race is the only independent variable, and then add variables such as education to subsequent models. If the original estimated effect of race declines, this may be because race affects education, which in turn affects income. What is not universally realized is that the interpretation of such nested models can be problematic when logit or probit techniques are employed with binary dependent variables. Naïve comparisons of coefficients between models can indicate differences where none exist, hide differences that do exist, and even show differences in the opposite direction of what actually exists. We discuss why problems occur and illustrate their potential consequences. Proposed solutions, such as Linear Probability Models, Y-standardization, the Karlson/Holm/Breen method, and marginal effects, are explained and evaluated.
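
                        As a rough sketch of two of the approaches named in the abstract, marginal effects and a Linear Probability Model, and not a reproduction of the paper's own code, one could compare the first predictor across nested models as follows. The names y, x1, x2 and x3 are placeholders assumed for illustration.

                        ```stata
                        * Placeholder names (assumed): y is the binary outcome, x1-x3 the predictors.

                        * Average marginal effect of x1 without and with the other predictors.
                        probit y x1
                        margins, dydx(x1)
                        probit y x1 x2 x3
                        margins, dydx(x1)

                        * Linear Probability Model counterparts with robust standard errors.
                        regress y x1, vce(robust)
                        regress y x1 x2 x3, vce(robust)
                        ```

                        Both sets of estimates are on the probability scale, which is why they are easier to compare between models than raw probit coefficients.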



                        • #13
                          Thank you for the reference! Much appreciated
