  • When should we exclude an independent variable from the regression?

    In my case, I have two questions relating to correlations among variables:
    1. Does it make sense to test the impact of laws on two outcome variables that are correlated with each other (around 0.6)?
    2. I have a correlation table of some independent variables, shown below:
    [Screenshot: correlation matrix of the independent variables (1.PNG)]

    Note: The variable names in the first row are the same as those in the first column, for example workpl~s = workplaces; sorry for the inconvenience, the abbreviations are due to Stata's display-width limit.

    I am wondering which variables should be excluded from the regression. Or is it okay to retain all of them? My understanding is that, of two highly correlated variables, one should be excluded, where "high correlation" means a correlation value above 0.5 but below 1.
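
    For reference, a matrix like the one above can be produced with Stata's correlation commands. A minimal sketch (workplaces, gvmresponse, and containment are names from the table; the rest of the variable list is omitted):

        * pairwise correlations among the candidate predictors, with p-values
        pwcorr workplaces gvmresponse containment, sig

        * or the full correlation matrix with listwise deletion
        correlate workplaces gvmresponse containment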

  • #2
    What are some characteristics of your dataset? If you have a ton of data (many observations), I'd say leave them all in; sure, the standard errors will be higher, but that's just the nature of the beast. If you have less data, which I suspect may be the case, you might want to remove one of gvmresponse and containment. There are no hard-and-fast rules about when to remove covariates; it all depends on the specific circumstances, though 0.97 is quite high as correlations go.
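
    One quick way to see how much the correlated covariates inflate the standard errors is a variance-inflation-factor check after fitting the model. A sketch (y is a placeholder for whatever the outcome variable is):

        * fit the full model, then inspect variance inflation factors;
        * VIFs above roughly 10 are a common rough warning sign
        regress y workplaces gvmresponse containment
        estat vif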

    Comment


    • #3
      This is a question for an intro-to-econometrics textbook. I will be honest with you: someone who does difference-in-differences analysis should really know the answer to this.

      With that said: if correlation coefficients are around .7, that's when you should be concerned about violating one of the Gauss-Markov assumptions. Phuc Nguyen

      Comment


      • #4
        Jackson Monroe Agreed. Once you're around .8 or .9, either you've made data collection/management errors, or you're just measuring the same thing.

        Comment


        • #5
          The Gauss-Markov theorem mainly concerns the expected values and covariances of the errors and is required for OLS to yield the best (i.e., minimum-variance) linear unbiased estimate. Correlations among the predictors do not affect those properties of OLS and certainly do not violate any assumptions (in the sense that you cannot choose to ignore the violation). Mathematically, the inverse of the cross-product matrix \(X'X\), which is required to estimate the coefficient vector \(\hat{\beta} = (X'X)^{-1}X'y\), is not defined in the case of perfect collinearity. Here, collinearity means that one predictor is an exact linear combination of one or more others, which for two predictors implies a correlation coefficient of 1. Modern software, such as Stata, should have no problems whatsoever inverting a predictor matrix with correlations up to .97, as reported here. If Stata cannot invert the matrix, it will drop collinear predictors.
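
          To see that dropping behavior concretely, here is a small simulated sketch (made-up data; x3 is constructed as an exact linear combination of x1 and x2):

              * simulate a dataset in which x3 is perfectly collinear
              clear
              set obs 100
              set seed 12345
              generate x1 = rnormal()
              generate x2 = rnormal()
              generate x3 = x1 + x2    // exact linear combination
              generate y  = 1 + x1 + x2 + rnormal()

              * Stata drops x3, noting it is omitted because of collinearity
              regress y x1 x2 x3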

          Edit: By the way, omitting predictors that are correlated with both the outcome and other predictors does violate the Gauss-Markov assumption \(Cov(\epsilon, X) = 0\) and renders the estimator biased.
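
          In the two-predictor case this is the textbook omitted-variable bias result: if the true model is \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon\) and \(x_2\) is omitted, then
          \[
          E[\hat{\beta}_1] = \beta_1 + \beta_2 \frac{Cov(x_1, x_2)}{Var(x_1)},
          \]
          so the bias grows with both \(\beta_2\) and the correlation between the predictors.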

          How to interpret high correlations among the predictors from a substantive perspective is another, much more relevant, topic.
          Last edited by daniel klein; 07 Sep 2021, 01:06.

          Comment


          • #6
            Originally posted by daniel klein View Post
            Modern software, such as Stata, should have no problems whatsoever inverting a predictor matrix with correlations up to .97, as reported here. If Stata cannot invert the matrix, it will drop collinear predictors. [...]
            So, daniel klein, do you mean that I should just leave all the variables in my regression even if the correlation is high? And if anything needs to be done, Stata will do it automatically?

            Comment


            • #7
              Originally posted by Phuc Nguyen View Post
              So, daniel klein, do you mean that I should just leave all the variables in my regression even if the correlation is high? And if anything needs to be done, Stata will do it automatically?
              Well, yes and no. Stata handles the mathematics/statistics. You will have to interpret your data and results with respect to your theoretical assumptions and research questions.

              Comment


              • #8
                A model should be correctly specified. If all the variables belong in the model, then all the variables should be in the model, at least if the sample size is big enough.

                Of course, the model may not be correctly specified. Or there could be problems with the data. As Jared says, if the correlation is super-high, maybe you've measured the same thing twice. If so, you may want to choose one of the measures, or combine measures into a scale. If X1 and X2 are highly correlated, maybe X2 is a scale and X1 was used in computing it.
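
                If you do decide to combine near-duplicate measures into a scale, here is a sketch of two common options (x1 and x2 are placeholders for the correlated items):

                    * Cronbach's alpha for the items, saving the mean scale as a new variable
                    alpha x1 x2, item generate(scale)

                    * or simply average the items row-wise
                    egen scale2 = rowmean(x1 x2)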

                In any event, I think you have to know something about your measures and why they are correlated. You can't just look at the correlation matrix and say this one stays, that one goes.

                Some tips for dealing with multicollinearity are on p. 4 of

                https://www3.nd.edu/~rwilliam/stats2/l11.pdf
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
                StataNow Version: 19.5 MP (2 processor)

                EMAIL: [email protected]
                WWW: https://www3.nd.edu/~rwilliam

                Comment
