a high correlation coefficient between the dependent variable and a control variable

David Lu

Join Date: May 2016

Posts: 105
#1

29 Jun 2016, 08:46

Dear all,

I have a control variable that has a high correlation coefficient of 0.6985 with the dependent variable. The data are cross-sectional. What should I be concerned about with such a high correlation? Can I still include it in the model (it's a very important control variable)?

Code:

pwcorr dep ctrl1 ctrl2 iv1 iv2, sig

Best,
David

Cross-posted at:
http://stats.stackexchange.com/quest...a-control-vari

Last edited by David Lu; 29 Jun 2016, 08:49.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17456
#2

29 Jun 2016, 09:23

David:
Yes, you can.
However, judging from the -pwcorr- output table, your (regression?) model will probably suffer from multicollinearity.

Kind regards,
Carlo
(StataNow 18.5)
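
A minimal sketch of how the multicollinearity concern above could be checked, assuming a linear model fitted with -regress-. It uses Stata's bundled auto dataset as a stand-in; price, mpg, weight and length are placeholders and have nothing to do with David's variables.

```stata
* Sketch only: auto.dta stands in for the poster's data
sysuse auto, clear

* Pairwise correlations with significance levels, as in #1
pwcorr price mpg weight length, sig

* Fit a linear model, then report variance inflation factors
regress price mpg weight length
estat vif
```

Variance inflation factors are computed from the predictors alone, so a high correlation between -depvar- and a single predictor does not by itself inflate them; what matters is how strongly the predictors are correlated with one another.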
David Lu

Join Date: May 2016

Posts: 105
#3

29 Jun 2016, 09:31

Originally posted by Carlo Lazzaro

David:
Yes, you can.
However, judging from the -pwcorr- output table, your (regression?) model will probably suffer from multicollinearity.

Dear Carlo,

Yes, you've already foreseen the pitfall. When I include this variable in the model, the coefficients are not significant; after dropping it, they become significant. In that case, what should I do?

Plan 1: find an alternative variable to replace the existing one. One exists, but using it would make my sample much smaller, which raises the issue of bias.
Plan 2: drop it outright. If I do that, how can I control for the effect of that variable?

Thanks,
David
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17456
#4

29 Jun 2016, 09:41

David:
- do not make statistical significance your totem; instead, consider what others did in the past when dealing with the same topic in your research field;
- go for a more parsimonious/different model (your Plan 1). The bias issue will emerge if your missing values are informative (again, p < whatever is not the threshold that separates good from bad);
- multicollinearity involves more than one variable. It's up to you what to drop. If you drop a variable, you cannot control for it;
- as a final remark, I would also check whether your regression model suffers from endogeneity (you may end up with a variable left in the residuals that is correlated with -depvar- and one -indepvar-).

Kind regards,
Carlo
(StataNow 18.5)
David Lu

Join Date: May 2016

Posts: 105
#5

29 Jun 2016, 10:08

Originally posted by Carlo Lazzaro

David:
- do not make statistical significance your totem; instead, consider what others did in the past when dealing with the same topic in your research field;
- go for a more parsimonious/different model (your Plan 1). The bias issue will emerge if your missing values are informative (again, p < whatever is not the threshold that separates good from bad);
- multicollinearity involves more than one variable. It's up to you what to drop. If you drop a variable, you cannot control for it;
- as a final remark, I would also check whether your regression model suffers from endogeneity (you may end up with a variable left in the residuals that is correlated with -depvar- and one -indepvar-).

Dear Carlo,

Yes, you're right. Endogeneity is the most frequently mentioned issue and one I cannot escape, but most of what I have read deals with it in the OLS context. Since I am now using glm with a log link, how can I check for endogeneity or do something to alleviate it? My data are cross-sectional. Here is the command I use:

Code:

glm dep c.iv1t##c.iv2 c.iv1t#c.iv1t cv1 i.cv3, family(poisson) link(log) vce(robust)

Thanks again,
David
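
For readers less familiar with factor-variable notation, here is a hedged sketch of how terms like those in the command above expand. It uses Stata's bundled auto dataset; price, mpg, weight and foreign are placeholders standing in for dep, iv1t, iv2 and cv3, and the two generated variable names are made up for illustration.

```stata
* Sketch only: auto.dta stands in for the poster's data
sysuse auto, clear

* Factor-variable form, mirroring the structure of the command in #5:
* c.mpg##c.weight -> mpg, weight and mpg*weight
* c.mpg#c.mpg     -> mpg squared
glm price c.mpg##c.weight c.mpg#c.mpg i.foreign, family(poisson) link(log) vce(robust)

* The same specification with hand-made terms (hypothetical names)
generate double mpg_x_weight = mpg*weight
generate double mpg_sq = mpg^2
glm price mpg weight mpg_x_weight mpg_sq i.foreign, family(poisson) link(log) vce(robust)
```

Both commands specify the same set of regressors. The factor-variable form has the advantage that -margins- knows the squared and product terms are functions of mpg and weight.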
Nick Cox

Join Date: Mar 2014

Posts: 34785
#6

29 Jun 2016, 10:30

In the natural sciences a strong predictor could be the sign of a good theory and grounds for rejoicing.

In the social sciences, it seems, such a variable is treated with suspicion as likely to be just another version of the response and on the grounds that no theory could be that good.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17456
#7

29 Jun 2016, 11:05

David:
Setting aside some famous classroom examples (e.g., ability left in the residuals, which affects both educational attainment (-indepvar-) and wage (-depvar-)), the usual source of endogeneity examples is the literature of your research field.
Endogeneity is a strange (and dangerous) beast: sometimes it's apparent, sometimes it's reported in the literature, and sometimes it seems to be in the eye of the beholder (unfortunately, the beholder in question is often a reviewer/discussant).
Judging from your code, you seem to have a count regression model. It is not clear to me what you intend to test with the second interaction, which does not seem to include the conditional main effect of the predictors. I do not see anything in the code that deals with the endogeneity issue, though.

Kind regards,
Carlo
(StataNow 18.5)
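
As one possible illustration of dealing with an endogenous regressor in a count model with an exponential mean, the sketch below uses Stata's -ivpoisson gmm- on simulated data. This is not something proposed in the thread: it requires a valid instrument, which the thread does not identify, and every variable here (y, x, z, u) is made up for the simulation.

```stata
* Sketch only: simulated data, all names are hypothetical
clear
set seed 12345
set obs 1000

generate z = rnormal()                      // instrument: affects x, not y directly
generate u = rnormal()                      // unobserved confounder
generate x = 0.5*z + 0.5*u + rnormal()      // endogenous regressor
generate y = rpoisson(exp(0.2 + 0.3*x + 0.5*u))

* Ordinary Poisson regression ignores the correlation between x and u
poisson y x, vce(robust)

* GMM estimator with multiplicative errors, instrumenting x with z
ivpoisson gmm y (x = z), multiplicative
```

Comparing the coefficient on x across the two fits gives a feel for how much the omitted confounder matters in this artificial setup. With real data the result is only as credible as the instrument.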