  • a high correlation coefficient between the dependent variable and a control variable

    Dear all,

    I have a control variable that has a high correlation coefficient of 0.6985 with the dependent variable. The data are cross-sectional. What should I be concerned about with such a high correlation coefficient? Can I still include it in the model (it's a very important control variable)?

    Code:
    pwcorr dep ctrl1 ctrl2 iv1 iv2,sig
    [Image: -pwcorr- output table, 16年06月29日1502_1.png]


    Best,
    David

    Cross-posted at:
    http://stats.stackexchange.com/quest...a-control-vari
    Last edited by David Lu; 29 Jun 2016, 07:49.

  • #2
    David.
    yes, you can.
    However, taking a look at the -pwcorr- outcome table, your (regression?) model will probably suffer from multicollinearity.
    Kind regards,
    Carlo
    (StataNow 18.5)
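
    One way to put a number on the multicollinearity concern is to inspect variance inflation factors after a linear fit. The following is only a minimal sketch: it uses the auto dataset shipped with Stata as a stand-in, because David's data are not available, so price, mpg, weight and length are placeholders rather than his dep, iv1, iv2, ctrl1 and ctrl2. It also assumes that a linear fit is an acceptable approximation for diagnostic purposes.

    ```stata
    * Stand-in data: the auto dataset shipped with Stata (not David's data)
    sysuse auto, clear

    * Pairwise correlations among the predictors only: multicollinearity
    * is about correlation between regressors, not with the outcome
    pwcorr mpg weight length, sig

    * Fit a linear model, then report variance inflation factors
    regress price mpg weight length
    estat vif
    ```

    A common rule of thumb treats VIFs above roughly 10 as a warning sign, but it is no more than a rule of thumb.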


    • #3
      Dear Carlo,

      Yes, you had already foreseen the pitfall. When I include this variable in the model, the regression results are insignificant. After dropping it, the model has significant coefficients. In that case, what should I do?

      Plan 1: find an alternative variable to replace the existing one. There is one, but if I use it, my sample size will be much smaller, which raises the issue of bias.
      Plan 2: drop it altogether. If I do that, how can I control for the effect of that variable?

      Thanks,
      David


      • #4
        David:
        - do not make statistical significance your totem; instead, consider what others have done in the past when dealing with the same topic in your research field;
        - go for a more parsimonious/different model (your Plan 1). The bias issue will emerge if your missing values are informative (again, p < whatever is not the threshold that splits good from bad);
        - multicollinearity involves more than one variable. It's up to you what to drop. If you drop a variable, you cannot control for it;
        - as a final remark, I would also check whether your regression model suffers from endogeneity (you may end up with an omitted variable absorbed into the residuals that is correlated with both -depvar- and one -indepvar-).
        Kind regards,
        Carlo
        (StataNow 18.5)
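
        Whether the missing values behind Plan 1 are informative can be probed, crudely, by comparing observations with and without the candidate variable. A minimal sketch, again using the shipped auto dataset as a stand-in: rep78 plays the role of the alternative control with missing values and price the role of the outcome. The name miss_rep78 is a hypothetical helper variable.

        ```stata
        * Stand-in data: the auto dataset shipped with Stata; rep78 has missing values
        sysuse auto, clear

        * How many observations would be lost by using this variable?
        misstable summarize rep78

        * Flag observations where the candidate variable is missing
        generate byte miss_rep78 = missing(rep78)

        * Does the outcome differ between observations with and without rep78?
        ttest price, by(miss_rep78)
        ```

        A clear difference would suggest that missingness is related to the outcome. The absence of a difference does not prove that the missing values are ignorable.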


        • #5
          Dear Carlo,

          Yes, you're right. Endogeneity is the most frequently mentioned issue that I cannot escape, and most of what I have read deals with it in the OLS context. Since I now use glm with a log link, how can I check for endogeneity or alleviate it? My data are cross-sectional. Here is the command that I use:

          Code:
          glm dep c.iv1t##c.iv2 c.iv1t#c.iv1t cv1 i.cv3, family(poisson) link(log) vce(robust)
          Thanks again,
          David
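
          For readers following along, here is a sketch of what the factor-variable notation in this command expands to: c.iv1t##c.iv2 enters both main effects plus their product, and c.iv1t#c.iv1t adds a squared term in iv1t. The example uses the shipped auto dataset as a stand-in, so rep78, mpg and weight are placeholders, and treating rep78 as a count is purely for illustration.

          ```stata
          * Stand-in data: the auto dataset shipped with Stata (not David's data)
          sysuse auto, clear

          * Same structure as the command above: main effects of mpg and weight,
          * their interaction, and a squared term in mpg
          glm rep78 c.mpg##c.weight c.mpg#c.mpg, family(poisson) link(log) vce(robust)

          * The same model with every term written out one by one
          glm rep78 c.mpg c.weight c.mpg#c.weight c.mpg#c.mpg, family(poisson) link(log) vce(robust)

          * Average marginal effect of mpg at chosen values of weight; margins
          * takes both the interaction and the squared term into account
          margins, dydx(mpg) at(weight=(2000 3000 4000))
          ```

          Because the terms are declared with factor-variable notation, -margins- knows that mpg appears in three places and differentiates accordingly.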


          • #6
            In the natural sciences a strong predictor could be the sign of a good theory and grounds for rejoicing.

            In the social sciences, it seems, such a variable is treated with suspicion as likely to be just another version of the response and on the grounds that no theory could be that good.


            • #7
              David:
              setting aside some famous classroom examples (e.g., ability absorbed into the residuals, which affects both educational attainment (-indepvar-) and wage (-depvar-)), the usual source of endogeneity examples is the literature of your research field.
              Endogeneity is a strange (and dangerous) beast: sometimes it's apparent, sometimes it's reported in the literature, and sometimes it seems to be in the eye of the beholder (unfortunately, the beholder in question is often a reviewer/discussant).
              As per your code, you seem to have a count regression model. I'm not clear about what you're going to test with the second interaction, which does not seem to include the conditional main effect of the predictors. I do not see any detail dealing with the endogeneity issue, though.

              Kind regards,
              Carlo
              (StataNow 18.5)
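
              If a credible instrument is available, one option for a count outcome with an exponential mean is Stata's -ivpoisson- command. The following is a hedged sketch using the website example dataset from the Stata documentation (assumed to be reachable via -webuse-). The variables belong to that dataset, not to David's model, and nothing in this thread establishes that a valid instrument exists for his data.

              ```stata
              * Example data from the Stata documentation (requires internet access)
              webuse website, clear

              * Exponential-mean model fit by GMM: time is treated as endogenous,
              * with phone and frfam as excluded instruments
              ivpoisson gmm visits ad female (time = phone frfam)

              * Control-function estimator for the same specification
              ivpoisson cfunction visits ad female (time = phone frfam)
              ```

              In the control-function output, the estimated coefficient on the first-stage residual term gives an indication of whether treating the regressor as exogenous is tenable.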
