  • Negative coefficient (school on income regression)

    Hello everyone,

    I ran a regression to see the impact of education on income. Everything looks fine, except that I get a negative coefficient for the effect of schooling on income.
    This seems illogical: in theory, someone who is educated should make better choices and should also have access to better-paying jobs.
    Has anyone encountered this problem? If so, how did you solve it?

    Thanks in advance.

    Pita

  • #2
    When a particular coefficient sign appears puzzling, it could be that your ideas were wrong or at least that the effect of a predictor is more complicated than you thought.

    More commonly in my experience: a puzzling sign is often a side-effect of what else is in the model, particularly if you have one or more of

    (1) highly correlated predictors, so that predictors are fighting for market share

    (2) too many predictors compared with your sample size

    (3) some nonlinear relationships that your model is not catching well

    (4) outliers or marked skewness in one or more variables that are warping the fit

    That doesn't purport to be a complete list.
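    Point (1) can be seen in a few lines of simulated data. This is a minimal do-file sketch that assumes nothing about the real data: x1, x2 and y are hypothetical variables and the effect sizes are made up. The slope on x1 is positive when x1 is the only predictor and turns negative once the highly correlated x2 is added.

```stata
* Simulated data (hypothetical): x1 and x2 are highly correlated;
* y rises with x2 but falls with x1 once x2 is held fixed
clear
set seed 12345
set obs 500
generate x2 = rnormal()
generate x1 = x2 + 0.3*rnormal()    // corr(x1, x2) is roughly 0.96
generate y  = 3*x2 - x1 + rnormal()

* x1 alone: it picks up part of the effect of x2, so the slope is positive
regress y x1
scalar b_alone = _b[x1]

* x1 alongside x2: the coefficient on x1 is now negative
regress y x1 x2
scalar b_joint = _b[x1]

display "x1 alone: " b_alone "    x1 with x2: " b_joint
```

    The numbers are invented; the point is only that a coefficient's sign answers a question that depends on which other predictors are held fixed.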



    • #3
      Following Nick's advice, I would suggest starting with a scatter plot of income against education, and a simple regression of income on education.



      • #4
        Thank you for your answers, Fei Wang and Nick Cox.

        My education variable is binary, so I don't think a scatter plot is appropriate (or maybe I'm wrong).
        But when I ran a simple regression of income on education, the coefficient on the education variable became even more negative than before, by 0.06.

        I'm a bit confused, but I read online that a coefficient can have the wrong sign if an important variable is omitted from the regression.

        What do you think about it?

        Thanks in advance.
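        For a linear model, the standard omitted-variable-bias result makes this concern precise. Assume, purely for illustration, that the true model contains a second predictor z (hypothetical) and an error u uncorrelated with educ and z. The slope from regressing income on educ alone then converges to β1 + β2δ rather than to β1:

```latex
\begin{aligned}
\text{income}_i &= \beta_0 + \beta_1\,\text{educ}_i + \beta_2\,z_i + u_i
  && \text{(assumed true model)}\\
\operatorname{plim}\,\hat\beta_1^{\,\text{short}} &= \beta_1 + \beta_2\,\delta,
  \qquad \delta = \frac{\operatorname{Cov}(\text{educ}_i, z_i)}{\operatorname{Var}(\text{educ}_i)}
  && \text{(income regressed on educ alone)}
\end{aligned}
```

        So the short-regression slope can be negative even when β1 > 0, whenever β2δ < -β1. Adding a control can therefore move the estimate in either direction, depending on the signs of β2 and δ.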



        • #5
          You might want to take a look at the articles cited in #6 of https://www.statalist.org/forums/forum/general-stata-discussion/general/1624659-fixed-effects-vs-pooled-ols



          • #6
            As education is binary, you might consider, say, logit or probit instead.

            If education has a negative relationship with income, the implication is higher mean income for education = 0 than for education = 1 (or some equivalent fact for different codings). So what is the difference, and how is education coded?
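            With a single 0/1 predictor, OLS reproduces the two group means exactly: the constant is the mean for the group coded 0 and the slope is the difference between the two means. The sketch below checks this on simulated data (educ_dummy and income are hypothetical names and the numbers are made up), so a negative slope simply says that the group coded 0 has the higher mean income.

```stata
* Simulated data (hypothetical names): a 0/1 dummy and income,
* with the educ_dummy == 0 group given the higher mean on purpose
clear
set seed 2021
set obs 356
generate educ_dummy = runiform() < 0.6
generate income = 100 - 5*educ_dummy + 20*rnormal()

regress income educ_dummy
scalar b_cons  = _b[_cons]
scalar b_dummy = _b[educ_dummy]

summarize income if educ_dummy == 0, meanonly
scalar mean0 = r(mean)
summarize income if educ_dummy == 1, meanonly
scalar mean1 = r(mean)

* Intercept = mean of group 0; slope = mean(group 1) - mean(group 0)
display "constant: " b_cons "    mean of group 0: " mean0
display "slope:    " b_dummy "    mean1 - mean0:   " mean1 - mean0
```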



            • #7
              Thanks for your answers, Rich Goldstein and Nick Cox.


              I coded education like this:


              * First we drop people who don't know their education

              sort education_head
              tab education_head
              tab education_head, nol

              * Dummy: 0 = no education, 1 = any education; missing codes stay missing
              gen education_dummy = education_head > 0 if !missing(education_head)
              tab education_dummy


              Using tab with the nol option, I get 4 different levels:

              0 = no education
              1 = primary school
              2 = high school
              3 = university


              Please could you tell me if there is a problem?

              Thanks in advance.
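              A pitfall worth checking with any recode of this kind: in Stata a missing value is not equal to 0, so `gen ... = 1` followed by `replace ... = 0 if education_head == 0` leaves observations with a missing education_head coded 1. Also, missing counts as larger than any number, so `education_head > 0` is true for missing unless it is excluded with `if !missing(...)`. The toy example below uses made-up values, and dummy_a and dummy_b are hypothetical names.

```stata
* Toy data (made up): codes 0-3 plus one missing value
clear
input education_head
0
1
2
3
.
end

* gen-then-replace pattern: the missing observation ends up coded 1
generate dummy_a = 1
replace dummy_a = 0 if education_head == 0

* Missing stays missing; otherwise 1 for any code above 0
generate dummy_b = education_head > 0 if !missing(education_head)

* Cross-tabulate, showing missing values, to check the recode
tabulate education_head dummy_b, missing
list
```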



              • #8
                So it is not binary, contrary to #4. If a scatter plot does not make sense, neither does a regression!



                • #9
                  Dear Nick Cox,

                  Could you tell me why you think that the education variable is not binary?

                  I set: no education = 0
                  and education = 1 (primary, high school, or university).



                  Thanks in advance.



                  • #10
                    Fei Wang’s suggestion of a scatter plot remains good. Otherwise our guessing can’t help you much.

                    Your report of 4 levels of education isn’t consistent with the report of a binary variable.
                    Last edited by Nick Cox; 05 Dec 2021, 11:31.
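                    A scatter plot still works for a predictor with only a few codes: jittering stops the points piling up on each code, and a box plot per code is a common alternative. A minimal sketch on simulated data (the variables and numbers are hypothetical, not the poster's data):

```stata
* Simulated data (hypothetical): income against a predictor with four codes
clear
set seed 42
set obs 356
generate education = floor(4*runiform())        // codes 0, 1, 2, 3
generate income = exp(4 + 0.1*education + 0.8*rnormal())

* jitter() adds a little random noise so stacked points become visible;
* lfit overlays the least-squares line
twoway (scatter income education, jitter(5)) (lfit income education)

* One box per code shows medians, spread, and outlying values
graph box income, over(education)
```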



                    • #11
                      Thank you, Nick Cox, and sorry for the misunderstanding.

                      You and Fei Wang are right.

                      Here is the scatter plot I get.

                      I have added two other levels of education (4 = technical training and 5 = others).
                      [Attachment: scatter plot of income by education level]
                      Last edited by Pita Fouta; 05 Dec 2021, 11:56. Reason: I should point out that this concerns farmers' income.



                      • #12
                        Here is the scatter plot I get for the binary variable
                        [Attachment: scatter plot of income by the binary education variable]



                        • #13
                          Education is a predictor. I got that the wrong way round in #6 — sorry, my error — but the wording “school on income regression” was perhaps pushing the wrong way too.

                          The plot shows 6 levels now, not 4 or 2.

                          Whichever version you’re using, treating education as a factor variable seems indicated to me. Taking the numeric codes literally is neither necessary nor obviously a good choice.
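                          One way to see why taking the numeric codes literally is a restriction: a single slope on education forces equal income steps from 0 to 1, 1 to 2, and 2 to 3, whereas `i.education` estimates each level separately. The sketch below uses simulated data in which level 1 sits below level 0 (all numbers hypothetical) and tests the equal-steps restriction after the factor-variable regression.

```stata
* Simulated data (hypothetical): education coded 0-3, with income means
* that do not rise in equal steps (level 1 is below level 0)
clear
set seed 7
set obs 356
generate education = floor(4*runiform())
generate income = 100 - 5*(education == 1) + 10*(education >= 2) + 15*rnormal()

* Codes taken literally: one slope, i.e. equal steps between 0, 1, 2, 3
regress income education

* Factor variable: a separate coefficient for each level, relative to level 0
regress income i.education

* Equal steps imply _b[2.education] = 2*_b[1.education] and
* _b[3.education] = 3*_b[1.education]; test both restrictions jointly
test (_b[2.education] = 2*_b[1.education]) (_b[3.education] = 3*_b[1.education])
```

                          A small p-value here is evidence against treating the codes as equally spaced.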



                          • #14
                            Yes indeed, Nick Cox, I added two other categories (technical training and others) that were also present in my database.

                            That's why you see a total of 6 levels now.

                            I based my regression on the binary variable (0 = never attended school, 1 = attended school).

                            However, my main question is why this variable has a negative effect on income (a negative coefficient). Normally, someone who has studied can make better choices. Looking at the scatter plot, maybe I don't have enough data (I have 356 observations)? I don't know.
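                            With 356 observations and a skewed income variable, a useful first check is how precisely the difference is estimated: if the confidence interval for the schooling coefficient comfortably includes 0, the negative sign may be noise. The sketch below simulates data with no true schooling effect (names, seed, and distribution are assumptions, not the poster's data), so the sign of the estimate depends only on the sample drawn.

```stata
* Simulated data (hypothetical): 356 farmers, a 0/1 schooling dummy,
* right-skewed income, and NO true schooling effect
clear
set seed 356
set obs 356
generate attended = runiform() < 0.5
generate income = exp(5 + 0.8*rnormal())

* Look at the 95% confidence interval on 1.attended, not just its sign
regress income i.attended

* The same comparison as a two-sample t test with unequal variances
ttest income, by(attended) unequal
```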



                            • #15
                              A couple of things need attention.

                              1. According to #11, there are income outliers for primary (1) and high school (2), which may pull down the average income of the group coded 1 in #12. Please make sure the income variable was cleaned properly.

                              2. If we only look at 0, 1, 2, 3 in #11, after dropping the outliers, income seems to change non-monotonically with education: primary seems to have similar (or lower) income compared with no education, but incomes for high school and university seem higher than for primary. You could try the regression below to capture a possible non-monotonic relationship (education needs to be coded 0, 1, 2, and 3, the original scale).

                              Code:
                              reg income i.education
                              Or descriptive statistics work, too.

                              Code:
                              tabstat income, by(education)
                              Last edited by Fei Wang; 05 Dec 2021, 13:52.
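                              For point 1, a quick way to look for suspicious incomes is to inspect the upper percentiles, list the most extreme observations, and compare means with medians by education. A minimal sketch on simulated data with three planted outliers (all names and values hypothetical); the 99th-percentile cut-off is only an example for listing cases to check, not a rule for deleting data.

```stata
* Simulated data (hypothetical): skewed income with three extreme values
clear
set seed 1
set obs 356
generate education = floor(4*runiform())
generate income = exp(5 + 0.5*rnormal())
replace income = 50*income in 1/3              // three planted outliers

* Percentiles, smallest and largest values of income
summarize income, detail

* List observations above the 99th percentile for checking
list education income if income > r(p99)

* Means are pulled by outliers; medians much less so
tabstat income, by(education) statistics(n mean median min max)
```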

