
  • OLS or Probit

    Dear community,
    I am conducting research on how individual-level and collective-level economic indicators affect public attitudes towards immigration. To clarify, an example hypothesis: while a state suffers from an economic downturn, the perception of a personal financial threat increases the probability of negative attitudes.

    My dataset is a survey conducted in seven countries during the European recession.

    My main DV is attitude towards immigration. The IVs are, at the individual level, economic evaluation, satisfaction with the state of the economy, and income satisfaction; and, at the national level, GDP, taxation change, unemployment rate, change in unemployment rate, etc.


    I am unsure whether I should use OLS regression or probit regression. I have coded all DVs and IVs for both types of regression and the statistical results seem fine; however, I am still not sure which type of regression to use.

    Could you advise me, please?

  • #2
    If you use ordinary linear regression, you are fitting a linear probability model. Linear probability models are always, in principle, wrong, because for sufficiently extreme values of the predictors, they predict outcome probabilities outside the 0-1 range. It is also the case that when they predict values between 0 and 1 but close to either end of the interval, they are often appreciably miscalibrated. Nevertheless, if in your data, the linear model's predictions for the actual data do not do that, and do not come close to 0 or 1, but rather are well within the center of the 0-1 interval, then these models may be satisfactory. They have the advantage that regression coefficients can be interpreted directly as marginal effects in the probability metric.
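    One way to check whether a linear probability model's fitted values stay well inside the 0-1 interval is simply to compute them for the estimation sample and count how many fall outside the interval or near its ends. The sketch below does this in plain Python rather than Stata so that the arithmetic is visible. The toy data are made up, the 0.1 margin is an arbitrary choice, and fit_lpm and check_fitted are hypothetical helper names, not commands from any package.

```python
def fit_lpm(x, y):
    """OLS with a single predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def check_fitted(fitted, margin=0.1):
    """Count fitted probabilities outside [0, 1] and within `margin` of either end."""
    outside = sum(1 for p in fitted if p < 0 or p > 1)
    near_edge = sum(1 for p in fitted if 0 <= p < margin or 1 - margin < p <= 1)
    return {"min": min(fitted), "max": max(fitted),
            "outside": outside, "near_edge": near_edge}


# Made-up toy data: a binary outcome that becomes more likely as x grows.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

intercept, slope = fit_lpm(x, y)
fitted = [intercept + slope * xi for xi in x]
report = check_fitted(fitted)
print(report)
```

    If both counts are zero and the minimum and maximum sit comfortably inside the interval, the linear model is probably adequate for that sample; in this toy example they do not.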

    Probit does not suffer from this 0-1 issue: it always makes predictions in 0-1. However, the coefficients of a probit regression are not marginal effects on probability. They are marginal effects on the normal ogive, which is, for most people, incomprehensible. For that reason, in some disciplines, probit regressions are little used: the results just don't lend themselves to interpretation beyond the signs of the effects. You can, of course, use the -margins- command to get marginal effects on outcome probability, but because the probit model is non-linear, you have to specify the values of the predictors at which you want to estimate the marginal effects.
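    To see why the predictor values matter, note that for a continuous predictor the marginal effect on probability in a probit model is the coefficient multiplied by the standard normal density at the linear index. A minimal sketch in plain Python, with one predictor and coefficients invented purely for illustration (b0, b1 and probit_marginal_effect are hypothetical names):

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def probit_marginal_effect(intercept, slope, x):
    """dP(y=1)/dx for a one-predictor probit: phi(intercept + slope*x) * slope."""
    return normal_pdf(intercept + slope * x) * slope


# Hypothetical coefficients, not estimated from any data.
b0, b1 = -1.0, 0.5

# The same coefficient gives a different marginal effect at each value of x.
effects = {x: probit_marginal_effect(b0, b1, x) for x in (0, 2, 6)}
for x, effect in effects.items():
    print(f"x = {x}: marginal effect = {effect:.4f}")
```

    The effect is largest where the linear index is zero (here x = 2, a predicted probability of 0.5) and shrinks towards the tails.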

    Another model to consider is the logistic model. Like probit, it presents no 0-1 problem. Its coefficients are interpretable as the logarithms of odds ratios, which is something that most people can, after a little practice or training, wrap their minds around.
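    As a small illustration of the odds-ratio reading, the sketch below (plain Python, invented coefficients, hypothetical helper names) shows that a one-unit increase in the predictor multiplies the odds by exp(coefficient) no matter where you start, even though the change in probability differs from place to place.

```python
import math


def logistic(z):
    """Inverse logit: maps a linear index to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    return p / (1.0 - p)


# Hypothetical coefficients, not estimated from any data.
b0, b1 = -1.0, 0.7
odds_ratio = math.exp(b1)

ratios = []
for x in (0, 1, 5):
    p_before = logistic(b0 + b1 * x)
    p_after = logistic(b0 + b1 * (x + 1))
    ratios.append(odds(p_after) / odds(p_before))
    print(f"x = {x}: probability {p_before:.3f} -> {p_after:.3f}, "
          f"odds ratio = {ratios[-1]:.4f}")
```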


    • #3
      Dear Clyde, thank you for your advice! I would indeed use a logistic model; however, as I understand it, all predictors would then have to be binary. In my case I need national economic indicators, which are difficult to code as binary. I was thinking of recoding national indicators such as GDP growth as binary (1 for positive, 0 for negative), and likewise the unemployment rate and inflation rate (coded the other way round). However, I am concerned about whether this approach is correct.

      For now I have three different sets of variables for the regressions, as follows:

      Summary statistics (binary model)
      Variable | Mean | St.Dev | Min | Max | N
      Immigration attitude (binary) | .679 | .467 | 0 | 1 | 28408
      GDP per cap. in thousands | 39.525 | 4.023 | 31.933 | 46.419 | 29054
      GDP growth rate (%)/100 | .013 | .021 | -.044 | .057 | 29054
      Unemp. rate (%)/100 | .086 | .04 | .037 | .199 | 29054
      Unemp. change rate (%)/100 | .058 | .155 | -.131 | .366 | 29054
      Taxation per income change rate (%)/100 | -.021 | .044 | -.069 | .077 | 29054
      Income evaluation (binary) | .47 | .499 | 0 | 1 | 23832
      Economic satisfaction (binary) | .344 | .475 | 0 | 1 | 28615
      Income satisfaction (binary) | .834 | .373 | 0 | 1 | 28824
      Age | 48.527 | 18.607 | 15 | 123 | 28973
      Gender (binary) | .478 | .5 | 0 | 1 | 29045
      Education (binary) | .145 | .353 | 0 | 1 | 28980
      Soc. class 5 categories | 1.81 | 1.412 | 0 | 4 | 26363

      Summary statistics (linear model)
      Variable | Mean | St.Dev | Min | Max | N
      Immigration attitudes 0-10 | 5.37 | 2.041 | 0 | 10 | 28909
      GDP per cap. in thousands | 39.525 | 4.023 | 31.933 | 46.419 | 29054
      GDP growth rate in % | 1.259 | 2.106 | -4.405 | 5.703 | 29054
      Unemp. rate in % | 8.556 | 3.961 | 3.655 | 19.86 | 29054
      Unemp. change rate (%) | 5.802 | 15.496 | -13.09 | 36.593 | 29054
      Taxation per income change rate | -2.093 | 4.432 | -6.919 | 7.717 | 29054
      Income evaluation 1-10 | 5.324 | 2.83 | 1 | 10 | 23832
      Economic satisfaction 1-10 | 4.345 | 2.414 | 0 | 10 | 28615
      Income satisfaction 0-3 | 2.147 | .787 | 0 | 3 | 28824
      Age | 48.527 | 18.607 | 15 | 123 | 28973
      Gender (binary) | .478 | .5 | 0 | 1 | 29045
      Education (binary) | .145 | .353 | 0 | 1 | 28980
      Soc. class 5 categories | 1.81 | 1.412 | 0 | 4 | 26363

      Summary statistics (linear model: most variables recoded for range 0-min 1-max)
      Variable | Mean | St.Dev | Min | Max | N
      Immigration attitude (range 0-1) | .454 | .22 | 0 | 1 | 4407
      GDP per cap. in thousands | 36.214 | .2 | 36.016 | 36.416 | 4436
      GDP growth rate (%)/100 | .007 | .01 | -.003 | .017 | 4436
      Unemp. rate (%)/100 | .067 | .011 | .056 | .078 | 4436
      Unemp. change rate (%)/100 | .05 | .017 | .033 | .067 | 4436
      Taxation per income change rate (%)/100 | -.013 | .011 | -.024 | -.003 | 4436
      Economic satisfaction (range 0-1) | .323 | .205 | 0 | 1 | 4351
      Income evaluation (range 0-1) | .514 | .3 | .1 | 1 | 3640
      Income satisfaction (0-0.75) | .535 | .204 | 0 | .75 | 4389
      Age | 50.159 | 18.803 | 15 | 123 | 4403
      Gender (binary) | .441 | .497 | 0 | 1 | 4427
      Education (binary) | .099 | .299 | 0 | 1 | 4420
      Soc. class 5 categories | 1.704 | 1.461 | 0 | 4 | 4207

      I am still debating with myself: if I choose the linear model, which variables should I use, those that keep their original range (e.g. 0-10) or those rescaled by division to the 0-1 range?
      From your professional perspective, after looking at these three tables of the same variables recoded for different models, which recoding looks most appropriate?

      Kind regards,
      John
      Last edited by John Galvin; 05 Aug 2019, 08:12.


      • #4
        Logit works just fine with dummy rhs variables. However, it is seldom a good idea to take a continuous variable like change in GNP and make it dichotomous. If you want to allow for different parameters on positive and negative growth, you could create your dummy and interact it with the continuous variable.
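        A minimal sketch of what that interaction does to the linear index, in plain Python with invented coefficients: the dummy shifts the intercept for negative growth and the product term changes the slope, while growth itself stays continuous. The names design_row and linear_index are hypothetical; in Stata the same specification would typically be written with factor-variable notation along the lines of i.negative##c.growth, where negative and growth are assumed variable names.

```python
def design_row(growth):
    """Regressors for one observation: constant, growth, dummy, dummy x growth."""
    negative = 1 if growth < 0 else 0
    return [1.0, growth, float(negative), negative * growth]


def linear_index(coefs, growth):
    """Linear predictor (the quantity inside the logit link)."""
    return sum(c * v for c, v in zip(coefs, design_row(growth)))


# Hypothetical coefficients: constant, growth, negative dummy, interaction.
coefs = [0.2, 0.5, -0.1, 0.8]

# Slope of the index per unit of growth, on each side of zero.
slope_positive = linear_index(coefs, 2.0) - linear_index(coefs, 1.0)
slope_negative = linear_index(coefs, -1.0) - linear_index(coefs, -2.0)
print(f"slope when growth > 0: {slope_positive:.2f}")
print(f"slope when growth < 0: {slope_negative:.2f}")
```

        The slope for positive growth is the growth coefficient alone; for negative growth it is the growth coefficient plus the interaction coefficient.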


        • #5
          Hi everyone, I was looking for a post that would help me interpret the coefficients of a probit model. Reading this thread I understood how to interpret the coefficients, but now I wonder: how can I check whether "the linear model's predictions for the actual data do not come close to 0 or 1"? Thanks for your attention.


          • #6
            Enrico Azzini Care to give a reproducible example? I don't know what you mean here.


            • #7
              Enrico:
              with no details at all (please note that, as a regular poster, you should be familiar with the FAQ) it is impossible to give a helpful reply.
              As usual, please share what you typed and what Stata gave you back. Thanks.
              Kind regards,
              Carlo
              (StataNow 18.5)
