  • Need help with an OLS regression.

    I am relatively new to Stata and regression in general and am wondering whether my OLS results show any problems that I should be aware of.

    Therefore, my question is: does anybody see a problem with these OLS regression results? R-squared seems relatively low, but are there any other concerns?
    [Image: OLS_Results.PNG]

    Last edited by Peter Jenkins; 12 Dec 2021, 11:36.

  • #2
    I don't see any major mechanical issue. But you need to be aware that confirmation obtained this way says very little about your model's validity. Other things can still go wrong even if the regression output looks "normal."


    • #3
      Would it be a problem if one of the two independent variables is time-invariant? Do I need to use a fixed-effects model in this case?


      • #4
        Peter:
        are you dealing with a cross-sectional or a panel dataset?
        Kind regards,
        Carlo
        (StataNow 18.5)


        • #5
          Looks like

          log passengers is predicted by log fare and log distance.

          If so, I'd expect fare and distance to be strongly correlated, so the predictors are fighting for market share. Some graphs would help mightily.

          Much depends on

          1. Whether this is an assignment, so that what you're expected to do is circumscribed.

          2. Whatever other predictors are available.

          3. Your expectations of the level and kind of explanation the model can provide. I'd expect the number of passengers to depend on very much more than those variables, including characteristics of origin and destination.
          Last edited by Nick Cox; 13 Dec 2021, 06:45.
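
A rough way to check how strongly the two predictors move together is to compute their correlation on the log scale. The sketch below uses made-up fares and distances (the thread does not show the actual data), and the names `fare`, `distance` and `pearson` are illustrative only. With exactly two predictors, the variance inflation factor of each is 1 / (1 - r^2), so a high correlation translates directly into inflated standard errors.

```python
import math

# Hypothetical fares and distances for five routes (illustrative values only,
# not taken from the poster's dataset).
fare = [80.0, 120.0, 150.0, 210.0, 300.0]
distance = [200.0, 450.0, 600.0, 1100.0, 2000.0]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


log_fare = [math.log(v) for v in fare]
log_distance = [math.log(v) for v in distance]

r = pearson(log_fare, log_distance)
# With exactly two predictors, each one's VIF equals 1 / (1 - r^2).
vif = 1.0 / (1.0 - r ** 2)
print(f"correlation = {r:.3f}, VIF = {vif:.1f}")
```

A scatter plot of the two logged variables would show the same thing graphically, which is probably closer to what the graphs suggested above would reveal.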


          • #6
            Thanks for your answers.

            I am dealing with panel data and the distance is a time-invariant regressor.

            My regression equation is: log(passengers) = beta1*log(price) + beta2*log(distance)

            Price and distance are indeed strongly correlated. Is that problematic for the regression?
            There are no other predictors available and I am trying to find potential problems with the model.

            Could there maybe be endogeneity problems?


            • #7
              Peter:
              1) the strong correlation between the two predictors may cause a quasi-extreme multicollinearity problem (provided that Stata does not omit one of them by default due to perfect collinearity);
              2) the -fe- estimator wipes out time-invariant predictors;
              3) endogeneity problems: I would exclude reverse causation in your example, but I'm not familiar enough with your research field to rule out latent-variable-driven endogeneity;
              4) two predictors are, in all likelihood, insufficient to give a fair and true view of the data generating process you're investigating (unless this is an assignment).
              Kind regards,
              Carlo
              (StataNow 18.5)
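
Point 2) can be seen by applying the within transformation by hand: the fixed-effects estimator works with deviations from each panel's own mean, and a variable that never changes within a panel has deviations of zero. The following is a minimal sketch on a hypothetical two-route panel; the values and the helper `demean_within` are assumptions made for illustration, not the poster's data or Stata's internals.

```python
# Hypothetical balanced panel: two routes observed in three periods each.
# Each row is (route_id, log_price, log_distance); distance is constant
# within a route.
panel = [
    ("A", 4.4, 5.3), ("A", 4.5, 5.3), ("A", 4.7, 5.3),
    ("B", 5.3, 7.0), ("B", 5.4, 7.0), ("B", 5.6, 7.0),
]


def demean_within(rows, col):
    """Subtract each route's own mean from column `col` (within transformation)."""
    by_id = {}
    for row in rows:
        by_id.setdefault(row[0], []).append(row[col])
    means = {key: sum(vals) / len(vals) for key, vals in by_id.items()}
    return [row[col] - means[row[0]] for row in rows]


price_within = demean_within(panel, 1)
distance_within = demean_within(panel, 2)

# Price keeps its within-route variation; distance collapses to zeros,
# so it carries no information the estimator could use.
print("demeaned log price:   ", [round(v, 3) for v in price_within])
print("demeaned log distance:", [round(v, 3) for v in distance_within])
```

Because the demeaned distance column is identically zero, its coefficient cannot be estimated, which is why a fixed-effects fit drops it.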


              • #8
                Thanks for your answer Carlo.

                Regarding 2):
                Would the omission of the distance in the FE model cause endogeneity?
                In other words, would the omission of distance make it part of the error term (danger of endogeneity due to a correlation of distance and price) or is it wiped out completely (no endogeneity)?


                • #9
                  Peter:
                  1) endogeneity concerns the correlation of the idiosyncratic error -epsilon- with (usually) one of the predictors; -epsilon- is correlated with the regressand by construction;
                  2) the -fe- estimator allows correlation between the panel-wise effect -u- (if any) and the vector of regressors;
                  3) if the endogeneity is related to time-invariant variables, it is eliminated via the -fe- machinery;
                  4) correlation between two (or more) predictors may create multicollinearity issues.
                  Kind regards,
                  Carlo
                  (StataNow 18.5)
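
Point 3) can be illustrated with a small noise-free example in which a time-invariant route effect is correlated with price. Everything here is assumed for illustration: the true slope `TRUE_BETA`, the route effects, and the prices are invented. Pooled OLS mixes the route effect into the price coefficient, while the within (fixed-effects) slope is unaffected by anything constant within a route.

```python
# Hypothetical noise-free panel:
# log passengers = TRUE_BETA * log price + route effect.
TRUE_BETA = -1.0
route_effect = {"A": 2.0, "B": 5.0}  # time-invariant, larger on the pricier route
log_price = {"A": [4.4, 4.5, 4.7], "B": [5.3, 5.4, 5.6]}

# Rows are (route_id, log_price, log_passengers).
rows = [
    (rid, p, TRUE_BETA * p + route_effect[rid])
    for rid, prices in log_price.items()
    for p in prices
]


def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean_by_route(data, col):
    """Subtract each route's own mean from column `col`."""
    groups = {}
    for row in data:
        groups.setdefault(row[0], []).append(row[col])
    means = {key: sum(vals) / len(vals) for key, vals in groups.items()}
    return [row[col] - means[row[0]] for row in data]


# Pooled OLS ignores the route effect, which is correlated with price.
pooled_slope = slope([r[1] for r in rows], [r[2] for r in rows])
# The within estimator removes anything constant within a route.
within_slope = slope(demean_by_route(rows, 1), demean_by_route(rows, 2))

print(f"pooled slope = {pooled_slope:.3f}")
print(f"within slope = {within_slope:.3f} (true value {TRUE_BETA})")
```

This only covers endogeneity that comes from time-invariant sources. A time-varying source, such as prices responding to passenger numbers, would not be removed by demeaning.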


                  • #10
                    Isn't reverse causation also a possible driver of endogeneity in this case? My thought is that airlines might set their prices depending on the average number of passengers (dynamic pricing).


                    • #11
                      Peter:
                      unfortunately, I do not know the literature of your research field well enough to confirm your take.
                      My amateur's take is that some business routes with a high volume of passengers allow airlines to earn ridiculous profits, which subsidize other routes with a low volume of passengers that cannot be cancelled despite their negative margins.
                      Kind regards,
                      Carlo
                      (StataNow 18.5)
