  • What to do when the linearity assumption is violated?

    Hi everyone on Statalist,

    I am working on a project and have run into a few obstacles.

    The purpose of my project is to run a regression analysis with a company's stock price as the dependent variable. As independent variables, I use the oil price, Google Trends activity, the average temperature deviation in the company's home country and the USD/NOK exchange rate. All of these variables are daily, and I have included only the dates on which all of them have values.
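For illustration only (Python rather than Stata, with made-up toy series): keeping just the dates on which every variable has a value is an inner join on the date, which is the complete case analysis Carlo comments on in #2. In the sketch, the series names `stock`, `oil` and `usdnok` are hypothetical, and integer day indices stand in for calendar dates. It shows how a single missing value in any one series removes the whole day.

```python
# Hypothetical sketch (not from the thread): keep only the dates on which
# every daily series has a value, i.e. a complete-case (inner) join on date.
# Series names and values below are made-up toy data.


def complete_case_dates(*series):
    """Return the sorted dates present in every input dict (date -> value)."""
    common = set(series[0])
    for s in series[1:]:
        common &= set(s)  # keep only dates also present in this series
    return sorted(common)


def align(series_by_name):
    """Return [(date, {name: value, ...}), ...] for dates observed in all series."""
    dates = complete_case_dates(*series_by_name.values())
    return [(d, {name: s[d] for name, s in series_by_name.items()}) for d in dates]


# Toy example: integer day indices stand in for calendar dates.
stock = {1: 10.0, 2: 10.5, 3: 10.2, 4: 10.8}
oil = {1: 5.0, 3: 5.2, 4: 5.1}      # no oil quote on day 2
usdnok = {1: 9.0, 2: 9.1, 3: 9.2}   # no exchange rate on day 4

for day, values in align({"stock": stock, "oil": oil, "usdnok": usdnok}):
    print(day, values)
# Only days 1 and 3 survive: every other day is missing at least one series.
```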

    To check my model, I want to test the classical linear regression model (CLRM) assumptions. I tested linearity by plotting the dependent variable against each independent variable, but the scatter plots do not show linear relationships. A few examples are shown below.

    What can I do about this? Should I transform the variables? Does the problem lie in the fact that my Y variable is stock price?

    [Four attached screenshots: scatter plots of the stock price against independent variables, 21 Nov 2019]



    I am a beginner at regression analysis and really lost, so please excuse me if this question is a little basic.

    Thanks a lot in advance!


  • #2
    Sunniva:
    - including only observations with values for all the variables (a.k.a. complete case analysis, or CCA) means artificially reshaping your sample (and obtaining unreliable coefficients);
    - as far as the lack of linearity is concerned, it may well be that some of your predictors need a squared term.
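A squared term keeps the model linear in its coefficients, so ordinary least squares (and hence -regress-) still applies even though the fitted curve is bent. The Python sketch below, a stand-in for what Stata does with `c.brent##c.brent` in #7, fits y = b0 + b1*x + b2*x^2 on made-up, noise-free data by solving the normal equations directly. That is adequate for a toy example; the function names are hypothetical.

```python
# Hypothetical sketch: least-squares fit of y = b0 + b1*x + b2*x**2,
# i.e. a regression with both a linear and a squared term for one predictor.
# Solved via the normal equations (X'X) b = X'y; toy data only.


def solve3(a, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [list(row) + [r] for row, r in zip(a, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    b = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):  # back-substitution
        known = sum(m[r][c] * b[c] for c in range(r + 1, 3))
        b[r] = (m[r][3] - known) / m[r][r]
    return b


def fit_quadratic(xs, ys):
    """Return [b0, b1, b2] minimising sum((y - b0 - b1*x - b2*x**2) ** 2)."""
    rows = [(1.0, x, x * x) for x in xs]  # design matrix: constant, x, x^2
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve3(xtx, xty)


xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1 + 2 * x - 0.5 * x * x for x in xs]  # exact hump-shaped toy data
b0, b1, b2 = fit_quadratic(xs, ys)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

With real data the points scatter around the curve, so the recovered coefficients would only approximate any true relationship.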
    Kind regards,
    Carlo
    (StataNow 18.5)



    • #3
      These look like weak correlations to me rather than relationships calling out for transformations. Contrary to widespread practice, I like to think of what are usually called assumptions as ideal conditions. A regression can still be helpful even if the ideal conditions are not matched exactly, and indeed they never are, outside of simulations or theoretical exercises.

      A weak correlation between the outcome and a predictor isn't really a violation of assumptions; it just means that your model may disappoint if you are expecting otherwise. For example, there could be some plausible reason for a relationship between stock price and your temperature variable, but even so there are known to be many other influences too.
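Pearson's r is one way to put a number on "weak". It measures only straight-line association, so it is best read alongside the scatter plot: in the made-up example below, a perfect parabola gives r = 0. This is a Python sketch with a hypothetical function name, not Stata's own -correlate- implementation.

```python
import math

# Hypothetical sketch: Pearson's correlation coefficient r.
# r measures only straight-line association between two variables.


def pearson_r(xs, ys):
    """Return Pearson's r for two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-2, -1, 0, 1, 2]
print(pearson_r(xs, [2 * x + 1 for x in xs]))  # straight line: prints 1.0
print(pearson_r(xs, [x * x for x in xs]))      # perfect parabola: prints 0.0
```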

      There are some puzzling holes in the data. Joining the data points in time order will create a mess, but it may also be instructive.



      • #4
        Dear Nick and Carlo,

        Thank you so much for the response.

        Are the puzzling holes in the data that you mention a result of including only observations with values for all variables?

        In addition, do you guys think that I should perform a non-linear regression instead (due to the violation of the linearity assumption)? If so, which non-linear regression do you recommend?

        Thanks again!



        • #5
          Sunniva:
          1) it might be, but I cannot be sure, as I know neither your datasets nor the characteristics of the observations with and without missing data. I would recommend dealing with the missing data first.
          2) I would stick with -regress-. For instance, your CCA shows a parabolic relationship between stock price and brent, which seems to reach a maximum when brent approaches 50.
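Where a parabola peaks follows from its coefficients: setting the derivative b1 + 2*b2*x to zero gives x* = -b1/(2*b2), a maximum when b2 < 0. In Stata, after a fit using `c.brent##c.brent`, something along the lines of `nlcom -_b[brent]/(2*_b[c.brent#c.brent])` should estimate this point with a delta-method standard error. The Python sketch below only does the arithmetic, on made-up coefficients chosen to peak at 50; the function name is hypothetical.

```python
# Hypothetical sketch: turning point of a fitted quadratic
#   y = b0 + b1*x + b2*x**2, which lies at x* = -b1 / (2*b2).
# A negative b2 means the curve peaks there (maximum); positive, a minimum.


def turning_point(b1, b2):
    """Return (x_star, kind) for the quadratic with coefficients b1, b2."""
    if b2 == 0:
        raise ValueError("b2 is zero: no squared term, so no turning point")
    x_star = -b1 / (2 * b2)
    kind = "maximum" if b2 < 0 else "minimum"
    return x_star, kind


# Made-up coefficients chosen so that the peak falls at x = 50.
x_star, kind = turning_point(4.0, -0.04)
print(f"{kind} at x = {x_star:.1f}")  # prints: maximum at x = 50.0
```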
          As an aside, I find it difficult to envisage a relationship between temperature and stock price, unless I imagine a scenario in which an abrupt change in temperature makes computers go crazy buying and selling stocks (but I might be badly wrong).
          Last edited by Carlo Lazzaro; 22 Nov 2019, 03:13.
          Kind regards,
          Carlo



          • #6
            Thanks Carlo!

            However, I have dealt with the missing data, and the results do not seem to change at all.

            Also, can the parabolic relationship you mention still be used in a linear regression, or should I transform the variable somehow?

            What is frustrating is that I have tried log, squared and cubed transformations of several variables to make the relationships more linear, but none of them worked.
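One possible reason such transformations do not help: oil prices are positive, and for x > 0 each of log x, x^2 and x^3 is increasing in x. Replacing x by any one of them still yields a curve that only rises or only falls, so it cannot bend over into a hump like the one Carlo describes for brent in #5. Keeping x and adding x^2, as Carlo proposes in #7, can. The Python sketch below checks this with R^2 on noise-free toy data; the function names and data are hypothetical.

```python
import math

# Hypothetical sketch: on hump-shaped toy data, replacing x by a single
# transform (log x, x**2 or x**3) cannot reproduce the peak, because each of
# these is monotone for x > 0. Keeping x AND x**2 together can.


def r2_simple(u, y):
    """R^2 of a regression of y on a constant and one regressor u (= r^2)."""
    n = len(u)
    mu, my = sum(u) / n, sum(y) / n
    suy = sum((a - mu) * (b - my) for a, b in zip(u, y))
    suu = sum((a - mu) ** 2 for a in u)
    syy = sum((b - my) ** 2 for b in y)
    return suy * suy / (suu * syy)


def r2_two(u, v, y):
    """R^2 of a regression of y on a constant, u and v (centred 2x2 solve)."""
    n = len(y)
    mu, mv, my = sum(u) / n, sum(v) / n, sum(y) / n
    cu = [a - mu for a in u]
    cv = [a - mv for a in v]
    cy = [a - my for a in y]
    suu = sum(a * a for a in cu)
    svv = sum(a * a for a in cv)
    suv = sum(a * b for a, b in zip(cu, cv))
    suy = sum(a * b for a, b in zip(cu, cy))
    svy = sum(a * b for a, b in zip(cv, cy))
    det = suu * svv - suv * suv
    bu = (svv * suy - suv * svy) / det  # slope on u
    bv = (suu * svy - suv * suy) / det  # slope on v
    syy = sum(a * a for a in cy)
    ssr = sum((c - bu * a - bv * b) ** 2 for a, b, c in zip(cu, cv, cy))
    return 1 - ssr / syy


xs = [10, 20, 30, 40, 50, 60, 70, 80, 90]
ys = [-(x - 50) ** 2 / 100 + 30 for x in xs]  # toy hump peaking at x = 50

for label, u in [("x", xs), ("log x", [math.log(x) for x in xs]),
                 ("x^2", [x ** 2 for x in xs]), ("x^3", [x ** 3 for x in xs])]:
    print(f"{label:>6}: R^2 = {r2_simple(u, ys):.3f}")
print(f"x + x^2: R^2 = {r2_two(xs, [x ** 2 for x in xs], ys):.3f}")
```

On these toy data the single-regressor fits stay far from 1 while the x + x^2 fit is essentially exact; with real, noisy data the gap would be smaller and less clear-cut.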

            Best wishes



            • #7
              Sunniva:
              you can go with -regress- and include both a linear and a squared term for -brent- on the right-hand side of your regression equation with the following code, which exploits the wonderful capabilities of -fvvarlist- notation:
              Code:
              * all variable names except brent are hypothetical
              regress stockprice c.brent##c.brent googletrends tempdev usdnok
              Kind regards,
              Carlo



              • #8
                Thank you again, Carlo! Should we use this code for all the variables, or just for Brent, since it had a parabolic relationship with stock price?

                Best



                • #9
                  Sunniva:
                  possibly, for -Google trends-, too.
                  Obviously, things may change after imputing missing data.
                  Kind regards,
                  Carlo

