
  • Assumptions checking

    Hello everyone,
    I want to check the model assumptions when my dependent variable is categorical (1, 2, 3, 4).
    Here is an example of my data:
    HTML Code:
    [CODE]
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(code year) float log_ID_pay double disclosure_quality
      2299 2011   11.0021 4
      2086 2014   11.0021 3
       998 2008  10.47107 2
      2458 2014    10.859 3
      2041 2005  9.769957 2
       735 2017 11.289782 2
      2746 2016 10.308952 4
      2772 2016 11.407565 3
      2069 2014 11.275303 2
       998 2013   11.0021 3
      2200 2016 11.512925 3
       998 2017 10.785876 3
      2321 2016 10.532096 3
       998 2015 10.308952 3
      2299 2017   11.0021 4
       735 2010   11.0021 3
    300106 2014 10.308952 3
       735 2014 11.289782 2
      2041 2015   11.0021 3
      2696 2014 10.819778 3
       998 2016 10.397506 3
      2679 2013 10.819778 3
      2679 2012 10.596635 3
    300087 2014 10.322198 3
       998 2010 10.778956 3
       798 2015 11.289782 2
    300106 2011 10.596635 3
      2321 2015 10.819778 3
      2477 2017 11.695247 2
    300087 2017   11.0021 3
       592 2012 11.184422 3
       798 2010 10.414313 3
      2772 2015 10.778956 3
      2069 2008   11.0021 3
    300087 2012 10.596635 3
       735 2012 10.778956 2
       798 2011 11.289782 3
       735 2016 10.596635 3
      2696 2012 10.819778 3
       798 2013 11.289782 2
    300498 2016 11.226243 3
      2086 2017 10.714417 3
      2200 2011 11.138232 1
      2234 2015 10.819778 2
       592 2004 10.308952 3
      2234 2013 10.819778 3
      2069 2015 11.289782 1
      2477 2014 11.652687 3
      2679 2015 10.534094 1
      2696 2016  9.852194 3
      2234 2017 10.819778 3
      2041 2016   11.0021 3
       769 2004 10.491274 2
      2299 2014  10.79446 4
    300189 2015 11.289782 2
      2458 2013  9.903487 3
    300189 2016 11.289782 1
      2234 2011 10.308952 3
      2746 2015 10.596635 3
      2069 2012 11.289782 3
      2086 2015   11.0021 3
       798 2014 10.596635 3
       592 2015 11.184422 3
      2696 2017   11.0021 3
      2041 2011   11.0021 3
      2086 2009 10.308952 3
    300189 2011 10.341743 3
       592 2013 11.184422 3
      2477 2011   11.0021 3
    300106 2012 10.596635 3
       798 2017 11.184422 2
       769 2002 10.491274 2
      2458 2017 10.778956 3
      2458 2011 10.596635 3
       713 2008 10.819778 3
    300189 2013 11.289782 3
    300106 2015 10.308952 3
      2200 2014 11.512925 3
      2041 2006  10.12663 2
    300498 2015 10.463623 3
    300094 2015   11.0021 2
       798 2004 10.819778 2
      2234 2016 10.308952 3
      2200 2015 11.225244 3
       998 2004 10.778956 3
       998 2006 10.555813 3
       998 2011 10.203592 3
      2321 2013 10.819778 3
       592 2008 10.532096 2
    300189 2017 11.289782 1
       592 2017 11.184422 3
       713 2015 10.819778 3
      2069 2016 10.965436 2
      2299 2016 10.714417 4
      2299 2012 10.308952 3
      2299 2010 10.943765 4
      2069 2009   11.0021 3
      2299 2015   11.0021 3
      2234 2012 10.819778 3
      2041 2010 10.853213 3
    end
    [/CODE]
    So which test of residuals can I use with this dependent variable?

    I have used a P-P plot, and the plot looks like a snake.

    Please help.

  • #2
    Please look at the plot from my model:
    [Image: image_2022-01-18_005653.png]



    • #3
      What's the model you fitted? What command did you issue? I guess you're treating the categorical variable as if it were measured or counted.

      If the outcome variable is discrete, some multimodality in the residuals is only to be expected. You can't expect a close fit to a normal distribution.

      Your plot is not a P-P plot. It is a normal quantile plot (normal probability plot, normal scores plot, probit plot). The term quantile-quantile plot or Q-Q plot could be used.
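
The distinction between the two plot types can be seen side by side in a minimal sketch. It assumes Stata's bundled auto data, with rep78 standing in for a discrete outcome; the variable name res_demo is made up for illustration and is not from the original poster's model.

```stata
* Illustrative analogue only: auto data, not the poster's data
sysuse auto, clear
* Plain regression of a discrete outcome (rep78 takes values 1 to 5)
regress rep78 price weight
* Residuals from that fit (res_demo is a hypothetical name)
predict double res_demo, residuals
* P-P plot: standardized normal probability plot
pnorm res_demo
* Q-Q plot: quantiles of the residuals against normal quantiles
qnorm res_demo
```

With a discrete outcome, both plots would be expected to show stepped or clustered patterns rather than a straight line.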
      Last edited by Nick Cox; 17 Jan 2022, 11:12.



      • #4
        Firstly, thank you for replying.

        Secondly, I have used OLS regression. My dependent variable is disclosure quality, which has four levels: 4 = excellent, 3, 2, and 1 = weak.

        Thirdly, you said "You can't expect a close fit to a normal distribution". Does that mean I am safe if I include this plot as a way of checking the model assumptions (test of residuals)?

        Please explain.



        • #5
          Sorry, I used the following code:
          HTML Code:
          predict resid_dq, residuals
          qnorm resid_dq



          • #6
            Thanks for the details, but you haven't answered my question fully. What predictor variables did you use?

            Otherwise, there is no "safe" here independent of who is judging this and what criteria they will use. I won't be examining or reviewing your work, but I would want a serious discussion of why you are using plain regression rather than, say, ordinal logit.

            OLS is an estimation procedure, not a model flavour.

            Here's an analogue of what you may have done which you can reproduce. The last trick of showing the outcome values on your normal quantile plot may help your interpretation.


            Code:
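            * load Stata's bundled auto data
            sysuse auto
            * treat the ordered 1-5 repair record as if it were measured
            regress rep78 price weight
            * residuals from that fit
            predict res, res
            * plain normal quantile plot
            qnorm res
            * same plot, each point drawn as its rep78 value instead of a marker
            qnorm res, ms(none) mla(rep78) mlabpos(0)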
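
For comparison, the ordinal logit alternative mentioned above might look like this on the same analogue. This is only a sketch under stated assumptions: it uses the auto data rather than the poster's data, and the names pr1 to pr5 and prsum are made up for illustration.

```stata
* Illustrative analogue only: auto data, with rep78 (1 to 5) as the ordered outcome
sysuse auto, clear
* Ordinal logit uses the ordering of the categories without assuming equal spacing
ologit rep78 price weight
* Predicted probability of each of the five categories (hypothetical names)
predict double pr1 pr2 pr3 pr4 pr5, pr
* Sanity check: the five probabilities should sum to 1
generate double prsum = pr1 + pr2 + pr3 + pr4 + pr5
summarize prsum if e(sample)
```

Such a model has no residuals of the kind that qnorm is meant to check, so a normal quantile plot would not be the natural diagnostic for it.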



            • #7
              My dependent variable is disclosure quality, and my independent variable is the logarithm of executive compensation, together with several control variables.
              After running the OLS regression, I used the commands mentioned above.

              I actually don't understand what a predictor is. Maybe I am not fully familiar with this test.



              • #8
                Predictor variables as I use the term are any variables other than the outcome variable. You may want to distinguish independent and control variables for your readers but regress doesn't care.
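
A small sketch of that point, assuming the auto data as an analogue: whichever variable is called the "independent variable" and whichever the "control", the fit is the same. The scalar names b_price_1 and b_price_2 are made up for illustration.

```stata
* Illustrative analogue only: auto data, not the poster's data
sysuse auto, clear
* Think of price as the "independent variable" and weight as a "control"
regress rep78 price weight
scalar b_price_1 = _b[price]
* Swap the labels and the order: regress treats both simply as predictors
regress rep78 weight price
scalar b_price_2 = _b[price]
* Relative difference between the two price coefficients
display reldif(b_price_1, b_price_2)
```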



                • #9
                  Dear @Nick Cox,
                  I have run the code you gave above and got the following plot.
                  Does this plot suggest that the model is trustworthy, based on the residuals?
                  The curve in this example looks similar to the one in my plot.
                  [Image: image_2022-01-18_015022.png]



                  • #10
                    One more thing:
                    I need to add a title to the following command:
                    HTML Code:
                    qnorm resid_dq, ms(none) mla(disclosure_quality) mlabpos(0)
                    How can I do that, please?



                    • #11
                      Use whatever you used in #2 to get a title (a title() option, presumably), but a shorter title would be better.

                      Sorry but I can't say anything different about "trustworthy" than about "safe".
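
As a sketch of the title() suggestion, again assuming the auto data as an analogue: the title text and the variable name res_t are made up for illustration.

```stata
* Illustrative analogue only: auto data, not the poster's data
sysuse auto, clear
regress rep78 price weight
* Residuals from the fit (res_t is a hypothetical name)
predict double res_t, residuals
* Labelled normal quantile plot with a short title
qnorm res_t, ms(none) mla(rep78) mlabpos(0) title("Residuals: normal quantile plot")
```

The same title() option could presumably be appended to the qnorm command in #10.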

