
  • Choice of regression model

    My dependent variable is measured on an interval scale but is bounded between 5 and 80. All the values are known within the accuracy of the experiment, and it is not possible for values to occur outside these bounds. I had intended to analyse the data within a regression framework, but I am uncertain which regression model to use.

    Would tobit or truncreg be appropriate, or is there another model? Or is the fact that the data do not follow the OLS assumption of an unbounded dependent variable not that important?


    Eddy

  • #2
    Dear Eddy Simms,

    Can you please post a histogram of your dependent variable?

    Best wishes,

    Joao



    • #3
      If your outcome variable cannot by definition lie outside [5, 80] then there are at least two questions:

      1. In practice, it may be that the mean function is adequately modelled using linear regression. I guess this may be one of the considerations that Joao Santos Silva has in mind. This could easily happen if the bulk of outcome values lie somewhere in the middle. Also, in many fields our models are not that great, and so their lowish coefficients don't lead to predictions outside the allowed range.

      2. In principle working with logistic regression on

      (outcome - 5) ./ 75

      may work better (using robust standard errors).
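As a rough illustration of point 2, here is a minimal Python sketch (not Stata, and not code from anyone in this thread) of rescaling the outcome to (outcome - 5) / 75, fitting a logit mean function by Newton-Raphson on the Bernoulli quasi-likelihood, and mapping predictions back to the original scale. The data, the single binary predictor, and every function name are made up for illustration; robust standard errors are not computed here.

```python
import math

LOWER, UPPER = 5.0, 80.0  # bounds stated in the question


def rescale(y):
    """Map an outcome in [5, 80] to a proportion in [0, 1]."""
    return (y - LOWER) / (UPPER - LOWER)


def back_transform(p):
    """Map a proportion in [0, 1] back to the original [5, 80] scale."""
    return LOWER + (UPPER - LOWER) * p


def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_fractional_logit(x, y, iterations=25):
    """Fit E[p] = inv_logit(b0 + b1 * x) for one predictor x.

    Uses Newton-Raphson on the Bernoulli quasi-likelihood, which
    accepts any p in [0, 1], not only 0/1 values.
    """
    p = [rescale(v) for v in y]
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        mu = [inv_logit(b0 + b1 * xi) for xi in x]
        w = [m * (1.0 - m) for m in mu]
        # Score vector (gradient of the quasi-log-likelihood)
        g0 = sum(pi - mi for pi, mi in zip(p, mu))
        g1 = sum((pi - mi) * xi for pi, mi, xi in zip(p, mu, x))
        # Information matrix entries
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^{-1} g (2x2 inverse written out)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: two groups coded 0/1, outcomes inside [5, 80]
x = [0, 0, 0, 1, 1, 1]
y = [20.0, 25.0, 30.0, 50.0, 55.0, 60.0]

b0, b1 = fit_fractional_logit(x, y)
pred_group0 = back_transform(inv_logit(b0))
pred_group1 = back_transform(inv_logit(b0 + b1))
print(round(pred_group0, 3), round(pred_group1, 3))
```

With one binary predictor the model is saturated, so the back-transformed predictions reproduce the two group means (25 and 55). The point of the logit link is that predictions stay inside [5, 80] whatever the coefficients turn out to be.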

      Models that assume the outcome could lie outside [5, 80], with such values merely going unobserved in practice, do not seem so attractive to me.

      Long-term readers of this forum may recognise one of my hobby-horses in what follows.

      It is overwhelmingly common practice to refer to model assumptions, but statistics is not a branch of formal logic whereby if a premiss (premise) is not satisfied then a conclusion is not correct. I think we'd all be better off, psychologically and pedagogically, in calling such assumptions ideal conditions whereby we sketch the fantasy perfect situation and then recognise that the data are not obliged to match our fantasies.

      The dark art of modelling is muddling through and recognising which ideal conditions are not so crucial. The first question here is whether Y = Xb is a good functional form to use, which is largely an empirical matter.



      • #4
        Joao, histogram attached. There are three main test groups with the possibility of further independent variables.

        Nick, thank you for your comments. I must educate myself to stop using the term 'assumptions' - it was unfortunately ingrained in me in my early statistics lectures.

        Eddy
        [Attached image: alloy conc.png]



        • #5
          That's an interesting distribution. Perhaps the bimodality makes sense given your predictors. By eye the overall mean is about 35 and on the evidence so far I don't think plain or vanilla regression is absolutely excluded. Neither is logit.

          My use of ideal conditions may be unusual but there's an illustrious precedent in a paper by Francis Anscombe from 1961.

          https://projecteuclid.org/ebooks/ber...msp/1200512155

          Francis Anscombe and John Tukey married sisters, so Tukey referred to Anscombe as his brother-in-squared law. Anscombe was fastidious about publishing, but the quality and originality of his work were very high.



          • #6
            Like Nick, I would not exclude simple linear regression, especially if your variable is continuous. It may either be enough or at least an interesting starting point and a good benchmark.
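One way to use linear regression as a benchmark is to fit it and then check whether any fitted values or predictions fall outside [5, 80]. The Python sketch below does this for a single predictor using the closed-form least-squares estimates; the data and the helper names are hypothetical and serve only to show the check.

```python
def fit_ols(x, y):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def out_of_bounds(values, lower=5.0, upper=80.0):
    """Return the values that fall outside the allowed range."""
    return [v for v in values if v < lower or v > upper]


# Hypothetical data with outcomes inside [5, 80]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [12.0, 28.0, 41.0, 58.0, 71.0]

intercept, slope = fit_ols(x, y)
fitted = [intercept + slope * xi for xi in x]

# In-sample fitted values: are any outside the bounds?
print(out_of_bounds(fitted))

# A prediction beyond the observed x range can still escape the bounds
extrapolated = intercept + slope * 6.0
print(out_of_bounds([extrapolated]))
```

In this made-up example the in-sample fitted values all stay inside the range, while the prediction at x = 6 exceeds 80. That is the kind of evidence that would suggest moving from the linear benchmark to a bounded mean function.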



            • #7
              Dear Joao, Nick,

              Many thanks for your advice.

              Eddy
