
  • Most suitable regression for discrete dependent variable?

    Hello everyone:

    I am working with a regression model in which my dependent variable is years lived (counted as 1, 2, 3, ..., 121). I am aware that this is a discrete variable.

    What type of regression would be best in order to find the effect of several independent variables on my discrete DV?

    I initially used a linear regression (OLS), but my professor said to try again. I have considered using an ordered logit model, but I am not sure whether it is appropriate.

    Thank you,

    Marcos

  • #2
    If you think of "years lived" as "years till death" you can see that this is a survival analysis problem. There is a whole suite of programs in Stata designed for that kind of problem, see help st or http://www.maartenbuis.nl/wp/survival.html or https://www.iser.essex.ac.uk/resourc...sis-with-stata
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Marcos:
      welcome to this forum.
      Maarten pointed you toward the first choice most of us would think of when presented with such an issue.
      Perhaps -poisson- may be worth trying.
      Admittedly, your teacher's advice to "try again" sounds like a pronouncement from the oracle of Delphi.
      Kind regards,
      Carlo
      (StataNow 18.5)



      • #4
        Thank you both for your answers and your welcome to the forum. My teacher's advice is indeed a bit puzzling, as we do not have much experience in this field. We are just starting out.

        I have read Maarten's proposal and I see how it is a survival analysis problem. Nevertheless, I fail to understand how to translate this information to my Stata project. My hypothesis is that my independent variable x has statistically significant effects on the dependent variable y, y being this number of years survived. Is there a command by which I could estimate the effects of x on y without the 'discreteness' of the dependent variable compromising the validity of the results?

        Apologies if this can be inferred from your answers -- I'm not that well versed in econometrics or Stata, for that matter.

        Thank you,

        Marcos



        • #5
          It seems like you're mainly interested in the effect of your x values on the mean value of y, in which case survival analysis is not necessary (nor is it direct). I would take Carlo's advice and use Poisson regression with robust standard errors. You have a single cross section, correct?

          Code:
          poisson y x1 ... xk, vce(robust)
          The mean function is E(y|x) = exp(b0 + b1*x1 + ... + bk*xk) so the coefficients are (roughly) proportionate effects. If xk is a log, bk is an elasticity.
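
          As a rough illustration of the command above, the following sketch fits the model on simulated data and then reports the fit on two more interpretable scales. The variable names, the sample size of 150, and the coefficient values are assumptions made up for the example; they are not Marcos's data.

          ```stata
          * Simulated stand-in for the real data (all values are hypothetical)
          clear
          set seed 12345
          set obs 150
          generate x1 = rnormal()
          generate x2 = rnormal()
          generate y  = rpoisson(exp(4 + 0.10*x1 - 0.05*x2))

          * Poisson regression with robust standard errors, as in the post
          poisson y x1 x2, vce(robust)

          * Replay the same fit as incidence-rate ratios, exp(b)
          poisson, irr

          * Average marginal effect of x1 on E(y|x), measured in years
          margins, dydx(x1)
          ```

          Here exp(b1) is the factor by which E(y|x) changes for a one-unit increase in x1, so exp(b1) - 1 is the exact proportionate effect that b1 approximates when b1 is small.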



          • #6
            Originally posted by Jeff Wooldridge
            It seems like you're mainly interested in the effect of your x values on the mean value of y, in which case survival analysis is not necessary (nor is it direct). I would take Carlo's advice and use Poisson regression with robust standard errors. You have a single cross section, correct?

            Code:
            poisson y x1 ... xk, vce(robust)
            The mean function is E(y|x) = exp(b0 + b1*x1 + ... + bk*xk) so the coefficients are (roughly) proportionate effects. If xk is a log, bk is an elasticity.

            That is right, Mr. Wooldridge. I have 150 observations. Each has a particular value for X (and other independent variables) and each lasts Y years. Essentially, I'm trying to ascertain whether having higher X values for an observation will lead to it having higher expected Y values (lasting more years), all else held constant. Should I use Poisson then? Thank you very much for your answer.



            • #7
              As between the advice to use survival analysis and the advice to use a Poisson regression, I think a crucial deciding factor about the nature of the data has been overlooked.

              Specifically, what does "years lived" mean? If the outcome for a particular unit is "years lived" = 25, does that mean that that unit actually died (or went bankrupt, or whatever end of living means for these units) before the 26th year? Or does it just mean that we know the unit survived 25 years, but we have no further information beyond that time? (Perhaps the study ended, or the unit withdrew from the study, or something happened that made it impossible to know if the unit was still alive at year 26 or not.) If the latter, then we have censored observations, and Poisson regression cannot properly accommodate that. A survival analysis would be needed in that situation.

              If however, "years lived" always refers to certainty that the unit died at that age, then a Poisson regression is fine for the stated purpose, and survival analysis will offer no particular advantage.
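
              A minimal sketch of the censored case, assuming a hypothetical indicator died (1 if the death was observed, 0 if the unit was still alive when follow-up ended) and simulated data in which follow-up stops at 60 years. None of these names or values come from the thread.

              ```stata
              * Simulated data with censoring (all names and values are hypothetical)
              clear
              set seed 2024
              set obs 150
              generate x1     = rnormal()
              * Latent lifetimes in whole years (exponential, rounded up so time >= 1)
              generate t_true = ceil(-ln(runiform()) * 40 * exp(0.3*x1))
              * Follow-up ends at 60 years: longer lifetimes are censored
              generate died   = t_true <= 60
              generate years  = min(t_true, 60)

              * How many observations are censored?
              count if died == 0

              * Declare survival-time data, then fit a Cox model
              stset years, failure(died)
              stcox x1, vce(robust)
              ```

              When the count of censored observations is zero, died equals 1 for every unit, which is the situation in which Poisson regression on years is fine for the stated purpose.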



              • #8
                Originally posted by Clyde Schechter
                As between the advice to use survival analysis and the advice to use a Poisson regression, I think a crucial deciding factor about the nature of the data has been overlooked.

                Specifically, what does "years lived" mean? If the outcome for a particular unit is "years lived" = 25, does that mean that that unit actually died (or went bankrupt, or whatever end of living means for these units) before the 26th year? Or does it just mean that we know the unit survived 25 years, but we have no further information beyond that time? (Perhaps the study ended, or the unit withdrew from the study, or something happened that made it impossible to know if the unit was still alive at year 26 or not.) If the latter, then we have censored observations, and Poisson regression cannot properly accommodate that. A survival analysis would be needed in that situation.

                If however, "years lived" always refers to certainty that the unit died at that age, then a Poisson regression is fine for the stated purpose, and survival analysis will offer no particular advantage.

                Thank you for your answer, Mr. Schechter. I must clarify that 'years lived' always refers to the certainty that the unit died at that age: it is merely year of death minus year of birth. I understand that Poisson would be suitable then, is that right?



                • #9
                  I want to clarify that my independent variable is continuous, just in case it is relevant for these purposes. Thank you all once again.



                  • #10
                    Marcos:
                    it is the type of regressand (continuous, binomial, count) that drives the choice among the available regression tools.
                    That said, are you sure that you have to deal with a simple regression (ie, a regression with one predictor only)?
                    Kind regards,
                    Carlo
                    (StataNow 18.5)



                    • #11
                      Originally posted by Carlo Lazzaro
                      Marcos:
                      it is the type of regressand (continuous, binomial, count) that drives the choice among the available regression tools.
                      That said, are you sure that you have to deal with a simple regression (ie, a regression with one predictor only)?

                      No, Mr. Lazzaro. There are several predictors here, all of them continuous, but I am mostly interested in one of them. Does this multiplicity affect the choice of the regression tools as well?



                      • #12
                        No.
                        Kind regards,
                        Carlo
                        (StataNow 18.5)



                        • #13
                          Re #8: Yes, that is right.

