  • Model for dependent variable with integer values of certain magnitude

    Dear Statalist members,

    I would like to run a regression in which the dependent variable is not continuous and takes only certain values (specifically, integer values from 0 to 40).
    Because of the nature of the variable I was advised not to treat such a case with the usual estimation methods (i.e., OLS). However, I am not sure which model can address this issue (e.g., rank-ordered logistic regression or something similar?).
    Could you please advise me on that?

    Thank you in advance

  • #2
    Kleopatra:
    you may want to consider -poisson-.
    Kind regards,
    Carlo
    (StataNow 18.5)
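
    For concreteness, a minimal sketch of the -poisson- suggestion (the outcome y and covariates x1 and x2 are placeholders, not from the original post):

    Code:
    poisson y x1 x2, vce(robust)
    * or, if incidence-rate ratios are preferred
    poisson y x1 x2, vce(robust) irr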

    • #3
      You don't really tell us enough; you might want to take a look at the help for -truncreg- as well as the help for -poisson- to see which is closer to your situation.
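
      If it helps to see the syntax, a minimal sketch of -truncreg- with placeholder names (y, x1, x2); it would only be appropriate if part of the range genuinely cannot appear in the sample:

      Code:
      * truncated regression, e.g. if observations with y <= 0 were never sampled
      truncreg y x1 x2, ll(0) vce(robust)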

      • #4
        I'd consider dividing by 40 and using -fracreg-.
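
        A minimal sketch of that route (Stata 14 or later; y and the covariates are placeholders):

        Code:
        * rescale the 0-40 count to a 0-1 fraction
        generate double y_frac = y/40
        * fractional logit with robust standard errors
        fracreg logit y_frac x1 x2, vce(robust)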

        • #5
          To provide some additional info as requested, the dependent variable has a minimum integer value of "0" and a maximum value of "40".
          In addition, if I am not mistaken, the Poisson model assumes that the mean equals the variance, but this does not hold in my case.

          Dr. Cox, unfortunately I believe that -fracreg- is available only in Stata 14 (I use Stata 13).

          • #6
            Indeed, which is why the FAQ Advice asks that you tell us that:

            11. What should I say about the version of Stata I use?

            The current version of Stata is 14.2. Please specify if you are using an earlier version; otherwise, the answer to your question may refer to commands or features unavailable to you. Moreover, as bug fixes and new features are issued frequently by StataCorp, make sure that you update your Stata before posting a query, as your problem may already have been solved.
            But not to worry. Advice such as that within http://www.stata-journal.com/sjpdf.h...iclenum=st0147 still applies to you. You can fire up -glm-.
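
            A minimal sketch of that -glm- route, which needs nothing beyond Stata 13 (variable names are placeholders):

            Code:
            * rescale the 0-40 count to a 0-1 fraction
            generate double y_frac = y/40
            * fractional logit fitted by quasi-maximum likelihood
            glm y_frac x1 x2, family(binomial) link(logit) vce(robust)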

            • #7
              I suggest using the binomial distribution with an upper bound of 40. I discuss this in my book "Econometric Analysis of Cross Section and Panel Data." But you should use robust inference.

              Code:
              glm y x1 ... xk, fam(bin 40) link(logit) robust
              JW
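
              If useful, a small follow-up sketch (same placeholder names as above) for getting results back on the original 0-40 scale after that fit:

              Code:
              * fitted means, which lie on the 0-40 scale of the response
              predict yhat, mu
              * average marginal effects on that same scale
              margins, dydx(*)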

              • #8
                Nick's idea is good, too, but doesn't exploit the integer nature of the response.

                • #9
                  Jeff: Naturally I like the principle of respecting that the data are integers, but what difference will it make? An example is not an argument, but I fired up the auto data and pretended that rep78 is a count. Stata doesn't know otherwise.

                  I suppose, in some circles, a report that you scaled the integers to fractions might cause puzzlement or discomfort, as showing a lack of respect for the data or taking an unjustified extra step, but that's a matter of public relations.

                  (In this example, it's not a good model either way; my suggestion is that it's the same model.)

                  Code:
                  .   sysuse auto, clear
                  (1978 Automobile Data)
                  
                  . glm rep78 mpg weight, link(logit) f(binomial 5) vce(robust)
                  
                  Iteration 0:   log pseudolikelihood = -90.880341  
                  Iteration 1:   log pseudolikelihood = -90.622171  
                  Iteration 2:   log pseudolikelihood = -90.622036  
                  Iteration 3:   log pseudolikelihood = -90.622036  
                  
                  Generalized linear models                         No. of obs      =         69
                  Optimization     : ML                             Residual df     =         66
                                                                    Scale parameter =          1
                  Deviance         =   64.7931372                   (1/df) Deviance =   .9817142
                  Pearson          =  54.26662699                   (1/df) Pearson  =   .8222216
                  
                  Variance function: V(u) = u*(1-u/5)               [Binomial]
                  Link function    : g(u) = ln(u/(5-u))             [Logit]
                  
                                                                    AIC             =   2.713682
                  Log pseudolikelihood =  -90.6220359               BIC             =  -214.6579
                  
                  ------------------------------------------------------------------------------
                               |               Robust
                         rep78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                           mpg |   .0438609   .0371556     1.18   0.238    -.0289627    .1166845
                        weight |  -.0002177    .000224    -0.97   0.331    -.0006567    .0002213
                         _cons |   .5158199   1.427755     0.36   0.718    -2.282528    3.314168
                  ------------------------------------------------------------------------------
                  
                  . gen rep78_2 = rep78/5
                  (5 missing values generated)
                  
                  . fracreg logit rep78_2 mpg weight, vce(robust)
                  
                  Iteration 0:   log pseudolikelihood = -42.816676  
                  Iteration 1:   log pseudolikelihood =  -42.06548  
                  Iteration 2:   log pseudolikelihood = -42.061805  
                  Iteration 3:   log pseudolikelihood = -42.061805  
                  
                  Fractional logistic regression                  Number of obs     =         69
                                                                  Wald chi2(2)      =      16.96
                                                                  Prob > chi2       =     0.0002
                  Log pseudolikelihood = -42.061805               Pseudo R2         =     0.0262
                  
                  ------------------------------------------------------------------------------
                               |               Robust
                       rep78_2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                           mpg |   .0438609   .0371556     1.18   0.238    -.0289627    .1166845
                        weight |  -.0002177    .000224    -0.97   0.331    -.0006567    .0002213
                         _cons |     .51582   1.427755     0.36   0.718    -2.282528    3.314168
                  ------------------------------------------------------------------------------

                  • #10
                    Good point, Nick! It's the same when the upper bound is the same for all observations. I once knew that ...

                    Cheers,
                    Jeff

                    • #11
                      Jeff Wooldridge Thanks in turn for tactfully underlining the difference between constant and variable upper bounds.

                      This illuminates a current project with a student: her dataset includes a response which is the # of days in a month with a particular hydrological condition, and the upper bound is thus 28 to 31, depending on the month in question. While 14/28 (for example) is the same sample proportion as 15/30, its probability is not the same under any particular binomial model, and standard errors will vary too. I must check how much difference it makes to ignore month length and to take it into consideration: at a guess, not much.

                      But I also guess that with, say,

                      # people in a family with university degrees

                      variations in family size would be more important.
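
                      For what it's worth, a sketch of how a varying upper bound can be supplied directly to -glm- (all variable names here are hypothetical):

                      Code:
                      * wetdays = days in the month with the condition; ndays = length of that month (28-31)
                      glm wetdays x1 x2, family(binomial ndays) link(logit) vce(robust)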

                      • #12
                        I would like to thank you all for your prompt and always to-the-point responses.
