  • biprobit and bitobit using cmp command

    Dear all,

    I estimated a bivariate probit model using the Stata command biprobit (Stata version 12.1). There are two dependent variables, Y1 and Y2, and the command uses only observations for which both Y1 and Y2 are observed. Observations with missing information on Y1 or Y2 are dropped automatically.

    I re-estimated the model using Roodman’s cmp command. It turned out that, in contrast to biprobit, cmp includes all observations for which at least one of the dependent variables is observed. The number of observations is therefore larger under cmp than under biprobit, and the regression results differ slightly.

    So, I am wondering what kind of likelihood function the cmp command uses in order to estimate the bivariate probit? Why does cmp yield results even when considering observations including missing data? Which of the two commands is preferred when results differ and why?

    In an alternative specification Y1 and Y2 are no longer binary but shares, that is, both dependent variables are constrained to the unit interval. The same issue as above arises, that is, the user-written bitobit command uses fewer observations than the cmp command. This gives rise to the exact same questions as above.

    Many thanks in advance for any help!

    Best regards,
    Philipp

  • #2
    Dear all, dear Philipp,

    Have you managed to sort out this problem somehow?

    I encountered exactly the same issue when estimating a multivariate probit of 6 equations and identical regressors in each equation.

    When I compare the results from -mvprobit- and -cmp-, the latter seems to use all observations for which any of the dependent variables is observed. My sample size dropped sharply for -mvprobit-, as expected, but not for -cmp-. Moreover, the results of -cmp- seem to be more similar (though not identical) to single-equation probits than to the -mvprobit- results. I also tried restricting the sample used by -cmp- to the subsample used by -mvprobit-. The results in this case were somewhat similar, but far from identical.

    So I am also wondering what is going on. I would clearly prefer using -cmp- mainly because marginal effects (-margins-) can be computed automatically, which is not the case for -mvprobit-.

    Thanks for any clues and best regards,
    Peter


    • #3
      Hi Peter,
      I believe that when some dependent variables are only partially observed, cmp uses the information as if it were estimating some type of selection model.
      For instance, when dependent or independent variables have missing values, most estimation commands simply use the observations with complete information. Selection models, however, use as much information as possible, and if some information is missing, they assume it's because of some underlying selection process, just as in a heckman model.
      Since cmp estimates models using a multivariate normal distribution, I believe it's trying to model your data as if there were a selection process behind it.
      Hope this helps.
      Fernando


      • #4
        Fernando has it about right. cmp is greedy. So if only one equation is fully observed for an observation, then cmp models it as single-equation probit or tobit. This can be good, e.g., if the pattern of missingness is exogenous, or if it is endogenous in a way that is dealt with, as in selection models.
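
        For the bivariate probit case, the per-observation likelihood contribution can be sketched as below. This is an illustration of the behavior described above, not something taken from the cmp source; the notation (q_{ij} = 2y_{ij} - 1, linear index x_i'β_j, correlation ρ) is assumed here for the example.

        ```latex
        L_i =
        \begin{cases}
        \Phi_2\!\left(q_{i1}\,x_i'\beta_1,\; q_{i2}\,x_i'\beta_2;\; q_{i1}q_{i2}\,\rho\right)
          & \text{if } y_{i1} \text{ and } y_{i2} \text{ are both observed},\\[4pt]
        \Phi\!\left(q_{i1}\,x_i'\beta_1\right)
          & \text{if only } y_{i1} \text{ is observed},\\[4pt]
        \Phi\!\left(q_{i2}\,x_i'\beta_2\right)
          & \text{if only } y_{i2} \text{ is observed},
        \end{cases}
        \qquad q_{ij} = 2y_{ij} - 1 .
        ```

        Because each margin of a bivariate normal is univariate normal, the single-equation terms are the marginal likelihoods of the same model. Observations with only one outcome therefore inform that equation's coefficients but carry no information about ρ.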

        You can constrain this behavior with an "if" clause or by creating indicator variables that are zero for observations you want to exclude and passing them in the ind() option.
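
        As a rough sketch of both approaches, assuming hypothetical variable names y1, y2, x1, x2 (none of them from this thread) and that cmp is installed:

        ```stata
        * One-time setup: defines the globals $cmp_probit, $cmp_cont, etc.
        cmp setup

        * Flag the complete-case sample that biprobit would use
        * (y1, y2, x1, x2 are placeholder names)
        generate byte touse = !missing(y1, y2, x1, x2)

        * Default cmp behavior: uses every observation with at least one outcome
        cmp (y1 = x1 x2) (y2 = x1 x2), indicators($cmp_probit $cmp_probit)

        * Option 1: restrict the sample with an "if" clause
        cmp (y1 = x1 x2) (y2 = x1 x2) if touse, indicators($cmp_probit $cmp_probit)

        * Option 2: indicator variables that are zero for excluded observations
        generate byte ind1 = $cmp_probit * touse
        generate byte ind2 = $cmp_probit * touse
        cmp (y1 = x1 x2) (y2 = x1 x2), indicators(ind1 ind2)
        ```

        Either restricted run should report the same number of observations as biprobit on the same variables, which makes the estimates easier to compare.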
        --David


        • #5
          Dear David and Fernando,
          Thank you for your answers!
          Best,
          Peter
