
  • two part model in panel data

    Hi Statalist,

    Happy new year!

    I have questions about using a two-part model with panel data. I want to analyse health expenditures and chronic conditions: specifically, I want to estimate total health expenditure conditional on a set of covariates (e.g. age, gender, ...) and calculate the marginal effects of the chronic condition variables. I have two waves of panel data with repeated measures. The sample size is around 40,000 per wave. The health expenditure outcome has a large share of zeros (~30%).

    There is a literature comparing different models (e.g. OLS on the natural scale, OLS on log-transformed expenditure, GLM, Poisson, two-part models, selection models) in cross-sectional settings. I would like to apply a two-part model, with a logit first part and a second part using a log link and gamma distribution, but in a longitudinal setting.


    First question: when using
    Code:
    twopm yvar $xvar, firstpart(logit) secondpart(glm, family(gamma) link(log))
    margins, dydx(*)
    I get unconditional (combined) marginal effects from both parts of the two-part model.


    If using
    Code:
    twopm yvar $xvar, firstpart(logit) secondpart(glm, family(gamma) link(log))
    margins if yvar >0, dydx(*)
    I get unconditional marginal effects averaged over the sample of positives.


    If using
    Code:
    glm yvar $xvar if yvar > 0, family(gamma) link(log)
    margins, dydx(*)
    I get conditional marginal effects for the sample of positives.

    Is my understanding correct here?


    Second question: is it feasible to run a two-part model on panel data? If I run the two parts separately, how do I get combined marginal effects from both models?

    Many thanks in advance.
    Tian Xin

  • #2
    perhaps ...
    Code:
    help zip



    • #3
      The -zip- command uses the assumption that, conditional on y > 0, it follows a rescaled Poisson distribution. I think -churdle- is a better bet. It should do what the user-written command -twopm- does, and more. In particular, -churdle- computes the average marginal effects accounting for the two parts and provides valid standard errors. With panel data, you need to use vce(cluster id) to account for serial correlation.
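
      As a rough illustration of the suggestion above, a minimal sketch might look like the following. It assumes yvar, $xvar and id are the placeholder names used in this thread (id being the person identifier), that zero is the lower limit, and that the same covariates enter both parts. Note that -churdle exponential- is not identical to the logit plus gamma GLM specification asked about: as far as I know, its participation equation is a probit and its outcome equation models log(y) with normal errors, so treat this as a starting point rather than a drop-in replacement.
      Code:
      * Sketch only: yvar, $xvar and id are placeholders, not real variable names
      * Exponential hurdle model; select() holds the participation-equation covariates
      churdle exponential yvar $xvar, select($xvar) ll(0) vce(cluster id)
      * Average marginal effects on the overall mean of yvar, combining both parts
      margins, dydx(*)
      Because -margins- here works from the default prediction after -churdle- (the overall conditional mean of the outcome), each reported effect reflects both the change in the probability of a positive outcome and the change in its expected level, with standard errors based on the clustered variance estimate.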



      • #4
        Originally posted by George Ford
        perhaps ...
        Code:
        help zip
        Thanks a lot George!



        • #5
          Originally posted by Jeff Wooldridge
          The -zip- command uses the assumption that, conditional on y > 0, it follows a rescaled Poisson distribution. I think -churdle- is a better bet. It should do what the user-written command -twopm- does, and more. In particular, -churdle- computes the average marginal effects accounting for the two parts and provides valid standard errors. With panel data, you need to use vce(cluster id) to account for serial correlation.
          Thank you very much Jeff. I will start with -churdle-.

