Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to write a heckman selection model whose second stage is a finite mixture model

    Dear statalist:

    I want to do a heckman seclection model whose second stage is a fmm by joint maximum likelihood estimation.But I don't how to do it.
    As you see,I have got the key to do heckman whose second stage is a ols.So I am looking forward to your further help.
    My thesis is about the estimation of wage equation.It's obvious that those without work can't be observed so as to result "selction bias". My wage equation is a finite mixture model,and my consideration is that the first stage is a probit which estiamate people work or not and second stage is a fmm about wage.
    Thank you for your patience and I'm looking forward to your help.
    Zhang Bing





    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    
    program myheckman
    args lnf xb1 g lns1 zg1
        qui {
            * Selection equation*
            tempvar lnf1 lnf2
            gen double `lnf2'=ln(normal(`zg1')) if $ML_y2==1
            replace    `lnf2'=ln(1-normal(`zg1')) if $ML_y2==0
            * Outcome Equation*
            gen double `lnf1'=ln(normalden($ML_y1,`xb1'+`g'*normalden(`zg1')/normal(`zg1'),exp(`lns1')))
            replace    `lnf1'=0 if $ML_y2==0
            * Adding all Loglikelihoods*
            replace `lnf'=`lnf1'+`lnf2'
        }
    end
    
    webuse nlswork
    gen age2=age*age
    gen age3=age*age2
    gen age4=age2*age2
    gen employment=( wks_ue==0 )
    
    ml model lf myheckman (xb1:ln_wage=age age2 age3 age4 collgrad) (g:) (lns1:) (zg1:employment=age age2 age3 age4 collgrad)
    ml search
    ml max
    
    
    end
    ------------------ copy up to and including the previous line ------------------
    Last edited by Bing Zhang; 02 Mar 2019, 19:57.

  • #2
    Hi Bing
    As i mentioned, at some point i got a program to do exactly this for a class as an example for the application of Maximum Likelihood methods. Below I provide you with one example that does this:
    Code:
     webuse womenwk, clear
    *For FMM
    gen dwage=wage!=.
    probit dwage married children educ age
    capture drop mill
    predict double mill, score
    
    capture program drop fmmreg
    program fmmreg
    qui {
       args lnf xb1 lns1 xb2 lns2 lp zb
       tempvar f1 f2
       gen double `f1'=normalden($ML_y1,`xb1',exp(`lns1'))
       gen double `f2'=normalden($ML_y1,`xb2',exp(`lns2'))
       tempvar p
       gen double `p'=exp(`lp')/(1+exp(`lp'))
       replace `lnf'=ln(normal(`zb'))  if $ML_y2==1
       replace `lnf'=ln(normal(-`zb')) if $ML_y2==0
       replace `lnf'=`lnf'+ln(`p'*`f1'+(1-`p')*`f2') if $ML_y2==1
    }
    end
    
    
    capture program drop fmmreg2
    program fmmreg2
    qui {
       args lnf xb1 g1 lns1 xb2 g2 lns2 lp zb
       tempvar f1 f2
       replace `lnf'=ln(normal(`zb'))  if $ML_y2==1
       replace `lnf'=ln(normal(-`zb')) if $ML_y2==0
       tempvar mill
       gen double `mill'=normalden(`zb')/normal(`zb')
       
       gen double `f1'=normalden($ML_y1,`xb1'+`g1'*`mill',exp(`lns1'))
       gen double `f2'=normalden($ML_y1,`xb2'+`g2'*`mill',exp(`lns2'))
       tempvar p
       gen double `p'=exp(`lp')/(1+exp(`lp'))
    
       replace `lnf'=`lnf'+ln(`p'*`f1'+(1-`p')*`f2') if $ML_y2==1
    }
    end
    
    ml model lf fmmreg (xb1:wage=educ age mill ) (lns1:) (xb2:=educ age mill ) (lns2:) (lp:) (zb:dwage=married children educ age), maximize   missing technique( nr bhhh) difficult
    matrix bin2=e(b)
    ml model lf fmmreg2 (xb1:wage=educ age ) (g1:) (lns1:) (xb2:=educ age  ) (g2:)  (lns2:) (lp:) (zb:dwage=married children educ age), maximize init(bin2, skip) missing robust technique( nr bhhh)  
    ml display
    Be aware, however, that this is a very unstable model, and difficult to maximize. Even the code that i provided you before (the one you show) will show problems during the maximization. And that problems doubles when trying to do FMM.
    My suggestion would be to estimate the initial values for the parameters using the "fmm" function, which is quite robust to find good solutions (at least on my experience), and that may help you find the "correct" coefficients for your model.
    HTH
    Fernando

    Comment


    • #3
      Originally posted by FernandoRios View Post
      Hi Bing
      As i mentioned, at some point i got a program to do exactly this for a class as an example for the application of Maximum Likelihood methods. Below I provide you with one example that does this:
      Code:
      webuse womenwk, clear
      *For FMM
      gen dwage=wage!=.
      probit dwage married children educ age
      capture drop mill
      predict double mill, score
      
      capture program drop fmmreg
      program fmmreg
      qui {
      args lnf xb1 lns1 xb2 lns2 lp zb
      tempvar f1 f2
      gen double `f1'=normalden($ML_y1,`xb1',exp(`lns1'))
      gen double `f2'=normalden($ML_y1,`xb2',exp(`lns2'))
      tempvar p
      gen double `p'=exp(`lp')/(1+exp(`lp'))
      replace `lnf'=ln(normal(`zb')) if $ML_y2==1
      replace `lnf'=ln(normal(-`zb')) if $ML_y2==0
      replace `lnf'=`lnf'+ln(`p'*`f1'+(1-`p')*`f2') if $ML_y2==1
      }
      end
      
      
      capture program drop fmmreg2
      program fmmreg2
      qui {
      args lnf xb1 g1 lns1 xb2 g2 lns2 lp zb
      tempvar f1 f2
      replace `lnf'=ln(normal(`zb')) if $ML_y2==1
      replace `lnf'=ln(normal(-`zb')) if $ML_y2==0
      tempvar mill
      gen double `mill'=normalden(`zb')/normal(`zb')
      
      gen double `f1'=normalden($ML_y1,`xb1'+`g1'*`mill',exp(`lns1'))
      gen double `f2'=normalden($ML_y1,`xb2'+`g2'*`mill',exp(`lns2'))
      tempvar p
      gen double `p'=exp(`lp')/(1+exp(`lp'))
      
      replace `lnf'=`lnf'+ln(`p'*`f1'+(1-`p')*`f2') if $ML_y2==1
      }
      end
      
      ml model lf fmmreg (xb1:wage=educ age mill ) (lns1:) (xb2:=educ age mill ) (lns2:) (lp:) (zb:dwage=married children educ age), maximize missing technique( nr bhhh) difficult
      matrix bin2=e(b)
      ml model lf fmmreg2 (xb1:wage=educ age ) (g1:) (lns1:) (xb2:=educ age ) (g2:) (lns2:) (lp:) (zb:dwage=married children educ age), maximize init(bin2, skip) missing robust technique( nr bhhh)
      ml display
      Be aware, however, that this is a very unstable model, and difficult to maximize. Even the code that i provided you before (the one you show) will show problems during the maximization. And that problems doubles when trying to do FMM.
      My suggestion would be to estimate the initial values for the parameters using the "fmm" function, which is quite robust to find good solutions (at least on my experience), and that may help you find the "correct" coefficients for your model.
      HTH
      Fernando
      Thank you very much!I am going to try it.

      Comment


      • #4
        Thank you for the code.
        I tried to adapt Fernando's code to a mixture of 3 components (code below) but I get an error message!
        Code:
        webuse womenwk, clear
        *For FMM
        gen dwage=wage!=.
        probit dwage married children educ age
        capture drop mill
        predict double mill, score
        
        capture program drop fmmreg
        program fmmreg
        qui {
           args lnf xb1 lns1 xb2 lns2 xb3 lns3 lp1 lp2 lp3 zb
           tempvar f1 f2 f3
           gen double `f1'=normalden($ML_y1,`xb1',exp(`lns1'))
           gen double `f2'=normalden($ML_y1,`xb2',exp(`lns2'))
           gen double `f3'=normalden($ML_y1,`xb3',exp(`lns3'))
           tempvar p1 p2 p3
           gen double `p1'=1/(1+exp(`lp2')+exp(`lp3'))
           gen double `p2'=exp(`lp2')/(1+exp(`lp2')+exp(`lp3'))
           gen double `p3'=exp(`lp3')/(1+exp(`lp2')+exp(`lp3'))
           replace `lnf'=ln(normal(`zb'))  if $ML_y2==1
           replace `lnf'=ln(normal(-`zb')) if $ML_y2==0
           replace `lnf'=`lnf'+ln(`p1'*`f1'+`p2'*`f2'+`p3'*`f3') if $ML_y2==1
        }
        end
        
        
        capture program drop fmmreg2
        program fmmreg2
        qui {
           args lnf xb1 g1 lns1 xb2 g2 lns2 xb3 g3 lns3 lp1 lp2 lp3 zb
           tempvar f1 f2 f3
           replace `lnf'=ln(normal(`zb'))  if $ML_y2==1
           replace `lnf'=ln(normal(-`zb')) if $ML_y2==0
           tempvar mill
           gen double `mill'=normalden(`zb')/normal(`zb')
           
           gen double `f1'=normalden($ML_y1,`xb1'+`g1'*`mill',exp(`lns1'))
           gen double `f2'=normalden($ML_y1,`xb2'+`g2'*`mill',exp(`lns2'))
           gen double `f3'=normalden($ML_y1,`xb3'+`g3'*`mill',exp(`lns3'))
           tempvar p1 p2 p3
           gen double `p1'=1/(1+exp(`lp2')+exp(`lp3'))
           gen double `p2'=exp(`lp2')/(1+exp(`lp2')+exp(`lp3'))
           gen double `p3'=exp(`lp3')/(1+exp(`lp2')+exp(`lp3'))
        
           replace `lnf'=`lnf'+ln(`p1'*`f1'+`p2'*`f2'+`p3'*`f3') if $ML_y2==1
        }
        end
        
        ml model lf fmmreg (xb1:wage=educ age mill ) (lns1:) (xb2:=educ age mill ) (lns2:) (xb3:wage=educ age mill ) (lns3:) (lp1:) (lp2:) (lp3:) (zb:dwage=married children educ age), maximize   missing technique( nr bhhh) difficult
        matrix bin2=e(b)
        ml model lf fmmreg2 (xb1:wage=educ age ) (g1:) (lns1:) (xb2:=educ age  ) (g2:)  (lns2:) (xb3:wage=educ age ) (g3:) (lns3:) (lp1:) (lp2:) (lp3:) (zb:dwage=married children educ age), maximize init(bin2, skip) missing robust technique( nr bhhh)  
        ml display
        The error message:
        Code:
        . ml model lf fmmreg (xb1:wage=educ age mill ) (lns1:) (xb2:=educ age mill ) (lns2:) (xb3:wage=educ 
        > age mill ) (lns3:) (lp1:) (lp2:) (lp3:) (zb:dwage=married children educ age), maximize   missing t
        > echnique( nr bhhh) difficult
        
        initial:       log likelihood =     -<inf>  (could not be evaluated)
        could not find feasible values
        r(491);
        
        end of do-file
        I don't know hwo to fix this issue. Will be gratefull for any suggestion. Thanks in advance for your help!

        Comment


        • #5
          Hi Mustapha
          I see only one typo in your syntax
          you write this:
          Code:
          ml model lf fmmreg (xb1:wage=educ age mill ) (lns1:) (xb2:=educ age mill ) ///
          (lns2:) (xb3:wage=educ age mill ) (lns3:) (lp1:) (lp2:) (lp3:) ///
          (zb:dwage=married children educ age), maximize missing technique( nr bhhh) difficult
          when it should be this
          Code:
          ml model lf fmmreg (xb1:wage=educ age mill ) (lns1:) (xb2:=educ age mill ) ///
          (lns2:) (xb3: =educ age mill ) (lns3:) (lp1:) (lp2:) (lp3:) ///
          (zb:dwage=married children educ age), maximize missing technique( nr bhhh) difficult
          Notice there is no "wage" in the xb3 equation.
          Other than that. It is possible that there are not 3 well defined groups in this data, which is why trying to run the above command finds no solution.
          If you want to test your code, you may have to start with simulated data
          HTH
          Fernando

          Comment


          • #6
            Thank you Fernando for these insights.
            In fact I am interesting to estimate finite mixture model knowing (a priori) the share of one compoenent following the paper of Gunther, I. and Launov, A. (2012). The idea is that we know the share of formal employment a priory and we look for 2 components (at least) finite mixture in the informal employment. In your opinion there is an estimation error if I split my sample to formal/informal and run two step heckman for formal part and your code for the informal part?
            Thank you in advance for your feedback.


            Reference:
            Gunther, I. and Launov, A., 2012. Informal employment in developing countries: opportunity or last resort? Journal of Development Economics, 97 (1), 88–98.

            Comment


            • #7
              Oh I see.
              I actually saw a master student doing exactly that in the past.
              So here are my thoughts.
              1. Because formal and informal's are already identified, I don't think you need to run a 3 group FMM model. You can simply do a fmm 2 for informals.
              2. Regarding the selection correction term
              In my toy example, I was assuming that the same selection term affects people in both groups.
              I can see that the same thing can be done for formals and informals. Estimate a single "selection model", and apply it for all sectors. However, you could estimate separate regressions for formals and informals too.
              3. I think doing the formals, and informals separate would fine, especially if the selection model is estimated separately for formals and informals.
              However, if the selection equation is to see why some people become formal vs informal, then I think you should include ALL equations into a single estimation. This implies adapting the code above for a third group,

              Let me know if that last scenario is the one you are working on.
              Fernando

              Comment


              • #8
                Thank you Fernando for these insights.
                In fact I am interesting in a mixed approach between (1) and (3):
                On selection equation for all components of labour market and making all equations in one maximum likelohood function. The issue is that how to take into account in the program above that we know the share of formal employment that is "p1" ( scalar) and p2+p3=1-p1 (I am not a pro on ml programming)? the second issue is how to use the correction method proposed by Murphy and Topel to deal with the biais of the variance-covariance matrix from the second stage (after the selection equation)? because the estimation strategy is based on two steps: one for selection equation and introduce these estimation in fmm.

                Any suggestion will be appreciated! Thanks

                Reference:
                Murphy, K.M. and Topel, R.H., 1985. Estimation and inference in two-step econometric models. Journal of Business and Economic Statistics, 3, 370–379.

                Comment


                • #9
                  Hi Mustapha,
                  The good news is that Murphy and Topel (1985( correction is not needed, because by estimating the whole system via Maximum Likelihood, you are already accounting for the fact that the model comes from a two step strategy.
                  See my paper here: https://www.stata-journal.com/articl...article=st0520
                  For the first part of the question, that is a bit more complicated. As you need to know how to write the ML (more than the program, write down the actual function you need to estimate).
                  If you can define the exact function you want to estimate for your paper, Translating into ML programming is straight forward.

                  Fernando

                  Comment


                  • #10
                    Thank you for the news about the correction of Murphy and Topel (1985).
                    The log-likelihood that I want to maximize is:
                    Click image for larger version

Name:	loglikelihood.JPG
Views:	1
Size:	23.3 KB
ID:	1523689

                    Where : pf (share of formal is known a priory) and: function f is:
                    Click image for larger version

Name:	function f.JPG
Views:	1
Size:	26.4 KB
ID:	1523690

                    Based on your code I need to know how to constraint p1 (share of formal) to equal to a scalar?
                    Code:
                    webuse womenwk, clear
                    *For FMM
                    gen dwage=wage!=.
                    probit dwage married children educ age
                    capture drop mill
                    predict double mill, score
                    
                    capture program drop fmmreg
                    program fmmreg
                    qui {
                       args lnf xb1 lns1 xb2 lns2 xb3 lns3 lp1 lp2 lp3 zb
                       tempvar f1 f2 f3
                       gen double `f1'=normalden($ML_y1,`xb1',exp(`lns1'))
                       gen double `f2'=normalden($ML_y1,`xb2',exp(`lns2'))
                       gen double `f3'=normalden($ML_y1,`xb3',exp(`lns3'))
                       tempvar p1 p2 p3
                       gen double `p1'=1/(1+exp(`lp2')+exp(`lp3'))
                       gen double `p2'=exp(`lp2')/(1+exp(`lp2')+exp(`lp3'))
                       gen double `p3'=exp(`lp3')/(1+exp(`lp2')+exp(`lp3'))
                       replace `lnf'=ln(normal(`zb'))  if $ML_y2==1
                       replace `lnf'=ln(normal(-`zb')) if $ML_y2==0
                       replace `lnf'=`lnf'+ln(`p1'*`f1'+`p2'*`f2'+`p3'*`f3') if $ML_y2==1
                    }
                    end
                    
                    
                    capture program drop fmmreg2
                    program fmmreg2
                    qui {
                       args lnf xb1 g1 lns1 xb2 g2 lns2 xb3 g3 lns3 lp1 lp2 lp3 zb
                       tempvar f1 f2 f3
                       replace `lnf'=ln(normal(`zb'))  if $ML_y2==1
                       replace `lnf'=ln(normal(-`zb')) if $ML_y2==0
                       tempvar mill
                       gen double `mill'=normalden(`zb')/normal(`zb')
                       
                       gen double `f1'=normalden($ML_y1,`xb1'+`g1'*`mill',exp(`lns1'))
                       gen double `f2'=normalden($ML_y1,`xb2'+`g2'*`mill',exp(`lns2'))
                       gen double `f3'=normalden($ML_y1,`xb3'+`g3'*`mill',exp(`lns3'))
                       tempvar p1 p2 p3
                       gen double `p1'=1/(1+exp(`lp2')+exp(`lp3'))
                       gen double `p2'=exp(`lp2')/(1+exp(`lp2')+exp(`lp3'))
                       gen double `p3'=exp(`lp3')/(1+exp(`lp2')+exp(`lp3'))
                    
                       replace `lnf'=`lnf'+ln(`p1'*`f1'+`p2'*`f2'+`p3'*`f3') if $ML_y2==1
                    }
                    end
                    
                    ml model lf fmmreg (xb1:wage=educ age mill ) (lns1:) (xb2:=educ age mill ) (lns2:) (xb3:=educ age mill ) (lns3:) (lp1:) (lp2:) (lp3:) (zb:dwage=married children educ age), maximize   missing technique( nr bhhh) difficult
                    matrix bin2=e(b)
                    ml model lf fmmreg2 (xb1:wage=educ age ) (g1:) (lns1:) (xb2:=educ age  ) (g2:)  (lns2:) (xb3:=educ age ) (g3:) (lns3:) (lp1:) (lp2:) (lp3:) (zb:dwage=married children educ age), maximize init(bin2, skip) missing robust technique( nr bhhh)  
                    ml display
                    Any idea how to do it?

                    Comment

                    Working...
                    X