How to write a heckman selection model whose second stage is a finite mixture model

Bing Zhang

Join Date: Oct 2018

Posts: 40
#1

How to write a heckman selection model whose second stage is a finite mixture model

02 Mar 2019, 18:51

Dear statalist:

I want to do a heckman seclection model whose second stage is a fmm by joint maximum likelihood estimation.But I don't how to do it.
As you see,I have got the key to do heckman whose second stage is a ols.So I am looking forward to your further help.
My thesis is about the estimation of wage equation.It's obvious that those without work can't be observed so as to result "selction bias". My wage equation is a finite mixture model,and my consideration is that the first stage is a probit which estiamate people work or not and second stage is a fmm about wage.
Thank you for your patience and I'm looking forward to your help.
Zhang Bing

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. To install: ssc install dataex clear program myheckman args lnf xb1 g lns1 zg1 qui { * Selection equation* tempvar lnf1 lnf2 gen double `lnf2'=ln(normal(`zg1')) if $ML_y2==1 replace `lnf2'=ln(1-normal(`zg1')) if $ML_y2==0 * Outcome Equation* gen double `lnf1'=ln(normalden($ML_y1,`xb1'+`g'*normalden(`zg1')/normal(`zg1'),exp(`lns1'))) replace `lnf1'=0 if $ML_y2==0 * Adding all Loglikelihoods* replace `lnf'=`lnf1'+`lnf2' } end webuse nlswork gen age2=age*age gen age3=age*age2 gen age4=age2*age2 gen employment=( wks_ue==0 ) ml model lf myheckman (xb1:ln_wage=age age2 age3 age4 collgrad) (g:) (lns1:) (zg1:employment=age age2 age3 age4 collgrad) ml search ml max end

------------------ copy up to and including the previous line ------------------

Last edited by Bing Zhang; 02 Mar 2019, 18:57.
Tags: None

FernandoRios

Join Date: Apr 2014
Posts: 2469

02 Mar 2019, 20:02

Hi Bing
As i mentioned, at some point i got a program to do exactly this for a class as an example for the application of Maximum Likelihood methods. Below I provide you with one example that does this:

Code:

 webuse womenwk, clear
*For FMM
gen dwage=wage!=.
probit dwage married children educ age
capture drop mill
predict double mill, score

capture program drop fmmreg
program fmmreg
qui {
   args lnf xb1 lns1 xb2 lns2 lp zb
   tempvar f1 f2
   gen double `f1'=normalden($ML_y1,`xb1',exp(`lns1'))
   gen double `f2'=normalden($ML_y1,`xb2',exp(`lns2'))
   tempvar p
   gen double `p'=exp(`lp')/(1+exp(`lp'))
   replace `lnf'=ln(normal(`zb'))  if $ML_y2==1
   replace `lnf'=ln(normal(-`zb')) if $ML_y2==0
   replace `lnf'=`lnf'+ln(`p'*`f1'+(1-`p')*`f2') if $ML_y2==1
}
end


capture program drop fmmreg2
program fmmreg2
qui {
   args lnf xb1 g1 lns1 xb2 g2 lns2 lp zb
   tempvar f1 f2
   replace `lnf'=ln(normal(`zb'))  if $ML_y2==1
   replace `lnf'=ln(normal(-`zb')) if $ML_y2==0
   tempvar mill
   gen double `mill'=normalden(`zb')/normal(`zb')
   
   gen double `f1'=normalden($ML_y1,`xb1'+`g1'*`mill',exp(`lns1'))
   gen double `f2'=normalden($ML_y1,`xb2'+`g2'*`mill',exp(`lns2'))
   tempvar p
   gen double `p'=exp(`lp')/(1+exp(`lp'))

   replace `lnf'=`lnf'+ln(`p'*`f1'+(1-`p')*`f2') if $ML_y2==1
}
end

ml model lf fmmreg (xb1:wage=educ age mill ) (lns1:) (xb2:=educ age mill ) (lns2:) (lp:) (zb:dwage=married children educ age), maximize   missing technique( nr bhhh) difficult
matrix bin2=e(b)
ml model lf fmmreg2 (xb1:wage=educ age ) (g1:) (lns1:) (xb2:=educ age  ) (g2:)  (lns2:) (lp:) (zb:dwage=married children educ age), maximize init(bin2, skip) missing robust technique( nr bhhh)  
ml display

Be aware, however, that this is a very unstable model, and difficult to maximize. Even the code that i provided you before (the one you show) will show problems during the maximization. And that problems doubles when trying to do FMM.
My suggestion would be to estimate the initial values for the parameters using the "fmm" function, which is quite robust to find good solutions (at least on my experience), and that may help you find the "correct" coefficients for your model.
HTH
Fernando

Comment

Bing Zhang

Join Date: Oct 2018
Posts: 40

02 Mar 2019, 21:35

Originally posted by FernandoRios View Post

Code:

webuse womenwk, clear
*For FMM
gen dwage=wage!=.
probit dwage married children educ age
capture drop mill
predict double mill, score

capture program drop fmmreg
program fmmreg
qui {
args lnf xb1 lns1 xb2 lns2 lp zb
tempvar f1 f2
gen double `f1'=normalden($ML_y1,`xb1',exp(`lns1'))
gen double `f2'=normalden($ML_y1,`xb2',exp(`lns2'))
tempvar p
gen double `p'=exp(`lp')/(1+exp(`lp'))
replace `lnf'=ln(normal(`zb')) if $ML_y2==1
replace `lnf'=ln(normal(-`zb')) if $ML_y2==0
replace `lnf'=`lnf'+ln(`p'*`f1'+(1-`p')*`f2') if $ML_y2==1
}
end


capture program drop fmmreg2
program fmmreg2
qui {
args lnf xb1 g1 lns1 xb2 g2 lns2 lp zb
tempvar f1 f2
replace `lnf'=ln(normal(`zb')) if $ML_y2==1
replace `lnf'=ln(normal(-`zb')) if $ML_y2==0
tempvar mill
gen double `mill'=normalden(`zb')/normal(`zb')

gen double `f1'=normalden($ML_y1,`xb1'+`g1'*`mill',exp(`lns1'))
gen double `f2'=normalden($ML_y1,`xb2'+`g2'*`mill',exp(`lns2'))
tempvar p
gen double `p'=exp(`lp')/(1+exp(`lp'))

replace `lnf'=`lnf'+ln(`p'*`f1'+(1-`p')*`f2') if $ML_y2==1
}
end

ml model lf fmmreg (xb1:wage=educ age mill ) (lns1:) (xb2:=educ age mill ) (lns2:) (lp:) (zb:dwage=married children educ age), maximize missing technique( nr bhhh) difficult
matrix bin2=e(b)
ml model lf fmmreg2 (xb1:wage=educ age ) (g1:) (lns1:) (xb2:=educ age ) (g2:) (lns2:) (lp:) (zb:dwage=married children educ age), maximize init(bin2, skip) missing robust technique( nr bhhh)
ml display

Thank you very much!I am going to try it.

Comment

mustapha ziroili

Join Date: Jan 2015
Posts: 11

05 Nov 2019, 13:57

Thank you for the code.
I tried to adapt Fernando's code to a mixture of 3 components (code below) but I get an error message!

Code:

webuse womenwk, clear
*For FMM
gen dwage=wage!=.
probit dwage married children educ age
capture drop mill
predict double mill, score

capture program drop fmmreg
program fmmreg
qui {
   args lnf xb1 lns1 xb2 lns2 xb3 lns3 lp1 lp2 lp3 zb
   tempvar f1 f2 f3
   gen double `f1'=normalden($ML_y1,`xb1',exp(`lns1'))
   gen double `f2'=normalden($ML_y1,`xb2',exp(`lns2'))
   gen double `f3'=normalden($ML_y1,`xb3',exp(`lns3'))
   tempvar p1 p2 p3
   gen double `p1'=1/(1+exp(`lp2')+exp(`lp3'))
   gen double `p2'=exp(`lp2')/(1+exp(`lp2')+exp(`lp3'))
   gen double `p3'=exp(`lp3')/(1+exp(`lp2')+exp(`lp3'))
   replace `lnf'=ln(normal(`zb'))  if $ML_y2==1
   replace `lnf'=ln(normal(-`zb')) if $ML_y2==0
   replace `lnf'=`lnf'+ln(`p1'*`f1'+`p2'*`f2'+`p3'*`f3') if $ML_y2==1
}
end


capture program drop fmmreg2
program fmmreg2
qui {
   args lnf xb1 g1 lns1 xb2 g2 lns2 xb3 g3 lns3 lp1 lp2 lp3 zb
   tempvar f1 f2 f3
   replace `lnf'=ln(normal(`zb'))  if $ML_y2==1
   replace `lnf'=ln(normal(-`zb')) if $ML_y2==0
   tempvar mill
   gen double `mill'=normalden(`zb')/normal(`zb')
   
   gen double `f1'=normalden($ML_y1,`xb1'+`g1'*`mill',exp(`lns1'))
   gen double `f2'=normalden($ML_y1,`xb2'+`g2'*`mill',exp(`lns2'))
   gen double `f3'=normalden($ML_y1,`xb3'+`g3'*`mill',exp(`lns3'))
   tempvar p1 p2 p3
   gen double `p1'=1/(1+exp(`lp2')+exp(`lp3'))
   gen double `p2'=exp(`lp2')/(1+exp(`lp2')+exp(`lp3'))
   gen double `p3'=exp(`lp3')/(1+exp(`lp2')+exp(`lp3'))

   replace `lnf'=`lnf'+ln(`p1'*`f1'+`p2'*`f2'+`p3'*`f3') if $ML_y2==1
}
end

ml model lf fmmreg (xb1:wage=educ age mill ) (lns1:) (xb2:=educ age mill ) (lns2:) (xb3:wage=educ age mill ) (lns3:) (lp1:) (lp2:) (lp3:) (zb:dwage=married children educ age), maximize   missing technique( nr bhhh) difficult
matrix bin2=e(b)
ml model lf fmmreg2 (xb1:wage=educ age ) (g1:) (lns1:) (xb2:=educ age  ) (g2:)  (lns2:) (xb3:wage=educ age ) (g3:) (lns3:) (lp1:) (lp2:) (lp3:) (zb:dwage=married children educ age), maximize init(bin2, skip) missing robust technique( nr bhhh)  
ml display

The error message:

Code:

. ml model lf fmmreg (xb1:wage=educ age mill ) (lns1:) (xb2:=educ age mill ) (lns2:) (xb3:wage=educ 
> age mill ) (lns3:) (lp1:) (lp2:) (lp3:) (zb:dwage=married children educ age), maximize   missing t
> echnique( nr bhhh) difficult

initial:       log likelihood =     -<inf>  (could not be evaluated)
could not find feasible values
r(491);

end of do-file

I don't know hwo to fix this issue. Will be gratefull for any suggestion. Thanks in advance for your help!

Comment

FernandoRios

Join Date: Apr 2014
Posts: 2469

05 Nov 2019, 14:54

Hi Mustapha
I see only one typo in your syntax
you write this:

Code:

ml model lf fmmreg (xb1:wage=educ age mill ) (lns1:) (xb2:=educ age mill ) ///
(lns2:) (xb3:wage=educ age mill ) (lns3:) (lp1:) (lp2:) (lp3:) ///
(zb:dwage=married children educ age), maximize missing technique( nr bhhh) difficult

when it should be this

Code:

ml model lf fmmreg (xb1:wage=educ age mill ) (lns1:) (xb2:=educ age mill ) ///
(lns2:) (xb3: =educ age mill ) (lns3:) (lp1:) (lp2:) (lp3:) ///
(zb:dwage=married children educ age), maximize missing technique( nr bhhh) difficult

Notice there is no "wage" in the xb3 equation.
Other than that. It is possible that there are not 3 well defined groups in this data, which is why trying to run the above command finds no solution.
If you want to test your code, you may have to start with simulated data
HTH
Fernando

Comment

mustapha ziroili

Join Date: Jan 2015

Posts: 11
#6

06 Nov 2019, 04:36

Thank you Fernando for these insights.
In fact I am interesting to estimate finite mixture model knowing (a priori) the share of one compoenent following the paper of Gunther, I. and Launov, A. (2012). The idea is that we know the share of formal employment a priory and we look for 2 components (at least) finite mixture in the informal employment. In your opinion there is an estimation error if I split my sample to formal/informal and run two step heckman for formal part and your code for the informal part?
Thank you in advance for your feedback.

Reference:
Gunther, I. and Launov, A., 2012. Informal employment in developing countries: opportunity or last resort? Journal of Development Economics, 97 (1), 88–98.
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2469
#7

06 Nov 2019, 12:00

Oh I see.
I actually saw a master student doing exactly that in the past.
So here are my thoughts.
1. Because formal and informal's are already identified, I don't think you need to run a 3 group FMM model. You can simply do a fmm 2 for informals.
2. Regarding the selection correction term
In my toy example, I was assuming that the same selection term affects people in both groups.
I can see that the same thing can be done for formals and informals. Estimate a single "selection model", and apply it for all sectors. However, you could estimate separate regressions for formals and informals too.
3. I think doing the formals, and informals separate would fine, especially if the selection model is estimated separately for formals and informals.
However, if the selection equation is to see why some people become formal vs informal, then I think you should include ALL equations into a single estimation. This implies adapting the code above for a third group,

Let me know if that last scenario is the one you are working on.
Fernando
Comment
mustapha ziroili

Join Date: Jan 2015

Posts: 11
#8

06 Nov 2019, 15:56

Thank you Fernando for these insights.
In fact I am interesting in a mixed approach between (1) and (3):
On selection equation for all components of labour market and making all equations in one maximum likelohood function. The issue is that how to take into account in the program above that we know the share of formal employment that is "p1" ( scalar) and p2+p3=1-p1 (I am not a pro on ml programming)? the second issue is how to use the correction method proposed by Murphy and Topel to deal with the biais of the variance-covariance matrix from the second stage (after the selection equation)? because the estimation strategy is based on two steps: one for selection equation and introduce these estimation in fmm.

Any suggestion will be appreciated! Thanks

Reference:
Murphy, K.M. and Topel, R.H., 1985. Estimation and inference in two-step econometric models. Journal of Business and Economic Statistics, 3, 370–379.
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2469
#9

06 Nov 2019, 18:30

Hi Mustapha,
The good news is that Murphy and Topel (1985( correction is not needed, because by estimating the whole system via Maximum Likelihood, you are already accounting for the fact that the model comes from a two step strategy.
See my paper here: https://www.stata-journal.com/articl...article=st0520
For the first part of the question, that is a bit more complicated. As you need to know how to write the ML (more than the program, write down the actual function you need to estimate).
If you can define the exact function you want to estimate for your paper, Translating into ML programming is straight forward.

Fernando
Comment

mustapha ziroili

Join Date: Jan 2015
Posts: 11

#10

07 Nov 2019, 01:49

Thank you for the news about the correction of Murphy and Topel (1985).
The log-likelihood that I want to maximize is:

Click image for larger version

Name: loglikelihood.JPG
Views: 1
Size: 23.3 KB
ID: 1523689

Where : pf (share of formal is known a priory) and: function f is:

Click image for larger version

Name: function f.JPG
Views: 1
Size: 26.4 KB
ID: 1523690

Based on your code I need to know how to constraint p1 (share of formal) to equal to a scalar?

Code:

webuse womenwk, clear
*For FMM
gen dwage=wage!=.
probit dwage married children educ age
capture drop mill
predict double mill, score

capture program drop fmmreg
program fmmreg
qui {
   args lnf xb1 lns1 xb2 lns2 xb3 lns3 lp1 lp2 lp3 zb
   tempvar f1 f2 f3
   gen double `f1'=normalden($ML_y1,`xb1',exp(`lns1'))
   gen double `f2'=normalden($ML_y1,`xb2',exp(`lns2'))
   gen double `f3'=normalden($ML_y1,`xb3',exp(`lns3'))
   tempvar p1 p2 p3
   gen double `p1'=1/(1+exp(`lp2')+exp(`lp3'))
   gen double `p2'=exp(`lp2')/(1+exp(`lp2')+exp(`lp3'))
   gen double `p3'=exp(`lp3')/(1+exp(`lp2')+exp(`lp3'))
   replace `lnf'=ln(normal(`zb'))  if $ML_y2==1
   replace `lnf'=ln(normal(-`zb')) if $ML_y2==0
   replace `lnf'=`lnf'+ln(`p1'*`f1'+`p2'*`f2'+`p3'*`f3') if $ML_y2==1
}
end


capture program drop fmmreg2
program fmmreg2
qui {
   args lnf xb1 g1 lns1 xb2 g2 lns2 xb3 g3 lns3 lp1 lp2 lp3 zb
   tempvar f1 f2 f3
   replace `lnf'=ln(normal(`zb'))  if $ML_y2==1
   replace `lnf'=ln(normal(-`zb')) if $ML_y2==0
   tempvar mill
   gen double `mill'=normalden(`zb')/normal(`zb')
   
   gen double `f1'=normalden($ML_y1,`xb1'+`g1'*`mill',exp(`lns1'))
   gen double `f2'=normalden($ML_y1,`xb2'+`g2'*`mill',exp(`lns2'))
   gen double `f3'=normalden($ML_y1,`xb3'+`g3'*`mill',exp(`lns3'))
   tempvar p1 p2 p3
   gen double `p1'=1/(1+exp(`lp2')+exp(`lp3'))
   gen double `p2'=exp(`lp2')/(1+exp(`lp2')+exp(`lp3'))
   gen double `p3'=exp(`lp3')/(1+exp(`lp2')+exp(`lp3'))

   replace `lnf'=`lnf'+ln(`p1'*`f1'+`p2'*`f2'+`p3'*`f3') if $ML_y2==1
}
end

ml model lf fmmreg (xb1:wage=educ age mill ) (lns1:) (xb2:=educ age mill ) (lns2:) (xb3:=educ age mill ) (lns3:) (lp1:) (lp2:) (lp3:) (zb:dwage=married children educ age), maximize   missing technique( nr bhhh) difficult
matrix bin2=e(b)
ml model lf fmmreg2 (xb1:wage=educ age ) (g1:) (lns1:) (xb2:=educ age  ) (g2:)  (lns2:) (xb3:=educ age ) (g3:) (lns3:) (lp1:) (lp2:) (lp3:) (zb:dwage=married children educ age), maximize init(bin2, skip) missing robust technique( nr bhhh)  
ml display

Any idea how to do it?

Announcement