Hello,
For an assignment, I have to use the Heckman selection model with the traditional two-step approach. However, y2 is a binary variable, which makes the Heckman method invalid, but our teacher is just trying to prove a point. I get an error message that I don't understand (see below the code).
I have to compute the commonly used but wrong approach where people apply the standard two-step procedure from the linear case when their structural equation is of probit type. So I must set up a small Monte Carlo simulation where I estimate the model using heckprob and, in addition, compute the wrong procedure: (a) estimate the selection equation by probit, (b) compute the inverse Mills ratio as φ(z'α)/Φ(z'α), (c) estimate the second equation by probit, including the inverse Mills ratio as a regressor. I must also use two different sample sizes in the Monte Carlo simulation, and study the average bias and variance of the two estimates of my β coefficients.
The model:
y1* = z'α + u
y2* = x'β + v
where we observe y1 = 1 if y1* > 0 and y1 = 0 otherwise. However, y2 and x are only observed if y1 = 1. Then we observe y2 = 1 if y2* > 0 and y2 = 0 otherwise.
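Written out, the setup and the "wrong" two-step procedure I am asked to compute look roughly as follows. This is only my sketch: the error distribution is taken from my simulation code below (unit variances, correlation 0.5), and γ is my own label for the coefficient on the inverse Mills ratio, not notation from the assignment.

```latex
\begin{align*}
y_1^* &= z'\alpha + u, & y_1 &= \mathbf{1}[y_1^* > 0], \\
y_2^* &= x'\beta + v,  & y_2 &= \mathbf{1}[y_2^* > 0] \quad \text{(observed only if } y_1 = 1\text{)}, \\
(u, v)' &\sim N\!\left(0,\ \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}\right).
\end{align*}
% "Wrong" two-step procedure:
% (a) probit of y_1 on z gives \hat\alpha
% (b) \hat\lambda = \phi(z'\hat\alpha) / \Phi(z'\hat\alpha)
% (c) probit of y_2 on x and \hat\lambda in the selected sample (y_1 = 1),
%     i.e. the fitted model is
\[
\Pr(y_2 = 1 \mid x, \hat\lambda, y_1 = 1) = \Phi\!\left(x'\beta + \gamma\,\hat\lambda\right).
\]
```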
My code:
gen x = rnormal()
gen z = uniform()
foreach i of numlist 200 900 { // 2 different MC sample sizes: N1=200 & N2=900
    tempfile mc_heck`i'
    local j = 1
    while `j' <= 1000 {
        quietly {
            preserve
            sample `i', count
            *** Generate correlated error terms with 0.5 correlation coefficient
            matrix omega=(1, 0.5 \ 0.5, 1)
            matrix list omega
            * cholesky decomposition to find L (Omega = LL')
            matrix L=cholesky(omega)
            matrix list L
            matrix K=L'
            mat list K
            * non-correlated normally distributed errors
            gen w1=invnorm(uniform())
            gen w2=invnorm(uniform())
            * generate 2 correlated error terms v1, v2 out of w1 and w2
            gen v1=w1*K[1,1]+w2*K[2,1]
            gen v2=w1*K[1,2]+w2*K[2,2]
            drop w1 w2
            corr v1 v2
            * Rename the error terms according to the notation used before when describing the model
            rename v1 u
            rename v2 v
            noisily di in yellow "." _continue
            gen y1star = z + u
            gen y1 = y1star > 0 // we observe y1 = 1 if y1star > 0 and y1 = 0 otherwise
            tab y1
            replace x = . if y1 != 1 // x is only observed if y1 = 1
            gen y2star = x + v
            gen y2 = y2star > 0 // we observe y2 = 1 if y2star > 0 and y2 = 0 otherwise
            replace y2 = . if y1 != 1 // y2 is only observed if y1 = 1
            tab y2
            replace y2 = -9999 if y2 == .
            *** Using heckprob command
            heckprob y2 x, select(y1 = z)
            matrix bH = e(b)
            svmat bH, name(heck`i'_)
            *** Using "Wrong procedure"
            * a) estimate selection equation by probit
            probit y1 z
            * b) compute the inverse mills ratio
            predict za, xb
            gen mills = normalden(za)/normprob(za)
            * c) estimate the second equation by probit where we include the inverse mills ratio
            ivregress 2sls y2 z mills
            matrix bW = e(b)
            svmat bW, name(wp`i'_)
            collapse (mean) heck* wp*
            if `j' > 1 {
                append using `mc_heck`i''
            }
            save `mc_heck`i'', replace
            restore
            local j = `j' + 1
        }
    }
}
use `mc_heck200', clear
merge using `mc_heck900'
drop _merge*
* Does not work either! Error message:
* "cannot compute an improvement parameter -- discontinuous region encountered parameter
* names athrho not found". r(303)
Can you tell me what I need to change? I don't understand where the error message is coming from!