Hello Statalisters,
I am working on the impact of migration on the academic performance of children in migrant families. The dependent variable is whether the child attends school, and the independent variable of interest is whether the household receives remittances. Both are binary, and the independent variable is endogenous. I use historical migration rates at the NUTS-2 level as an instrumental variable. This instrument is assumed to be positively correlated with receiving remittances and uncorrelated with the error term.
The literature suggests a few methods for this setting, such as IV 2SLS, bivariate probit, and special regressor methods. None of them worked for me. With IV 2SLS, the estimated coefficient on the independent variable of interest falls outside the (0,1) range.
With bivariate probit, my specification does not satisfy the assumption of jointly normal errors, so the estimated coefficients are severely biased.
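For reference, the two specifications above look roughly like the sketch below. The variable names are placeholders, not the ones in my dataset: attend (school attendance), remit (remittance receipt), hist_mig (historical NUTS-2 migration rate), and x1 and x2 (exogenous controls).

```stata
* Placeholder variable names (assumptions, not the actual dataset):
*   attend   = 1 if the child attends school
*   remit    = 1 if the household receives remittances (endogenous)
*   hist_mig = historical migration rate at NUTS-2 level (instrument)
*   x1 x2    = exogenous controls

* Linear IV (2SLS): a linear probability model in both stages
ivregress 2sls attend x1 x2 (remit = hist_mig), vce(robust)
* First-stage diagnostics for the instrument
estat firststage

* Recursive bivariate probit: relies on jointly normal errors
biprobit (attend = remit x1 x2) (remit = hist_mig x1 x2), vce(robust)
```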
I tried the semi-parametric estimation method proposed by Gallant and Nychka. The corresponding Stata command, snp2, was written by Giuseppe De Luca. This method relaxes the assumption of jointly normal errors. However, the command uses maximum likelihood estimation, and it never converges in my sample. I restricted the number of iterations to 1 to see whether I could at least estimate the marginal effects. However, the "mfx compute" command, which De Luca suggests for estimating marginal effects after snp2, gives the following error:
[oldest_girl_young_attend:regio] not found
oldest_girl_young is the dependent variable in my regression.
With the special regressor method, using the household head's age as the special regressor, I could estimate the coefficients. However, the coefficient is again outside the (0,1) range, whereas Christopher Baum suggests that the special regressor method, by construction, does not have this out-of-range problem. When I try to estimate the marginal effects of the regressors using the bootstrap option in sspecialreg, Stata returns a "conformability error". So although Baum's code (sspecialreg in Stata) estimates the coefficients, it cannot estimate the marginal effects.
I believe the problem is due to the rareness of the treatment in my sample. Approximately 1,500 out of 100,000 households receive remittances, a share of 1.5%. Gary King has worked on this rare events problem. In my IV 2SLS estimation, the first stage corresponds to what Gary King calls a rare events problem: the dependent variable has a distribution in which the ratio of 1's to 0's is below 5%. However, he couldn't help me solve my problem.
My belief that the rareness of the treatment causes these problems is also based on the output of the first-stage probit regression. After I ran a probit regression of receiving remittances on the instrument and the other exogenous variables from the second stage, the "estat class" command showed that I never correctly predict the households that receive remittances. That is, for the observations that receive remittances, the predicted probabilities never exceed 0.5. In fact, the predicted probabilities of receiving remittances after the probit regression range from -0.1 to 0.1.
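A minimal sketch of this first-stage check, again with placeholder variable names (remit, hist_mig, x1, x2) rather than my actual ones. The second classification table uses a cutoff equal to the 1.5% sample share instead of the default 0.5, which is only one possible choice for a rare outcome.

```stata
* Placeholder variable names (assumptions, not the actual dataset)
* First stage: probit of remittance receipt on the instrument and controls
probit remit hist_mig x1 x2

* Classification table at the default cutoff of 0.5
estat classification

* Classification table at a cutoff equal to the sample share of treated (1.5%)
estat classification, cutoff(0.015)

* pr requests predicted probabilities (bounded between 0 and 1);
* xb would instead return the linear index, which can be negative
predict double p_remit, pr
summarize p_remit, detail
```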
I am looking for a method that will consistently estimate the coefficients and the marginal effects.
I think that semi-parametric methods that are appropriate for binary choice models with binary endogenous regressors, and that do not rely on maximum likelihood estimation, may solve my problem. One example is the Klein and Spady semi-parametric estimation method, which uses kernel density estimation. However, I couldn't find Stata code for it. Can you please help me find Stata code for such semi-parametric methods?
Any other kind of help would also be appreciated.
Thank you
Best regards.
Erkan Duman