Our project aims to estimate the "true" rate of return to education for an ethnic minority.
Because the sample for this ethnic group is small, I am worried that the results will be biased.
For instance, the household survey we are using contains 7,988 wage workers aged between
15 and 65, but only 65 of them belong to the ethnic group.
Previous studies on the return to education show that the education variable is
endogenous, so we would like to apply an instrumental variable (IV) approach. As an instrument,
we are thinking of using "quarter of birth", which is a common instrument in the economics of
education.
The baseline model is as follows:
Yi = B0 + B1*X1i + B2*X2i + B3*X3i + B4*X4i + B5*X5i + Ui (1)
X1i = W0 + W1*Qi + W2*X2i + W3*X3i + W4*X4i + W5*X5i + ri (2)
where
Yi = log wage
X1i = years of education
X2i = potential work experience (age - years of education)
X3i = square of potential work experience
X4i = ethnicity dummy
X5i = interaction term of X1i and X4i
Ui = error term of the second-stage (wage) equation
Qi = quarter of birth (instrument)
ri = error term of the first-stage (education) equation
This model might remove the endogeneity bias, but the issue of the small sample size
remains unsolved (in particular, the estimated standard errors are probably not
correct because of clustering).
Therefore, one option might be to request cluster-robust standard errors with the
"vce(cluster clustvar)" option, specifying the cluster unit (in this case, the ethnicity
variable) when we run the "ivregress" command.
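For concreteness, a minimal sketch of the specification I have in mind is below. The variable names (lnwage, educ, age, ethnic, qob) are hypothetical placeholders rather than the names in our survey, and entering quarter of birth as a set of dummies is my assumption (equation (2) as written has a single coefficient on Qi, which would correspond to entering qob directly). I am not sure this is the right way to set it up.

```stata
* Hypothetical variable names: lnwage, educ, age, ethnic (0/1), qob (1-4)
generate exper       = age - educ      // X2: potential work experience
generate exper2      = exper^2         // X3: its square
generate educ_ethnic = educ * ethnic   // X5: education x ethnicity

* Equations (1)-(2) as written: educ is instrumented by quarter of birth,
* all other regressors are treated as exogenous.
* Assumption: qob enters as dummies (i.qob); use "qob" for a single term.
* Note: clustering on a 0/1 ethnicity dummy gives only two clusters.
ivregress 2sls lnwage exper exper2 ethnic educ_ethnic (educ = i.qob), ///
    vce(cluster ethnic) first

* First-stage diagnostics for instrument strength
estat firststage
```

Here "first" asks ivregress to display the first-stage regression, and "estat firststage" reports first-stage statistics after estimation.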
To deal with the small sample size, I guess that multilevel modeling could be an
alternative option. Gelman & Hill (2006), in their book "Data Analysis Using Regression
and Multilevel/Hierarchical Models" (http://www.stat.columbia.edu/~gelman/arm/software/),
show how to combine multilevel analysis with IV using commands for the R statistical
package (but no Stata commands).
So I think multilevel modeling with IV is theoretically possible, but I do not know exactly
how to carry out this analysis in Stata. Specifically, should I use "ivregress" or
"xtmixed", and whichever I use, how should I specify the command?
Thank you very much for your kind help in advance,
Kentaro Shimada
PhD Candidate, Kobe University, Japan