Hi Statalist. I'm having trouble estimating the mean value of individual fixed effects within cohort groups. I'm trying to replicate a paper by George J. Borjas, "The Impact of Foreign Students on the Earnings of Doctorates" (https://scholar.harvard.edu/files/gb...reeman2009.pdf). He uses the following two-step approach:

So, to clarify my question: does my "spaghetti code" do the trick and give me the right results? And if not, how do I estimate this correctly?
I have the following data available, here is an example:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float(l_sal p_weight gender uemp) int HDAY5P float child byte AGEGRP float empus byte(EMSIZE EMSECDT RACETHMP) double WTSURVY float(cohort field)
 7.600903  .0010235414 0 0 2010 1 35 0 7 11 1            39.6161 2 20
10.596635   .001126549 1 1 2010 1 40 0 7 11 1             5.2514 2 14
10.714417   .004780876 1 0 1990 1 50 0 5 11 5            32.4895 3  1
10.714417  .0030316154 1 0 2010 0 30 0 6 11 5                  1 2  1
10.714417  .0007754944 1 1 1985 1 65 0 8 11 4            19.3516 2 10
10.819778  .0016915894 0 0 2005 0 50 1 2 12 5 14.808300000000001 2  4
10.819778  .0011296882 0 1 2005 1 40 1 8 11 5            10.4425 1  4
 11.05089  .0016915894 0 0 2015 0 25 1 8 31 5            22.8776 2  4
 11.05089    .02052861 0 0 2005 1 35 1 8 11 5 14.203000000000001 2 22
11.066638  .0011296882 0 0 2005 0 35 1 8 11 7  8.373700000000001 1  4
11.082143            0 0 0 2000 1 40 1 8 11 5            18.5498 1 13
11.141862    .00122859 0 0 2005 0 40 1 8 12 5             5.8696 3  4
11.170435            0 0 0 2005 0 55 1 8 11 4             2.6871 3 13
11.184422   .001171303 1 0 1980 1 60 1 7 11 1 19.333000000000002 2  2
11.225244            0 0 0 2010 0 60 1 3 23 5 3.2468000000000004 1 14
 11.25156  .0004604052 1 0 2005 0 40 1 4 21 5  89.01830000000001 2  8
 11.25156   .000498008 0 0 2000 0 40 1 8 11 5             8.7528 1  8
11.289782            0 1 0 1985 1 55 0 6 11 4             2.5865 2 21
11.289782  .0016915894 0 0 2005 1 40 1 8 32 5             1.9892 2  4
11.314474            0 1 0 1995 0 55 1 8 11 3 11.378300000000001 2 13
11.385092            0 0 0 2000 1 40 1 5 11 5             3.3952 2 13
11.472103  .0010865629 0 1 2000 0 45 1 8 11 1  6.537800000000001 2 12
11.512925            0 0 0 1990 0 55 1 3 21 5 1.4746000000000001 2  5
11.512925  .0016915894 1 1 2005 0 45 1 8 11 1             3.6417 2  4
11.512925  .0007754944 0 0 2005 1 40 1 8 11 5 14.309800000000001 2 10
11.542484            0 1 1 2010 0 30 1 4 11 5 24.783900000000003 1 10
11.561716   .001549987 0 1 2005 0 35 0 5 11 5            29.1175 1 12
11.561716  .0016915894 0 0 2010 0 65 1 6 12 5            14.8041 2  4
11.617286  .0016915894 0 0 2010 0 35 1 8 21 5            11.6028 2  4
11.652687  .0011296882 1 0 2005 1 35 1 8 21 3            13.7058 1  4
 11.66993  .0014695077 0 0 2010 0 40 1 8 11 5                  1 2 11
11.695247            0 1 0 1970 0 70 1 1 22 5            23.2586 2 13
11.695247            0 1 0 2010 0 35 1 8 21 5 16.909000000000002 3  6
 11.73607  .0006116208 1 0 2010 0 30 1 8 11 5 38.051100000000005 3 21
 11.77529  .0002215821 1 0 1985 1 55 1 5 11 3 1.8376000000000001 1  6
 11.83501            0 1 0 2010 1 30 1 8 11 1 3.3339000000000003 1 10
11.849398            0 1 0 1985 0 55 1 7 21 1             5.4634 2 16
11.849398   .001171303 1 0 1995 1 45 1 8 21 5            68.3165 2  2
11.863583  .0003447087 1 0 1995 1 45 1 8 21 5            10.3222 3  3
 11.88449   .001126549 0 0 2005 0 50 1 8 32 3             1.0359 2 14
 11.91839  .0017346054 1 0 1990 1 50 1 3 21 5 45.647200000000005 3  2
 11.91839            0 1 0 1995 0 45 1 8 11 4             1.0904 1 18
 11.91839  .0006116208 1 0 2010 1 40 1 8 21 5            21.1216 3 21
 11.95118            0 1 0 1975 1 65 1 8 31 1 29.337200000000003 2 16
 12.12269  .0005925926 1 0 1990 0 55 1 8 21 1 5.4815000000000005 2  3
 12.15478            0 1 0 2000 0 55 1 4 21 3             6.0602 2 21
12.180755            0 0 0 1995 0 50 1 8 11 5             2.9581 2  5
 12.18587  .0004604052 1 0 1985 1 55 1 6 23 5             2.7446 2  8
12.254863  .0003447087 1 0 1985 0 60 1 8 11 5            14.3194 3  3
12.429216  .0011296882 1 0 1985 0 55 1 7 23 5            22.1174 1  4
end
To estimate this, I have written a fairly overcomplicated loop, which I'm not sure is the right approach:
Code:
cap gen FE = .
mat BIG_STORE = J(23,3,.)
mat rown BIG_STORE = Fields
mat coln BIG_STORE = Cohort
forvalues f = 1(+1)23 {
    forvalues c = 1(+1)3 {
        display("0")
        * Mean of log salary within the field-cohort cell
        qui mean l_sal if field == `f' & cohort == `c'
        mat l_sal_f`f'_c`c'_bar = e(b)
        display("1")
        * Means of the covariates within the cell
        qui mean p_weight i.gender i.uemp i.HDAY5P i.child i.AGEGRP i.empus i.EMSIZE i.EMSECDT i.RACETHMP if field == `f' & cohort == `c'
        mat x_f`f'_c`c'_bar = e(b)
        display("2")
        * Weighted regression within the cell
        qui reg l_sal p_weight i.gender i.uemp i.HDAY5P i.child i.AGEGRP i.empus i.EMSIZE i.EMSECDT i.RACETHMP [pweight=WTSURVY] if field == `f' & cohort == `c', robust
        * Copy every coefficient except the last one (the constant) into a column vector
        local newcol = colsof(e(b)) - 1
        mat Beta_f`f'_c`c'_bar = J(`newcol',1,.)
        forvalues i = 1(+1)`newcol' {
            mat Beta_f`f'_c`c'_bar[`i',1] = e(b)[1,`i']
        }
        * Cell value: mean log salary plus covariate means times coefficients
        mat store = l_sal_f`f'_c`c'_bar + x_f`f'_c`c'_bar * Beta_f`f'_c`c'_bar
        mat BIG_STORE[`f',`c'] = store[1,1]
        replace FE = store[1,1] if field == `f' & cohort == `c'
        display("f`f'c`c'")
    }
}
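For comparison, here is a more compact way to get regression-adjusted effects for each field-cohort cell. It is only a sketch, and it rests on an assumption this post does not confirm: that the first step is a single pooled regression of log salary on the individual covariates plus field-by-cohort fixed effects. The variable names `cell` and `FE_cell` are hypothetical; everything else comes from the example data above.

```stata
* Sketch only: assumes a pooled first-stage regression with
* field-by-cohort cell effects (not necessarily Borjas's exact specification)
egen cell = group(field cohort)          // hypothetical cell identifier

areg l_sal p_weight i.gender i.uemp i.HDAY5P i.child i.AGEGRP i.empus ///
    i.EMSIZE i.EMSECDT i.RACETHMP [pweight=WTSURVY], absorb(cell) vce(robust)

* Estimated coefficient of each absorbed cell, stored on every observation
predict double FE_cell, d

* Reduce to one row per field-cohort cell for a second-stage regression
preserve
collapse (mean) FE_cell, by(field cohort)
list, sepby(field)
restore
```

Unlike the loop, this restricts the covariate coefficients to be the same in every cell, and the cell effects are identified only up to a normalization. Whether that matches the paper's first stage would need to be checked against its equations.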
NB: I use Stata 17, in case that is relevant.
Thanks,
Nicolai Kjær