Dear all,
Happy Easter. I am writing a Master's thesis with panel data in which each observation is uniquely identified by survey year and person ID. In a wage regression estimating the native-immigrant wage gap in Germany, I want to control for occupation group. That is, I want to account for the possibility that migrants sort into different occupation groups and therefore earn lower wages.
In my dataset, occupation is coded in two ways: from the start of the survey until 2013 it follows the ISCO-88 standard, and after 2013 it follows ISCO-08. The two classification schemes are linked by many n:1 and 1:m splits and merges, so I cannot harmonize them. Since each standard has 9 occupation groups, I end up with 18 codes.
I define the variable "occup_combined" as a categorical variable with a unique code for every combination of classification scheme and occupation group. It looks something like this:
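The recoding described above can be sketched in a few lines. This is a minimal Python illustration, not the poster's Stata code. The switch year is an assumption read off the example table in this post (2013 is already coded in ISCO-08), and the function reuses the name `occup_combined` only for readability.

```python
# Assumption: ISCO-08 applies from 2013 onward, as in the example table.
SWITCH_YEAR = 2013


def occup_combined(syear: int, major_group: int) -> str:
    """Return a label unique to the (classification scheme, major group) pair.

    Groups are never mapped across schemes, so ISCO-88 group 9 and
    ISCO-08 group 9 remain separate levels: 2 schemes x 9 groups = 18 codes.
    """
    if not 1 <= major_group <= 9:
        raise ValueError(f"major group must be between 1 and 9, got {major_group}")
    scheme = "ISCO-08" if syear >= SWITCH_YEAR else "ISCO-88"
    return f"{scheme}-Group{major_group}"


# (person ID, survey year, major group) taken from the example table
rows = [(1, 2011, 9), (1, 2012, 9), (1, 2013, 8), (1, 2014, 8), (2, 2014, 3), (2, 2015, 3)]
for pid, year, group in rows:
    print(pid, year, occup_combined(year, group))

# Every group in one pre-switch year and one post-switch year: 18 distinct levels
all_codes = {occup_combined(y, g) for y in (2012, 2013) for g in range(1, 10)}
print(len(all_codes))  # 18
```

Because the scheme is part of the code, a person observed on both sides of the switch (such as person 1) changes level whenever the scheme changes, even if the job itself did not.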
person ID | survey year | occup_combined
1         | 2011        | ISCO-88-Group9
1         | 2012        | ISCO-88-Group9
1         | 2013        | ISCO-08-Group8
1         | 2014        | ISCO-08-Group8
2         | 2014        | ISCO-08-Group3
2         | 2015        | ISCO-08-Group3
I run the code below. It seems that Stata correctly drops one reference category of occupation within each period (before and after 2013) because of collinearity. My question is: does this correctly control for occupation, regardless of the time period?
Code:
egen cluster_var = group(bula regtyp)
reghdfe ln_wages_gro immigrant sex age age_sq married no_children i.educ_level years_work_exp i.occup_combined, a(cluster_var syear) vce(cluster cluster_var)
Code:
reghdfe immigrant sex age age_sq married no_children i.educ_level years_work_exp i.occup_combined, a(cluster_var syear) vce(cluster cluster_var)
(MWFE estimator converged in 5 iterations)
note: 889.occup_combined omitted because of collinearity

HDFE Linear regression                            Number of obs   =    373,914
Absorbing 2 HDFE groups                           F(  24,     27) =    1487.98
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.2058
                                                  Adj R-squared   =     0.2057
                                                  Within R-sq.    =     0.1404
Number of clusters (cluster_var) =         28     Root MSE        =     0.3394

                             (Std. err. adjusted for 28 clusters in cluster_var)
--------------------------------------------------------------------------------
               |                 Robust
     immigrant | Coefficient  std. err.        t   P>|t|    [95% conf. interval]
---------------+----------------------------------------------------------------
           sex |   -.0062314   .0053259    -1.17   0.252   -.0171592    .0046965
           age |    .0129305   .0018842     6.86   0.000    .0090644    .0167966
        age_sq |   -.0190114   .0022333    -8.51   0.000   -.0235937   -.0144291
       married |    .0805015     .00617    13.05   0.000    .0678417    .0931612
   no_children |    .0305603   .0022926    13.33   0.000    .0258563    .0352643
               |
    educ_level |
             2 |   -.2275403    .013966   -16.29   0.000   -.2561961   -.1988844
             3 |   -.1845185   .0156826   -11.77   0.000   -.2166965   -.1523404
               |
years_work_exp |    .0012085    .000531     2.28   0.031    .0001189    .0022981
               |
occup_combined |
            82 |    .0365541   .0107535     3.40   0.002    .0144898    .0586184
            83 |    .0550285   .0114384     4.81   0.000    .0315589    .0784981
            84 |    .0754783    .010643     7.09   0.000    .0536408    .0973159
            85 |    .1769228   .0207138     8.54   0.000    .1344216    .2194241
            86 |    .0625833   .0236854     2.64   0.014    .0139848    .1111818
            87 |    .1984632   .0163995    12.10   0.000    .1648143    .2321121
            88 |    .2950671   .0256503    11.50   0.000    .2424369    .3476972
            89 |    .3754826   .0248805    15.09   0.000    .3244321    .4265331
           881 |   -.2014318   .0285681    -7.05   0.000   -.2600487    -.142815
           882 |    -.218334   .0289038    -7.55   0.000   -.2776397   -.1590283
           883 |   -.1955812    .027253    -7.18   0.000   -.2514997   -.1396627
           884 |   -.1969916   .0264007    -7.46   0.000   -.2511613   -.1428219
           885 |   -.1457004   .0210747    -6.91   0.000   -.1889421   -.1024588
           886 |   -.1956609   .0408315    -4.79   0.000   -.2794402   -.1118816
           887 |   -.0568629   .0111428    -5.10   0.000   -.0797259   -.0339998
           888 |    .0205399   .0117739     1.74   0.092   -.0036182     .044698
           889 |           0  (omitted)
               |
         _cons |    .1553774   .0249477     6.23   0.000    .1041889    .2065659
--------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
 cluster_var |        28          28           0    *|
       syear |        37           1          36     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation
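The note "889.occup_combined omitted because of collinearity", together with the unlisted base level, can be checked with a small rank calculation. The Python sketch below is an illustration of the linear algebra, not of how Stata or reghdfe detect collinearity. Everything in it is hypothetical: survey years 2009 to 2016, ISCO-08 from 2013 on (an assumption taken from the example table), every group observed in every year, and made-up level codes 101-109 (ISCO-88) and 201-209 (ISCO-08). Each full set of nine scheme-specific dummies adds up to a pre- or post-period indicator, and that indicator is already a sum of year dummies, so the full design should lose exactly two ranks.

```python
from fractions import Fraction

YEARS = list(range(2009, 2017))  # hypothetical survey years
SWITCH_YEAR = 2013               # assumed first year coded in ISCO-08
GROUPS = range(1, 10)            # 9 major groups per scheme


def occup_combined(year, group):
    # Illustrative codes only: 101-109 = ISCO-88, 201-209 = ISCO-08.
    return (200 if year >= SWITCH_YEAR else 100) + group


def rank(matrix):
    """Column rank via exact Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0  # number of pivot columns found so far
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # column c is a combination of earlier pivot columns
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


# One observation per (year, group) cell, so every group occurs in every year.
obs = [(y, g) for y in YEARS for g in GROUPS]
levels = sorted({occup_combined(y, g) for y, g in obs})  # 18 levels


def design(drop=()):
    """Full set of year dummies (what absorbing syear amounts to, constant
    included) plus one dummy per occup_combined level not listed in `drop`."""
    keep = [lv for lv in levels if lv not in drop]
    return [[int(y == t) for t in YEARS]
            + [int(occup_combined(y, g) == lv) for lv in keep]
            for y, g in obs]


full = design()
print(len(full[0]), rank(full))        # 26 24: two exact collinearities
reduced = design(drop={101, 201})      # drop one level per scheme
print(len(reduced[0]), rank(reduced))  # 24 24: full column rank
```

With `syear` absorbed, the posted regression faces the same two collinearities. The output is consistent with this: the unlisted base level (presumably 81) and 889 drop out, which matches one omitted level per scheme if the 8x and 88x codes are the two schemes. Each remaining occupation coefficient is then a contrast with the omitted level of its own scheme, net of year effects, so coefficients from the two schemes are not directly comparable with each other.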