Hi all! I'm new to the forum and am looking for some advice. I'm trying to estimate the effect of remote work on the work-life balance and well-being of mothers using fixed-effects models, and I'm having a hard time deciding what to control for when it comes to occupation/industry.
The issue is the following: the occupation variable, as is, contains 9330 categories, with a total of 26178 valid observations in my dataset. (Link to the variable details: https://paneldata.org/soep-core/data...equiv/e1110598 ).
I also have the choice of using the 1-digit industry code (10 categories, 23971 observations) ( https://paneldata.org/soep-core/data...quiv/e1110697 ) or the 2-digit industry code (33 categories, 23971 observations) ( https://paneldata.org/soep-core/data...equiv/e1110797 ).
I ended up picking the occupation variable, as it gave me the highest within R-squared (0.0903 with the occupation variable versus 0.0434 with the 2-digit industry code). However, I have only now realized that the occupation variable basically enters the model as 9330 dummy variables, and that the sheer number of dummies alone could have accounted for the doubling of the R-squared (this is my understanding; please correct me if I'm wrong). Should I continue to use only the occupation variable, or switch and settle for the 1- or 2-digit industry code?
Thank you and kind regards,
Matej
Code with the occupation variable (I've cut most of the 9000+ occupation dummies due to length):
Code:
. xtreg overtimehours ib5.freqWFH4##i.children_in_hh_dummy07 i.isced_edu i.syear i.maritalstatus age agesq i.emp
> status i.disability_status logindincome workingexperience i.jobchange2 i.regtyp i.sizecompany firmtime firmtim
> esq i.partnerWFH i.occupation if sex==2 & children_in_hh_dummy816==0 , fe vce(cluster pid)
note: age omitted because of collinearity.
note: 1312.occupation omitted because of collinearity.
note: 1319.occupation omitted because of collinearity.
note: 2121.occupation omitted because of collinearity.
note: 2213.occupation omitted because of collinearity.
note: 2431.occupation omitted because of collinearity.
note: 3232.occupation omitted because of collinearity.
note: 3414.occupation omitted because of collinearity.
note: 7215.occupation omitted because of collinearity.
note: 7324.occupation omitted because of collinearity.
note: 7332.occupation omitted because of collinearity.
note: 7423.occupation omitted because of collinearity.
note: 7442.occupation omitted because of collinearity.
note: 8122.occupation omitted because of collinearity.
note: 8240.occupation omitted because of collinearity.
note: 8266.occupation omitted because of collinearity.
note: 8312.occupation omitted because of collinearity.
note: 9330.occupation omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =      8,339
Group variable: pid                             Number of groups  =      3,884

R-squared:                                      Obs per group:
     Within  = 0.0903                                         min =          1
     Between = 0.0286                                         avg =        2.1
     Overall = 0.0363                                         max =          5

                                                F(188, 3883)      =          .
corr(u_i, Xb) = -0.4376                         Prob > F          =          .

                                                                 (Std. err. adjusted for 3,884 clusters in pid)
---------------------------------------------------------------------------------------------------------------
                                              |               Robust
                                overtimehours | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
----------------------------------------------+----------------------------------------------------------------
                                     freqWFH4 |
                                        Daily |   1.580801   .7068852     2.24   0.025     .1948997    2.966703
              Semi-frequent, at least monthly |     .99053   .5133477     1.93   0.054    -.0159267    1.996987
                                              |
                     1.children_in_hh_dummy07 |  -.9140177   .2526063    -3.62   0.000    -1.409271   -.4187641
                                              |
              freqWFH4#children_in_hh_dummy07 |
                                      Daily#1 |   -1.47619   1.292971    -1.14   0.254    -4.011156    1.058776
            Semi-frequent, at least monthly#1 |  -2.155114   .9060992    -2.38   0.017    -3.931589   -.3786379
                                              |
                                    isced_edu |
                             intermediate edu |   .8347827   .3438065     2.43   0.015     .1607242    1.508841
                                   higher edu |   1.295067   .4511599     2.87   0.004     .4105343      2.1796
                                              |
                                        syear |
                                         1999 |   .3112715   .3356538     0.93   0.354    -.3468029    .9693459
                                         2002 |   .7115967   .3298794     2.16   0.031     .0648434     1.35835
                                         2009 |   2.049542   .7357149     2.79   0.005     .6071182    3.491967
                                         2014 |    2.34986   1.048949     2.24   0.025     .2933169    4.406404
                                              |
                                maritalstatus |
                       Married, But Separated |   .2121298   .3736198     0.57   0.570    -.5203799    .9446395
                                       Single |   .1504195   .2267014     0.66   0.507    -.2940455    .5948846
                                     Divorced |   .1304869   .3616733     0.36   0.718    -.5786008    .8395746
                                      Widowed |   -.422527   .5129552    -0.82   0.410    -1.428214    .5831603
              Registered same sex partnership |   1.881282   .2077762     9.05   0.000     1.473921    2.288643
 Registered same sex partnership, but separ.. |  -1.513691   .7524476    -2.01   0.044    -2.988921   -.0384607
                                              |
                                          age |          0  (omitted)
                                        agesq |  -.0014598   .0006405    -2.28   0.023    -.0027156   -.0002041
                                              |
                                    empstatus |
                 Regular Part-Time Employment |  -.4651567   .2181483    -2.13   0.033    -.8928529   -.0374606
                          1.disability_status |   .1299819   .3198697     0.41   0.685    -.4971467    .7571106
                                 logindincome |   .0958683   .0873129     1.10   0.272    -.0753151    .2670518
                            workingexperience |  -.0628805   .0241019    -2.61   0.009    -.1101341   -.0156269
                                              |
                                   jobchange2 |
                Yes, changed job in last year |  -.2996533    .201546    -1.49   0.137    -.6947993    .0954928
                                              |
                                       regtyp |
                            [2] Rural regions |   .2161984   .4080876     0.53   0.596     -.583888    1.016285
                                              |
                                  sizecompany |
                                            2 |   .3151676   .2366586     1.33   0.183    -.1488194    .7791546
                                            3 |   .2373861   .2621315     0.91   0.365    -.2765425    .7513146
                                            4 |   .2590375   .2525779     1.03   0.305    -.2361605    .7542355
                                            5 |   .2116392   .2584372     0.82   0.413    -.2950464    .7183248
                                            6 |   .4608767   .2867828     1.61   0.108    -.1013825    1.023136
                                            7 |  -.0050526   .3459975    -0.01   0.988    -.6834067    .6733015
                                      Unknown |   .0747334   .3874861     0.19   0.847    -.6849622    .8344291
                                              |
                                     firmtime |  -.0069339   .0253417    -0.27   0.784    -.0566182    .0427505
                                   firmtimesq |   .0004206   .0007693     0.55   0.585    -.0010877     .001929
                                              |
                                   partnerWFH |
                     Partner Working Remotely |   .1729584   .2048384     0.84   0.399    -.2286428    .5745595
                                              |
                                   occupation |
                                         1140 |   2.003218   1.163329     1.72   0.085    -.2775754    4.284012
                                         1142 |   1.869942   1.166629     1.60   0.109    -.4173218    4.157205
                                         1200 |   .8113705   2.558319     0.32   0.751    -4.204406    5.827147
                                         1210 |  . ................... .........................................................
                                         9161 |   .8655941   1.293727     0.67   0.503    -1.670854    3.402043
                                         9211 |  -.8618563   1.494222    -0.58   0.564     -3.79139    2.067678
                                         9320 |   .2067987   .9640401     0.21   0.830    -1.683274    2.096872
                                         9330 |          0  (omitted)
                                              |
                                        _cons |   2.287602   1.834321     1.25   0.212    -1.308722    5.883926
----------------------------------------------+----------------------------------------------------------------
                                      sigma_u |  3.7081101
                                      sigma_e |  2.9130295
                                          rho |  .61837519   (fraction of variance due to u_i)
---------------------------------------------------------------------------------------------------------------
Code with 2-digit industry code:
Code:
. xtreg overtimehours ib5.freqWFH4##i.children_in_hh_dummy07 i.isced_edu i.syear i.maritalstatus age agesq i.emp
> status i.disability_status logindincome workingexperience i.jobchange2 i.regtyp i.sizecompany firmtime firmtim
> esq i.partnerWFH i.industrycode2 if sex==2 & children_in_hh_dummy816==0 , fe vce(cluster pid)
note: age omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =      7,765
Group variable: pid                             Number of groups  =      3,737

R-squared:                                      Obs per group:
     Within  = 0.0434                                         min =          1
     Between = 0.0246                                         avg =        2.1
     Overall = 0.0254                                         max =          5

                                                F(59, 3736)       =          .
corr(u_i, Xb) = -0.3463                         Prob > F          =          .

                                                                 (Std. err. adjusted for 3,737 clusters in pid)
---------------------------------------------------------------------------------------------------------------
                                              |               Robust
                                overtimehours | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
----------------------------------------------+----------------------------------------------------------------
                                     freqWFH4 |
                                        Daily |    2.52021   .7570638     3.33   0.001     1.035912    4.004509
              Semi-frequent, at least monthly |   1.332821   .5584959     2.39   0.017     .2378345    2.427808
                                              |
                     1.children_in_hh_dummy07 |  -.8797856   .2582233    -3.41   0.001    -1.386058   -.3735133
                                              |
              freqWFH4#children_in_hh_dummy07 |
                                      Daily#1 |  -1.724419   1.332334    -1.29   0.196    -4.336593     .887754
            Semi-frequent, at least monthly#1 |  -2.349489   .9344255    -2.51   0.012    -4.181523   -.5174555
                                              |
                                    isced_edu |
                             intermediate edu |   .7932312   .3587109     2.21   0.027     .0899429     1.49652
                                   higher edu |    1.20516   .4579732     2.63   0.009     .3072578    2.103061
                                              |
                                        syear |
                                         1999 |   .0557046   .3670551     0.15   0.879    -.6639433    .7753525
                                         2002 |   .8134055   .3518604     2.31   0.021     .1235483    1.503263
                                         2009 |   1.979086   .7789014     2.54   0.011     .4519729      3.5062
                                         2014 |   2.242524    1.11455     2.01   0.044     .0573377     4.42771
                                              |
                                maritalstatus |
                       Married, But Separated |   .2274815   .4186404     0.54   0.587    -.5933045    1.048268
                                       Single |   .0327204   .2330377     0.14   0.888    -.4241731    .4896139
                                     Divorced |   .1382372   .3800094     0.36   0.716    -.6068089    .8832832
                                      Widowed |  -.4293263   .5299967    -0.81   0.418    -1.468438    .6097848
              Registered same sex partnership |   1.949983    .214771     9.08   0.000     1.528903    2.371063
 Registered same sex partnership, but separ.. |  -1.593916   .5594292    -2.85   0.004    -2.690732   -.4970996
                                              |
                                          age |          0  (omitted)
                                        agesq |  -.0012855    .000684    -1.88   0.060    -.0026265    .0000555
                                              |
                                    empstatus |
                 Regular Part-Time Employment |    -.49855   .2211557    -2.25   0.024    -.9321478   -.0649523
                          1.disability_status |  -.0180545   .3011389    -0.06   0.952    -.6084672    .5723582
                                 logindincome |   .2084391   .0906523     2.30   0.022     .0307063    .3861719
                            workingexperience |  -.0520858   .0253048    -2.06   0.040    -.1016983   -.0024733
                                              |
                                   jobchange2 |
                Yes, changed job in last year |  -.1938162   .2089521    -0.93   0.354    -.6034876    .2158552
                                              |
                                       regtyp |
                            [2] Rural regions |   .1276912    .422667     0.30   0.763    -.7009894    .9563717
                                              |
                                  sizecompany |
                                            2 |   .2236825   .2463555     0.91   0.364    -.2593218    .7066869
                                            3 |   .2056656   .2693058     0.76   0.445    -.3223352    .7336664
                                            4 |   .1634258   .2591328     0.63   0.528    -.3446297    .6714814
                                            5 |   .2753488   .2639699     1.04   0.297    -.2421904     .792888
                                            6 |   .5658791   .2876042     1.97   0.049     .0020025    1.129756
                                            7 |   .0813208   .3483649     0.23   0.815    -.6016831    .7643246
                                      Unknown |   .2781849    .413024     0.67   0.501    -.5315897    1.087959
                                              |
                                     firmtime |  -.0277454   .0261347    -1.06   0.288     -.078985    .0234943
                                   firmtimesq |    .000589   .0007862     0.75   0.454    -.0009523    .0021304
                                              |
                                   partnerWFH |
                     Partner Working Remotely |   .2272388   .2121365     1.07   0.284    -.1886759    .6431536
                                              |
                                industrycode2 |
                                            3 |  -.4429973   1.534138    -0.29   0.773    -3.450827    2.564833
                                            4 |   .8943711   1.833623     0.49   0.626    -2.700629    4.489372
                                            5 |  -.5896647   .9113456    -0.65   0.518    -2.376448    1.197119
                                            6 |  -.7362645   1.275964    -0.58   0.564    -3.237919     1.76539
                                            7 |  -.3418449   .9374674    -0.36   0.715    -2.179843    1.496153
                                            8 |  -.1244317     .91229    -0.14   0.892    -1.913067    1.664203
                                            9 |  -.1731643   .8890065    -0.19   0.846     -1.91615    1.569821
                                           10 |    .320897   .9012295     0.36   0.722    -1.446053    2.087847
                                           11 |  -.4672761   1.125556    -0.42   0.678    -2.674041    1.739489
                                           12 |  -.4847717   1.063409    -0.46   0.649     -2.56969    1.600147
                                           13 |  -.1433748   1.020529    -0.14   0.888    -2.144222    1.857473
                                           14 |  -.2006263   .8903235    -0.23   0.822    -1.946194    1.544941
                                           16 |   .8763018   .9196161     0.95   0.341    -.9266968      2.6793
                                           18 |   .4877597   .7734222     0.63   0.528    -1.028611    2.004131
                                           21 |   .2518529   .8895093     0.28   0.777    -1.492118    1.995824
                                           22 |  -.0626013   .9097828    -0.07   0.945    -1.846321    1.721118
                                           23 |   .6991119   .9062408     0.77   0.440    -1.077663    2.475887
                                           24 |   .8943916   1.196247     0.75   0.455    -1.450969    3.239753
                                           25 |   .0302612   .8446954     0.04   0.971    -1.625848     1.68637
                                           26 |  -1.074994   1.202115    -0.89   0.371     -3.43186    1.281873
                                           27 |  -.1815104   .9229585    -0.20   0.844    -1.991062    1.628041
                                           28 |  -.1836055   .9180045    -0.20   0.841    -1.983444    1.616233
                                           30 |   .0893199   .8873337     0.10   0.920    -1.650386    1.829026
                                           31 |   -.029833   .9358186    -0.03   0.975    -1.864598    1.804932
                                           32 |    .985862   1.285665     0.77   0.443    -1.534812    3.506536
                                           33 |   .2687769   .8859324     0.30   0.762    -1.468181    2.005735
                                              |
                                        _cons |    2.03513   1.732971     1.17   0.240    -1.362531    5.432792
----------------------------------------------+----------------------------------------------------------------
                                      sigma_u |  3.5765006
                                      sigma_e |  2.8954032
                                          rho |  .60408633   (fraction of variance due to u_i)
---------------------------------------------------------------------------------------------------------------
Code with 1-digit industry code:
Code:
. xtreg overtimehours ib5.freqWFH4##i.children_in_hh_dummy07 i.isced_edu i.syear i.maritalstatus age agesq i.emp
> status i.disability_status logindincome workingexperience i.jobchange2 i.regtyp i.sizecompany firmtime firmtim
> esq i.partnerWFH i.industrycode if sex==2 & children_in_hh_dummy816==0 , fe vce(cluster pid)
note: age omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =      7,765
Group variable: pid                             Number of groups  =      3,737

R-squared:                                      Obs per group:
     Within  = 0.0417                                         min =          1
     Between = 0.0258                                         avg =        2.1
     Overall = 0.0264                                         max =          5

                                                F(41, 3736)       =          .
corr(u_i, Xb) = -0.3373                         Prob > F          =          .

                                                                 (Std. err. adjusted for 3,737 clusters in pid)
---------------------------------------------------------------------------------------------------------------
                                              |               Robust
                                overtimehours | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
----------------------------------------------+----------------------------------------------------------------
                                     freqWFH4 |
                                        Daily |   2.506616   .7575746     3.31   0.001     1.021316    3.991916
              Semi-frequent, at least monthly |    1.31014   .5567582     2.35   0.019     .2185599    2.401719
                                              |
                     1.children_in_hh_dummy07 |  -.8776977   .2584498    -3.40   0.001    -1.384414   -.3709813
                                              |
              freqWFH4#children_in_hh_dummy07 |
                                      Daily#1 |  -1.745719   1.325147    -1.32   0.188      -4.3438    .8523621
            Semi-frequent, at least monthly#1 |  -2.328925   .9259446    -2.52   0.012    -4.144331   -.5135185
                                              |
                                    isced_edu |
                             intermediate edu |   .7982281   .3598182     2.22   0.027     .0927689    1.503687
                                   higher edu |   1.202107   .4609335     2.61   0.009     .2984012    2.105813
                                              |
                                        syear |
                                         1999 |   .0688035   .3655281     0.19   0.851    -.6478506    .7854577
                                         2002 |   .8242518   .3520537     2.34   0.019     .1340155    1.514488
                                         2009 |   1.977714   .7760868     2.55   0.011     .4561186    3.499309
                                         2014 |   2.231432   1.111233     2.01   0.045     .0527493    4.410114
                                              |
                                maritalstatus |
                       Married, But Separated |   .2074701   .4178295     0.50   0.620    -.6117261    1.026666
                                       Single |   .0167293    .232257     0.07   0.943    -.4386335    .4720922
                                     Divorced |   .1407972    .377668     0.37   0.709    -.5996583    .8812527
                                      Widowed |  -.3756726   .5287056    -0.71   0.477    -1.412252    .6609071
              Registered same sex partnership |    1.95335   .2149278     9.09   0.000     1.531963    2.374738
 Registered same sex partnership, but separ.. |  -1.563473   .5525402    -2.83   0.005    -2.646783   -.4801634
                                              |
                                          age |          0  (omitted)
                                        agesq |  -.0012707   .0006822    -1.86   0.063    -.0026082    .0000668
                                              |
                                    empstatus |
                 Regular Part-Time Employment |  -.5092677   .2200708    -2.31   0.021    -.9407383   -.0777971
                          1.disability_status |  -.0305753   .3001044    -0.10   0.919    -.6189598    .5578092
                                 logindincome |   .2135416   .0907753     2.35   0.019     .0355675    .3915157
                            workingexperience |  -.0529485   .0252211    -2.10   0.036    -.1023971      -.0035
                                              |
                                   jobchange2 |
                Yes, changed job in last year |  -.1946817   .2089226    -0.93   0.351    -.6042951    .2149317
                                              |
                                       regtyp |
                            [2] Rural regions |   .1158979   .4190269     0.28   0.782    -.7056459    .9374416
                                              |
                                  sizecompany |
                                            2 |   .2628441    .245817     1.07   0.285    -.2191045    .7447927
                                            3 |   .2407289   .2683291     0.90   0.370     -.285357    .7668149
                                            4 |   .2010899   .2576507     0.78   0.435    -.3040599    .7062396
                                            5 |   .2992914   .2626124     1.14   0.254    -.2155862    .8141691
                                            6 |   .5893057   .2859272     2.06   0.039      .028717    1.149894
                                            7 |   .1081272   .3484248     0.31   0.756    -.5749942    .7912486
                                      Unknown |   .2916678   .4128469     0.71   0.480    -.5177594    1.101095
                                              |
                                     firmtime |   -.026983   .0257914    -1.05   0.296    -.0775496    .0235836
                                   firmtimesq |   .0005946   .0007822     0.76   0.447    -.0009389    .0021282
                                              |
                                   partnerWFH |
                     Partner Working Remotely |    .235914   .2128906     1.11   0.268    -.1814791    .6533071
                                              |
                                 industrycode |
                                            2 |  -.6843359   1.497694    -0.46   0.648    -3.620714    2.252042
                                            3 |   1.107829    1.78769     0.62   0.535    -2.397113    4.612772
                                            4 |  -.2453813   .8045791    -0.30   0.760    -1.822838    1.332076
                                            5 |  -.2813446   .8825517    -0.32   0.750    -2.011675    1.448986
                                            6 |   .6179438   .7572475     0.82   0.415    -.8667151    2.102603
                                            7 |   .1473572   .8765486     0.17   0.867    -1.571203    1.865918
                                            8 |   .2111168   .8223362     0.26   0.797    -1.401155    1.823388
                                            9 |  -.0472463   .8081457    -0.06   0.953    -1.631696    1.537203
                                              |
                                        _cons |   1.936961   1.723199     1.12   0.261    -1.441541    5.315463
----------------------------------------------+----------------------------------------------------------------
                                      sigma_u |  3.5625995
                                      sigma_e |   2.891451
                                          rho |  .60287627   (fraction of variance due to u_i)
---------------------------------------------------------------------------------------------------------------
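One possible way to probe the R-squared concern, sketched here under assumptions rather than offered as a definitive check: the occupation model above uses 8,339 observations while the industry models use 7,765, so the within R-squared values are not computed on the same sample. The do-file fragment below re-estimates the occupation model on rows where both classifications are observed, displays the within R-squared, tests the occupation dummies jointly, and counts how many distinct occupation codes actually enter the estimation sample. The variable names `common` and `occ_tag` and the local macro `controls` are hypothetical helpers introduced for illustration. The fragment also assumes that missing values of `occupation` and `industrycode2` are stored as Stata missing values rather than as negative codes.

```stata
* Sketch only; run from a do-file (the /// line continuation does not work interactively).
* `controls', common and occ_tag are hypothetical names, not part of the original code.
local controls i.isced_edu i.syear i.maritalstatus age agesq i.empstatus ///
    i.disability_status logindincome workingexperience i.jobchange2     ///
    i.regtyp i.sizecompany firmtime firmtimesq i.partnerWFH

* Mark rows where both occupation and 2-digit industry are observed
* (assumes missings are coded as Stata missing values)
generate byte common = !missing(occupation, industrycode2)

* Occupation model, restricted to the common sample
xtreg overtimehours ib5.freqWFH4##i.children_in_hh_dummy07 `controls' ///
    i.occupation if sex==2 & children_in_hh_dummy816==0 & common,     ///
    fe vce(cluster pid)
display "Within R-squared (occupation, common sample): " e(r2_w)

* Joint test of all occupation dummies
testparm i.occupation

* Number of distinct occupation codes that actually enter the estimation sample
egen byte occ_tag = tag(occupation) if e(sample)
count if occ_tag == 1
```

Running the same `xtreg` call with `i.industrycode2` in place of `i.occupation`, under the same `& common` restriction, would give a within R-squared that can be set against the occupation model on identical observations.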