  • Effect of remote work; picking the industry/occupational variable; too many categories?

    Hi all! I'm new to the forum and am looking for some advice. I'm trying to estimate the effect of remote work on the work-life balance and well-being of mothers using fixed effects, and I'm having a hard time deciding what to control for when it comes to occupation/industry.
    The issue is the following: the occupation variable as it stands contains 9330 categories, with a total of 26178 valid observations in my dataset. (Link to the variable details: https://paneldata.org/soep-core/data...equiv/e1110598 ).
    I can also use the 1-digit industry code (10 categories, 23971 observations) ( https://paneldata.org/soep-core/data...quiv/e1110697 ) or the 2-digit industry code (33 categories, 23971 observations) ( https://paneldata.org/soep-core/data...equiv/e1110797 ).

    I ended up picking the occupation variable, as it gave me the highest within R-squared (0.0903 with the occupation variable versus 0.0434 with the 2-digit industry code). However, I have only now realized that the occupation variable basically enters as 9330 dummy variables, so the sheer number of them alone could have accounted for the doubling of the R-squared (this is my understanding; please correct me if I'm wrong). Should I keep using only the occupation variable, or switch and settle for the 1- or 2-digit industry code?

    Thank you and kind regards,

    Matej


    Code with occupational variable (I've cut most of the 9000+ occupational dummies due to length):

    Code:
    . xtreg overtimehours ib5.freqWFH4##i.children_in_hh_dummy07 i.isced_edu i.syear i.maritalstatus age agesq i.emp
    > status i.disability_status logindincome workingexperience i.jobchange2 i.regtyp i.sizecompany firmtime firmtim
    > esq i.partnerWFH i.occupation  if sex==2  & children_in_hh_dummy816==0 , fe vce(cluster pid)
    note: age omitted because of collinearity.
    note: 1312.occupation omitted because of collinearity.
    note: 1319.occupation omitted because of collinearity.
    note: 2121.occupation omitted because of collinearity.
    note: 2213.occupation omitted because of collinearity.
    note: 2431.occupation omitted because of collinearity.
    note: 3232.occupation omitted because of collinearity.
    note: 3414.occupation omitted because of collinearity.
    note: 7215.occupation omitted because of collinearity.
    note: 7324.occupation omitted because of collinearity.
    note: 7332.occupation omitted because of collinearity.
    note: 7423.occupation omitted because of collinearity.
    note: 7442.occupation omitted because of collinearity.
    note: 8122.occupation omitted because of collinearity.
    note: 8240.occupation omitted because of collinearity.
    note: 8266.occupation omitted because of collinearity.
    note: 8312.occupation omitted because of collinearity.
    note: 9330.occupation omitted because of collinearity.
    
    Fixed-effects (within) regression               Number of obs     =      8,339
    Group variable: pid                             Number of groups  =      3,884
    
    R-squared:                                      Obs per group:
         Within  = 0.0903                                         min =          1
         Between = 0.0286                                         avg =        2.1
         Overall = 0.0363                                         max =          5
    
                                                    F(188, 3883)      =          .
    corr(u_i, Xb) = -0.4376                         Prob > F          =          .
    
                                                                     (Std. err. adjusted for 3,884 clusters in pid)
    ---------------------------------------------------------------------------------------------------------------
                                                  |               Robust
                                    overtimehours | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    ----------------------------------------------+----------------------------------------------------------------
                                         freqWFH4 |
                                           Daily  |   1.580801   .7068852     2.24   0.025     .1948997    2.966703
                 Semi-frequent, at least monthly  |     .99053   .5133477     1.93   0.054    -.0159267    1.996987
                                                  |
                         1.children_in_hh_dummy07 |  -.9140177   .2526063    -3.62   0.000    -1.409271   -.4187641
                                                  |
                  freqWFH4#children_in_hh_dummy07 |
                                         Daily#1  |   -1.47619   1.292971    -1.14   0.254    -4.011156    1.058776
               Semi-frequent, at least monthly#1  |  -2.155114   .9060992    -2.38   0.017    -3.931589   -.3786379
                                                  |
                                        isced_edu |
                                intermediate edu  |   .8347827   .3438065     2.43   0.015     .1607242    1.508841
                                      higher edu  |   1.295067   .4511599     2.87   0.004     .4105343      2.1796
                                                  |
                                            syear |
                                            1999  |   .3112715   .3356538     0.93   0.354    -.3468029    .9693459
                                            2002  |   .7115967   .3298794     2.16   0.031     .0648434     1.35835
                                            2009  |   2.049542   .7357149     2.79   0.005     .6071182    3.491967
                                            2014  |    2.34986   1.048949     2.24   0.025     .2933169    4.406404
                                                  |
                                    maritalstatus |
                          Married, But Separated  |   .2121298   .3736198     0.57   0.570    -.5203799    .9446395
                                          Single  |   .1504195   .2267014     0.66   0.507    -.2940455    .5948846
                                        Divorced  |   .1304869   .3616733     0.36   0.718    -.5786008    .8395746
                                         Widowed  |   -.422527   .5129552    -0.82   0.410    -1.428214    .5831603
                 Registered same sex partnership  |   1.881282   .2077762     9.05   0.000     1.473921    2.288643
    Registered same sex partnership, but separ..  |  -1.513691   .7524476    -2.01   0.044    -2.988921   -.0384607
                                                  |
                                              age |          0  (omitted)
                                            agesq |  -.0014598   .0006405    -2.28   0.023    -.0027156   -.0002041
                                                  |
                                        empstatus |
                    Regular Part-Time Employment  |  -.4651567   .2181483    -2.13   0.033    -.8928529   -.0374606
                              1.disability_status |   .1299819   .3198697     0.41   0.685    -.4971467    .7571106
                                     logindincome |   .0958683   .0873129     1.10   0.272    -.0753151    .2670518
                                workingexperience |  -.0628805   .0241019    -2.61   0.009    -.1101341   -.0156269
                                                  |
                                       jobchange2 |
                   Yes, changed job in last year  |  -.2996533    .201546    -1.49   0.137    -.6947993    .0954928
                                                  |
                                           regtyp |
                               [2] Rural regions  |   .2161984   .4080876     0.53   0.596     -.583888    1.016285
                                                  |
                                      sizecompany |
                                               2  |   .3151676   .2366586     1.33   0.183    -.1488194    .7791546
                                               3  |   .2373861   .2621315     0.91   0.365    -.2765425    .7513146
                                               4  |   .2590375   .2525779     1.03   0.305    -.2361605    .7542355
                                               5  |   .2116392   .2584372     0.82   0.413    -.2950464    .7183248
                                               6  |   .4608767   .2867828     1.61   0.108    -.1013825    1.023136
                                               7  |  -.0050526   .3459975    -0.01   0.988    -.6834067    .6733015
                                         Unknown  |   .0747334   .3874861     0.19   0.847    -.6849622    .8344291
                                                  |
                                         firmtime |  -.0069339   .0253417    -0.27   0.784    -.0566182    .0427505
                                       firmtimesq |   .0004206   .0007693     0.55   0.585    -.0010877     .001929
                                                  |
                                       partnerWFH |
                        Partner Working Remotely  |   .1729584   .2048384     0.84   0.399    -.2286428    .5745595
                                                  |
                                       occupation |
                                            1140  |   2.003218   1.163329     1.72   0.085    -.2775754    4.284012
                                            1142  |   1.869942   1.166629     1.60   0.109    -.4173218    4.157205
                                            1200  |   .8113705   2.558319     0.32   0.751    -4.204406    5.827147
                                            1210  |   .
                                   ...................
    .........................................................
                                            9161  |   .8655941   1.293727     0.67   0.503    -1.670854    3.402043
                                            9211  |  -.8618563   1.494222    -0.58   0.564     -3.79139    2.067678
                                            9320  |   .2067987   .9640401     0.21   0.830    -1.683274    2.096872
                                            9330  |          0  (omitted)
                                                  |
                                            _cons |   2.287602   1.834321     1.25   0.212    -1.308722    5.883926
    ----------------------------------------------+----------------------------------------------------------------
                                          sigma_u |  3.7081101
                                          sigma_e |  2.9130295
                                              rho |  .61837519   (fraction of variance due to u_i)
    ---------------------------------------------------------------------------------------------------------------






    Code with 2-digit industry code:
    Code:
    . xtreg overtimehours ib5.freqWFH4##i.children_in_hh_dummy07 i.isced_edu i.syear i.maritalstatus age agesq i.emp
    > status i.disability_status logindincome workingexperience i.jobchange2 i.regtyp i.sizecompany firmtime firmtim
    > esq i.partnerWFH i.industrycode2  if sex==2  & children_in_hh_dummy816==0 , fe vce(cluster pid)
    note: age omitted because of collinearity.
    
    Fixed-effects (within) regression               Number of obs     =      7,765
    Group variable: pid                             Number of groups  =      3,737
    
    R-squared:                                      Obs per group:
         Within  = 0.0434                                         min =          1
         Between = 0.0246                                         avg =        2.1
         Overall = 0.0254                                         max =          5
    
                                                    F(59, 3736)       =          .
    corr(u_i, Xb) = -0.3463                         Prob > F          =          .
    
                                                                     (Std. err. adjusted for 3,737 clusters in pid)
    ---------------------------------------------------------------------------------------------------------------
                                                  |               Robust
                                    overtimehours | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    ----------------------------------------------+----------------------------------------------------------------
                                         freqWFH4 |
                                           Daily  |    2.52021   .7570638     3.33   0.001     1.035912    4.004509
                 Semi-frequent, at least monthly  |   1.332821   .5584959     2.39   0.017     .2378345    2.427808
                                                  |
                         1.children_in_hh_dummy07 |  -.8797856   .2582233    -3.41   0.001    -1.386058   -.3735133
                                                  |
                  freqWFH4#children_in_hh_dummy07 |
                                         Daily#1  |  -1.724419   1.332334    -1.29   0.196    -4.336593     .887754
               Semi-frequent, at least monthly#1  |  -2.349489   .9344255    -2.51   0.012    -4.181523   -.5174555
                                                  |
                                        isced_edu |
                                intermediate edu  |   .7932312   .3587109     2.21   0.027     .0899429     1.49652
                                      higher edu  |    1.20516   .4579732     2.63   0.009     .3072578    2.103061
                                                  |
                                            syear |
                                            1999  |   .0557046   .3670551     0.15   0.879    -.6639433    .7753525
                                            2002  |   .8134055   .3518604     2.31   0.021     .1235483    1.503263
                                            2009  |   1.979086   .7789014     2.54   0.011     .4519729      3.5062
                                            2014  |   2.242524    1.11455     2.01   0.044     .0573377     4.42771
                                                  |
                                    maritalstatus |
                          Married, But Separated  |   .2274815   .4186404     0.54   0.587    -.5933045    1.048268
                                          Single  |   .0327204   .2330377     0.14   0.888    -.4241731    .4896139
                                        Divorced  |   .1382372   .3800094     0.36   0.716    -.6068089    .8832832
                                         Widowed  |  -.4293263   .5299967    -0.81   0.418    -1.468438    .6097848
                 Registered same sex partnership  |   1.949983    .214771     9.08   0.000     1.528903    2.371063
    Registered same sex partnership, but separ..  |  -1.593916   .5594292    -2.85   0.004    -2.690732   -.4970996
                                                  |
                                              age |          0  (omitted)
                                            agesq |  -.0012855    .000684    -1.88   0.060    -.0026265    .0000555
                                                  |
                                        empstatus |
                    Regular Part-Time Employment  |    -.49855   .2211557    -2.25   0.024    -.9321478   -.0649523
                              1.disability_status |  -.0180545   .3011389    -0.06   0.952    -.6084672    .5723582
                                     logindincome |   .2084391   .0906523     2.30   0.022     .0307063    .3861719
                                workingexperience |  -.0520858   .0253048    -2.06   0.040    -.1016983   -.0024733
                                                  |
                                       jobchange2 |
                   Yes, changed job in last year  |  -.1938162   .2089521    -0.93   0.354    -.6034876    .2158552
                                                  |
                                           regtyp |
                               [2] Rural regions  |   .1276912    .422667     0.30   0.763    -.7009894    .9563717
                                                  |
                                      sizecompany |
                                               2  |   .2236825   .2463555     0.91   0.364    -.2593218    .7066869
                                               3  |   .2056656   .2693058     0.76   0.445    -.3223352    .7336664
                                               4  |   .1634258   .2591328     0.63   0.528    -.3446297    .6714814
                                               5  |   .2753488   .2639699     1.04   0.297    -.2421904     .792888
                                               6  |   .5658791   .2876042     1.97   0.049     .0020025    1.129756
                                               7  |   .0813208   .3483649     0.23   0.815    -.6016831    .7643246
                                         Unknown  |   .2781849    .413024     0.67   0.501    -.5315897    1.087959
                                                  |
                                         firmtime |  -.0277454   .0261347    -1.06   0.288     -.078985    .0234943
                                       firmtimesq |    .000589   .0007862     0.75   0.454    -.0009523    .0021304
                                                  |
                                       partnerWFH |
                        Partner Working Remotely  |   .2272388   .2121365     1.07   0.284    -.1886759    .6431536
                                                  |
                                    industrycode2 |
                                               3  |  -.4429973   1.534138    -0.29   0.773    -3.450827    2.564833
                                               4  |   .8943711   1.833623     0.49   0.626    -2.700629    4.489372
                                               5  |  -.5896647   .9113456    -0.65   0.518    -2.376448    1.197119
                                               6  |  -.7362645   1.275964    -0.58   0.564    -3.237919     1.76539
                                               7  |  -.3418449   .9374674    -0.36   0.715    -2.179843    1.496153
                                               8  |  -.1244317     .91229    -0.14   0.892    -1.913067    1.664203
                                               9  |  -.1731643   .8890065    -0.19   0.846     -1.91615    1.569821
                                              10  |    .320897   .9012295     0.36   0.722    -1.446053    2.087847
                                              11  |  -.4672761   1.125556    -0.42   0.678    -2.674041    1.739489
                                              12  |  -.4847717   1.063409    -0.46   0.649     -2.56969    1.600147
                                              13  |  -.1433748   1.020529    -0.14   0.888    -2.144222    1.857473
                                              14  |  -.2006263   .8903235    -0.23   0.822    -1.946194    1.544941
                                              16  |   .8763018   .9196161     0.95   0.341    -.9266968      2.6793
                                              18  |   .4877597   .7734222     0.63   0.528    -1.028611    2.004131
                                              21  |   .2518529   .8895093     0.28   0.777    -1.492118    1.995824
                                              22  |  -.0626013   .9097828    -0.07   0.945    -1.846321    1.721118
                                              23  |   .6991119   .9062408     0.77   0.440    -1.077663    2.475887
                                              24  |   .8943916   1.196247     0.75   0.455    -1.450969    3.239753
                                              25  |   .0302612   .8446954     0.04   0.971    -1.625848     1.68637
                                              26  |  -1.074994   1.202115    -0.89   0.371     -3.43186    1.281873
                                              27  |  -.1815104   .9229585    -0.20   0.844    -1.991062    1.628041
                                              28  |  -.1836055   .9180045    -0.20   0.841    -1.983444    1.616233
                                              30  |   .0893199   .8873337     0.10   0.920    -1.650386    1.829026
                                              31  |   -.029833   .9358186    -0.03   0.975    -1.864598    1.804932
                                              32  |    .985862   1.285665     0.77   0.443    -1.534812    3.506536
                                              33  |   .2687769   .8859324     0.30   0.762    -1.468181    2.005735
                                                  |
                                            _cons |    2.03513   1.732971     1.17   0.240    -1.362531    5.432792
    ----------------------------------------------+----------------------------------------------------------------
                                          sigma_u |  3.5765006
                                          sigma_e |  2.8954032
                                              rho |  .60408633   (fraction of variance due to u_i)
    ---------------------------------------------------------------------------------------------------------------


    Code with 1-digit industry code:
    Code:
    . xtreg overtimehours ib5.freqWFH4##i.children_in_hh_dummy07 i.isced_edu i.syear i.maritalstatus age agesq i.emp
    > status i.disability_status logindincome workingexperience i.jobchange2 i.regtyp i.sizecompany firmtime firmtim
    > esq i.partnerWFH i.industrycode  if sex==2  & children_in_hh_dummy816==0 , fe vce(cluster pid)
    note: age omitted because of collinearity.
    
    Fixed-effects (within) regression               Number of obs     =      7,765
    Group variable: pid                             Number of groups  =      3,737
    
    R-squared:                                      Obs per group:
         Within  = 0.0417                                         min =          1
         Between = 0.0258                                         avg =        2.1
         Overall = 0.0264                                         max =          5
    
                                                    F(41, 3736)       =          .
    corr(u_i, Xb) = -0.3373                         Prob > F          =          .
    
                                                                     (Std. err. adjusted for 3,737 clusters in pid)
    ---------------------------------------------------------------------------------------------------------------
                                                  |               Robust
                                    overtimehours | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    ----------------------------------------------+----------------------------------------------------------------
                                         freqWFH4 |
                                           Daily  |   2.506616   .7575746     3.31   0.001     1.021316    3.991916
                 Semi-frequent, at least monthly  |    1.31014   .5567582     2.35   0.019     .2185599    2.401719
                                                  |
                         1.children_in_hh_dummy07 |  -.8776977   .2584498    -3.40   0.001    -1.384414   -.3709813
                                                  |
                  freqWFH4#children_in_hh_dummy07 |
                                         Daily#1  |  -1.745719   1.325147    -1.32   0.188      -4.3438    .8523621
               Semi-frequent, at least monthly#1  |  -2.328925   .9259446    -2.52   0.012    -4.144331   -.5135185
                                                  |
                                        isced_edu |
                                intermediate edu  |   .7982281   .3598182     2.22   0.027     .0927689    1.503687
                                      higher edu  |   1.202107   .4609335     2.61   0.009     .2984012    2.105813
                                                  |
                                            syear |
                                            1999  |   .0688035   .3655281     0.19   0.851    -.6478506    .7854577
                                            2002  |   .8242518   .3520537     2.34   0.019     .1340155    1.514488
                                            2009  |   1.977714   .7760868     2.55   0.011     .4561186    3.499309
                                            2014  |   2.231432   1.111233     2.01   0.045     .0527493    4.410114
                                                  |
                                    maritalstatus |
                          Married, But Separated  |   .2074701   .4178295     0.50   0.620    -.6117261    1.026666
                                          Single  |   .0167293    .232257     0.07   0.943    -.4386335    .4720922
                                        Divorced  |   .1407972    .377668     0.37   0.709    -.5996583    .8812527
                                         Widowed  |  -.3756726   .5287056    -0.71   0.477    -1.412252    .6609071
                 Registered same sex partnership  |    1.95335   .2149278     9.09   0.000     1.531963    2.374738
    Registered same sex partnership, but separ..  |  -1.563473   .5525402    -2.83   0.005    -2.646783   -.4801634
                                                  |
                                              age |          0  (omitted)
                                            agesq |  -.0012707   .0006822    -1.86   0.063    -.0026082    .0000668
                                                  |
                                        empstatus |
                    Regular Part-Time Employment  |  -.5092677   .2200708    -2.31   0.021    -.9407383   -.0777971
                              1.disability_status |  -.0305753   .3001044    -0.10   0.919    -.6189598    .5578092
                                     logindincome |   .2135416   .0907753     2.35   0.019     .0355675    .3915157
                                workingexperience |  -.0529485   .0252211    -2.10   0.036    -.1023971      -.0035
                                                  |
                                       jobchange2 |
                   Yes, changed job in last year  |  -.1946817   .2089226    -0.93   0.351    -.6042951    .2149317
                                                  |
                                           regtyp |
                               [2] Rural regions  |   .1158979   .4190269     0.28   0.782    -.7056459    .9374416
                                                  |
                                      sizecompany |
                                               2  |   .2628441    .245817     1.07   0.285    -.2191045    .7447927
                                               3  |   .2407289   .2683291     0.90   0.370     -.285357    .7668149
                                               4  |   .2010899   .2576507     0.78   0.435    -.3040599    .7062396
                                               5  |   .2992914   .2626124     1.14   0.254    -.2155862    .8141691
                                               6  |   .5893057   .2859272     2.06   0.039      .028717    1.149894
                                               7  |   .1081272   .3484248     0.31   0.756    -.5749942    .7912486
                                         Unknown  |   .2916678   .4128469     0.71   0.480    -.5177594    1.101095
                                                  |
                                         firmtime |   -.026983   .0257914    -1.05   0.296    -.0775496    .0235836
                                       firmtimesq |   .0005946   .0007822     0.76   0.447    -.0009389    .0021282
                                                  |
                                       partnerWFH |
                        Partner Working Remotely  |    .235914   .2128906     1.11   0.268    -.1814791    .6533071
                                                  |
                                     industrycode |
                                               2  |  -.6843359   1.497694    -0.46   0.648    -3.620714    2.252042
                                               3  |   1.107829    1.78769     0.62   0.535    -2.397113    4.612772
                                               4  |  -.2453813   .8045791    -0.30   0.760    -1.822838    1.332076
                                               5  |  -.2813446   .8825517    -0.32   0.750    -2.011675    1.448986
                                               6  |   .6179438   .7572475     0.82   0.415    -.8667151    2.102603
                                               7  |   .1473572   .8765486     0.17   0.867    -1.571203    1.865918
                                               8  |   .2111168   .8223362     0.26   0.797    -1.401155    1.823388
                                               9  |  -.0472463   .8081457    -0.06   0.953    -1.631696    1.537203
                                                  |
                                            _cons |   1.936961   1.723199     1.12   0.261    -1.441541    5.315463
    ----------------------------------------------+----------------------------------------------------------------
                                          sigma_u |  3.5625995
                                          sigma_e |   2.891451
                                              rho |  .60287627   (fraction of variance due to u_i)
    ---------------------------------------------------------------------------------------------------------------
    Last edited by Matej Rencelj; 11 Jun 2024, 05:18.
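
    One thing worth checking before comparing the three within R-squared values: the occupation run uses 8,339 observations and the industry runs use 7,765, so the fits are computed on different samples. Below is a minimal sketch, on simulated data, of one way to put two specifications on a common sample. The variable names `occ`, `ind`, and `common` and all the numbers are made up for illustration and are not taken from SOEP.

    ```stata
    clear
    set seed 4321
    set obs 2000
    * 500 persons observed in 4 waves each
    generate pid = ceil(_n/4)
    bysort pid: generate wave = _n
    generate x = rnormal()
    generate y = 0.5*x + rnormal()
    * two hypothetical categorical controls
    generate occ = ceil(50*runiform())
    generate ind = ceil(10*runiform())
    * make ind missing in roughly 10% of rows to mimic unequal coverage
    replace ind = . if runiform() < 0.10
    xtset pid wave

    * flag rows that BOTH specifications can use
    generate byte common = !missing(y, x, occ, ind)

    quietly xtreg y x i.occ if common, fe vce(cluster pid)
    scalar n_occ  = e(N)
    scalar r2_occ = e(r2_w)
    quietly xtreg y x i.ind if common, fe vce(cluster pid)
    scalar n_ind  = e(N)
    scalar r2_ind = e(r2_w)

    display "N (occ) = " n_occ "   N (ind) = " n_ind
    display "within R2 (occ) = " %6.4f r2_occ "   within R2 (ind) = " %6.4f r2_ind
    ```

    With the real data, the same idea would be a flag such as `!missing(occupation, industrycode2)` added to the `if` condition of each `xtreg` call, assuming the occupation and industry codes are the only variables whose coverage differs.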

  • #2
    If you have individual-level panel data and want to account for time-invariant individual effects, then you have to xtset using the person identifier. The only reason you'd bypass this and use any of the various industry identifiers is if the individuals in the sample are relatively homogeneous, and this is a testable assumption. There is no criterion that I know of that uses the within-R2 as the basis for selecting at what level the heterogeneity should be accounted for.
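
    A small simulation illustrates why the within R-squared is a poor guide here: in `xtreg, fe` it cannot fall when regressors are added, so dummies for categories that are pure noise still raise it. This is a sketch on made-up data; `noise10` and `noise200` are hypothetical variables that are unrelated to the outcome by construction.

    ```stata
    clear
    set seed 12345
    set obs 2000
    * 500 persons observed in 4 waves each
    generate pid = ceil(_n/4)
    bysort pid: generate wave = _n
    generate x = rnormal()
    * y depends on x only
    generate y = 0.5*x + rnormal()
    * pure-noise categorical variables with 10 and 200 levels
    generate noise10  = ceil(10*runiform())
    generate noise200 = ceil(200*runiform())
    xtset pid wave

    quietly xtreg y x, fe
    scalar r2_base = e(r2_w)
    quietly xtreg y x i.noise10, fe
    scalar r2_10 = e(r2_w)
    quietly xtreg y x i.noise200, fe
    scalar r2_200 = e(r2_w)

    display "within R2, x only:            " %6.4f r2_base
    display "within R2, + 10 noise levels:  " %6.4f r2_10
    display "within R2, + 200 noise levels: " %6.4f r2_200
    ```

    In runs like this the 200-level variable raises the within R-squared noticeably more than the 10-level one, even though neither has any relation to `y`.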



    • #3
      Originally posted by Andrew Musau
      If you have individual-level panel data and want to account for time-invariant individual effects, then you have to xtset using the person identifier. The only reason you'd bypass this and use any of the various industry identifiers is if the individuals in the sample are relatively homogeneous, and this is a testable assumption. There is no criterion that I know of that uses the within-R2 as the basis for selecting at what level the heterogeneity should be accounted for.
      My apologies, I did use xtset, but forgot to paste it along with the code here!

      Code:
      xtset pid syear



      • #4
        This might help (it's fairly generic, but might fail). First estimate the model with reghdfe at the higher level of aggregation. Then run papwood and set altfe to the lower level of aggregation.

        Don't mess with the papwood program.
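
        For intuition, the logic can be sketched by hand on simulated data, without reghdfe. As far as can be read from the code that follows, the program adds the means of the regressors taken by the lower-level group to the higher-level fixed-effects model and tests them jointly (a Mundlak-style check). The example is only an illustration of that idea, not a substitute for the program, and all variable names and numbers in it are invented.

        ```stata
        clear
        set seed 2024
        set obs 300
        generate pid = _n
        * 10 industries with 30 persons each (persons nested in industries)
        generate industry = ceil(pid/30)
        * unobserved, time-invariant person effect
        generate alpha = rnormal()
        * 4 waves per person
        expand 4
        bysort pid: generate wave = _n
        * x is correlated with the person effect, so industry dummies alone are not enough
        generate x = 0.7*alpha + rnormal()
        generate y = 1 + 0.5*x + alpha + rnormal()
        * person-level mean of x (the added term)
        egen xbar_pid = mean(x), by(pid)

        * higher-level (industry) fixed effects plus the person-level mean
        regress y x xbar_pid i.industry, vce(cluster pid)
        test xbar_pid
        scalar p_means = r(p)
        display "p-value for the person-level mean of x: " %6.4f p_means
        ```

        A small p-value here says the person-level mean still matters after industry effects are included, so industry-level fixed effects are not enough in this simulated setup.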

        Code:
        capture program drop papwood
        program papwood , rclass
        * Coded by George Ford
        syntax [, altfe(varlist min=1 max=1) time(varlist min=1 max=1) datatype(string)]
        qui estimates store reghdfemodel
        preserve
        if "`datatype'" == "" {
                        di "You must specify data type (panel or pooled)."
                        exit
        }
        quietly {
        macro drop TTERMS MTERMS XTERMS
        capture drop *_xxx
        ** GET TIME VARIABLES FOR POOLED DATA
        egen tobs_xxx = count(`altfe') , by(`altfe')
        global TTERMS
        if "`datatype'" == "pooled" {
                        * use the time() and altfe() options rather than hard-coded variable names
                        summ `time'
                        local start_xxx = r(min)-1
                        forv y = `r(min)'/`r(max)' {
                                        egen y`y'_xxx = total(`time'==`y'), by(`altfe')
                                        replace y`y'_xxx = y`y'_xxx/tobs_xxx
                                        global TTERMS $TTERMS y`y'_xxx
                        }
        }
        global XTERMS
        global MTERMS
        local varnum =  wordcount(`"`e(indepvars)'"')-1
        forv i = 1/`varnum' {
                        local var : word `i' of `e(indepvars)'
                        global XTERMS $XTERMS `var'
                        egen `var'_xxx = mean(`var') , by(`altfe')
                        global MTERMS $MTERMS `var'_xxx
        }
        reghdfe `e(depvar)' $XTERMS $MTERMS $TTERMS , absorb(`e(absvars)') cluster(`e(clustvar)')
        qui estimates store reghdfemodel_aug
        test $MTERMS $TTERMS
        }
        local star = (cond(r(p)<.01, "***", cond(r(p)<.05, "**", cond(r(p)<.10, "*", ""))))
        di `"{hline 50}"'
        di "PAPKE-WOOLDRIDGE FIXED EFFECT TEST"
        di `"{hline 50}"'
        di "Main model stored as reghdfemodel"
        di "Augmented model stored as reghdfemodel_aug"
        di `"{hline 50}"'
        di "Null:  Higher level fixed effects is OK."
        di "F-stat = " %6.2f `r(F)' "`star'"
        di "Prob Level = " %6.4f r(p)
        if r(p) < 0.10 {
                        di "Null is REJECTED."
        }
        else {
                        di "Null is ACCEPTED"
        }
        di `"{hline 50}"'
        restore
        qui estimates restore reghdfemodel
        end
        
        reghdfe [your model with higher level of aggregation]
        ** pick whether data is panel or pooled.
        papwood , altfe(lower level of aggregation) time(year) datatype(pooled)
        papwood , altfe(lower level of aggregation) time(year) datatype(panel)
        Last edited by George Ford; 11 Jun 2024, 16:42.



        • #5
          You may want to absorb any of the "i.xxx" variables in addition to the industry code and syear. Also, use time(syear) in the papwood call.
          Last edited by George Ford; 11 Jun 2024, 16:47.
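
          A rough sketch of that call pattern, using the nlswork example dataset as a stand-in because the SOEP variables are not available here. The choice of ind_code as the higher level, idcode as the lower level, and race as the extra absorbed categorical is purely illustrative, and it assumes reghdfe is installed and papwood has already been defined in the session.

          ```stata
          * Requires the user-written reghdfe (ssc install reghdfe) and papwood already defined
          webuse nlswork, clear

          * Higher level of aggregation: industry (ind_code), plus year and a categorical control (race)
          reghdfe ln_wage age tenure hours, absorb(ind_code year race) vce(cluster idcode)

          * Lower level of aggregation: the person identifier; time() names the year variable
          papwood , altfe(idcode) time(year) datatype(panel)
          ```

          With the data from #1, the analogous call would absorb the industry code and syear and pass time(syear).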



          • #6
            You can alter the r(p) thresholds in papwood if you prefer an alternative.



            • #7
              Originally posted by George Ford
              Thank you for this, I'll definitely bookmark it for the next time a similar situation arises! I have realized that my occupation variable follows the ISCO-88 standard classification, and I managed to go from thousands of categories to just a bit under 30 by moving from the 4-digit to the 2-digit ISCO-88 occupation codes. I believe this resolves the issue I had with the German SOEP dataset.

              For any future viewers with a similar issue, here is the code I've used to do this.
              Code:
              *Changing occupation to 2-digit ISCO codebook
              
              * Step 1: Create the new variable with two-digit categories
              gen occupation_group = .
              
              * Assign values based on the first two digits of the ISCO-88 codes
              replace occupation_group = 1  if inrange(occupation, 100, 199)
              replace occupation_group = 11 if inrange(occupation, 1100, 1143)
              replace occupation_group = 12 if inrange(occupation, 1200, 1300)
              replace occupation_group = 13 if inrange(occupation, 1310, 1319)
              replace occupation_group = 2  if inrange(occupation, 2000, 2999)
              replace occupation_group = 21 if inrange(occupation, 2100, 2139)
              replace occupation_group = 22 if inrange(occupation, 2200, 2239)
              replace occupation_group = 23 if inrange(occupation, 2300, 2359)
              replace occupation_group = 24 if inrange(occupation, 2400, 2479)
              replace occupation_group = 3  if inrange(occupation, 3000, 3999)
              replace occupation_group = 31 if inrange(occupation, 3100, 3159)
              replace occupation_group = 32 if inrange(occupation, 3200, 3239)
              replace occupation_group = 33 if inrange(occupation, 3300, 3349)
              replace occupation_group = 34 if inrange(occupation, 3400, 3489)
              replace occupation_group = 4  if inrange(occupation, 4000, 4999)
              replace occupation_group = 41 if inrange(occupation, 4100, 4149)
              replace occupation_group = 42 if inrange(occupation, 4200, 4229)
              replace occupation_group = 5  if inrange(occupation, 5000, 5999)
              replace occupation_group = 51 if inrange(occupation, 5100, 5169)
              replace occupation_group = 52 if inrange(occupation, 5200, 5229)
              replace occupation_group = 6  if inrange(occupation, 6000, 6999)
              replace occupation_group = 61 if inrange(occupation, 6100, 6159)
              replace occupation_group = 7  if inrange(occupation, 7000, 7999)
              replace occupation_group = 71 if inrange(occupation, 7100, 7149)
              replace occupation_group = 72 if inrange(occupation, 7200, 7249)
              replace occupation_group = 73 if inrange(occupation, 7300, 7349)
              replace occupation_group = 74 if inrange(occupation, 7400, 7449)
              replace occupation_group = 8  if inrange(occupation, 8000, 8999)
              replace occupation_group = 81 if inrange(occupation, 8100, 8179)
              replace occupation_group = 82 if inrange(occupation, 8200, 8299)
              replace occupation_group = 83 if inrange(occupation, 8300, 8349)
              replace occupation_group = 9  if inrange(occupation, 9000, 9999)
              replace occupation_group = 91 if inrange(occupation, 9100, 9169)
              replace occupation_group = 92 if inrange(occupation, 9200, 9219)
              replace occupation_group = 93 if inrange(occupation, 9300, 9339)
              replace occupation_group = 0  if inrange(occupation, 0, 99)
              * Caution: Stata stores 01 as the number 1, so this line puts code 1000 in group 1
              replace occupation_group = 1  if occupation == 1000
              
              * Step 2: Relabel the two-digit categories
              label define occupation_lbl 1 "Legislators, senior officials and managers" ///
                  11 "Legislators and senior officials" ///
                  12 "Corporate managers" ///
                  13 "Managers of small enterprises" ///
                  2 "Professionals" ///
                  21 "Physical, mathematical and engineering science professionals" ///
                  22 "Life science and health professionals" ///
                  23 "Teaching professionals" ///
                  24 "Other professionals" ///
                  3 "Technicians and associate professionals" ///
                  31 "Physical and engineering science associate professionals" ///
                  32 "Life science and health associate professionals" ///
                  33 "Teaching associate professionals" ///
                  34 "Other associate professionals" ///
                  4 "Clerks" ///
                  41 "Office clerks" ///
                  42 "Customer services clerks" ///
                  5 "Service workers and shop and market sales workers" ///
                  51 "Personal and protective services workers" ///
                  52 "Models, salespersons and demonstrators" ///
                  6 "Skilled agricultural and fishery workers" ///
                  61 "Skilled agricultural and fishery workers" ///
                  7 "Craft and related trades workers" ///
                  71 "Extraction and building trades workers" ///
                  72 "Metal, machinery and related trades workers" ///
                  73 "Precision, handicraft, craft printing and related trades workers" ///
                  74 "Other craft and related trades workers" ///
                  8 "Plant and machine operators and assemblers" ///
                  81 "Stationary plant and related operators" ///
                  82 "Machine operators and assemblers" ///
                  83 "Drivers and mobile plant operators" ///
                  9 "Elementary occupations" ///
                  91 "Sales and services elementary occupations" ///
                  92 "Agricultural, fishery and related labourers" ///
                  93 "Labourers in mining, construction, manufacturing and transport" ///
                  0 "Armed forces" ///
                  01 "Armed forces"
              label values occupation_group occupation_lbl
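
              A more compact alternative might be to derive the groups arithmetically instead of listing ranges. This is only a sketch on toy data: it assumes occupation holds 4-digit ISCO-88 codes stored as numbers (so a code like 0110 appears as 110) and that non-positive values are missing-data codes. The names occ2 and occ1 are made up for illustration.

              ```stata
              * Toy data: a few numeric ISCO-88 codes (leading zeros are lost, so 0110 is stored as 110)
              clear
              input occupation
              110
              1143
              1310
              2139
              5220
              9330
              -1
              end

              * Sub-major group = first two digits of the zero-padded 4-digit code
              gen occ2 = floor(occupation/100) if occupation > 0 & !missing(occupation)

              * Major group = first digit of the zero-padded 4-digit code
              gen occ1 = floor(occupation/1000) if occupation > 0 & !missing(occupation)

              list, clean
              ```

              This will not match the range-based recode above in every case: a code that only identifies a major group, such as 2000, ends up in 20 rather than 2, and 1300 ends up in 13 rather than 12, so it is worth tabulating both versions against each other before relying on either.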
