  • Extremely low R-squared in fixed-effects regression using xtreg, fe

    Hi Everyone!

    For my master's thesis, I am estimating a two-way fixed-effects (TWFE) regression. My dataset is a six-round panel of 10,802 communities (settlements). The aim is to test whether communities that received humanitarian aid (HUMDelivered, binary) attract more arrivals of internally displaced persons (ArrivalIDPs, continuous) than communities that did not.
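A minimal way to write the two-way fixed-effects specification described above. The notation is illustrative and not taken from the thread: i indexes settlements, t survey rounds, and x_it stands for any control variables.

```latex
\text{ArrivalIDPs}_{it} = \beta\,\text{HUMDelivered}_{it}
  + \mathbf{x}_{it}'\boldsymbol{\gamma}
  + \alpha_i + \lambda_t + \varepsilon_{it}
```

Here the settlement effects alpha_i are what -xtreg, fe- sweeps out, the round effects lambda_t correspond to i.Round, and beta is the coefficient of interest.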

    I started with the basic estimation:

    Code:
    xtset SettlementID Round
    
    * Models without round dummies
    reg ArrivalIDPs HUMDelivered
    outreg2 using my_reg1.doc, replace ctitle(Pooled OLS) ///
    keep(HUMDelivered) ///
    addtext(Settlement FE, NO)
    
    xtreg ArrivalIDPs HUMDelivered, re
    outreg2 using my_reg1.doc, append ctitle(Random-Effect) ///
    keep(HUMDelivered) ///
    addtext(Settlement FE, NO)
    
    xtreg ArrivalIDPs HUMDelivered, fe
    outreg2 using my_reg1.doc, append ctitle(Fixed-Effect) ///
    keep(HUMDelivered) ///
    addtext(Settlement FE, YES)
    
    * Same models with round dummies (i.Round)
    reg ArrivalIDPs HUMDelivered i.Round
    outreg2 using my_reg1.doc, append ctitle(Pooled OLS) ///
    keep(HUMDelivered i.Round) ///
    addtext(Year FE, YES, Settlement FE, NO)
    
    xtreg ArrivalIDPs HUMDelivered i.Round, re
    outreg2 using my_reg1.doc, append ctitle(Random-Effect) ///
    keep(HUMDelivered i.Round) ///
    addtext(Year FE, YES, Settlement FE, NO)
    
    xtreg ArrivalIDPs HUMDelivered i.Round, fe
    outreg2 using my_reg1.doc, append ctitle(Fixed-Effect) ///
    keep(HUMDelivered i.Round) ///
    addtext(Year FE, YES, Settlement FE, YES)
    The result:
    [Image: Baseline regression.png]


    #1: Why is the R-squared not reported in the output table for the random-effects model? I am aware that I can retrieve it separately from the stored results (r2_w, r2_b; r2_a is not available), but I wonder whether it is possible to include it directly in the output. Also, the R-squared is extremely low.
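A minimal sketch of where those numbers live, run on Stata's nlswork example data rather than the thesis data (age and tenure are stand-in regressors). -xtreg, re- leaves the within, between and overall R-squared in e(r2_w), e(r2_b) and e(r2_o), which can be displayed or passed to a table command. The -outreg2- line is left as a comment because -outreg2- is community-contributed (SSC) and is only assumed to be installed.

```stata
* Example data, not the thesis data
webuse nlswork, clear
xtset idcode year

* Random-effects GLS fit; the R-squared measures are stored in e()
xtreg ln_wage age tenure, re
display "within  R2 = " %6.4f e(r2_w)
display "between R2 = " %6.4f e(r2_b)
display "overall R2 = " %6.4f e(r2_o)

* Assuming -outreg2- (SSC) is installed, the same stored results could be
* added as extra rows with its addstat() option, for example:
* outreg2 using my_reg1.doc, append ctitle(Random-Effect) keep(age tenure) addstat(Within R2, e(r2_w), Overall R2, e(r2_o))
```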

    #2: The Hausman test indicated that the fixed-effects model (xtreg, fe) is preferred. However, even after adding many control variables, the R-squared remains extremely low. Despite this, the coefficients are significant, especially for the extreme categories of the categorical variables (e.g., category 5), which is what I am looking for and makes sense. What could be the issue, and what are possible solutions? I have uploaded the descriptions of the variables below in case they are needed.
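For reference, a sketch of the standard FE-versus-RE Hausman sequence in official Stata, again on the nlswork example data with illustrative regressors. It assumes the default (non-robust) VCE in both fits, which the classic form of the test relies on.

```stata
webuse nlswork, clear
xtset idcode year

* Consistent under both H0 and Ha: fixed effects
quietly xtreg ln_wage age tenure, fe
estimates store fixed

* Efficient under H0 only: random effects
quietly xtreg ln_wage age tenure, re
estimates store random

* H0: unit effects uncorrelated with the regressors (RE consistent);
* a small p-value favours FE. The consistent estimator is listed first.
hausman fixed random
```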

    Result:

    Code:
    . xtreg ArrivalIDPs HUMDelivered i.Round i.IDPInConflicts i.IDPsNatDisaster i.pashtun_greg i.HLTClinics i.EduSchoolExist, fe
    note: 1.pashtun_greg omitted because of collinearity.
    
    Fixed-effects (within) regression               Number of obs     =     64,620
    Group variable: SettlementID                    Number of groups  =     10,770
    
    R-squared:                                      Obs per group:
         Within  = 0.0032                                         min =          6
         Between = 0.0445                                         avg =        6.0
         Overall = 0.0166                                         max =          6
    
                                                    F(18, 53832)      =       9.69
    corr(u_i, Xb) = 0.1131                          Prob > F          =     0.0000
    
    ----------------------------------------------------------------------------------
         ArrivalIDPs | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -----------------+----------------------------------------------------------------
        HUMDelivered |   11.23654    6.46464     1.74   0.082    -1.434206    23.90729
                     |
               Round |
                 11  |   2.771986   8.585365     0.32   0.747     -14.0554    19.59937
                 12  |  -10.64786   8.612658    -1.24   0.216    -27.52874    6.233019
                 13  |   15.06621   8.631937     1.75   0.081    -1.852459    31.98487
                 14  |   52.79905   8.635658     6.11   0.000     35.87309    69.72501
                 16  |   31.35202   9.322178     3.36   0.001     13.08047    49.62356
                     |
      IDPInConflicts |
                  1  |  -19.43057   13.52439    -1.44   0.151    -45.93847    7.077339
                  2  |  -3.425755   12.49668    -0.27   0.784    -27.91935    21.06784
                  3  |  -1.334446   11.44944    -0.12   0.907    -23.77544    21.10655
                  4  |   1.823892   11.74439     0.16   0.877     -21.1952    24.84298
                  5  |   31.98156    10.9981     2.91   0.004     10.42519    53.53793
                     |
     IDPsNatDisaster |
                  1  |   30.34869   12.99162     2.34   0.019     4.885003    55.81237
                  2  |   8.353065   11.92255     0.70   0.484    -15.01522    31.72135
                  3  |   35.46722   10.80746     3.28   0.001      14.2845    56.64994
                  4  |   41.21825   11.11009     3.71   0.000     19.44237    62.99412
                  5  |   70.43386   11.74086     6.00   0.000     47.42169    93.44604
                     |
      1.pashtun_greg |          0  (omitted)
        1.HLTClinics |  -11.40218    10.1488    -1.12   0.261     -31.2939    8.489543
    1.EduSchoolExist |   13.95478    8.30587     1.68   0.093    -2.324793    30.23435
               _cons |    353.908   9.975406    35.48   0.000     334.3562    373.4599
    -----------------+----------------------------------------------------------------
             sigma_u |  1751.7956
             sigma_e |  627.38448
                 rho |  .88631833   (fraction of variance due to u_i)
    ----------------------------------------------------------------------------------
    F test that all u_i=0: F(10769, 53832) = 45.58               Prob > F = 0.0000
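To see what the within R-squared measures, the sketch below (nlswork example data, illustrative regressors) reproduces e(r2_w) from -xtreg, fe- by demeaning each variable within panels and running plain OLS. Read this way, a within R-squared of 0.0032 means the regressors account for roughly 0.3% of the over-time variation in ArrivalIDPs within settlements; the settlement effects, which carry most of the variance here (rho = 0.886), are not credited to it.

```stata
webuse nlswork, clear
xtset idcode year

xtreg ln_wage age tenure, fe
scalar r2_within = e(r2_w)

* Keep the estimation sample so the demeaning uses the same observations
keep if e(sample)

* Within transformation: subtract each panel's mean
foreach v of varlist ln_wage age tenure {
    bysort idcode: egen double m_`v' = mean(`v')
    generate double w_`v' = `v' - m_`v'
}

* OLS on demeaned data gives the same slopes, and its R2 equals the within R2
regress w_ln_wage w_age w_tenure
display "xtreg within R2 = " %8.6f scalar(r2_within) "   OLS R2 on demeaned data = " %8.6f e(r2)
```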
    [Image: Variables Description.png]

    I would appreciate any insights and assistance.
    Last edited by Ahmadullah Ahmadzai; 04 Jun 2024, 14:57.

  • #2
    Focus on the F-stat, not the R2. With so many communities there is a lot of variation in the data, so a low R2 is not a problem in itself.
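A worked check of why a tiny within R-squared can still come with a highly significant F: for the default (non-robust) within regression, F is a function of the within R-squared, the number of regressors k, and the residual degrees of freedom N - G - k (G = number of settlements). Using the first fixed-effects output in this thread (k = 18, N = 64,620, G = 10,770):

```latex
F = \frac{R^2_{w}/k}{(1-R^2_{w})/(N-G-k)}
  = \frac{0.0032/18}{(1-0.0032)/(64{,}620-10{,}770-18)}
  = \frac{0.0032/18}{0.9968/53{,}832}
  \approx 9.6
```

This matches the reported F(18, 53832) = 9.69 up to the rounding of R-squared to four decimals. With almost 54,000 residual degrees of freedom, a model explaining well under 1% of the within variation can still be jointly significant.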



    • #3
      Ahmadullah:
      1) -xtreg, re- does not return an R-sq (e(r2)), but a Wald chi2 statistic;
      2) with such a large sample you should use -robust- or -vce(cluster panelid)- standard errors (under -xtreg- both options do exactly the same job);
      3) I'd compare your within R-sq with those reported in published research in your field: is yours substantially different?
      4) your low within R-sq may be due to limited within-panel variation in the -IDPInConflicts- regressor. It may also reflect the lack of continuous predictors;
      5) I'd check whether your regression is correctly specified by replicating -linktest- by hand, as in the following toy example:
      Code:
      . use "https://www.stata-press.com/data/r18/nlswork.dta"
      (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
      
      . xtreg ln_wage c.age##c.age, fe vce(cluster idcode)
      
      Fixed-effects (within) regression               Number of obs     =     28,510
      Group variable: idcode                          Number of groups  =      4,710
      
      R-squared:                                      Obs per group:
           Within  = 0.1087                                         min =          1
           Between = 0.1006                                         avg =        6.1
           Overall = 0.0865                                         max =         15
      
                                                      F(2, 4709)        =     507.42
      corr(u_i, Xb) = 0.0440                          Prob > F          =     0.0000
      
                                   (Std. err. adjusted for 4,710 clusters in idcode)
      ------------------------------------------------------------------------------
                   |               Robust
           ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
               age |   .0539076    .004307    12.52   0.000     .0454638    .0623515
                   |
       c.age#c.age |  -.0005973    .000072    -8.30   0.000    -.0007384   -.0004562
                   |
             _cons |    .639913   .0624195    10.25   0.000     .5175415    .7622845
      -------------+----------------------------------------------------------------
           sigma_u |   .4039153
           sigma_e |  .30245467
               rho |  .64073314   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      
      . predict fitted, xb
      (24 missing values generated)
      
      . gen sq_fitted=fitted^2
      (24 missing values generated)
      
      . xtreg ln_wage fitted sq_fitted , fe vce(cluster idcode)
      
      Fixed-effects (within) regression               Number of obs     =     28,510
      Group variable: idcode                          Number of groups  =      4,710
      
      R-squared:                                      Obs per group:
           Within  = 0.1092                                         min =          1
           Between = 0.1033                                         avg =        6.1
           Overall = 0.0881                                         max =         15
      
                                                      F(2, 4709)        =     523.09
      corr(u_i, Xb) = 0.0467                          Prob > F          =     0.0000
      
                                   (Std. err. adjusted for 4,710 clusters in idcode)
      ------------------------------------------------------------------------------
                   |               Robust
           ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
            fitted |   2.569185   .7085064     3.63   0.000     1.180181    3.958189
         sq_fitted |    -.47432   .2153021    -2.20   0.028    -.8964128   -.0522272
             _cons |  -1.290258    .580562    -2.22   0.026    -2.428431   -.1520844
      -------------+----------------------------------------------------------------
           sigma_u |    .403403
           sigma_e |  .30238578
               rho |  .64025357   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      
      . test sq_fitted
      
       ( 1)  sq_fitted = 0
      
             F(  1,  4709) =    4.85
                  Prob > F =    0.0276
      
      .
      The outcome of the (redundant) -test- tells us that the model is misspecified (as expected) and calls for more predictors and/or interactions.
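The hand-made link test above amounts to the following auxiliary fixed-effects regression, where the hatted y is the linear prediction from the original model (excluding the fixed effects, as returned by -predict, xb-):

```latex
y_{it} = \alpha_i + \gamma_1\,\hat{y}_{it} + \gamma_2\,\hat{y}_{it}^{2} + u_{it},
\qquad H_0:\ \gamma_2 = 0
```

If the model is correctly specified, the squared prediction should add no explanatory power; a significant gamma_2 (here F(1, 4709) = 4.85, p = 0.0276) points to omitted nonlinearities, predictors, or interactions.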
      Kind regards,
      Carlo
      (StataNow 18.5)



      • #4
        George Ford, Carlo Lazzaro: thank you both for the insightful comments.



        Carlo, regarding comment 4): even without the -IDPInConflicts- regressor, the R-squared remains low. I may try to find and add some other relevant continuous predictors.

        Code:
        . xtreg ArrivalIDPs HUMDelivered i.Round i.SecSituation_numeric i.IDPsNatDisaster HLTClinics EduSchoolExist, fe robust
        
        Fixed-effects (within) regression               Number of obs     =     64,812
        Group variable: SettlementID                    Number of groups  =     10,802
        
        R-squared:                                      Obs per group:
             Within  = 0.0037                                         min =          6
             Between = 0.0200                                         avg =        6.0
             Overall = 0.0087                                         max =          6
        
                                                        F(18, 10801)      =      16.40
        corr(u_i, Xb) = 0.0738                          Prob > F          =     0.0000
        
                                      (Std. err. adjusted for 10,802 clusters in SettlementID)
        --------------------------------------------------------------------------------------
                             |               Robust
                 ArrivalIDPs | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
        ---------------------+----------------------------------------------------------------
                HUMDelivered |   14.14875   8.635855     1.64   0.101    -2.779111    31.07661
                             |
                       Round |
                         11  |   1.814061   7.532533     0.24   0.810    -12.95109    16.57921
                         12  |  -10.08923   10.52772    -0.96   0.338    -30.72549    10.54702
                         13  |   14.99009     10.826     1.38   0.166    -6.230865    36.21105
                         14  |   66.55325   13.06744     5.09   0.000     40.93867    92.16784
                         16  |   44.58953   18.47772     2.41   0.016     8.369797    80.80927
                             |
        SecSituation_numeric |
                          1  |  -91.40932   66.11488    -1.38   0.167    -221.0066    38.18798
                          2  |  -75.47755   65.16598    -1.16   0.247    -203.2148    52.25974
                          3  |   -99.7809   65.32189    -1.53   0.127    -227.8238      28.262
                          4  |  -136.2215   65.58824    -2.08   0.038    -264.7865   -7.656524
                          5  |  -130.4257   65.81546    -1.98   0.048    -259.4361    -1.41536
                             |
             IDPsNatDisaster |
                          1  |   18.08753   12.67376     1.43   0.154     -6.75537    42.93042
                          2  |   1.786606   8.408966     0.21   0.832    -14.69651    18.26972
                          3  |   31.52363   6.484614     4.86   0.000     18.81259    44.23466
                          4  |    44.5084   10.98231     4.05   0.000     22.98106    66.03573
                          5  |   83.99806   12.21556     6.88   0.000     60.05333    107.9428
                             |
                  HLTClinics |  -9.382916   8.267201    -1.13   0.256    -25.58815    6.822316
              EduSchoolExist |    12.7537   6.484551     1.97   0.049     .0427891    25.46461
                       _cons |   462.6464   65.39749     7.07   0.000     334.4553    590.8375
        ---------------------+----------------------------------------------------------------
                     sigma_u |  1750.6248
                     sigma_e |  626.37347
                         rho |  .88650847   (fraction of variance due to u_i)
        --------------------------------------------------------------------------------------
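A quick way to check point 2) of Carlo's reply, sketched on the nlswork example data with illustrative regressors: under -xtreg, fe-, vce(robust) is implemented as clustering on the panel variable, so the two variance matrices below should coincide.

```stata
webuse nlswork, clear
xtset idcode year

quietly xtreg ln_wage age tenure, fe vce(robust)
matrix V_robust = e(V)

quietly xtreg ln_wage age tenure, fe vce(cluster idcode)
matrix V_cluster = e(V)

* mreldif() returns the largest relative difference between two matrices
display "max relative difference: " mreldif(V_robust, V_cluster)
```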

        Regarding comment 5):

        Code:
        . xtreg ArrivalIDPs HUMDelivered i.Round i.SecSituation_numeric i.IDPsNatDisaster i.IDPInConflicts HLTClinics EduSchoolExist, fe vce(cluster SettlementID)
        
        Fixed-effects (within) regression               Number of obs     =     64,812
        Group variable: SettlementID                    Number of groups  =     10,802
        
        R-squared:                                      Obs per group:
             Within  = 0.0041                                         min =          6
             Between = 0.0320                                         avg =        6.0
             Overall = 0.0136                                         max =          6
        
                                                        F(23, 10801)      =      13.34
        corr(u_i, Xb) = 0.0968                          Prob > F          =     0.0000
        
                                      (Std. err. adjusted for 10,802 clusters in SettlementID)
        --------------------------------------------------------------------------------------
                             |               Robust
                 ArrivalIDPs | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
        ---------------------+----------------------------------------------------------------
                HUMDelivered |   13.33004   8.592451     1.55   0.121    -3.512746    30.17282
                             |
                       Round |
                         11  |   2.125291   7.580392     0.28   0.779    -12.73367    16.98425
                         12  |  -10.43841   10.61373    -0.98   0.325    -31.24327    10.36645
                         13  |    14.2755   10.93973     1.30   0.192    -7.168382    35.71939
                         14  |   69.14291   12.99161     5.32   0.000     43.67697    94.60884
                         16  |   51.02739   18.19497     2.80   0.005     15.36191    86.69288
                             |
        SecSituation_numeric |
                          1  |  -91.85798   66.22678    -1.39   0.165    -221.6746    37.95868
                          2  |  -75.53399   65.24865    -1.16   0.247    -203.4333    52.36534
                          3  |  -100.5699   65.37836    -1.54   0.124    -228.7235    27.58372
                          4  |  -137.4823   65.65216    -2.09   0.036    -266.1726   -8.792037
                          5  |  -133.0657   65.90598    -2.02   0.044    -262.2536   -3.877911
                             |
             IDPsNatDisaster |
                          1  |   28.01189   15.35037     1.82   0.068    -2.077651    58.10143
                          2  |   7.602631   11.22829     0.68   0.498    -14.40688    29.61214
                          3  |   34.58329    9.64989     3.58   0.000     15.66774    53.49885
                          4  |   40.74288   14.74282     2.76   0.006     11.84424    69.64151
                          5  |   70.34094   13.16667     5.34   0.000     44.53186    96.15003
                             |
              IDPInConflicts |
                          1  |  -18.22021   11.17469    -1.63   0.103    -40.12466    3.684233
                          2  |  -4.655364   12.55325    -0.37   0.711    -29.26205    19.95132
                          3  |   -2.12734   11.15096    -0.19   0.849    -23.98526    19.73058
                          4  |   1.912029   12.45183     0.15   0.878    -22.49585    26.31991
                          5  |   34.17113   13.06128     2.62   0.009     8.568623    59.77365
                             |
                  HLTClinics |  -10.43011   8.283288    -1.26   0.208    -26.66688    5.806657
              EduSchoolExist |   13.02106   6.497105     2.00   0.045     .2855386    25.75658
                       _cons |   457.1701   65.61705     6.97   0.000     328.5486    585.7915
        ---------------------+----------------------------------------------------------------
                     sigma_u |    1749.34
                     sigma_e |  626.27148
                         rho |  .88639346   (fraction of variance due to u_i)
        --------------------------------------------------------------------------------------
        
        . predict fitted, xb
        
        . gen sq_fitted=fitted^2
        
        . xtreg ArrivalIDPs fitted sq_fitted , fe vce(cluster SettlementID )
        
        Fixed-effects (within) regression               Number of obs     =     64,812
        Group variable: SettlementID                    Number of groups  =     10,802
        
        R-squared:                                      Obs per group:
             Within  = 0.0046                                         min =          6
             Between = 0.0356                                         avg =        6.0
             Overall = 0.0148                                         max =          6
        
                                                        F(2, 10801)       =      99.04
        corr(u_i, Xb) = 0.1008                          Prob > F          =     0.0000
        
                              (Std. err. adjusted for 10,802 clusters in SettlementID)
        ------------------------------------------------------------------------------
                     |               Robust
         ArrivalIDPs | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
              fitted |  -3.492953   .8909718    -3.92   0.000    -5.239422   -1.746485
           sq_fitted |   .0053578   .0011192     4.79   0.000      .003164    .0075517
               _cons |   930.5709   175.6164     5.30   0.000     586.3304    1274.811
        -------------+----------------------------------------------------------------
             sigma_u |  1748.8951
             sigma_e |  625.98235
                 rho |  .88643522   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        
        . test sq_fitted
        
         ( 1)  sq_fitted = 0
        
               F(  1, 10801) =   22.92
                    Prob > F =    0.0000
        Could you please comment on this result?



        Also, when I use the first lag of HUMDelivered and take the log of ArrivalIDPs, the between R-squared increases to 0.21. Do you think taking the log of ArrivalIDPs makes sense here?
        Code:
        . xtreg logarrival L1.HUMDelivered i.Round i.SecSituation_numeric i.IDPsNatDisaster i.IDPInConflicts HLTClinics EduSchoolExist, fe robust
        
        Fixed-effects (within) regression               Number of obs     =     43,208
        Group variable: SettlementID                    Number of groups  =     10,802
        
        R-squared:                                      Obs per group:
             Within  = 0.0431                                         min =          4
             Between = 0.2113                                         avg =        4.0
             Overall = 0.1333                                         max =          4
        
                                                        F(21, 10801)      =      32.74
        corr(u_i, Xb) = 0.2850                          Prob > F          =     0.0000
        
                                      (Std. err. adjusted for 10,802 clusters in SettlementID)
        --------------------------------------------------------------------------------------
                             |               Robust
                  logarrival | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
        ---------------------+----------------------------------------------------------------
                HUMDelivered |
                         L1. |   .0271499   .0144459     1.88   0.060    -.0011668    .0554666
                             |
                       Round |
                         12  |   .0045459   .0106542     0.43   0.670    -.0163383    .0254301
                         13  |   .0494532   .0130331     3.79   0.000     .0239059    .0750005
                         14  |   .1891757   .0187977    10.06   0.000     .1523286    .2260227
                             |
        SecSituation_numeric |
                          1  |   .2584217   .1694418     1.53   0.127    -.0737153    .5905588
                          2  |   .1378936   .1673402     0.82   0.410    -.1901239     .465911
                          3  |   .1157455   .1680523     0.69   0.491    -.2136678    .4451588
                          4  |   .1280523   .1676331     0.76   0.445    -.2005393     .456644
                          5  |   .1639102   .1694685     0.97   0.333    -.1682791    .4960995
                             |
             IDPsNatDisaster |
                          1  |   .1747923    .030488     5.73   0.000     .1150302    .2345544
                          2  |   .1141439   .0282058     4.05   0.000     .0588553    .1694325
                          3  |   .1460408   .0263818     5.54   0.000     .0943277    .1977539
                          4  |   .1673155   .0277441     6.03   0.000      .112932    .2216989
                          5  |   .2807403   .0305979     9.18   0.000     .2207628    .3407179
                             |
              IDPInConflicts |
                          1  |    .200513   .0337742     5.94   0.000     .1343094    .2667166
                          2  |   .2969831   .0329146     9.02   0.000     .2324644    .3615018
                          3  |   .4258252   .0301033    14.15   0.000     .3668171    .4848332
                          4  |   .5013003   .0323385    15.50   0.000     .4379109    .5646898
                          5  |    .542521   .0323485    16.77   0.000     .4791119      .60593
                             |
                  HLTClinics |    .142712   .0323354     4.41   0.000     .0793288    .2060953
              EduSchoolExist |   .1193473   .0260984     4.57   0.000     .0681896     .170505
                       _cons |   3.062243   .1700464    18.01   0.000     2.728921    3.395565
        ---------------------+----------------------------------------------------------------
                     sigma_u |  2.3186362
                     sigma_e |  1.0312254
                         rho |  .83485894   (fraction of variance due to u_i)
        --------------------------------------------------------------------------------------
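A side note on the sample size in this last regression: -xtset- time-series operators respect gaps in the time variable. The Round levels in the outputs are consistent with rounds 10 to 14 plus 16 (an inference from the tables, not stated in the thread), so L1.HUMDelivered is missing in round 10 and also in round 16, which has no round 15 before it. That would leave 4 rounds per settlement, matching 43,208 = 10,802 x 4. A toy sketch with made-up values:

```stata
* Toy panel (made-up values): one settlement observed in rounds 10-14 and 16
clear
input id round aid
1 10 0
1 11 1
1 12 0
1 13 1
1 14 0
1 16 1
end
xtset id round

* L1. refers to round t-1; round 16 has no round 15, so its lag is missing
generate lag_aid = L1.aid
list round aid lag_aid, noobs
count if !missing(lag_aid)
```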

        Thank you for your time and valuable feedback.



        • #5
          Ahmadullah:
          1) the null of a correctly specified regression is clearly rejected: your model needs more (and/or different) predictors and interactions;
          2) I do not think that logging the dependent variable adds much there.
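If a log specification is kept anyway, its coefficients read as approximate proportional changes. Assuming logarrival is the natural log of ArrivalIDPs (how zero arrivals were handled is not stated in the thread), the lagged-aid coefficient of 0.0271499 translates to:

```latex
\%\Delta\,\text{ArrivalIDPs}
  = 100\left(e^{\hat\beta}-1\right)
  = 100\left(e^{0.0271499}-1\right)
  \approx 2.75\%
```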
          Kind regards,
          Carlo
          (StataNow 18.5)



          • #6
            Thank you, Carlo. Your response was incredibly helpful; I'll definitely keep working on my model with these new insights.
