
  • Panel data regression

    Dear Researchers,
    I am trying to regress agricultural land use at the district level on GDP, population, irrigation, rainfall, temperature, the area under non-agricultural use, credit institutions, the area under small and marginal farms, and average land size, using secondary panel data taken from the agricultural census for four time periods: 2000, 2005, 2010, and 2015. The panel therefore has 4 time periods and 27 districts (cross-sections). The Hausman test shows that the RE model is the appropriate model. But when I regress without one of the independent variables, i.e. the non-agricultural area, the Hausman test chooses the FE model. Should I choose the RE model or the FE model? Should I do more tests before concluding the results? I would be very grateful if you could suggest which model is appropriate in this case. Thank you.

  • #2
    My understanding is that the fixed effects model accounts for all of the variation in the outcome due to temporal autocorrelation - that is, the things that are constant (or at least relatively stable) about each of your subjects over time. This is why you can't include temporally invariant predictors in a fixed effects model: you've already accounted for temporally invariant (and therefore cross-sectional) variation across subjects. In contrast, the random effects model assumes that the temporal autocorrelation is an independently and identically distributed part of the error term - meaning that the model assumes you've explained the effect of temporal autocorrelation on your outcome with your independent variables. If your Hausman test shows that the RE model is appropriate, this is a great result, because it means that you are accounting for as much auto-correlative variation as in the fixed effects model, but you can also explain that variation with your independent variables. I hope we can agree that, all else equal, it's better to explain the variation than to simply account for it. When you take an independent variable out of the model and the Hausman test no longer indicates the RE model is appropriate, it means you need that independent variable to explain some of the temporal autocorrelation across subjects.

    I might start by thinking through, theoretically, which variables you might expect to explain the temporal autocorrelation across subjects. Then, as a follow-up, you might take your full model and estimate it as a fixed effects model (ignoring the Hausman test). Any time-invariant predictors in your model should fall out completely (zero coefficients, missing standard errors, etc.), and any time-varying predictors that nonetheless explain the effect of temporal autocorrelation on your outcome should go non-significant (indicating that the variable only explains temporal autocorrelation and has no direct effect on your outcome outside of the temporally auto-correlative variation). You might also remove each independent variable from the model one at a time and see whether the Hausman test still indicates that a random effects model is appropriate. If the Hausman test no longer indicates the RE model is appropriate, then you know that variable has some effect on the outcome related to temporal autocorrelation. This might give you some extra insight into what is going on with the model.
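
    For concreteness, here is a minimal Stata sketch of the procedure described above. The outcome, regressors, and panel identifiers (y, x1-x4, id, year) are placeholders rather than anyone's actual variable names:
    Code:
     * Full model under both estimators, then the Hausman comparison
     xtset id year
     quietly xtreg y x1 x2 x3 x4, fe
     estimates store fe_full
     quietly xtreg y x1 x2 x3 x4, re
     estimates store re_full
     hausman fe_full re_full
     
     * Drop one regressor at a time and re-run the comparison
     local xlist "x1 x2 x3 x4"
     foreach v of local xlist {
         local rest : list xlist - v
         quietly xtreg y `rest', fe
         estimates store fe_sub
         quietly xtreg y `rest', re
         estimates store re_sub
         display _n "Hausman test after dropping `v':"
         hausman fe_sub re_sub
     }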

    Overall, it seems to me like you have good evidence that the RE model is appropriate, and that your model is underspecified when you drop certain independent variables. I would focus on the RE model and report it in the writeup, with maybe a paragraph or two discussing why you use the RE model instead of the FE model, noting that you did some robustness checks with the FE model and that the results of the FE model are available on request.

    Edit: looking back at this again, you might also take care not to over-interpret variables that are no longer significant in the FE model. I use the language of mediation above, but it is not quite right to say that variables that fall out in the FE model have no direct effect on the outcome. There may be a direct effect, but the effect is not independent within the same subject across time. That is, the effect depends on the subject. I'm sorry if this isn't clear. Statistics are hard, and I'm struggling to talk about this in a way that is both accurate and understandable.
    Last edited by Daniel Schaefer; 14 Jun 2023, 10:30.



    • #3
      Radhika:
      as an aside to Daniel's helpful reply, I'd also wonder about the relevance of the -Non-agricultural area- variable in your data-generating process. Is it an independent variable or a (so-called) control?
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        Thank you, Daniel Schaefer and Carlo Lazzaro, for your kind replies to my query on the FE and RE choice in panel data regression. My sincere gratitude to both of you. As Daniel mentioned, temporal autocorrelation is accounted for in the FE model, while the RE model tries to explain it through the independent variables instead of merely accounting for it. As per your suggestion, I have estimated the full model first and then removed variables to check when the chosen model switches from RE to FE. The first model below includes all the variables; for it, the Hausman test chooses the RE model. When I remove the variable named area under non-agricultural land, the Hausman test chooses the FE model with all the other variables.

        I have included year fixed effects (i.year) in both the FE and RE models.

        Also, I have run a VIF test to check for multicollinearity, and it shows values of more than 10 for all the variables. What should I do in this regard?

        I have used the following command to check for multicollinearity in the panel regression model (is it the correct method to check multicollinearity?).
        I have also checked for heteroskedasticity using the command xttest3, which shows the presence of heteroskedasticity (a command sketch follows the code below).


        Code:
         reg percentshareofhortareaingca meantemperature rainfall populationdensitypersqkm gddppercapitars roadper100km numberofbanksper1000sqkm shareofiaingca sharegcasmallintotgca sharemarginal shareofsc shareofst averagelandsize_ha shareofnonagriareainga i.year i.districtid
         vif

        Results show a value of more than 10 for all the variables. Is it the right method?
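
        For reference, a minimal sketch of the -xttest3- check mentioned above, with an abbreviated regressor list standing in for the full one (the command is community-contributed and is run after the fixed-effects estimation):
        Code:
         * Modified Wald test for groupwise heteroskedasticity in the FE model
         xtreg percentshareofhortareaingca meantemperature rainfall gddppercapitars i.year, fe
         xttest3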

        1. Should I include non-agricultural land as a variable, given that its removal changes the chosen model to FE?
        2. Is the method of testing multicollinearity in the FE model, as shown above, correct or not?
        3. What do we do if the VIF values are more than ten?
        4. What do we do to correct for heteroskedasticity?

        As for Carlo Lazzaro's question on whether the non-agricultural area is an independent variable or a control: I have used it as an independent variable, assuming that the share of non-agricultural area in the district will have a positive relationship with the process of diversification towards horticulture crops.

        Once again, I am grateful to you for your kind suggestions here.

        Thank you

        Radhika C



        • #5
          Originally posted by Carlo Lazzaro View Post
          Radhika:
          as an aside to Daniel's helpful reply, I'd also wonder about the relevance of the -Non-agricultural area- variable in your data-generating process. Is it an independent variable or a (so-called) control?
          At first I expected to use it as a proxy variable for urbanization at the district level, but I later learned that the variable itself has a direct effect on the outcome variable.



          • #6
            Originally posted by Daniel Schaefer View Post
            Thank you, Daniel, for your elaborate reply and helpful suggestions. Please find attached my results from Stata 11 on the panel data regression, along with diagnostic test results. I look forward to your kind perusal and valuable suggestions.


            Last edited by Radhika Channanamchery; 21 Jun 2023, 04:58.



            • #7
              Please find attached my results from the estimation as FE and RE models, along with diagnostic tests on multicollinearity.
              Results June 21.docx



              • #8
                Radhika:
                1. You should include non-agricultural land as a variable, as long as it is a relevant predictor.
                2. I would have used -estat vce, corr-. However, the risk of multicollinearity is often overestimated (see https://www.hup.harvard.edu/catalog....=9780674175440, Chapter 23).
                3. If some predictor is the square of a linear one, high collinearity is to be expected.
                4. To correct for heteroskedasticity, you should use -robust- or -vce(cluster panelid)-, which do the very same job under -xtreg-. However, once you have invoked non-default standard errors, you should switch from -hausman- to the community-contributed module -xtoverid- to assess which of the two specifications works better with your dataset.
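
                A minimal sketch of point 4, using the data from the earlier posts with an abbreviated regressor list standing in for the full model (this assumes the data are already -xtset- on districtid):
                Code:
                 local xlist "meantemperature rainfall gddppercapitars"
                 xtreg percentshareofhortareaingca `xlist', fe vce(cluster districtid)
                 
                 * With non-default standard errors, -hausman- no longer applies;
                 * run the community-contributed -xtoverid- after the -re- specification instead.
                 xtreg percentshareofhortareaingca `xlist', re vce(cluster districtid)
                 xtoverid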
                Last edited by Carlo Lazzaro; 21 Jun 2023, 07:45.
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Originally posted by Carlo Lazzaro View Post
                  Radhika:
                  1. You should include non-agricultural land as a variable, as long as it is a relevant predictor.
                  2. I would have used -estat vce, corr-. However, the risk of multicollinearity is often overestimated (see https://www.hup.harvard.edu/catalog....=9780674175440, Chapter 23).
                  3. If some predictor is the square of a linear one, high collinearity is to be expected.
                  4. To correct for heteroskedasticity, you should use -robust- or -vce(cluster panelid)-, which do the very same job under -xtreg-. However, once you have invoked non-default standard errors, you should switch from -hausman- to the community-contributed module -xtoverid- to assess which of the two specifications works better with your dataset.
                  Dear Carlo,

                  Thank you so much for your kind reply. You have been most helpful. May I please ask you a few more questions related to my research?


                  As per your previous reply, I have run the following command.

                  Please look at the results of the -estat vce, corr- command after the fixed-effects regression.

                  Code:
                   . xtset districtid yearid
                          panel variable:  districtid (strongly balanced)
                           time variable:  yearid, 1 to 4
                                   delta:  1 unit
                   
                   . xtreg percentshareofhortareaingca meantemperature rainfall populationdensitypersqkm gddppercapitars roadper100km numberofbanksper1000sqkm shareofiaingca sharegcasmallintotgca sharemarginal averagelandsize_ha shareofsc shareofnonagriareainga i.year, fe
                   
                   . estat vce, corr
                   
                           e(V) |  meante~e  rainfall  popula~m  gddppe~s  roadpe~m  number~m  sha~ngca  shareg~a  sharem~l  averag~a  shareo~c  shar~nga
                   -------------+------------------------------------------------------------------------------------------------------------------------
                   meantemper~e |    1.0000
                       rainfall |   -0.0354    1.0000
                   population~m |   -0.0943   -0.1556    1.0000
                   gddppercap~s |    0.0840    0.1202   -0.1917    1.0000
                   roadper100km |    0.1277   -0.1171   -0.1376    0.2097    1.0000
                   numberofba~m |   -0.0374    0.0196    0.2112   -0.0633   -0.2241    1.0000
                   shareofiai~a |    0.2053    0.0805   -0.0366   -0.0023   -0.0437    0.0998    1.0000
                   sharegcasm~a |    0.0805    0.0518    0.0649   -0.0262   -0.1033    0.1663   -0.3855    1.0000
                   sharemargi~l |   -0.1149    0.0124   -0.1787   -0.1708    0.1434   -0.3590   -0.1023   -0.3570    1.0000
                   averagelan~a |   -0.0701    0.1100    0.0620   -0.3818   -0.0171    0.0317   -0.0843    0.3761    0.0078    1.0000
                      shareofsc |   -0.1232    0.0557    0.2219    0.1273   -0.1419    0.3397   -0.2161    0.1802    0.1837   -0.0479    1.0000
                   shareofnon~a |    0.0430    0.0807   -0.6431   -0.2124    0.0341   -0.1001    0.0394   -0.0003    0.0144    0.1242   -0.0062    1.0000
                      2006.year |    0.0455   -0.0950    0.1866   -0.4816   -0.4521    0.1526   -0.0974    0.4321   -0.1483    0.5328   -0.0175   -0.0016
                      2011.year |   -0.0903   -0.0423    0.1923   -0.7331   -0.6138    0.0720   -0.1630    0.2703   -0.0168    0.5400   -0.0519    0.0777
                      2016.year |   -0.1664    0.0863    0.2084   -0.7571   -0.5495    0.0456   -0.0319    0.2301   -0.0602    0.6441   -0.1019    0.0975
                          _cons |   -0.7093   -0.1295    0.0437    0.0974   -0.0984   -0.1512   -0.0746   -0.4445    0.0262   -0.4579   -0.2957   -0.2377
                   
                                |     2006.     2011.     2016.
                           e(V) |      year      year      year     _cons
                   -------------+----------------------------------------
                      2006.year |    1.0000
                      2011.year |    0.7867    1.0000
                      2016.year |    0.7461    0.9280    1.0000
                          _cons |   -0.3040   -0.1507   -0.1388    1.0000



                  Kindly help me with the following concerns regarding the inclusion of certain variables.
                  
                  1. What is the cut-off value to decide on collinearity?
                  
                  For the variable share of non-agricultural area, the correlation with the population variable is -0.64. I think this points towards collinearity between the two.
                  
                  2. Similarly, the year fixed effects show correlations on the higher side. Should I remove the year fixed effects from the model based on these results?
                  
                  3. For the heteroskedasticity problem, should I use vce(robust) or vce(cluster districtid)? What is the difference?
                  
                  4. Could you please share insights on how to use -xtoverid-, as I am learning it for the first time? Why do we need to change from the Hausman test?

                  Once again, thank you for your kind help.

                  Radhika








                  • #10
                    Radhika:
                    1) in his "Multiple Regression: A Primer", Pine Forge Press, 1999, Paul Allison reports a possible cut-off of > 0.60 (page 140). However, as he wrote, this cut-off should be read considering all the coefficients. In addition, you may have an apparent case of multicollinearity when you include, say, a linear and a squared term for a given predictor;
                    2) it's a good habit to keep -i.year- in the right-hand side of your -fe- regression equation. It's easy to test the joint significance of -i.year- via -testparm- (a short sketch follows the code block below);
                    3) both options do the very same job under -xtreg- (please note that this feature does not hold for -regress-);
                    4) the community-contributed -xtoverid- needs the -re- regression only, its null being that -re- is the way to go. If the null is rejected, -fe- (or, rarely, pooled OLS, if no evidence of a panel-wise effect is detected) is the right specification for your regression.
                    In addition, being glorious but a bit old-fashioned, -xtoverid- does not support -fvvarlist- notation. Thus, you have to prefix your -xtreg- code with -xi:- (and create interactions by hand):
                    Code:
                    . use "https://www.stata-press.com/data/r17/nlswork.dta"
                    (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
                    
                    . xi: xtreg ln_wage i.south, vce(cluster idcode)
                    i.south           _Isouth_0-1         (naturally coded; _Isouth_0 omitted)
                    
                    Random-effects GLS regression                   Number of obs     =     28,526
                    Group variable: idcode                          Number of groups  =      4,711
                    
                    R-squared:                                      Obs per group:
                         Within  = 0.0012                                         min =          1
                         Between = 0.0396                                         avg =        6.1
                         Overall = 0.0367                                         max =         15
                    
                                                                    Wald chi2(1)      =      92.50
                    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
                    
                                                 (Std. err. adjusted for 4,711 clusters in idcode)
                    ------------------------------------------------------------------------------
                                 |               Robust
                         ln_wage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
                    -------------+----------------------------------------------------------------
                       _Isouth_1 |  -.1192071   .0123943    -9.62   0.000    -.1434995   -.0949148
                           _cons |   1.704133   .0076527   222.68   0.000     1.689134    1.719132
                    -------------+----------------------------------------------------------------
                         sigma_u |  .37723921
                         sigma_e |  .32013261
                             rho |  .58134284   (fraction of variance due to u_i)
                    ------------------------------------------------------------------------------
                    
                    . xtoverid
                    
                    Test of overidentifying restrictions: fixed vs random effects
                    Cross-section time-series model: xtreg re  robust cluster(idcode)
                    Sargan-Hansen statistic  25.757  Chi-sq(1)    P-value = 0.0000
                    
                    .
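
                    As a minimal sketch of point 2, using the same example dataset as above (commands only, output omitted; the regressor -age- is just illustrative):
                    Code:
                     xtreg ln_wage age i.year, fe
                     testparm i.year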
                    Kind regards,
                    Carlo
                    (Stata 19.0)



                    • #11
                      Originally posted by Carlo Lazzaro View Post
                      Dear Carlo,

                      Thank you for your kind reply.

                      1) I have tried to apply -xtoverid- after running the RE model with the following specification:
                      Code:
                       . xtreg percentshareofhortareaingca meantemperature rainfall populationdensitypersqkm gddppercapitars roadper100km numberofbanksper1000sqkm shareofiaingca sharegcasmallintotgca sharemarginal averagelandsize_ha shareofsc shareofst shareofnonagriareainga i.year, re vce(cluster districtid)
                       
                       . xtoverid
                       unrecognized command:  xtoverid
                       r(199);
                      
                      The above error message occurred.


                      2) Also, when I tried the -testparm- command:
                      Code:
                       . xtreg percentshareofhortareaingca meantemperature rainfall populationdensitypersqkm gddppercapitars roadper100km numberofbanksper1000sqkm shareofiaingca sharegcasmallintotgca sharemarginal averagelandsize_ha shareofsc shareofst shareofnonagriareainga i.year, fe
                       
                       . testparm i.year
                       
                        ( 1)  2006.year = 0
                        ( 2)  2011.year = 0
                        ( 3)  2016.year = 0
                       
                              F(  3,    65) =    3.25
                                   Prob > F =    0.0274

                      Is this the right specification? How should I interpret the results here?



                      Your kind advice on this is very much appreciated, and thanks again for your kind response.





                      • #12
                        Radhika:
                        1) type -search xtoverid- and follow the instructions to install it along with the ancillary modules (see the sketch below);
                        2) the -testparm- outcome tells you that, being jointly statistically significant, -i.year- should be kept in the right-hand side of your regression equation.
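
                        One common way to do the installation from SSC, assuming the usual dependencies (the exact set of ancillary packages may differ; the -xtoverid- help file lists its requirements):
                        Code:
                         ssc install xtoverid
                         ssc install ivreg2
                         ssc install ranktest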
                        Kind regards,
                        Carlo
                        (Stata 19.0)



                        • #13
                          Originally posted by Carlo Lazzaro View Post
                          Radhika:
                          1) type -search xtoverid- and follow the instructions to install it along with the ancillary modules;
                          2) the -testparm- outcome tells you that, being jointly statistically significant, -i.year- should be kept in the right-hand side of your regression equation.
                          Dear Carlo,

                          Thank you very much for your kind help with my queries. It has helped my research a lot, and I sincerely express my gratitude to you for sharing your insights here. I have learned diagnostic tests like -testparm- and -xtoverid- from you, which helps a lot. Please find below the results of the -xtoverid- test for the choice between FE and RE.



                          Code:
                           . xtreg percentshareofhortareaingca meantemperature rainfall populationdensitypersqkm gddppercapitars roadper100km numberofbanksper1000sqkm shareofiaingca sharegcasmallintotgca sharemarginal averagelandsize_ha shareofsc shareofst shareofnonagriareainga, re cluster(districtid)
                           
                           . xtoverid
                           
                           Test of overidentifying restrictions: fixed vs random effects
                           Cross-section time-series model: xtreg re  robust cluster(districtid)
                           Sargan-Hansen statistic 108.029  Chi-sq(13)   P-value = 0.0000



                          As per your previous reply, H0 is that RE is the appropriate model, so based on the p-value we need to reject the RE model and choose FE. Am I correct?


                          (I could not understand this part: "In addition, being glorious but a bit old-fashioned, -xtoverid- does not support -fvvarlist- notation. Thus, you have to prefix your -xtreg- code with -xi:- (and create interactions by hand)". It worked without adding the -xi:- prefix. Or is there a specification error?)

                          The results of the Hausman test said to choose the RE model, not FE.
                          
                          Then I checked for a panel effect using the Breusch-Pagan LM test via -xttest0-:

                          Code:
                           . xtreg percentshareofhortareaingca meantemperature rainfall populationdensitypersqkm gddppercapitars roadper100km numberofbanksper1000sqkm shareofiaingca sharegcasmallintotgca sharemarginal averagelandsize_ha shareofsc shareofst shareofnonagriareainga, re cluster(districtid)
                           
                           . xttest0
                           
                           Breusch and Pagan Lagrangian multiplier test for random effects
                           
                                   percentshareofhortareaingca[districtid,t] = Xb + u[districtid] + e[districtid,t]
                           
                                   Estimated results:
                                                    |       Var     sd = sqrt(Var)
                                           ---------+-----------------------------
                                          pe~aingca |   238.037      15.42845
                                                  e |  11.56119      3.400175
                                                  u |  65.06692      8.066407
                           
                                   Test: Var(u) = 0
                                                        chibar2(01) =    75.32
                                                     Prob > chibar2 =   0.0000

                          This showed that there is a significant panel effect (significant differences across panels).


                          My concern is in the choice between FE and RE.

                          Should I go with the -xtoverid- results or with the Hausman and BP LM test results?
                          
                          Or are there any other tests I should check?

                          Please share your insights on this.

                          Thank you for your kind help

                          Radhika











                          • #14
                            Radhika:
                            1) you did not have -fvvarlist--related notation in the right-hand side of your regression equation: that's why -xtoverid- did not complain;
                            2) as per the -xtoverid- outcome, you should go -fe-;
                            3) -hausman- is not an option here, because it does not support non-default standard errors;
                            4) in sum: I'd follow the -xtoverid- indication and go -fe-, as painful as it may be.
                            Kind regards,
                            Carlo
                            (Stata 19.0)



                            • #15
                              Originally posted by Carlo Lazzaro View Post
                              Radhika:
                              1) you did not have -fvvarlist--related notation in the right-hand side of your regression equation: that's why -xtoverid- did not complain;
                              2) as per the -xtoverid- outcome, you should go -fe-;
                              3) -hausman- is not an option here, because it does not support non-default standard errors;
                              4) in sum: I'd follow the -xtoverid- indication and go -fe-, as painful as it may be.

                              Dear Carlo,

                              Thank you very much for your kind reply. It helps a lot with choosing the final model for my research.

                              I have a few more questions in this regard.


                              1. Why is the OLS regression model not heteroskedastic (I tested it using the -estat hettest- command) while the FE model is heteroskedastic? When I ran the -xttest0- command after the RE model, it showed a p-value of 0, which supports choosing the RE model over OLS; finally, because of the heteroskedasticity problem, I used the -xtoverid- command, and the test says to go for FE, as you suggested in your previous reply. Is it normal to have homoskedasticity in the OLS regression and heteroskedasticity in the FE model? Am I doing the right diagnostic test for heteroskedasticity using the -xttest3- command? I have also used -predict resid, residual- and then -sktest resid-, which shows p(skewness), p(kurtosis), and the joint p all less than 0.05, which means we reject the H0 of constant variance, doesn't it? Both -xttest3- and -sktest- show the same result of the presence of heteroskedasticity.


                              2. I have run the -xtserial- command without including the year dummy i.year (it was not accepting its inclusion) and got a p-value of 0.003, which shows the model has serial autocorrelation. Do I need to correct for it, given that my model has 27 districts (N) and 4 time periods (T)? (A command sketch follows this list.)

                              3. Do I need to do any other diagnostic tests besides those for heteroskedasticity and autocorrelation?
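
                              For reference, a minimal sketch of the serial-correlation check described in point 2, with an abbreviated regressor list standing in for the full model (-xtserial- is community-contributed):
                              Code:
                               * Wooldridge test for first-order serial correlation in panel data
                               * (H0: no first-order autocorrelation)
                               xtserial percentshareofhortareaingca meantemperature rainfall gddppercapitars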

                              Expecting your kind reply

                              Thank you

                              Radhika






