
  • Hello

    I ran the following regression to model the accounting performance of Indian firms over the period 2001-16. Two independent variables, namely Leverage1 and CurrentRatio, are endogenous on account of simultaneity. The other three independent variables, namely Size2, lnAgeofthefirm and SalesGrowth, are exogenous. Although all independent variables are statistically significant, the AR(2) and Sargan-Hansen tests are not satisfied. I have tried many combinations of lag lengths, both with and without the collapse option, but the results still fail these diagnostics. Is there anything I can change in the model so that the AR(2) and Sargan-Hansen conditions are met?

    Code:
    xtdpdgmm Profitability4 L.Profitability4 Size2 lnAgeofthefirm Leverage1 CurrentRatio SalesGrowth, teffects twostep vce(cluster CompanyID) gmmiv(L.Profitability4, lag(1 2) model(fodev)) gmmiv(Leverage1 CurrentRatio, lag(1 1) model(fodev)) iv(Size2 lnAgeofthefirm SalesGrowth, model(level)) nofootnote
    
    Generalized method of moments estimation
    
    Fitting full model:
    Step 1         f(b) =  .00024728
    Step 2         f(b) =  .05463668
    
    Group variable: CompanyID                    Number of obs         =     20574
    Time variable: Year                          Number of groups      =      1657
    
    Moment conditions:     linear =      71      Obs per group:    min =         1
                        nonlinear =       0                        avg =  12.41642
                            total =      71                        max =        15
    
                                (Std. Err. adjusted for 1,657 clusters in CompanyID)
    --------------------------------------------------------------------------------
                   |              WC-Robust
    Profitability4 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
    Profitability4 |
               L1. |   .6668446   .0262416    25.41   0.000     .6154121    .7182772
                   |
             Size2 |    .001875   .0005509     3.40   0.001     .0007952    .0029548
    lnAgeofthefirm |   .0024152   .0011865     2.04   0.042     .0000897    .0047406
         Leverage1 |   .0495562   .0120445     4.11   0.000     .0259494     .073163
      CurrentRatio |  -.0006555   .0002972    -2.21   0.027    -.0012379   -.0000731
       SalesGrowth |   .0357161   .0018874    18.92   0.000     .0320168    .0394154
                   |
              Year |
             2003  |    .002406   .0020818     1.16   0.248    -.0016743    .0064863
             2004  |   .0069989   .0020466     3.42   0.001     .0029877    .0110102
             2005  |    .004757   .0020531     2.32   0.021     .0007329    .0087811
             2006  |   .0086493   .0020496     4.22   0.000     .0046321    .0126664
             2007  |   .0050735   .0020601     2.46   0.014     .0010357    .0091112
             2008  |   .0046124    .002133     2.16   0.031     .0004317    .0087931
             2009  |  -.0105933   .0021875    -4.84   0.000    -.0148807   -.0063059
             2010  |   .0152441    .002063     7.39   0.000     .0112006    .0192876
             2011  |   -.000839   .0021657    -0.39   0.698    -.0050836    .0034057
             2012  |  -.0072511   .0021387    -3.39   0.001    -.0114428   -.0030594
             2013  |  -.0026715   .0021785    -1.23   0.220    -.0069412    .0015982
             2014  |  -.0036378   .0022557    -1.61   0.107    -.0080589    .0007834
             2015  |  -.0026539   .0022484    -1.18   0.238    -.0070607    .0017528
             2016  |    .003732   .0023898     1.56   0.118    -.0009518    .0084159
                   |
             _cons |  -.0172578   .0060395    -2.86   0.004     -.029095   -.0054206
    --------------------------------------------------------------------------------
    
    . estat serial
    
    Arellano-Bond test for autocorrelation of the first-differenced residuals
    H0: no autocorrelation of order 1:     z =  -16.8996   Prob > |z|  =    0.0000
    H0: no autocorrelation of order 2:     z =    2.3835   Prob > |z|  =    0.0171
    
    . estat overid
    
    Sargan-Hansen test of the overidentifying restrictions
    H0: overidentifying restrictions are valid
    
    2-step moment functions, 2-step weighting matrix       chi2(50)    =   90.5330
                                                           Prob > chi2 =    0.0004
    
    2-step moment functions, 3-step weighting matrix       chi2(50)    =   90.0441
                                                           Prob > chi2 =    0.0004
    Thanks!



    • You are assuming that Size2, lnAgeofthefirm and SalesGrowth are strictly exogenous and, in particular, also uncorrelated with the unobserved company-specific effects. This is a strong assumption.

      Another explanation might be that your model is dynamically misspecified. Further lags of the dependent variable and/or the independent variables might have predictive power when added as regressors. This could help to deal with the serial correlation (and thus also with the overidentification tests). See slides 90 onwards of my 2019 London Stata Conference presentation.
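
      The suggestion above could be sketched in xtdpdgmm syntax as follows. This is only a hedged illustration based on the command in the original post: the added regressor L2.Profitability4 and the lag ranges are placeholders to be tuned, not a recommended final specification.

      Code:
      xtdpdgmm Profitability4 L.Profitability4 L2.Profitability4 Size2 lnAgeofthefirm Leverage1 CurrentRatio SalesGrowth, teffects twostep vce(cluster CompanyID) gmmiv(L.Profitability4, lag(1 2) model(fodev)) gmmiv(Leverage1 CurrentRatio, lag(1 1) model(fodev)) iv(Size2 lnAgeofthefirm SalesGrowth, model(level)) nofootnote
      estat serial, ar(1/3)

      Checking estat serial up to order 3 after adding the second lag shows whether higher-order serial correlation remains; if it does, the instruments for the lagged dependent variable would need to start at deeper lags.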
      https://twitter.com/Kripfganz



      • Thanks for your quick reply, Prof. Kripfganz. I went through your slides.

        1. On slide 90, you mention that higher-order lags of the dependent variable (or of the independent variables) may have predictive power and that their inclusion may help resolve autocorrelation. However, I have read in some papers (e.g. Flannery and Hankins, 2013) that instruments in system GMM are invalidated by the presence of second-order serial correlation. In this regard, is it fine to include higher-order lags of the dependent variable as regressors in the model? In fact, I have not seen a system GMM regression in empirical research that includes higher-order lags (second and beyond) of the dependent variable as regressors.

        2. Even when I include the second lag of the dependent variable as a regressor in my model, it turns out to be statistically insignificant. Should I still include it?

        3. I treated the three regressors Size2, lnAgeofthefirm and SalesGrowth as endogenous by putting them in the gmmiv() option. This did not improve the diagnostic tests either. Whenever we are not sure whether a variable is endogenous or exogenous, is it a safer choice to treat it as endogenous?

        4. Should I follow the sequential model selection process you describe in your slides? If yes, is there a relevant text available on it? I am not able to understand it fully as of now.

        Thanks!

        Reference:

        Flannery, M. and Hankins, K., 2013. Estimating dynamic panel models in corporate finance. Journal of Corporate Finance, 19, pp.1-19.
        Last edited by Prateek Bedi; 08 Apr 2020, 09:52.



        • 1. The invalidity of some instruments might precisely be caused by the omission of these variables as regressors from the model. If such variables have a nonzero coefficient but are not included as regressors, they end up in the error term; once they are part of the error term, we can no longer use them as instruments because they are naturally correlated with it. Conversely, adding those variables (lags of existing variables) as regressors removes them from the error term and thus could help to improve the validity of the instruments. The discussion in my presentation slides is motivated by a recent paper by Jan Kiviet (2020).

          2. If it helps to improve the AR and Hansen tests, it might be worth including it even if it is statistically insignificant. If not, then there is probably no point in including it.

          3. As often, it depends. If the instruments are strong enough, then treating the variables as endogenous might be the safer approach. If the instruments are weak, then treating the variables as endogenous might possibly do more harm than good.

          4. Please see the reference given to 1.



          • Thanks a lot, Prof. Kripfganz for clearing my doubts! I can relate to most of your points. Here are some follow-up questions:

            1. Including lags of the independent variables is fine. But if including the second (and further) lags of the dependent variable as regressors in dynamic panel regressions is not prohibited, why do some papers state that the absence of second-order serial correlation is required for the results to be valid? Specifically, I have not seen any model that includes the second lag (or beyond) of the dependent variable as a regressor. So far, I had thought it was not allowed to include the second lag of the dependent variable as a regressor in a dynamic panel regression.

            2. Further, if we include the second lag of the dependent variable in our model, doesn't the hypothesis test for AR(2) become redundant? After all, the researcher himself believes that the dependent variable follows a dynamic data-generating process of order 2, i.e. an AR(2) process, which is why he included the second lag as a regressor in the first place.

            3. I understand that lags of the existing independent variables (such as Size, Leverage, Current Ratio, etc. in my case) may influence my dependent variable. However, including them poses a challenge when justifying my model specification on the basis of the existing literature, since most earlier papers include only contemporaneous values of these variables as regressors. In such a case, do we build our own theoretical arguments to justify the inclusion of lags of the independent variables?

            4. Could you please throw some light on the difference between weak and strong instruments?

            Thanks a lot!



            • 1. It is true that the vast majority of empirical papers include only one lag of the dependent variable as a regressor. It is hard to speculate about the reasons. The absence of second-order serial correlation (of the first-differenced error term) is only relevant for the validity of the instruments. The first lag of the first-differenced dependent variable is endogenous by construction. The second lag of the first-differenced dependent variable is exogenous if there is no second-order serial correlation and endogenous otherwise. In the latter case, we can still proceed by using appropriate instruments (e.g. higher-order lags if there is no higher-order serial correlation).

              2. No, as indicated in 1., we still need to find valid instruments for both the first and the second lagged dependent variable and this depends on the degree of serial correlation. Adding a second lag of the dependent variable as a regressor does not guarantee that the error term no longer has second-order serial correlation.

              3. If you deviate from the existing literature, you can either justify this based on the arguments put forward in Kiviet (2020) or by finding theoretical arguments why lags should have a direct effect in the model.

              4. Weak instruments bias the coefficient estimates and result in large standard errors. They also lead to imprecise estimates of the optimal weighting matrix which negatively affects two-step estimation and overidentification tests. You can find a large literature on weak instruments.
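
              Regarding 1., the case with second-order serial correlation could be sketched in xtdpdgmm syntax as follows. This is a hedged illustration with this thread's variable names; the lag ranges are purely illustrative and would need to be chosen in line with the serial correlation tests.

              Code:
              xtdpdgmm Profitability4 L(1/2).Profitability4 Size2 lnAgeofthefirm Leverage1 CurrentRatio SalesGrowth, teffects twostep vce(cluster CompanyID) gmmiv(L.Profitability4, lag(2 4) model(fodev)) gmmiv(Leverage1 CurrentRatio, lag(1 1) model(fodev)) iv(Size2 lnAgeofthefirm SalesGrowth, model(level)) nofootnote

              Starting the gmmiv() lag range for the lagged dependent variable at 2 instead of 1 skips the instruments dated closest to the error term; if higher-order serial correlation is detected, the starting lag would need to be pushed back further.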



              • Thanks a lot, Prof. Sebastian for your insightful response! Please take care!



                • Hi,

                  I understand that the Sargan-Hansen test of instrument validity may fail for multiple reasons, including omitted explanatory variables. However, assuming that the model is correctly specified and the Sargan-Hansen test still rejects, is there a way to know which instruments are causing the problem? For instance, if we have a list of three endogenous variables (lagged Y, X1 and X2) for which we are using instruments, how do we figure out which specific endogenous variable is causing the Sargan-Hansen test to show significant p-values?

                  Actually, I noticed that xtabond2 reports the Sargan-Hansen test separately for different endogenous variables. Is there a similar mechanism in xtdpdgmm as well?



                  • Yes, these are Difference-in-Hansen tests or incremental Hansen tests. Please check out slides 48 to 52 of my 2019 London Stata Conference presentation.



                    • Thanks Prof. Kripfganz. I read these slides.

                      1. I tried calculating the difference-in-Hansen tests for my model. However, I received the following error.

                      Code:
                      . xtdpdgmm Profitability4 L.Profitability4 Size2 AgeoftheFirm Leverage1 CurrentRatio SalesGro CapitalExpenditure2 WPromoterSharesin1 AD_Totalremuneration, teffects twostep vce(cluster CompanyID) gmmiv(L.Profitability4, lag(1 10) coll model(fodev)) gmmiv(Leverage1, lag(1 4) model(fodev)) gmmiv(CurrentRatio, lag(1 4) model(fodev)) gmmiv(CapitalExpenditure2, lag(1 4) model(fodev)) gmmiv(AD_Totalremuneration, lag(1 14) model(fodev)) iv(Size2 AgeoftheFirm SalesGrowth WPromoterSharesin1, model(level)) nofootnote overid
                      
                      Generalized method of moments estimation
                      
                      Fitting full model:
                      Step 1         f(b) =  .00059667
                      Step 2         f(b) =  .22160483
                      
                      Fitting reduced model 1:
                      Step 1         f(b) =   .2036076
                      
                      Fitting reduced model 2:
                      Step 1         f(b) =  .17650799
                      
                      Fitting reduced model 3:
                      Step 1         f(b) =  .17006182
                      
                      Fitting reduced model 4:
                      Step 1         f(b) =  .18061305
                      
                      Fitting reduced model 5:
                      Step 1         f(b) =  .14821727
                      
                      Fitting reduced model 6:
                      Step 1         f(b) =    .221365
                      
                      Fitting reduced model 7:
                      Step 1         f(b) =  .20598217
                                    uniqrows():  3001  expected 1 arguments but received 2
                                    xtdpdgmm():     -  function returned error
                                       <istmt>:     -  function returned error
                      r(3001);

                      2. Further, you mention on slide 48: "Incremental overidentification tests are only meaningful if the reduced model already passed the overidentification test." Could you please elaborate on the significance of this statement, preferably with the help of an example?

                      Thanks!
                      Last edited by Prateek Bedi; 14 Apr 2020, 05:15.



                      • 1. That error should not have happened. Could you please type
                        Code:
                        which xtdpdgmm
                        in Stata and tell me which version of the command you have installed?

                        2. Suppose you have two instruments, Z1 and Z2. The incremental overidentification test compares two versions of the model: one with instruments Z1 and Z2, and another with instrument Z1 but not Z2 (or the other way round). The null hypothesis to be tested is that Z2 (which is excluded in the second version of the model) is a valid instrument. But since Z1 is part of both model versions, the test only makes sense if Z1 is a valid instrument. Otherwise the estimates from both model versions are inconsistent and the test useless. In other words, you would first need to establish that Z1 is a valid instrument.
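
                        In xtdpdgmm, this logic is what the overid estimation option implements: each gmmiv() or iv() option defines one instrument set, and reduced models excluding one set at a time are fitted alongside the full model. A generic hedged sketch with placeholder names (y, x1, x2, id), assuming the estat overid, difference postestimation syntax of recent xtdpdgmm versions:

                        Code:
                        xtdpdgmm y L.y x1 x2, twostep vce(cluster id) gmmiv(L.y, lag(1 2) model(fodev)) gmmiv(x1 x2, lag(1 1) model(fodev)) overid
                        estat overid, difference

                        The difference table then reports, for each instrument set, the Hansen test of the reduced model excluding that set and the incremental test of the excluded set itself.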



                        • Thanks Prof. Kripfganz. Here is the output I got after typing which xtdpdgmm

                          Code:
                          . which xtdpdgmm
                          c:\ado\plus\x\xtdpdgmm.ado
                          *! version 2.2.1  20aug2019
                          *! Sebastian Kripfganz, www.kripfganz.de



                          • The latest version is 2.2.3, which fixes a few bugs. May I ask you to update the command and check whether the error message still occurs?
                            Code:
                            net install xtdpdgmm, from(http://www.kripfganz.de/stata/) replace
                            I also recommend regularly checking for updates of community-contributed commands by typing
                            Code:
                            adoupdate
                            in Stata's command window.



                            • I updated the command, Prof. Kripfganz. However, the error still persists:

                              Code:
                              . xtdpdgmm Profitability4 L.Profitability4 Size2 AgeoftheFirm Leverage1 CurrentRatio SalesGro CapitalExpenditure2 WPromoterSharesin1 AD_Totalremuneration, teffects twostep vce(cluster CompanyID) gmmiv(L.Profitability4, lag(1 10) coll model(fodev)) gmmiv(Leverage1, lag(1 4) model(fodev)) gmmiv(CurrentRatio, lag(1 4) model(fodev)) gmmiv(CapitalExpenditure2, lag(1 4) model(fodev)) gmmiv(AD_Totalremuneration, lag(1 14) model(fodev)) iv(Size2 AgeoftheFirm SalesGrowth WPromoterSharesin1, model(level)) overid
                              
                              Generalized method of moments estimation
                              
                              Fitting full model:
                              Step 1         f(b) =  .00059667
                              Step 2         f(b) =  .22160483
                              
                              Fitting reduced model 1:
                              Step 1         f(b) =   .2036076
                              
                              Fitting reduced model 2:
                              Step 1         f(b) =  .17650799
                              
                              Fitting reduced model 3:
                              Step 1         f(b) =  .17006182
                              
                              Fitting reduced model 4:
                              Step 1         f(b) =  .18061305
                              
                              Fitting reduced model 5:
                              Step 1         f(b) =  .14821727
                              
                              Fitting reduced model 6:
                              Step 1         f(b) =    .221365
                              
                              Fitting reduced model 7:
                              Step 1         f(b) =  .20598217
                                            uniqrows():  3001  expected 1 arguments but received 2
                                            xtdpdgmm():     -  function returned error
                                               <istmt>:     -  function returned error
                              r(3001);
                              I hope there is nothing wrong with my command. Further, I have a query: as you mentioned in #131, the incremental overidentification test compares two versions of the model, one with instruments Z1 and Z2, and another with Z1 but not Z2. However, since I run only one model with all the instruments, how does the software decide which instruments to keep and which ones to drop? Moreover, how do we read the output of the difference-in-Hansen test that you mention on slide 52 of your presentation?



                              • Would it be possible for you to send me your data set by e-mail so that I can locate where exactly the problem is?

                                The command separates your instruments into different sets, one for each gmmiv() or iv() option. It then internally re-estimates the model, leaving out one of those instrument sets at a time. The table on slide 52 shows, in the first column, the Hansen test for the model with Z1 but without the respective Z2, and in the second column the difference from the Hansen test for the full model with Z1 and Z2. The test in the first column should not reject the null hypothesis; otherwise, the test in the second column is meaningless, as explained before. The test in the second column is the test for the validity of Z2, given that Z1 is valid.

