
  • If you have variables that are completely exogenous (which includes uncorrelatedness with the unobserved group-specific effects), then you can use them as standard instruments for the level model. This way, the correlation of the instruments with the regressors is maximized. This typically applies to time dummies and other dummy variables (e.g. industry dummies).
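
    For instance, a minimal sketch with hypothetical placeholder variables y and x and assumed time dummies yr1-yr5 (none of these are from an actual dataset):

    * yr1-yr5 are hypothetical exogenous time dummies, used as standard instruments
    * for the level model; lagged levels of y and x instrument the differenced model
    xtdpdgmm L(0/1).y x yr1-yr5, gmmiv(y x, lag(2 4) model(diff) collapse) iv(yr1-yr5, model(level)) twostep vce(robust)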
    https://twitter.com/Kripfganz



    • Dear Professor Sebastian,

      Thanks again for your time and hard work! The assistance you show is extremely appreciated. I should acknowledge that I could not have pulled this off without your support, Professor! I still have the following questions, please!

1) Regarding your post #516 point 2.10) “Somewhere earlier in this thread I gave examples for different estimators, including the Ahn-Schmidt estimator.”
      I searched for the example you gave for the nonlinear Ahn-Schmidt estimator but could not determine which one you meant. Therefore, I kindly ask you to point me to the example you referred to.

2) Is it normal for Stata to take a long time when applying the nonlinear Ahn-Schmidt estimator?

      Is there any way to speed up the nonlinear Ahn-Schmidt estimator?

3) The dummy variable ‘cf’ takes the value of 1 for the three years 2008, 2009, and 2010, and the value of 0 for the years before 2008 and after 2010. Thus, my question is: Is this dummy variable ‘cf’ considered a time-variant or a time-invariant variable?

4) My regression model is a dynamic panel data model. It also includes L2.y as a regressor (L2.y is the second lag of the dependent variable y). Thus, I have the following questions, please!

      4.1) For the FOD estimator using your xtdpdgmm command, which lag should the instruments for the dependent variable y start from?

      4.1.A) Is it right to use L.y as an instrument for the dependent variable y, given that L.y is already included in the regression model as a regressor?

      4.1.B) Is it right to use L2.y as an instrument for the dependent variable y, given that L2.y is already included in the regression model as a regressor?

4.1.C) Do I have to start with L3.y as an instrument for the dependent variable y, given that L.y and L2.y are already included in the regression model as regressors?

      4.2) For the Difference GMM estimator, which lag should the instruments for the dependent variable y start from?

      4.2.A) Is it right to use L2.y as an instrument for the dependent variable y, given that L2.y is already included in the regression model as a regressor?

      4.2.B) Do I have to start with L3.y as an instrument for the dependent variable y, given that L2.y is already included in the regression model as a regressor?

      5) The main independent variable of my regression model is L.x1 (L.x1 is endogenous). Also, my regression model includes L2.x1 as a regressor. Therefore, I have the following questions, please!

      5.1) To apply the FOD estimator using your xtdpdgmm command, which lag should the instruments for the independent variable L.x1 start from?

      5.1.A) Is it right to use L2.x1 as an instrument for the independent variable L.x1, given that L2.x1 is already included in the regression model as a regressor?

      5.1.B) Do I have to start with L3.x1 as an instrument for the independent variable L.x1, given that L2.x1 is already included in the regression model as a regressor?

      6) If the differenced instruments are used for the differenced model, will these differenced instruments be omitted?

7) Usually, one distinguishes the cases {T small; N small}; {T small; N large}; {T large; N small}; {T large; N large}. Thus, my question is: What is the criterion to decide whether T and N are small or large?

      The work you do is great and so appreciated. Many thanks once again for all that you do, Professor!



      • 1) Please see post #450 for examples of different estimators, including the Ahn-Schmidt GMM estimator.

2) Computational speed depends on many factors, including your hardware and the size of the data set. Because the Ahn-Schmidt GMM estimator is a nonlinear estimator and requires iterative optimization, it naturally takes longer than linear estimators. You should not normally include Blundell-Bond type instruments for the level model, as these can create multicollinearity problems, which in turn can make it difficult for the numerical algorithm to converge.

        3) This dummy variable is time varying; it changes its value from 0 to 1 and back to 0 at certain points in time.
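
        For instance, assuming a year variable named year, such a dummy can be generated as:

        gen cf = inrange(year, 2008, 2010)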

4.1) In the absence of serial error correlation, the lags of the dependent variable can be instrumented starting with the first lag of the dependent variable in the FOD model.
        4.1.A) Yes; see 4.1).
        4.1.B) Yes; the lagged dependent variable is uncorrelated with the FOD-transformed error term (and so is the second lag). Hence, they qualify as instruments.
        4.1.C) No; see above.

        4.2) For difference GMM, the first suitable lag of the dependent variable as an instrument is lag 2.
        4.2.A) Yes.
        4.2.B) No.

        5.1) If L.x1 is endogenous, then L2.x1 is a valid instrument in the FOD model.
        5.1.A) Yes.
        5.1.B) No.
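
        A minimal sketch of 4.1) and 5.1), with hypothetical placeholder variables rather than your exact specification:

        * FOD model: lags of y from lag 1 onward are valid instruments for the lagged
        * dependent variables; for the endogenous regressor L.x1, lags of x1 from
        * lag 2 onward (i.e. L2.x1 and deeper) are valid
        xtdpdgmm L(0/2).y L(1/2).x1, model(fodev) gmmiv(y, lag(1 .) collapse) gmmiv(x1, lag(2 .) collapse) twostep vce(robust)

        For difference GMM as in 4.2), the instruments for y would instead start at lag 2, e.g. gmmiv(y, lag(2 .) model(diff)).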

        6) I do not understand this question. In general, differencing the instruments (for the differenced model) still yields valid (and non-redundant) instruments.

7) There is no clear criterion. T is small when the Nickell bias due to the correlation of the lagged dependent variable with the fixed effects is "too big". This depends (among other things) on the (true but unknown) persistence of the dependent variable. The higher the persistence, the larger the bias, and the larger T needs to be before it is no longer seen as small. It is even less obvious when N should be considered small. If you make seemingly innocuous changes to your estimator - e.g. changing the maximum lag order for the instruments from, say, 4 to 5 - and your estimates change substantially, this is typically a sign that your N is small. With large N, such changes should hardly matter. A sketch of such a sensitivity check follows below.
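
        A hypothetical sensitivity check along these lines (placeholder variables y and x):

        quietly xtdpdgmm L(0/1).y x, gmmiv(y x, lag(2 4) model(diff)) twostep vce(robust)
        estimates store lag4
        quietly xtdpdgmm L(0/1).y x, gmmiv(y x, lag(2 5) model(diff)) twostep vce(robust)
        estimates store lag5
        * if the coefficients differ substantially across the two columns, N is likely small
        estimates table lag4 lag5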
        https://twitter.com/Kripfganz



        • Dear Professor Sebastian,

          Many thanks for your swift valuable response. Your cooperation and support are priceless. Indeed, saying “thank you very much” is not enough. I am very grateful to you for all your help and effort, Professor!

1) To use your xtdpdgmmfe command to apply the Chudik-Pesaran (2022) estimator for unbalanced dynamic panel data with at least one endogenous regressor, I have the following questions, please!

          1.1) Are the following codes correct?

A) xtdpdgmmfe y L2.y L(1/2).x1 L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10 Industry1 Industry2 Industry3 Industry4 Industry5 Industry6 Industry7 Industry8 mn cf c.cf#cL.x1, exogenous(x10 Industry1 Industry2 Industry3 Industry4 Industry5 Industry6 Industry7 Industry8 mn cf) predetermined(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 L2.y) endogenous(L(1/2).x1 c.cf#cL.x1) initdev collapse teffects igmm vce(robust, dc) center

B) xtdpdgmmfe y L2.y L(1/2).x1 L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10 i.ind mn cf c.cf#cL.x1, exogenous(x10 i.ind mn cf) predetermined(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 L2.y) endogenous(L(1/2).x1 c.cf#cL.x1) initdev collapse teffects igmm vce(robust, dc) center

1.2) If neither of the previous codes is correct, what is the correct code I have to use in order to implement the Chudik-Pesaran (2022) estimator using your xtdpdgmmfe command?
          Is there other code that is more appropriate for applying the Chudik-Pesaran (2022) estimator using your xtdpdgmmfe command? Here: y is the dependent variable; L2.y is the second lag of the dependent variable as a regressor (L2.y is predetermined); L.x1 is the independent variable (L.x1 is endogenous); L2.x1 is the first lag of the independent variable L.x1; the control variables L.x2, L.x3, L.x4, L.x5, L.x6, L.x7, L.x8, L.x9 are predetermined; the control variable x10 (firm age) is exogenous; ind is industry dummies; mn is country dummies; cf is a dummy variable that takes the value of 1 for the three years 2008, 2009, and 2010; c.cf#cL.x1 is an interaction between the dummy variable cf and the independent variable L.x1.

1.3) To use your xtdpdgmmfe command, do I have to type the ‘exogenous’ option and the ‘endogenous’ option myself, open their corresponding parentheses, and fill them in?

          2) When using your xtdpdgmmfe command, I have the following questions, please!

          2.1) Can I specify the dummy variables (such as industry dummies, country dummies, …) as exogenous variables and put them in the brackets of the ‘exogenous’ option?

          2.2) Does the xtdpdgmmfe command instrument the dummies (industry dummies, country dummies, …) in the differenced model or in the level model?

          2.3) Does the xtdpdgmmfe command use the differenced instruments or the level instruments for the dummies (industry dummies, country dummies, …)?

3) We can modify the xtdpdgmm command line to try several specifications. Thus, can we do the same when using the xtdpdgmmfe command? If so, how?

4) When applying the two-step System GMM estimator using your xtdpdgmmfe command, it uses model(diff). Thus, how do I amend the xtdpdgmmfe command in order to use model(fod) instead of model(diff) when applying the two-step System GMM estimator?

          5) When applying the two-step System GMM estimator using your xtdpdgmmfe command, can I include the option ‘orthogonal’ in the code?

6) To apply the Hayakawa, Qi, and Breitung (2019) estimator for unbalanced dynamic panel data with at least one endogenous regressor, I have the following questions, please!

          6.1) Are the results of the Hayakawa, Qi, and Breitung (2019) estimator identical regardless of whether I use your xtdpdgmm command or your xtdpdgmmfe command?

          6.2) If I use your xtdpdgmmfe command to apply the Hayakawa, Qi, and Breitung (2019) estimator, will I lose an additional observation for each firm?

          6.3) When using your xtdpdgmmfe command to apply the Hayakawa, Qi, and Breitung (2019) estimator, do I have to specify the option curtail() for the exogenous variables, another curtail() option for the endogenous variables, and another curtail() option for the predetermined variables? That is, do I have to specify the curtail() option three times when I have three variable classifications?

          6.4) To apply the Hayakawa, Qi, and Breitung (2019) estimator using your xtdpdgmmfe command, are the following codes correct?

A) xtdpdgmmfe y L2.y L(1/2).x1 L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10 i.ind mn cf c.cf#cL.x1, exogenous(x10 i.ind mn cf) predetermined(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 L2.y) endogenous(L(1/2).x1 c.cf#cL.x1) initdev collapse curtail(1) orthogonal nonl teffects onestep

B) xtdpdgmmfe y L2.y L(1/2).x1 L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10 i.ind mn cf c.cf#cL.x1, exogenous(x10 i.ind mn cf) curtail(0) predetermined(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 L2.y) curtail(1) endogenous(L(1/2).x1 c.cf#cL.x1) curtail(2) initdev collapse orthogonal nonl teffects onestep

6.5) If the previous codes are incorrect for applying the Hayakawa, Qi, and Breitung (2019) estimator using your xtdpdgmmfe command, what do I have to add, delete, or amend in them?

Is there other code that is more appropriate for applying the Hayakawa, Qi, and Breitung (2019) estimator using your xtdpdgmmfe command? Here: y is the dependent variable; L2.y is the second lag of the dependent variable as a regressor (L2.y is predetermined); L.x1 is the independent variable (L.x1 is endogenous); L2.x1 is the first lag of the independent variable L.x1; the control variables L.x2, L.x3, L.x4, L.x5, L.x6, L.x7, L.x8, L.x9 are predetermined; the control variable x10 (firm age) is exogenous; ind is industry dummies; mn is country dummies; cf is a dummy variable that takes the value of 1 for the three years 2008, 2009, and 2010; c.cf#cL.x1 is an interaction between the dummy variable cf and the independent variable L.x1.

          7) If one of the control variables is measured by the natural logarithm, do we consider that control variable endogenous or predetermined?

          8) The main independent variable of my regression model is L.x1 (L.x1 is endogenous). Also, my regression model includes L2.x1 (L2.x1 is the first lag of the independent variable L.x1) as a regressor. Therefore, to use your xtdpdgmmfe command, do I have to specify L2.x1 in the ‘endogenous’ option or in the ‘predetermined’ option?

          Your patience, support and effort are highly appreciated, Professor!



          • Dear Professor Sebastian,
            I hope you're doing well!
Using ab.dta, I am replicating the results of Blundell and Bond (1998), Table 4, columns III and IV, with xtabond2 and xtdpdgmm:

            xtabond2 n l.n l(0/1).(w k) yr1978-yr1984, iv(yr1978-yr1984, eq(diff)) gmm(n w k, lag(2 .) eq(diff)) noleveleq h(2) robust


1. I got the following output:

            (some output omitted)

            Sargan test of overid. restrictions: chi2(79) = 125.19 Prob > chi2 = 0.001
            (Not robust, but not weakened by many instruments.)
            Hansen test of overid. restrictions: chi2(79) = 88.80 Prob > chi2 = 0.211
            (Robust, but weakened by many instruments.)

            and

            xtdpdgmm n l.n l(0/1).(w k) yr1978-yr1984, iv(yr1978-yr1984, model(diff) diff) gmmiv(n w k, lag(2 .) model(diff)) nolevel vce(r) w(ind) overid
            estat overid

2. I got the following output:

            (some output omitted)

            Sargan-Hansen test of the overidentifying restrictions
            H0: overidentifying restrictions are valid

            1-step moment functions, 1-step weighting matrix chi2(79) = 125.1925
            note: * Prob > chi2 = 0.0007

            1-step moment functions, 2-step weighting matrix chi2(79) = 103.4848
            note: * Prob > chi2 = 0.0337

            * asymptotically invalid if the one-step weighting matrix is not optimal

Here is my question:
            xtabond2's Sargan test statistic, chi2(79) = 125.19, matches xtdpdgmm's 1-step weighting matrix statistic, chi2(79) = 125.1925,
            but xtabond2's Hansen test statistic, chi2(79) = 88.80, does not appear in xtdpdgmm's output.
            As I understand it, the Hansen test is valid under heteroskedasticity, while the Sargan test is valid only under homoskedasticity.
            Also, xtabond2's Hansen test statistic, chi2(79) = 88.80, corresponds to the one reported in the paper.

            How can I get the Hansen test statistic chi2(79) = 88.80 with xtdpdgmm?

            Always thanks a lot!



            • Zainab Mariam

1.1.A) xtdpdgmmfe has a lags() option, which allows you to directly specify the number of lags of the dependent variable used as regressors. Thus, use option lags(2) and remove L2.y from the list of regressors and the list of predetermined variables (see the sketch at the end of this post). The Chudik-Pesaran estimator does not allow for endogenous regressors! What you get in this case is a first-difference GMM estimator (like Arellano-Bond, but with first-differenced instruments instead of level instruments for the first-differenced model). If your industry dummies and country dummies are time invariant, then you cannot/should not/need not include them in this specification; the unobserved fixed effects (i.e. first-differencing of the model) take care of them.
              1.1.B) As before.

              1.2) You would need to classify all variables as either exogenous or predetermined in order to use the Chudik-Pesaran estimator.

              1.3) Yes; this classification is always your responsibility.

              2.1) Yes, but as indicated above these dummies (if time invariant) should really only be included when some instruments refer to the untransformed level model. This is not the case for the Chudik-Pesaran estimator.

              2.2) The xtdpdgmmfe command does not offer special treatment for dummy variables; they are treated the same way as any other variable. As the suffix fe of the command suggests, this command takes a fixed-effects approach; this means that coefficients of time-invariant variables cannot be identified. For a more flexible approach, you would need to use the main xtdpdgmm command, but xtdpdgmmfe might give you an idea about how to specify the xtdpdgmm command.

2.3) See 2.2); whether differenced or level instruments are used depends on other options. In particular, option initdev requires differenced instruments. As always, you can see the full list of instruments and their transformations in the list below the regression output.

              3) I do not understand this question.

              4) Use option orthogonal.

              5) Yes.

              6.1) I am not sure I understand this question. xtdpdgmmfe is just a wrapper for xtdpdgmm. Based on your input, xtdpdgmmfe constructs the more complicated command line for xtdpdgmm (which is also displayed above the regression output) and then executes the latter.

              6.2) Yes; this is a disadvantage of this estimator.

              6.3) No; you can only specify the curtail() option once. If you would like to curtail lag orders differently for different sets of instruments, you would need to use the more flexible xtdpdgmm command.

              6.4.A) As argued above, you should not include time-invariant dummy variables when there is no level model. Also, use the lags() option instead of specifying the lags of the dependent variable manually. Endogenous variables can be used here.

              6.4.B) As mentioned in 6.3), you cannot include the curtail() option multiple times.

              6.5) See 6.4) and 6.3).

              7) Whether to transform a variable into natural logs has nothing to do with its classification as endogenous or predetermined.

8) A variable is classified as endogenous if it is allowed to be correlated with the contemporaneous error term (and all lagged errors). If this is true for L1.x, then L2.x could possibly still be classified as exogenous. L1.x being endogenous means that any shock in the current period is anticipated by x in the previous period (because x is lagged). You would then still need to decide if this shock can even be anticipated by x two periods ahead. It is up to you to make this judgement.
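
              For 1.1.A), a minimal sketch with a simplified, hypothetical variable list (your remaining options would carry over unchanged):

              * lags(2) automatically adds the first and second lag of y as regressors,
              * so neither lag is typed manually or listed as predetermined
              xtdpdgmmfe y L.x1 L.x2 x3, lags(2) exogenous(x3) predetermined(L.x2) endogenous(L.x1) initdev collapse teffects igmm vce(robust, dc)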
              https://twitter.com/Kripfganz



              • Mugi Jang

                The coefficient estimates in column (3) of Table 4 in Blundell and Bond (1998, Journal of Econometrics) can indeed be replicated with the code you provided. Note that this is a one-step GMM estimator. The "Sargan" test statistic reported in that table is actually not the one-step Sargan test but the two-step Hansen test statistic. To see this, re-estimate the model with xtdpdgmm using the twostep option.

                The fact that xtabond2 reports the two-step Hansen test after a one-step estimation is somewhat confusing (even though it follows the appropriate logic that the one-step Hansen test is asymptotically invalid and therefore the two-step Hansen test should be used; but then again the two-step estimator should probably be used in the first place).
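
                A sketch of this check, adapting the command from the earlier post; after the two-step estimation, estat overid should reproduce xtabond2's Hansen statistic among the two-step results:

                xtdpdgmm n l.n l(0/1).(w k) yr1978-yr1984, iv(yr1978-yr1984, model(diff) diff) gmmiv(n w k, lag(2 .) model(diff)) nolevel twostep vce(r) w(ind) overid
                estat overid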
                https://twitter.com/Kripfganz



                • Dear Professor Sebastian,

                  Thank you very much for your swift useful reply. I am very grateful to you for all your support and effort, Professor! If I may follow up with your response, please!

1) Regarding your post #561 point 8) “A variable is classified as endogenous if it is allowed to be correlated with the contemporaneous error term (and all lagged errors). If this is true for L1.x, then L2.x could possibly still be classified as exogenous. L1.x being endogenous means that any shock in the current period is anticipated by x in the previous period (because x is lagged). You would then still need to decide if this shock can even be anticipated by x two periods ahead. It is up to you to make this judgement.…”.

                  I think you meant endogenous instead of exogenous.

                  2) Regarding your post #561 point 1.1.A) “… The Chudik-Pesaran estimator does not allow for endogenous regressors! What you get in this case is a first-difference GMM estimator (like Arellano-Bond, but with first-differenced instruments instead of level instruments for the first-differenced model). ...”. And regarding your post #479 point 10) “the Chudik-Pesaran estimator requires all variables to be either strictly exogenous or predetermined. xtdpdgmmfe "solves" this issue by switching to a specific version of a difference GMM estimator when endogenous variables are present.”. And regarding your post #481 point 2.1) “xtdpdgmmfe automatically selects the appropriate instruments / moment conditions (and therefore the relevant estimator) corresponding to the chosen assumptions.”.

                  Thus, to check my understanding, the Chudik-Pesaran (2022) estimator can be applied for unbalanced dynamic panel data with at least one endogenous regressor just by using your xtdpdgmmfe command. Am I right?

                  3) Is it right or wrong not to include the initdev option in the code of the Chudik-Pesaran (2022) estimator?

                  4) Regarding your post #561 point 6.4.A) “As argued above, you should not include time-invariant dummy variables when there is no level model. …”.

Thus, my question is: How can there be a level model when applying the Hayakawa, Qi, and Breitung (2019) estimator using your xtdpdgmmfe command? (The code does not include the option nolevel.)

5) How do I decide which number to specify in the curtail() option?

6) I have read in some research that it is a good idea to lag all explanatory variables one period to address endogeneity concerns. Thus, what is your opinion of this idea?

7) Suppose one of the explanatory variables is endogenous and some of the explanatory variables are predetermined, and then all of these explanatory variables are lagged one period. Thus, my question is: Is the classification of those lagged explanatory variables still endogenous and predetermined, respectively? Or does the classification become predetermined and exogenous, respectively? That is, does lagging the variables change their classification, or does lagging have no effect on their classification?

8) Also, I have read in some research that when all variables on the right-hand side of the regression model are lagged one time period, they are assumed to be predetermined rather than endogenous.
                  Thus, what is your opinion of this idea?

9) When using your command xtdpdgmmfe, should all lags of the dependent variable y be considered predetermined? Can I consider the deeper lags of the dependent variable (i.e., L2.y and deeper) exogenous?

                  Even though I may not say it all the time, I do appreciate all that you do, Professor! Much obliged!



                  • 1) Yes.

                    2) No; the Chudik-Pesaran estimator does not allow for endogenous regressors.

                    3) initdev is a necessary option for the Chudik-Pesaran estimator.

                    4) Hayakawa, Qi, and Breitung (2019) do not consider a level model for their estimator. xtdpdgmmfe does not have a nolevel option (only xtdpdgmm does); it automatically decides whether to include instruments for the level model or not, based on the other options specified. If you execute the xtdpdgmmfe command for this estimator, you will see that the nolevel option has been set for the implied xtdpdgmm command line.

5) There is unfortunately no clear guidance about that. As mentioned earlier, with large enough N, it should not matter too much. Personally, I find values in the vicinity of 4 quite reliable.

6) My personal opinion is that lagging of the regressors is overused in empirical research. If you think that there is indeed no contemporaneous effect, but any effect takes some time (1 period) to materialize, then sure: go for it. If it is reasonable to assume that there is a contemporaneous effect, but you lag the variable because of endogeneity concerns, then most likely your model is misspecified as a result. You can then debate whether the resulting misspecification is a smaller or larger problem than the original endogeneity problem. Personally, I think lagging of the regressors often does more harm than good. Endogeneity in the context of GMM should normally be dealt with by using lagged instruments, not lagged regressors (see the sketch at the end of this post).

                    7) In the light of point 6), in a misspecified model it is very much unclear what happens. The omitted contemporaneous regressors essentially create an omitted-variables bias, which cannot normally be dealt with by using lagged instruments in the usual way. In short, there is no general answer to this question.

8) Again, under model misspecification there is no general answer. You would first need to decide what the assumed timing of the effects is. This gives you the lag structure. Then, again based on economic theory, you can decide whether these variables are endogenous or predetermined. If there is a good theoretical justification for lagging (due to delayed effects), then it is often also easier to assume that those lagged regressors are at least predetermined and not endogenous, because endogeneity would require some kind of anticipation effects (which might still be reasonable depending on the context). In any case, the decision should always be based on economic theory, not on a technical argument about lag orders.

                    9) Effectively, lags of the dependent variable are predetermined if there is no assumed serial error correlation. If you specify lags with the lags() option, xtdpdgmmfe does this classification automatically for you.
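
                    On point 6), a minimal sketch of the distinction, with hypothetical variables y and x (x endogenous): the regressor stays contemporaneous, and only the instruments are lagged.

                    * x enters contemporaneously; under no serial error correlation, its lags
                    * from lag 1 onward are valid instruments in the FOD-transformed model
                    xtdpdgmm L(0/1).y x, model(fodev) gmmiv(y, lag(1 .) collapse) gmmiv(x, lag(1 .) collapse) twostep vce(robust)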
                    https://twitter.com/Kripfganz



                    • Dear Professor Sebastian,
                      how can we get a goodness-of-fit measure for panel data, especially for a dynamic model?
                      Here is some code someone suggested:

                      xtset unit
                      xtreg y x, fe
                      egen ybar = mean(y)
                      gen y2 = (y - ybar)^2
                      predict resid, e
                      gen e2 = resid^2
                      drop resid
                      egen sse = sum(e2)
                      egen sst = sum(y2)
                      gen r2 = 1 - sse/sst
                      sum r2
                      Last edited by Mugi Jang; 05 Jun 2023, 04:49.



                      • An R-squared is not very meaningful for dynamic panel models with endogenous regressors (or, more generally, any IV/2SLS/GMM estimator); see the following Stata FAQ for context: https://www.stata.com/support/faqs/s...least-squares/

                        The xtdpdgmm postestimation command estat mmsc provides some criteria that allow you to compare the fit across models (similar to the conventional AIC/BIC criteria).
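
                        For instance, a hypothetical usage sketch (placeholder variables y and x):

                        quietly xtdpdgmm L(0/1).y x, model(diff) gmmiv(y x, lag(2 4) collapse) twostep vce(robust)
                        estat mmsc
                        * lower values of the reported criteria indicate a preferred specification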
                        https://twitter.com/Kripfganz



                        • Dear Prof. Sebastian,

                          I have a balanced panel dataset of 1,400 MNEs over the 10 years 2010-20.
                          My model is dynamic, and I have also introduced the lag of the dependent variable as an explanatory variable. For the empirical analysis I used System GMM (through xtdpdgmm).

                          I have a query regarding the post-estimation tests:

                          For the two-step Sargan-Hansen test, some of the p-values obtained are 0.52, 0.67, 0.35, 0.48, and so on for the different models designed.
                          The number of linear moment conditions ranges from 25 to 27 to 35, and so on.

                          The Arellano-Bond AR(2) p-values are greater than 0.10, at around 0.25, 0.27, 0.30.

                          I wanted to check my Sargan-Hansen test results. I have read a few research articles that report similar p-values for the Sargan-Hansen test.

                          Thank You.



                          • I am not sure what exactly your question is. The reported p-values for the Hansen test all look fine, and so does the AR(2) test.
                            https://twitter.com/Kripfganz



                            • Thank you, sir.
                              Actually, my doubt was in reference to Roodman (2009), which states that the p-value of the Sargan-Hansen test should be between 0.10 and 0.25.



                              • I would not focus too much on these specific thresholds. Roodman (2009) correctly emphasizes that very large p-values (in extreme situations virtually 1.000) are often an indication of first-stage overfitting - i.e., using too many instruments relative to the sample size. Given your relatively large cross-sectional dimension and moderate time dimension, this should not be a concern in your case. We also should not "accept" the specification right away when the p-value barely exceeds the conventional significance level (say, 5%) because the consequence of incorrectly "accepting" a specification (type-II error) is usually more severe than that of incorrectly rejecting a specification (type-I error).

                                If you compare the two 2009 papers by Roodman in the Stata Journal and the Oxford Bulletin of Economics and Statistics, they actually contradict each other regarding the 0.25 threshold. The Stata Journal article suggests viewing values higher than 0.25 as potential signs of concern, while the Oxford Bulletin article recommends treating p-values "as high as" (i.e., all p-values lower than) 0.25 as signs of concern. My take: forget about strict threshold values.
                                https://twitter.com/Kripfganz

