  • #91
    Dear Prof. Kripfganz,
    thank you very much for your kind reply; it was indeed very helpful.
    I have a few further doubts, and it would be very useful if you could enlighten me on these.
    My model, which I have now refined, looks something like this:

    Code:
    xtdpdgmm L(0/1).y l.x $controls_lag i.Year, model(fodev) collapse gmm(y x $endogen_controls, lag(1 4)) iv(l.$exogen_controls i.Year) igmm vce(r) small noconstant igmmiterate(200)
    Where: y is my dependent variable, x is my variable of interest, $controls_lag is a set of relevant controls, and i.Year are year dummies. I then use lagged instruments for all of my endogenous variables (y x $endogen_controls) and standard instruments for my exogenous variables and year dummies. Do note that while the endogenous regressors are not lagged, the exogenous regressors are.
    My questions are the following:
    1) Is my specification of the instruments for the exogenous variables correct, or should I specify iv(l.$exogen_controls i.Year, model(fodev)) or iv(l.$exogen_controls i.Year, d)? As they are exogenous variables, I am satisfied if they simply instrument themselves. However, I would ideally like to make sure that my instruments are uncorrelated with the unobserved unit-specific heterogeneity.
    2) I would also like to estimate the same model using an iterated system GMM estimator. Apart from the additional system GMM assumption, the model should be the same. Would this be the correct specification?

    Code:
     xtdpdgmm y l.x $controls_lag i.Year, model(fodev) collapse gmm(y x $endogen_controls, lag(1 4))  iv(l.$exogen_controls i.Year) gmm(y x $endogen_controls, lag(1 1) diff model(level)) gmm(l.$exogen_controls i.Year, lag(0 0) diff model(level)) igmm vce(r)
    3) What are the consequences of specifying vce(r, model(fodev))? I would like to make my standard errors robust to intragroup correlation; is this the specification to use?

    4) I have an unbalanced panel. There are over 93 countries with some observations on the dependent variable, but some years are missing. For some reason, however, the number of groups drops to 76 when I estimate the model. Do you have any idea why this could be the case?
    5) Using the same unbalanced dataset, the estimated model reports negative t-statistics for some of the variables, with p-values close to one (.999, .998). Is this correct? Should I be concerned?
    6) Finally, I have a general question that relates to lag length and the appropriateness of GMM. Another possibility to address my research question would be to use a panel with 43 countries over 28 years. Do you believe it is credible that, with such a small number of groups and a relatively large number of time points, the assumptions of GMM hold in this context? Which alternative approach would you suggest? The follow-up question is how to deal with lag length. With such a small number of groups, there is a very limited number of lags that we can use to avoid instrument proliferation (1 or 2). Do you have any suggestions for determining the lag length when there are no strong theoretical reasons?

    Thank you very much in advance for your help.

    Best regards

    Comment


    • #92
      A new update, version 2.2.3, is available on my personal website; it fixes the bug kindly reported by Tim yesterday. The bug could occur in some unbalanced panel data sets when the sample is further restricted with an if-condition.

      Code:
      net install xtdpdgmm, from(http://www.kripfganz.de/stata/) replace
      https://www.kripfganz.de/stata/

      Comment


      • #93
        Alessandro:

        1) The specification looks reasonable. By using the global option model(fodev), all instruments refer to the model in forward-orthogonal deviations. These forward-orthogonal deviations remove the unobserved unit-specific heterogeneity, so you do not need to worry about it any more.

        2) Assuming that the first-differences of your regressors are uncorrelated with the unobserved unit-specific heterogeneity, which is the standard assumption needed for system GMM, the specification looks good. (Just watch out that the number of instruments does not become too large.) Also, I would treat the time dummies differently. Instead of using their differences as instruments, I would use them untransformed in the level model, i.e. iv(i.Year, model(level)). You could then even remove them from the instruments for the forward-orthogonal deviations model. Once you account for them in the level model, they are typically redundant in the transformed model. To achieve this, instead of specifying the time dummies manually, you could simply use the teffects option of xtdpdgmm.
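
        For illustration, the system GMM command from your post could then look roughly like this (only a sketch, using the placeholders and lag ranges from your post; you may still want to shorten the lag range to keep the instrument count in check):

        Code:
        xtdpdgmm L(0/1).y l.x $controls_lag, model(fodev) collapse teffects gmm(y x $endogen_controls, lag(1 4)) iv(l.$exogen_controls) gmm(y x $endogen_controls, lag(1 1) diff model(level)) gmm(l.$exogen_controls, lag(0 0) diff model(level)) igmm vce(r)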

        3) With vce(robust), the vce suboption model(fodev) has no effect. It is only relevant with the conventionally computed standard errors for the one-step estimator. In the latter case, it specifies that the variance of the error term shall be computed from the residuals of the model in forward-orthogonal deviations. With the new update that I just released (see my previous post), it is now no longer possible to specify the model() suboption with vce(robust).

        4) The likely reason is that some of your groups do not have sufficiently many consecutive observations. With lagged controls and forward-orthogonal deviations, any group needs at least 3 consecutive observations to be included.
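
        If you want to verify this in your data, a minimal sketch along the following lines counts, for each group, the longest run of consecutive years with a non-missing dependent variable (country, Year, and y are placeholders for your own variable names; you could extend the missingness condition to the regressors):

        Code:
        * illustrative check: longest run of consecutive years with non-missing y per country
        bysort country (Year): gen run = 1 if !missing(y) & (_n == 1 | Year != Year[_n-1] + 1 | missing(y[_n-1]))
        bysort country (Year): replace run = run[_n-1] + 1 if missing(run) & !missing(y)
        bysort country: egen maxrun = max(run)
        egen tag = tag(country)
        tabulate maxrun if tag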

        5) The t-statistic is negative whenever the estimated coefficient is negative. If the t-statistic is close to zero, then the p-value will be close to 1. There is nothing wrong with that per se. It just means that these variables are probably not relevant for your model, and you might consider excluding them.

        6) Your concerns about instrument proliferation in this setup are justified. I unfortunately do not have a good practical argument for the choice of the lag length. 43 countries and 28 years is a situation that is somewhere between the two worlds (large-N, small-T versus small-N, large-T). If you take action against instrument proliferation, GMM should in principle still work, although with just 43 countries it may be quite sensitive to the specification. But as long as you have endogenous covariates, you cannot really avoid using instrumental variables.
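
        As a purely practical illustration (not a formal selection rule), you could start with a short lag range and collapsed instruments and then consult the overidentification and serial-correlation diagnostics before cautiously extending the lag range; with the placeholders from your post, something along these lines:

        Code:
        xtdpdgmm L(0/1).y l.x $controls_lag, model(fodev) collapse teffects gmm(y x $endogen_controls, lag(1 2)) iv(l.$exogen_controls) two vce(r)
        estat overid
        estat serial
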
        https://www.kripfganz.de/stata/

        Comment


        • #94
          Dear Prof. Kripfganz,
          Thank you very much for your extremely helpful support. Your suggestions are all well taken, and you really clarified a lot, both in terms of model specification (43 countries over 28 years) and in terms of syntax (vce(r)).

          I have just a couple of follow-up points that are not completely clear to me.
          Following your suggestions, the model I am estimating is the following:

          Code:
          eststo: xi: xtdpdgmm L(0/1).y l.x $controls_lag, model(fodev) collapse gmm(y x $controls_endogen, lag(1 3)) iv(l.$controls_exogen) igmm vce(r) small noconstant teffects igmmiterate(100)
          First, the estimated coefficients appear to be very sensitive to the maximum number of iterations specified. For instance, my estimated coefficient for x goes from -2.8 with 100 iterations, to -0.2 with 200 iterations, to .2 with 300 iterations. The issue is that without a limit the number of iterations runs into the thousands, which becomes computationally very intensive. Do you know why this might be and how I could address this issue?

          Second, I believe that my t-statistics and standard errors are wrongly calculated. As you can see in the attached picture, they are all extremely close to 1. I might have a very poorly specified model, but I doubt that is the case, because with all other estimation strategies I am using, including one- and two-step GMM, the standard errors and p-values are more "normal", with some coefficients significant and others not. Could it be that I am doing something wrong with the syntax?

          I thank you again for your great and in-depth help

          best
          [Attached screenshot: regression output from the iterated GMM estimation]

          Comment


          • #95
            I believe that the iterated GMM estimator simply does not do a good job when you have a large number of instruments relative to the number of groups. Estimating the weighting matrix becomes very difficult in such a situation. The convergence problems of the iterated GMM estimator, the sensitivity of the estimates, and the unreasonably large standard errors are all symptoms of this same problem. I recommend sticking to the two-step estimator (or even the one-step estimator, given that 63 groups is very small and asymptotic efficiency is less of a concern here) and/or further reducing the number of instruments, e.g. by using the collapse option.
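
            In terms of syntax, this would just mean dropping the igmm and igmmiterate() options and requesting the two-step estimator instead, e.g. (a sketch based on the placeholders from your command):

            Code:
            xtdpdgmm L(0/1).y l.x $controls_lag, model(fodev) collapse teffects gmm(y x $controls_endogen, lag(1 3)) iv(l.$controls_exogen) twostep vce(r) small noconstant
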
            https://www.kripfganz.de/stata/

            Comment


            • #96
              Thank you very much for clarifying this.

              Comment


              • #97
                Dear Prof. Kripfganz, dear Sebastian:
                my coauthor and I are working on a paper requiring a fractional dynamic panel model, i.e. a dynamic panel model with a fractional dependent variable. The balanced panel dataset (on policy agendas) consists of 40 time periods and 17 groups, for a total of N=680. In the late stages of the review process, we have been in direct discussions with the editors about the model. To cut a long story short, our previous models were (rightly) considered biased because of the lagged dependent variable. We were leaning towards using Paul Allison's xtdpdml (https://statisticalhorizons.com/lagg...dent-variables), but that won't work if Time > Groups. Would you advise us to try out xtdpdgmm? Does it support fractional dependent variables?

                Many thanks for your time and for having made your model available.

                All best
                Francesco Nicoli
                University of Gent

                Comment


                • #98
                  xtdpdgmm estimates a linear model. You can certainly use it as an approximation for fractional outcome variables but there is no guarantee that the predicted values all fall into the [0,1] interval.
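
                  If you do use the linear model as an approximation, you can at least check after estimation how often the predictions leave the unit interval; a small sketch (yhat is just a placeholder name for the prediction variable):

                  Code:
                  predict yhat if e(sample)
                  summarize yhat
                  count if (yhat < 0 | yhat > 1) & !missing(yhat)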

                  The problem with many time periods relative to the number of groups is that the number of moment conditions / instruments can quickly become too large. While you can restrict the number of instruments, I am sceptical whether the usual model specification tests are reliable, because their distributions are derived under asymptotics where the number of groups tends to infinity. You could possibly still estimate the model, but it becomes hard to justify the validity of the instruments based on statistical tests (unless perhaps you feel comfortable assuming homoskedasticity, such that - under the anyway required absence of serial error correlation - the one-step "difference GMM" estimator already uses an optimal weighting matrix and clustered standard errors are not required).
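
                  To make that last point concrete, a minimal sketch of such a one-step "difference GMM" specification could look as follows (one-step estimation with conventional standard errors is the default when neither twostep, igmm, nor vce(robust) is specified; y and x are placeholder variable names, and the lag ranges and exogeneity assumptions would need to be adapted to your application):

                  Code:
                  xtdpdgmm L(0/1).y x, model(difference) gmm(y, lag(2 4) collapse) gmm(x, lag(1 3) collapse)
                  estat serial
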
                  https://www.kripfganz.de/stata/

                  Comment


                  • #99
                    Okay, many thanks for the fast reaction. I will try it out and present it alongside other estimations, including an error-correction model and the potentially biased dynamic panel. The previous three models all pointed in the same direction, so I have some confidence in the underlying robustness of the exercise.
                    Thanks again
                    Francesco

                    Comment


                    • Originally posted by Sebastian Kripfganz View Post
                      The gmmiv() option of xtdpdgmm has the suboptions lagrange() and collapse. Both can be used to reduce the number of instruments. Please see the help file for details.

                      Your data set with a very small cross-sectional dimension and a larger time dimension is not ideal for this kind of GMM estimators that are designed for small-T, large-N situations. If you nevertheless want to use it, I recommend not to use the noserial option in this setting.
                      Hello Prof. Kripfganz,
                      thank you for your effort in responding to all of these questions.
                      I have a question about how to use the command xtabond2 to run a system GMM model. Here is my main model:
                      Code:
                      logit fraud_dummy indp_pay_log  num_of_id_degree_expert num_of_id_financial_expertise Fem_indep av_id_age av_id_shareholding board_size the_number_of_shareholders_meeti ceo_duality board_ind soe big4 roa f_size btm1 eps1 loss_total i.year i.Sic
                      I want to apply GMM to this model, so I did the following:
                      Code:
                      xtabond2 fraud_dummy l.fraud_dummy indp_pay_log num_of_id_degree_expert num_of_id_financial_expertise Fem_indep av_id_age av_id_shareholding board_size ceo_duality board_ind soe big4 roa btm1 eps1 loss_total, gmm( indp_pay_log local_pay_instrumental, lag(0 2)) iv(  l.fraud_dummy indp_pay_log num_of_id_degree_expert num_of_id_financial_expertise Fem_indep av_id_age av_id_shareholding board_size ceo_duality board_ind soe big4 roa btm1 eps1 loss_total, eq(level)) twostep
                      Am I right or not?
                      If it is not correct, could you kindly show me the right command?
                      I know it is not an appropriate way to ask such a favor, but I am really confused.
                      Please help.
                      Last edited by ALKEBSEE RADWAN; 08 Dec 2019, 06:39.

                      Comment


                        • If your dependent variable is a dummy variable, then xtabond2 is probably the wrong command. xtabond2 is likely to produce unreliable results for a categorical dependent variable.
                        You need to search for something like "dynamic panel logit model". I have no experience with these models, so you might ask a new question instead so that people with experience with these models can guide you further.

                        Comment


                        • Hello,

                          I use the following command to find out the determinants of corporate cash holdings for an unbalanced dataset of 1696 firms over 16 years.

                          Code:
                          xtdpdgmm Cash L.Cash Size Leverage Liquidity Profitability, teffects twostep vce(cluster CompanyID) gmmiv(L.Cash, lag(1 1) model(fodev)) gmmiv(Leverage Liquidity, lag(1 4) collapse model(fodev)) iv(Size Profitability, model(level)) nofootnote
                          Then, I use the following command to obtain the predicted values of the dependent variable.

                          Code:
                          predict PCash if e(sample)
                          In order to check whether the predicted values are calculated correctly, for a given firm-year I multiply the actual values of the explanatory variables by their respective slope coefficients and add them together, along with the intercept term and the respective year dummy coefficient. However, my calculated value and Stata's predicted value do not match exactly; there is a difference after two decimal places. I would like to know the potential reasons for this inconsistency.

                          Thanks and Regards
                          Last edited by Prateek Bedi; 24 Dec 2019, 08:11.

                          Comment


                          • Eagerly waiting for a response!

                            Comment


                            • I cannot replicate your problem. With the following example, predict gives me exactly the same predicted values as when I calculate them manually:
                              Code:
                              . webuse abdata
                              . xtdpdgmm L(0/1).n w k, gmm(L.n w k, l(0 3) c m(fod)) gmm(L.n w k, l(0 0) d c m(level)) two vce(r) teffects
                              . predict yhat if e(sample)
                              . gen yhat_manual = _b[L1.n] * L1.n + _b[w] * w + _b[k] * k + _b[1978.year] * 1978.year + _b[1979.year] * 1979.year + _b[1980.year] * 1980.year + _b[1981.year] * 1981.year + _b[1982.year] * 1982.year + _b[1983.year] * 1983.year + _b[1984.year] * 1984.year + _b[_cons] if e(sample)
                              Code:
                              . sum yhat yhat_manual
                              
                                  Variable |        Obs        Mean    Std. Dev.       Min        Max
                              -------------+---------------------------------------------------------
                                      yhat |        891    1.043574    1.426621  -2.375833   5.083724
                               yhat_manual |        891    1.043574    1.426621  -2.375833   5.083724
                              Did you make sure that the coefficients of the year dummies are only added to the predictions of the respective year?
                              https://www.kripfganz.de/stata/

                              Comment


                              • Dear Prof. Kripfganz,

                                Once again, thanks a ton for your precise and brilliant response. I followed your command and the results match now. You are doing a great service. Warm season's greetings and a very happy, healthy and blessed new year to you and your dear ones. Please keep this great work going!!

                                Comment
