System GMM command query

Muhammad Ibrahim Shah

Join Date: Nov 2020

Posts: 49
#1

12 May 2022, 11:51

One of my friends has run a system GMM model, and the result is shown below. I am confused about the result. He applied a unit-root test and found all the variables to be I(1), so he applied system GMM to the first-differenced data.

QS1: Since system GMM instruments endogenous variables with their lagged levels and first differences, I am wondering whether it is right to apply system GMM to first-differenced variables (rather than to the raw data) just because all variables were found to be I(1). As far as my limited knowledge goes, we should run system GMM on the raw data, not on first-differenced data, even if all the variables are I(1). Please correct me if I am wrong.

QS2: Do you think the models below are correct? In the result, ΔCOMPL is the first difference of an independent variable COMPL, and ΔCOMPL(-1) is the first difference of the lagged value of that variable (used as instrument). But shouldn't only one COMPL term appear as an independent variable here? It should be ΔCOMPL(-1), right? Should we have both ΔCOMPL and ΔCOMPL(-1)? Is there anything wrong with the command he applied? I applied system GMM myself once, but my knowledge is very limited. I have seen some papers in World Development (including Prichard, W., Salardi, P., & Segal, P. (2018). Taxation, non-tax revenue and democracy: New evidence using new cross-country data. World Development, 109, 295-312), and they report only one term per variable (either the variable is instrumented by its lagged value or, if the variable is not endogenous, its level is used). So, should we have both ΔCOMPL and ΔCOMPL(-1) here? (This ignores the fact that he applied the model to first-differenced data, although my gut says that it should be applied to the raw data.)

Table: System GMM result (dependent variable: GINI)
Variable      Coefficient
ΔGini(-1)     -0.294***
ΔCOMPL        -0.063***
ΔCOMPL(-1)    -0.043**

Note: Here, Δ does not come from the system GMM command. It is written because all the variables were found to be I(1), so he converted all the raw data to first differences before running the system GMM command.

Last edited by Muhammad Ibrahim Shah; 12 May 2022, 11:59.
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#2

12 May 2022, 15:58

Please read the FAQ about how to ask questions.
Muhammad Ibrahim Shah

Join Date: Nov 2020

Posts: 49
#3

12 May 2022, 20:04

Hi, I read the FAQ (again), but I guess I am not seeing what I did wrong here. Is it because of the table? I pasted it from a Word document at first, but it didn't look great, so I created a table here so that people can understand what is going on. Nevertheless, the rules should not be a barrier to acquiring knowledge that I don't possess.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2594
#4

13 May 2022, 05:11

If your variables are I(1), then the assumptions for applying system GMM on the raw data are typically violated: First differences are not valid instruments for the level model because they will be correlated with the unobserved "fixed effects". Difference GMM might still be applicable, although your instruments are likely to be weak, which could lead to identification failure.
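
To spell this out under a standard textbook setup (a sketch only; the AR(1) notation with \alpha, \mu_i and \varepsilon_{it} is assumed here and does not come from the thread): system GMM adds moment conditions for the level equation to those of difference GMM, and these require the first-differenced instruments to be uncorrelated with the fixed effects. With a unit root and nonzero fixed effects, that requirement generally fails:

```latex
% Assumed model: AR(1) panel with fixed effects \mu_i and
% serially uncorrelated errors \varepsilon_{it}, E[\mu_i \varepsilon_{it}] = 0
y_{it} = \alpha\, y_{i,t-1} + \mu_i + \varepsilon_{it}

% Difference GMM: lagged levels instrument the first-differenced equation
E\left[\, y_{i,t-s}\, \Delta\varepsilon_{it} \,\right] = 0, \qquad s \ge 2

% System GMM adds: lagged differences instrument the level equation
E\left[\, \Delta y_{i,t-1}\, (\mu_i + \varepsilon_{it}) \,\right] = 0
\quad\text{which requires}\quad
E\left[\, \Delta y_{i,t-1}\, \mu_i \,\right] = 0

% Unit root (\alpha = 1): \Delta y_{i,t-1} = \mu_i + \varepsilon_{i,t-1}, so
E\left[\, \Delta y_{i,t-1}\, \mu_i \,\right] = E\left[\, \mu_i^2 \,\right] > 0
```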

System GMM is usually applied on data with a small number of time periods. For such data sets, unit-root tests may not be very reliable, so I would interpret these test results with caution.

You possibly could run GMM on the first-differenced data if you expect a relationship between the growth rates of the variables.

There is nothing wrong with having both the current and lagged term of a regressor in the model. Often, this could even be desired to obtain a dynamically complete model. Omitted dynamics could otherwise invalidate your results.
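
As a rough illustration of what the current and lagged terms do together (a sketch; the symbols \alpha, \beta_0, \beta_1 and \theta are assumed notation, not from the thread), a model with one lag of the dependent variable and of a regressor x implies a long-run effect that combines both coefficients on x:

```latex
% Assumed dynamic model with current and lagged regressor
y_{it} = \alpha\, y_{i,t-1} + \beta_0\, x_{it} + \beta_1\, x_{i,t-1} + \mu_i + \varepsilon_{it}

% Long-run effect of x on y (requires |\alpha| < 1)
\theta = \frac{\beta_0 + \beta_1}{1 - \alpha}

% Equivalent rearrangement around the long-run relationship
\Delta y_{it} = -(1 - \alpha)\left( y_{i,t-1} - \theta\, x_{i,t-1} \right)
              + \beta_0\, \Delta x_{it} + \mu_i + \varepsilon_{it}

% Plugging in the coefficients from the table in #1
\theta = \frac{-0.063 + (-0.043)}{1 - (-0.294)} = \frac{-0.106}{1.294} \approx -0.082
```

Because the variables in that table were differenced before estimation, this number would be a long-run effect of ΔCOMPL on ΔGini, not of COMPL on the level of Gini. Dropping the lagged term would force \beta_1 = 0 and change the implied long-run effect.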

https://www.kripfganz.de/stata/
Muhammad Ibrahim Shah

Join Date: Nov 2020

Posts: 49
#5

13 May 2022, 12:28

Hi Professor, thank you for your reply. A few general follow-up questions:

1) "If your variables are I(1), then the assumptions for applying system GMM on the raw data are typically violated": Suppose it's a different case and we have about 40 years of data for a single country. I apply a unit-root test and find the variables to be I(1). The next task is cointegration testing and long-run estimation via, say, FMOLS/DOLS or another time-series technique. Before doing the cointegration test and the long-run estimation, should I first convert my raw data to first differences because my variables are I(1)?

2) "System GMM is usually applied on data with a small number of time periods. For such data sets, unit-root tests may not be very reliable, so I would interpret these test results with caution": We have applied system GMM to 25 years of data for 65 countries. Since N>T, we went for the system GMM method, but because the time series is long, we also applied a panel unit-root test. Do you think we should not apply the unit-root test here?

3) "You possibly could run GMM on the first-differenced data if you expect a relationship between the growth rates of the variables": Okay, that I understand, but we (my coauthor, to be more exact) applied GMM to the first-differenced data because he found that the variables are I(1). I guess this leads me back to my first question: are we supposed to convert the variables to first differences before running the main long-run regression, or are we supposed to run the cointegration test and the main regression on the raw data that we downloaded from, say, the World Bank? Let's say I have two independent variables, one I(0) and the other I(1). We would be running ARDL here for sure, but are we supposed to run the ARDL on the raw data or on the differenced data? Perhaps this is a long-running debate, but I am still not quite sure.

4) "There is nothing wrong with having both the current and lagged term of a regressor in the model": What would the xtabond2 command be in that case, please?

xtabond2 GINI L.GINI COMPL L.COMPL X2 L.X2 X3 L.X3 X4 L.X4, gmm(GINI L.GINI, lag (2 2)) iv(COMPL L.COMPL) twostep orthogonal

Here GINI and COMPL have already been converted to first differences before running the xtabond2 command (because, according to my coauthor, the unit-root tests found them to be I(1)). So is the above the right command for reproducing what my coauthor got in the system GMM table?
Sebastian Kripfganz

Join Date: May 2014

Posts: 2594
#6

16 May 2022, 03:20

1. Things are a bit different when you have data for a single time series only. With panel data, you need to account for the "fixed effects", which cause most of the trouble. With time series data, the unit-specific "effect" is just the regression constant. In a dynamic time series model, you do not need to transform the model into first differences. The dynamic nature of the model (i.e. the lagged dependent variable) ensures that a unit root does not cause a spurious-regression problem. If you transform the data into first differences first, then you can only analyze long-run effects for the first-differenced variables (i.e. growth instead of levels). This is typically not what you want. Regarding the interpretation of the effects, the same logic applies to panel data. However, in the latter case it might just not be possible to get reliable GMM results for the levels (raw) data when the variables are highly persistent/nonstationary. If you had sufficiently many time periods, you could use the xtdcce2 command to deal with this kind of data.

2. 25 years is neither very short nor really long. This is a borderline situation. It is hard to say how well the unit-root tests would perform in this case.

3. Transforming the variables into first differences prior to the estimation means that you are ignoring the error-correction mechanism for the variables in levels. In other words, you will not be able to obtain long-run effects for the untransformed variables.

4. Yes, you would simply add the lags of the variables to the list of independent variables (and instruments). Note that you would still need instruments for your X-variables.
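
Point 4 could be sketched as an xtabond2 call along the following lines. This is only an illustration under assumptions that are not from the thread: id and year are hypothetical panel identifiers, COMPL is treated as strictly exogenous (as in the command quoted in #5), X2 is treated as endogenous, and the lag ranges are arbitrary choices that keep the instrument count small. xtabond2 is a community-contributed command (ssc install xtabond2), and the /// line continuations only work in a do-file.

```stata
* Hypothetical sketch: id and year are assumed names for the panel identifiers
xtset id year

* Regressors: the lagged dependent variable plus current and lagged X-variables.
* gmm(L.GINI, ...): GMM-style instruments for the lagged dependent variable
* gmm(X2, ...):     GMM-style instruments for X2, assumed endogenous here
* iv(...):          COMPL and its lag used as standard instruments,
*                   which assumes they are strictly exogenous
xtabond2 GINI L.GINI COMPL L.COMPL X2 L.X2,  ///
    gmm(L.GINI, lag(1 2) collapse)           ///
    gmm(X2, lag(2 3) collapse)               ///
    iv(COMPL L.COMPL)                        ///
    twostep robust orthogonal
```

Which variables belong in gmm() rather than iv(), and which lags are valid, depends on the exogeneity assumptions you are willing to defend for your own data.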

More on dynamic panel data GMM estimation:
Kripfganz, S. (2019). Generalized method of moments estimation of linear dynamic panel data models. Proceedings of the 2019 London Stata Conference.

https://www.kripfganz.de/stata/
Muhammad Ibrahim Shah

Join Date: Nov 2020

Posts: 49
#7

17 May 2022, 01:52

Thank you so much, professor.

1. But can xtdcce2 take care of endogeneity? I know it can take care of cross-sectional dependence but I don't know if it can take care of the endogeneity problem.

4. Is this correct now?

xtabond2 GINI L.GINI COMPL L.COMPL X2 L.X2 X3 L.X3 X4 L.X4, gmm(GINI L.GINI, lag (2 2) X2 L.X2 X3 L.X3 X4 L.X4) iv(COMPL L.COMPL) twostep orthogonal