System GMM - Time dummies

Hanna Lindstrom

#16

21 Jun 2017, 07:36

Dear Mr Kripfganz,
Thank you for your reply, I believe I understand your point now.
Digging deeper into the system GMM framework, more questions have surfaced about this estimator and my specific data set. I hope you have the time to reply.

My N is still 288 and my T spans from 2003-2014 (year dummies corresponding to dyy5-dyy16).
I have one endogenous variable (a lagged dependent variable) in my gmmstyle() parentheses and treat the remaining variables as exogenous. My code looks like the following:

Code:

xtabond2 y L.y L.x1 x2 dzone* x3 x4 L.x5 x6 x7 x8 dyy8-dyy16 [aweight=CAB_ipo], gmmstyle(L.y, laglimit(1 5)) ivstyle(L.x1 x2 x3 x4 L.x5 x6 x7 x8) ivstyle(dyy8-dyy16 dzone*, eq(level)) artests(3) twostep cluster(lk) h(2) orthogonal

dzone* is a set of dummy variables for the vegetation zone each municipality belongs to (time invariant). dyy8-dyy16 is a set of year dummies, where dyy8 = 2006. [aweight=CAB_ipo] specifies analytic weights.

Exclusion of year dummies and orthogonality:
In your post #12 in this Statalist thread (https://www.statalist.org/forums/for...r-panel-models) you suggest that the user drop the first two years because of the lags, and one more year to avoid the "dummy trap". However, in system GMM I make use of lagged differences as instruments for the levels. Since L.y is endogenous, I would then have D.L2.y as an instrument for the levels (D.L2.y = L2.y - L3.y). Would this not imply that the first year to include as a year dummy is T=4?
I am using the orthogonal option, since my panel has gaps in my "main" explanatory variable. Does the orthogonal option change anything about which years I should exclude, considering that it makes use of all available future observations?
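For reference, a sketch of the forward-orthogonal deviations transformation behind the orthogonal option, written as I understand the definition in Roodman (2009); the symbols x^⊥, c_it and T_it are introduced here, with T_it the number of available future observations of unit i after period t. Each observation is replaced by its rescaled deviation from the mean of all available future observations:

```latex
x^{\perp}_{i,t+1} = c_{it}\left( x_{it} - \frac{1}{T_{it}} \sum_{s>t} x_{is} \right),
\qquad
c_{it} = \sqrt{\frac{T_{it}}{T_{it}+1}}
```

Because only available future observations enter, a gap removes fewer observations than first-differencing does, and if the untransformed errors are serially uncorrelated and homoskedastic, the scaling factor c_it keeps the transformed errors that way too.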

eq(diff) or eq(level)?
What is the rationale behind using one or the other (or both) of these suboptions within the gmm() parentheses? I have read David Roodman's (2009) paper (http://www.stata-journal.com/sjpdf.h...iclenum=st0159) extensively, especially pp. 123-125, but I cannot understand the difference in analysis or interpretation between gmm(L.x, eq(diff)) or gmm(L.x, eq(level)) and plain gmm(L.x). Could you elaborate a bit on this? And are there any important aspects of these suboptions when using system GMM with the orthogonal option?

h(#) option when using system GMM with the orthogonal option
When reading David Roodman's (2009) paper (http://www.stata-journal.com/sjpdf.h...), pp. 117 and 123, I understand that h(3) is the default for system GMM in xtabond2, but that h(3) differs slightly from the matrix in the orthogonal-deviations case (ibid., p. 117). So which h(#) should be used with the orthogonal option? I do not wish to use h(1) because of the panel's heteroskedasticity. I see that h(2) imitates DPD for Ox according to Roodman (ibid., p. 123), and I believe that h(2) was the matrix used by Blundell, Bond and Windmeijer (2001) (according to your post in https://www.statalist.org/forums/for...r-panel-models), but what are the implications of using h(2) in my case? I also read in the same post that h(3) is not optimal if there are unobserved unit-specific effects, which I believe there are in my case.

Robust standard errors at the municipality level or the region level?
This is perhaps a rather general question, not specific to GMM, but here it goes. My dependent variable is at the county level (N=21), and my explanatory variables are at the municipality level (N=288), the county level, and the national level. At which level should I cluster my standard errors? I believe there is correlation between municipalities within the same county, which speaks for clustering the standard errors at the county level. However, my data also suffer from serial correlation within each municipality, which speaks for clustering at the municipality level. One source of heteroskedasticity at the county level is that counties have different numbers of farmers, so counties with a smaller farming population are more susceptible to shocks than those with more farmers. Therefore, I want to use analytic weights, based on the square root of the number of farmers in each county. Does it make sense to cluster standard errors at the municipality level (to account for serial correlation) and to use aweights to address heteroskedasticity at the county level?

Best regards
-Hanna
Sebastian Kripfganz

#17

22 Jun 2017, 06:39

- exclusion of year dummies and orthogonal option:
When deciding which year dummies to include, it does not (in general) matter how you instrument the other variables. Only the regressors in your model matter.
By the way, you are not using differences of the second lag as an instrument for the equation in levels but differences of the first lag, DL.y.
The orthogonal option does not change anything here because you are instrumenting the time dummies in the level equation only.
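In moment-condition form, a sketch under a generic version of the #16 model (the symbols ρ, β, δ_t, α_i and ε_it are introduced here, with x_it standing for the remaining regressors and δ_t for the year effects):

```latex
\begin{aligned}
y_{it} &= \rho\, y_{i,t-1} + x_{it}'\beta + \delta_t + \alpha_i + \varepsilon_{it} \\
\text{differenced equation: } & E\left[ y_{i,t-s}\, \Delta\varepsilon_{it} \right] = 0, \qquad s = 2,\dots,6 \\
\text{level equation: } & E\left[ \Delta y_{i,t-1}\, (\alpha_i + \varepsilon_{it}) \right] = 0
\end{aligned}
```

The lags y_{i,t-2}, ..., y_{i,t-6} are what gmmstyle(L.y, laglimits(1 5)) supplies for the transformed equation, and Δy_{i,t-1} = y_{i,t-1} - y_{i,t-2} is DL.y in Stata notation. The level-equation condition additionally requires Δy_{i,t-1} to be uncorrelated with α_i.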

- eq(diff) or eq(level):
For the gmm() option, you do not necessarily have to specify the equation if you want to use instruments for both equations. However, specifying these instruments separately for each equation with the suboptions eq(diff) and eq(level), together with the laglimits() suboption, is much less error-prone: you get what you specify. Without the suboptions, you are delegating this task to xtabond2. That is all right, but you should carefully check the instruments reported below the regression table to confirm that they are indeed what you had in mind. Notice that gmm(L.y, eq(level)) does not give you the same instruments for the level equation as gmm(L.y) without the eq() suboption. Just compare the instrument lists below the regression output.
It is a different story when you use the ivstyle() option. Here, my recommendation would be to always specify the equation() suboption, because ivstyle() without this suboption is not the same as specifying it for both equations separately. You can easily see this by adding one more variable to the ivstyle() option: the number of instruments goes up by only 1, although you would expect it to go up by 2. This recommendation applies not only to time and other dummies but to any variable. Essentially, I strongly recommend never using the ivstyle() option without reference to a particular equation.
In a nutshell, always specify the instruments in as much detail as possible. This minimizes the risk that xtabond2 does something unintended.
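As a sketch of this advice (not a tested specification), the command from #16 could be written with every instrument set tied to an equation. It assumes that, with equation(level), laglimits(0 0) yields DL.y, the level instrument that gmmstyle(L.y, laglimits(1 5)) generates by default; please verify this in the instrument list printed below the output.

```stata
* Requires the user-written command xtabond2 (ssc install xtabond2).
* Variables are those of #16. laglimits(0 0) with equation(level) is assumed
* to give DL.y; check the instrument list below the regression output.
xtabond2 y L.y L.x1 x2 dzone* x3 x4 L.x5 x6 x7 x8 dyy8-dyy16 [aweight=CAB_ipo], ///
    gmmstyle(L.y, laglimits(1 5) equation(diff))          /// L2.y to L6.y, transformed equation
    gmmstyle(L.y, laglimits(0 0) equation(level))         /// DL.y, level equation
    ivstyle(L.x1 x2 x3 x4 L.x5 x6 x7 x8, equation(diff))  ///
    ivstyle(L.x1 x2 x3 x4 L.x5 x6 x7 x8, equation(level)) /// drop if x may correlate with the fixed effects
    ivstyle(dyy8-dyy16 dzone*, equation(level))           /// dummies for the level equation only
    artests(3) twostep cluster(lk) h(2) orthogonal
```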
Are you sure you want to assume that all of your x variables are completely exogenous, even uncorrelated with the unobserved "fixed" effects? That is what you are implicitly doing when you are instrumenting the variables in the levels equation by themselves.

- h(#) option when using system GMM with orthogonal option:
When using the orthogonal option, h(1) and h(2) are actually identical because the forward-orthogonal transformation leaves the errors serially uncorrelated (provided the untransformed errors are serially uncorrelated and homoskedastic). None of the three initial weight matrices is actually optimal for one-step system GMM estimation and/or in the presence of heteroskedasticity. For two-step estimation, the choice is asymptotically irrelevant. I tend to recommend h(3) irrespective of whether you use the first-difference or the forward-orthogonal transformation. It is the most natural choice because the one-step estimator then essentially becomes a conventional 2SLS estimator.
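A sketch of how the finite-sample difference between h(2) and h(3) could be checked in the two-step estimates, reusing the command from #16 unchanged apart from h(); the local macro name spec and the stored-estimate names h2 and h3 are illustrative.

```stata
* Everything except h() is copied from the command in #16.
local spec y L.y L.x1 x2 dzone* x3 x4 L.x5 x6 x7 x8 dyy8-dyy16 [aweight=CAB_ipo], ///
    gmmstyle(L.y, laglimits(1 5)) ivstyle(L.x1 x2 x3 x4 L.x5 x6 x7 x8)            ///
    ivstyle(dyy8-dyy16 dzone*, eq(level)) artests(3) twostep cluster(lk) orthogonal

quietly xtabond2 `spec' h(2)
estimates store h2
quietly xtabond2 `spec' h(3)
estimates store h3

* Side-by-side two-step coefficients and standard errors
estimates table h2 h3, b se
```

Close coefficients are consistent with the choice being asymptotically irrelevant, but they do not prove it for a given sample.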

- robust standard errors on municipality level or region level:
There is no easy and short answer to this question. Clustering at the county level automatically also accounts for serial correlation within municipalities, because each municipality is nested within a county. That said, 21 counties is not a large number of clusters. Clustering at that level might not yield reliable estimates of the variance-covariance matrix, which might actually do more harm than ignoring the correlation across municipalities. But I am just speculating here.
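One way to see how much the clustering level matters in practice is to re-estimate the two-step model with standard errors clustered by municipality and by county and compare them. This is a sketch: county_id is a hypothetical county identifier, and it assumes that xtabond2 accepts a cluster variable within which the panel units are nested.

```stata
* Everything except cluster() is copied from the command in #16.
* county_id is a hypothetical variable identifying the 21 counties.
local spec y L.y L.x1 x2 dzone* x3 x4 L.x5 x6 x7 x8 dyy8-dyy16 [aweight=CAB_ipo], ///
    gmmstyle(L.y, laglimits(1 5)) ivstyle(L.x1 x2 x3 x4 L.x5 x6 x7 x8)            ///
    ivstyle(dyy8-dyy16 dzone*, eq(level)) artests(3) twostep h(2) orthogonal

quietly xtabond2 `spec' cluster(lk)         // municipality clusters, as in #16
estimates store by_municipality
quietly xtabond2 `spec' cluster(county_id)  // 21 county clusters
estimates store by_county

estimates table by_municipality by_county, b se
```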
By the way, serial correlation in the errors invalidates your instruments for the lagged dependent variable. That would be much more of a concern than obtaining robust standard errors. If the Arellano-Bond test still indicates serial correlation, you might want to add further lags of the dependent variable (or the regressors) to the model.
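Following the suggestion above, a sketch that adds a second lag of the dependent variable to the #16 specification and then inspects the AR(2) and AR(3) tests printed below the regression table. It keeps the instrument set unchanged, on the assumption that y_{i,t-2} and deeper lags from gmmstyle(L.y, laglimits(1 5)) remain valid once the remaining errors are serially uncorrelated; the earliest year dummy may also become collinear and need to be dropped.

```stata
* Same as #16 but with L2.y added as a regressor.
* Check the Arellano-Bond AR(2)/AR(3) lines reported by artests(3).
xtabond2 y L.y L2.y L.x1 x2 dzone* x3 x4 L.x5 x6 x7 x8 dyy8-dyy16 [aweight=CAB_ipo], ///
    gmmstyle(L.y, laglimits(1 5)) ivstyle(L.x1 x2 x3 x4 L.x5 x6 x7 x8)                  ///
    ivstyle(dyy8-dyy16 dzone*, eq(level)) artests(3) twostep cluster(lk) h(2) orthogonal
```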
I do not have much experience with using weights in this context. Sorry.

https://www.kripfganz.de/stata/
Hanna Lindstrom

#18

23 Jun 2017, 07:09

Thank you Mr Kripfganz!
You have been a tremendous help in clarifying some of the details of GMM and xtabond2.
Best regards,
Hanna Lindström
Annur Wijayakusuma

#19

13 May 2021, 21:07

Hi Sebastian,

I am sorry to jump into this thread. I am just wondering about year dummies. I read Mr Roodman's seminal paper, and the data he used in that article was a balanced panel with year dummies coded 1 and 0. However, the years ran from 1976 to 1984, and I am confused why he used year dummies coded 1 and 0 instead of a single variable coded 0 to 8. What happens if I use year dummies for an unbalanced panel with missing data for some years: should I still code the year dummies as 1 or 0, as in the data Mr Roodman used?

Honestly, I am confused about whether to create the year dummies in Excel, coded 1 or 0, or a single variable coded 0-8 for the panel data. Some discussions also say that there is no need to create year dummies for panel data, since the year variable is already there, and that you can just put i.year in the model.

I appreciate your reply.

Regards,

Annur
Sebastian Kripfganz

#20

14 May 2021, 06:08

If you insert a year variable coded from 0 to 8, you are essentially adding a linear time trend to your model. This linear trend cannot capture any common annual fluctuations other than a general upward or downward trend. To capture arbitrary time effects common to all groups, you need to include a full set of time dummies (each coded 0/1). This is done automatically in Stata when you type i.year (instead of just year, which gives the time trend). Thus, you do not need to create these dummies manually beforehand; just use Stata's factor-variable notation.
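A minimal illustration with the Arellano-Bond data used in Roodman's paper, which Stata can load with webuse abdata (the dataset also contains 0/1 year dummies named yr*). Entering year directly estimates one slope, a linear trend, while i.year estimates a separate effect for every year except a base year:

```stata
webuse abdata, clear

* year entered as a number: one coefficient, i.e. a linear time trend
regress n year

* factor-variable notation: Stata builds the 0/1 year dummies itself,
* omitting one base year to avoid collinearity with the constant
regress n i.year

* the same fit with the 0/1 dummies stored in the dataset;
* regress omits one of them because of collinearity with the constant
regress n yr*
```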

Annur Wijayakusuma

#21

14 May 2021, 07:37

Dear Sebastian,

I just want to make sure that the model I use and my command syntax in Stata are correct.
My model is

$$\text{GROWTH}_{i,t} = a_0 + \text{RD}_{i,t-1} + \text{GROWTH}_{i,t-1} + \text{SIZE}_{i,t} + \text{LEV}_{i,t} + B_{i,t} + \varepsilon_{i,t}$$

where B(it) represents the year dummies.

My data cover N=487 firms and T=12; it is an unbalanced panel. To analyse it, I use two-step system GMM, and my command syntax is as follows:

xtabond2 Growth L.RDIntensity L.Growth Leverage SIZE i.Year, gmm(RDIntensity, lag(2 4) equation(diff)) gmm(Growth, lag(1 1) equation(level)) iv(Leverage SIZE) small twostep robust

My question is whether my use of the year dummies in xtabond2 is correct.

Thanks in advance

Annur
Sebastian Kripfganz

#22

17 May 2021, 05:07

If you specify time dummies in the list of regressors, you must also specify them as instruments. Please see slides 74 to 79 of my 2019 London Stata Conference presentation:
Kripfganz, S. (2019). Generalized method of moments estimation of linear dynamic panel data models. Proceedings of the 2019 London Stata Conference.
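A sketch of what this looks like with the abdata example, assuming xtabond2 is installed from SSC: the year dummies appear in the regressor list and again as instruments, here for the level equation only, as suggested in #17 above. In this illustration w and k are treated as strictly exogenous but possibly correlated with the firm effects, so they instrument only the transformed equation. In my understanding, xtabond2 reports and drops any dummy that turns out to be collinear.

```stata
capture which xtabond2
if _rc {
    ssc install xtabond2
}

webuse abdata, clear
xtset id year

* time dummies yr* enter as regressors AND as level-equation instruments
xtabond2 n L.n w k yr*,              ///
    gmm(L.n, lag(1 3) eq(diff))      /// n lagged 2 to 4 for the differenced equation
    gmm(L.n, lag(0 0) eq(level))     /// D.L.n for the level equation
    iv(w k, eq(diff))                ///
    iv(yr*, eq(level))               ///
    twostep robust small
```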
