XTDPDGMM: new Stata command for efficient GMM estimation of linear (dynamic) panel models with nonlinear moment conditions

Sebastian Kripfganz

Join Date: May 2014

Posts: 2540
#481

13 Sep 2022, 10:57

1.A) If the industry dummies are time-invariant, then only code 1.2 would be appropriate. One of the industry dummies should normally be omitted due to the "dummy trap", i.e. the 8 industry dummies taken together are perfectly collinear with the intercept. If more dummies are omitted (either from the regressor list or the instrument list), then this indicates that there might be other multicollinearity problems as well, which I cannot tell from the available information.
If the industry dummies vary over time, then 1.1 and 1.3 would also be appropriate codes. A similar qualification as before applies: At least one dummy will be omitted due to perfect collinearity.

1.B) As I said in 1.A, at least one dummy will be omitted. Stata will automatically omit one dummy at random. If you want to omit a specific dummy, which shall serve as a reference industry, then you need to omit it manually.
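A minimal sketch of omitting a specific reference dummy manually, as described in 1.B. It uses the abdata example dataset (variables n, w, k, ind, id, year) purely as a stand-in for your own data. The dummy names indd*, the choice of the third industry as reference, and the decision to instrument the dummies in the level model are assumptions of this sketch, not a recommendation for your application.

```stata
* Assumes xtdpdgmm is already installed; abdata is only a stand-in dataset
webuse abdata, clear
xtset id year

* Create one dummy per industry category: indd1, indd2, ... (names are arbitrary)
tabulate ind, generate(indd)

* Manually drop the dummy of the chosen reference industry (here: third category)
drop indd3

* The remaining dummies enter as regressors and as instruments for the level model
xtdpdgmm L(0/1).n w k indd*, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(indd*, model(level)) two vce(r)
```

The coefficients of the remaining dummies are then read relative to the omitted industry.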

1.C) You can include all industry dummies as instruments. Stata will automatically omit at least one due to perfect collinearity. It does not matter which dummy is omitted in the list of instruments.

1.D) Option nolevel is recommended if you do not have any instruments for the level model and you want a conventional difference/FOD estimator.
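A minimal sketch of the conventional case in 1.D, again with abdata as a stand-in and with lag ranges chosen only for illustration. All instruments refer to the first-differenced model, so nolevel is added.

```stata
* Assumes xtdpdgmm is already installed; abdata is only a stand-in dataset
webuse abdata, clear
xtset id year

* Conventional difference GMM: no instruments for the level model, hence nolevel
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) nolevel two vce(r)
```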

2.1) xtdpdgmmfe automatically selects the appropriate instruments / moment conditions (and therefore the relevant estimator) corresponding to the chosen assumptions.

2.2) xtdpdgmm can estimate a model with the Chudik-Pesaran nonlinear moment conditions even if some regressors are treated as endogenous. However, the resulting estimator would be inconsistent. xtdpdgmm does not check whether you have chosen the options in a consistent way. Therefore, xtdpdgmmfe is less prone to such errors.

3) Both commands can be specified accordingly; see post #450 for an example.

4) Option curtail() of xtdpdgmmfe can be used to set a maximum lag depth of 3 for all sets of instruments. For an endogenous variable, this would use lags 2 and 3. For a predetermined variable, this would use lags 1 to 3. The command is less flexible regarding individual lag orders for different variables. You also cannot easily specify lags 2 to 4 for endogenous but lags 1 to 3 for predetermined variables. This is intentional to reduce the temptation for researchers to search for the "nicest" model. Keeping the maximum lag order constant is the least arbitrary approach. It will give predetermined variables one more instrument than endogenous variables. Again, this is intentional as it utilizes the additional overidentifying restriction from making the stronger predeterminedness assumption.
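To make the lag ranges in 4) concrete, here is a sketch written with xtdpdgmm rather than xtdpdgmmfe. It only mirrors the lag ranges described above for a maximum lag depth of 3 and is not claimed to reproduce what xtdpdgmmfe generates internally. Treating w as endogenous and k as predetermined is an assumption for illustration, with abdata as a stand-in dataset.

```stata
* Assumes xtdpdgmm is already installed; abdata is only a stand-in dataset
webuse abdata, clear
xtset id year

* Maximum lag depth 3 for every instrument set:
*   lagged dependent variable -> lags 2 to 3 of n
*   endogenous regressor w    -> lags 2 to 3
*   predetermined regressor k -> lags 1 to 3 (one more instrument than w)
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 3)) gmm(w, lag(2 3)) gmm(k, lag(1 3)) nolevel two vce(r)
```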

5) The sequential model selection process is not required. It is merely a suggestion to reduce the arbitrariness of the modeling choice.

6) The doubly-corrected robust standard errors are generally recommended.

7) No, lag() is an abbreviation of lagrange().

8) These commands do not support nonlinear models for limited dependent variables, only the linear probability model.

https://twitter.com/Kripfganz
Zainab Mariam

Join Date: Jul 2022

Posts: 51
#482

19 Sep 2022, 10:50

Dear Professor Sebastian,

Thank you very much for your valuable response. Your cooperation and support are priceless.

1) Regarding my question 4 post #480 “When using your command ‘xtdpdgmmfe’, can the regression model include three lags of each regressor?”.
I did not mean three lags as instruments. I asked if I can include three lags of any variable as regressors i.e., the three lags are regressors. For instance, suppose that my regression model (the right-hand side) includes the following regressors: L(0/1).L.y; L(0/2).L.x1; L(0/2).L.x2; L(0/2).L.x3; L(0/2).L.x4; L(0/2).L.x5; L(0/2).L.x6; L(0/2).L.x7; L(0/2).L.x8; L(0/2).L.x9; x10. Thus, my question is: can I use your command ‘xtdpdgmmfe’ to run such a regression model? If so, do I have to classify each lag as exogenous, predetermined, or endogenous? e.g., for L(0/2).L.x1, do I have to specify L.x1, L2.x1, and L3.x1 and classify each of them as exogenous, predetermined, or endogenous when using your command ‘xtdpdgmmfe’?

Also, when using your command ‘xtdpdgmmfe’, do I have to classify the dummies?

2) How should I classify the lag of an endogenous variable? Can I classify the lag of the endogenous variable as predetermined? For instance, L.x1 is the independent variable of my regression model (L.x1 is endogenous). My regression model also includes the first and second lags of L.x1 as regressors. Thus, how should I classify the first and second lags of L.x1, given that L.x1 is endogenous?

3) Regarding post #481 point 1.D) “Option nolevel is recommended if you do not have any instruments for the level model and you want a conventional difference/FOD estimator.”.
Thus, I kindly ask you please to give an example on how to use the option ‘nolevel’ for an unconventional difference/FOD estimator.

4) Regarding post #481 point 8) “These commands do not support nonlinear models for limited dependent variables, only the linear probability model.”.
Do you mean that I cannot use your commands in my research as the dependent variable y is limited and its values lie between 0 and 1? If so, I kindly ask you please for your advice.

5) Can every time-invariant variable be classified as an exogenous variable?

6) If dummies vary over time (i.e., they are time-variant), can I classify them as exogenous? If not, how should the time-variant dummies be classified?

7) Is it normal to classify ‘firm age’ as exogenous?

8) How to decide whether dummies are time-invariant or time-variant?

9) When using your command 'xtdpdgmm' to implement the Difference GMM estimator, do the corresponding findings obtain the coefficients of the differenced variables (variables at differences i.e., ∆) or the coefficients of the variables at level?

10) Regarding post #473 point 4.6) “I would probably not include the diff suboption for iv() when using model(fod), but there is nothing wrong about it. For strictly exogenous variables and for dummy variables, I would personally use model(mdev) instead of model(fod), but note that this is not yet standard practice.”

Thus, my questions are:

10.1) As you would probably not include the diff suboption for iv() when using model(fod), thus, what to include instead?

10.2) Sorry! I did not get what you mean by “For strictly exogenous variables and for dummy variables, I would personally use model(mdev) instead of model(fod)”. Would you please give an example (the entire code) on how to use model(mdev) instead of model(fod)?

11) Regarding post #473 point 4.7) “… model(mdev) is appropriate for strictly exogenous variables or dummy variables. For an estimation without a level equation, I would recommend the following instruments:
Code:
gmm(x10, model(md) lag(0 0)) gmm(x10, model(fod) lag(0 2)) iv(i.ind, model(md)) iv(i.fc, model(md)) iv(i.mn, model(md))

For an estimation with a level equation, I would recommend the following instruments:
Code:
gmm(x10, model(md) lag(0 0)) gmm(x10, model(fod) lag(0 2)) iv(i.ind, model(level)) iv(i.fc, model(level)) iv(i.mn, model(level))…”

Thus, given that the variable x10 'firm age' is exogenous, my questions are:

11.1) For your code when the estimation is without a level equation, what should the entire code of the ‘xtdpdgmm’ command include also? i.e., is it correct if I type x10 as a regressor {before specifying model(fod) in the code} and then to instrument x10, I type gmm(x10, model(md) lag(0 0)) gmm(x10, model(fod) lag(0 2)) as follows:

xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 i.ind i.fc i.mn, model(fod) collapse gmm(y, lag(1 3)) gmm(L.x1, lag(1 3)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, model(md) lag(0 0)) gmm(x10, model(fod) lag(0 2)) iv(i.ind, model(md)) iv(i.fc, model(md)) iv(i.mn, model(md)) two vce(r)

11.2) For your code when the estimation is with a level equation, what should the entire code of the ‘xtdpdgmm’ command include also? i.e., is it correct if I type x10 as a regressor {before specifying model(fod) in the code} and then to instrument x10, I type gmm(x10, model(md) lag(0 0)) gmm(x10, model(fod) lag(0 2)) and I also type gmm(x10, diff model(level) lag(0 0)) as follows:

xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 i.ind i.fc i.mn, model(fod) collapse gmm(y, lag(1 3)) gmm(L.x1, lag(1 3)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, model(md) lag(0 0)) gmm(x10, model(fod) lag(0 2)) iv(i.ind, model(level)) iv(i.fc, model(level)) iv(i.mn, model(level)) gmm(y, lag(1 1) diff model(level)) gmm(L.x1, lag(1 1) diff model(level)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, diff model(level) lag(0 0)) two vce(r)

11.3) Also, regarding your code when the estimation is with a level equation, what should the entire code of the ‘xtdpdgmm’ command include also? i.e., is it correct if I type x10 as a regressor {before specifying model(fod) in the code} and then to instrument x10, I type gmm(x10, model(md) lag(0 0)) gmm(x10, model(fod) lag(0 2)) and I also type gmm(x10, model(level) lag(0 0)) as follows:

xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 i.ind i.fc i.mn, model(fod) collapse gmm(y, lag(1 3)) gmm(L.x1, lag(1 3)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, model(md) lag(0 0)) gmm(x10, model(fod) lag(0 2)) iv(i.ind, model(level)) iv(i.fc, model(level)) iv(i.mn, model(level)) gmm(y, lag(1 1) diff model(level)) gmm(L.x1, lag(1 1) diff model(level)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, model(level) lag(0 0)) two vce(r)

12) For dummies, is it required to type ‘i.’ before the industry (ind), year (fc), and country (mn) dummies? If so, why?

13) Regarding post #475 point 7) “You normally instrument all variables in the differenced model (possibly excluding dummy variables). If your variables satisfy the additional Blundell-Bond assumption (sufficient: mean stationarity), then you additionally instrument them in the level model.”

Thus, my question is: How to check whether my variables satisfy the additional Blundell-Bond assumption (sufficient: mean stationarity)?

Sorry for the long message, professor!

Your patience, support and effort are highly appreciated.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2540
#483

20 Sep 2022, 05:18

1) Yes, you can use xtdpdgmmfe for such a model. You only need to specify each variable once as exogenous/predetermined/endogenous, not separately for each lag. Make sure you allow for sufficient lags used as instruments, i.e. a minimum of 3 if you are using lags 0 to 2 as regressors.

Regarding dummy variables, you would need to specify them as exogenous variables. I am afraid this may not deliver the desired specification, especially if those dummies are time-invariant. I have to think about adding another option to xtdpdgmmfe, but this will not happen very soon.

2) See 1.

3) An "unconventional" diff-GMM estimator might be one where all instruments for exogenous/predetermined/endogenous variables refer to the first-differenced model but you still want the time dummies (or other dummies) to be instrumented in the level model.
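A minimal sketch of such an "unconventional" specification, with abdata as a stand-in dataset and illustrative lag ranges. The instruments for n, w and k refer to the first-differenced model, while the time dummies are instrumented in the level model, so nolevel is deliberately not specified.

```stata
* Assumes xtdpdgmm is already installed; abdata is only a stand-in dataset
webuse abdata, clear
xtset id year

* Instruments for n, w, k in the differenced model; time dummies in the level model
xtdpdgmm L(0/1).n w k i.year, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(i.year, model(level)) two vce(r)
```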

4) You can use it, but only for estimating a linear probability model.

5) Time-invariant variables are typically classified as exogenous with respect to the idiosyncratic error term. However, to identify their coefficients you would really need that they (or appropriate instruments for them) are exogenous with respect to the firm-specific effects. This is not what the exogenous() option of xtdpdgmmfe is doing. Currently, you would need to use xtdpdgmm and specify iv(time-invariant instrument, model(level)).

6) One could do this, yes. Yet, one would probably specify those dummies in the same way as if they were time-invariant, unless you only want them instrumented in the first-differenced model.

7) I cannot answer that as it is application specific. There could possibly be reasons to treat it as endogenous, at least with respect to the firm-specific effect.

8) You can check with the xtsum command if your dummies have zero or nonzero within variation.
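A minimal sketch of this check, using abdata as a stand-in. The dummy d80 is a hypothetical time-varying dummy created only for the comparison.

```stata
* abdata is only a stand-in dataset
webuse abdata, clear
xtset id year

* Hypothetical time-varying dummy: 1 from 1980 onwards, 0 before
generate byte d80 = (year >= 1980)

* Compare the "within" standard deviation of both variables:
* zero within variation means the variable is time-invariant
xtsum ind d80
```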

9) The interpretation of the coefficients is always for the untransformed/level variables.

10.1) You do not need to use anything in addition to model(fod). You could use bodev as in Hayakawa, Qi, Breitung (2019), but personally I am not a fan of it, as you would lose an additional observation for each firm.

10.2) I am not entirely sure anymore what I meant with that sentence. For a strictly exogenous variable I would use the combination of the following two options: gmm(x, lag(0 .) model(fodev)) gmm(x, lag(0 0) model(mdev)), possibly restricting the maximum lag in the first option. For time-varying dummy variables, I would simply use iv(x, model(mdev)). That's basically the same as in your quoted example 11.

11.1) and 11.2) look alright. 11.3) only works if all regressors are exogenous with respect to the firm-specific effects, which is typically not reasonable to assume.

12) Please see help fvvarlist.

13) You would conduct a difference-in-Hansen test between the system GMM and a difference GMM estimator (where the latter just leaves out all of the instruments for the level model). If you use the overid option of xtdpdgmm, you could find this test in estat overid, difference in the row labelled model(level).
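A minimal sketch of the steps in 13), with abdata as a stand-in dataset and illustrative lag ranges. The level instruments are lagged differences, following the pattern of the codes quoted earlier in this thread.

```stata
* Assumes xtdpdgmm is already installed; abdata is only a stand-in dataset
webuse abdata, clear
xtset id year

* System GMM: instruments for the differenced model plus instruments for the level model
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) gmm(n, lag(1 1) diff model(level)) gmm(w k, lag(0 0) diff model(level)) two vce(r) overid

* Difference-in-Hansen tests; the joint test of the level instruments is in the row model(level)
estat overid, difference
```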


Zainab Mariam

Join Date: Jul 2022
Posts: 51

#484

22 Sep 2022, 17:38

Dear Professor Sebastian,

Many thanks for your useful reply. I am very grateful to you for all your support and effort, professor! Please, if I may follow up with your response!

1) According to posts #373 and #473 point 2) “The lagged dependent variable L.Y should normally be treated as predetermined (equivalently, the dependent variable Y itself is endogenous).” “Any lag of the dependent variable would be treated as predetermined.”

Thus, my question is: can I classify the first and second lags of L.x1 as predetermined, given that L.x1 is endogenous (L.x1 is the independent variable of my research)?

2) When using your command ‘xtdpdgmm’ to implement the System GMM estimator, do the corresponding findings show the coefficients of the differenced/transformed variables (variables at differences i.e., ∆) or the coefficients of the variables at level?

3) Regarding using your command ‘xtdpdgmm’ to implement the Difference GMM estimator, I have the following questions:

3.1) How to interpret the coefficients of the first lag and the second lag of the dependent variable y (i.e., how to interpret the coefficients of L.y and L2.y)? e.g., the coefficients of the first lag and the second lag of the dependent variable y are 0.5 and 0.02 for L.y and L2.y, respectively.

3.2) L.x1 is the independent variable of my regression model. Thus, how to interpret the coefficient of the independent variable L.x1? e.g., the coefficient of the independent variable L.x1 = 0.001

3.3) Also, my regression model includes the first lag of the independent variable L.x1. Thus, how to interpret the coefficient of L2.x1 (where L2.x1 is the first lag of the independent variable L.x1)? e.g., the coefficient of L2.x1 = -0.0009

4) Is it required for the coefficients of the first and second lags of the dependent variable y (L.y and L2.y) to have opposite signs? if so, why? and what if the coefficients of L.y and L2.y have the same sign?

5) L.x1 is the independent variable of my regression model. Also, my regression model includes the first lag of the independent variable L.x1. Thus, my question is: is it required for the coefficients of L.x1 and L2.x1 to have opposite signs? if so, why? and what if their coefficients (i.e., the coefficients of L.x1 and L2.x1) have the same sign?

6) Is the interpretation of the coefficients obtained by the System GMM estimator different from the interpretation of the coefficients obtained by the Difference GMM estimator?

7) Regarding post #483 point 13), I kindly ask you please to explain how the findings of a difference-in-Hansen test check whether my variables satisfy the additional Blundell-Bond assumption (sufficient: mean stationarity). Suppose we have the following outcomes of the difference-in-Hansen test.

                   |       Excluding        |       Difference
Moment conditions  |     chi2   df        p |     chi2   df        p
-------------------+------------------------+-----------------------
1, model(diff)     |  14.6666    6   0.0230 |   1.5296    3   0.6754
2, model(diff)     |   4.0234    3   0.2590 |  12.1728    6   0.0582
3, model(level)    |  15.8404    8   0.0447 |   0.3558    1   0.5509
4, model(level)    |  12.0861    7   0.0978 |   4.1102    2   0.1281
   model(diff)     |   0.0000    0        . |  16.1962    9   0.0629
   model(level)    |   8.0920    6   0.2314 |   8.1042    3   0.0439

                   |       Excluding        |       Difference
Moment conditions  |     chi2   df        p |     chi2   df        p
-------------------+------------------------+-----------------------
1, model(fodev)    |   8.9323    6   0.1774 |   3.7500    7   0.8081
2, model(fodev)    |   9.8897    6   0.1294 |   2.7926    7   0.9035
3, model(fodev)    |   9.2784    6   0.1585 |   3.4039    7   0.8453
4, model(fodev)    |   6.2261    6   0.3983 |   6.4561    7   0.4876
5, model(level)    |   9.6163    8   0.2930 |   3.0659    5   0.6898
   model(fodev)    |        .  -15        . |        .    .        .

                   |       Excluding        |       Difference
Moment conditions  |     chi2   df        p |     chi2   df        p
-------------------+------------------------+-----------------------
1, model(fodev)    |  30.5644   30   0.4370 |   1.0296    7   0.9943
2, model(fodev)    |  25.8607   29   0.6329 |   5.7333    8   0.6771
3, model(fodev)    |  26.6376   29   0.5913 |   4.9564    8   0.7622
4, model(fodev)    |  27.3258   30   0.6061 |   4.2682    7   0.7484
5, model(fodev)    |  25.8421   29   0.6339 |   5.7518    8   0.6750
6, model(fodev)    |  27.0201   29   0.5706 |   4.5739    8   0.8020
7, model(mdev)     |  31.5847   36   0.6786 |   0.0093    1   0.9233
8, model(level)    |  31.3841   35   0.6434 |   0.2099    2   0.9004
9, model(level)    |  28.2006   32   0.6594 |   3.3934    5   0.6396
   model(fodev)    |        .   -9        . |        .    .        .
   model(level)    |  28.1268   30   0.5637 |   3.4672    7   0.8387

                   |       Excluding        |       Difference
Moment conditions  |     chi2   df        p |     chi2   df        p
-------------------+------------------------+-----------------------
1, model(fodev)    |  25.4072   29   0.6570 |   2.3428    7   0.9385
2, model(fodev)    |  23.1059   28   0.7277 |   4.6440    8   0.7949
3, model(fodev)    |  22.3165   28   0.7664 |   5.4334    8   0.7104
4, model(fodev)    |  26.3066   29   0.6091 |   1.4433    7   0.9842
5, model(fodev)    |  23.2937   28   0.7182 |   4.4563    8   0.8138
6, model(fodev)    |  22.9352   28   0.7363 |   4.8147    8   0.7772
7, model(mdev)     |  27.4318   35   0.8154 |   0.3181    1   0.5727
8, model(level)    |  25.3010   31   0.7541 |   2.4489    5   0.7842
   nl(noserial)    |  27.1247   35   0.8268 |   0.6253    1   0.4291
   model(fodev)    |        .  -10        . |        .    .        .

Also, what do the dots ‘.’ in the difference-in-Hansen test’s findings refer to?

Your patience, support and effort are highly appreciated.


Sebastian Kripfganz

Join Date: May 2014

Posts: 2540
#485

27 Sep 2022, 05:44

1) You can normally do this, yes.

2) The coefficients are always interpreted for the level model, no matter whether you use the difference GMM or the system GMM estimator.

3.1) It is difficult to separately interpret those short-run adjustment coefficients if there are multiple lags. The first lag's coefficient measures the strength of the adjustment to a shock in the previous period, all else equal. For the second lag, you cannot simply extend this argument because the response to a shock 2 periods before also depends on the cumulative one-period responses. It is an additional delayed impulse on top of the first-order response. For the long-run adjustment, the sum of the two coefficients tells you how quickly the process reverts back to its equilibrium. If the coefficients sum up to 1, the shocks have a permanent effect. If they sum up to 0, the equilibrium is restored instantly because the initial response would be fully counteracted immediately afterwards.
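A minimal sketch of how the sum of the two autoregressive coefficients could be obtained after estimation, with abdata as a stand-in dataset and illustrative instruments. The nlcom line adds the standard long-run effect of a regressor, its coefficient divided by one minus the sum of the autoregressive coefficients, which is not discussed above and is included only as an illustration.

```stata
* Assumes xtdpdgmm is already installed; abdata is only a stand-in dataset
webuse abdata, clear
xtset id year

* Model with two lags of the dependent variable
xtdpdgmm L(0/2).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) nolevel two vce(r)

* Sum of the two autoregressive coefficients, with standard error
lincom _b[L.n] + _b[L2.n]

* Long-run effect of w: short-run coefficient scaled by 1 / (1 - sum of AR coefficients)
nlcom _b[w] / (1 - _b[L.n] - _b[L2.n])
```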

3.2) This coefficient has the standard interpretation. Depending on whether your variables are measured in logs, this could be (semi-)elasticities. It is a "short-run" effect, i.e. telling you the instant response of the dependent variable to a change in that independent variable.

3.3) That would be a delayed effect after accounting for the instantaneous effect in 3.2.

4) It is not required for them to have opposite signs. For stability of the dynamic system, they should normally sum up to a value between 0 and 1. Opposite signs indicate that the initial response "overshoots" and is corrected by the delayed response.

5) There is no requirement here, not even on the sum of those coefficients. Again, opposite signs can indicate an "overshooting" effect.

6) No. The coefficients in the model are interpreted independently of the chosen estimator.

7) Dots mean that the respective test cannot be computed due to insufficient degrees of freedom. In the first table, you would look at the last row labelled "model(level)". First, you check the "Excluding" column, which is a Hansen test for the model excluding those level instruments; i.e. it effectively is a Hansen test for a difference GMM estimator. If this test passes with a sufficiently high p-value, then you move on to the "Difference" column. The latter is the actual Difference-in-Hansen test, which compares the system GMM estimator to the difference GMM estimator (which is why we need to check the Excluding test first; otherwise this would not be a valid comparison). Here, we would reject the validity of the level instruments because the p-value is too small. With the other tables, you would proceed similarly. In tables 2 and 4, you would check the 5th and 8th row, respectively, because those are the only instruments for the level model. In table 3, you would check again the last row because you are interested in testing all level instruments jointly.
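The arithmetic behind the last row of the first table can be checked by hand. This sketch only uses the numbers quoted in post #484 and Stata's built-in chi2tail() function; it needs no data.

```stata
* Overall Hansen statistic (16.1962, df 9) minus the statistic excluding the
* level instruments (8.0920, df 6) gives the Difference statistic (reported: 8.1042, df 3)
display 16.1962 - 8.0920

* p-value of the "Excluding" test, i.e. the Hansen test for difference GMM (reported: 0.2314)
display chi2tail(6, 8.0920)

* p-value of the Difference-in-Hansen test for the level instruments (reported: 0.0439)
display chi2tail(3, 8.1042)
```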

Please note that I will be unlikely to respond to further questions over the next weeks due to heavy teaching loads.

Zainab Mariam

Join Date: Jul 2022

Posts: 51
#486

27 Sep 2022, 17:00

Dear Professor Sebastian,

Many thanks for your beneficial response. I do not know how to thank you, Professor! Indeed, saying "thank you very much" is not enough. Your cooperation and support are priceless. You are an invaluable source of information.

I am very grateful to you for all your support and effort.

I have no further questions. Wish you the best of luck and success in your teaching.

Your patience, support and effort are highly appreciated, Professor!
Joseph L. Staats

Join Date: Aug 2015

Posts: 28
#487

01 Dec 2022, 15:30

Sebastian,

Thanks to your prior help, I am using xtdpdgmm with success for a project I am working on that involves as a dependent variable ratings given to government bonds in various countries. As a small part of this project, I want to show that the raters are more severe when downgrading a bond's rating than they are generous when upgrading a bond's rating, all else being equal. To test this, I have created two dummy variables. One dummy variable is zero (0) if, in a given year, there is no change in rating or there is an upgrade in rating and one (1) if there is a downgrade. The other dummy variable is zero (0) if there is no change in rating or there is a downgrade in rating and one (1) if there is an upgrade. I include a number of control variables that are standard for studying bond ratings. After running system GMM with both dummy variables included, I compare the coefficients for each of the dummy variables to see if they are statistically different in the direction just stated. In your mind, is this method proper? One of my concerns is whether I am using dependent variable factors on both sides of the equation: (1) bond rating as the formal dependent variable; and (2) whether a bond rating goes up or down on the independent variable side of the equation. I would appreciate your thoughts on this.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2540
#488

02 Dec 2022, 03:28

Comparing those dummy coefficients could possibly be a reasonable strategy. Certainly, those dummy variables have to be treated as endogenous variables because they are effectively functions of the dependent variable. But you can possibly still use lags of those dummies as instruments, assuming that positive/negative rating changes are autocorrelated over time.
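A purely syntactic sketch of treating such dummies as endogenous. The bond data are not available here, so abdata serves as a stand-in: the hypothetical dummies up and down are built from changes in the dependent variable n, mimicking dummies that are functions of the dependent variable. The lag ranges are illustrative, and the lincom line assumes the two coefficients have opposite signs, so that a sum of zero corresponds to symmetric effects.

```stata
* Assumes xtdpdgmm is already installed; abdata is only a stand-in dataset
webuse abdata, clear
xtset id year

* Hypothetical stand-ins for the upgrade/downgrade dummies
generate byte up   = (D.n > 0) if !missing(D.n)
generate byte down = (D.n < 0) if !missing(D.n)

* Dummies treated as endogenous: instrumented with their own lags 2 and deeper
xtdpdgmm L(0/1).n w k up down, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) gmm(up down, lag(2 4)) nolevel two vce(r)

* Symmetry check under the opposite-signs assumption: is the sum different from zero?
lincom _b[up] + _b[down]
```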

Last edited by Sebastian Kripfganz; 02 Dec 2022, 03:33.

Joseph L. Staats

Join Date: Aug 2015

Posts: 28
#489

02 Dec 2022, 09:20

Thanks so much. I have a couple of follow-up questions. When using these positive/negative dummy variables, I note that the coefficient of my main independent variable of interest for the project drops a lot and is no longer statistically significant. Is that something I should be concerned with, or is it just a product of including dummy variables that don't really belong in the model except for the specific purpose of testing whether the negative direction of the bond rating change is stronger than the positive direction? Also, when I include the dummy variables in my models, the overidentification test results are fine for both rating companies I am looking at, but underidentification for one company is about p=.09 and for the other company about .300. How concerned about underidentification at these levels should I be?
Sebastian Kripfganz

Join Date: May 2014

Posts: 2540
#490

02 Dec 2022, 09:37

If there are differences in the direction of the effects, then those dummy variables have a place in your model. If you change the model in this or any other way, it is not surprising that the effects of other variables can change as well, especially if this other variable is strongly correlated with the dummy variables. This could go in any direction. If you believe in the model with dummy variables, then the smaller effect of your main variable is not a concern but a feature of this model.

For the underidentification tests, a p-value of 0.3 might indeed be worrying. Using lags of dummy variables as instruments can often lead to weak instruments. I am afraid I don't have a general solution for this problem.

Last edited by Sebastian Kripfganz; 02 Dec 2022, 10:34. Reason: Incorrect statement about underidentification tests amended

Joseph L. Staats

Join Date: Aug 2015

Posts: 28
#491

02 Dec 2022, 10:12

Thanks again. I'm a bit confused about your comments concerning my underidentification results. Don't I want low p-value results? That's what slides 111 and 114 of your 2019 London Stata Conference presentation seem to suggest. Is it possible you thought I had asked about overidentification?
Sebastian Kripfganz

Join Date: May 2014

Posts: 2540
#492

02 Dec 2022, 10:34

I am sorry for the confusion. Yes, you are absolutely right. I have edited my previous post.

Zainab Mariam

Join Date: Jul 2022

Posts: 51
#493

18 Dec 2022, 08:45

Dear Professor Sebastian,

Many thanks for all your prior help, and sorry for coming back!

1) Regarding dummy variables, I have the following questions, please!

1.1) Can time-invariant dummies be classified as exogenous?

1.2) Can time-variant dummies be classified as exogenous?

1.3) Can all dummies be classified as exogenous?

2) My regression model includes the dummy variable (cf) that takes the value of One for the 3 years 2008, 2009, 2010. Thus, my questions are:

2.1) Do I have to put ‘i.’ before the dummy variable (cf)? i.e., do I have to put ‘i.cf’ in the regression?

2.2) Also, do I have to put ‘i.’ before the dummy variable (cf) when instrumenting the dummy variable (cf) i.e., ‘iv(i.cf, …)’?

2.3) Can I consider the dummy variable (cf) as exogenous or as endogenous?

3) Suppose I have two countries (Japan and UK), thus, my regression model includes the dummy variable (mn) that takes the value of One if the firm is in Japan, and Zero otherwise. Thus, my questions are:

3.1) Do I have to put ‘i.’ before the dummy variable (mn)? i.e., do I have to put ‘i.mn’ in the regression?

3.2) Also, do I have to put ‘i.’ before the dummy variable (mn) when instrumenting the dummy variable (mn) i.e., ‘iv(i.mn, …)’?

3.3) Can I consider the dummy variable (mn) as exogenous or as endogenous?

4) For the dummy variable cf (that takes the value of 1 for the 3 years 2008, 2009, 2010) and regarding the dummy variable mn {that takes the value of One if the firm is in Japan, and Zero otherwise, given I have two countries (Japan and UK)}, my question is: do I have to include lags in the iv() option for dummies for (cf) and (mn) when instrumenting these dummies?

5) Do I have to include lags in the iv() option for dummy variables when instrumenting the dummies?

6) Regarding time dummies, I have the following questions, please!

6.1) Why to include the time dummies in the regression model? i.e., what is the rationale behind including the time dummies in the regression model?

6.2) If I am not using the teffects option, then how do I have to include the time dummies explicitly in my regression model? i.e., how do I have to express/type the time dummies explicitly in my regression model? Suppose the research’s time period is 2000-2020.

6.3) My regression model includes the dummy variable (cf) that takes the value of One for the 3 years 2008, 2009, 2010. Thus, my question is: Is it correct to include both the dummy variable (cf) and the time dummies in the same regression model (in the same code)? If so, how do I have to express/type both the dummy variable (cf) and the time dummies using the teffects option and without using the teffects option?

7) Tables of research and articles show ‘Year Effects’ and ‘Country Effects’. Regarding the ‘Year Effects’ and the ‘Country Effects’, those tables show ‘Yes’, and sometimes show ‘No’. Thus, my questions are:

7.1) How can I get ‘Yes’ ‘No’ from my regression using your command xtdpdgmm? i.e., what do I have to apply/perform in order to obtain/get ‘Yes’ ‘No’ regarding the ‘Year Effects’ and the ‘Country Effects’? Is there any option/expression I have to include in the regression model to get ‘Yes’ ‘No’ regarding the ‘Year Effects’ and the ‘Country Effects’? Is there any option/test/expression I have to apply/perform to get ‘Yes’ ‘No’ regarding the ‘Year Effects’ and the ‘Country Effects’?

7.2) How to know/decide whether there is ‘Year Effects’ or there is no ‘Year Effects’? Also, how to know/decide whether there is ‘Country Effects’ or there is no ‘Country Effects’?

Your patience, support and effort are highly appreciated, Professor!
Sebastian Kripfganz

Join Date: May 2014

Posts: 2540
#494

03 Jan 2023, 12:37

1.1) We need to be careful here what type of "exogeneity" we have in mind. In the dynamic panel data literature, exogeneity typically refers to the stochastic relationship between the respective variables and the idiosyncratic error component. Thus, we typically call a variable "strictly exogenous" if it is uncorrelated with the idiosyncratic error component for all time periods, even though it might be correlated with the unobserved group-specific error component (aka "fixed effects"). Strictly speaking, the latter correlation still turns those variables endogenous in the classical sense. Now, when it comes to time-invariant regressors, they may or may not be correlated with either of the error components, although typically we would assume them to be uncorrelated with the idiosyncratic time-varying error component. In this regard, time-invariant regressors would be strictly exogenous in the dynamic panel data sense, but this is not of much help: we cannot use the typical instruments (lagged differences for levels or lagged levels for differences) because the differences of time-invariant regressors vanish.

1.2) The question whether a dummy variable is exogenous or not is no different to the same question for any other regressor. It may be exogenous, predetermined, or endogenous. It may be correlated with the group-specific effects or not.

1.3) Dummy variables are often treated as exogenous, but this should not be done automatically. Whether you can treat a dummy variable as uncorrelated with the group-specific effects typically depends on what unobserved characteristics you think those group-specific effects represent. Considering time dummies, there is usually no reason not to treat them as exogenous; but we would not give them any structural interpretation anyway.

2.1) Without the factor-variable prefix i., you would include a linear time trend instead of separate time dummies for every year. This would be fine if there is such a linear trend in the time effects indeed.

2.2) If you use i. for the regressors, you should also use i. for the instruments.
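To make the difference between 2.1) and 2.2) concrete, here is a hedged sketch based on the command structure from the presentation slides quoted in this thread. It assumes the abdata example dataset with a time variable called year; your own time variable may have a different name.

```stata
* Assumption: abdata example dataset, time variable named year
webuse abdata, clear
xtset id year

* Without i.: year enters as a single regressor, i.e. a linear time trend
xtdpdgmm L(0/1).n w k year, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(year, model(level)) two vce(r)

* With i.: separate dummies for every year, using i. in both the regressor list and the instrument list
xtdpdgmm L(0/1).n w k i.year, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(i.year, model(level)) two vce(r)
```

In the second command, some year dummies will be omitted automatically because of collinearity.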

2.3) Time dummies are usually treated as exogenous.

3.1) For a binary dummy variable which takes only values 1 or 0, the i. prefix is optional. The results will be the same with or without the prefix.

3.2) See above.

3.3) This depends on what you think the unobserved group-specific error component represents and whether you want to give the country dummy a structural interpretation. If it should be a Japan-specific effect conditional on some other unobserved time-invariant characteristic which differs systematically across countries, then you need to find an alternative instrument which also differs systematically across countries but is uncorrelated with the unobserved characteristic you want to hold fixed. Normally, you would not care too much about such a structural interpretation, and then you can just treat the country dummy as exogenous.

4) You don't normally have to include lags of those dummies. Normally, those lags would be dropped because of collinearity anyway.

5) Same as in 4).

6.1) Time dummies are often included to account for global shocks which affect all firms simultaneously. If a global shock affects both the dependent and the independent variables, then omitting the time dummies could lead to spuriously significant coefficient estimates.

6.2) You would include i.cf in the list of independent variables, together with option iv(i.cf).

6.3) Including both cf and i.cf leads to a problem of perfect collinearity. There is no need (and usually no reason) to include cf once you included i.cf (or teffects).
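As a sketch of the alternative mentioned in 6.3), the teffects option adds the time dummies for you, so that neither the time variable nor its i. version has to be listed by hand. The example assumes the abdata example dataset; the remaining options follow the commands quoted elsewhere in this thread.

```stata
* Assumption: abdata example dataset with time variable year
webuse abdata, clear
xtset id year

* Option teffects includes the time dummies; do not additionally list year or i.year
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) teffects two vce(r)
```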

7.1) You need to include the "yes" or "no" manually in the tables of your research paper. The command does not produce anything like that. If you have included time dummies (i.cf or option teffects), you can write "yes"; similarly for country dummies. People still write "yes" even if those dummies are not statistically significant. It is usually just an indication that those dummies are included in the model.

7.2) Whether there are time effects or country effects could be assessed by checking their (joint) statistical significance; but again, the "yes"/"no" in 7.1) is typically not based on such a test.
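One possible way to carry out such a joint test is Stata's testparm command after estimation. This is only a sketch under the assumption that the time dummies were included as i.year with the abdata example dataset; with your data, replace year by your own time variable.

```stata
* Assumption: abdata example dataset with time variable year
webuse abdata, clear
xtset id year

* Estimate the model with explicit year dummies
xtdpdgmm L(0/1).n w k i.year, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(i.year, model(level)) two vce(r)

* Wald test of the joint null hypothesis that all included year dummies have zero coefficients
testparm i.year
```

A rejection of the null hypothesis would be evidence in favor of time effects, but as said in 7.1) the "yes"/"no" in a table usually just reports whether the dummies were included.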

https://twitter.com/Kripfganz
Comment
Zainab Mariam

Join Date: Jul 2022

Posts: 51
#495

04 Jan 2023, 11:00

Dear Professor Sebastian,

Thank you very much for your valuable reply. I am very grateful to you for all your support and effort, professor! If I may, I would like to follow up on your response.

1) According to the table on slide 122 of your 2019 London Stata Conference presentation, can we say that there is a ‘Year effect’ or that there is no ‘Year effect’? Likewise, according to the table on slide 81 of the same presentation, can we say that there is a ‘Year effect’ or that there is no ‘Year effect’?

2) According to the table on slide 122 of your 2019 London Stata Conference presentation, how should I interpret or comment on the findings for ‘year’?

3) According to the table on slide 81 of your 2019 London Stata Conference presentation, how should I interpret or comment on the findings for ‘year’?

4) Your code on slide 86 of your 2019 London Stata Conference presentation is: xtdpdgmm L(0/1).n w k i.ind, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(i.ind, model(level)) nl(noserial) teffects igmm vce(r)

When you write ‘iv(i.ind, model(level))’, I have the following questions, please!

4.1) Are you instrumenting the industry dummies in the differenced model or in the level model?

4.2) Are you using the differenced instruments or the level instruments for the industry dummies?

5) Your code on slide 75 of your 2019 London Stata Conference presentation is: xtdpdgmm L(0/1).n w k yr1980-yr1982, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(yr1980-yr1982, model(level)) two vce(r)

When you write ‘iv(yr1980-yr1982, model(level))’, I have the following questions:

5.1) Are you instrumenting the year dummies in the differenced model or in the level model?

5.2) Are you using the differenced instruments or the level instruments for the year dummies?

6) Your code on slide 75 of your 2019 London Stata Conference presentation is: xtdpdgmm L(0/1).n w k yr1980-yr1982, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(yr1980-yr1982, diff) two vce(r)

When you write ‘iv(yr1980-yr1982, diff)’, I have the following questions:

6.1) Are you instrumenting the year dummies in the differenced model or in the level model?

6.2) Are you using the differenced instruments or the level instruments for the year dummies?

7) Your code on slide 77 of your 2019 London Stata Conference presentation is: xtdpdgmm L(0/1).n w k yr1980-yr1982, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(yr1980-yr1982, diff) iv(yr1980-yr1982, model(level)) two vce(r)

When you write ‘iv(yr1980-yr1982, diff) iv(yr1980-yr1982, model(level))’, I have the following questions, please!

7.1) Are you instrumenting the year dummies in the differenced model or in the level model or in both models?

7.2) Are you using the differenced instruments or the level instruments or both instruments for the year dummies?

8) Your code on slide 38 of your 2019 London Stata Conference presentation is: xtdpdgmm L(0/1).n w k, collapse gmm(n, lag(2 4) model(diff)) gmm(w k, lag(1 3) model(diff)) gmm(n, lag(1 1) diff) gmm(w k, lag(0 0) diff) two vce(r)

When you write ‘gmm(n, lag(1 1) diff) gmm(w k, lag(0 0) diff)’, I have the following questions, please!

8.1) Are you instrumenting the variables n, w, k in the differenced model or in the level model i.e., are you instrumenting the differenced model or the level model?

8.2) Are you using the differenced instruments or the level instruments for the variables n, w, k?

Your patience, support and effort are highly appreciated, Professor!
Comment
