
  • Dear Prof. Sebastian Kripfganz

The Hausman test indicates that the random-effects estimator is preferable for estimating my model. In this case, would this affect anything when specifying my sys-GMM estimator?

    Thanks



• I would be very grateful if somebody could answer my questions in #600. Thank you in advance for your help.

      Best,

      Taka



      • Taka Sakamoto
If you assume that smei4 is an exogenous variable, which is also uncorrelated with any group-specific effects as under a conventional random-effects assumption, then the use of iv(smei4, model(level)) is fine. Lagging the instrument is only necessary if there is a concern about contemporaneous correlation (endogeneity) with the idiosyncratic error term. There is no general need to use lag(0 1), but the extra lagged instrument could potentially provide additional identification strength for some of the other regressors. So, yes, this could be done. Ultimately, it all depends on what you assume about the correlations of your variables with the group-specific and the idiosyncratic error components.
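
For concreteness, a minimal sketch along these lines (y, x, and id are placeholders for the dependent variable, the other regressors, and the panel identifier from #600; only the treatment of smei4 is the point here):

Code:
* smei4 treated as exogenous and uncorrelated with the group-specific effects
xtdpdgmm L(0/1).y x smei4, model(diff) gmm(y x, lag(2 4) collapse) ///
    iv(smei4, model(level)) two vce(cluster id)
* with the extra lagged instrument discussed above:
* ... iv(smei4, lag(0 1) model(level)) ...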

        In general, the iv() option serves the same purpose as specifying instrumental variables with other estimation commands, such as ivregress.

        You might find it easier to use the xtdpdgmmfe command, which is part of the same package but has easier syntax. However, it does not support random-effects type assumptions, as would be required for the above example.

        You might also find the following YouTube video useful, where I further explain how to estimate dynamic panel data models with GMM in Stata: https://www.youtube.com/watch?v=5EnP...dvBvwClYxkjp0-

        Sarah Magd
        I assume you have used the traditional Hausman test for the static model. This requires that all regressors are strictly exogenous, which rules out dynamic models. Under these strong assumptions, no GMM estimator is needed. If you want to account for potential endogeneity, then the traditional Hausman test is not applicable and the results you obtained should not be trusted.
        Last edited by Sebastian Kripfganz; 24 Aug 2023, 05:49.
        https://twitter.com/Kripfganz



• Thank you, Sebastian. That is a big help, and I appreciate your generous assistance.

          Taka



• Thanks, Prof. Sebastian Kripfganz.

  Could you please explain your answer in a bit more detail?

  I applied the robust version of the Hausman test to my static model, and it indicates that the random-effects estimator should be used. However, in the second step, I follow a dynamic specification for my analysis. Also, my independent variables suffer from potential endogeneity due to reverse causality. Given these two points, can I use the sys-GMM estimator to estimate the dynamic model?

            Thanks



            • If the dynamic model is the correct model, then any results you obtain for the static model are not reliable because they suffer from an omitted-variable bias. Consequently, the Hausman test for the static model is not reliable. The same is true when some independent variables suffer from endogeneity. You can estimate a dynamic model by GMM in this case, yes, but the Hausman test for the static model has no relevance.
              https://twitter.com/Kripfganz



              • Dear Prof. Sebastian Kripfganz ,

                I am employing GMM estimation using the xtdpdgmm command. Following is my code:

                xtdpdgmm L(0/1).EPS L(0/1).ROAA TDtoTA SalesGrowth lnTA l.EnvDisclScore i.t, model(diff) gmm(L(0/1).EPS L(0/1).ROAA, lag(2 3) collapse) iv(TDtoTA SalesGrowth lnTA CFOtoSales CapitalIntensity l.EnvDisclScore, diff model(level)) iv(i.t) igmm vce(cluster id)

My endogenous variables are L(0/1).EPS and L(0/1).ROAA, while the other variables are exogenous. I have the following questions:

                1. Do you think my code is correct?
2. I get very different results when I include the model(level) suboption in iv() and when I omit it. Could you please help me understand what it means to include the 'diff' suboption alongside the 'model(level)' suboption? Isn't the first-difference transformation done automatically by xtdpdgmm when the model(diff) option is specified?
                3. Finally, are the variables mentioned in gmm() automatically differenced?

                Thank you very much for your time.

                Kind regards,
                Rupali



                • 1. It is slightly unusual not to specify the exogenous variables for the first-differenced model as well. I would add:
                  gmm(TDtoTA SalesGrowth lnTA CFOtoSales CapitalIntensity l.EnvDisclScore, lag(1 3) collapse)

2. The diff suboption and the model(diff) option do different things: diff first-differences the instruments themselves, while model(diff) specifies that they are used as instruments for the model in first differences.

                  3. There is no automatic differencing. You need to specify explicitly what you want the command to do.
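
For illustration, here is your command with the suggested gmm() line added (just a sketch; everything else is left exactly as you specified it, and the comments after /// only flag what each piece does):

Code:
xtdpdgmm L(0/1).EPS L(0/1).ROAA TDtoTA SalesGrowth lnTA l.EnvDisclScore i.t, ///
    model(diff)                                    /// instruments refer to the first-differenced model by default
    gmm(L(0/1).EPS L(0/1).ROAA, lag(2 3) collapse) ///
    gmm(TDtoTA SalesGrowth lnTA CFOtoSales CapitalIntensity l.EnvDisclScore, lag(1 3) collapse) /// suggested addition (point 1)
    iv(TDtoTA SalesGrowth lnTA CFOtoSales CapitalIntensity l.EnvDisclScore, diff model(level))  /// diff: difference the instruments; model(level): use them for the level model (point 2)
    iv(i.t) igmm vce(cluster id)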
                  https://twitter.com/Kripfganz



                  • Hi Sebastian,

I hope you are doing well. I have a few concerns regarding system GMM and the interpretation of marginal plots and contrast effects. I used the following model:

D_{i,t} = β_1 D_{i,t-1} + β_2 D²_{i,t-1} + β_3 FCR + β_4 Crisis + β_5 ΔCA_{i,t} + β_6 ΔCA_{i,t-1} + β_7 CS_{i,t} + β_8 CS_{i,t-1} + β_9 SI_{i,t} + β_{10} SI_{i,t-1} + β_{11} L_{i,t} + β_{12} L_{i,t-1} + β_{13} B_{i,t-1} + β_{14} G_{i,t} + β_{15} FA + β_{16} EZ + β_{17} (FCR × ΔCA_{i,t}) + β_{18} (FCR × Crisis × ΔCA_{i,t}) + β_{19} (FCR × FA × ΔCA_{i,t}) + β_{20} (FCR × EZ × ΔCA_{i,t}) + α_i + d_t + ε_{i,t}

Here, FCR, CRISIS, FA, and EZ are factor/binary variables, while the rest are continuous. Following the literature, I treated all financial variables as potentially endogenous. Furthermore, to address endogeneity concerns, I used levels lagged from t-2 to t-5 as instruments for the equation in differences, as both the overidentification and serial-correlation tests were satisfied at the 5th lag. For the equation in levels, I used lagged differences dated t-1 as instruments. Both firm fixed effects and time effects are included.

                    System GMM command

                    xtdpdgmm L(0/1).D D^2 I.FCR I.CRISIS CA CA_L1 CS CS_L1 SI SI_L1 L L_L1 B G FA EZ I.FCR#CA I.FCR#C.CA#I.CRISIS I.FCR#I.FA#C.CA I.FCR#I.EZ#C.CA , model(diff) collapse gmm(D_L1 CA CS CS_L1 SI L B G FCR FA , lag(2 5)) gmm(D^2 EZ I.FCR#CA I.FCR#C.CA#I.CRISIS I.CRISIS I.FCR#I.FA#C.CA I.FCR#I.EZ#C.CA , lag(1 2)) gmm(D_L1 CA CS CS_L1 SI L B G FCR FA , lag(1 1) diff model(level)) gmm(D^2 EZ I.FCR#CA I.FCR#C.CA#I.CRISIS I.CRISIS I.FCR#I.FA#C.CA I.FCR#I.EZ#C.CA, lag(0 1) diff model(level)) teffects two vce(r)

Can you please advise whether the system GMM command is in accordance with the above description of the model?


Second, I have a few questions about the marginal plots and the contrast command. First, I ran the margins command for an interaction of two factor variables (FCR and CRISIS) and one continuous variable (CA), which leads to the following result:

                    margins fcr, at (ca = (-2 (0.5) 2) crisis=(0 1))
                    marginsplot, recast(line) x(ca) by(crisis) yline(0)

                    1ST CRISIS EFFECTS.gph

Because the confidence intervals for fcr overlap, does that mean there is no significant difference between the two categories of fcr in the effect of ca on D (the dependent variable)? Also, am I right to interpret that, for crisis=0, the three lines for fcr=1 that lie above the red line are statistically significant and different from zero, and that the same holds for crisis=1, while for fcr=1, 6 of the 9 lines show statistically significant results, and the same again goes for crisis=1? So there is no effect of crisis, and no statistical difference between fcr=0 and fcr=1 because the confidence intervals overlap?

I also ran the margins command with a contrast operator to compare the effects, taking fcr=1 as the reference group. The command is as follows:

                    margins rb1.fcr, at (ca = (-2 (0.5) 2) crisis=(0 1))
                    marginsplot, recast(line) x(ca) by(crisis) yline(0)

                    Results show the following effects:

                    fcr 1 CRISIS EFFECTS.gph

Further, I also ran the contrast with fcr=0 as the reference category:

                    margins r.fcr, at (ca = (-2 (0.5) 2) crisis=(0 1))
                    marginsplot, recast(line) x(ca) by(crisis) yline(0)

                    fcr 0 CRISIS EFFECTS.gph

Can you please advise whether I am doing this correctly and whether the interpretation is right?

Furthermore, when I ran the system GMM, the results show a negative relationship between ca and D, but when the interaction term ca#fcr#crisis is included, the sign is positive. So, given the graphs (fcr 1 CRISIS EFFECTS.gph and fcr 0 CRISIS EFFECTS.gph), how do I interpret these two opposite-direction FCR graphs?

(fcr and crisis are dummy/factor variables, while ca is a continuous variable.)

                    Results of GMM are attached for reference: Generalized method of moments estimation.pdf

Please advise on the interpretation of the results and the overlapping CIs.
                    Last edited by Zeenat Murtaza; 01 Sep 2023, 17:34.



                    • Dear Prof. Sebastian Kripfganz,

Thank you very much for your previous response. I am using xtdpdgmm and the GMM technique for the first time, and I would really appreciate your help in understanding whether I am doing it correctly.

In my model, I consider EPS and ROAA to be endogenous. The predetermined variables are l.ROAA l.EPS TDtoTA SalesGrowth lnTA CFOtoSales CapitalIntensity l.EnvDisclScore, and the exogenous variables are the time dummies.

I intend to incorporate fixed effects via the first-difference transformation, and I wish to address the correlation between the error term and the lagged EPS by using its lags as instruments.

                      I am using the following code:

                      xtdpdgmm L(0/1).EPS L(0/1).ROAA TDtoTA SalesGrowth lnTA l.EnvDisclScore i.t, model(diff) collapse gmm(EPS ROAA, lag(2 3)) gmm(EPS ROAA, lag(1 1) diff model(level)) gmm(l.ROAA l.EPS TDtoTA SalesGrowth lnTA CFOtoSales CapitalIntensity l.EnvDisclScore, lag(1 3)) gmm(l.ROAA l.EPS TDtoTA SalesGrowth lnTA CFOtoSales CapitalIntensity l.EnvDisclScore i.t, lag(0 0) diff model(level)) iv(i.t) two vce(cluster id) small overid

                      And I obtain the following output:
Code:
                                 WC-Robust
EPS                 Coefficient   std. err.       t    P>|t|     [95% conf. interval]
--------------------------------------------------------------------------------------
EPS
  L1.                 1.009836    0.0749899   13.47    0.000     0.8620847   1.157588
ROAA                  0.2411051   0.1097758    2.20    0.029     0.0248154   0.4573949
L1.ROAA               0.0063018   0.0585874    0.11    0.914    -0.1091322   0.1217357
TDtoTA                0.0151328   0.0461458    0.33    0.743    -0.0757876   0.1060532
SalesGrowth          -0.0158857   0.0163982   -0.97    0.334    -0.0481948   0.0164234
lnTA                  0.9796653   1.435696     0.68    0.496    -1.849068    3.808398
EnvDisclScore
  L1.                 0.0485014   0.0524756    0.92    0.356    -0.0548905   0.1518933
t
  2011                0           (empty)
  2012                1.378857    1.268998     1.09    0.278    -1.121434    3.879147
  2013               -0.3298332   1.159062    -0.28    0.776    -2.613518    1.953851
  2014               -1.774592    1.015847    -1.75    0.082    -3.776101    0.2269177
  2015                0.5190836   1.136322     0.46    0.648    -1.719796    2.757963
  2016                0.6554844   0.9437869    0.69    0.488    -1.204046    2.515015
  2017               -0.2927053   0.8384453   -0.35    0.727    -1.944683    1.359272
  2018               -2.61427     1.03919     -2.52    0.013    -4.661772   -0.5667682
  2019               -0.5594183   0.8990458   -0.62    0.534    -2.330796    1.21196
  2020                0           (omitted)
_cons               -11.50282    14.16942     -0.81    0.418   -39.42064    16.415
                      Instruments corresponding to the linear moment conditions:
                      1, model(diff):
                      L2.EPS L3.EPS L2.ROAA L3.ROAA
                      2, model(level):
                      L1.D.EPS L1.D.ROAA
                      3, model(diff):
                      L3.L.ROAA L3.L.EPS L1.TDtoTA L2.TDtoTA L3.TDtoTA L1.SalesGrowth
                      L2.SalesGrowth L3.SalesGrowth L1.lnTA L2.lnTA L3.lnTA L1.CFOtoSales
                      L2.CFOtoSales L3.CFOtoSales L1.CapitalIntensity L2.CapitalIntensity
                      L3.CapitalIntensity L1.L.EnvDisclScore L2.L.EnvDisclScore
                      L3.L.EnvDisclScore
                      4, model(level):
                      D.TDtoTA D.SalesGrowth D.lnTA D.CFOtoSales D.CapitalIntensity
                      D.L.EnvDisclScore
                      5, model(diff):
                      2013bn.t 2014.t 2015.t 2016.t 2017.t 2018.t 2019.t 2020.t
                      6, model(level):
                      _cons

I just want to make sure that the code above does what I intend. I have the following doubts and would greatly appreciate your help with them:
1. Does the model above specify the endogenous, predetermined, and exogenous variables correctly?
2. I have introduced 'CFOtoSales' and 'CapitalIntensity' as external instruments because I wish to instrument for ROAA (which I consider endogenous). Am I correct in doing that?
3. When I use iv(i.t), am I right in thinking that i.t acts as an instrument for all variables?
4. Is my model first-differenced?
5. The text below the table shows that lagged values of the endogenous variables act as instruments for the differenced model and that the level model has differenced variables as instruments (which I have specified). Is that the correct way to go about it?
Apart from the above questions, could you please help me understand the benefit of specifying instruments for both the level and the difference model?

                      Thank you very much in advance!

                      Kind regards,
                      Rupali



                      • Dear Sebastian Kripfganz,

                        I have a question regarding the Sequential model selection process you presented at the 2019 London Stata Conference.

In step 3, you recommend that one "remove lags or interaction effects with (very) high p-values in individual or joint significance tests". What if, after removing all lags of one regressor because of their high individual p-values, the remaining contemporaneous effect of that regressor still has a high p-value? Should it then be removed entirely? If yes, what should be done with the respective IVs?

                        I really appreciate any help you can provide.



                        • Zeenat, Rupali: Apologies; I do not currently have the time to respond to longer queries.

                          Jupp: If this regressor is not one of your main regressors of interest, you could remove it entirely if all coefficients have high p-values. In principle, you could retain the respective instruments if you believe that they are strong instruments for the other regressors; otherwise, you can remove them as well.
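
Schematically, and only as a sketch (y, x1, and x_drop are hypothetical variable names, with x_drop standing for the regressor that is removed):

Code:
* before: x_drop was a regressor with its own GMM-type instruments
* xtdpdgmm L(0/1).y x1 x_drop, model(diff) gmm(y x1, lag(2 4) collapse) gmm(x_drop, lag(1 3) collapse) ...
* after: x_drop dropped as a regressor; keep gmm(x_drop, ...) only if you believe
* these are strong instruments for the remaining regressors, otherwise remove it as well
xtdpdgmm L(0/1).y x1, model(diff) gmm(y x1, lag(2 4) collapse) ///
    gmm(x_drop, lag(1 3) collapse) two vce(cluster id)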
                          https://twitter.com/Kripfganz



                          • Dear Sebastian Kripfganz,

Thank you so much for your help. I have one more question that I hope may add to the collection of advice you have already given in this thread.

I am struggling to find a suitable model specification. This is a summary of my sample:
• N = ~25,000 households
• T = 12 years
• ~150,000 observations
• Unbalanced panel:
  • Minimum observations per household: 1
  • Maximum observations per household: 12
                            The goal is to estimate the relationship between consumption of a good (dependent variable) and (1) price of this good, (2) household income, and (3) various household covariates such as household size, employment status of household head, dwelling type, ... + federal state and time dummies. Some of the household covariates are binary or categorical variables. I am also interested in the interaction effect of price and household income.

                            I tested fixed and random effects models as benchmarks with the following specifications:
                            Code:
L(0/1).y c.x_price##c.x_income X_covariates time_effects
I would like to use the same combination of regressors in the GMM model so that I can compare the results with the fixed- and random-effects models.

I started the model selection process. However, it seems impractical in my case, as a single estimation run takes ~20 minutes. Also, I struggle to find an initial candidate model that passes the specification tests. I assume this is because, with the very large N, the specification tests can already detect relatively small deviations from the null hypotheses.

                            My questions are:
                            • Do you have any advice on how to configure IV lags and how to specify the variable types (endogenous/predetermined/exogenous) for the different variables?
                            • Would you agree that I should use forward orthogonal deviations because of the unbalanced panel?
                            • Would you agree that I should use the two-step estimator because of the large sample size?
                            I really appreciate any help you can provide.

                            Jupp



• Originally posted by Jupp Peters:
                              Also, I struggle to find an initial candidate model that passes the specification tests. I assume this is due to the very large N and that the specification tests can already detect relatively small deviations from the null hypotheses.
                              This would normally be my answer, yes.

                              If statistical tests do not seem to be helpful, you might have to resort to economic theory as a guide for model specification / variable classification.

Another possibility might be to consider smaller subsets of the data, where the Hansen test has less power. This might sound odd, but if the Hansen test no longer detects small deviations from the null hypothesis, it might become easier to single out models with more severe misspecification.
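
For instance, one crude way to draw such a subset (purely illustrative; hh_id and year are placeholder names for the household identifier and the time variable):

Code:
* keep a random ~20% of households before re-running the candidate specification
set seed 12345
preserve
by hh_id (year), sort: gen double u = runiform() if _n == 1
by hh_id: replace u = u[1]
keep if u < 0.2
* ... re-run the xtdpdgmm specification and the Hansen test here ...
restore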

                              Forward-orthogonal deviations indeed seem to be appropriate given the unbalanced nature of the panel.

                              With such a large sample size, the two-step estimator might deliver substantial efficiency gains and is therefore highly recommended.
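
Putting the last two points together, a possible starting point might look like the sketch below. The variable names mirror your benchmark specification (y, x_price, x_income, X_covariates), hh_id is a placeholder for the household identifier, and the classification of price and income as endogenous as well as the lag ranges are only assumptions to be settled during the selection process:

Code:
xtdpdgmm L(0/1).y c.x_price##c.x_income X_covariates,              ///
    model(fodev)                                                   /// forward-orthogonal deviations for the unbalanced panel
    gmm(y, lag(2 4) collapse)                                      ///
    gmm(x_price x_income c.x_price#c.x_income, lag(1 3) collapse)  /// treated as endogenous here (assumption)
    iv(X_covariates)                                               /// covariates treated as strictly exogenous (assumption)
    teffects two vce(cluster hh_id)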
                              https://twitter.com/Kripfganz



• Sebastian Kripfganz Dear Sebastian, I have a question for you about the Sargan-Hansen test. How should I interpret diverging 2-step and 3-step weighting-matrix results, like the statistics below?


                                2-step moment functions, 2-step weighting matrix chi2(2) = 3.1408
                                Prob > chi2 = 0.2080

                                2-step moment functions, 3-step weighting matrix chi2(2) = 11.5937
                                Prob > chi2 = 0.0030

                                Thanks in advance!

                                Best regards,
                                Nursena

