
  • If the variables w k are strictly exogenous (with respect to the idiosyncratic error component), then serial correlation in the errors does not affect the validity of these variables (or any of their lags) as instruments. If there is serial error correlation because relevant lags of w k were omitted as regressors, e.g. due to delayed direct effects of L2.(w k), then w k would not be strictly exogenous in the first place in a model from which those lags are omitted. Thus, saying that w k are strictly exogenous is effectively also a statement about the correct specification of the model dynamics.

    In this regard, I wonder what your motivation is for including L.(w k) as regressors instead of w k. Sometimes, people do this to avert simultaneous feedback from the dependent variable. In that case, however, L.(w k) may no longer be endogenous, but they cannot be strictly exogenous either; at best, they would be predetermined (weakly exogenous). For predetermined variables, serial error correlation does matter for the validity of the instruments. Probably even more importantly, simply lagging the regressors for this purpose typically creates model misspecification, which then puts the whole analysis in jeopardy.
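
    Purely for illustration, a minimal sketch of how the two assumptions might be written in xtdpdgmm syntax (y stands in for the dependent variable, and the lag ranges are arbitrary rather than a recommendation):

    Code:
    * w k as regressors, treated as strictly exogenous: standard instruments,
    * valid regardless of serial correlation in the idiosyncratic errors
    xtdpdgmm L(0/1).y w k, model(difference) collapse gmm(y, lag(2 4)) iv(w k) vce(robust)

    * L.(w k) as regressors, treated as predetermined: GMM-type instruments starting
    * from the second lag of w k; their validity now requires serially uncorrelated errors
    xtdpdgmm L(0/1).y L.(w k), model(difference) collapse gmm(y, lag(2 4)) gmm(w k, lag(2 4)) vce(robust)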
    https://twitter.com/Kripfganz



    • Dear Professor @Sebastian Kripfganz


      Understood. This is helpful. Thank you!



      • Dear Sebastian,

        I am going to use a micro dataset for an upcoming study. However, this dataset consists of a new random sample for each year; essentially, it is a pooled dataset rather than panel data. Moreover, I suspect an endogeneity issue between the dependent and independent variables in the model I am aiming to estimate. The dataset covers roughly 100,000 units per year over seven years.

        Given that the xtdpdgmm command is designed for linear (dynamic) panel data, do you recommend it for analyzing a pooled dataset?



        • What you are describing is a data set with repeated cross sections. xtdpdgmm requires the data to be declared as panel data; in particular, a panel identifier variable needs to be declared with xtset. This may not be possible with the type of data you have.
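
          A minimal sketch of the declaration step that xtdpdgmm presupposes (id, year, y, and x are hypothetical variable names):

          Code:
          * a unit identifier and a time variable must be declared first
          xtset id year
          xtdpdgmm L(0/1).y x, model(difference) collapse gmm(y, lag(2 4)) gmm(x, lag(1 3)) vce(robust)

          With repeated cross sections there is typically no identifier that links the same unit across years, which is why this declaration may not be possible.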
          https://twitter.com/Kripfganz



          • Thank you very much for the quick response.



            • Dear Prof. Sebastian Kripfganz

              1) Can we use sys-GMM with a sample of 28 countries and 20 years? Is this considered a big T, or can we still use sys-GMM?
              2) When we define the lagged dependent variable as predetermined, its estimated coefficient is 0.542. However, when we specify it as endogenous, the estimate becomes 0.745. Does the coefficient of the lagged dependent variable have to be close to 1?

              Could you please guide us on these two points.


              Thanks



                1. I would call this a small-N, moderately small-T sample. You probably do not need to be concerned much with asymptotic efficiency; it might thus be a good idea to use the one-step instead of the two-step estimator, to avoid having to estimate an optimal weighting matrix. Also, use the available options (collapsing and lag restrictions) to limit the number of instruments; see the stylized example below. You could still use the system GMM estimator if you can theoretically justify its assumptions. With such a data set, testing these assumptions empirically is challenging and probably not very reliable.
                2. From the outset, we do not know what the true value of the coefficient of the lagged dependent variable is; that is why we are estimating it. There can be different reasons for the observed differences: (i) sampling variability due to the small data set; (ii) endogeneity of the lagged dependent variable (due to neglected serial correlation in the error term) such that the model treating it as predetermined is misspecified; (iii) weak instruments when treating the lagged dependent variable as endogenous, to name a few.
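
                Purely as a stylized illustration of point 1 (placeholder variable names y and x, arbitrary lag ranges), a collapsed, lag-restricted one-step system GMM specification could look as follows:

                Code:
                * one-step is the default (no twostep option); collapse and the lag() limits keep the instrument count small
                xtdpdgmm L(0/1).y x, model(difference) collapse gmm(y, lag(2 4)) gmm(x, lag(1 3)) ///
                    gmm(y, lag(1 1) diff model(level)) gmm(x, lag(0 0) diff model(level)) vce(robust)
                estat overid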
                https://twitter.com/Kripfganz



                • Dear Prof. Sebastian Kripfganz

                  Thanks for your constructive replies.

                  1. Are there any issues if we restrict our sample to 28 countries and 13 years? We estimate our model with a one-step system GMM estimator on this sample. Could you please let us know whether any issues remain with this setup?
                  2. Given this sample, can we use diff-GMM for robustness checks, or would you recommend another estimator for robustness?



                    1. N=28 is still small; therefore, my previous comments still apply.
                    2. Yes, you can (and probably should) use a diff-GMM estimator as a robustness check (again, preferably one-step only).
                    https://twitter.com/Kripfganz



                    • Dear Prof. Sebastian Kripfganz

                      Thanks for your constructive replies.

                      Does the specification of the system GMM have to be the same as the specification of the diff-GMM? For example, if we use lags(1 3) in the system GMM, do we have to specify the same range of lags in the diff-GMM, or can the two estimators use different lag ranges?







                      • The lags for those instruments that refer to the first-differenced model should be the same for the two estimators; otherwise, the results are harder to compare.
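
                        As a stylized sketch of what this means in practice (placeholder variable names and arbitrary lag ranges), the gmm() options that refer to the first-differenced model are kept identical, and the system GMM merely adds the level-model instruments:

                        Code:
                        * difference GMM
                        xtdpdgmm L(0/1).y x, model(difference) collapse gmm(y, lag(2 4)) gmm(x, lag(1 3)) vce(robust)

                        * system GMM with the same lag ranges for the first-differenced model
                        xtdpdgmm L(0/1).y x, model(difference) collapse gmm(y, lag(2 4)) gmm(x, lag(1 3)) ///
                            gmm(y, lag(1 1) diff model(level)) gmm(x, lag(0 0) diff model(level)) vce(robust)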
                        https://twitter.com/Kripfganz



                        • A new update is available for xtdpdgmm on my personal website. Version 2.6.6 fixes a few bugs in the postestimation command estat serialpm.

                          Code:
                          net install xtdpdgmm, from(https://www.kripfganz.de/stata) replace
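
                          For anyone updating, a minimal usage sketch of the affected postestimation command (the estimation command and variable names are placeholders):

                          Code:
                          xtdpdgmm L(0/1).y x, model(difference) collapse gmm(y, lag(2 4)) gmm(x, lag(1 3)) vce(robust)
                          estat serialpm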
                          https://twitter.com/Kripfganz



                          • Dear Sebastian, I have some cross-sectional (categorical) data collected from a questionnaire in 2021, which I would like to integrate into a longitudinal dataset collected at different points in time over a period of 6 years, from 2015 to 2020. Given that the sample is the same for both data collections, and that my categorical variables (institutional support, corporate governance) are dynamic rather than static, I would like to know whether integrating them into my panel data is feasible.



                            • Dear Professor @Sebastian Kripfganz


                              I have a quick question regarding a difference GMM model. The output below comes after successfully reproducing the results of the difference GMM model from xtdpdgmm with xtivreg2, in order to access the instrument diagnostics available for the latter. In general, the diagnostics look fine. The Arellano-Bond autocorrelation test of the residuals looks fine as well: statistically significant AR(1) but statistically insignificant higher-order autocorrelation. However, both statistics for the weak identification test are quite low in magnitude and, to complicate things, the "Stock-Yogo weak ID test critical values" are <not available>. My questions are:

                              1) Is this a matter for concern given the low values of the statistics for the Weak identification test?
                              2) Is there anything to be done to obtain valid "Stock-Yogo weak ID test critical values"?
                              3) Do you find these diagnostics concerning?
                              4) Is there anything to be done at all?

                              Thank you in advance!

                              HTML Code:
                              Underidentification test (Kleibergen-Paap rk LM statistic):             98.401
                                                                                 Chi-sq(14) P-val =   0.0000
                              ------------------------------------------------------------------------------
                              Weak identification test (Cragg-Donald Wald F statistic):                1.345
                                                       (Kleibergen-Paap rk Wald F statistic):          1.879
                              Stock-Yogo weak ID test critical values:                       <not available>
                              ------------------------------------------------------------------------------
                              Hansen J statistic (overidentification test of all instruments):        15.589
                                                                                 Chi-sq(12) P-val =   0.1780
                              -endog- option:
                              Endogeneity test of endogenous regressors:                              17.543
                                                                                 Chi-sq(3) P-val =    0.0004



                              • Dear Professor @Sebastian Kripfganz

                                As a follow-up, I ran the weakiv command after ivreg2 (ssc install weakiv) and obtained the diagnostics below for the same model. Can I conclude that the instruments are strong enough despite the low magnitude of the weak identification test statistics that ivreg2 reports by default?

                                HTML Code:
                                ----------------------------------------
                                 Test |       Statistic         p-value
                                ------+---------------------------------
                                  CLR | stat(.)   =   137.16     0.0000
                                    K | chi2(32)  =    99.89     0.0000
                                    J | chi2(13)  =    42.29     0.0000
                                  K-J |        <n.a.>            0.0000
                                   AR | chi2(44)  =   142.18     0.0000
                                ------+---------------------------------
                                 Wald | chi2(32)  =   146.58     0.0000
                                ----------------------------------------

