
  • The R² for IV/2SLS/GMM regressions is of limited to no use. See for example the following Stata FAQ:
    https://www.stata.com/support/faqs/s...least-squares/
    the R2 really has no statistical meaning in the context of 2SLS/IV
    For the random-effects model, please see the Remarks and Examples section in the Stata Manual entry for xtreg.
    https://twitter.com/Kripfganz



    • Hi Sebastian,
      I'm having problems when conducting the Arellano-Bond test for autocorrelation.

      First, I go with:

      Code:
      xtdpdgmm lead_zjsat_6items  L.lead_jsat_6items i.lead_vol1##c.log_leaving i.wave , gmmiv(L.lead_jsat_6items , collapse) iv(i.wave) vce(robust) overid
      
      Group variable: id                           Number of obs         =     91850
      Time variable: wave                          Number of groups      =     15287
      
      Moment conditions:     linear =      28      Obs per group:    min =         1
                          nonlinear =       0                        avg =  6.008373
                              total =      28                        max =        14
      
                                          (Std. Err. adjusted for 15,287 clusters in id)
      ----------------------------------------------------------------------------------
                       |               Robust
      lead_zjsat_6it~s |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -----------------+----------------------------------------------------------------
      lead_jsat_6items |
                   L1. |   .3542155   .1051563     3.37   0.001     .1481129    .5603181
                       |
           1.lead_vol1 |  -7.993102   4.104824    -1.95   0.052    -16.03841    .0522057
           log_leaving |  -.7183377    .343703    -2.09   0.037    -1.391983   -.0446922
                       |
             lead_vol1#|
         c.log_leaving |
                    1  |   3.885735   .7894465     4.92   0.000     2.338448    5.433021
                       |
                  wave |
                    1  |          0  (empty)
                    2  |   .0678619    .049292     1.38   0.169    -.0287486    .1644724
                    3  |   .0074792   .0487522     0.15   0.878    -.0880733    .1030317
                    4  |          0  (omitted)
                    5  |   .0847805   .0469901     1.80   0.071    -.0073183    .1768793
                    6  |   .0679861   .0464155     1.46   0.143    -.0229866    .1589588
                    7  |   .0552715   .0479401     1.15   0.249    -.0386893    .1492324
                    8  |   .1307863   .1192254     1.10   0.273    -.1028912    .3644638
                    9  |   .0752563    .076418     0.98   0.325    -.0745202    .2250328
                   10  |   .0612896   .0672975     0.91   0.362    -.0706111    .1931904
                   11  |   .0365584   .0620174     0.59   0.556    -.0849935    .1581103
                   12  |    .102438   .0816722     1.25   0.210    -.0576366    .2625126
                   13  |   .1125361   .0718052     1.57   0.117    -.0281995    .2532717
                   14  |   .0727399   .0759155     0.96   0.338    -.0760518    .2215315
                   15  |   .1186052   .0606086     1.96   0.050    -.0001854    .2373958
                   16  |          0  (empty)
                       |
                 _cons |   -1.87803   1.056312    -1.78   0.075    -3.948363    .1923031
      ----------------------------------------------------------------------------------
      Instruments corresponding to the linear moment conditions:
       1, model(level):
         L.lead_jsat_6items L1.L.lead_jsat_6items L2.L.lead_jsat_6items
         L3.L.lead_jsat_6items L4.L.lead_jsat_6items L5.L.lead_jsat_6items
         L6.L.lead_jsat_6items L7.L.lead_jsat_6items L8.L.lead_jsat_6items
         L9.L.lead_jsat_6items L10.L.lead_jsat_6items L11.L.lead_jsat_6items
         L12.L.lead_jsat_6items L13.L.lead_jsat_6items
       2, model(level):
         3bn.wave 4.wave 5.wave 6.wave 7.wave 8.wave 9.wave 10.wave 11.wave 12.wave
         13.wave 14.wave 15.wave
       3, model(level):
         _cons
      But when I try the test I get:

      Code:
      estat serial, ar(1/3)
      
      Arellano-Bond test for autocorrelation of the first-differenced residuals
      D.0b:  operator invalid
      r(198);
      Am I doing something wrong?

      Thanks.
      Ed




      • Edgar Kausel
        There was a bug in estat serial that (I thought) I fixed with the latest update. Could you please tell me which version of xtdpdgmm you are using? You can find your version by typing the following in Stata's command window:
        Code:
        which xtdpdgmm
        If you do not have version 2.2.7, please update to the latest version which should hopefully solve your problem:
        Code:
        adoupdate xtdpdgmm, update
        https://twitter.com/Kripfganz



        • Hi Sebastian and Statalisters,

          I am using the xtdpdgmm command to run system GMM, but I am getting error r(2000). It says: "You have requested some statistical calculation and there are no observations on which to perform it. Perhaps you specified if or in and inadvertently filtered all the data."

          N is 45 and T is 10.

          The command is xtdpdgmm dv l.dv iv1 iv2 iv3 iv4 , twostep vce(cluster id) teffects gmmiv(l.dv, lag(1 2) collapse model(fodev)) gmmiv(iv1 , lag(1 2) collapse model(fodev)) gmmiv(iv2 , lag(1 2) collapse model(fodev)) gmmiv(iv3 , lag(1 2) collapse model(fodev)) gmmiv(iv4 , lag(0 0) collapse model(level)) nofootnote

          Please help.



          • r(2000) is a "no observations" error. One reason might be that you did not properly xtset your data. For example, if your time periods are more than 1 time unit (e.g. year) apart, then you need to specify this with the delta() option of xtset. Another reason might be that you have many gaps (missing values) in your data set such that you do not have 3 consecutive time periods. Can you share with us the output from the following command?
            Code:
            xtdescribe
            https://twitter.com/Kripfganz
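            A minimal sketch of the xtset check described above. The variable names id and year are hypothetical placeholders, and the two-unit spacing is an assumption made only for illustration (for example, a survey run every second year):
            Code:
            * declare the panel structure; delta(2) states that consecutive periods are 2 time units apart
            xtset id year, delta(2)
            * inspect the participation pattern and any gaps per group
            xtdescribe
            Without delta(2), Stata would treat every observation in such a panel as having a missing neighbouring period, so the lagged variables would all be missing.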



            • Sebastian Kripfganz

              That was it. I was using version 2.2.0. Thanks!

              Ed



              • Hi,

                In order to estimate dynamic panels accurately, I read the paper titled "Microeconometric dynamic panel data methods: Model specification and selection issues" by Jan F. Kiviet. Concerning this paper, I have the following doubts:

                1. The author repeatedly states in his paper that as long as Arellano–Bond results are unsatisfactory, applying Blundell–Bond does not make sense. So how does one choose between Arellano-Bond's difference GMM and Blundell-Bond's system GMM? Is there a criterion for this choice? In this regard, the author also talks about the concepts of effect-stationarity and effect non-stationarity. What do these concepts imply?

                2. The author states: "When the errors of the level equation are serially uncorrelated indeed, those of the first-differenced equation have negative first-order serial correlation of moving average form, with a first-order serial correlation coefficient −0.5 and zero second and higher-order serial correlation coefficients". How is this exact figure of -0.5 derived? Also, how is the author so sure about zero second and higher-order serial correlation coefficients? Is there a mathematical proof for the same?

                3. The author states: "lags of exogenous regressors will establish strong and valid instruments for any non-exogenous regressors, especially for regressors affected by immediate or lagged feedbacks from the dependent variable, in particular the lagged dependent regressor variables themselves." However, I thought a particular variable's lags/lead can serve as instruments for the same variable only. How come lags of exogenous variables serve as valid instruments of non-exogenous regressors?

                4. The author states: "Anyhow, if at least twice lagged regressors turn out to be invalid instruments this implies that the regression equation has not yet been specified adequately and requires additional explanatories". I could not understand the author's point here. Is he saying that if lag(1 2) turn out to be invalid instruments (as indicated by the difference-in-Hansen test), we should include more lags of the variable as regressors in the model?

                5. The author states: "an exogenous regressor is predetermined, but a predetermined regressors is usually not exogenous". I could not understand how an exogenous regressor is predetermined?

                6. The author writes: "This finding instigates to start our model specification search by including at least one lag of all regressors, because validity of internal instruments constructed from lagged not strictly exogenous regressors requires white-noise disturbances, and obtaining white noise disturbances is promoted by using sufficiently large orders of all lag polynomials." So should we include at least one lag of all independent variables as regressor?

                7. The author states: "one could move on to stage 4, or first verify whether any of the coefficients for the longest lag of a variable x(m) or of yi,t has a t-value below 0.5, say, or a p-value above 0.6 or 0.7, say. If so, impose the least significant one of them to be zero, re-estimate the model, and repeat the same procedure until the coefficients of all longest lags have absolute t -values (well) above 0.5, and the m1, m2, J and incJ tests still produce satisfactory results." I could not understand what the author is trying to convey here.

                8. The author states: "Useful additional evidence can be produced by also testing the joint significance of groups of single coefficient restrictions already imposed on the MSM and verifying whether the p -value is high indeed. Such joint significance tests can also be obtained by using the “test” option." Again, I could not understand the author's viewpoint here.

                9. I suppose we should always use the two-step estimator. Is this correct?

                Thanks and Regards



                  1. The Blundell/Bond system GMM estimator extends the Arellano/Bond difference GMM estimator by adding further moment conditions (i.e. instruments). If some of the instruments for the difference GMM estimator are invalid, they will still be invalid if you add further instruments. With xtdpdgmm you could use the overid option and then the estat overid, difference postestimation command after the system GMM estimation. The last line in the test output that starts with model(level) can be used to make the desired assessment. If the test in the column headed "Excluding" does not reject the null hypothesis, then the difference GMM estimator is fine and you can use the column headed "Difference" to test the additional instruments used for the system GMM estimator. If the test in the column headed "Excluding" rejects the null hypothesis, then the difference GMM estimator is misspecified and the corresponding "Difference" test becomes useless.
                  2. Given homoskedasticity and no serial correlation of the idiosyncratic error term \(e_{it}\), this is a simple algebraic relationship: \(Corr(\Delta e_{it}, \Delta e_{i,t-1}) = Corr(e_{it}-e_{i,t-1}, e_{i,t-1}-e_{i,t-2}) = -Var(e_{it}) / Var(\Delta e_{it}) = -Var(e_{it}) / (2 Var(e_{it})) = -1/2\). Similarly, all higher-order correlations are zero because of the non-overlapping time periods in the numerator.
                  3. There is no mapping of specific instruments to specific regressors. All instruments instrument all regressors. It is reasonable to believe that lags of a specific regressor have particularly strong predictive power for that specific regressor but that does not exclude the possibility that they may also have predictive power for other regressors. In fact, if a regressor is a predictor of the dependent variable, then it is reasonable to believe that the lags of such a regressor are also good predictors for the lagged dependent variable.
                  4. If you assume that a variable is endogenous, you could use lags(2 .) as instruments if the model is correctly specified. If the difference-in-Hansen test rejects those instruments, then this is evidence that there is still some misspecification present. This could be omitted variables such as omitted dynamics in the form of lags of the regressors, or omitted interaction terms.
                  5. In the terminology of (strictly) exogenous, predetermined, and endogenous regressors, all instruments (lags) that are valid for a predetermined variable are also valid for a strictly exogenous variable, but not the other way round.
                  6. You want to start your specification search with a model that is correctly specified such that the estimation is consistent (although possibly inefficient). Otherwise, your difference-in-Hansen test might compare two misspecified models with each other which would not be a meaningful comparison; see point 1 above. The more lags of the regressors you include in the regressor list, the less likely it is that there will still be serial correlation in the error term which might invalidate some of the instruments.
                  7. This is a suggestion for a model specification algorithm. Essentially, the idea is to start with a possibly overspecified model (that yields consistent estimation) and then to remove some of the lagged regressors if their coefficients are statistically insignificant and the model specification tests still do not reject the model after you have removed those regressors. Jan Kiviet promotes a conservative view on the use of p-values, i.e. to use p-value thresholds that are much higher than 0.05 to make sure that you are on the safe side.
                  8. Instead of just testing for the significance of a single coefficient, you could also use joint significance tests for multiple coefficients in your specification search.
                  9. I would say that there are at least 2 situations where a one-step estimator is justified: (i) if you are using the difference GMM estimator with the added homoskedasticity assumption such that the one-step weighting matrix is already optimal (which is a strong assumption, and instead of imposing it you might just run the two-step estimator to let the data speak for itself); (ii) if your estimation sample is relatively small, because the efficient estimation of the optimal weighting matrix requires a large number of groups. Both the one-step and the two-step estimator are consistent estimators, but in general the two-step estimator is efficient while the one-step estimator may not be efficient. However, keep in mind that efficiency is an asymptotic concept. When your sample is very small, the finite-sample properties might be very different and the estimation of the optimal weighting matrix might lack robustness.
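                  The algebra behind point 2 can be written out step by step. This is a sketch under the assumptions stated there (homoskedasticity and no serial correlation of \(e_{it}\)); the symbol \(\sigma^2 = Var(e_{it})\) is shorthand introduced here and is not part of the original post. Every covariance between errors from different periods is zero, so only the shared term \(e_{i,t-1}\) survives:
                  \[Cov(\Delta e_{it}, \Delta e_{i,t-1}) = Cov(e_{it}, e_{i,t-1}) - Cov(e_{it}, e_{i,t-2}) - Var(e_{i,t-1}) + Cov(e_{i,t-1}, e_{i,t-2}) = -\sigma^2\]
                  \[Var(\Delta e_{it}) = Var(e_{it}) + Var(e_{i,t-1}) - 2 Cov(e_{it}, e_{i,t-1}) = 2\sigma^2\]
                  \[Corr(\Delta e_{it}, \Delta e_{i,t-1}) = \frac{-\sigma^2}{\sqrt{2\sigma^2 \cdot 2\sigma^2}} = -\frac{1}{2}\]
                  For any lag \(s \geq 2\), the two differences share no time period, so every term in the expansion is a covariance between errors from different periods:
                  \[Cov(\Delta e_{it}, \Delta e_{i,t-s}) = Cov(e_{it} - e_{i,t-1}, e_{i,t-s} - e_{i,t-s-1}) = 0\]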
                  https://twitter.com/Kripfganz
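                  The workflow from point 1 might look as follows. This is only a sketch: the variables y and x are hypothetical placeholders, x is assumed to be predetermined, and the lag ranges are illustrative choices rather than recommendations.
                  Code:
                  * system GMM: instruments for the differenced model plus additional instruments for the level model
                  xtdpdgmm L(0/1).y x, model(diff) collapse gmmiv(y, lag(2 .)) gmmiv(x, lag(1 .)) gmmiv(y, lag(1 1) diff model(level)) gmmiv(x, lag(0 0) diff model(level)) twostep vce(robust) overid
                  * incremental overidentification tests; look at the last row that starts with model(level)
                  estat overid, difference
                  In that row, the "Excluding" column tests the model without the level instruments (i.e. difference GMM), and the "Difference" column tests the extra instruments that system GMM adds.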



                  • Dear Prof. Kripfganz,

                    Your responses are enlightening as always! I got to know some completely new things which I never thought of. Thank you so very much! I have some follow-up queries:

                    1.I have the following output for the difference-in-Hansen test for my model. Do you think I should stick to system-GMM or switch to difference-GMM?

                    Code:
                    2-step weighting matrix from full model
                    
                                      | Excluding                   | Difference                  
                    Moment conditions |       chi2     df         p |        chi2     df         p
                    ------------------+-----------------------------+-----------------------------
                      1, model(fodev) |    94.4909    106    0.7808 |      0.0107      1    0.9175
                      2, model(fodev) |    94.2920    106    0.7851 |      0.2096      1    0.6471
                      3, model(fodev) |    94.4466    106    0.7817 |      0.0550      1    0.8146
                      4, model(fodev) |    93.8516    104    0.7522 |      0.6500      3    0.8849
                      5, model(fodev) |     0.5946      2    0.7428 |     93.9070    105    0.7727
                      6, model(fodev) |    94.2131    105    0.7658 |      0.2885      2    0.8657
                      7, model(level) |    92.4271    106    0.8235 |      2.0745      1    0.1498
                      8, model(fodev) |    94.1725    106    0.7877 |      0.3291      1    0.5662
                      9, model(fodev) |    94.4090    106    0.7826 |      0.0926      1    0.7610
                     10, model(level) |    83.8859     93    0.7396 |     10.6156     14    0.7159
                    Moreover, are there any issues if we apply system-GMM estimator when difference-GMM estimator is sufficient for a model?

                    2. How should we check for heteroscedasticity and serial correlation for our model?

                    3. How do we check for joint significance tests for multiple coefficients in our model?

                    Thank you!



                      1. If anything, only moment condition number 7 might be slightly worrying. All the other p-values are definitely fine. A system GMM estimator would produce more efficient / more precise estimates than a difference GMM estimator, at the added risk that it might be more strongly biased if the extra instruments are weak or invalid.
                      2. To check for serial correlation, use the estat serial postestimation command. Off the top of my head, I am not aware of an easily applicable command for heteroskedasticity testing of the residuals. The only feasible option that comes to my mind is utilizing the nonlinear moment conditions nl(noserial) and nl(iid), where the latter makes the additional homoskedasticity assumption, and then to use a generalized Hausman test with estat hausman to test the additional moment restrictions imposed by nl(iid) compared to nl(noserial). See slides 63 to 65 of my 2019 London Stata Conference presentation.
                      3. You can simply use the test command as you would do after any other estimation command.
                      https://twitter.com/Kripfganz
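                      A rough sketch of how points 2 and 3 could be put together. The variables y and x and the stored-estimates name noserial are hypothetical, the lag ranges are illustrative, and the exact estat hausman syntax is an assumption here; please check it against help xtdpdgmm postestimation and the slides mentioned above before relying on it.
                      Code:
                      * model with nonlinear moment conditions that assume no serial correlation only
                      xtdpdgmm L(0/1).y x, model(diff) collapse gmmiv(y, lag(2 .)) gmmiv(x, lag(1 .)) nl(noserial) twostep vce(robust)
                      * Arellano-Bond test for serial correlation of orders 1 to 3
                      estat serial, ar(1/3)
                      * joint Wald test of several coefficients, as after any other estimation command
                      test L.y x
                      estimates store noserial
                      * same model, now also assuming homoskedasticity through nl(iid)
                      xtdpdgmm L(0/1).y x, model(diff) collapse gmmiv(y, lag(2 .)) gmmiv(x, lag(1 .)) nl(iid) twostep vce(robust)
                      * generalized Hausman test of the extra restrictions implied by nl(iid) (syntax assumed)
                      estat hausman noserial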



                      • As enquired here: https://www.statalist.org/forums/for...2-and-xtdpdgmm, I wonder why I cannot use the estat serial command after xtdpdgmm. I get an error: r(5), not sorted. I would really appreciate it if someone could guide me on this and the other question in that post. Thank you.



                        • Kristian Szakali
                          Could you please check whether you have the latest version of xtdpdgmm, which should be 2.2.7. If you do not have the latest version, please update it and try your code again:
                          Code:
                          adoupdate xtdpdgmm, update
                          If you still get the same error message with the latest version, would it be possible for you to send me your data set per e-mail? Otherwise it is difficult to replicate the issue.
                          https://twitter.com/Kripfganz



                          • Originally posted by Kristian Szakali
                            As enquired here: https://www.statalist.org/forums/for...2-and-xtdpdgmm, I wonder why I cannot use the estat serial command after xtdpdgmm. I get an error: r(5), not sorted. I would really appreciate it if someone could guide me on this and the other question in that post. Thank you.
                            Please see my response #2 in the topic you have linked.
                            https://twitter.com/Kripfganz



                            • There is another update of the xtdpdgmm package to version 2.3.0 available on my personal website.
                              Code:
                              net install xtdpdgmm, from(http://www.kripfganz.de/stata/) replace
                              This version fixes the problem reported in #221 above, and it adds a new feature for the estimation with nonlinear moment conditions:

                              Under the assumption of a serially uncorrelated idiosyncratic error term \(u_{it}\), the option nl(noserial) incorporates the following nonlinear moment conditions:
                              \[E[(\alpha_i+u_{iT}) \Delta u_{it}] = 0\]
                              for t=1,2,...,T-1.

                              So far, that is nothing new (see slide 58 of my 2019 London Stata Conference presentation). If we suspect first-order serial correlation of \(u_{it}\), we could still obtain valid nonlinear moment conditions by restricting them to the observations t=1,2,...,T-2. If there is second-order serial correlation, change the upper limit to T-3. This can be achieved with a new lag() suboption, e.g. when we suspect first-order serial correlation we could specify
                              Code:
                              nl(noserial, lag(2))
                              When you just specify nl(noserial) without the suboption, the default is lag(1), i.e. no serial correlation. I am grateful to Professor Seung Ahn for proposing this additional feature.
                              Last edited by Sebastian Kripfganz; 26 Aug 2020, 07:55.
                              https://twitter.com/Kripfganz



                              • Dear Sebastian,

                                I am using xtdpdgmm for my research. As far as I know, xtdpdgmm (and GMM estimation in general) does not account for cross-sectional dependence.

                                My data (like most panel data, perhaps) suffer from cross-sectional dependence, and I tried to use extra variables to capture time-varying common factors across cross sections to eliminate this problem. But later I realised that we are already using time dummies as regressors with the teffects option (or manually). Since (strong) cross-sectional dependence arises from time-varying common shocks, aren't we eliminating it by adding year dummies as regressors? Would any extra variables that capture time-varying common shocks, other than the time dummies, be redundant in this case?

                                Thanks in advance.

