  • Within-between estimator with a lagged dependent variable

    Hey,

    I have some cross-sectional time-series (CSTS) data.
    I wish to use the "within-between" random effects estimator (Bell and Jones 2015) to analyze it instead of the more familiar two-way fixed effects.
    A common practice in the field I'm working in is to use a lagged dependent variable.
    I am not sure, however, whether one can add a lagged dependent variable when using the within-between estimator. And if so, how? Should one add it as a "regular" variable, or does one need to "split" it into between and within components, like the other variables?

    Many thanks,
    Dan

  • #2
    I wish to use the "within-between" random effects estimator (Bell and Jones 2015) to analyze it instead of the more familiar two-way fixed effects.

    The random effects (RE) estimator existed long before 2015. For references, see Balestra and Nerlove (1966) and Theil and Goldberger (1961). Unfortunately, you cannot just "choose" to use the RE estimator; it relies on very strong assumptions, most importantly that the unobserved unit effects are uncorrelated with the regressors. A Hausman test can help you assess whether it is consistent.
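For reference, a sketch of the standard Hausman statistic (notation assumed here, not taken from the thread). It compares the fixed-effects and random-effects estimates of the \(k\) coefficients on time-varying regressors: under the null that the unit effects are uncorrelated with the regressors, both are consistent and RE is efficient; under the alternative, only FE is consistent.

```latex
H = \left(\hat\beta_{FE} - \hat\beta_{RE}\right)^{\top}
    \left[\widehat{V}\!\left(\hat\beta_{FE}\right) - \widehat{V}\!\left(\hat\beta_{RE}\right)\right]^{-1}
    \left(\hat\beta_{FE} - \hat\beta_{RE}\right)
    \;\overset{a}{\sim}\; \chi^2_k
```

A large \(H\) is evidence against the RE assumption. In Stata, this is the statistic reported by `hausman` after storing the FE and RE results.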


    A common practice in the field I'm working in is to use a lagged dependent variable.
    I am not sure, however, if one can add a lagged dependent variable when using the within-between estimator?
    See

    Code:
    help tsvarlist
    Essentially, you just include L.depvar as a regressor, where "depvar" is the name of your outcome variable. However, including the lagged dependent variable as a regressor in a model with unit-specific effects introduces a bias. This bias is of order \(\frac{1}{T}\), so you can only justify doing this if \(T\) is large.
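To make the order-\(1/T\) statement concrete, consider the simplest dynamic model with unit effects \(\alpha_i\). Nickell's large-\(N\) approximation for the bias of the within (FE) estimator of \(\rho\) is shown below. It is an approximation that improves as \(T\) grows, and the last line is only a worked example for an assumed \(\rho = 0.5\).

```latex
\begin{gathered}
y_{it} = \rho\, y_{i,t-1} + \alpha_i + \varepsilon_{it}, \qquad |\rho| < 1 \\[4pt]
\operatorname*{plim}_{N \to \infty} \hat\rho_{FE} - \rho \;\approx\; -\frac{1+\rho}{T-1} \\[4pt]
\rho = 0.5,\; T = 10: \qquad -\frac{1.5}{9} \approx -0.17
\end{gathered}
```

For positive \(\rho\) the bias is downward, so the FE estimate understates persistence.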

    And if so, how: should one add a "regular" variable, or one needs to "split" it to the between and the within variables, like the other variables?
    What do you mean by a between variable and a within variable? You can talk about within variation (variation over time) and between variation (variation across cross-sections), but that is not a description of a variable.


    References:

    Balestra, P., and Nerlove, M. (1966). Pooling cross-section and time series data in the estimation of a dynamic model: the demand for natural gas. Econometrica 34: 585–612.

    Theil, H., and Goldberger, A.S. (1961). On pure and mixed statistical estimation in economics. International Economic Review 2(1): 65–78.
    Last edited by Andrew Musau; 15 Jan 2022, 07:18.


    • #3
      Dan:
      In addition to Andrew's excellent reply: as per the FAQ, you are kindly requested to provide the full reference for any source you cite, so that interested listers are not forced to search the web or PubMed to find what you are referring to.
      This time it was easy (you're probably referring to https://www.cambridge.org/core/servi...847014000077); other times, especially when there are many versions (say, working papers and published articles) of the same source, with possible differences or amendments among them, it is more difficult to make sure we are all looking at the very same reference(s). Thanks.
      Kind regards,
      Carlo
      (StataNow 18.5)



      • #4
        I am not going to comment on the Bell-and-Jones article here. Essentially, the "within-between" estimator is a Mundlak/correlated random effects/hybrid estimator, which has been around for a long time. I recommend having a look at the following article:
        Schunk, R., and F. Perales (2017). Within- and between-cluster effects in generalized linear mixed models: A discussion of approaches and the xthybrid command. Stata Journal 17 (1), 89-115.
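A brief sketch of how the hybrid (within-between) and Mundlak parameterizations relate, for one time-varying regressor \(x_{it}\) with unit mean \(\bar x_i\) and random intercept \(u_i\) (notation assumed for illustration). The two are reparameterizations of the same model:

```latex
\begin{gathered}
\text{Hybrid:}\quad y_{it} = \beta_0 + \beta_W (x_{it} - \bar x_i) + \beta_B \bar x_i + u_i + \varepsilon_{it} \\[4pt]
\text{Mundlak:}\quad y_{it} = \beta_0 + \beta_W x_{it} + \pi \bar x_i + u_i + \varepsilon_{it} \\[4pt]
\beta_W x_{it} + \pi \bar x_i = \beta_W (x_{it} - \bar x_i) + (\beta_W + \pi)\,\bar x_i
\;\Longrightarrow\; \beta_B = \beta_W + \pi
\end{gathered}
```

So the within coefficient is the same in both, and the Mundlak coefficient \(\pi\) on the mean measures the gap between the between and within effects; a test of \(\pi = 0\) is closely related to the Hausman test.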

        As Andrew said, if you want to include a lagged dependent variable, you need a large T. If you are interested in the between effects of your other variables, then you should not include the mean of the lagged dependent variable on the right-hand side. Because the unit mean of the lagged dependent variable is almost identical to the unit mean of the dependent variable itself, including it would remove the between variation, such that you are effectively only explaining the within variation of the dependent variable.
        https://www.kripfganz.de/stata/



        • #5
          Hey all,
          Thank you very much for your kind help.
          I see there are many constraints, so I am not sure how to continue.
          I'll give you some more details that might help you understand my dilemma.

          I'm currently working on a paper investigating the determinants of foreign direct investment (FDI). N = 100, T = 10, and the unit of analysis is the country-year. I suspect that the covariates' cross-sectional and temporal effects differ. For example, countries that are more democratic enjoy higher levels of FDI, but democratization within a country will not result in more FDI from t-1 to t.
          The only method I have encountered that allows differentiating between the cross-sectional and temporal effects is the within-between random effects estimator: Bell and Jones (2015), "Explaining fixed effects: Random effects modeling of time-series cross-sectional and panel data", Political Science Research and Methods.
          A common practice in the literature on the determinants of FDI is to use a lagged dependent variable. I'm unsure how to use the within-between/Mundlak/correlated random effects/hybrid estimator with a lagged dependent variable.

          Given your answers, I have several questions:
          1. Is a T of 10 considered too short?
          2. Sebastian, should I include just the lagged values (without the decomposition)?
          3. Do you recommend a different method that separates the cross-sectional from the temporal variation?

          Thank you so much,
          Dan



          • #6
            Another approach I thought of is to fit one model with country FE and another with time FE, and compare the two.
            The issue is that two-way fixed effects seem to be the gold-standard practice, and I'm afraid of getting biased results when using only one-way FE.
            Any thoughts?

            Thanks (:



            • #7
              Using these conventional least-squares estimators, T=10 is certainly too short for estimating an unobserved-effects model with a lagged dependent variable. Keyword: Nickell bias.
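To see the Nickell bias at work, here is a minimal Python simulation (not Stata) of the within (FE) estimator of \(\rho\) in a dynamic panel with unit effects. The data-generating process, N = 2000, \(\rho = 0.5\), and the function names are assumptions chosen for illustration.

```python
import random


def simulate_dynamic_panel(N, T, rho, sd_alpha=1.0, seed=0, burn_in=50):
    """Hypothetical panel from y_it = alpha_i + rho*y_i,t-1 + e_it, e_it ~ N(0, 1).

    Returns one list per unit holding y_i0, ..., y_iT (y_i0 is the initial lag)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(N):
        alpha = rng.gauss(0.0, sd_alpha)
        y = alpha / (1.0 - rho)
        for _ in range(burn_in):  # discard a burn-in so y_i0 is near-stationary
            y = alpha + rho * y + rng.gauss(0.0, 1.0)
        path = [y]
        for _ in range(T):
            y = alpha + rho * y + rng.gauss(0.0, 1.0)
            path.append(y)
        panel.append(path)
    return panel


def within_estimate_rho(panel):
    """Fixed-effects (within) estimator: regress demeaned y_it on demeaned y_i,t-1."""
    num = den = 0.0
    for path in panel:
        y, lag = path[1:], path[:-1]
        y_bar, lag_bar = sum(y) / len(y), sum(lag) / len(lag)
        for yt, lt in zip(y, lag):
            num += (lt - lag_bar) * (yt - y_bar)
            den += (lt - lag_bar) ** 2
    return num / den


if __name__ == "__main__":
    for T in (10, 50):
        est = within_estimate_rho(simulate_dynamic_panel(N=2000, T=T, rho=0.5))
        print(f"T={T:>2}: FE estimate of rho = {est:.3f} (true 0.5)")
```

Even with a large N, the T = 10 estimate lands well below the true value, while the T = 50 estimate is much closer; more cross-sectional units do not remove this bias.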

              You would need to resort to an estimator that is commonly used for dynamic panel data models, e.g. GMM, maximum likelihood, or bias correction. You might want to have a look at some of the commands that I developed.
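The sketch below is not one of Sebastian's commands. It swaps in the Anderson-Hsiao instrumental-variable estimator, a simple precursor of difference GMM, to show how first-differencing plus instrumenting avoids the Nickell bias at T = 10. The data-generating process, parameter values, and function names are hypothetical; a real application needs standard errors and a check of the no-serial-correlation assumption, and GMM or ML estimators are typically more efficient.

```python
import random


def simulate_dynamic_panel(N, T, rho, sd_alpha=0.5, seed=7, burn_in=50):
    """Hypothetical panel from y_it = alpha_i + rho*y_i,t-1 + e_it, e_it ~ N(0, 1).

    Returns one list per unit holding y_i0, ..., y_iT."""
    rng = random.Random(seed)
    panel = []
    for _ in range(N):
        alpha = rng.gauss(0.0, sd_alpha)
        y = alpha / (1.0 - rho)
        for _ in range(burn_in):  # burn in so y_i0 is near-stationary
            y = alpha + rho * y + rng.gauss(0.0, 1.0)
        path = [y]
        for _ in range(T):
            y = alpha + rho * y + rng.gauss(0.0, 1.0)
            path.append(y)
        panel.append(path)
    return panel


def anderson_hsiao(panel):
    """Anderson-Hsiao IV estimator of rho.

    First-differencing removes alpha_i:  dy_it = rho * dy_i,t-1 + de_it.
    dy_i,t-1 is correlated with de_it, so it is instrumented with the level
    y_i,t-2, which is uncorrelated with de_it when e_it is serially uncorrelated."""
    num = den = 0.0
    for y in panel:  # y[0], ..., y[T]
        for t in range(2, len(y)):
            z = y[t - 2]                      # instrument
            num += z * (y[t] - y[t - 1])      # z * dy_t
            den += z * (y[t - 1] - y[t - 2])  # z * dy_{t-1}
    return num / den


if __name__ == "__main__":
    est = anderson_hsiao(simulate_dynamic_panel(N=2000, T=10, rho=0.5))
    print(f"Anderson-Hsiao estimate of rho = {est:.3f} (true 0.5)")
```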

              I recommend separating the modeling step from the estimation step. First, make clear to yourself what model you want to estimate, i.e. what your regression equation should look like. You can still include within and between effects of exogenous regressors in your dynamic panel model; just leave the lagged dependent variable untouched. Second, find an appropriate estimator that deals with any potential complications arising from that regression equation. Since the between effects are the coefficients of time-invariant within-group averages, the following paper might be of interest as well:
              Kripfganz, S., and C. Schwarz (2019). Estimation of linear dynamic panel data models with time-invariant regressors. Journal of Applied Econometrics 34 (4), 526-546.
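Following that two-step advice, a sketch of the modeling step for one exogenous time-varying regressor \(x_{it}\) (notation assumed for illustration): the exogenous regressor is split into within and between components, while the lagged dependent variable enters once, undecomposed, and without its unit mean.

```latex
y_{it} = \rho\, y_{i,t-1}
       + \beta_W \,(x_{it} - \bar x_i)
       + \beta_B \,\bar x_i
       + u_i + \varepsilon_{it},
\qquad \bar x_i = \frac{1}{T}\sum_{t=1}^{T} x_{it}
```

Because \(\bar x_i\) is time-invariant, \(\beta_B\) is a coefficient on a time-invariant regressor, which is the setting of the Kripfganz and Schwarz paper; the estimation step then has to handle \(\rho\) under the Nickell-bias problem.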
              https://www.kripfganz.de/stata/



              • #8
                Thank you so much, Sebastian!
                One last question: I have the option to extend the time period to 18 years (while reducing N to about 80). Would 18 be enough?



                • #9
                  It depends on the strength of the autocorrelation. If FDI is very persistent, then 18 years might still be too short. With macroeconomic data sets, I would normally not feel comfortable ignoring the Nickell bias when T is not at least 30. Moreover, trading N for T may create problems elsewhere.
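To see why persistence matters, this Python sketch tabulates Nickell's approximation \(-(1+\rho)/(T-1)\) for a few assumed values of \(\rho\) and the \(T\) values discussed in this thread. The approximation is rough for small \(T\) and for \(\rho\) close to 1, so treat the numbers as orders of magnitude only; the function name is hypothetical.

```python
def nickell_bias(rho, T):
    """Nickell's approximation to plim(rho_FE) - rho as N -> infinity.

    Derived for large T; only indicative for small T or rho near 1."""
    return -(1.0 + rho) / (T - 1)


if __name__ == "__main__":
    print(f"{'rho':>5} {'T':>4} {'approx bias':>12} {'approx plim':>12}")
    for rho in (0.5, 0.8, 0.95):
        for T in (10, 18, 30):
            bias = nickell_bias(rho, T)
            print(f"{rho:>5} {T:>4} {bias:>12.3f} {rho + bias:>12.3f}")
```

For example, \(\rho = 0.95\) at \(T = 18\) gives roughly \(-1.95/17 \approx -0.115\), about twice the \(\rho = 0.5\), \(T = 30\) figure of \(-1.5/29 \approx -0.052\).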
                  https://www.kripfganz.de/stata/
