  • First differences and fixed effects giving very different point estimates in large panel

    Hello all,

    I have an unbalanced yearly panel of firms from 1966 to 2019, with N varying from 4000 to 11000 per year. The firm identifier is gvkey, and the year variable is fyear. I want to estimate how one variable (defdollars) affects some outcome variables (sale here).

    I have run two regressions:

    Code:
    . xtreg lsale ldefdollars, fe vce(cluster gvkey)
    
    Fixed-effects (within) regression               Number of obs     =    404,033
    Group variable: gvkey                           Number of groups  =     33,244
    
    R-squared:                                      Obs per group:
         Within  = 0.0106                                         min =          1
         Between = 0.0483                                         avg =       12.2
         Overall = 0.0515                                         max =         54
    
                                                    F(1, 33243)       =     506.04
    corr(u_i, Xb) = 0.1786                          Prob > F          =     0.0000
    
                                 (Std. err. adjusted for 33,244 clusters in gvkey)
    ------------------------------------------------------------------------------
                 |               Robust
           lsale | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
     ldefdollars |   .0309712   .0013768    22.50   0.000     .0282727    .0336698
           _cons |   19.38947   .0028799  6732.78   0.000     19.38383    19.39511
    -------------+----------------------------------------------------------------
         sigma_u |  2.6370201
         sigma_e |  .94949394
             rho |  .88523345   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    . reg d.lsale d.ldefdollars, vce(cluster gvkey)
    
    Linear regression                               Number of obs     =    368,196
                                                    F(1, 31711)       =     102.95
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.0001
                                                    Root MSE          =     .55901
    
                                 (Std. err. adjusted for 31,712 clusters in gvkey)
    ------------------------------------------------------------------------------
                 |               Robust
         D.lsale | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
     ldefdollars |
             D1. |   .0023661   .0002332    10.15   0.000      .001909    .0028232
                 |
           _cons |    .080785   .0009513    84.92   0.000     .0789204    .0826496

    As you can see, the confidence intervals do not overlap at all. A priori, the first-differences point estimate also seems implausibly small. The variables are named lvar rather than var because I applied the inverse hyperbolic sine as a log-like transformation (I'm aware that's not ideal), but the same problem occurs when regressing sale on defdollars in levels.


    My data look like this:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double(gvkey fyear sale defdollars)
    1000 1966  24613310.84450585                  0
    1000 1967 21321025.835530076                  0
    1000 1968  42108496.80268002                  0
    1000 1969  202806039.9322772                  0
    1000 1970  233549293.0123074                  0
    1000 1971 230621977.36524698                  0
    1000 1972   161521176.707492                  0
    1000 1973  168213337.9115217                  0
    1000 1974  205681929.6039222                  0
    1000 1975  191568261.0541805                  0
    1000 1976 235531182.26968852                  0
    1000 1977 260292422.77736828                  0
    1001 1983  55555738.15022751                  0
    1001 1984  67575042.65701024                  0
    1001 1985 110095568.90021476                  0
    1002 1966 129415225.67051686                  0
    1002 1967 224837751.57668054                  0
    1002 1968 148455212.57907015                  0
    1002 1969 151535032.88585508                  0
    1002 1970 109884337.04538473                  0
    1002 1971 112459657.06784469                  0
    1002 1972  124367827.6388692                  0
    1003 1982 28974249.544596586                  0
    1003 1983  30174455.45603812                  0
    1003 1984 29196590.274121124                  0
    1003 1985  49501872.11657114                  0
    1003 1986  72832911.73878749                  0
    1003 1987  73128717.95258583                  0
    1003 1988  62038335.45143017                  0
    1003 1989  35056071.89442273                  0
    1004 1966   28410304.7297446                  0
    1004 1967  27348338.64824531                  0
    1004 1968 56652999.211821936                  0
    1004 1969   85180489.3329165                  0
    1004 1970 109657665.18275009  922142.3502636601
    1004 1971 122972563.31390788  1951556.290080758
    1004 1972 175354970.52188432                  0
    1004 1973 255381266.81814915 294094.84244133596
    1004 1974 223260428.94274917 474100.42392558313
    1004 1975 246222395.54191896  333051.5555911483
    1004 1976  285610018.4311947  198598.8828726256
    1004 1977 309962572.57839173  347297.0000878339
    1004 1978   370310865.431648  795750.0458010251
    1004 1979  376246330.4171706 2048057.6106219585
    1004 1980  349955515.7147325  5499670.775788961
    1004 1981  424543298.6668024   2519401.58141096
    1004 1982  352304873.3063805 12309737.647751423
    1004 1983  388883604.0583084  58576902.72535901
    1004 1984 462251547.77335477 13520497.802839806
    1004 1985 507547162.23800254 21125628.420330066
    1004 1986  598165462.6311699  26757690.03205041
    1004 1987  680545762.6361747 23767420.619508103
    1004 1988  768407034.6879776  77381515.58593714
    1004 1989  809444645.4575862  29206418.54202905
    1004 1990  818175783.0478432 33539556.676119193
    1004 1991  717012976.3323003  79189900.40432733
    1004 1992  634911802.7111324 51390084.143460035
    1004 1993   660662789.658876 117203765.24939635
    1004 1994  716088646.4806716  73704170.45320241
    1004 1995  784636932.6752614 59916633.102194175
    1004 1996  899238515.2047518  59395948.37193432
    1004 1997 1173173132.7371325   45452551.9791335
    1004 1998  1361746404.537615  69957949.38187218
    1004 1999 1498238142.8375666 57387982.352752134
    1004 2000 1250389157.5803041  52310037.76444111
    1004 2001   893354309.325109 125923941.95939985
    1004 2002  835097194.4199641 134346540.91277105
    1004 2003  880601243.2826586  195554173.8737517
    1004 2004  983672710.3442054 235089463.52230775
    1004 2005 1144322047.7315855 263227294.27620092
    1004 2006 1312797378.2097762  300761628.8055568
    1004 2007 1668144392.7556825 285652521.65513384
    1004 2008  1682630653.686196 402325892.40121555
    1004 2009 1588034281.8516088 393495639.57619745
    1004 2010 2060594515.1631072 531648344.97950876
    1004 2011 2358545130.3699007  539468245.2227405
    1004 2012  2418738527.866112 494368015.10558474
    1004 2013  2233341446.764629 503366236.68542755
    1004 2014 1719759437.2219522  334419169.9707764
    1004 2015 1776884593.7424757 356149786.51001954
    1004 2016 1871348084.9330153 252641538.44245732
    1004 2017 1818388534.1301599  277852005.5015617
    1004 2018 2086221411.7994044 352989998.11839515
    1004 2019 2089300000.0000005  427336844.5152645
    1005 1974  24972013.70849408  6159218.438412533
    1005 1975  23975969.85025267  4625300.255175948
    1005 1976 28158484.464440133 3333624.1053619296
    1005 1977 25830214.381532643  2337575.962129651
    1005 1978 25542016.176005457 2861579.5764687844
    1005 1979   39238594.6158823                  0
    1005 1980   61764314.1592207                  0
    1005 1981  86685272.22783822                  0
    1006 1974  22127411.16494058                  0
    1006 1975 22782223.263358552                  0
    1006 1976 22200517.978261366                  0
    1006 1977 22611038.342256956                  0
    1006 1978 21372910.053691063                  0
    1006 1979 21223753.129483245                  0
    1006 1980 20355649.855057508                  0
    1006 1981  22271606.50846911                  0
    end
    As you can see there are lots of 0s for the defdollars data (~90% of observations receive no defdollars in a given year).

    I would be exceptionally grateful for any help at all.

  • #2
    When T > 2, you should not expect the estimated effects to be the same. See https://www.statalist.org/forums/for...rst-difference for an extended discussion of the difference between the first-difference and fixed-effects estimators and their underlying assumptions and uses.
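
    As a rough check of the T = 2 case, here is a minimal sketch. It assumes lsale and ldefdollars exist as in #1, and the choice of 2018 and 2019 is arbitrary. With exactly two periods per firm, FE with a year dummy and FD with a constant should return the same slope coefficient; with more periods they generally will not.

    Code:
    * Sketch only: the two years are an arbitrary choice for illustration
    preserve
    keep if inlist(fyear, 2018, 2019) & !missing(lsale, ldefdollars)
    * keep firms observed in both years, so that T = 2 for every firm
    bysort gvkey: keep if _N == 2
    xtset gvkey fyear
    * FE with a year dummy
    xtreg lsale ldefdollars i.fyear, fe vce(cluster gvkey)
    * FD with a constant (the constant plays the role of the year dummy)
    reg d.lsale d.ldefdollars, vce(cluster gvkey)
    restore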



    • #3
      Originally posted by Clyde Schechter
      When T > 2, you should not expect the estimated effects to be the same. See https://www.statalist.org/forums/for...rst-difference for an extended discussion of the difference between the first-difference and fixed-effects estimators and their underlying assumptions and uses.
      Thanks for the reference, but that post illustrates exactly why I'm confused. I have a panel with large N and reasonably large T, and everything I've read implies that FD and FE should be roughly the same, because asymptotically they converge. But my FD estimate is an order of magnitude smaller than my FE estimate, and the confidence intervals aren't close to overlapping.



      • #4
        To obtain efficiency, the differenced equation needs to be estimated by GLS and not OLS. Therefore, FD-GLS is identical to the within-groups (FE) estimator, not FD-OLS. Nevertheless, the coefficients in your example have the same sign and significance, so you are more or less reaching the same conclusion. Whether the difference in coefficients is material cannot easily be determined just by looking at their absolute values.

        . reg d.lsale d.ldefdollars
        Also note that differencing eliminates the constant, so your FD-OLS command needs to be:

        Code:
        reg D.(L.sale L.defdollars), nocons robust
        Last edited by Andrew Musau; 17 Feb 2024, 01:03.



        • #5
          Hi Andrew, thanks for your response. Here is the code with no constant:

          Code:
          . reg d.lsale d.ldefdollars, vce(cluster gvkey) noconstant
          
          Linear regression                               Number of obs     =    368,196
                                                          F(1, 31711)       =     133.99
                                                          Prob > F          =     0.0000
                                                          R-squared         =     0.0002
                                                          Root MSE          =     .56481
          
                                       (Std. err. adjusted for 31,712 clusters in gvkey)
          ------------------------------------------------------------------------------
                       |               Robust
               D.lsale | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
           ldefdollars |
                   D1. |   .0027175   .0002348    11.58   0.000     .0022573    .0031777
          ------------------------------------------------------------------------------
          
          .
          As you can see, the estimate is still very different, I'm afraid. As for needing to use GLS, from what I've read it shouldn't matter asymptotically: the two estimators should converge to the same thing in a large, reasonably long panel like mine.

          I understand that in some contexts the fact that both are significant is what matters most, but unfortunately not in mine - the fact that winning defence contracts improves revenue is trivial. The question is by how much, and FD and FE seemingly have very different responses to that.

          I've done some more testing by only looking at a sample of firms, a few years, making sure it's only firms that survive the breadth of the panel (thus forcing a balanced panel) and I still get this anomaly.

          Also, just to clarify, lsale and ldefdollars are not l1.sale and l1.defdollars; they are log(sale) and log(defdollars), as approximated by asinh. To check that the transformation is not the issue, I kept only those observations where defdollars and sale are always > 0 and used the actual natural log, and the problem still persists (it also persists if I model both variables in levels).
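
          For reference, the transformations described above can be sketched roughly as follows. The names lnsale, lndef, and allpos are hypothetical, and reading "always > 0" as "positive in every year the firm is observed" is an assumption.

          Code:
          * asinh versions, as used in #1
          gen double lsale = asinh(sale)
          gen double ldefdollars = asinh(defdollars)
          * flag firms whose sale and defdollars are positive and nonmissing in every year
          egen byte allpos = min(sale > 0 & defdollars > 0 & !missing(sale, defdollars)), by(gvkey)
          * natural logs for those firms only
          gen double lnsale = ln(sale) if allpos
          gen double lndef = ln(defdollars) if allpos
          sort gvkey fyear
          xtset gvkey fyear
          xtreg lnsale lndef, fe vce(cluster gvkey)
          reg d.lnsale d.lndef, vce(cluster gvkey)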

          To give more background, usually in this situation I would just use fixed effects and call it a day. But a) I'm concerned that my data exhibits this anomaly that seems to be exceedingly rare, and b) in my case FD has a natural economic interpretation that allows me to use my IV of choice easily. But I need to understand this discrepancy first.



          • #6
            I do not know the context here, but here are some considerations:

            1. Taking the log of a variable that has some observations equal to zero is a wrong approach. You will end up biasing the sample as the zero observations drop out. There are other transformations apart from the log transformation.

            2. With unbalanced panels, you can lose a lot of observations with FD. Most of the equivalence arguments implicitly assume balanced panels.

            3. My comments in #4.
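
            Point 2 can be inspected directly. A minimal sketch, assuming lsale and ldefdollars as in #1 (in_fd is a hypothetical variable name):

            Code:
            xtset gvkey fyear
            * summarize participation patterns and gaps in the panel
            xtdescribe
            * flag observations for which both first differences are defined
            gen byte in_fd = !missing(d.lsale, d.ldefdollars)
            * share of the usable levels observations that survive differencing
            tabulate in_fd if !missing(lsale, ldefdollars)

            The first observation of every firm, and any observation following a gap year, has in_fd equal to 0, so firms with short or interrupted histories contribute less to FD than to FE.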

            Again, I don't know the economic significance of your coefficients. But an example can be the following: A 1 percentage point increase in education funding reduces the national debt by $2 vs. A 1 percentage point increase in education funding reduces the national debt by 2 million dollars. The factor is a million, but the decline is not economically significant as the debt is in trillions of dollars.
