  • Need Help for Panel Data Regression

    Hello dear Community,
    I would like to run a regression on panel data for my research. More specifically, my dependent variable is firm revenue, and I have observations on many firms over a period of 5 years.
    As far as I know, I have to -xtset- my data:
    Code:
    xtset company_id year
    Now I do not know which command I should use, -xtreg- or -reghdfe-. I want to control for company fixed effects. I am also not sure whether clustering would be useful in my case.

    Code:
    reghdfe revenue x1 x2, absorb(company_id)
    Also, what would be the -xtreg- equivalent of this command?

  • #2
    Hakan:
    welcome to this forum.
    The equivalent -xtreg- code is:
    Code:
    xtset company_id year
    xtreg revenue x1 x2, fe
    The -vce(cluster company_id)- standard errors make sense in case of heteroskedasticity and/or autocorrelation of the idiosyncratic error (epsilon).
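    A minimal sketch of how the clustering could be added, assuming the variable names from the opening post (revenue, x1, x2, company_id, year). As far as I know, after -xtreg, fe- Stata treats -vce(robust)- as clustering on the panel variable, so the two calls below should report the same standard errors:
    Code:
    xtset company_id year
    * cluster-robust standard errors, clustered on the firm identifier
    xtreg revenue x1 x2, fe vce(cluster company_id)
    * for -xtreg, fe-, -vce(robust)- is expected to reproduce the result above
    xtreg revenue x1 x2, fe vce(robust)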
    Kind regards,
    Carlo
    (StataNow 18.5)


    • #3
      I spoke about clustering in a post yesterday where I linked a paper on cluster SEs, so I'll do the same here.


      • #4
        Hakan:
        as an aside to Jared's helpful reference, another towering working paper on the very same topic is
        http://cameron.econ.ucdavis.edu/research/Cameron_Miller_JHR_2015_February.pdf
        Kind regards,
        Carlo
        (StataNow 18.5)


        • #5
          This paper was published in JHR, no? It isn't still a working paper, is it? Carlo Lazzaro


          • #6
            Jared:
            you're right.
            The working paper was published in JHR in 2015 (I cannot share the full reference at the moment).
            Last edited by Carlo Lazzaro; 03 Apr 2022, 13:59.
            Kind regards,
            Carlo
            (StataNow 18.5)


            • #7
              Cameron and Trivedi's Microeconometrics textbook (Cambridge, 2005) also covers this topic in different chapters.
              Last edited by Carlo Lazzaro; 03 Apr 2022, 14:03.
              Kind regards,
              Carlo
              (StataNow 18.5)


              • #8
                Thanks for your answers. So, to make things clear, are these equivalent?

                Code:
                xtset company_id year
                xtreg revenue x1 x2, fe vce(cluster company_id)
                Code:
                xtset company_id year
                reghdfe revenue x1 x2, absorb(company_id) vce(cluster company_id)
                If not, what would the correct -reghdfe- code look like?


                • #9
                  Hakan:
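                  they are almost equivalent, as you can see from the following toy example: the coefficients on the regressors are identical, while the standard errors differ slightly, mainly because -reghdfe- drops singleton observations (551 here) before estimation and is therefore left with fewer clusters: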
                  Code:
                  . use "https://www.stata-press.com/data/r17/nlswork.dta"
                  . xtreg ln_wage i.year c.age##c.age, fe vce(cluster idcode)
                  
                  Fixed-effects (within) regression               Number of obs     =     28,510
                  Group variable: idcode                          Number of groups  =      4,710
                  
                  R-squared:                                      Obs per group:
                       Within  = 0.1162                                         min =          1
                       Between = 0.1078                                         avg =        6.1
                       Overall = 0.0932                                         max =         15
                  
                                                                  F(16,4709)        =      79.11
                  corr(u_i, Xb) = 0.0613                          Prob > F          =     0.0000
                  
                                               (Std. err. adjusted for 4,710 clusters in idcode)
                  ------------------------------------------------------------------------------
                               |               Robust
                       ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                          year |
                           69  |   .0647054   .0155249     4.17   0.000     .0342693    .0951415
                           70  |   .0284423   .0264639     1.07   0.283    -.0234395     .080324
                           71  |   .0579959   .0384111     1.51   0.131    -.0173078    .1332996
                           72  |   .0510671   .0502675     1.02   0.310    -.0474808     .149615
                           73  |   .0424104   .0624924     0.68   0.497    -.0801038    .1649247
                           75  |   .0151376    .086228     0.18   0.861    -.1539096    .1841848
                           77  |   .0340933   .1106841     0.31   0.758    -.1828994     .251086
                           78  |   .0537334   .1232232     0.44   0.663    -.1878417    .2953084
                           80  |   .0369475   .1473725     0.25   0.802    -.2519716    .3258667
                           82  |   .0391687   .1715621     0.23   0.819    -.2971733    .3755108
                           83  |    .058766   .1836086     0.32   0.749    -.3011928    .4187249
                           85  |   .1042758   .2080199     0.50   0.616    -.3035406    .5120922
                           87  |   .1242272   .2327328     0.53   0.594    -.3320379    .5804922
                           88  |   .1904977   .2486083     0.77   0.444    -.2968909    .6778863
                               |
                           age |   .0728746    .013687     5.32   0.000     .0460416    .0997075
                               |
                   c.age#c.age |  -.0010113   .0001076    -9.40   0.000    -.0012224   -.0008003
                               |
                         _cons |   .3937532   .2469015     1.59   0.111    -.0902893    .8777957
                  -------------+----------------------------------------------------------------
                       sigma_u |  .40275174
                       sigma_e |  .30127563
                           rho |  .64120306   (fraction of variance due to u_i)
                  ------------------------------------------------------------------------------
                  
                  . reghdfe ln_wage i.year c.age##c.age, abs(idcode) vce(cluster idcode)
                  (dropped 551 singleton observations)
                  (MWFE estimator converged in 1 iterations)
                  
                  HDFE Linear regression                            Number of obs   =     27,959
                  Absorbing 1 HDFE group                            F(  16,   4158) =      79.11
                  Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                                    R-squared       =     0.6593
                                                                    Adj R-squared   =     0.5995
                                                                    Within R-sq.    =     0.1162
                  Number of clusters (idcode)  =      4,159         Root MSE        =     0.3013
                  
                                               (Std. err. adjusted for 4,159 clusters in idcode)
                  ------------------------------------------------------------------------------
                               |               Robust
                       ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                  -------------+----------------------------------------------------------------
                          year |
                           69  |   .0647054   .0155252     4.17   0.000     .0342677    .0951432
                           70  |   .0284423   .0264645     1.07   0.283    -.0234422    .0803268
                           71  |   .0579959   .0384118     1.51   0.131    -.0173118    .1333037
                           72  |   .0510671   .0502685     1.02   0.310    -.0474861    .1496203
                           73  |   .0424104   .0624936     0.68   0.497    -.0801104    .1649313
                           75  |   .0151376   .0862297     0.18   0.861    -.1539187    .1841939
                           77  |   .0340933   .1106863     0.31   0.758    -.1829111    .2510976
                           78  |   .0537334   .1232256     0.44   0.663    -.1878546    .2953214
                           80  |   .0369475   .1473754     0.25   0.802    -.2519871    .3258822
                           82  |   .0391687   .1715655     0.23   0.819    -.2971914    .3755288
                           83  |    .058766   .1836122     0.32   0.749    -.3012121    .4187442
                           85  |   .1042758    .208024     0.50   0.616    -.3035625     .512114
                           87  |   .1242272   .2327373     0.53   0.594    -.3320624    .5805167
                           88  |   .1904977   .2486132     0.77   0.444    -.2969171    .6779125
                               |
                           age |   .0728746   .0136873     5.32   0.000     .0460402    .0997089
                               |
                   c.age#c.age |  -.0010113   .0001076    -9.39   0.000    -.0012224   -.0008003
                               |
                         _cons |   .3956251   .2469216     1.60   0.109    -.0884733    .8797234
                  ------------------------------------------------------------------------------
                  
                  Absorbed degrees of freedom:
                  -----------------------------------------------------+
                   Absorbed FE | Categories  - Redundant  = Num. Coefs |
                  -------------+---------------------------------------|
                        idcode |      4159        4159           0    *|
                  -----------------------------------------------------+
                  * = FE nested within cluster; treated as redundant for DoF computation
                  
                  .
                  That said, in this case I would stick with -xtreg,fe-, as you have one fixed effect only.
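                  A side-by-side comparison of the two fits above can be sketched as follows. The stored-estimate names m_xtreg and m_hdfe are hypothetical, and -reghdfe- is a community-contributed command that has to be installed first (e.g. via -ssc install reghdfe-, together with -ftools-):
                  Code:
                  webuse nlswork, clear
                  xtset idcode year
                  * fixed-effects fit with -xtreg-, clustered on the panel identifier
                  xtreg ln_wage i.year c.age##c.age, fe vce(cluster idcode)
                  estimates store m_xtreg
                  * same specification with -reghdfe- (singletons are dropped by default)
                  reghdfe ln_wage i.year c.age##c.age, absorb(idcode) vce(cluster idcode)
                  estimates store m_hdfe
                  * coefficients, standard errors, sample size and number of clusters
                  estimates table m_xtreg m_hdfe, keep(age c.age#c.age) b(%9.7f) se(%9.7f) stats(N N_clust)
                  The table should make it easy to see that the coefficients coincide, whereas N and the number of clusters differ between the two commands.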
                  Kind regards,
                  Carlo
                  (StataNow 18.5)


                  • #10
                    Yes, but you did it with -i.year-. I just want to know what the correct -reghdfe- code would be for my specific case.


                    • #11
                      Hakan:
                      yes, it is correct, but possibly misspecified.
                      Kind regards,
                      Carlo
                      (StataNow 18.5)


                      • #12
                        What do you mean by misspecified? How should I set it up then?


                        • #13
                          Hakan:
                          in the following toy-example:
                          Code:
                          . use "https://www.stata-press.com/data/r17/nlswork.dta"
                          (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
                          
                          . xtreg ln_wage c.age##c.age, fe vce(cluster idcode)
                          
                          Fixed-effects (within) regression               Number of obs     =     28,510
                          Group variable: idcode                          Number of groups  =      4,710
                          
                          R-squared:                                      Obs per group:
                               Within  = 0.1087                                         min =          1
                               Between = 0.1006                                         avg =        6.1
                               Overall = 0.0865                                         max =         15
                          
                                                                          F(2,4709)         =     507.42
                          corr(u_i, Xb) = 0.0440                          Prob > F          =     0.0000
                          
                                                       (Std. err. adjusted for 4,710 clusters in idcode)
                          ------------------------------------------------------------------------------
                                       |               Robust
                               ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                          -------------+----------------------------------------------------------------
                                   age |   .0539076    .004307    12.52   0.000     .0454638    .0623515
                                       |
                           c.age#c.age |  -.0005973    .000072    -8.30   0.000    -.0007384   -.0004562
                                       |
                                 _cons |    .639913   .0624195    10.25   0.000     .5175415    .7622845
                          -------------+----------------------------------------------------------------
                               sigma_u |   .4039153
                               sigma_e |  .30245467
                                   rho |  .64073314   (fraction of variance due to u_i)
                          ------------------------------------------------------------------------------
                          
                          . predict fitted, xb
                          (24 missing values generated)
                          
                          . g sq_fitted=fitted^2
                          (24 missing values generated)
                          
                          . xtreg ln_wage fitted sq_fitted , fe vce(cluster idcode)
                          
                          Fixed-effects (within) regression               Number of obs     =     28,510
                          Group variable: idcode                          Number of groups  =      4,710
                          
                          R-squared:                                      Obs per group:
                               Within  = 0.1092                                         min =          1
                               Between = 0.1033                                         avg =        6.1
                               Overall = 0.0881                                         max =         15
                          
                                                                          F(2,4709)         =     523.09
                          corr(u_i, Xb) = 0.0467                          Prob > F          =     0.0000
                          
                                                       (Std. err. adjusted for 4,710 clusters in idcode)
                          ------------------------------------------------------------------------------
                                       |               Robust
                               ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                          -------------+----------------------------------------------------------------
                                fitted |   2.569185   .7085064     3.63   0.000     1.180181    3.958189
                             sq_fitted |    -.47432   .2153021    -2.20   0.028    -.8964128   -.0522272
                                 _cons |  -1.290258    .580562    -2.22   0.026    -2.428431   -.1520844
                          -------------+----------------------------------------------------------------
                               sigma_u |    .403403
                               sigma_e |  .30238578
                                   rho |  .64025357   (fraction of variance due to u_i)
                          ------------------------------------------------------------------------------
                          
                          . test sq_fitted=0
                          
                           ( 1)  sq_fitted = 0
                          
                                 F(  1,  4709) =    4.85
                                      Prob > F =    0.0276
                          
                          . xtreg ln_wage c.age##c.age i.year, fe vce(cluster idcode)
                          
                          Fixed-effects (within) regression               Number of obs     =     28,510
                          Group variable: idcode                          Number of groups  =      4,710
                          
                          R-squared:                                      Obs per group:
                               Within  = 0.1162                                         min =          1
                               Between = 0.1078                                         avg =        6.1
                               Overall = 0.0932                                         max =         15
                          
                                                                          F(16,4709)        =      79.11
                          corr(u_i, Xb) = 0.0613                          Prob > F          =     0.0000
                          
                                                       (Std. err. adjusted for 4,710 clusters in idcode)
                          ------------------------------------------------------------------------------
                                       |               Robust
                               ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                          -------------+----------------------------------------------------------------
                                   age |   .0728746    .013687     5.32   0.000     .0460416    .0997075
                                       |
                           c.age#c.age |  -.0010113   .0001076    -9.40   0.000    -.0012224   -.0008003
                                       |
                                  year |
                                   69  |   .0647054   .0155249     4.17   0.000     .0342693    .0951415
                                   70  |   .0284423   .0264639     1.07   0.283    -.0234395     .080324
                                   71  |   .0579959   .0384111     1.51   0.131    -.0173078    .1332996
                                   72  |   .0510671   .0502675     1.02   0.310    -.0474808     .149615
                                   73  |   .0424104   .0624924     0.68   0.497    -.0801038    .1649247
                                   75  |   .0151376    .086228     0.18   0.861    -.1539096    .1841848
                                   77  |   .0340933   .1106841     0.31   0.758    -.1828994     .251086
                                   78  |   .0537334   .1232232     0.44   0.663    -.1878417    .2953084
                                   80  |   .0369475   .1473725     0.25   0.802    -.2519716    .3258667
                                   82  |   .0391687   .1715621     0.23   0.819    -.2971733    .3755108
                                   83  |    .058766   .1836086     0.32   0.749    -.3011928    .4187249
                                   85  |   .1042758   .2080199     0.50   0.616    -.3035406    .5120922
                                   87  |   .1242272   .2327328     0.53   0.594    -.3320379    .5804922
                                   88  |   .1904977   .2486083     0.77   0.444    -.2968909    .6778863
                                       |
                                 _cons |   .3937532   .2469015     1.59   0.111    -.0902893    .8777957
                          -------------+----------------------------------------------------------------
                               sigma_u |  .40275174
                               sigma_e |  .30127563
                                   rho |  .64120306   (fraction of variance due to u_i)
                          ------------------------------------------------------------------------------
                          
                          . predict fitted2, xb
                          (24 missing values generated)
                          
                          . g sq_fitted2=fitted2^2
                          (24 missing values generated)
                          
                          . xtreg ln_wage fitted2 sq_fitted2 , fe vce(cluster idcode)
                          
                          Fixed-effects (within) regression               Number of obs     =     28,510
                          Group variable: idcode                          Number of groups  =      4,710
                          
                          R-squared:                                      Obs per group:
                               Within  = 0.1164                                         min =          1
                               Between = 0.1094                                         avg =        6.1
                               Overall = 0.0941                                         max =         15
                          
                                                                          F(2,4709)         =     586.29
                          corr(u_i, Xb) = 0.0619                          Prob > F          =     0.0000
                          
                                                       (Std. err. adjusted for 4,710 clusters in idcode)
                          ------------------------------------------------------------------------------
                                       |               Robust
                               ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                          -------------+----------------------------------------------------------------
                               fitted2 |   2.012332   .5365254     3.75   0.000     .9604909    3.064172
                            sq_fitted2 |  -.3040363   .1616996    -1.88   0.060    -.6210431    .0129706
                                 _cons |  -.8379964    .443929    -1.89   0.059    -1.708305    .0323122
                          -------------+----------------------------------------------------------------
                               sigma_u |  .40239556
                               sigma_e |  .30114591
                                   rho |  .64099409   (fraction of variance due to u_i)
                          ------------------------------------------------------------------------------
                          
                          . test sq_fitted2=0
                          
                           ( 1)  sq_fitted2 = 0
                          
                                 F(  1,  4709) =    3.54
                                      Prob > F =    0.0601
                          
                          .
                          The first regression model shows evidence of misspecification (the test on sq_fitted reaches statistical significance at the 5% level, p = 0.0276), whereas the second one does not (the test on sq_fitted2 does not reach statistical significance at the 5% level, p = 0.0601).
                          The only difference between the two specifications is the regressor -i.year-, added to the right-hand side of the regression equation in the second one.
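                          Applied to the data described in the opening post, the same check could be sketched as below. The names revenue, x1, x2, company_id and year come from that post; xbhat and xbhat_sq are hypothetical names for the generated variables, and whether year effects belong in the model remains a substantive choice:
                          Code:
                          xtset company_id year
                          * firm fixed effects plus year dummies, clustered on the firm
                          xtreg revenue x1 x2 i.year, fe vce(cluster company_id)
                          * joint test of the year dummies
                          testparm i.year
                          * linear prediction and its square, taken from the -xtreg- fit above
                          predict double xbhat, xb
                          generate double xbhat_sq = xbhat^2
                          * auxiliary regression: a significant squared term hints at misspecification
                          xtreg revenue xbhat xbhat_sq, fe vce(cluster company_id)
                          test xbhat_sq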
                          Kind regards,
                          Carlo
                          (StataNow 18.5)
