Need help for Panel Data Regression with "reghdfe"

jannis bauer

Join Date: Apr 2022

Posts: 13
#1

02 Apr 2022, 09:44

Good day,
I have quarterly data regarding GDP from different European countries, i.e. panel data. With the command

Code:

xtset Country_num Year

I have specified the panel structure. Is this correct?

For the regression I have to use the command "reghdfe" and control for country fixed effects.
Which of the following two would be correct for this?

Code:

reghdfe y x1 x2, absorb(country_num) vce(robust)

Code:

reghdfe y x1 x2, absorb(Land_num Year) vce(robust)

Furthermore, I may want to cluster the standard errors if that gives better results in my regression. How would the command look then?
Would this be correct?

Code:

reghdfe y x1 x2, cluster(Land_num) absorb(Land_num)

I would be very happy if you could help me and explain the differences as well.
Tags: fixed effects, panel data, reghdfe, regression
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#2

02 Apr 2022, 09:51

Jannis:
welcome to this forum.
Just out of curiosity: why do you have to use the community-contributed module -reghdfe- to estimate country -fe-?
Is it a class/home assignment on panel data regression? (If that were the case, as per the FAQ, the rule is "please do not post, please do not reply".)
Please clarify (and use the # toggle to share your Stata code). Thanks

Kind regards,
Carlo
(Stata 19.0)
Comment
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#3

02 Apr 2022, 10:00

I don't use this command but all of it pretty much looks right. My only question is why do you have the year as the time variable in xtset instead of the quarterly variable?
Comment
jannis bauer

Join Date: Apr 2022

Posts: 13
#4

02 Apr 2022, 11:58

Thanks for your answers. I am new, so I do not really know how this works. I just named the quarterly variable Year so you know that it is a time variable.
I have read a lot of literature and I was told to cluster the standard errors. Would the following be correct, and does it make sense at all in my case?

Code:

reghdfe y x1 x2, cluster(Land_num) absorb(Land_num)

It's for a private project I'm working on, and I was told that the -reghdfe- command had replaced the -xtreg, fe- command.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#5

02 Apr 2022, 12:43

Jannis:
thanks for clarifying.
Whoever told you that the community-contributed module -reghdfe- superseded -xtreg,fe- is wrong (see the related help files for the differences between the two tools).
Your code can be written using -xtreg,fe-:

Code:

xtset Land_num
xtreg y x1 x2 i.quarter, fe vce(cluster Land_num)

It is mandatory to -xtset- your panel dataset with the -timevar- too if you've planned to use time-series operators, such as lags and leads. Otherwise, you can safely -xtset- your panel dataset with the -panelid- only.

Kind regards,
Carlo
(Stata 19.0)
Comment
jannis bauer

Join Date: Apr 2022

Posts: 13
#6

02 Apr 2022, 15:30

Thanks for your answers. I want to use lags. I think I did -xtset- my time variable too. To be more precise, I did

Code:

xtset Land_num date

As I said, I have quarterly data, so I have GDP for each country on 01.01.2020, 01.04.2020, etc.
I just want to control for the country fixed effects. Would it look like this then?

Code:

xtreg y x1 x2 i.Land_num, fe vce(cluster Land_num)

Or would it be redundant if I also cluster on Land_num?
Sorry for my questions, but I want to make sure that I understand what I am doing.
Comment
jannis bauer

Join Date: Apr 2022

Posts: 13
#7

02 Apr 2022, 15:32

It was translated wrongly in my first question: the variable is just Land_num, not Country_num.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#8

02 Apr 2022, 16:32

Jannis:
as you already -xtset- your data with your -panelid-, you should not add it as a predictor in the right-hand side of your regression equation. Conversely, plugging in -i.quarter- makes sense.
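To illustrate why the panel identifier is redundant as a predictor, here is a minimal sketch on simulated data. The dataset, the seed, and the coefficient values are made up for illustration only; the variable names simply mirror the ones used in this thread. It compares the slope from -xtreg, fe- with the slope from a plain -regress- that enters the country dummies explicitly.

```stata
* Toy panel: 5 hypothetical countries observed over 8 quarters (simulated data)
clear
set seed 12345
set obs 40
generate Land_num = ceil(_n/8)
bysort Land_num: generate quarter = _n
generate x1 = rnormal()
generate x2 = rnormal()
* Country-specific intercepts play the role of the country fixed effects
generate y = 1 + 0.5*x1 - 0.3*x2 + Land_num + rnormal()

* Within (fixed-effects) estimator: country effects are handled via the panel id
xtset Land_num
xtreg y x1 x2, fe
scalar b_fe = _b[x1]

* Least-squares dummy variables: country effects entered as explicit dummies
regress y x1 x2 i.Land_num
scalar b_lsdv = _b[x1]

* The two slope estimates should coincide up to numerical precision
display b_fe, b_lsdv
```

Since -xtreg, fe- already accounts for one intercept per country, adding -i.Land_num- on top of it would only add regressors that are collinear with the absorbed effects.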

Kind regards,
Carlo
(Stata 19.0)
Comment
jannis bauer

Join Date: Apr 2022

Posts: 13
#9

03 Apr 2022, 06:30

Hmm, sorry, but I am confused right now. I want to control for the country fixed effects, so why should I plug -i.quarter- into my equation?
To make things clear: I have quarterly data regarding GDP from different European countries, i.e. panel data. I want to control for the country fixed effects and maybe also cluster.
What should the equation look like and how should I -xtset- my data?

Thanks in advance
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#10

03 Apr 2022, 06:45

Jannis:
1) it's mandatory that you -xtset- your panel dataset with the -panelid-.
In your case:

Code:

xtset Land_num

2) It is not mandatory to add the -timevar- in -xtset- (doing so neither helps nor hurts), provided that you do not plan to use time-series related operators, such as lags and leads. That said, even if you do not -xtset- your panel dataset with -timevar- too, you can always add it to your set of predictors in the right-hand side of your regression equation;
3) It is mandatory to add -timevar- in -xtset-, if you plan to use time-series related operators, such as lags and leads.

In your case, 1) is enough:

Code:

xtset Land_num
xtreg y x1 x2 i.quarter, fe vce(cluster Land_num)

Kind regards,
Carlo
(Stata 19.0)
Comment
jannis bauer

Join Date: Apr 2022

Posts: 13
#11

03 Apr 2022, 07:14

Oh okay, thanks Carlo. But I still do not know why you would put -i.quarter- in there if I only want country fixed effects.

Code:

xtreg y x1 x2, fe vce(cluster Land_num)

Is this wrong?
And I do want to use time-series related operators such as lags in the next step, so I should also -xtset- my -timevar-.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#12

03 Apr 2022, 07:23

Jannis:
1) including the -timevar- in the right-hand side of your panel regression equation is a good idea, no matter how you -xtset- your data. Please note that the -fe- estimator focuses on the within-panel variation, and -timevar-, when adjusted for the remaining predictors, is often informative. In addition, you're not controlling for the -fe-, but you're investigating whether (or not) the evidence of a panel-wise effect (time-varying unobserved heterogeneity; its time-invariant counterpart is wiped out by the -fe- estimator, along with all the other time-invariant variables) comes alive in your dataset.
2) your code is correct, but incomplete as far as the T dimension of your panel is concerned;
3) If you want to use time-series related operators such as lags or leads in your regression, -xtset- with -timevar- too is mandatory.
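As a minimal sketch of point 3), assuming the time variable is stored as a daily date that marks the first day of each quarter (01.01.2020, 01.04.2020, and so on, as described in #6): with a daily -timevar- and the default spacing of one unit, -L.- would look one day back and return missing values, so one option is to convert the date to a quarterly date with qofd() before -xtset-. The data below are simulated, and the names q, qdate, and x1_lag are hypothetical.

```stata
* Toy panel: 3 hypothetical countries, 8 quarters each (simulated data)
clear
set seed 2022
set obs 24
generate Land_num = ceil(_n/8)
bysort Land_num: generate q = _n - 1
* Daily date of the first day of each quarter, starting 01.01.2020
generate date = dofq(tq(2020q1) + q)
format date %td
generate x1 = rnormal()

* Convert the daily date to a quarterly date so consecutive quarters differ by 1
generate qdate = qofd(date)
format qdate %tq

xtset Land_num qdate
* L.x1 is x1 from the previous quarter within the same country
generate x1_lag = L.x1

* Only the first quarter of each country should have no lag available
count if missing(x1_lag)
```

With the data declared this way, lagged regressors such as -L.x1- can be used directly in -xtreg- or -reghdfe-.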

Kind regards,
Carlo
(Stata 19.0)
Comment
jannis bauer

Join Date: Apr 2022

Posts: 13
#13

03 Apr 2022, 10:16

Hmm, that is very confusing for me. I guess I will stick with -reghdfe-.

So, to clarify all my questions: would the following code be correct if I only want to investigate country-specific fixed effects?

Code:

xtset Land_num date
reghdfe y x1 x2, absorb(Land_num) vce(cluster Land_num)

If not, what would the correct code look like?

Thanks in advance
Comment

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17709

#14

03 Apr 2022, 10:46

Jannis:
your code is correct, but my gut feeling is that you're attributing some extra powers to the community-contributed module -reghdfe-.
I do hope that the following toy example clarifies the issue (that is, with one fixed effect only, -xtreg,fe- and the community-contributed module -reghdfe- give back almost identical results):

Code:

. use "https://www.stata-press.com/data/r17/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. xtset idcode year

Panel variable: idcode (unbalanced)
 Time variable: year, 68 to 88, but with gaps
         Delta: 1 unit

. xtreg ln_wage i.year c.age##c.age, fe vce(cluster idcode)

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-squared:                                      Obs per group:
     Within  = 0.1162                                         min =          1
     Between = 0.1078                                         avg =        6.1
     Overall = 0.0932                                         max =         15

                                                F(16,4709)        =      79.11
corr(u_i, Xb) = 0.0613                          Prob > F          =     0.0000

                             (Std. err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        year |
         69  |   .0647054   .0155249     4.17   0.000     .0342693    .0951415
         70  |   .0284423   .0264639     1.07   0.283    -.0234395     .080324
         71  |   .0579959   .0384111     1.51   0.131    -.0173078    .1332996
         72  |   .0510671   .0502675     1.02   0.310    -.0474808     .149615
         73  |   .0424104   .0624924     0.68   0.497    -.0801038    .1649247
         75  |   .0151376    .086228     0.18   0.861    -.1539096    .1841848
         77  |   .0340933   .1106841     0.31   0.758    -.1828994     .251086
         78  |   .0537334   .1232232     0.44   0.663    -.1878417    .2953084
         80  |   .0369475   .1473725     0.25   0.802    -.2519716    .3258667
         82  |   .0391687   .1715621     0.23   0.819    -.2971733    .3755108
         83  |    .058766   .1836086     0.32   0.749    -.3011928    .4187249
         85  |   .1042758   .2080199     0.50   0.616    -.3035406    .5120922
         87  |   .1242272   .2327328     0.53   0.594    -.3320379    .5804922
         88  |   .1904977   .2486083     0.77   0.444    -.2968909    .6778863
             |
         age |   .0728746    .013687     5.32   0.000     .0460416    .0997075
             |
 c.age#c.age |  -.0010113   .0001076    -9.40   0.000    -.0012224   -.0008003
             |
       _cons |   .3937532   .2469015     1.59   0.111    -.0902893    .8777957
-------------+----------------------------------------------------------------
     sigma_u |  .40275174
     sigma_e |  .30127563
         rho |  .64120306   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. reghdfe ln_wage i.year c.age##c.age, abs(idcode)  vce(cluster idcode)
(dropped 551 singleton observations)
(MWFE estimator converged in 1 iterations)

HDFE Linear regression                            Number of obs   =     27,959
Absorbing 1 HDFE group                            F(  16,   4158) =      79.11
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.6593
                                                  Adj R-squared   =     0.5995
                                                  Within R-sq.    =     0.1162
Number of clusters (idcode)  =      4,159         Root MSE        =     0.3013

                             (Std. err. adjusted for 4,159 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        year |
         69  |   .0647054   .0155252     4.17   0.000     .0342677    .0951432
         70  |   .0284423   .0264645     1.07   0.283    -.0234422    .0803268
         71  |   .0579959   .0384118     1.51   0.131    -.0173118    .1333037
         72  |   .0510671   .0502685     1.02   0.310    -.0474861    .1496203
         73  |   .0424104   .0624936     0.68   0.497    -.0801104    .1649313
         75  |   .0151376   .0862297     0.18   0.861    -.1539187    .1841939
         77  |   .0340933   .1106863     0.31   0.758    -.1829111    .2510976
         78  |   .0537334   .1232256     0.44   0.663    -.1878546    .2953214
         80  |   .0369475   .1473754     0.25   0.802    -.2519871    .3258822
         82  |   .0391687   .1715655     0.23   0.819    -.2971914    .3755288
         83  |    .058766   .1836122     0.32   0.749    -.3012121    .4187442
         85  |   .1042758    .208024     0.50   0.616    -.3035625     .512114
         87  |   .1242272   .2327373     0.53   0.594    -.3320624    .5805167
         88  |   .1904977   .2486132     0.77   0.444    -.2969171    .6779125
             |
         age |   .0728746   .0136873     5.32   0.000     .0460402    .0997089
             |
 c.age#c.age |  -.0010113   .0001076    -9.39   0.000    -.0012224   -.0008003
             |
       _cons |   .3956251   .2469216     1.60   0.109    -.0884733    .8797234
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
      idcode |      4159        4159           0    *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

.

Last edited by Carlo Lazzaro; 03 Apr 2022, 10:49.

Kind regards,
Carlo
(Stata 19.0)

Comment

jannis bauer

Join Date: Apr 2022

Posts: 13
#15

04 Apr 2022, 02:46

Okay, I see the results are almost the same, but why would you also use -i.year- in -reghdfe-?
In your example, which fixed effect did you consider? Is it the time fixed effect or the -idcode- fixed effect?
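
For what it is worth, in the example in #14 the absorbed fixed effect is -idcode- (via -fe- in -xtreg- and via abs(idcode) in -reghdfe-), while -i.year- enters year dummies as ordinary regressors, so both commands include panel effects plus year effects. A minimal sketch on simulated data follows; the variable names id, year, x, and y are hypothetical, and the -reghdfe- part only runs if that community-contributed command is installed.

```stata
* Toy panel: 6 hypothetical units over 5 years (simulated data)
clear
set seed 42
set obs 30
generate id = ceil(_n/5)
bysort id: generate year = 2000 + _n
generate x = rnormal()
generate y = 2*x + id + 0.1*(year - 2000) + rnormal()

* Unit fixed effect handled by -fe-; year effects entered as dummies
xtset id year
xtreg y x i.year, fe vce(cluster id)
scalar b_xtreg = _b[x]

* Same model with both sets of effects absorbed, if -reghdfe- is installed
capture which reghdfe
if _rc == 0 {
    reghdfe y x, absorb(id year) vce(cluster id)
    display b_xtreg, _b[x]
}
```

If both commands run, the coefficient on x should agree; the difference is only in whether the year effects are reported as coefficients or absorbed.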
Comment
