Dynamic panel vs XTREG vs XTREGAR vs other

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#16

29 Mar 2022, 00:58

Giorgio:
1) I fail to get the rationale behind switching between the two estimators conditional on N and T;
2) you can rely on the -group- function available from -egen- and create a hybrid:

Code:

egen wanted=group(panelid timevar)

Although this is technically feasible, its coefficient is difficult to interpret;
3) Time-varying coefficients are estimated by the -fe- and -re- estimators. No issue here, as far as I understand your question.
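
A minimal sketch of what -group()- returns, using a made-up balanced panel (the sizes and the values of -panelid- and -timevar- are assumptions for illustration only). Because each panel-period pair occurs once, the hybrid identifier takes a distinct value on every row, which is why a coefficient on it is hard to interpret:

```stata
* Toy balanced panel: 3 panels observed over 4 periods (hypothetical data)
clear
set obs 3
generate panelid = _n
expand 4
bysort panelid: generate timevar = 2000 + _n

* One group code per distinct (panelid, timevar) combination
egen wanted = group(panelid timevar)

* Each combination occurs once, so there are as many groups as rows
sort panelid timevar
list panelid timevar wanted, sepby(panelid)
```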

Kind regards,
Carlo
(Stata 19.0)
Giorgio Di Stefano

Join Date: Oct 2021

Posts: 154
#17

29 Mar 2022, 01:18

Originally posted by Carlo Lazzaro

Giorgio:
1) I fail to get the rationale behind switching between the two estimators conditional on N and T;
2) you can rely on the -group- function available from -egen- and create a hybrid:

Code:

egen wanted=group(panelid timevar)

Although this is technically feasible, its coefficient is difficult to interpret;
3) Time-varying coefficients are estimated by the -fe- and -re- estimators. No issue here, as far as I understand your question.

1. I simply need to compare the full period (T>N) with its sub-periods (N>T), in order to check for differences.
2. I already have groups defined.
3. I fail to understand how to detect those time-varying parameters or variables.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#18

29 Mar 2022, 01:29

Giorgio:
1) N>T and T>N datasets imply different error structures. Therefore, technically you can compare -xtregar- with -xtreg- results, but I cannot say what you would learn from that comparison;
2) if feasible (i.e., depending on the estimator), I would keep -panelid- and -timevar- separate, and add the latter to the right-hand side of the regression equation as a categorical predictor;
3) from your data, you should be able to detect those variables that change over time.
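
One way to run that check, sketched on a made-up panel (the variables -region- and -x- and the panel dimensions are assumptions for illustration): -xtsum- reports a within standard deviation of zero for any variable that never changes inside a panel, and the same quantity can be computed by hand with -egen-.

```stata
* Toy panel: 5 panels, 4 years (hypothetical data)
clear
set obs 5
generate id = _n
expand 4
bysort id: generate year = 2000 + _n
* region is constant within each panel; x changes from year to year
generate region = mod(id, 2)
generate x = id + year
xtset id year

* A "within" standard deviation of zero flags a time-invariant variable
xtsum region x

* The same check by hand: standard deviation of each variable within panel
bysort id: egen double sd_region = sd(region)
bysort id: egen double sd_x = sd(x)
summarize sd_region sd_x
```

Variables whose within-panel standard deviation is zero everywhere are the ones that -fe- cannot identify.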

Kind regards,
Carlo
(Stata 19.0)
Giorgio Di Stefano

Join Date: Oct 2021

Posts: 154
#19

29 Mar 2022, 08:11

Originally posted by Carlo Lazzaro

Giorgio:
1) N>T and T>N datasets imply different error structures. Therefore, technically you can compare -xtregar- with -xtreg- results, but I cannot say what you would learn from that comparison;
2) if feasible (i.e., depending on the estimator), I would keep -panelid- and -timevar- separate, and add the latter to the right-hand side of the regression equation as a categorical predictor;
3) from your data, you should be able to detect those variables that change over time.

Thanks, Carlo!
1. I have been thinking of sticking with -xtreg varlist, vce(cluster panelid)- for both the large sample (T>N, T=80) and the smaller ones (T<N, T=10) and seeing what happens. The bias for T>N is expected to be small, as Sebastian wrote above, but it worries me in the second case.

2. You mean something like i.time (i.year in my case) in the varlist, right? Will that affect the margins estimation or use up a lot of degrees of freedom? Should I add i.country to check for individual effects, or will clustering be enough?

Last edited by Giorgio Di Stefano; 29 Mar 2022, 08:15.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#20

29 Mar 2022, 08:51

Giorgio:
1) I've nothing to add to my previous 1) in #18;
2) You're right: I meant adding -i.year- to the right-hand side of your regression equation. Under -fe-, time-invariant predictors are wiped out. Hence, -i.country- as a predictor does not make sense, as far as I understand the issue.
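
A minimal sketch of that specification on simulated data (the variable names, the sample size and the data-generating values are all assumptions, not Giorgio's model):

```stata
* Simulated panel: 30 panels, 8 years (hypothetical data)
clear
set seed 12345
set obs 30
generate id = _n
expand 8
bysort id: generate year = 2000 + _n
generate x = rnormal()
generate y = 0.5*x + 0.1*(year - 2000) + id/10 + rnormal()
xtset id year

* Year effects enter as an ordinary factor variable on the right-hand side
xtreg y x i.year, fe vce(cluster id)

* Joint test of the year effects
testparm i.year
```

Each year dummy costs one degree of freedom, so with T=10 that is nine extra parameters, while the panel effects themselves are handled by the within transformation rather than by explicit dummies.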

Kind regards,
Carlo
(Stata 19.0)
Giorgio Di Stefano

Join Date: Oct 2021

Posts: 154
#21

29 Mar 2022, 16:40

Thank you very much, Carlo!
After careful thought, I think I will give the user-written command reghdfe a try, because of the number of dummies and categorical variables I have, and mainly because I can obtain Driscoll-Kraay standard errors with one lag.

Allow me a few brief last points.
The code I will run is the following:

Code:

reghdfe growthgdp l.gdp cpi u Output dummy1 dummy2 c.indicator1##c.indicator1, absorb(time panelid) vce(cluster panelid, dkraay(1))

1. Is this correct? Or should I absorb by country_name instead of panelid?

2. Given that I already have at least one lagged term, l.gdp, on the RHS of the equation, will adding dkraay(1) increase the number of lags? That is, will there be two lags for that specific lagged variable, or will it be unaffected?

3. Should I still add -i.year- to the right-hand side of my regression equation, as suggested in #18?

And lastly:

4. I was thinking of absorbing or clustering by groups when I check the sub-periods and T is reduced to T=10. Would that be correct and, if so, which of the two is the right way?

Thank you wholeheartedly!

Giorgio!

Last edited by Giorgio Di Stefano; 29 Mar 2022, 16:44.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#22

30 Mar 2022, 01:03

Giorgio:
1) it depends on how you -xtset- your data. Please also note that time-invariant variables are wiped out by the -fe- machinery;
2) check it yourself, as I cannot say. Please consider that Driscoll-Kraay standard errors imply cross-panel correlation of the epsilon error;
3) if you -absorb()- -year-, -i.year- will be omitted due to perfect collinearity with the -year- fixed effect;
4) I would cluster on groups.
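
Point 3) can be illustrated with official commands only. The sketch below uses -areg- as a stand-in for the -absorb()- option of -reghdfe- (that swap, the simulated data and all variable names are assumptions for illustration): absorbing -year- and including -i.year- explicitly give the same slope on -x-.

```stata
* Simulated panel: 30 panels, 8 years (hypothetical data)
clear
set seed 2022
set obs 30
generate id = _n
expand 8
bysort id: generate year = 2000 + _n
generate x = rnormal()
generate y = 0.5*x + 0.1*(year - 2000) + rnormal()

* (a) year effects absorbed
areg y x, absorb(year)
scalar b_absorbed = _b[x]

* (b) year effects as explicit dummies
regress y x i.year
scalar b_dummies = _b[x]

display "absorbed: " b_absorbed "   dummies: " b_dummies
```

Since the two are the same set of fixed effects, adding -i.year- on top of an absorbed -year- contributes nothing new and is dropped as collinear.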

Last edited by Carlo Lazzaro; 30 Mar 2022, 01:07.

Kind regards,
Carlo
(Stata 19.0)
Giorgio Di Stefano

Join Date: Oct 2021

Posts: 154
#23

30 Mar 2022, 06:22

Originally posted by Carlo Lazzaro

Giorgio:
1) it depends on how you -xtset- your data. Please also note that time-invariant variables are wiped out by the -fe- machinery;
2) check it yourself, as I cannot say. Please consider that Driscoll-Kraay standard errors imply cross-panel correlation of the epsilon error;
3) if you -absorb()- -year-, -i.year- will be omitted due to perfect collinearity with the -year- fixed effect;
4) I would cluster on groups.

There is serial correlation and heteroskedasticity in the data. I read in a post written by Wooldridge here on the forum that Driscoll-Kraay standard errors will take care of them.

I am xtset-ing as panelid year, so I guess:
if I absorb(year), that is equivalent to adding i.year indirectly, so there is no need to add i.year again on the RHS, right?

And on your point 4:
in my code in #21, given that I xtset on panelid year, would clustering by groups instead of panelid be more correct? Or should I leave it as it is?
And does that apply to the entire-sample estimation or only to the sub-sample periods?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#24

30 Mar 2022, 07:10

Giorgio:
1) serial correlation is within panel; DK standard errors were conceived to deal with cross-sectional dependence: which one did you detect?
2) correct about -i.year-;
3) if I had to choose between clustering on -panelid- or on groups, I would go with -panelid- for the whole sample.
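
For reference, a sketch of the Driscoll-Kraay covariance estimator in its usual textbook form (the notation is an assumption for illustration and is not taken from any command's help file). The moment conditions are first summed across panels within each period, which is what accommodates cross-sectional dependence; the lagged terms then pick up serial correlation up to m lags, with Bartlett weights:

```latex
\begin{aligned}
\widehat{V}(\hat\beta) &= (X'X)^{-1}\,\widehat{S}_T\,(X'X)^{-1}, \\
\widehat{S}_T &= \widehat{\Omega}_0 + \sum_{j=1}^{m} w(j,m)\,\bigl[\widehat{\Omega}_j + \widehat{\Omega}_j'\bigr],
\qquad w(j,m) = 1 - \frac{j}{m+1}, \\
\widehat{\Omega}_j &= \sum_{t=j+1}^{T} h_t(\hat\beta)\,h_{t-j}(\hat\beta)',
\qquad h_t(\hat\beta) = \sum_{i=1}^{N(t)} x_{it}\,\hat\varepsilon_{it}.
\end{aligned}
```

Here m governs the standard errors only; it adds no lags of the regressors to the model. Whether an option such as dkraay(#) counts lags or kernel bandwidth can differ across commands, so it is worth checking the relevant help file.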

Kind regards,
Carlo
(Stata 19.0)
Giorgio Di Stefano

Join Date: Oct 2021

Posts: 154
#25

30 Mar 2022, 16:28

Originally posted by Carlo Lazzaro

Giorgio:
1) serial correlation is within panel; DK standard errors were conceived to deal with cross-sectional dependence: which one did you detect?
2) correct about -i.year-;
3) if I had to choose between clustering on -panelid- or on groups, I would go with -panelid- for the whole sample.

That is what I had in mind.
I will cluster first on the panel id and then on the groups once I turn to the sub-periods.
I was referring to this post on DK standard errors:

https://www.statalist.org/forums/for...-in-panel-data

I detected serial correlation and heteroskedasticity. The test for serial dependence is positive as well. A big mess, I guess... Will clustering by panelid or groups, as discussed above, be sufficient to deal with them, or is it just a condition to live with, given the data structure?

Thank you so much for everything!
Giorgio Di Stefano

Join Date: Oct 2021

Posts: 154
#26

30 Mar 2022, 23:48

Originally posted by Giorgio Di Stefano

Carlo,
I am dealing with T>N, but I also switch to N>T to estimate sub-periods.

I have the following points to ask before closing this topic.

1. If I alternate between -xtreg, vce(cluster panelid)- when T>N and reghdfe in the opposite case, N>T, will I get significantly different results from using two different commands?

2. How can I get individual (country) and time effects without creating dummy variables and thus losing a lot of degrees of freedom?

3. How can I include or capture time-varying coefficients in the model?

I have been trying all night with

Code:

reghdfe,

to place

Code:

absorb(ts id) vce(cluster id,dkraay(2))

in order to get the dkraay option in somewhere, but I keep failing: several errors came up. Could any of you help?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#27

31 Mar 2022, 01:23

Giorgio:
let's hope that Sergio Correia will chime in.

Kind regards,
Carlo
(Stata 19.0)

Giorgio Di Stefano

Join Date: Oct 2021
Posts: 154

#28

01 Apr 2022, 07:32

Originally posted by Carlo Lazzaro

Giorgio:
let's hope that Sergio Correia will chime in.

Mystery solved!

Driscoll-Kraay standard errors are currently implemented as part of the ivreghdfe package.

Nonetheless, ivreghdfe does not allow me to cluster on panelid but forces me to cluster on time.

I therefore ran -ivreghdfe varlist, absorb(year id) cluster(year) dkraay(1)-. Perhaps I was wrong to include the time variable twice, as the two were mutually wiped out, if I get it right. I then added it only as the cluster variable in ivreghdfe. But this is not what I want: I wanted to cluster on panelid and groups.

I am very worried that, out of my 48 panels, I now get only 27 when I cluster, and many of my dummies are dropped as collinear, including my key variables, the indicators. The same goes for my groups, mainly when I use reghdfe and ivreghdfe.
I am getting better statistical results with xtreg.

Here is the output of my three models: xtreg, reghdfe and ivreghdfe.

Code:

xtreg DGDP L.GDP varlist (dummies, categorical, etc.) c.indicator*##c.indicator*, fe vce(cluster id)

Code:

Fixed-effects (within) regression               Number of obs     =        412
Group variable: id                              Number of groups  =         27

R-sq:                                           Obs per group:
     within  = 0.9386                                         min =          1
     between = 0.0781                                         avg =       15.3
     overall = 0.0097                                         max =         33

                                                F(28,26)          =          .
corr(u_i, Xb)  = -1.0000                        Prob > F          =          .

                                    (Std. Err. adjusted for 27 clusters in id)


      |
        _cons |   48.72558    41.1911     1.18   0.248    -35.94394    133.3951
--------------+----------------------------------------------------------------
      sigma_u |  317.84804
      sigma_e |  .62119343
          rho |  .99999618   (fraction of variance due to u_i)

Code:

reghdfe DGDP L.GDP varlist (dummies, categorical, etc.) c.indicator*##c.indicator*, absorb(ts id) vce(cluster id)

Code:

HDFE Linear regression                            Number of obs   =        454
Absorbing 2 HDFE groups                           F(  53,     27) =          .
Statistics robust to heteroskedasticity           Prob > F        =          .
                                                  R-squared       =     0.9373
                                                  Adj R-squared   =     0.9162
                                                  Within R-sq.    =     0.8911
Number of clusters (id)      =         28         Root MSE        =     0.6735

                                     (Std. Err. adjusted for 28 clusters in id)




Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
          ts |        35           1          34     |
          id |        28          28           0    *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation





-------------------------------------------------------------------------------

Code:

* Model 3: ivreghdfe
ivreghdfe DGDP L.GDP varlist (dummies, categorical, etc.) c.indicator*##c.indicator, absorb(ts id) cluster(ts) dkraay(1)

Code:


OLS estimation
--------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on ts
and kernel-robust to common correlated disturbances (Driscoll-Kraay)
  kernel=Bartlett; bandwidth=1
  time variable (t):  ts
  group variable (i): id

Number of clusters (ts) =           32                Number of obs =      408
                                                      F(127,    31) =  1.6e+08
                                                      Prob > F      =   0.0000
Total (centered) SS     =  1209.913988                Centered R2   =   0.9359
Total (uncentered) SS   =  1209.913988                Uncentered R2 =   0.9359
Residual SS             =  77.53058225                Root MSE      =    .5503


Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
          ts |        32          32           0    *|
          id |        26           1          25     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation
(est4 stored)

I am getting better results with xtreg. What am I doing wrong?

Last edited by Giorgio Di Stefano; 01 Apr 2022, 07:42.


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#29

01 Apr 2022, 07:43

Giorgio:
looking at the within R-sq and corr(u_i, Xb) values, your model is clearly misspecified.
I suspect (but you do not post them) that your coefficients do not reach statistical significance.
In addition, why go with -ivreghdfe- if your model does not suffer from endogeneity, and/or why go with -xtreg- if it does?
You should be consistent in your methodological approach, as endogeneity depends on the data, not on the estimator.

Kind regards,
Carlo
(Stata 19.0)
Giorgio Di Stefano

Join Date: Oct 2021

Posts: 154
#30

01 Apr 2022, 08:09

Originally posted by Carlo Lazzaro

Giorgio:
looking at the within R-sq and corr(u_i, Xb) values, your model is clearly misspecified.
I suspect (but you do not post them) that your coefficients do not reach statistical significance.
In addition, why go with -ivreghdfe- if your model does not suffer from endogeneity, and/or why go with -xtreg- if it does?
You should be consistent in your methodological approach, as endogeneity depends on the data, not on the estimator.

Most of my main variables of interest are significant, though not all of them. What steps should I take to cure the misspecification?