Which Panel Data Model should I use?

Martinо Cоmelli

Join Date: Jul 2014

Posts: 40
#1

28 Sep 2022, 11:26

Dear Statalist,

I am conducting a panel data analysis. The dataset is at the country-year level. My panel has N (cross-sectional dimension) > T (time-series dimension);
in other words, I have more countries than years in my dataset.

My interest is to control for the effect of years and to analyze the differences among countries. Neither a country fixed-effects nor a two-way fixed-effects model is appropriate for this analysis.

So, what is the best model I could use in this case? And how do I implement year fixed effects correctly?

I was thinking of using a simple reg dv iv1 iv2 etc i.year

Is that correct?

best,
Tags: None
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#2

28 Sep 2022, 13:19

Nobody can advise you on what model to use without a better idea of the context. The most I can say now is that you appear to have some high-dimensionality. Are you doing causal inference? We'll need more details to comment further.
1 like
Martinо Cоmelli

Join Date: Jul 2014

Posts: 40
#3

28 Sep 2022, 14:23

Dear Jared, thank you for your prompt answer.
You ask a good question: I am not doing causal inference; that's not my goal. My intent is more "descriptive".

It's a database of macro variables (aggregated data), by country and year.

I want to understand how my IV varies across countries, that is, on the cross-sectional dimension. That's why I want to control for time, with one-way fixed effects on years.

I am not interested in how the data vary over time.

I want to be able to say that the IV might have a positive/negative relationship with the DV: countries with a higher IV also show a higher/lower DV.

No causality claims, just a correlation.

Last edited by Martinо Cоmelli; 28 Sep 2022, 14:26.
Martinо Cоmelli

Join Date: Jul 2014

Posts: 40
#4

29 Sep 2022, 01:34

To sum up:
The cross-section/time dimensions are country/years

N > T

I am not interested in causality

I am interested in the cross section aspect of it, not the longitudinal.

My main questions are:
What's the correct way to include time fixed effects?

What is the best model/command for my use case? Is a pooled OLS "reg dv iv1 iv2 etc i.year" enough?

Does it make sense to cluster the VCE on years rather than on countries? Or should I not care about this aspect and treat all observations as independent (which feels wrong)?

My statistical knowledge is very limited. Any help would be very much appreciated.

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17707
#5
29 Sep 2022, 02:19

Martino:
1) if you have an N>T panel dataset with a continuous regressand, your first choice should be -xtreg- (probably with the -fe- specification);
2) dealing with a panel dataset but disregarding its longitudinal (or T) dimension sounds weird, unless your panel dataset is actually a repeated cross-sectional study (which is a different beast).
3) if you are actually interested in detecting differences among countries controlling for -i.year- and you have a real panel dataset, you're implicitly doing a panel data regression with two-way fixed effects, as you can see from the following toy example:

Code:

. use "https://www.stata-press.com/data/r17/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. reg ln_wage i.idcode c.age##c.age i.year if idcode<=3, vce(cluster idcode)

Linear regression                               Number of obs     =         39
                                                F(1, 2)           =          .
                                                Prob > F          =          .
                                                R-squared         =     0.8139
                                                Root MSE          =     .21943

                                 (Std. err. adjusted for 3 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      idcode |
          2  |  -.4183815   .0165036   -25.35   0.002    -.4893909   -.3473721
          3  |   .6579353   .7215294     0.91   0.458    -2.446555    3.762426
             |
         age |   .0773019   .0106911     7.23   0.019     .0313017    .1233021
             |
 c.age#c.age |  -.0045583    .002264    -2.01   0.182    -.0142995    .0051828
             |
        year |
         69  |   .3367906   .0914392     3.68   0.066    -.0566406    .7302218
         70  |   .2089384   .2867011     0.73   0.542    -1.024637    1.442514
         71  |   .3144116   .1619035     1.94   0.192     -.382203    1.011026
         72  |   .5888124   .4958888     1.19   0.357    -1.544825     2.72245
         73  |   .8912873   .5219448     1.71   0.230     -1.35446    3.137034
         75  |   1.246958   .6073839     2.05   0.176    -1.366404     3.86032
         77  |   1.560689   .8626802     1.81   0.212    -2.151125    5.272502
         78  |   1.941522   1.278416     1.52   0.268    -3.559059    7.442103
         80  |    2.34498   1.525965     1.54   0.264    -4.220718    8.910678
         82  |   2.698954   1.663018     1.62   0.246    -4.456435    9.854344
         83  |   2.994437    1.81452     1.65   0.241    -4.812813    10.80169
         85  |   3.538578   2.210833     1.60   0.251    -5.973868    13.05102
         87  |   3.965153   2.460506     1.61   0.248    -6.621548    14.55185
         88  |    4.40786   2.688929     1.64   0.243    -7.161667    15.97739
             |
       _cons |   1.341224   .1489003     9.01   0.012     .7005575     1.98189
------------------------------------------------------------------------------

. xtset idcode year

Panel variable: idcode (unbalanced)
 Time variable: year, 68 to 88, but with gaps
         Delta: 1 unit

. xtreg ln_wage c.age##c.age i.year if idcode<=3, fe vce(cluster idcode)

Fixed-effects (within) regression               Number of obs     =         39
Group variable: idcode                          Number of groups  =          3

R-squared:                                      Obs per group:
     Within  = 0.7404                                         min =         12
     Between = 0.4068                                         avg =       13.0
     Overall = 0.4014                                         max =         15

                                                F(4,2)            =          .
corr(u_i, Xb) = -0.8560                         Prob > F          =          .

                                 (Std. err. adjusted for 3 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0773019   .0101936     7.58   0.017     .0334424    .1211613
             |
 c.age#c.age |  -.0045583   .0021586    -2.11   0.169    -.0138461    .0047294
             |
        year |
         69  |   .3367906   .0871839     3.86   0.061    -.0383313    .7119126
         70  |   .2089384   .2733588     0.76   0.525    -.9672295    1.385106
         71  |   .3144116   .1543689     2.04   0.179    -.3497843    .9786076
         72  |   .5888124   .4728115     1.25   0.339    -1.445531    2.623156
         73  |   .8912873   .4976548     1.79   0.215    -1.249948    3.032523
         75  |   1.246958   .5791178     2.15   0.164    -1.244785    3.738701
         77  |   1.560689   .8225333     1.90   0.198    -1.978387    5.099764
         78  |   1.941522   1.218922     1.59   0.252    -3.303077    7.186121
         80  |    2.34498   1.454951     1.61   0.248    -3.915167    8.605128
         82  |   2.698954   1.585626     1.70   0.231    -4.123442     9.52135
         83  |   2.994437   1.730077     1.73   0.226    -4.449484    10.43836
         85  |   3.538578   2.107946     1.68   0.235    -5.531183    12.60834
         87  |   3.965153      2.346     1.69   0.233     -6.12887    14.05918
         88  |    4.40786   2.563793     1.72   0.228    -6.623251    15.43897
             |
       _cons |   1.465543   .3990418     3.67   0.067    -.2513952    3.182481
-------------+----------------------------------------------------------------
     sigma_u |  .54258328
     sigma_e |  .21942548
         rho |  .85944136   (fraction of variance due to u_i)
------------------------------------------------------------------------------

.

Please note that -vce(cluster idcode)- has been invoked just to give you an idea of how to code it. In practice, with fewer than 30-50 panels, the non-default standard errors might be misleading compared with their default counterparts.

Kind regards,
Carlo
(Stata 19.0)


Martinо Cоmelli

Join Date: Jul 2014

Posts: 40
#6

29 Sep 2022, 02:31

Carlo, thank you for your answer! Grazie! I am indeed working with a repeated cross-sectional study. The data I have are cross-sectional: some macroeconomic variables for a number of countries and years. Sorry for not specifying it before. It shouldn't be seen as a proper panel; I realize the title of my post is misleading. In my field, we call them pseudo-panels, but that's probably just adding confusion. The proper name is indeed repeated cross-section.

You say it's a different beast. How should I proceed?

Not using FE makes sense, I think, because (a) with FE my results are hard to interpret (I guess they are close to "pure" causality, but that's not what I'm looking for) and (b) I want to focus on the variation across countries.

Given this, what model would be more appropriate for my use case?

Also, I do have fewer than 30/50 (pseudo) panels.

Last edited by Martinо Cоmelli; 29 Sep 2022, 02:35.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#7

29 Sep 2022, 02:42

Martino:
you may want to take a look at: Lebo, M. J., and C. Weber. 2015. An Effective Approach to the Repeated Cross-Sectional Design. American Journal of Political Science 59: 242–258.

Kind regards,
Carlo
(Stata 19.0)
Martinо Cоmelli

Join Date: Jul 2014

Posts: 40
#8

29 Sep 2022, 04:09

I was looking at it; honestly, I'm more confused, ahaha.

"2) dealing with a panel dataset but disregarding its longitudinal (or T) dimension sounds weird, unless your panel dataset is actually a repeated cross-sectional study (which is a different beast)"

As far as I understand, fixed effects are well suited to removing variation between units and focusing on variation over time within each unit. That is often desirable when we want to test causal claims, but not always. In my case I want to focus on variation across places.

Is a pooled OLS with "i.year" enough?

Or do you have some references on time-only fixed effects?
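To make the between/within distinction concrete, here is a minimal sketch on simulated country-year data. The variable names (country_id, year, iv1, country_level) and all the numbers are made up for illustration; they are not from the dataset discussed in this thread. -xtsum- splits the overall standard deviation of a variable into a between-country and a within-country part; country fixed effects keep only the within part.

```stata
* Simulated data: 20 hypothetical countries observed over 8 years
clear
set seed 2022
set obs 20
generate country_id = _n
* Time-invariant country component (drives the between variation)
generate country_level = rnormal()
expand 8
bysort country_id: generate year = 2010 + _n
* iv1 varies mostly across countries and only a little over time
generate iv1 = country_level + 0.3*rnormal()
xtset country_id year
* Reports overall, between, and within standard deviations of iv1
xtsum iv1
```

If the within standard deviation is small relative to the between one, most of the information in the variable is cross-sectional, which is the part a country fixed-effects estimator discards.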
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#9

29 Sep 2022, 04:22

Martino:
1) the panel fixed-effects estimator wipes out time-invariant variables. It also suffers when time-varying variables show poor within-panel variation (that is, along the T dimension).
2) you can probably go pooled OLS with -i.year- with the proviso that in RCS studies units are correlated both within and across waves (Lebo and Weber, 2015). A possible fix (that is not recommended for panel datasets) is to cluster the standard errors on -year- (and here the need for a sufficient number of clusters strikes back!) (Cameron, A. C., and D. L. Miller. 2015. A Practitioner’s Guide to Cluster-Robust Inference. Journal of Human Resources 50: 317–372).
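As a rough sketch of what point 2) looks like in code, the block below simulates a repeated cross-section and clusters on year. All names (year, country_id, iv1, dv, shock) and values are hypothetical and only serve to make the example run.

```stata
* Simulated repeated cross-section: 12 waves, 25 units per wave
clear
set seed 4321
set obs 12
generate year = 2000 + _n
* Common shock shared by all units within a wave
generate shock = rnormal()
expand 25
bysort year: generate country_id = _n
generate iv1 = rnormal()
generate dv = 1 + 0.5*iv1 + shock + rnormal()
* Pooled OLS with year dummies, standard errors clustered on year
regress dv iv1 i.year, vce(cluster year)
display "Number of year clusters: " e(N_clust)
```

With only 12 year clusters in this toy setup, the cluster-robust standard errors should not be trusted much, which is exactly the cluster-count caveat mentioned above.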

Kind regards,
Carlo
(Stata 19.0)
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2159
#10

29 Sep 2022, 05:13

I’m confused about the data. How is it not a panel data set on countries?

Assuming it is a panel, it’s your choice of estimator. If you don’t want to control for unobserved country differences then don’t use country fixed effects. Pooled OLS with time dummies is fine. But you need to deal with serial correlation in standard error calculations. If T is not “too large” relative to N then use vce(cluster country).
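A minimal sketch of that specification on simulated data follows. The names dv, iv1, iv2, country_id, and u_country are placeholders, and the data-generating process is invented purely so the commands can be run as they stand.

```stata
* Simulated panel: 30 hypothetical countries, 10 years each
clear
set seed 12345
set obs 30
generate country_id = _n
* Unobserved country component, which induces serial correlation in the errors
generate u_country = rnormal()
expand 10
bysort country_id: generate year = 2009 + _n
generate iv1 = rnormal()
generate iv2 = rnormal()
generate dv = 1 + 0.5*iv1 - 0.3*iv2 + 0.05*(year - 2010) + u_country + rnormal()
* Pooled OLS with time dummies; standard errors clustered by country
regress dv iv1 iv2 i.year, vce(cluster country_id)
* Joint test of the year dummies
testparm i.year
```

Clustering by country allows arbitrary correlation of the errors over time within a country, while the year dummies absorb shocks common to all countries in a given year.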
1 like
Martinо Cоmelli

Join Date: Jul 2014

Posts: 40
#11

29 Sep 2022, 06:25

Carlo, thank you for your ideas.

1) Got it.
2) Units are definitely correlated "both within and across waves", as happens with all macroeconomic data. I will look into clustering on years. Thank you for the useful references.

Jeff

It's a pseudo-panel, no? But not a panel. It's not a set of observations on the same individual at two points in time. It's an abstraction, in which the same information is collected from an independent sample at each wave.

"Pooled OLS with time dummies is fine." thank you!

"If T is not “too large” relative to N then use vce(cluster country)."
T is large relative to N. What should I do? Why is clustering on years not an option, in your opinion?

I can assume that the same macroeconomic dynamics impact all the countries at the same time, instead of acting individually in different countries.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2159
#12

29 Sep 2022, 08:55

Are you saying the data are at the individual level, with different samples for each time period? As per the FAQ, showing us an extract of the data is very helpful for getting useful advice. Maybe I read too quickly, but I still don't see what the cross-sectional unit is. Individual? Firm? If it's observations on the same countries over time, that's a panel data set. It doesn't matter how the variables were created.
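One quick way to check this is sketched below on a tiny made-up extract. The variable names ccode and year mirror a typical country-year layout, while value and its numbers are invented for illustration. If each country-year pair appears exactly once and the same countries recur across years, Stata will accept the data as a panel.

```stata
* Tiny illustrative extract with invented values
clear
input str3 ccode int year double value
"ABW" 2018 1.2
"ABW" 2019 1.4
"AFG" 2018 0.7
"AFG" 2019 0.9
end
* Errors out if country-year does not uniquely identify observations
isid ccode year
* xtset needs a numeric panel identifier
encode ccode, generate(country_id)
xtset country_id year
* Shows the participation pattern of each country over time
xtdescribe
```

If -isid- fails because several rows share the same country and year, the data are at a finer level (for example individuals within countries) and are not a country panel as they stand.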

Last edited by Jeff Wooldridge; 29 Sep 2022, 08:57.
1 like

Martinо Cоmelli

Join Date: Jul 2014
Posts: 40

#13

29 Sep 2022, 11:12

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str3 ccode int year double(value1_2_pct_gdp value1_2_pct_gov value3_pct_gdp value3_pct_gov value4_pct_gdp value4_pct_gov value5_pct_gdp value5_pct_gov)
"ABW" 1970 . . . . . . . .
"ABW" 1971 . . . . . . . .
"ABW" 1972 . . . . . . . .
"ABW" 1973 . . . . . . . .
"ABW" 1974 . . . . . . . .
"ABW" 1975 . . . . . . . .
"ABW" 1976 . . . . . . . .
"ABW" 1977 . . . . . . . .
"ABW" 1978 . . . . . . . .
"ABW" 1979 . . . . . . . .
"ABW" 1980 . . . . . . . .
"ABW" 1981 . . . . . . . .
"ABW" 1982 . . . . . . . .
"ABW" 1983 . . . . . . . .
"ABW" 1984 . . . . . . . .
"ABW" 1985 . . . . . . . .
"ABW" 1986 . . . . . . . .
"ABW" 1987 . . . . . . . .
"ABW" 1988 . . . . . . . .
"ABW" 1989 . . . . . . . .
"ABW" 1990 . . . . . . . .
"ABW" 1991 . . . . . . . .
"ABW" 1992 . . . . . . . .
"ABW" 1993 . . . . . . . .
"ABW" 1994 . . . . . . . .
"ABW" 1995 . . . . . . . .
"ABW" 1996 . . . . . . . .
"ABW" 1997 . . . . . . . .
"ABW" 1998 . . . . . . . .
"ABW" 1999 . . . . . . . .
"ABW" 2000 . . . . . . . .
"ABW" 2001 . . . . . . . .
"ABW" 2002 . . . . . . . .
"ABW" 2003 . . . . . . . .
"ABW" 2004 . . . . . . . .
"ABW" 2005 . . . . . . . .
"ABW" 2006 . . . . . . . .
"ABW" 2007 . . . . . . . .
"ABW" 2008 . . . . . . . .
"ABW" 2009 . . . . . . . .
"ABW" 2010 . . . . . . . .
"ABW" 2011 . . . . . . . .
"ABW" 2012 . . . . . . . .
"ABW" 2013 . . . . . . . .
"ABW" 2014 . . . . . . . .
"ABW" 2015 . . . . . . . .
"ABW" 2016 . . . . . . . .
"ABW" 2017 . . . . . . . .
"ABW" 2018 . . . . . . . .
"ABW" 2019 . . . . . . . .
"AFG" 1970 . . . . . . . .
"AFG" 1971 . . . . . . . .
"AFG" 1972 . . . . . . . .
"AFG" 1973 . . . . . . . .
"AFG" 1974 . . . . . . . .
"AFG" 1975 . . . . . . . .
"AFG" 1976 . . . . . . . .
"AFG" 1977 . . . . . . . .
"AFG" 1978 . . . . . . . .
"AFG" 1979 . . . . . . . .
"AFG" 1980 . . . . . . . .
"AFG" 1981 . . . . . . . .
"AFG" 1982 . . . . . . . .
"AFG" 1983 . . . . . . . .
"AFG" 1984 . . . . . . . .
"AFG" 1985 . . . . . . . .
"AFG" 1986 . . . . . . . .
"AFG" 1987 . . . . . . . .
"AFG" 1988 . . . . . . . .
"AFG" 1989 . . . . . . . .
"AFG" 1990 . . . . . . . .
"AFG" 1991 . . . . . . . .
"AFG" 1992 . . . . . . . .
"AFG" 1993 . . . . . . . .
"AFG" 1994 . . . . . . . .
"AFG" 1995 . . . . . . . .
"AFG" 1996 . . . . . . . .
"AFG" 1997 . . . . . . . .
"AFG" 1998 . . . . . . . .
"AFG" 1999 . . . . . . . .
"AFG" 2000 . . . . . . . .
"AFG" 2001 . . . . . . . .
"AFG" 2002 . . . . . . . .
"AFG" 2003 . . . . . . . .
"AFG" 2004 . . . . . . . .
"AFG" 2005 . . . . . . . .
"AFG" 2006 . . . . . . . .
"AFG" 2007 . . . . . . . .
"AFG" 2008 . . . . . . . .
"AFG" 2009 . . . . . . . .
"AFG" 2010 . . . . . . . .
"AFG" 2011 . . . . . . . .
"AFG" 2012 . . . . . . . .
"AFG" 2013 . . . . . . . .
"AFG" 2014 . . . . . . . .
"AFG" 2015 . . . . . . . .
"AFG" 2016 . . . . . . . .
"AFG" 2017 . . . . . . . .
"AFG" 2018 . . . . . . . .
"AFG" 2019 . . . . . . . .
end

The generated example contains only missing data,

but that's the structure.

There are so many models I could apply (pooled OLS, xtreg, xtgls, xtpcse, xtregar), and reading the discussions online is SO confusing. I hope AI can figure this out one day, ahaha.


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#14

29 Sep 2022, 12:15

Martino:
I'm still unclear about your dataset structure.
As far as I can tell, if "ABW" and what follows is the -panelid-, you clearly have a panel dataset.
While the T dimension is not negligible, you may give -xtreg, fe robust- a shot, provided that you do not have cross-panel correlation too.

Kind regards,
Carlo
(Stata 19.0)
Martinо Cоmelli

Join Date: Jul 2014

Posts: 40
#15

29 Sep 2022, 12:18

ABW is a country code (the mighty Aruba), then Afghanistan and so on. It is a repeated cross-section of macro data.