  • Which Panel Data Model should I use?

    Dear Statalist,

    I am conducting a panel data analysis. The dataset is at the country-year level. My panel has N (cross-sectional dimension) > T (time-series dimension);
    in other words, I have more countries than years in my dataset.

    My interest is to control for the effect of years and to analyze the differences among countries. Neither a country fixed-effects model nor a two-way fixed-effects model is appropriate for this analysis.

    So, what is the best model I could use in this case? And how do I implement year fixed effects correctly?

    I was thinking of using a simple -reg dv iv1 iv2 etc i.year-.

    Is that correct?

    best,

  • #2
    Nobody can advise you on what model to use without a better idea of the context. The most I can say now is that you appear to have some high dimensionality. Are you doing causal inference? We'll need more details to comment further.



    • #3
      Dear Jared, thank you for your prompt answer.
      That is a good question: I am indeed not doing causal inference; that's not my goal. My intent is more "descriptive".

      It's a database of macro variables (aggregated data) by country and year.

      I want to understand how my IV varies, focusing on countries, that is, on the cross-sectional dimension. That's why I want to control for time, with one-way fixed effects on years.

      I am not interested in how the data vary over time.

      I want to be able to say that an IV might have a positive / negative association with the DV: countries with a higher IV also show a higher / lower DV.

      No causality claims, just a correlation.
      Last edited by Martinо Cоmelli; 28 Sep 2022, 14:26.



      • #4
        To sum up:
        • x/t: country / year
        • N > T
        • I am not interested in causality
        • I am interested in the cross-sectional aspect, not the longitudinal one.
        My main questions are:
        • What's the correct way to do time fixed effects?
        • What is the best model / command for my use case? Is a pooled OLS, -reg dv iv1 iv2 etc i.year-, enough?
        • Does it make sense to cluster the VCE on years rather than on countries? Or should I not care about this aspect and treat all the observations as independent (which feels wrong)?
        My statistical knowledge is very limited. Any help would be very much appreciated.



        • #5
          Martino:
          1) if you have an N>T panel dataset with a continuous regressand, your first choice should be -xtreg- (probably with the -fe- specification);
          2) dealing with a panel dataset but disregarding its longitudinal (or T) dimension sounds weird, unless your panel dataset is actually a repeated cross-sectional study (which is a different beast);
          3) if you are actually interested in detecting differences among countries controlling for -i.year- and you have a real panel dataset, you're implicitly doing a panel data regression with two-way fixed effects, as you can see from the following toy example:
          Code:
          . use "https://www.stata-press.com/data/r17/nlswork.dta"
          (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
          
          . reg ln_wage i.idcode c.age##c.age i.year if idcode<=3, vce(cluster idcode)
          
          Linear regression                               Number of obs     =         39
                                                          F(1, 2)           =          .
                                                          Prob > F          =          .
                                                          R-squared         =     0.8139
                                                          Root MSE          =     .21943
          
                                           (Std. err. adjusted for 3 clusters in idcode)
          ------------------------------------------------------------------------------
                       |               Robust
               ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                idcode |
                    2  |  -.4183815   .0165036   -25.35   0.002    -.4893909   -.3473721
                    3  |   .6579353   .7215294     0.91   0.458    -2.446555    3.762426
                       |
                   age |   .0773019   .0106911     7.23   0.019     .0313017    .1233021
                       |
           c.age#c.age |  -.0045583    .002264    -2.01   0.182    -.0142995    .0051828
                       |
                  year |
                   69  |   .3367906   .0914392     3.68   0.066    -.0566406    .7302218
                   70  |   .2089384   .2867011     0.73   0.542    -1.024637    1.442514
                   71  |   .3144116   .1619035     1.94   0.192     -.382203    1.011026
                   72  |   .5888124   .4958888     1.19   0.357    -1.544825     2.72245
                   73  |   .8912873   .5219448     1.71   0.230     -1.35446    3.137034
                   75  |   1.246958   .6073839     2.05   0.176    -1.366404     3.86032
                   77  |   1.560689   .8626802     1.81   0.212    -2.151125    5.272502
                   78  |   1.941522   1.278416     1.52   0.268    -3.559059    7.442103
                   80  |    2.34498   1.525965     1.54   0.264    -4.220718    8.910678
                   82  |   2.698954   1.663018     1.62   0.246    -4.456435    9.854344
                   83  |   2.994437    1.81452     1.65   0.241    -4.812813    10.80169
                   85  |   3.538578   2.210833     1.60   0.251    -5.973868    13.05102
                   87  |   3.965153   2.460506     1.61   0.248    -6.621548    14.55185
                   88  |    4.40786   2.688929     1.64   0.243    -7.161667    15.97739
                       |
                 _cons |   1.341224   .1489003     9.01   0.012     .7005575     1.98189
          ------------------------------------------------------------------------------
          
          . xtset idcode year
          
          Panel variable: idcode (unbalanced)
           Time variable: year, 68 to 88, but with gaps
                   Delta: 1 unit
          
          . xtreg ln_wage c.age##c.age i.year if idcode<=3, fe vce(cluster idcode)
          
          Fixed-effects (within) regression               Number of obs     =         39
          Group variable: idcode                          Number of groups  =          3
          
          R-squared:                                      Obs per group:
               Within  = 0.7404                                         min =         12
               Between = 0.4068                                         avg =       13.0
               Overall = 0.4014                                         max =         15
          
                                                          F(4,2)            =          .
          corr(u_i, Xb) = -0.8560                         Prob > F          =          .
          
                                           (Std. err. adjusted for 3 clusters in idcode)
          ------------------------------------------------------------------------------
                       |               Robust
               ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                   age |   .0773019   .0101936     7.58   0.017     .0334424    .1211613
                       |
           c.age#c.age |  -.0045583   .0021586    -2.11   0.169    -.0138461    .0047294
                       |
                  year |
                   69  |   .3367906   .0871839     3.86   0.061    -.0383313    .7119126
                   70  |   .2089384   .2733588     0.76   0.525    -.9672295    1.385106
                   71  |   .3144116   .1543689     2.04   0.179    -.3497843    .9786076
                   72  |   .5888124   .4728115     1.25   0.339    -1.445531    2.623156
                   73  |   .8912873   .4976548     1.79   0.215    -1.249948    3.032523
                   75  |   1.246958   .5791178     2.15   0.164    -1.244785    3.738701
                   77  |   1.560689   .8225333     1.90   0.198    -1.978387    5.099764
                   78  |   1.941522   1.218922     1.59   0.252    -3.303077    7.186121
                   80  |    2.34498   1.454951     1.61   0.248    -3.915167    8.605128
                   82  |   2.698954   1.585626     1.70   0.231    -4.123442     9.52135
                   83  |   2.994437   1.730077     1.73   0.226    -4.449484    10.43836
                   85  |   3.538578   2.107946     1.68   0.235    -5.531183    12.60834
                   87  |   3.965153      2.346     1.69   0.233     -6.12887    14.05918
                   88  |    4.40786   2.563793     1.72   0.228    -6.623251    15.43897
                       |
                 _cons |   1.465543   .3990418     3.67   0.067    -.2513952    3.182481
          -------------+----------------------------------------------------------------
               sigma_u |  .54258328
               sigma_e |  .21942548
                   rho |  .85944136   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          
          .
          Please note that -vce(cluster idcode)- has been invoked just to give you an idea of how to code it. In practice, with fewer than 30-50 panels, the non-default standard errors might be misleading compared with their default counterparts.
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Carlo, thank you for your answer! Grazie! I am indeed attempting a repeated cross-sectional study. The data I have are cross-sectional: some macroeconomic variables for a number of countries and years. Sorry for not specifying this before. It shouldn't be seen as a proper panel; I realize the title of my post is misleading. In my field, we call them pseudo-panels, but that probably just adds confusion. The proper name is indeed repeated cross-sections.

            You say it is a different beast. How should I proceed?

            I think it makes sense not to use FE, because (a) with FE my results are hard to interpret (I guess they are close to a "pure" causal estimate, but that's not what I'm looking for) and (b) I want to focus on the variation across countries.

            Given this, what model would be more appropriate for my use case?

            Also, I do have fewer than 30/50 (pseudo) panels.
            Last edited by Martinо Cоmelli; 29 Sep 2022, 02:35.



            • #7
              Martino:
              you may want to take a look at: Lebo, M. J., and C. Weber. 2015. An Effective Approach to the Repeated Cross-Sectional Design. American Journal of Political Science 59: 242–258.
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                I was looking at it; honestly, I'm more confused, ahaha.

                "2) dealing with a panel dataset but disregarding its longitudinal (or T) dimension sounds weird, unless your panel dataset is actually a repeated cross-sectional study (which is a different beast)"

                As far as I understand, fixed effects are well suited to removing variation between units and focusing on variation over time within each unit. That is often desirable when we want to test causal claims, but not always. In my case I want to focus on variation across places.

                Is a pooled OLS with "i.year" enough?

                Or do you have some references on time-only fixed effects?
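
                A minimal sketch of the between/within distinction mentioned above, using the Grunfeld data shipped with Stata as a stand-in (an assumption: -company- plays the role of country, and none of these variable names come from the thread). -xtsum- splits each variable's standard deviation into a between-unit part and a within-unit part:

                ```stata
                * Stand-in data (assumption): 10 firms observed over 20 years
                webuse grunfeld, clear
                xtset company year

                * "between" = variation across units in their means;
                * "within"  = variation over time around each unit's own mean
                xtsum invest mvalue kstock
                ```

                Unit fixed effects discard the "between" part, which is the part of interest when the focus is on differences across places.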



                • #9
                  Martino:
                  1) the panel fixed-effects estimator wipes out time-invariant variables. It suffers when time-varying variables show poor within-panel variation (that is, along the T dimension).
                  2) you can probably go with pooled OLS with -i.time-, with the proviso that in RCS studies units are correlated both within and across waves (Lebo and Weber, 2015). A possible fix (that is not recommended for panel datasets) is to cluster the standard errors on -year- (and here the need for a sufficient number of clusters strikes back!) (Cameron, A. C., and D. L. Miller. 2015. A Practitioner’s Guide to Cluster-Robust Inference. Journal of Human Resources 50: 317–372).
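
                  A minimal sketch of point 2), assuming the Grunfeld data shipped with Stata as a stand-in (firms play the role of countries; the variable names are not from the thread):

                  ```stata
                  * Stand-in data (assumption): Grunfeld panel, 10 firms x 20 years
                  webuse grunfeld, clear

                  * Pooled OLS with year dummies; standard errors clustered on year
                  regress invest mvalue kstock i.year, vce(cluster year)

                  * Number of clusters actually used (one per year)
                  display "clusters: " e(N_clust)
                  ```

                  With only 20 years in this stand-in, the number of clusters is on the low side, which is exactly the caveat raised above.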
                  Kind regards,
                  Carlo
                  (Stata 19.0)



                  • #10
                    I’m confused about the data. How is it not a panel data set on countries?

                    Assuming it is a panel, it’s your choice of estimator. If you don’t want to control for unobserved country differences then don’t use country fixed effects. Pooled OLS with time dummies is fine. But you need to deal with serial correlation in standard error calculations. If T is not “too large” relative to N then use vce(cluster country).
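
                    A minimal sketch of that suggestion, assuming the Grunfeld data shipped with Stata as a stand-in (-company- plays the role of country; it has only 10 units, so it illustrates the syntax rather than a case where N is large relative to T):

                    ```stata
                    * Stand-in data (assumption): Grunfeld panel, 10 firms x 20 years
                    webuse grunfeld, clear

                    * Pooled OLS with time dummies, default standard errors
                    regress invest mvalue kstock i.year
                    estimates store ols_default

                    * Same model, standard errors clustered on the cross-sectional unit
                    * to allow for serial correlation within each unit
                    regress invest mvalue kstock i.year, vce(cluster company)
                    estimates store ols_cluster

                    * Point estimates are identical; only the standard errors change
                    estimates table ols_default ols_cluster, keep(mvalue kstock) b(%9.4f) se(%9.4f)
                    ```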



                    • #11
                      Carlo, thank you for your ideas.

                      1) Got it.
                      2) Units are definitely correlated "both within and across waves", as happens with all macroeconomic data. I will look into clustering on years. Thank you for the useful references.

                      Jeff,

                      It's a pseudo-panel, no? But not a panel. It's not a set of observations of the same individual at two points in time. It's an abstraction, in which the same information is collected from an independent sample at each wave.

                      "Pooled OLS with time dummies is fine." Thank you!

                      "If T is not “too large” relative to N then use vce(cluster country)."
                      T is large relative to N. What should I do? Why, in your opinion, is clustering on years not an option?

                      I can assume that the same macroeconomic dynamics impact all the countries at the same time, instead of acting individually in different countries.



                      • #12
                        Are you saying the data are at the individual level, with different samples for each time period? As per the FAQ, showing us an extract of the data is very helpful for getting useful advice. Maybe I read too quickly, but I still don't see what the cross-sectional unit is. Individual? Firm? If it's observations on the same countries over time, that's a panel data set. It doesn't matter how the variables were created.
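
                        A minimal sketch of how one might check this, using a tiny made-up extract (the variable -y- and all its values are invented for illustration; -ccode- and -year- are assumed identifier names):

                        ```stata
                        * Hypothetical country-year data: same countries observed in several years
                        clear
                        input str3 ccode int year double y
                        "ABW" 2018 1.2
                        "ABW" 2019 1.4
                        "AFG" 2018 0.7
                        "AFG" 2019 0.9
                        end

                        * Each country-year combination should identify exactly one row
                        isid ccode year

                        * -xtset- needs a numeric panel identifier, so encode the string code
                        encode ccode, generate(country)
                        xtset country year

                        * Show which years are observed for each country
                        xtdescribe
                        ```

                        If -isid- and -xtset- go through without error, the data have the layout of a panel on countries.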
                        Last edited by Jeff Wooldridge; 29 Sep 2022, 08:57.



                        • #13
                          Code:
                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input str3 ccode int year double(value1_2_pct_gdp value1_2_pct_gov value3_pct_gdp value3_pct_gov value4_pct_gdp value4_pct_gov value5_pct_gdp value5_pct_gov)
                          "ABW" 1970 . . . . . . . .
                          "ABW" 1971 . . . . . . . .
                          "ABW" 1972 . . . . . . . .
                          "ABW" 1973 . . . . . . . .
                          "ABW" 1974 . . . . . . . .
                          "ABW" 1975 . . . . . . . .
                          "ABW" 1976 . . . . . . . .
                          "ABW" 1977 . . . . . . . .
                          "ABW" 1978 . . . . . . . .
                          "ABW" 1979 . . . . . . . .
                          "ABW" 1980 . . . . . . . .
                          "ABW" 1981 . . . . . . . .
                          "ABW" 1982 . . . . . . . .
                          "ABW" 1983 . . . . . . . .
                          "ABW" 1984 . . . . . . . .
                          "ABW" 1985 . . . . . . . .
                          "ABW" 1986 . . . . . . . .
                          "ABW" 1987 . . . . . . . .
                          "ABW" 1988 . . . . . . . .
                          "ABW" 1989 . . . . . . . .
                          "ABW" 1990 . . . . . . . .
                          "ABW" 1991 . . . . . . . .
                          "ABW" 1992 . . . . . . . .
                          "ABW" 1993 . . . . . . . .
                          "ABW" 1994 . . . . . . . .
                          "ABW" 1995 . . . . . . . .
                          "ABW" 1996 . . . . . . . .
                          "ABW" 1997 . . . . . . . .
                          "ABW" 1998 . . . . . . . .
                          "ABW" 1999 . . . . . . . .
                          "ABW" 2000 . . . . . . . .
                          "ABW" 2001 . . . . . . . .
                          "ABW" 2002 . . . . . . . .
                          "ABW" 2003 . . . . . . . .
                          "ABW" 2004 . . . . . . . .
                          "ABW" 2005 . . . . . . . .
                          "ABW" 2006 . . . . . . . .
                          "ABW" 2007 . . . . . . . .
                          "ABW" 2008 . . . . . . . .
                          "ABW" 2009 . . . . . . . .
                          "ABW" 2010 . . . . . . . .
                          "ABW" 2011 . . . . . . . .
                          "ABW" 2012 . . . . . . . .
                          "ABW" 2013 . . . . . . . .
                          "ABW" 2014 . . . . . . . .
                          "ABW" 2015 . . . . . . . .
                          "ABW" 2016 . . . . . . . .
                          "ABW" 2017 . . . . . . . .
                          "ABW" 2018 . . . . . . . .
                          "ABW" 2019 . . . . . . . .
                          "AFG" 1970 . . . . . . . .
                          "AFG" 1971 . . . . . . . .
                          "AFG" 1972 . . . . . . . .
                          "AFG" 1973 . . . . . . . .
                          "AFG" 1974 . . . . . . . .
                          "AFG" 1975 . . . . . . . .
                          "AFG" 1976 . . . . . . . .
                          "AFG" 1977 . . . . . . . .
                          "AFG" 1978 . . . . . . . .
                          "AFG" 1979 . . . . . . . .
                          "AFG" 1980 . . . . . . . .
                          "AFG" 1981 . . . . . . . .
                          "AFG" 1982 . . . . . . . .
                          "AFG" 1983 . . . . . . . .
                          "AFG" 1984 . . . . . . . .
                          "AFG" 1985 . . . . . . . .
                          "AFG" 1986 . . . . . . . .
                          "AFG" 1987 . . . . . . . .
                          "AFG" 1988 . . . . . . . .
                          "AFG" 1989 . . . . . . . .
                          "AFG" 1990 . . . . . . . .
                          "AFG" 1991 . . . . . . . .
                          "AFG" 1992 . . . . . . . .
                          "AFG" 1993 . . . . . . . .
                          "AFG" 1994 . . . . . . . .
                          "AFG" 1995 . . . . . . . .
                          "AFG" 1996 . . . . . . . .
                          "AFG" 1997 . . . . . . . .
                          "AFG" 1998 . . . . . . . .
                          "AFG" 1999 . . . . . . . .
                          "AFG" 2000 . . . . . . . .
                          "AFG" 2001 . . . . . . . .
                          "AFG" 2002 . . . . . . . .
                          "AFG" 2003 . . . . . . . .
                          "AFG" 2004 . . . . . . . .
                          "AFG" 2005 . . . . . . . .
                          "AFG" 2006 . . . . . . . .
                          "AFG" 2007 . . . . . . . .
                          "AFG" 2008 . . . . . . . .
                          "AFG" 2009 . . . . . . . .
                          "AFG" 2010 . . . . . . . .
                          "AFG" 2011 . . . . . . . .
                          "AFG" 2012 . . . . . . . .
                          "AFG" 2013 . . . . . . . .
                          "AFG" 2014 . . . . . . . .
                          "AFG" 2015 . . . . . . . .
                          "AFG" 2016 . . . . . . . .
                          "AFG" 2017 . . . . . . . .
                          "AFG" 2018 . . . . . . . .
                          "AFG" 2019 . . . . . . . .
                          end
                          The example generated has only missing data, but that's the structure.

                          There are so many models I could apply (pooled OLS, xtreg, xtgls, xtpcse, xtregar), but reading the discussions online is SO confusing. I hope AI can figure this out one day, ahaha.




                          • #14
                            Martino:
                            I'm still unclear about your dataset structure.
                            As far as I can tell, if "ABW" and what follows is the -panelid-, you clearly have a panel dataset.
                            While the T dimension is not negligible, you may give -xtreg, fe robust- a shot, provided that you do not have cross-panel correlation too.
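
                            A minimal sketch of that command, assuming the Grunfeld data shipped with Stata as a stand-in (-company- plays the role of the country identifier; the variable names are not from your dataset):

                            ```stata
                            * Stand-in data (assumption): Grunfeld panel, 10 firms x 20 years
                            webuse grunfeld, clear
                            xtset company year

                            * Fixed-effects (within) regression with year dummies;
                            * with -xtreg, fe-, vce(robust) clusters on the panel variable
                            xtreg invest mvalue kstock i.year, fe vce(robust)
                            ```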
                            Kind regards,
                            Carlo
                            (Stata 19.0)



                            • #15
                              ABW is a country code (the mighty Aruba), then Afghanistan, and so on. It is a repeated cross-section of macro data.
