  • Practicum due soon/autocorrelation unbalanced panel data: xtreg, cluster, xtregar, xtpcse??

    Hi all!

    I am struggling a bit to figure out which is the best command to use for OLS on unbalanced panel data with N (88 countries) > T (12 years). Not every country has the same number of observations (unbalanced). My data is xtset. My DV (corruption index) is obviously autocorrelated. I know there are various methods to deal with autocorrelation, but I am struggling to figure out which one is best for my data.

    My confusion in deciding which regression method to use stems from:

    1) I have continuous economic covariates that vary greatly within panels (suited to re), but also institutional covariates that vary little (suited to fe). I was given the advice that I should choose random effects over fixed effects given my list of covariates.

    2) Stata won't let me use the command corr(AR1) because my panels are unbalanced.

    3) xtpcse seems like it could solve the problem above, but everything I've read says xtpcse is for T > N. Even after looking at the Stata manual, I still don't fully understand the subcommands for xtpcse.

    4) I know that using a forwarded DV and including a contemporary IV is an option, but I don't know which commands are most suited to doing so.

    I have run the following models successfully without errors, but the significance of my coefficients varies quite a lot.

    ---------

    xtreg f.vdem_corr $vars $vars2 vdem_corr, re

    xtregar vdem_corr $vars $vars2, re

    reg f.vdem_corr $vars $vars2 vdem_corr, cluster(ccode_qog)

    xtpcse f.vdem_corr $vars $vars2 vdem_corr, hetonly detail

    xtpcse f.vdem_corr $vars $vars2 vdem_corr, independent

    ---------

    Right now xtregar seems like the best option, but I'm unsure. I am not new to Stata, but I am somewhat new to working with panel data. I'm working on my 2nd-year practicum, and this is my first time posting in a Stata forum. I've been browsing forums for hours looking at answers to similar questions, but doing so only expanded my options rather than narrowing them down. Any help would be MUCH appreciated.

    Sincerely,

    burned out grad student

  • #2
    Hello fellow grad student.


    Please read the FAQ about how to ask a question, namely sections 10 and 12. Reading your post, there's little I can guide you on because there's no data example (please see the FAQ about this specific point) and overall not much to work with here.


    Either way, here are my general thoughts. Whoever suggested RE likely doesn't know what they're talking about. I don't care how many covariates you have, you still have unobserved, unmeasured confounding. So unless you can make a great case beyond "I have lots of time varying covariates", use unit FE.
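
    If you want something beyond my say-so, the usual (if imperfect) check is a Hausman comparison of the two estimators. A rough sketch with placeholder variable names (substitute your own outcome and covariates):
    Code:
    xtreg y x1 x2, fe
    estimates store fe
    xtreg y x1 x2, re
    estimates store re
    hausman fe re    // rejecting the null is evidence against RE
    One caveat: the textbook Hausman test assumes the RE estimator is fully efficient, so it doesn't mix with robust/clustered standard errors; a Mundlak-style correlated random effects comparison is the sturdier route in that case.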

    I've never used xtpcse; why would you want to use it (not at my computer, in bed, and I'm not getting up to check the help file)? You say your outcome data are obviously autocorrelated. Well, this isn't so obvious to me. How can you tell? Also note that there (likely) aren't subcommands, you likely mean options.

    To your 4th point, why? Why do you want to lag your dependent variable? I can't see your data, and you haven't explained what problem this solves or even if the issue exists to begin with.



    • #3
      Peyton Day:
      welcome to this forum.
      As an aside to Jared's helpful reply, as you're dealing with an N>T panel dataset and without quantitative details from your side, I think your best bet is:
      Code:
      xtreg vdem_corr $vars $vars2 vdem_corr, fe vce(cluster panelid)
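      where panelid is whatever your panel identifier is. Guessing from your -reg ..., cluster(ccode_qog)- line that ccode_qog is the panel variable and that your time variable is year, a concrete sketch would be (note that I have dropped the second vdem_corr from the right-hand side; you only want the dependent variable there if you are fitting a lagged-DV model):
      Code:
      xtset ccode_qog year
      xtreg vdem_corr $vars $vars2, fe vce(cluster ccode_qog)
      vce(cluster) gives standard errors that are robust to heteroskedasticity and to arbitrary within-panel correlation, so with 88 clusters it already goes a long way toward handling the autocorrelation you are worried about.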
      Kind regards,
      Carlo
      (StataNow 18.5)



      • #4
        Originally posted by Carlo Lazzaro
        Peyton Day:
        welcome to this forum.
        As an aside to Jared's helpful reply, as you're dealing with an N>T panel dataset and without quantitative details from your side, I think your best bet is:
        Code:
        xtreg vdem_corr $vars $vars2 vdem_corr, fe vce(cluster panelid)
        When I input the command you gave, the output table does not fill in. However, it does fill in when I use the same code but with re. I have not included a screenshot of that output here, but I can if you think it would be helpful.



        My professor is the one who told me to use xtregar, re. However, we only spoke for a minute and he only glanced at my covariates. That is why I'm unsure that xtregar, re is the way to go. This is the output I get when I use:

        xtregar vdem_corr $vars $vars2, re

        [screenshot of the xtregar, re output]


        Because I am somewhat new to panel data, I'm struggling to figure out what the bottom part of the output table means for the fit of this model.



        • #5
          Originally posted by Carlo Lazzaro
          Peyton Day:
          welcome to this forum.
          As an aside to Jared's helpful reply, as you're dealing with an N>T panel dataset and without quantitative details from your side, I think your best bet is:
          Code:
          xtreg vdem_corr $vars $vars2 vdem_corr, fe vce(cluster panelid)
          Apologies, I selected the wrong size for my second image.

          [screenshot of the xtregar, re output, reposted at a larger size]
          Last edited by Peyton Day; 23 Apr 2022, 14:49.



          • #6
            Originally posted by Jared Greathouse
            Hello fellow grad student.


            Please read the FAQ about how to ask a question, namely sections 10 and 12. Reading your post, there's little I can guide you on because there's no data example (please see the FAQ about this specific point) and overall not much to work with here.


            Either way, here are my general thoughts. Whoever suggested RE likely doesn't know what they're talking about. I don't care how many covariates you have, you still have unobserved, unmeasured confounding. So unless you can make a great case beyond "I have lots of time varying covariates", use unit FE.

            I've never used xtpcse; why would you want to use it (not at my computer, in bed, and I'm not getting up to check the help file)? You say your outcome data are obviously autocorrelated. Well, this isn't so obvious to me. How can you tell? Also note that there (likely) aren't subcommands, you likely mean options.

            To your 4th point, why? Why do you want to lag your dependent variable? I can't see your data, and you haven't explained what problem this solves or even if the issue exists to begin with.
            Thank you for the reply.

            1. My tenured professor/director of department told me to use RE. According to him, many of my economic variables are subject to random shocks. I too was surprised to get that advice, as I know fe is more common.

            2. xtpcse stands for panel-corrected standard errors, and accounts for cross-panel (spatial) correlation. If you are out of bed, feel free to look it up in the Stata manual to see if you understand it better than I do. I believe it can only utilize fixed effects, and by itself it cannot account for serial correlation. Which brings me to

            3. This is a common practice for an AR(1) process:
            [screenshot of the AR(1) specification referred to above]


            I was thinking maybe I could use this method in conjunction with whichever regression command is best suited. But like I said, I am somewhat new to panel data and not sure which command that would be.



            • #7
              Yep, it's 7pm by me now and I'm at my computer. When we say "random" effects in econometrics, we usually mean allowing each unit's intercept to take on its own value, drawn from some distribution (not precisely, but that's roughly it). It means when we do xtreg, re, what we're saying is that there's no unobserved, time-invariant confounding of the kind that unit FE ostensibly soaks up. You likely don't have that luxury here, so the "random shocks" your professor mentions isn't the same thing as random effects.


              I still want to know about my second question though: the reason I asked is that we don't need to overcomplicate things when we don't have to. So my question for you is "How do you know there's serial correlation in your outcomes? That is, how do you know that corruption at time t-1 is driving current levels of corruption?" I'm not saying it's impossible, I'm just asking whether you've done any empirical test to verify it. If not, you should. There's no need to assume a data-generating process unless there's clearly good reason to do so, and in the case of corruption, I'm not sure I'd buy an AR(1) structure like I would with GDP or maybe financial/price time series.
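
              One concrete way to check is Wooldridge's test for first-order serial correlation in panel models. A sketch, assuming the user-written xtserial command (Drukker, Stata Journal 2003; install from SSC) and your variable names:
              Code:
              ssc install xtserial
              xtserial vdem_corr $vars $vars2
              A small p-value is evidence of first-order serial correlation in the idiosyncratic errors; if you can't reject the null, I wouldn't bother with the AR(1) machinery at all.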

              Also, you don't need to post images. You can just do
              Code:
              sysuse auto, clear
              reg price weight
              
                    Source |       SS           df       MS      Number of obs   =        74
              -------------+----------------------------------   F(1, 72)        =     29.42
                     Model |   184233937         1   184233937   Prob > F        =    0.0000
                  Residual |   450831459        72  6261548.04   R-squared       =    0.2901
              -------------+----------------------------------   Adj R-squared   =    0.2802
                     Total |   635065396        73  8699525.97   Root MSE        =    2502.3
              
              ------------------------------------------------------------------------------
                     price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
              -------------+----------------------------------------------------------------
                    weight |   2.044063   .3768341     5.42   0.000     1.292857    2.795268
                     _cons |  -6.707353    1174.43    -0.01   0.995     -2347.89    2334.475
              ------------------------------------------------------------------------------
              and it'll be much easier to read than fooling with the size of the image. As I mentioned, the FAQ asks that you don't post files or images unless needed, and you don't need them for non-graphical output.


              It might just be enough to use Newey-West standard errors. Note that the following is just an example, but you can easily account for unit or time FE like this with panel data.
              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input int year byte month int aces byte(time smokban) float(pop stdpop)
              2002  1  728  1 0 364277.4 379875.3
              2002  2  659  2 0 364277.4 376495.5
              2002  3  791  3 0 364277.4 377040.8
              2002  4  734  4 0 364277.4 377116.4
              2002  5  757  5 0 364277.4 377383.4
              2002  6  726  6 0 364277.4 374113.1
              2002  7  760  7 0 364277.4 379513.3
              2002  8  740  8 0 364277.4 376295.5
              2002  9  720  9 0 364277.4 374653.2
              2002 10  814 10 0 364277.4 378485.6
              2002 11  795 11 0 364277.4 375955.5
              2002 12  858 12 0 364277.4 378349.7
              2003  1  887 13 0 363350.8 376762.4
              2003  2  766 14 0 363350.8 379032.3
              2003  3  851 15 0 363350.8 379360.4
              2003  4  769 16 0 363350.8   376162
              2003  5  781 17 0 363350.8 377972.4
              2003  6  756 18 0 363350.8 381830.7
              2003  7  766 19 0 363350.8 379888.6
              2003  8  752 20 0 363350.8 380872.2
              2003  9  765 21 0 363350.8 380966.9
              2003 10  831 22 0 363350.8 381240.4
              2003 11  879 23 0 363350.8 382104.9
              2003 12  928 24 0 363350.8 381802.7
              2004  1  914 25 0 364700.4 381656.3
              2004  2  808 26 0 364700.4   383680
              2004  3  937 27 0 364700.4 383504.2
              2004  4  840 28 0 364700.4 386462.9
              2004  5  916 29 0 364700.4 383783.1
              2004  6  828 30 0 364700.4 380836.8
              2004  7  845 31 0 364700.4   383483
              2004  8  818 32 0 364700.4 380906.2
              2004  9  860 33 0 364700.4 382926.8
              2004 10  839 34 0 364700.4 384052.4
              2004 11  887 35 0 364700.4 384449.6
              2004 12  886 36 0 364700.4 383428.4
              2005  1  831 37 1 364420.8 388153.2
              2005  2  796 38 1 364420.8 388373.2
              2005  3  833 39 1 364420.8 386470.1
              2005  4  820 40 1 364420.8 386033.2
              2005  5  877 41 1 364420.8 383686.4
              2005  6  758 42 1 364420.8 385509.3
              2005  7  767 43 1 364420.8 385901.9
              2005  8  738 44 1 364420.8 386516.6
              2005  9  781 45 1 364420.8 388436.5
              2005 10  843 46 1 364420.8 383255.2
              2005 11  850 47 1 364420.8 390148.7
              2005 12  908 48 1 364420.8 385874.9
              2006  1 1021 49 1 363832.6 391613.6
              2006  2  859 50 1 363832.6 391750.4
              2006  3  976 51 1 363832.6 394005.6
              2006  4  888 52 1 363832.6 391364.9
              2006  5  962 53 1 363832.6 391664.6
              2006  6  838 54 1 363832.6 389022.3
              2006  7  810 55 1 363832.6 391878.5
              2006  8  876 56 1 363832.6 388575.3
              2006  9  843 57 1 363832.6   392989
              2006 10  936 58 1 363832.6 390018.8
              2006 11  912 59 1 363832.6 390712.3
              end
              cls
              gen rate = aces/stdpop*10^5
              
              g modate  = ym(year,month)
              *log transform the standardised population:
              gen logstdpop = log(stdpop)
              
              tsset modate, m
              *linear model (Gaussian family, identity link) of the outcome (aces) on the intervention (smokban) and time, with Newey-West HAC standard errors
              glm aces smokban time, family(gaussian) link(identity) vce(hac nwest 2)
              With the result being
              Code:
              Iteration 0:   log likelihood = -318.74314  
              
              Generalized linear models                         Number of obs   =         59
              Optimization     : ML                             Residual df     =         56
                                                                Scale parameter =   3038.626
              Deviance         =  170163.0685                   (1/df) Deviance =   3038.626
              Pearson          =  170163.0685                   (1/df) Pearson  =   3038.626
              
              Variance function: V(u) = 1                       [Gaussian]
              Link function    : g(u) = u                       [Identity]
              
              HAC kernel (lags): Newey–West (2)
                                                                AIC             =   10.90655
              Log likelihood   = -318.7431371                   BIC             =   169934.7
              
              ------------------------------------------------------------------------------
                           |                 HAC
                      aces | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
              -------------+----------------------------------------------------------------
                   smokban |  -88.90865   27.18646    -3.27   0.001    -142.1931   -35.62417
                      time |   4.595569   .6220268     7.39   0.000     3.376419    5.814719
                     _cons |   725.8431   13.91007    52.18   0.000     698.5798    753.1063
              ------------------------------------------------------------------------------
              
              
              // Note that I used OLS here; this was just a toy example of Newey-West standard errors.
              But if xtregar does what you seek (maybe more efficiently than what I've done here), use that. Just be sure there really are AR(1) disturbances first.
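
              If you do go the xtregar route, its lbi option reports the Baltagi-Wu LBI and the Bhargava et al. modified Durbin-Watson statistics, which give you some evidence on the AR(1) question (and, if I remember right, the LBI was developed with unbalanced/unequally spaced panels in mind). A sketch with your variable names, FE version:
              Code:
              xtregar vdem_corr $vars $vars2, fe lbi
              Values of those statistics well below 2 would at least be consistent with positive serial correlation in the errors.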

