  • Fixed-Effects (FE) Panel Regression Model with 'reg' 'reghdfe'

    For my assignment, my sample dataset is structured as i (user: 1-500) - j (platform = 0/1) - t (time: 1-24). It has 23,450 observations, so it is not perfectly balanced.
    I planned to estimate the impact using a fixed-effects (FE) panel regression model.

    If I try to use 'xtreg', I get the error "repeated time values within panel" when I run 'xtset user time', because each user has rows for both platform = 0 and platform = 1. I could run 'egen panel_id = group(user platform)' and then 'xtset panel_id time', but I do not think that is right either.
    So I used 'reg dependent independent i.user i.time, robust' and 'reghdfe dependent independent, absorb(user time) vce(cluster user)'.

    However, I received a comment that the panel data setup is wrong and that I should come up with a method that properly estimates the model using panel data.
    Yet when I asked ChatGPT, it said: "reghdfe is indeed a fixed effects estimator, just implemented in a more flexible and efficient way."

    Am I missing something here?
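
    A minimal sketch of how the -xtset- error could be diagnosed, assuming the variables are literally named user, platform, time, dependent and independent as in the commands above. The panel_id variable is the one mentioned in the post; clustering on user is an assumption, not something the assignment prescribes.

    ```stata
    * With two platforms stacked, most user-time pairs should appear twice
    duplicates report user time

    * Check whether user-platform-time identifies each row uniquely
    duplicates report user platform time

    * If it does, each user-platform pair can be declared as its own panel
    egen panel_id = group(user platform)
    xtset panel_id time

    * The fixed effect is now per user-platform pair, not per user;
    * clustering on user allows correlation across a user's two platforms
    xtreg dependent independent i.time, fe vce(cluster user)
    ```

    Whether this is the specification wanted depends on whether the fixed effect should be per user or per user-platform pair, so the coefficients need not match those from absorb(user time).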
    Last edited by Jason Rhee; 21 Feb 2025, 23:07.

  • #2
    Jason:
    welcome to this forum.
    As far as I can see, what you're missing is https://www.statalist.org/forums/help#adviceextras #4.
    That said, just challenge yourself a bit more with -xtreg- and related stuff.
    Then come back to the list with what you typed and what Stata gave you back.
    Kind regards,
    Carlo
    (StataNow 18.5)

    • #3
      Carlo:
      I assumed there would not be any issue since it was not part of the grading and was already completed—I was just personally curious. That said, thank you for your comment!

      • #4
        You can use reghdfe for pooled data. As for the comment about the panel data setup being wrong, it's not a panel at all.

        • #5
          Originally posted by George Ford
          You can use reghdfe for pooled data. As for the comment about the panel data setup being wrong, it's not a panel at all.
          When you say "it's not a panel at all", do you mean because the number of observations is not 500 x 24 x 2 = 24,000? Since nearly all of the 500 users have a row at each of the 24 time points for both j = 0 and j = 1, I thought it was panel data, just unbalanced.

          • #6
            Jason:
            the (hopefully) useful advice was to read the FAQ before posting.
            I would still be interested in what you typed and what Stata gave you back.
            Kind regards,
            Carlo
            (StataNow 18.5)

            • #7
              I saw it was unbalanced and xtset wouldn't work, so I concluded it was a pool.

              You are stacking 2 separate panels of the same users, each with 24 periods?

              • #8
                Originally posted by George Ford
                I saw it was unbalanced and xtset wouldn't work, so I concluded it was a pool.

                You are stacking 2 separate panels of the same users, each with 24 periods?
                That is correct. I used another panel dataset at the i (user) - t (time) level. When I run the following Stata code:

                reg dependent independent i.user i.time, robust
                xtset user time
                xtreg dependent independent i.time, fe
                reghdfe dependent independent, absorb(user time) vce(cluster user)

                I get the same coefficient from all three. So, going back to the original dataset with i (user: 1-500) - j (platform = 0/1) - t (time: 1-24), where I stacked 2 separate panels of the same users, each with 24 periods, isn't it okay to run:

                reg dependent independent i.user i.time, robust
                reghdfe dependent independent, absorb(user time) vce(cluster user)

                and claim that I "estimate the impact using Fixed-Effects (FE) panel regression model"? I think I did come up with a method that properly estimates the model using panel data.

                • #9
                  Originally posted by Carlo Lazzaro
                  Jason:
                  welcome to this forum.
                  As far as I can see, what you're missing is https://www.statalist.org/forums/help#adviceextras #4.
                  That said, just challenge yourself a bit more with -xtreg- and related stuff.
                  Then come back to the list with what you typed and what Stata gave you back.
                  I used another panel dataset at the i (user) - t (time) level. When I run the following Stata code:

                  reg dependent independent i.user i.time, robust
                  xtset user time
                  xtreg dependent independent i.time, fe
                  reghdfe dependent independent, absorb(user time) vce(cluster user)

                  I get the same coefficient from all three. So, going back to the original dataset with i (user: 1-500) - j (platform = 0/1) - t (time: 1-24), where I stacked 2 separate panels of the same users, each with 24 periods, isn't it okay to run:

                  reg dependent independent i.user i.time, robust
                  reghdfe dependent independent, absorb(user time) vce(cluster user)

                  and claim that I "estimate the impact using Fixed-Effects (FE) panel regression model"? I think I did come up with a method that properly estimates the model using panel data. I do not think fixed-effects panel regression has to be done only with xtset and xtreg. Am I missing something here?

                  • #10
                    Originally posted by Jason Rhee
                    I assumed there would not be any issue since it was not part of the grading and was already completed—I was just personally curious.
                    Doesn't the instructor provide guidance on completed assignments? It is your right to request it, given that he or she is paid to do so.

                    • #11
                      I'm confused about stacking two panels of the same users. What difference does platform make? I suspect you may want to find out what that difference is. In that case, you'd need to estimate the coefficients separately for each platform. If you think the coefficients are the same, then absorb platform too in reghdfe.
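
                      The two options above could be sketched roughly as follows, assuming the variable names from the earlier posts (dependent, independent, user, time, platform) and that reghdfe is installed (it is community-contributed: ssc install reghdfe). The interaction line is an added illustration, not something proposed earlier in the thread.

                      ```stata
                      * Separate coefficients: one regression per platform
                      reghdfe dependent independent if platform == 0, absorb(user time) vce(cluster user)
                      reghdfe dependent independent if platform == 1, absorb(user time) vce(cluster user)

                      * Common coefficient: pool both platforms and absorb platform as well
                      reghdfe dependent independent, absorb(user time platform) vce(cluster user)

                      * Illustration only: platform-specific slopes within one pooled model
                      reghdfe dependent i.platform#c.independent, absorb(user time platform) vce(cluster user)
                      ```

                      Comparing the two platform-specific slopes from the last line is one way to judge whether a common coefficient is a reasonable restriction.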

                      • #12
                        Jason:
                        thanks for clarifying a bit more what you are after.
                        Some comments follow:
                        1) -reg dependent independent i.user i.time, robust- does not take within-panel autocorrelation of the idiosyncratic error (epsilon) into account. In fact, -robust- in -regress- accounts for heteroskedasticity only. You should impose -vce(cluster panelid)- standard errors instead. Conversely, in -xtreg- both -robust- and -vce(cluster panelid)- invoke cluster-robust standard errors (put differently, they do the very same job).
                        That said, assuming you want to go -fe- with -regress-, your code should have been:
                        Code:
                        reg dependent independent i.user i.time, vce(cluster user)
                        You will get the same coefficients that you got with your code and the correct standard errors (assuming that you have at least 30 panels, and not 3 as in the following toy-example).

                        2) if you code the same with -xtreg,fe-, you get:
                        Code:
                        . xtreg ln_wage i.year if idcode<=3, fe vce(cluster idcode)
                        
                        Fixed-effects (within) regression               Number of obs     =         39
                        Group variable: idcode                          Number of groups  =          3
                        
                        R-squared:                                      Obs per group:
                             Within  = 0.5446                                         min =         12
                             Between = 0.2670                                         avg =       13.0
                             Overall = 0.3678                                         max =         15
                        
                                                                        F(3, 2)           =          .
                        corr(u_i, Xb) = -0.0356                         Prob > F          =          .
                        
                                                         (Std. err. adjusted for 3 clusters in idcode)
                        ------------------------------------------------------------------------------
                                     |               Robust
                             ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                        -------------+----------------------------------------------------------------
                                year |
                                 69  |    .208967   3.41e-08  6.1e+06   0.000     .2089668    .2089671
                                 70  |  -.2747772   .2552143    -1.08   0.394    -1.372876    .8233215
                                 71  |  -.3613911   .3640359    -0.99   0.425    -1.927711    1.204929
                                 72  |  -.2056973   .1967664    -1.05   0.406    -1.052315      .64092
                                 73  |  -.0310461   .0967648    -0.32   0.779    -.4473915    .3852993
                                 75  |   .0416271   .1575174     0.26   0.816    -.6361157      .71937
                                 77  |   .0358937   .1303686     0.28   0.809    -.5250371    .5968246
                                 78  |   .2433199   .1906609     1.28   0.330    -.5770276    1.063667
                                 80  |   .2726139   .2105344     1.29   0.325    -.6332423     1.17847
                                 82  |   .1747839   .0767088     2.28   0.150    -.1552673    .5048351
                                 83  |   .2924489    .129739     2.25   0.153    -.2657727    .8506706
                                 85  |   .3712589   .1848931     2.01   0.182    -.4242719     1.16679
                                 87  |   .2960361   .2044639     1.45   0.285    -.5837012    1.175773
                                 88  |   .3038639   .1462331     2.08   0.173    -.3253264    .9330542
                                     |
                               _cons |   1.659677   .0055719   297.86   0.000     1.635703    1.683651
                        -------------+----------------------------------------------------------------
                             sigma_u |  .24956596
                             sigma_e |  .27711004
                                 rho |  .44784468   (fraction of variance due to u_i)
                        ------------------------------------------------------------------------------
                        
                        .
                        Therefore, if your statement is that you can get the same regression coefficient for the -fe- estimator using different Stata commands, you're right.
                        However:
                        1) the standard error calculations differ;
                        2) -xtreg,fe- gives you back more information than -regress- with -i.panelid- and -i.timevar- does.
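
                        A rough way to check both points side by side is sketched below. It assumes the toy example comes from the nlswork dataset shipped with Stata (the variable names ln_wage, idcode and year match it); the regressor tenure, the 100-panel subsample and the stored-estimate names are illustrative choices only.

                        ```stata
                        * Assumption: the toy example above is based on -webuse nlswork-
                        webuse nlswork, clear
                        * Small subsample so the dummy-variable regression stays cheap
                        keep if idcode <= 100
                        xtset idcode year

                        * 1) Dummy-variable regression with cluster-robust standard errors
                        regress ln_wage tenure i.idcode i.year, vce(cluster idcode)
                        estimates store lsdv

                        * 2) Within (fixed-effects) estimator
                        xtreg ln_wage tenure i.year, fe vce(cluster idcode)
                        estimates store within

                        * 3) reghdfe (community-contributed); it drops singleton groups by default
                        reghdfe ln_wage tenure, absorb(idcode year) vce(cluster idcode)
                        estimates store hdfe

                        * The coefficient on tenure should agree; the standard errors need not
                        estimates table lsdv within hdfe, keep(tenure) b(%9.6f) se(%9.6f)
                        ```

                        Any gap between the standard errors would reflect how each command handles the degrees-of-freedom adjustment for the absorbed effects, not a difference in the estimator itself.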
                        Kind regards,
                        Carlo
                        (StataNow 18.5)

                        • #13
                          Thank you so much!!! This helped me a lot to understand it better. Again, thank you!!!
