  • OLS or XTREG

    Dear Profs and Colleagues,
    I have panel data for the years 2010-2019. The panel identifier is id.

    My equation is:
    [Image: eq.png]

    theta_t is the year fixed effect and theta_s is the sector fixed effect.
    Code:
     tab sector
    
         sector |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              3 |    681,289        8.47        8.47
              6 |    864,356       10.74       19.21
              7 |  2,274,306       28.27       47.48
              9 |    947,396       11.78       59.26
             10 |    162,513        2.02       61.28
             11 |    348,916        4.34       65.62
             12 |  1,198,517       14.90       80.51
             13 |  1,567,695       19.49      100.00
    ------------+-----------------------------------
          Total |  8,044,988      100.00
    
    . end
    I used this command (OLS with fixed effects; please correct me if that is the wrong name) to estimate it.
    Code:
    reg ln_labor_productivity_w immi_sh i.year i.sector, robust
    and for 2SLS:
    ivreg2 ln_labor_productivity_w (immi_sh = IV_normalized) i.year i.sector, first robust
    Are the commands correct? I ask because my dataset is a panel, and I think I may not be able to use the reg / ivreg2 commands.

    Any ideas are appreciated.
    Cheers,
    Paris
    Last edited by Paris Rira; 09 May 2024, 15:53.
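For reference, the two estimators contrasted in this question can be sketched on simulated data. This is only an illustration under stated assumptions: the 500 firms, the simulated values, and the choice of vce(cluster id) are hypothetical and not from the original post; only the variable names follow the thread. regress with i.year i.sector is pooled OLS with year and sector dummies, while xtreg, fe adds one fixed effect per panel unit, which absorbs any sector that does not change within a firm.

```stata
* Hypothetical toy panel: 500 firms observed 2010-2019
clear
set seed 2024
set obs 500
gen id = _n
gen sector = mod(id, 8) + 1          // time-invariant sector, 8 values
expand 10                            // 10 yearly rows per firm
bysort id: gen year = 2009 + _n      // years 2010-2019
gen immi_sh = runiform()
gen ln_labor_productivity = 8 - 0.1*immi_sh + rnormal()

* Pooled OLS with year and sector dummies (the command in the post)
regress ln_labor_productivity immi_sh i.year i.sector, vce(robust)

* Firm fixed effects: declare the panel, then use the within estimator
xtset id year
xtreg ln_labor_productivity immi_sh i.year, fe vce(cluster id)
```

Both commands run on panel data. They differ in which fixed effects they include, so their estimates need not agree.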

  • #2
    ssc install reghdfe

    reghdfe ln_labor_productivity immi_sh, absorb(sector year) cluster(sector)

    You need about 30 sectors (clusters) at minimum to avoid grief over the clustered standard errors.

    If you have fewer, all sorts of problems with hypothesis testing creep up. The easiest way to handle it is boottest (ssc install boottest).



    • #3
      Thank you so much, Prof Ford, for your support. I was really disappointed with my xtreg result, while the reg result was significant. I ran your command and the result is below. After reading help reghdfe, I still don't understand why I should cluster on sector when there is a region factor variable as well. Also, what about 2SLS? xtivreg2 does not give a significant result, just like xtreg.
      Code:
      . reghdfe ln_labor_productivity immi_sh share_9 share_12 share_uni  logsize lavg_firm_age lage, absorb(sector region  year) cluste
      > r(sector)
      (MWFE estimator converged in 5 iterations)
      
      HDFE Linear regression                            Number of obs   =  1,514,590
      Absorbing 3 HDFE groups                           F(   7,      7) =     368.30
      Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                        R-squared       =     0.1485
                                                        Adj R-squared   =     0.1484
                                                        Within R-sq.    =     0.0866
      Number of clusters (sector)  =          8         Root MSE        =     0.8528
      
                                        (Std. Err. adjusted for 8 clusters in sector)
      -------------------------------------------------------------------------------
                    |               Robust
      ln_labor_pr~y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      --------------+----------------------------------------------------------------
            immi_sh |  -.1050354   .0629109    -1.67   0.139    -.2537961    .0437253
            share_9 |   .2070373   .0258953     8.00   0.000     .1458046      .26827
           share_12 |   .4663512   .0569862     8.18   0.000     .3316002    .6011022
          share_uni |   .8243444   .1010726     8.16   0.000     .5853456    1.063343
            logsize |   .1792899   .0306894     5.84   0.001      .106721    .2518589
      lavg_firm_age |   .0631174   .0137143     4.60   0.002     .0306882    .0955466
               lage |   .1139351   .0661673     1.72   0.129    -.0425257    .2703959
              _cons |   8.407244   .2588786    32.48   0.000     7.795093    9.019395
      -------------------------------------------------------------------------------
      
      Absorbed degrees of freedom:
      -----------------------------------------------------+
       Absorbed FE | Categories  - Redundant  = Num. Coefs |
      -------------+---------------------------------------|
            sector |         8           8           0    *|
            region |         8           1           7     |
              year |        10           1           9     |
      -----------------------------------------------------+
      * = FE nested within cluster; treated as redundant for DoF computation



      • #4
        You cluster on what you need to (read up on that). Normally, you cluster on the year and unit of observation. Including region is perhaps sensible as well.
        Eight clusters is a problem. After you run the regression, you'll need to run
        Code:
        boottest immi_sh
        or use randomization inference.
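The boottest step is a post-estimation command: it reads the regression still in memory and tests the named coefficient with a wild cluster bootstrap. A minimal sketch on simulated data follows. The data, the 999 replications, the seed, and the Webb weights are assumptions made for illustration. The sketch also swaps in regress with explicit sector and year dummies in place of reghdfe, so that it does not depend on how boottest handles several absorbed effects. It needs boottest installed (ssc install boottest).

```stata
* Hypothetical toy data with 8 sectors and 10 years
clear
set seed 12345
set obs 8000
gen sector = mod(_n, 8) + 1
gen year = 2010 + mod(floor((_n - 1)/8), 10)
gen immi_sh = runiform()
gen ln_labor_productivity = 8 - 0.1*immi_sh + rnormal()

* Step 1: estimate with standard errors clustered on sector
regress ln_labor_productivity immi_sh i.sector i.year, vce(cluster sector)

* Step 2: wild cluster bootstrap test of H0: coefficient on immi_sh = 0
boottest immi_sh, reps(999) weighttype(webb) seed(12345) nograph
display "wild-bootstrap p-value = " r(p)
```

The bootstrap p-value, not the t-based one in the regression table, is the number to report when the clusters are this few.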

        You have a ton of observations and most everything else is significant, so there's not much there. A t-statistic of -1.67 is close, but it is probably overstated with only 8 clusters. Boottest will tell you.

        Using robust is no help: about a 50-50 chance of significance.

        Think about your model, but don't go p-hacking.




        • #5
          When I cluster on either region or year, the coefficient becomes significant; it is significant even without clustering. Still, I don't know what I should cluster on, as I am not familiar with this syntax or with clustering. Normally I use robust and it helps.
          Code:
          . reghdfe ln_labor_productivity immi_sh share_9 share_12 share_uni  logsize lavg_firm_age lage, absorb(sector region  year) cluste
          > r(year)
          (MWFE estimator converged in 5 iterations)
          
          HDFE Linear regression                            Number of obs   =  1,514,590
          Absorbing 3 HDFE groups                           F(   7,      9) =    4671.90
          Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                            R-squared       =     0.1485
                                                            Adj R-squared   =     0.1484
                                                            Within R-sq.    =     0.0866
          Number of clusters (year)    =         10         Root MSE        =     0.8528
          
                                             (Std. Err. adjusted for 10 clusters in year)
          -------------------------------------------------------------------------------
                        |               Robust
          ln_labor_pr~y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          --------------+----------------------------------------------------------------
                immi_sh |  -.1050354   .0139331    -7.54   0.000    -.1365543   -.0735165
                share_9 |   .2070373   .0067646    30.61   0.000     .1917348    .2223398
               share_12 |   .4663512   .0140998    33.07   0.000     .4344551    .4982472
              share_uni |   .8243444   .0170388    48.38   0.000        .7858    .8628888
                logsize |   .1792899   .0059233    30.27   0.000     .1658904    .1926894
          lavg_firm_age |   .0631174   .0031714    19.90   0.000     .0559433    .0702916
                   lage |   .1139351   .0069268    16.45   0.000     .0982656    .1296047
                  _cons |   8.407244   .0341629   246.09   0.000     8.329962    8.484526
          -------------------------------------------------------------------------------
          
          Absorbed degrees of freedom:
          -----------------------------------------------------+
           Absorbed FE | Categories  - Redundant  = Num. Coefs |
          -------------+---------------------------------------|
                sector |         8           1           7     |
                region |         8           1           7     |
                  year |        10          10           0    *|
          -----------------------------------------------------+
          * = FE nested within cluster; treated as redundant for DoF computation



          • #6
            Sector makes sense. You are attempting to deal with autocorrelation and heteroskedasticity. Year alone is no help. Sometimes people cluster on both sector and year, but you'd need to explain why.

            read this:
            HTML Code:
            https://economics.mit.edu/sites/default/files/2022-09/When%20Should%20You%20Adjust%20Standard%20Errors%20for%20Clustering.pdf



            • #7
              But keep in mind you are dealing with a case of way too few clusters. In all likelihood, the standard errors are too small.



              • #8
                Thank you so much, Prof, for the explanation. Clustering on the panel id is appropriate, right?
                Code:
                 reghdfe ln_labor_productivity immi_sh share_9 share_12 share_uni  logsize lavg_firm_age lage, absorb(sector region year) cluster
                > (id)
                (MWFE estimator converged in 5 iterations)
                
                HDFE Linear regression                            Number of obs   =  1,514,590
                Absorbing 3 HDFE groups                           F(   7,1514559) =   18594.07
                Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                                  R-squared       =     0.1485
                                                                  Adj R-squared   =     0.1484
                                                                  Within R-sq.    =     0.0866
                Number of clusters (id)      =  1,514,590         Root MSE        =     0.8528
                
                                              (Std. Err. adjusted for 1,514,590 clusters in id)
                -------------------------------------------------------------------------------
                              |               Robust
                ln_labor_pr~y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                --------------+----------------------------------------------------------------
                      immi_sh |  -.1050354   .0058919   -17.83   0.000    -.1165832   -.0934876
                      share_9 |   .2070373   .0035632    58.10   0.000     .2000535     .214021
                     share_12 |   .4663512   .0039932   116.79   0.000     .4585247    .4741777
                    share_uni |   .8243444    .004805   171.56   0.000     .8149267    .8337621
                      logsize |   .1792899   .0007295   245.76   0.000       .17786    .1807198
                lavg_firm_age |   .0631174   .0008204    76.94   0.000     .0615095    .0647254
                         lage |   .1139351   .0051025    22.33   0.000     .1039344    .1239358
                        _cons |   8.407244   .0201141   417.98   0.000     8.367821    8.446667
                -------------------------------------------------------------------------------
                
                Absorbed degrees of freedom:
                -----------------------------------------------------+
                 Absorbed FE | Categories  - Redundant  = Num. Coefs |
                -------------+---------------------------------------|
                      sector |         8           0           8     |
                      region |         8           1           7     |
                        year |        10           1           9    ?|
                -----------------------------------------------------+
                ? = number of redundant parameters may be higher



                • #9
                  Maybe easier to grasp.

                  HTML Code:
                  https://blogs.worldbank.org/en/impactevaluations/when-should-you-cluster-standard-errors-new-wisdom-econometrics-oracle
                  If you've got all sectors, then try clustering on region.



                  • #10
                    What is your panel id? sector?



                    • #11
                      No, the panel id is the firm ID. There are only 8 sectors. It is actually a firm-level dataset. I am analyzing the impact of immigrants on firm productivity, with sector fixed effects, region fixed effects, year dummies, and other control variables.
                      Last edited by Paris Rira; 09 May 2024, 19:02.



                      • #12
                        Prof, may I ask one more question, please?
                        Why does this error happen? I have already installed all the packages, and the command works without absorb (but I certainly need absorb).
                        Code:
                        . ivreghdfe ln_labor_productivity (immi_sh = IV) share_9 share_12 share_uni logsize lavg_firm_age lage, absorb(sector  region year)
                        option requirements not allowed
                        r(198);
                        Thanks.



                        • #13
                          absorb firm results look good. go with that.

                          run that model then

                          Code:
                          boottest immi_sh
                          what is IV?



                          • #14
                            Wait. You've got individual firms (cluster = obs). Can't do that.

                            Try absorb(sector region)
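The "cluster = obs" symptom (the number of clusters equal to the number of observations) can be checked directly before choosing a cluster variable. The following is a sketch on simulated data in which id is deliberately unique per row; the data and the variable name firm_tag are hypothetical. When every cluster holds a single observation, clustering on it amounts to ordinary heteroskedasticity-robust standard errors, up to finite-sample adjustments.

```stata
* Hypothetical toy data in which every row has its own id
clear
set obs 1000
gen id = _n
gen year = 2010 + mod(_n, 10)

* How many rows share each id? All rows showing "copies = 1" means id is unique
duplicates report id

* isid exits silently when id uniquely identifies rows, and errors otherwise
capture noisily isid id

* Count distinct ids and compare with the number of observations
egen firm_tag = tag(id)
count if firm_tag
display "distinct ids = " r(N) ", observations = " _N
```

In a firm-by-year panel one would expect each id to appear in several years, so far fewer distinct ids than observations.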



                            • #15
                              Originally posted by George Ford
                              absorb firm results look good. go with that.

                              run that model then

                              Code:
                              boottest immi_sh
                              what is IV?
                              IV is an instrumental variable. Actually, I uninstalled and reinstalled the related packages, and the problem (the absorb option not being supported) disappeared. I don't understand the boottest code, though; I mean how it connects to my regression. Meanwhile, I am studying the paper, though it is not easy to understand.
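A reinstall of this kind might look like the following. It is a sketch that assumes the relevant packages are ftools, reghdfe, ivreg2, and ivreghdfe and that all of them are taken from SSC; the post does not say which packages were reinstalled or from where.

```stata
* Assumed package list; "replace" overwrites any older installed copy
ssc install ftools, replace
ssc install reghdfe, replace
ssc install ivreg2, replace
ssc install ivreghdfe, replace

* Confirm which copy of the command Stata will now run
which ivreghdfe
```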
