OLS or XTREG

Paris Rira

Join Date: Dec 2022
Posts: 384

09 May 2024, 14:49

Dear Profs and Colleagues,
I have panel data for the years 2010-2019. The panel identifier is id.

My equation is:

[Equation image: eq.png]

theta_t is the year effect and theta_s is the sector effect.

Code:

 tab sector

     sector |      Freq.     Percent        Cum.
------------+-----------------------------------
          3 |    681,289        8.47        8.47
          6 |    864,356       10.74       19.21
          7 |  2,274,306       28.27       47.48
          9 |    947,396       11.78       59.26
         10 |    162,513        2.02       61.28
         11 |    348,916        4.34       65.62
         12 |  1,198,517       14.90       80.51
         13 |  1,567,695       19.49      100.00
------------+-----------------------------------
      Total |  8,044,988      100.00

. end

I used this command to estimate it (OLS with fixed effects; please correct me if I am wrong to call it that).

Code:

reg ln_labor_productivity_w immi_sh i.year i.sector, robust
* and for 2SLS:
ivreg2 ln_labor_productivity_w (immi_sh = IV_normalized) i.year i.sector, first robust

Are these commands correct? My dataset is a panel, so I am not sure whether I am allowed to use reg / ivreg2.

Any ideas are appreciated.
Cheers,
Paris

Last edited by Paris Rira; 09 May 2024, 14:53.


George Ford

Join Date: Aug 2014

Posts: 3152
#2

09 May 2024, 16:48

ssc install reghdfe

reghdfe ln_labor_productivity immi_sh, absorb(sector year) cluster(sector)

You need about 30 sectors at minimum to avoid grief over the clustered standard errors.

With fewer, all sorts of problems with hypothesis testing crop up. The easiest way to handle it is boottest (ssc install boottest).
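A minimal sketch of how that could look, assuming the variable names from the first post and that the estimation data are in memory. It uses regress with factor-variable dummies instead of reghdfe only to keep the example simple. The Webb weights, the number of replications, and the seed are illustrative assumptions, not settings anyone in this thread has recommended.

```stata
* Sketch only: variable names are taken from the first post.
ssc install boottest, replace

* Sector and year effects as dummies, standard errors clustered on sector
regress ln_labor_productivity_w immi_sh i.year i.sector, vce(cluster sector)

* Wild cluster bootstrap test of H0: coefficient on immi_sh = 0.
* weighttype(webb) is an assumption here: with only 8 clusters, Rademacher
* weights allow just 2^8 = 256 distinct draws.
boottest immi_sh, weighttype(webb) reps(9999) seed(12345) nograph
```

boottest reports a bootstrap p-value and confidence interval for immi_sh, which can be set against the conventional clustered ones.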

Paris Rira

Join Date: Dec 2022
Posts: 384

09 May 2024, 17:04

Thank you so much, Prof Ford, for backing me. I was really disappointed with my xtreg result while the reg result was significant. I ran your command and the result is below. From reading help reghdfe, I just don't understand why to cluster on sector when there is a region factor variable as well. Moreover, what about 2SLS? xtivreg2 does not give a significant result, just like xtreg.

Code:

. reghdfe ln_labor_productivity immi_sh share_9 share_12 share_uni  logsize lavg_firm_age lage, absorb(sector region  year) cluste
> r(sector)
(MWFE estimator converged in 5 iterations)

HDFE Linear regression                            Number of obs   =  1,514,590
Absorbing 3 HDFE groups                           F(   7,      7) =     368.30
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.1485
                                                  Adj R-squared   =     0.1484
                                                  Within R-sq.    =     0.0866
Number of clusters (sector)  =          8         Root MSE        =     0.8528

                                  (Std. Err. adjusted for 8 clusters in sector)
-------------------------------------------------------------------------------
              |               Robust
ln_labor_pr~y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
      immi_sh |  -.1050354   .0629109    -1.67   0.139    -.2537961    .0437253
      share_9 |   .2070373   .0258953     8.00   0.000     .1458046      .26827
     share_12 |   .4663512   .0569862     8.18   0.000     .3316002    .6011022
    share_uni |   .8243444   .1010726     8.16   0.000     .5853456    1.063343
      logsize |   .1792899   .0306894     5.84   0.001      .106721    .2518589
lavg_firm_age |   .0631174   .0137143     4.60   0.002     .0306882    .0955466
         lage |   .1139351   .0661673     1.72   0.129    -.0425257    .2703959
        _cons |   8.407244   .2588786    32.48   0.000     7.795093    9.019395
-------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
      sector |         8           8           0    *|
      region |         8           1           7     |
        year |        10           1           9     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation


George Ford

Join Date: Aug 2014

Posts: 3152
#4

09 May 2024, 17:23

You cluster on what you need to (read up on that). Normally, you cluster on the year and unit of observation. Including region is perhaps sensible as well.
Eight clusters is a problem. You'll need to run

Code:

boottest immi_sh

after you estimate the model, or use randomization inference.

You have a ton of observations and almost everything else is significant, so there's not much there. A t of -1.67 is close, however, but it is probably overstated with only 8 clusters. boottest will tell you.
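For what it's worth, the reported p-value can be reproduced by hand, which shows how much the small number of clusters already matters. The sketch below uses only the coefficient and standard error from the sector-clustered table above; the scalar names are made up for illustration.

```stata
* t statistic from the sector-clustered output (coefficient / std. err.)
scalar t_immi = -.1050354 / .0629109

* With 8 clusters the output above uses G - 1 = 7 degrees of freedom
scalar p_t = 2 * ttail(7, abs(t_immi))

* The same statistic against a standard normal, as if clusters were many
scalar p_z = 2 * normal(-abs(t_immi))

display "t = " %6.3f t_immi "   p (t, 7 df) = " %5.3f p_t "   p (normal) = " %5.3f p_z
```

The 7-df p-value matches the 0.139 in the table, while the normal approximation would give roughly 0.095. Even the t(7) reference distribution can be too optimistic with so few clusters, which is the argument for the wild bootstrap.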

Using robust is no help: about a 50-50 chance of significance.

Think about your model, but don't go p-hacking.

Paris Rira

Join Date: Dec 2022
Posts: 384

09 May 2024, 17:38

When I cluster on either region or year, the coefficient becomes significant, even without clustering. Still, I don't know what I should cluster on, as I am not familiar with this syntax or with clustering. Normally I use robust and it helps.

Code:

. reghdfe ln_labor_productivity immi_sh share_9 share_12 share_uni  logsize lavg_firm_age lage, absorb(sector region  year) cluste
> r(year)
(MWFE estimator converged in 5 iterations)

HDFE Linear regression                            Number of obs   =  1,514,590
Absorbing 3 HDFE groups                           F(   7,      9) =    4671.90
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.1485
                                                  Adj R-squared   =     0.1484
                                                  Within R-sq.    =     0.0866
Number of clusters (year)    =         10         Root MSE        =     0.8528

                                   (Std. Err. adjusted for 10 clusters in year)
-------------------------------------------------------------------------------
              |               Robust
ln_labor_pr~y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
      immi_sh |  -.1050354   .0139331    -7.54   0.000    -.1365543   -.0735165
      share_9 |   .2070373   .0067646    30.61   0.000     .1917348    .2223398
     share_12 |   .4663512   .0140998    33.07   0.000     .4344551    .4982472
    share_uni |   .8243444   .0170388    48.38   0.000        .7858    .8628888
      logsize |   .1792899   .0059233    30.27   0.000     .1658904    .1926894
lavg_firm_age |   .0631174   .0031714    19.90   0.000     .0559433    .0702916
         lage |   .1139351   .0069268    16.45   0.000     .0982656    .1296047
        _cons |   8.407244   .0341629   246.09   0.000     8.329962    8.484526
-------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
      sector |         8           1           7     |
      region |         8           1           7     |
        year |        10          10           0    *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation


George Ford

Join Date: Aug 2014

Posts: 3152
#6

09 May 2024, 17:41

Sector makes sense. You are attempting to deal with autocorrelation and heteroskedasticity. Year alone is no help. Sometimes people do sector and year, but you'd need to explain why.

Read this:

HTML Code:

https://economics.mit.edu/sites/default/files/2022-09/When%20Should%20You%20Adjust%20Standard%20Errors%20for%20Clustering.pdf
George Ford

Join Date: Aug 2014

Posts: 3152
#7

09 May 2024, 17:42

But keep in mind you are dealing with a case of way too few clusters. In all likelihood, the standard errors are too small.

Paris Rira

Join Date: Dec 2022
Posts: 384

09 May 2024, 17:49

Thank you so much, Prof, for the explanation. Clustering on the panel id is appropriate, right?

Code:

 reghdfe ln_labor_productivity immi_sh share_9 share_12 share_uni  logsize lavg_firm_age lage, absorb(sector region year) cluster
> (id)
(MWFE estimator converged in 5 iterations)

HDFE Linear regression                            Number of obs   =  1,514,590
Absorbing 3 HDFE groups                           F(   7,1514559) =   18594.07
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.1485
                                                  Adj R-squared   =     0.1484
                                                  Within R-sq.    =     0.0866
Number of clusters (id)      =  1,514,590         Root MSE        =     0.8528

                              (Std. Err. adjusted for 1,514,590 clusters in id)
-------------------------------------------------------------------------------
              |               Robust
ln_labor_pr~y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
      immi_sh |  -.1050354   .0058919   -17.83   0.000    -.1165832   -.0934876
      share_9 |   .2070373   .0035632    58.10   0.000     .2000535     .214021
     share_12 |   .4663512   .0039932   116.79   0.000     .4585247    .4741777
    share_uni |   .8243444    .004805   171.56   0.000     .8149267    .8337621
      logsize |   .1792899   .0007295   245.76   0.000       .17786    .1807198
lavg_firm_age |   .0631174   .0008204    76.94   0.000     .0615095    .0647254
         lage |   .1139351   .0051025    22.33   0.000     .1039344    .1239358
        _cons |   8.407244   .0201141   417.98   0.000     8.367821    8.446667
-------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
      sector |         8           0           8     |
      region |         8           1           7     |
        year |        10           1           9    ?|
-----------------------------------------------------+
? = number of redundant parameters may be higher


George Ford

Join Date: Aug 2014

Posts: 3152
#9

09 May 2024, 17:53

Maybe easier to grasp.

HTML Code:

https://blogs.worldbank.org/en/impactevaluations/when-should-you-cluster-standard-errors-new-wisdom-econometrics-oracle

If you've got all sectors, then try clustering on region.
George Ford

Join Date: Aug 2014

Posts: 3152
#10

09 May 2024, 17:54

What is your panel id? sector?
Paris Rira

Join Date: Dec 2022

Posts: 384
#11

09 May 2024, 17:57

No, the panel id is the firm ID. There are only 8 sectors. This is actually a firm-level dataset. I am analyzing the impact of immigrants on firm productivity, with sector fixed effects, region fixed effects, year dummies, and other control variables.

Last edited by Paris Rira; 09 May 2024, 18:02.
Paris Rira

Join Date: Dec 2022

Posts: 384
#12

09 May 2024, 18:49

Prof, may I ask one more question, please?
Why does this happen? I have already installed all the packages, and without absorb it works (but I certainly need absorb).

Code:

. ivreghdfe ln_labor_productivity (immi_sh = IV) share_9 share_12 share_uni logsize lavg_firm_age lage, absorb(sector region year)
option requirements not allowed
r(198);

Thanks.
George Ford

Join Date: Aug 2014

Posts: 3152
#13

09 May 2024, 19:17

absorb firm results look good. go with that.

run that model then

Code:

boottest immi_sh

what is IV?
George Ford

Join Date: Aug 2014

Posts: 3152
#14

09 May 2024, 19:19

Wait. You've got individual firms (cluster = obs). Can't do that.

Try absorb(sector region)
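One way to check what is going on, sketched under the assumption that id and year are the panel and time variables named in the first post and that the cluster(id) regression was the last estimation command run. n_in_sample is a hypothetical helper variable.

```stata
* Declare the panel; xtset errors out if an id-year pair is repeated
xtset id year

* How many estimation-sample observations does each firm contribute?
bysort id: egen n_in_sample = total(e(sample))
tabulate n_in_sample if e(sample)

* Pattern of years observed per firm within the estimation sample
xtdescribe if e(sample)
```

If every firm shows up only once in the estimation sample, the number of clusters equals the number of observations, as in the output above, and clustering on id is essentially the same as using robust.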
Paris Rira

Join Date: Dec 2022

Posts: 384
#15

09 May 2024, 19:30

Originally posted by George Ford

absorb firm results look good. go with that.

run that model then

Code:

boottest immi_sh

what is IV?

IV is an instrumental variable. Actually, I uninstalled and reinstalled the related packages, and the problem (absorb option not supported) disappeared. Also, I don't understand the boottest command, I mean how it connects to my regression. Meanwhile, I am studying the paper, though it is not easy to understand.
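For anyone hitting the same "option requirements not allowed" error, here is a sketch of the reinstall step. The package list is an assumption about what ivreghdfe builds on, and it also assumes all four packages are available from SSC; neither is stated in this thread.

```stata
* Refresh ivreghdfe and the packages it builds on (assumed dependency list,
* assumed to be available from SSC)
ssc install ftools, replace
ssc install reghdfe, replace
ssc install ivreg2, replace
ssc install ivreghdfe, replace

* Confirm which versions are now found on the adopath
which reghdfe
which ivreghdfe
```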