OLS or XTREG - Statalist

Paris Rira

Join Date: Dec 2022
Posts: 383

#16

09 May 2024, 20:39

Here is the result:

Code:

 boottest immi_sh

Wild bootstrap-t, null imposed, 999 replications, Wald test, Rademacher weights:
  immi_sh

                      t(1514559) =   -17.9428
                        Prob>|t| =     0.0000

95% confidence set for null hypothesis expression: [−.1136, −.09001]


Code:

 reghdfe ln_labor_productivity immi_sh share_9 share_12 share_uni  logsize lavg_firm_age lage, absorb(sector region year) cluster
> (sector region)
(MWFE estimator converged in 5 iterations)
Warning: VCV matrix was non-positive semi-definite; adjustment from Cameron, Gelbach & Miller applied.
warning: missing F statistic; dropped variables due to collinearity or too few clusters

HDFE Linear regression                            Number of obs   =  1,514,590
Absorbing 3 HDFE groups                           F(   7,      7) =          .
Statistics robust to heteroskedasticity           Prob > F        =          .
                                                  R-squared       =     0.1485
                                                  Adj R-squared   =     0.1484
Number of clusters (sector)  =          8         Within R-sq.    =     0.0866
Number of clusters (region)  =          8         Root MSE        =     0.8528

                           (Std. Err. adjusted for 8 clusters in sector region)
-------------------------------------------------------------------------------
              |               Robust
ln_labor_pr~y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
      immi_sh |  -.1050354   .0567003    -1.85   0.106    -.2391103    .0290395
      share_9 |   .2070373   .0226414     9.14   0.000     .1534988    .2605757
     share_12 |   .4663512    .053889     8.65   0.000      .338924    .5937783
    share_uni |   .8243444    .096452     8.55   0.000     .5962717    1.052417
      logsize |   .1792899   .0254396     7.05   0.000     .1191349     .239445
lavg_firm_age |   .0631174   .0113083     5.58   0.001     .0363775    .0898574
         lage |   .1139351   .0636955     1.79   0.117    -.0366809    .2645512
        _cons |   8.407244   .2552188    32.94   0.000     7.803747    9.010741
-------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
      sector |         8           8           0    *|
      region |         8           8           0    *|
        year |        10           1           9     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

Clustering by sector and region does not produce significant results.

Last edited by Paris Rira; 09 May 2024, 20:43.

Comment

George Ford

Join Date: Aug 2014

Posts: 3025
#17

10 May 2024, 08:53

I keep forgetting you can't run boottest after reghdfe. And you can't use xtreg, since your ID is firm. You'll have to use areg with absorb(sector) and include i.region and i.year as regressors.

Have you tried to collapse to sector?
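
A minimal sketch of the areg suggestion above, run on simulated data rather than the poster's dataset. The variable names mirror the thread, but the values, the seed, and the sample size are assumptions made only so the example runs on its own. boottest is a user-written command, so it is left commented out.

```stata
* Simulated stand-in data: 8 sectors, 8 regions, 10 years (all hypothetical)
clear
set seed 777
set obs 4000
generate byte sector = 1 + mod(_n, 8)
generate byte region = 1 + mod(floor(_n/8), 8)
generate int  year   = 2010 + mod(floor(_n/64), 10)
generate double immi_sh = runiform()
generate double y = 8 - 0.1*immi_sh + 0.05*sector + rnormal()

* Absorb sector; enter region and year as ordinary dummy regressors
areg y immi_sh i.region i.year, absorb(sector) vce(cluster sector)

* Wild cluster bootstrap on the key coefficient
* (requires: ssc install boottest; not run here)
* boottest immi_sh
```

With only 8 clusters the conventional cluster-robust standard errors are unreliable, which is the reason a wild bootstrap is being considered in the first place.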
Comment
Paris Rira

Join Date: Dec 2022

Posts: 383
#18

10 May 2024, 09:48

Originally posted by George Ford

I keep forgetting you can't boottest after reghdfe.

That's correct. I ran it after a plain regression, not after reghdfe.

Originally posted by George Ford

and you can't use xtreg since your ID is firm.

If I generate a panel ID with g id = _n in order to use xtreg, first, is that correct? Second, even if it is correct, the coefficient becomes insignificant (you know, the nightmare of all students).
Clustering by sector gives the same outcome: no significant result.

My case study is a small economy (Portugal), so I believe that dropping clustering and using only "reghdfe, a(region sector year) vce(robust)" would be sufficient.
Comment
George Ford

Join Date: Aug 2014

Posts: 3025
#19

10 May 2024, 12:11

Say you have firms that appear repeatedly, identified by a variable called firmname.

g id = _n is just a running count of your observations, so it does not identify firms. Use this instead:

egen id = group(firmname)
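
To make the difference concrete, here is a small sketch on a hypothetical toy panel (three firms, two years each). The names firmname, obs_id, and id are illustrative and not taken from the poster's data.

```stata
* Hypothetical toy panel: 3 firms observed for 2 years each
clear
input str1 firmname year
"A" 2010
"A" 2011
"B" 2010
"B" 2011
"C" 2010
"C" 2011
end

generate long obs_id = _n        // one value per row: not a firm identifier
egen long id = group(firmname)   // one value per firm, repeated across years

xtset id year                    // firm is now the panel unit
list, sepby(id)
```

Using obs_id as the panel variable would put every observation in its own group, so nothing would be tracked over time.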
1 like
Comment

Paris Rira

Join Date: Dec 2022
Posts: 383

#20

10 May 2024, 13:25

Prof. Ford, I ran xtreg without fe, though, because when I include id, it is omitted because of collinearity.

Code:

 egen id = group(NPC_FIC)
(1 missing value generated)

. 
end of do-file

. do "C:\Users\CeBER\AppData\Local\Temp\STD454c_000000.tmp"

. xtreg ln_labor_productivity immi_sh share_9 share_12 share_uni  logsize lavg_firm_age lage i.sector i. region i.year id,vce (robust)
(1 missing value generated)

Random-effects GLS regression                   Number of obs     =  1,514,590
Group variable: NPC_FIC                         Number of groups  =    288,156

R-sq:                                           Obs per group:
     within  = 0.0013                                         min =          1
     between = 0.1346                                         avg =        5.3
     overall = 0.1198                                         max =         10

                                                Wald chi2(31)     =   36073.73
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                           (Std. Err. adjusted for 288,156 clusters in NPC_FIC)
-------------------------------------------------------------------------------
              |               Robust
ln_labor_pr~y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
      immi_sh |  -.0650435   .0084966    -7.66   0.000    -.0816966   -.0483904
      share_9 |    .135753    .005852    23.20   0.000     .1242832    .1472228
     share_12 |   .2311707     .00658    35.13   0.000     .2182742    .2440673
    share_uni |   .3962222   .0079694    49.72   0.000     .3806024     .411842
      logsize |    .058803   .0015836    37.13   0.000     .0556993    .0619068
lavg_firm_age |   .0863299   .0022868    37.75   0.000     .0818478     .090812
         lage |   .0510312   .0077095     6.62   0.000     .0359209    .0661414
              |
       sector |
           6  |   .3647818   .0051526    70.80   0.000     .3546829    .3748807
           7  |   .0198841    .004658     4.27   0.000     .0107545    .0290137
           9  |  -.4893518   .0058703   -83.36   0.000    -.5008574   -.4778462
          10  |   .0453041   .0115703     3.92   0.000     .0226266    .0679815
          11  |   .1865925   .0111949    16.67   0.000     .1646508    .2085342
          12  |   .1199651   .0066864    17.94   0.000     .1068599    .1330703
          13  |   .0263971   .0087363     3.02   0.003     .0092743    .0435199
              |
       region |
           2  |   .0695653   .0040251    17.28   0.000     .0616763    .0774543
           3  |   .1442682   .0042174    34.21   0.000     .1360022    .1525342
           4  |   .0527479   .0076499     6.90   0.000     .0377544    .0677415
           5  |   .0438303   .0069713     6.29   0.000     .0301668    .0574938
           6  |   .0720837   .0123662     5.83   0.000     .0478465    .0963209
           7  |  -.0012828   .0119015    -0.11   0.914    -.0246093    .0220437
           8  |   .9237366   .4257328     2.17   0.030     .0893157    1.758157
              |
         year |
        2011  |   -.061416   .0018981   -32.36   0.000    -.0651363   -.0576957
        2012  |  -.1022061   .0022529   -45.37   0.000    -.1066217   -.0977905
        2013  |  -.0759926   .0024264   -31.32   0.000    -.0807482    -.071237
        2014  |  -.0881164    .002547   -34.60   0.000    -.0931085   -.0831244
        2015  |   -.081666    .002628   -31.08   0.000    -.0868168   -.0765153
        2016  |  -.0781794   .0027474   -28.46   0.000    -.0835643   -.0727945
        2017  |  -.0899256    .002883   -31.19   0.000    -.0955762   -.0842751
        2018  |  -.1217728   .0030893   -39.42   0.000    -.1278277    -.115718
        2019  |  -.1479554   .0033137   -44.65   0.000    -.1544501   -.1414607
              |
           id |   7.04e-08   7.14e-09     9.86   0.000     5.64e-08    8.44e-08
        _cons |    8.78296   .0319378   275.00   0.000     8.720363    8.845557
--------------+----------------------------------------------------------------
      sigma_u |  .77669793
      sigma_e |  .54773912
          rho |  .66785618   (fraction of variance due to u_i)
-------------------------------------------------------------------------------

. xtreg ln_labor_productivity immi_sh share_9 share_12 share_uni  logsize lavg_firm_age lage i.sector i. region i.year i.id,fe vce (rob
> ust)
maxvar too small
    You have attempted to use an interaction with too many levels or attempted to fit a model with too many variables.  You need to
    increase maxvar; it is currently 5000.  Use set maxvar; see help maxvar.

    If you are using factor variables and included an interaction that has lots of missing cells, try set emptycells drop to reduce
    the required matrix size; see help set emptycells.

    If you are using factor variables, you might have accidentally treated a continuous variable as a categorical, resulting in lots
    of categories.  Use the c. operator on such variables.
r(907);

end of do-file

r(907);

Also, has a PDF manual for ivreghdfe been published? I could not find one online, and I need to cite the command in my article.

Comment

George Ford

Join Date: Aug 2014

Posts: 3025
#21

10 May 2024, 15:45

You have id in the regression itself.

Try this:

egen id = group(firmname)
reghdfe ln_labor_productivity immi_sh share_9 share_12 share_uni logsize lavg_firm_age lage , absorb(id sector region year) vce(robust)

I suspect sector may wash out due to the inclusion of id, but it should estimate.
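
The "wash out" point can be illustrated with built-in commands on simulated data. This is only a sketch: it uses xtreg, fe rather than reghdfe so that nothing needs to be installed, and every variable and parameter value below is an assumption.

```stata
* Simulated panel: 200 hypothetical firms, 5 years, sector fixed within firm
clear
set seed 12345
set obs 200
generate long id = _n
generate byte sector = 1 + mod(id, 4)     // never changes within a firm
generate double a_i = rnormal()           // unobserved firm effect
expand 5
bysort id: generate int year = 2009 + _n
generate double x = 0.5*a_i + rnormal()   // regressor correlated with a_i
generate double y = 1 + 0.3*x + a_i + rnormal()

xtset id year
* Firm fixed effects absorb anything constant within firm,
* so Stata reports the sector dummies as omitted
xtreg y x i.sector i.year, fe vce(cluster id)
```

The coefficient on x is still estimated from within-firm variation; only the time-invariant sector dummies drop out.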
1 like
Comment
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2081
#22

10 May 2024, 16:51

Just because you want to include sector fixed effects does not mean you need to cluster by sector. As George said, 8 sectors is not enough to do much of anything. The key is: at what level is your key variable, immigrant share, or the instrumental variable, varying? Is immigrant share varying at the firm level? If so, then clustering at the firm level is probably sufficient. The real issue is whether you can get away with sector fixed effects or do you need firm FEs. You can do the Mundlak regression and test whether the time averages at the firm level are significant, using a cluster-robust test.
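
As a sketch of what "sector fixed effects, but clustering at the firm level" looks like in practice, the following uses simulated data. The variable names follow the thread; the data-generating values are assumptions for illustration only.

```stata
* Simulated panel: 300 hypothetical firms, 6 years, 8 sectors, 8 regions
clear
set seed 2024
set obs 300
generate long id = _n
generate byte sector = 1 + mod(id, 8)
generate byte region = 1 + mod(floor(id/8), 8)
generate double a_i = rnormal()
expand 6
bysort id: generate int year = 2009 + _n
generate double immi_sh = runiform()      // varies by firm and year
generate double y = 8 - 0.1*immi_sh + 0.1*sector + a_i + rnormal()

* Sector, region, and year enter as fixed effects;
* the standard errors are clustered by firm, not by sector
regress y immi_sh i.sector i.region i.year, vce(cluster id)
display "clusters used: " e(N_clust)
```

Here the number of clusters is the number of firms, so the few-clusters problem that arises with 8 sectors does not occur.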
1 like
Comment

Paris Rira

Join Date: Dec 2022
Posts: 383

#23

10 May 2024, 18:10

Originally posted by Jeff Wooldridge

Is immigrant share varying at the firm level?

Dear Prof., thank you for getting back to me. Yes, it varies at the firm level.
Whether I need firm FEs is a good question indeed. My assumption is that since firms do not move across sectors or regions over time, sector and region fixed effects will probably capture the effects (no need for firm FEs).

Code:

 mundlak  ln_labor_productivity immi_sh share_9 share_12 share_uni  logsize lavg_firm_age lage sector region year

The variable region does not vary sufficiently within groups and will not be used to create additional regressors.
0% of the total variance in region is within groups.

+------------------------------------------------+
|             Variable |     RE     |  Mundlak   |
|----------------------+------------+------------|
|              immi_sh |     -0.075 |     -0.006 |
|              share_9 |      0.130 |      0.070 |
|             share_12 |      0.245 |      0.078 |
|            share_uni |      0.465 |      0.086 |
|              logsize |      0.051 |     -0.066 |
|        lavg_firm_age |      0.070 |      0.094 |
|                 lage |      0.097 |      0.010 |
|               sector |     -0.010 |     -0.005 |
|               region |      0.010 |      0.013 |
|                 year |     -0.010 |     -0.012 |
|        mean__immi_sh |            |     -0.148 |
|        mean__share_9 |            |      0.148 |
|       mean__share_12 |            |      0.416 |
|      mean__share_uni |            |      0.858 |
|        mean__logsize |            |      0.252 |
|  mean__lavg_firm_age |            |      0.014 |
|           mean__lage |            |      0.283 |
|         mean__sector |            |     -0.015 |
|           mean__year |            |      0.052 |
|                _cons |     29.020 |    -72.032 |
|----------------------+------------+------------|
|                    N |    1514590 |    1514590 |
|                  N_g | 288156.000 | 288156.000 |
|                g_min |      1.000 |      1.000 |
|                g_avg |      5.256 |      5.256 |
|                g_max |     10.000 |     10.000 |
|                  rho |      0.689 |      0.689 |
|                 rmse |      0.551 |      0.546 |
|                 chi2 |  14815.116 |  42487.877 |
|                    p |      0.000 |      0.000 |
|                 df_m |     10.000 |     19.000 |
|                sigma |      0.984 |      0.984 |
|              sigma_u |      0.817 |      0.817 |
|              sigma_e |      0.549 |      0.549 |
|                 r2_w |      0.000 |      0.003 |
|                 r2_o |      0.081 |      0.111 |
|                 r2_b |      0.079 |      0.116 |
+------------------------------------------------+

Comment

Paris Rira

Join Date: Dec 2022

Posts: 383
#24

11 May 2024, 12:01

Originally posted by Jeff Wooldridge

You can do the Mundlak regression and test whether the time averages at the firm level are significant, using a cluster-robust test.

Prof., I don't understand this part.
What exactly should I do based on the Mundlak result above? Moreover, in theory, firm FEs should be in the model to control for unobserved heterogeneity at the firm level. I leave them out mainly because they make the result insignificant (I know that is not a sound reason). So I am trying to find a shortcut, either to defend my choice or to add firm FEs in a way that does not destroy the significance.
Comment
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2081
#25

12 May 2024, 04:34

That command is not very useful if it doesn’t provide standard errors. Also, you need i.sector, i.region, and i.year. After including all controls properly, you do a test on the averages. See my 2023 paper with Papke in Empirical Economics.
1 like
Comment
Paris Rira

Join Date: Dec 2022

Posts: 383
#26

12 May 2024, 07:40

Prof Jeff, I really like your paper. "When analyzing firm-level panel data, removing unobserved heterogeneity at a higher level of aggregation might suffice. In situations where firms are nested within sectors, addressing sector-level heterogeneity could adequately ensure the exogeneity of explanatory variables Papke and Wooldridge (2023)"

I believe that by citing this paper I could justify omitting firm fixed effects, since the aggregated level (sector) does the job.
Comment
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2081
#27

12 May 2024, 07:59

It is intended for these situations, but to push the idea that aggregated FEs are enough, you really should do the test. It's not so hard. See here:

https://www.dropbox.com/sh/g5okcahdj...mWg5joUba?dl=0
1 like
Comment
Paris Rira

Join Date: Dec 2022

Posts: 383
#28

12 May 2024, 08:19

My situation is like this: I have firm-level data categorized into 8 sectors and 8 regions. At the aggregated level, sector might absorb the heterogeneity; please correct me if I am wrong.

Thanks for sharing the files. What exactly should I do with my own data based on these files? Which one is supposed to run the test: meap94_98, simulation_20221011, or simulation_power_20221011? I am not a professional, Prof.
Comment

Paris Rira

Join Date: Dec 2022
Posts: 383

#29

12 May 2024, 13:35

Could you please assist me in concluding this thread? Based on this result, may I conclude that sector fixed effects are sufficient? Much appreciated.

My data analysis is at the firm level, assessing the impact of immigrants on labour productivity, but I would like to use only sector fixed effects. To this end, I ran a Mundlak regression (Mundlak, Y. (1978). On the pooling of time series and cross-section data. Econometrica, 46, 69-85). Here is the result.

Code:

bysort sector: egen mean__immi_sh = mean(immi_sh)
bysort sector: egen mean__share_9 = mean( share_9)
bysort sector: egen mean__share_12= mean(share_uni)
bysort sector: egen mean__share_uni = mean(logsize)
bysort sector: egen mean__logsize = mean(lavg_firm_age)
bysort sector: egen mean__lavg_firm_age= mean(lavg_firm_age)
bysort sector: egen mean__lage  = mean(lage)
bysort sector: egen  mean__year = mean(year)

xtset sector

qui xtreg ln_labor_productivity immi_sh share_9 share_12 share_uni  logsize lavg_firm_age lage i.year i.region mean__immi_sh mean__share_9  mean__share_12 mean__share_uni  mean__logsize    mean__lavg_firm_age mean__lage mean__year, vce(cluster sector)
estimates store mundlak
 test mean__immi_sh mean__share_9  mean__share_12 mean__share_uni  mean__logsize mean__lavg_firm_age mean__lage mean__year

 ( 1)  mean__immi_sh = 0
 ( 2)  mean__share_9 = 0
 ( 3)  mean__share_12 = 0
 ( 4)  mean__share_uni = 0
 ( 5)  mean__logsize = 0
 ( 6)  o.mean__lavg_firm_age = 0
 ( 7)  mean__lage = 0
 ( 8)  o.mean__year = 0
       Constraint 6 dropped
       Constraint 8 dropped

           chi2(  6) = 1713.65
         Prob > chi2 =    0.0000

Comment

Jeff Wooldridge

Join Date: Apr 2014

Posts: 2081
#30

12 May 2024, 15:03

You haven’t implemented it properly. You include the sector and region dummies, and probably interact the sector and region. The averages are computed by firm, not sector.
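
A minimal sketch of the setup described above, on simulated data: averages computed by firm, sector and region dummies interacted, year dummies included, and a cluster-robust joint test on the averages. The variable names follow the thread, but the data, the seed, and the reduced set of regressors are assumptions, and this is not the code from the linked files.

```stata
* Simulated panel: 400 hypothetical firms, 5 years each
clear
set seed 4242
set obs 400
generate long id = _n
generate byte sector = 1 + mod(id, 4)
generate byte region = 1 + mod(floor(id/4), 3)
generate double a_i = rnormal()                 // unobserved firm effect
expand 5
bysort id: generate int year = 2009 + _n
generate double immi_sh = 0.5*a_i + rnormal()   // correlated with a_i
generate double logsize = rnormal()
generate double y = 1 - 0.1*immi_sh + 0.2*logsize + a_i + rnormal()

* Time averages are taken within firm (id), not within sector
bysort id: egen double mean_immi_sh = mean(immi_sh)
bysort id: egen double mean_logsize = mean(logsize)

xtset id year
xtreg y immi_sh logsize mean_immi_sh mean_logsize ///
    i.sector##i.region i.year, re vce(cluster id)

* Cluster-robust joint test that the firm-level averages are zero
test mean_immi_sh mean_logsize
```

A rejection suggests that firm-level heterogeneity remains correlated with the regressors after controlling for sector and region, so the aggregated fixed effects alone would not be enough. In this simulation immi_sh is built to be correlated with the firm effect, so the test should reject.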
Comment
