Using OLS regression for panel data?

Oliver Brock

03 Feb 2023, 12:58

Hi all, I have a question about handling panel data for my regression.

I have accounting data for 2018-2021, and one of the things I would like to investigate is the impact of the COVID period (2020-2021) on my dependent variable. I have handled this with a dummy variable that equals 1 for 2020-2021 and 0 for 2018 and 2019.
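For readers following along, a dummy like this can be built directly from the year variable. A minimal sketch on a made-up four-row toy dataset (the toy data are an assumption for illustration; only the names year and covid come from the post):

```stata
* Toy data: one observation per year, 2018-2021 (hypothetical, for illustration)
clear
set obs 4
generate int year = 2017 + _n

* covid = 1 for 2020-2021 and 0 for 2018-2019; inrange() includes both endpoints
generate byte covid = inrange(year, 2020, 2021) if !missing(year)

list year covid, noobs
```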

Since this is a panel dataset (with about 4000 company-year observations), I thought of using a fixed-effects structure.

However, in addition to the Covid variable, I also have control variables that I am interested in. Problem here is that some of them are also time invariant variables (dummy variables).

1) Does it make sense in my case to use an ordinary (pooled) OLS regression without firm fixed effects?

2) And/or does it make sense to include the industry or year effects? I can't use xtset as far as I know, because I have multiple year values for each industry.

3) I found similar literature that uses year effects. But I don't really get this approach, since I believe that the covid variable (or any other dummy variable for a specific period of more than one year) would be omitted as it is either 0 or 1 for each company and each year? Maybe I have a problem with understanding time fixed effects in general?

So my ideas are:

[1] using -xtreg- with firm fixed effects

Code:

xtset cid year, yearly
xtreg x covid l_ta g_ta rev g_rev lev roa, fe robust


Fixed-effects (within) regression               Number of obs     =      3,859
Group variable: cid                             Number of groups  =      1,606

R-squared:                                      Obs per group:
     Within  = 0.0921                                         min =          1
     Between = 0.0695                                         avg =        2.4
     Overall = 0.1099                                         max =          4

                                                F(5,1605)         =          .
corr(u_i, Xb) = -0.0520                         Prob > F          =          .

                                (Std. err. adjusted for 1,606 clusters in cid)
------------------------------------------------------------------------------
             |               Robust
           x | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       covid |  -.0140561   .0024709    -5.69   0.000    -.0189026   -.0092095
        l_ta |   3.33e-10   1.22e-10     2.74   0.006     9.44e-11    5.72e-10
        g_ta |  -.0254269   .0113404    -2.24   0.025    -.0476703   -.0031834
         rev |   7.21e-11   2.45e-10     0.29   0.768    -4.08e-10    5.52e-10
       g_rev |  -.0003467   .0001308    -2.65   0.008    -.0006032   -.0000902
         lev |   .0101033   .0278399     0.36   0.717    -.0445031    .0647096
         roa |   .0928653   .0723525     1.28   0.199    -.0490501    .2347807
       _cons |  -.0639366   .0078016    -8.20   0.000    -.0792389   -.0486342
-------------+----------------------------------------------------------------
     sigma_u |   .0640711
     sigma_e |  .06053789
         rho |  .52833159   (fraction of variance due to u_i)
------------------------------------------------------------------------------

[2] using -regress- (with an additional control dummy named "bigfour"), optionally with year and industry dummies

Code:

reg x covid ta rev lev roa bigfour g_ta g_rev i.year i.twodigit_sic, vce(cluster cid)

note: 2021.year omitted because of collinearity.

Linear regression                               Number of obs     =      3,859
                                                F(59, 1605)       =          .
                                                Prob > F          =          .
                                                R-squared         =     0.2600
                                                Root MSE          =     .06753

                                (Std. err. adjusted for 1,606 clusters in cid)
------------------------------------------------------------------------------
             |               Robust
           x | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       covid |  -.0116073    .003618    -3.21   0.001    -.0187039   -.0045108
          ta |   1.76e-10   8.33e-11     2.12   0.035     1.28e-11    3.40e-10
         rev |   1.67e-10   1.69e-10     0.99   0.322    -1.64e-10    4.98e-10
         lev |   .0031512   .0017026     1.85   0.064    -.0001883    .0064908
         roa |   .1478102   .0477302     3.10   0.002     .0541902    .2414301
     bigfour |   .0019411   .0031699     0.61   0.540    -.0042765    .0081586
        g_ta |  -.0001111     .00278    -0.04   0.968    -.0055639    .0053417
       g_rev |  -.0001028   .0000502    -2.05   0.041    -.0002012   -4.38e-06
             |
        year |
       2019  |  -.0035314   .0031567    -1.12   0.263    -.0097232    .0026604
       2020  |  -.0045705   .0028864    -1.58   0.114     -.010232    .0010909
       2021  |          0  (omitted)
             |
twodigit_sic |
         11  |  -.0326513   .0180882    -1.81   0.071    -.0681302    .0028277
         13  |  -.0597535   .0186302    -3.21   0.001    -.0962957   -.0232113
         14  |   .0100855   .0110892     0.91   0.363    -.0116653    .0318363
         15  |  -.0331442   .0115617    -2.87   0.004    -.0558219   -.0104666
         16  |  -.0258045   .0121684    -2.12   0.034    -.0496722   -.0019368
         17  |  -.0228705   .0134615    -1.70   0.090    -.0492744    .0035335
         20  |   -.028753   .0110764    -2.60   0.010    -.0504786   -.0070273
         22  |  -.0432182   .0140865    -3.07   0.002    -.0708481   -.0155882
         23  |  -.0609575    .013594    -4.48   0.000    -.0876214   -.0342936
         24  |  -.0233472   .0120563    -1.94   0.053    -.0469949    .0003006
         25  |  -.0344124   .0196741    -1.75   0.080     -.073002    .0041773
         26  |  -.0139861   .0118215    -1.18   0.237    -.0371733    .0092012
         27  |   .0075537   .0133156     0.57   0.571     -.018564    .0336714
         28  |  -.0337714   .0106625    -3.17   0.002    -.0546853   -.0128575
         29  |  -.0253099    .013235    -1.91   0.056    -.0512695    .0006497
         30  |  -.0218135   .0116708    -1.87   0.062    -.0447051     .001078
         31  |  -.0117472   .0122857    -0.96   0.339    -.0358449    .0123506
         32  |  -.0118238    .011187    -1.06   0.291    -.0337664    .0101189
         33  |  -.0330857   .0119646    -2.77   0.006    -.0565535   -.0096179
         34  |  -.0292258     .01224    -2.39   0.017    -.0532339   -.0052177
         35  |  -.0372191   .0108387    -3.43   0.001    -.0584785   -.0159596
         36  |  -.0553792   .0110493    -5.01   0.000    -.0770519   -.0337066
         37  |  -.0354371   .0107114    -3.31   0.001    -.0564469   -.0144274
         38  |  -.0406668   .0121054    -3.36   0.001    -.0644109   -.0169228
         39  |   -.025075   .0124073    -2.02   0.043    -.0494113   -.0007388
         42  |  -.0200684   .0132655    -1.51   0.131     -.046088    .0059511
         43  |  -.0273522    .020539    -1.33   0.183    -.0676384    .0129339
         44  |  -.0285849   .0128329    -2.23   0.026    -.0537559    -.003414
         45  |  -.0292021   .0138622    -2.11   0.035    -.0563921   -.0020121
         47  |  -.0190299   .0143599    -1.33   0.185     -.047196    .0091362
         48  |  -.0329512   .0133756    -2.46   0.014    -.0591867   -.0067156
         49  |  -.0247978   .0103086    -2.41   0.016    -.0450175    -.004578
         50  |  -.0448464   .0138254    -3.24   0.001    -.0719641   -.0177287
         51  |  -.0376785   .0123326    -3.06   0.002    -.0618683   -.0134887
         52  |   .0005802   .0159843     0.04   0.971     -.030772    .0319325
         54  |  -.0198686   .0139814    -1.42   0.155    -.0472923    .0075551
         55  |  -.0271398   .0155502    -1.75   0.081    -.0576406    .0033611
         56  |  -.0279485    .013474    -2.07   0.038    -.0543769     -.00152
         57  |  -.0288976   .0130188    -2.22   0.027    -.0544333   -.0033619
         58  |  -.0227824   .0112934    -2.02   0.044    -.0449338   -.0006311
         59  |  -.0441197   .0130006    -3.39   0.001    -.0696195   -.0186198
         70  |  -.0170041   .0111651    -1.52   0.128    -.0389038    .0048956
         72  |   .0097644   .0127679     0.76   0.445    -.0152791    .0348079
         73  |  -.0596236   .0106738    -5.59   0.000    -.0805597   -.0386876
         75  |  -.0357424   .0177867    -2.01   0.045    -.0706301   -.0008547
         78  |   -.121909   .0227795    -5.35   0.000    -.1665897   -.0772284
         79  |  -.0746395   .0178088    -4.19   0.000    -.1095704   -.0397086
         80  |  -.0278888   .0120545    -2.31   0.021    -.0515331   -.0042445
         83  |   .0039482   .0153572     0.26   0.797    -.0261741    .0340706
         87  |  -.0462577   .0134836    -3.43   0.001    -.0727051   -.0198103
         89  |  -.0202485   .0157858    -1.28   0.200    -.0512115    .0107145
             |
       _cons |  -.0213707   .0099822    -2.14   0.032    -.0409502   -.0017912

Maybe there is somebody who can help me out. Thanks in advance!

Oliver

Last edited by Oliver Brock; 03 Feb 2023, 13:00.


Clyde Schechter

#2

03 Feb 2023, 13:11

Since this is a panel dataset (with about 4000 company-year observations), I thought of using a fixed-effects structure.

However, in addition to the Covid variable, I also have control variables that I am interested in. Problem here is that some of them are also time invariant variables (dummy variables).

Well, those two goals are in conflict and you cannot do both in the same analysis. As a compromise, consider using -xthybrid-, available from SSC.
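The idea behind -xthybrid- (the "hybrid" or within-between model) can be sketched by hand, which may help show why time-invariant controls survive: each time-varying regressor is split into its panel mean and the deviation from that mean, and the model is fit with random effects. This is a minimal sketch, not -xthybrid- itself. It uses Stata's nlswork example data as a stand-in for the firm-year panel, with idcode playing the role of cid and race as the time-invariant control; those stand-ins and the b_* and w_* variable names are assumptions for illustration.

```stata
* nlswork stands in for a firm-year panel: idcode plays the role of cid
webuse nlswork, clear
xtset idcode year

local xvars age ttl_exp tenure

* Restrict to complete cases so the panel means match the estimation sample
mark touse
markout touse ln_wage `xvars' race

local within_vars
local between_vars
foreach v of local xvars {
    * Between part: the panel-level mean of each time-varying regressor
    bysort idcode (year): egen double b_`v' = mean(`v') if touse
    * Within part: the deviation from that panel mean
    generate double w_`v' = `v' - b_`v'
    local within_vars `within_vars' w_`v'
    local between_vars `between_vars' b_`v'
}

* Random-effects fit: the w_* coefficients equal the -xtreg, fe- slopes,
* and the time-invariant race indicators are still estimated
xtreg ln_wage `within_vars' `between_vars' i.race if touse, re vce(cluster idcode)
```

Because each w_* variable sums to zero within every panel, its coefficient is the same as the fixed-effects estimate, while the between part carries the time-invariant information.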

And/or does it make sense to include the industry or year effects? I can't use xtset as far as I know, because I have multiple year values for each industry.

There is no model in which you can use both industry and company fixed effects. Pick one. If you pick industry-level fixed effects, you cannot -xtset industry year- because of the repeated values. But you can still -xtset industry-, and with that you will still be able to do any analysis that is meaningful for this kind of data with the -xt- commands. All you lose is the ability to use autoregressive structures or leads and lags, which are undefinable in the presence of repeated time values within panels.
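A minimal sketch of what this looks like in practice, using Stata's nlswork example data, where ind_code (an industry code) stands in for the two-digit SIC industry; that mapping is an assumption for illustration:

```stata
webuse nlswork, clear
* Keep observations with a known industry
drop if missing(ind_code)

* Many observations share each industry-year, so this fails with
* "repeated time values within panel"
capture noisily xtset ind_code year
display "return code from xtset ind_code year: " _rc

* Declaring only the panel variable works
xtset ind_code

* Industry fixed effects estimated with the -xt- machinery
xtreg ln_wage age ttl_exp tenure, fe
```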

As for year effects, you cannot use year effects and the covid variable. Such a model is unidentifiable and, if you get results at all, they will be wrong and meaningless.

I found similar literature that uses year effects. But I don't really get this approach, since I believe that the covid variable (or any other dummy variable for a specific period of more than one year) would be omitted as it is either 0 or 1 for each company and each year? Maybe I have a problem with understanding time fixed effects in general?

No, you understand them perfectly. Perhaps you do not understand the literature you have found. But if somebody claims to have done an analysis using both year fixed effects and a variable which defines a subset of the time periods, then they do not understand time fixed-effects. Either they are not describing their analysis properly, or they are reporting garbage results. There is no way around this: it's linear algebra.
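To make the linear algebra concrete with the years in this thread (notation introduced here for illustration): take 2018 as the base year, let D^{2019}, D^{2020}, D^{2021} be the year indicators, β_c the covid coefficient and γ_s the year effects. The covid dummy is then an exact sum of two year indicators, and moving any amount k between β_c and those two year effects leaves every fitted value unchanged:

```latex
\[
\begin{aligned}
\mathrm{covid}_{it} &= D^{2020}_{it} + D^{2021}_{it} \qquad \text{for every firm } i \text{ and year } t,\\[4pt]
\beta_c\,\mathrm{covid}_{it} + \sum_{s=2019}^{2021} \gamma_s D^{s}_{it}
  &= (\beta_c + k)\,\mathrm{covid}_{it} + \gamma_{2019} D^{2019}_{it}
   + (\gamma_{2020} - k)\,D^{2020}_{it} + (\gamma_{2021} - k)\,D^{2021}_{it}
   \qquad \text{for any } k.
\end{aligned}
\]
```

Because the data cannot tell these parameter sets apart, β_c is not identified. Dropping one more indicator merely picks a value of k: when -regress- drops 2021.year, as in the first post of this thread, k equals γ_2021, so the reported covid coefficient is β_c + γ_2021 (the fitted 2021-versus-2018 difference) and the reported 2020 coefficient is γ_2020 - γ_2021 (the fitted 2020-versus-2021 difference). Neither isolates a COVID effect.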
Oliver Brock

#3

17 Feb 2023, 10:51

First of all, thanks for your reply; it was really helpful! I have some trouble understanding your suggestions, so maybe you can help me out:

Originally posted by Clyde Schechter:

As for year effects, you cannot use year effects and the covid variable. Such a model is unidentifiable and, if you get results at all, they will be wrong and meaningless.

But doesn't that mean that I'd also not be able to use "xtset cid year, yearly" as I described above? Or does this code only say that I want to use firm fixed effects, and not firm + year fixed effects?

Edit: I just saw another thread where you already explained that using xtset with year, yearly does not automatically mean using time-fixed effects (https://www.statalist.org/forums/for...-fixed-effects)
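A quick way to confirm this, sketched with Stata's nlswork example data as a stand-in for the firm-year panel (idcode plays the role of cid): -xtset- only declares the panel and time variables, so year effects appear in -xtreg, fe- only when i.year is added to the varlist.

```stata
webuse nlswork, clear
* Declares the panel (idcode) and time (year) variables; adds no regressors
xtset idcode year

* One-way (panel) fixed effects: no year terms appear in the output
xtreg ln_wage age ttl_exp tenure, fe
scalar df_oneway = e(df_m)

* Two-way fixed effects: year effects must be requested explicitly
xtreg ln_wage age ttl_exp tenure i.year, fe
scalar df_twoway = e(df_m)

display "model df without i.year: " df_oneway "   with i.year: " df_twoway
```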

Originally posted by Clyde Schechter:

No, you understand them perfectly. Perhaps you do not understand the literature you have found. But if somebody claims to have done an analysis using both year fixed effects and a variable which defines a subset of the time periods, then they do not understand time fixed-effects. Either they are not describing their analysis properly, or they are reporting garbage results. There is no way around this: it's linear algebra.

Maybe you could take a brief look at the following paper: DOI: 10.14505/jaes.v15.2(68).04
Their approach is comparable to what I'd like to do, although they study an oil crisis in GCC countries.
As you can see in Table 2 (page 301), the variable "Oil_Cri" is defined as 1 if the year falls in the oil crisis (2014-2016). And in their regression results (Table 6, p. 304, or Table 7, p. 305), they say they used fixed-effects panel models with year fixed effects (year = yes in the table). So, according to your first answer, this approach shouldn't be valid? Or have I misunderstood something in their analysis?

Thanks in advance!

Oliver

Last edited by Oliver Brock; 17 Feb 2023, 11:08.

Carlo Lazzaro


17 Feb 2023, 11:11

Oliver:
In Table 7 of the paper you mention (BTW: it's a good habit to report the full reference, as a DOI may be broken), the authors do not report any omission of a -year- indicator, although one surely happened, as you can see in the following toy example:

Code:

. use "https://www.stata-press.com/data/r17/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. g wanted=0 if year<=78

. replace wanted=1 if year>78

. xtreg ln_wage c.age##c.age i.wanted i.year, fe vce(cluster idcode)
note: 88.year omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-squared:                                      Obs per group:
     Within  = 0.1162                                         min =          1
     Between = 0.1078                                         avg =        6.1
     Overall = 0.0932                                         max =         15

                                                F(16,4709)        =      79.11
corr(u_i, Xb) = 0.0613                          Prob > F          =     0.0000

                             (Std. err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0728746    .013687     5.32   0.000     .0460416    .0997075
             |
 c.age#c.age |  -.0010113   .0001076    -9.40   0.000    -.0012224   -.0008003
             |
    1.wanted |   .1904977   .2486083     0.77   0.444    -.2968909    .6778863
             |
        year |
         69  |   .0647054   .0155249     4.17   0.000     .0342693    .0951415
         70  |   .0284423   .0264639     1.07   0.283    -.0234395     .080324
         71  |   .0579959   .0384111     1.51   0.131    -.0173078    .1332996
         72  |   .0510671   .0502675     1.02   0.310    -.0474808     .149615
         73  |   .0424104   .0624924     0.68   0.497    -.0801038    .1649247
         75  |   .0151376    .086228     0.18   0.861    -.1539096    .1841848
         77  |   .0340933   .1106841     0.31   0.758    -.1828994     .251086
         78  |   .0537334   .1232232     0.44   0.663    -.1878417    .2953084
         80  |  -.1535502   .1028773    -1.49   0.136    -.3552378    .0481374
         82  |   -.151329   .0787025    -1.92   0.055    -.3056227    .0029647
         83  |  -.1317317    .066753    -1.97   0.049    -.2625987   -.0008646
         85  |  -.0862219    .042357    -2.04   0.042    -.1692615   -.0031824
         87  |  -.0662706   .0187788    -3.53   0.000    -.1030857   -.0294554
         88  |          0  (omitted)
             |
       _cons |   .3937532   .2469015     1.59   0.111    -.0902893    .8777957
-------------+----------------------------------------------------------------
     sigma_u |  .40275174
     sigma_e |  .30127563
         rho |  .64120306   (fraction of variance due to u_i)
------------------------------------------------------------------------------


.
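The arbitrariness can be checked directly with the same data. The 1.wanted coefficient above (.1904977) is exactly the 88.year coefficient obtained when -wanted- is left out (see Carlo's later output in this thread), and the 1980-1987 year coefficients above are the corresponding year effects shifted down by that same amount. A minimal sketch that stores both numbers and compares them (the scalar names are illustrative):

```stata
webuse nlswork, clear
xtset idcode year

* wanted = 1 from 1979 onward: a dummy for a subset of the years
generate byte wanted = year > 78 if !missing(year)

* Year effects plus the period dummy: one year indicator gets dropped
xtreg ln_wage c.age##c.age i.wanted i.year, fe vce(cluster idcode)
scalar b_wanted = _b[1.wanted]

* Year effects alone
xtreg ln_wage c.age##c.age i.year, fe vce(cluster idcode)
scalar b_year88 = _b[88.year]

* The "period effect" is just the 1988 year effect under another name
display "1.wanted: " b_wanted "   88.year: " b_year88
```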

Kind regards,
Carlo
(Stata 19.0)


Oliver Brock

#5

17 Feb 2023, 11:44

Hi Carlo, thanks for your answer (and thanks for your note regarding the references).

As I mentioned in my opening post, I used the following code (see results above):

Code:

reg abs_DACC_Jones_1991 covid ta rev lev roa bigfour g_ta g_rev i.year i.twodigit_sic, vce(cluster cid)

Comparing it to your example, it seems that my dummy variable of interest (covid: 0 for 2018-2019, 1 for 2020-2021) is also not omitted.

But why is that? As I use year and industry fixed effects (i.year and i.twodigit_sic), I do not understand why the covid variable is not omitted, as it does not vary across industries within a year (it's always 0 in non-COVID years, no matter which industry).
Carlo Lazzaro

#6

17 Feb 2023, 11:57

Oliver:
year=2021 was omitted because of collinearity, in addition to year=2018, which was omitted as the base level to protect your regression from the dummy-variable trap.

Kind regards,
Carlo
(Stata 19.0)
Clyde Schechter

#7

17 Feb 2023, 14:41

I took a brief look at the paper. As best I can tell, their analysis attempted to use both the oil crisis variable (representing years 2014-2016) and year fixed effects. In the actual results in Table 7 they do not really show how they represented year. If they omitted years 2014-2016, along with a base year, from the "year fixed effects", then this would be OK, but then that is not really using year fixed effects, and they should explain that they did it and exactly how. Certainly the regression equation they present suggests that no care was taken along these lines. So I suspect they did it wrong. But they do not provide enough information for me to be sure.

So I can't interpret the paper for you. If it is critical that you know, then you should contact one of its authors. Suffice it to say, it is mathematically impossible to use year fixed effects and the oil crisis variable together. The model would be unidentifiable and any results obtained by omitting some arbitrary variable(s) would be invalid. Lacking an adequate description of what they did, I would not wager a lot of money on this, but if I had to make a (small) bet I would bet that this paper is just presenting nonsense.

Added: I misspoke above. Where I said "If they omitted years 2014-2016, along with a base year, from the "year fixed effects", then this would be OK..." that is not correct. You simply cannot have year fixed effects and another variable that indicates some subset of the years in an identifiable model.

Last edited by Clyde Schechter; 17 Feb 2023, 15:36.
Oliver Brock

#8

19 Feb 2023, 08:41

Carlo and Clyde, thanks for your helpful answers.

I took some time to think about the approach I'd like to use. As I am also interested in one (or more) dummy variables, I believe it would be easiest to go with OLS regression using clustered standard errors. (I saw another thread, https://www.statalist.org/forums/for...ith-panel-data, where Carlo mentioned that OLS rarely outperforms FE with panel data, but maybe my study is such a case.)

I also ran the FE:
xtreg y covid x2 x3, fe robust

and the OLS (cid is the corporate ID):
reg y covid x2 x3, vce(cluster cid)

and my R2 for -reg- is higher (0.09) than for the FE approach (R2 within: 0.037; R2 between: 0.041). Note that in my field of research, low R2 values (below 0.1) are not unusual.
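One way to see that these numbers measure different things: the within R-squared reported by -xtreg, fe- is the R-squared of an OLS regression on data from which each panel's mean has been subtracted, so it describes only within-firm variation, whereas the pooled -regress- R-squared is computed on the total variation, including differences between firms. A minimal sketch with Stata's nlswork example data standing in for y, covid, x2 and x3 (the mean_ and dm_ variable names are illustrative):

```stata
webuse nlswork, clear
xtset idcode year

* FE fit; e(r2_w), e(r2_b) and e(r2_o) hold the within, between and overall R-squared
xtreg ln_wage age ttl_exp tenure, fe
scalar r2_within = e(r2_w)
generate byte insample = e(sample)

* Subtract each panel's mean, computed on the estimation sample only
foreach v of varlist ln_wage age ttl_exp tenure {
    bysort idcode (year): egen double mean_`v' = mean(`v') if insample
    generate double dm_`v' = `v' - mean_`v'
}

* OLS on the demeaned data reproduces the FE slopes and the within R-squared
regress dm_ln_wage dm_age dm_ttl_exp dm_tenure
scalar r2_demeaned = e(r2)
display "within R2 (xtreg): " r2_within "   R2 of demeaned OLS: " r2_demeaned

* Pooled OLS on the raw data, for comparison: a different quantity
regress ln_wage age ttl_exp tenure if insample
display "pooled OLS R2: " e(r2)
```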

However, I found another paper which focuses on similar research, and I'm still confused about two things that we already discussed above. On p. 5, the authors say that "We also control for annual fixed effects (Year) and industry fixed effects (Industry)". And in their Table 2, they use the covid variable as a dummy, as well as 5 control variables that are dummies.

(1) How were they able to use year and industry fixed effects with a dummy for a subset of periods?
(2) For example, the control variable "Big" will be nearly time-invariant, as only a few companies in each industry will change from a big to a small auditor. So how can they obtain non-omitted estimates for these dummies?

Reference for the paper:
Huanmin Yan, Zhenyu Liu, Haoyu Wang, Xuehua Zhang, Xilei Zheng, How does the COVID-19 affect earnings management: Empirical evidence from China, Research in International Business and Finance, Volume 63. https://doi.org/10.1016/j.ribaf.2022.101772.

Thanks in advance.

Oliver
Clyde Schechter

#9

19 Feb 2023, 10:49

(1) How were they able to use year and industry fixed effects with a dummy for a subset of periods?

Again, I have not carefully read this article, and perhaps I have missed something. But my response to your question is: they weren't. It appears that they have overlooked the invalidity of using time fixed effects and a variable that designates a subset of the time periods in the same regression. Their results with regard to the Covid variable are not valid, not even meaningful. Note that in their report of the regression coefficients (Table 4) all they say about year is "yes." They provide no detail about how they specified year in their regression, they do not show code nor give any explanation of that. Generally, then, one would assume they did the usual inclusion of single-year indicators for all but a single reference category. In order to get any results at all, their software must have eliminated another year indicator (or the covid indicator--but they clearly would have noticed that.) They probably either didn't notice the missing year indicator, or noticed it but failed to understand its implications. Either way, the results are simply invalid.

For example, the control variable "Big" will be nearly time-invariant, as only a few companies in each industry will change from a big to a small auditor. So how can they obtain non-omitted estimates for these dummies?

Nearly time-invariant is not the same thing as fully time-invariant. The consequence of using a nearly time-invariant variable is that its effect is likely to be estimated with poor precision (high standard error, wide confidence interval). But the estimate is still unbiased. It's a bit like the joke about the statisticians who went deer hunting. One shot and missed by 50 yards to the left, and the other shot and missed by 50 yards to the right, but on average they got the deer. That kind of hunting is what you do when you use variables that are nearly collinear. And that problem can, at least in theory, be overcome by using a sufficiently large sample size. (Whether it is feasible in the real world to get a sufficiently large sample size for this purpose is a different question.) But nearly collinear variables do not present the unresolvable difficulty presented by fully collinear variables. With fully collinear variables, something must be omitted to identify the model, and the estimates obtained for the remaining variables involved in the exact collinearity are no longer meaningful.
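Whether a dummy like "Big" is fully or only nearly time-invariant can be checked before estimating anything: -xtsum- splits a variable's standard deviation into between and within components, and a within standard deviation of zero means the variable never changes within a panel, so a fixed-effects model cannot estimate it. A minimal sketch with Stata's nlswork example data, where race is time-invariant and south (living in the South) is a stand-in for a variable that changes for only some panels; these stand-ins and the helper variable names are assumptions for illustration:

```stata
webuse nlswork, clear
xtset idcode year

* Overall, between and within standard deviations;
* a within SD of 0 means the variable never changes within a panel
xtsum race south

* Count panels in which south takes more than one value
* (egen's min() and max() ignore missing values)
bysort idcode: egen byte south_min = min(south)
bysort idcode: egen byte south_max = max(south)
egen byte first_obs = tag(idcode)
count if first_obs & south_min != south_max
display "panels in which south changes: " r(N)
count if first_obs
display "total panels: " r(N)
```

Only the panels counted as changing contribute to a fixed-effects estimate for such a variable, which is why its standard error tends to be large.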

Last edited by Clyde Schechter; 19 Feb 2023, 10:54.
Oliver Brock

#10

19 Feb 2023, 14:49

Thank you for your feedback again. It seems I should read papers in general a little more critically, because so far I have uncritically assumed that other authors' procedures are correct. I reread the paper and also can't see any reason or explanation for the approach the authors used. Again, thank you very much! Your and Carlo's replies were very helpful.

Carlo Lazzaro


#11

20 Feb 2023, 01:02

Oliver:
as an aside to Clyde's excellent guidance, I'd add:
1) the within R-squared from -xtreg, fe- and the R-squared from -regress- are calculated differently; therefore, a straight comparison between the two might be misleading;
2) while a low within R-squared can be totally reliable, I'd check the functional form of your regressand:

Code:

. use "https://www.stata-press.com/data/r17/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. xtreg ln_wage c.age##c.age i.year, fe vce(cluster idcode)

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-squared:                                      Obs per group:
     Within  = 0.1162                                         min =          1
     Between = 0.1078                                         avg =        6.1
     Overall = 0.0932                                         max =         15

                                                F(16,4709)        =      79.11
corr(u_i, Xb) = 0.0613                          Prob > F          =     0.0000

                             (Std. err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0728746    .013687     5.32   0.000     .0460416    .0997075
             |
 c.age#c.age |  -.0010113   .0001076    -9.40   0.000    -.0012224   -.0008003
             |
        year |
         69  |   .0647054   .0155249     4.17   0.000     .0342693    .0951415
         70  |   .0284423   .0264639     1.07   0.283    -.0234395     .080324
         71  |   .0579959   .0384111     1.51   0.131    -.0173078    .1332996
         72  |   .0510671   .0502675     1.02   0.310    -.0474808     .149615
         73  |   .0424104   .0624924     0.68   0.497    -.0801038    .1649247
         75  |   .0151376    .086228     0.18   0.861    -.1539096    .1841848
         77  |   .0340933   .1106841     0.31   0.758    -.1828994     .251086
         78  |   .0537334   .1232232     0.44   0.663    -.1878417    .2953084
         80  |   .0369475   .1473725     0.25   0.802    -.2519716    .3258667
         82  |   .0391687   .1715621     0.23   0.819    -.2971733    .3755108
         83  |    .058766   .1836086     0.32   0.749    -.3011928    .4187249
         85  |   .1042758   .2080199     0.50   0.616    -.3035406    .5120922
         87  |   .1242272   .2327328     0.53   0.594    -.3320379    .5804922
         88  |   .1904977   .2486083     0.77   0.444    -.2968909    .6778863
             |
       _cons |   .3937532   .2469015     1.59   0.111    -.0902893    .8777957
-------------+----------------------------------------------------------------
     sigma_u |  .40275174
     sigma_e |  .30127563
         rho |  .64120306   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. predict fitted, xb
(24 missing values generated)

. g sq_fitted=fitted^2
(24 missing values generated)

. xtreg ln_wage fitted sq_fitted , fe vce(cluster idcode)

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-squared:                                      Obs per group:
     Within  = 0.1164                                         min =          1
     Between = 0.1094                                         avg =        6.1
     Overall = 0.0941                                         max =         15

                                                F(2,4709)         =     586.29
corr(u_i, Xb) = 0.0619                          Prob > F          =     0.0000

                             (Std. err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      fitted |   2.012332   .5365254     3.75   0.000     .9604909    3.064172
   sq_fitted |  -.3040363   .1616996    -1.88   0.060    -.6210431    .0129706
       _cons |  -.8379964    .443929    -1.89   0.059    -1.708305    .0323122
-------------+----------------------------------------------------------------
     sigma_u |  .40239556
     sigma_e |  .30114591
         rho |  .64099409   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. test sq_fitted

 ( 1)  sq_fitted = 0

       F(  1,  4709) =    3.54
            Prob > F =    0.0601

.

As the result of -test- does not reach statistical significance at the conventional 5% level (Prob > F = 0.0601), there's no statistically significant evidence of model misspecification.
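For the pooled OLS route discussed earlier in the thread, Stata's built-in -linktest- runs the same kind of check after -regress-: it refits the outcome on the fitted values (_hat) and their square (_hatsq), and options given to -linktest- are passed on to the estimation command. A minimal sketch with the same nlswork data; clustering on idcode is an assumption that mirrors the vce(cluster cid) used above:

```stata
webuse nlswork, clear

* Pooled OLS with standard errors clustered on the panel identifier
regress ln_wage c.age##c.age i.year, vce(cluster idcode)

* Link test: refit ln_wage on _hat and _hatsq;
* a significant _hatsq suggests a misspecified functional form
linktest, vce(cluster idcode)
test _hatsq
```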

Kind regards,
Carlo
(Stata 19.0)
