  • Using OLS regression for panel data?

    Hi all, I have a question about handling panel data for my regression.

    I have accounting data for 2018-2021, and one of the things I would like to investigate is the impact of the Covid period (2020-2021) on my dependent variable. I have coded this as a dummy variable that equals 1 for 2020 and 2021 and 0 for 2018 and 2019.

    Since this is a panel dataset (with about 4000 company-year observations), I thought of using a fixed-effects structure.

    However, in addition to the Covid variable, I also have control variables that I am interested in. Problem here is that some of them are also time invariant variables (dummy variables).

    1) Does it make sense in my case to use a normal OLS regression without firm fixed effects?

    2) And/or does it make sense to include the industry or year effects? I can't use xtset as far as I know, because I have multiple year values for each industry.

    3) I found similar literature, they use year effects. But I don't really get this approach, since I believe that the covid variable (or any other dummy variable for a specific period of more than one year) would be omitted, as it is either 0 or 1 for each company and each year? Maybe I have a problem with understanding time fixed effects in general?



    So my ideas are:

    [1] using xtreg firm fixed effects
    Code:
    xtset cid year, yearly
    xtreg x covid l_ta g_ta rev g_rev lev roa, fe robust
    
    
    Fixed-effects (within) regression               Number of obs     =      3,859
    Group variable: cid                             Number of groups  =      1,606
    
    R-squared:                                      Obs per group:
         Within  = 0.0921                                         min =          1
         Between = 0.0695                                         avg =        2.4
         Overall = 0.1099                                         max =          4
    
                                                    F(5,1605)         =          .
    corr(u_i, Xb) = -0.0520                         Prob > F          =          .
    
                                    (Std. err. adjusted for 1,606 clusters in cid)
    ------------------------------------------------------------------------------
                 |               Robust
               x | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           covid |  -.0140561   .0024709    -5.69   0.000    -.0189026   -.0092095
            l_ta |   3.33e-10   1.22e-10     2.74   0.006     9.44e-11    5.72e-10
            g_ta |  -.0254269   .0113404    -2.24   0.025    -.0476703   -.0031834
             rev |   7.21e-11   2.45e-10     0.29   0.768    -4.08e-10    5.52e-10
           g_rev |  -.0003467   .0001308    -2.65   0.008    -.0006032   -.0000902
             lev |   .0101033   .0278399     0.36   0.717    -.0445031    .0647096
             roa |   .0928653   .0723525     1.28   0.199    -.0490501    .2347807
           _cons |  -.0639366   .0078016    -8.20   0.000    -.0792389   -.0486342
    -------------+----------------------------------------------------------------
         sigma_u |   .0640711
         sigma_e |  .06053789
             rho |  .52833159   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    [2] using reg (with another control DUMMY named "bigfour"), optionally with year and industry dummies

    Code:
    reg x covid ta rev lev roa bigfour g_ta g_rev i.year i.twodigit_sic, vce(cluster cid)
    
    note: 2021.year omitted because of collinearity.
    
    Linear regression                               Number of obs     =      3,859
                                                    F(59, 1605)       =          .
                                                    Prob > F          =          .
                                                    R-squared         =     0.2600
                                                    Root MSE          =     .06753
    
                                    (Std. err. adjusted for 1,606 clusters in cid)
    ------------------------------------------------------------------------------
                 |               Robust
               x | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           covid |  -.0116073    .003618    -3.21   0.001    -.0187039   -.0045108
              ta |   1.76e-10   8.33e-11     2.12   0.035     1.28e-11    3.40e-10
             rev |   1.67e-10   1.69e-10     0.99   0.322    -1.64e-10    4.98e-10
             lev |   .0031512   .0017026     1.85   0.064    -.0001883    .0064908
             roa |   .1478102   .0477302     3.10   0.002     .0541902    .2414301
         bigfour |   .0019411   .0031699     0.61   0.540    -.0042765    .0081586
            g_ta |  -.0001111     .00278    -0.04   0.968    -.0055639    .0053417
           g_rev |  -.0001028   .0000502    -2.05   0.041    -.0002012   -4.38e-06
                 |
            year |
           2019  |  -.0035314   .0031567    -1.12   0.263    -.0097232    .0026604
           2020  |  -.0045705   .0028864    -1.58   0.114     -.010232    .0010909
           2021  |          0  (omitted)
                 |
    twodigit_sic |
             11  |  -.0326513   .0180882    -1.81   0.071    -.0681302    .0028277
             13  |  -.0597535   .0186302    -3.21   0.001    -.0962957   -.0232113
             14  |   .0100855   .0110892     0.91   0.363    -.0116653    .0318363
             15  |  -.0331442   .0115617    -2.87   0.004    -.0558219   -.0104666
             16  |  -.0258045   .0121684    -2.12   0.034    -.0496722   -.0019368
             17  |  -.0228705   .0134615    -1.70   0.090    -.0492744    .0035335
             20  |   -.028753   .0110764    -2.60   0.010    -.0504786   -.0070273
             22  |  -.0432182   .0140865    -3.07   0.002    -.0708481   -.0155882
             23  |  -.0609575    .013594    -4.48   0.000    -.0876214   -.0342936
             24  |  -.0233472   .0120563    -1.94   0.053    -.0469949    .0003006
             25  |  -.0344124   .0196741    -1.75   0.080     -.073002    .0041773
             26  |  -.0139861   .0118215    -1.18   0.237    -.0371733    .0092012
             27  |   .0075537   .0133156     0.57   0.571     -.018564    .0336714
             28  |  -.0337714   .0106625    -3.17   0.002    -.0546853   -.0128575
             29  |  -.0253099    .013235    -1.91   0.056    -.0512695    .0006497
             30  |  -.0218135   .0116708    -1.87   0.062    -.0447051     .001078
             31  |  -.0117472   .0122857    -0.96   0.339    -.0358449    .0123506
             32  |  -.0118238    .011187    -1.06   0.291    -.0337664    .0101189
             33  |  -.0330857   .0119646    -2.77   0.006    -.0565535   -.0096179
             34  |  -.0292258     .01224    -2.39   0.017    -.0532339   -.0052177
             35  |  -.0372191   .0108387    -3.43   0.001    -.0584785   -.0159596
             36  |  -.0553792   .0110493    -5.01   0.000    -.0770519   -.0337066
             37  |  -.0354371   .0107114    -3.31   0.001    -.0564469   -.0144274
             38  |  -.0406668   .0121054    -3.36   0.001    -.0644109   -.0169228
             39  |   -.025075   .0124073    -2.02   0.043    -.0494113   -.0007388
             42  |  -.0200684   .0132655    -1.51   0.131     -.046088    .0059511
             43  |  -.0273522    .020539    -1.33   0.183    -.0676384    .0129339
             44  |  -.0285849   .0128329    -2.23   0.026    -.0537559    -.003414
             45  |  -.0292021   .0138622    -2.11   0.035    -.0563921   -.0020121
             47  |  -.0190299   .0143599    -1.33   0.185     -.047196    .0091362
             48  |  -.0329512   .0133756    -2.46   0.014    -.0591867   -.0067156
             49  |  -.0247978   .0103086    -2.41   0.016    -.0450175    -.004578
             50  |  -.0448464   .0138254    -3.24   0.001    -.0719641   -.0177287
             51  |  -.0376785   .0123326    -3.06   0.002    -.0618683   -.0134887
             52  |   .0005802   .0159843     0.04   0.971     -.030772    .0319325
             54  |  -.0198686   .0139814    -1.42   0.155    -.0472923    .0075551
             55  |  -.0271398   .0155502    -1.75   0.081    -.0576406    .0033611
             56  |  -.0279485    .013474    -2.07   0.038    -.0543769     -.00152
             57  |  -.0288976   .0130188    -2.22   0.027    -.0544333   -.0033619
             58  |  -.0227824   .0112934    -2.02   0.044    -.0449338   -.0006311
             59  |  -.0441197   .0130006    -3.39   0.001    -.0696195   -.0186198
             70  |  -.0170041   .0111651    -1.52   0.128    -.0389038    .0048956
             72  |   .0097644   .0127679     0.76   0.445    -.0152791    .0348079
             73  |  -.0596236   .0106738    -5.59   0.000    -.0805597   -.0386876
             75  |  -.0357424   .0177867    -2.01   0.045    -.0706301   -.0008547
             78  |   -.121909   .0227795    -5.35   0.000    -.1665897   -.0772284
             79  |  -.0746395   .0178088    -4.19   0.000    -.1095704   -.0397086
             80  |  -.0278888   .0120545    -2.31   0.021    -.0515331   -.0042445
             83  |   .0039482   .0153572     0.26   0.797    -.0261741    .0340706
             87  |  -.0462577   .0134836    -3.43   0.001    -.0727051   -.0198103
             89  |  -.0202485   .0157858    -1.28   0.200    -.0512115    .0107145
                 |
           _cons |  -.0213707   .0099822    -2.14   0.032    -.0409502   -.0017912
    Maybe there is somebody who can help me out. Thanks in advance!

    Oliver
    Last edited by Oliver Brock; 03 Feb 2023, 13:00.

  • #2
    Since this is a panel dataset (with about 4000 company-year observations), I thought of using a fixed-effects structure.

    However, in addition to the Covid variable, I also have control variables that I am interested in. Problem here is that some of them are also time invariant variables (dummy variables).
    Well, those two goals are in conflict and you cannot do both in the same analysis. As a compromise, consider using -xthybrid-, available from SSC.
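    What follows is a hand-rolled sketch of the between-within ("hybrid") idea that -xthybrid- automates, not a reproduction of that command's output. It assumes the nlswork data used elsewhere in this thread, with tenure standing in for a time-varying control and race for a time-invariant one; m_tenure and d_tenure are made-up names.
    Code:
    use "https://www.stata-press.com/data/r17/nlswork.dta", clear
    xtset idcode year
    * fix the estimation sample before computing person-level means
    keep if !missing(ln_wage, tenure, race)
    * split the time-varying regressor into a between part (person mean)
    * and a within part (deviation from that mean)
    bysort idcode: egen m_tenure = mean(tenure)
    generate d_tenure = tenure - m_tenure
    * random-effects model: d_tenure carries the within-person effect,
    * while the time-invariant race indicators remain estimable
    xtreg ln_wage d_tenure m_tenure i.race, re vce(cluster idcode)
    In this setup the coefficient on d_tenure plays the role of the fixed-effects estimate, and the time-invariant dummy keeps a coefficient of its own, which a plain -xtreg, fe- cannot provide.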

    And/or does it make sense to include the industry or year effects? I can't use xtset as far as I know, because I have multiple year values for each industry.
    There is no model in which you can use both industry and company fixed effects. Pick one. If you pick industry-level fixed effects, you cannot -xtset industry year- because of the repeated values. But you can still -xtset industry-. And with that you will still be able to do any meaningful analysis for this kind of data with the -xt- commands. All you lose is the ability to use autoregressive structures or leads and lags, which are undefinable in the presence of repeated time values within panels.

    As for year effects, you cannot use year effects and the covid variable. Such a model is unidentifiable and, if you get results at all, they will be wrong and meaningless.

    I found similar literature, they use year effects. But I don't really get this approach, since I believe that the covid variable (or any other dummy variable for a specific period of more than one year) would be omitted, as it is either 0 or 1 for each company and each year? Maybe I have a problem with understanding time fixed effects in general?
    No, you understand them perfectly. Perhaps you do not understand the literature you have found. But if somebody claims to have done an analysis using both year fixed effects and a variable which defines a subset of the time periods, then they do not understand time fixed-effects. Either they are not describing their analysis properly, or they are reporting garbage results. There is no way around this: it's linear algebra.
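    To make the linear algebra concrete, here is a small made-up panel (two firms, 2018-2021; the variable names follow the thread, the data are simulated) in which the covid dummy is exactly the sum of two year indicators. It is only a sketch of the dependence, not a reconstruction of anyone's actual data.
    Code:
    clear
    set seed 12345
    set obs 8
    * two hypothetical firms, each observed in 2018, 2019, 2020 and 2021
    generate cid  = ceil(_n/4)
    generate year = 2018 + mod(_n - 1, 4)
    generate byte covid = inrange(year, 2020, 2021)
    generate byte y2020 = (year == 2020)
    generate byte y2021 = (year == 2021)
    * exact linear dependence: covid = y2020 + y2021 in every row
    assert covid == y2020 + y2021
    * placeholder outcome, pure noise
    generate y = rnormal()
    * Stata has to drop one term to fit this; watch for the collinearity note
    regress y covid i.year
    Because covid adds no information beyond the year indicators, whichever term is dropped, the covid coefficient that remains can only be some contrast between individual years.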


    • #3
      First of all, thanks for your reply, it was really helpful for me! I have some problems with understanding your suggestions, so maybe you can help me out:

      Originally posted by Clyde Schechter

      As for year effects, you cannot use year effects and the covid variable. Such a model is unidentifiable and, if you get results at all, they will be wrong and meaningless.
      But doesn't that mean that I'd also not be able to use "xtset cid year, yearly" as I described above? Or does this code only say that I want to use firm fixed effects, and not firm + year fixed effects?

      Edit: I just saw another thread where you already explained that using xtset with year, yearly does not automatically mean using time-fixed effects (https://www.statalist.org/forums/for...-fixed-effects)

      Originally posted by Clyde Schechter

      No, you understand them perfectly. Perhaps you do not understand the literature you have found. But if somebody claims to have done an analysis using both year fixed effects and a variable which defines a subset of the time periods, then they do not understand time fixed-effects. Either they are not describing their analysis properly, or they are reporting garbage results. There is no way around this: it's linear algebra.
      Maybe you can have a brief look into the following paper: DOI: 10.14505/jaes.v15.2(68).04
      Their approach is comparable to what I'd like to do, although they're using an Oil Crisis in GCC countries.
      As you can see in Table 2 (page 301), the variable "Oil_Cri" is defined as 1, if year is in the oil crisis (2014-2016). And in their regression results (Table 6, p. 304 or Table 7, p.305) it is said that they used fixed-effects panel models with year-fixed effects (year = yes in the table)? So according to your first answer, this approach shouldn't be suitable? Maybe I understood something wrong in their analysis?

      Thanks in advance!

      Oliver
      Last edited by Oliver Brock; 17 Feb 2023, 11:08.


      • #4
        Oliver:
        In Table 7 of the paper you mention (BTW: it's a good habit to report the full reference, as a DOI may be broken), the authors do not report that any -year- indicator was omitted, but this surely happened, as you can see in the following toy example:
        Code:
        . use "https://www.stata-press.com/data/r17/nlswork.dta"
        (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
        
        . g wanted=0 if year<=78
        
        . replace wanted=1 if year>78
        
        . xtreg ln_wage c.age##c.age i.wanted i.year, fe vce(cluster idcode)
        note: 88.year omitted because of collinearity.
        
        Fixed-effects (within) regression               Number of obs     =     28,510
        Group variable: idcode                          Number of groups  =      4,710
        
        R-squared:                                      Obs per group:
             Within  = 0.1162                                         min =          1
             Between = 0.1078                                         avg =        6.1
             Overall = 0.0932                                         max =         15
        
                                                        F(16,4709)        =      79.11
        corr(u_i, Xb) = 0.0613                          Prob > F          =     0.0000
        
                                     (Std. err. adjusted for 4,710 clusters in idcode)
        ------------------------------------------------------------------------------
                     |               Robust
             ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                 age |   .0728746    .013687     5.32   0.000     .0460416    .0997075
                     |
         c.age#c.age |  -.0010113   .0001076    -9.40   0.000    -.0012224   -.0008003
                     |
            1.wanted |   .1904977   .2486083     0.77   0.444    -.2968909    .6778863
                     |
                year |
                 69  |   .0647054   .0155249     4.17   0.000     .0342693    .0951415
                 70  |   .0284423   .0264639     1.07   0.283    -.0234395     .080324
                 71  |   .0579959   .0384111     1.51   0.131    -.0173078    .1332996
                 72  |   .0510671   .0502675     1.02   0.310    -.0474808     .149615
                 73  |   .0424104   .0624924     0.68   0.497    -.0801038    .1649247
                 75  |   .0151376    .086228     0.18   0.861    -.1539096    .1841848
                 77  |   .0340933   .1106841     0.31   0.758    -.1828994     .251086
                 78  |   .0537334   .1232232     0.44   0.663    -.1878417    .2953084
                 80  |  -.1535502   .1028773    -1.49   0.136    -.3552378    .0481374
                 82  |   -.151329   .0787025    -1.92   0.055    -.3056227    .0029647
                 83  |  -.1317317    .066753    -1.97   0.049    -.2625987   -.0008646
                 85  |  -.0862219    .042357    -2.04   0.042    -.1692615   -.0031824
                 87  |  -.0662706   .0187788    -3.53   0.000    -.1030857   -.0294554
                 88  |          0  (omitted)
                     |
               _cons |   .3937532   .2469015     1.59   0.111    -.0902893    .8777957
        -------------+----------------------------------------------------------------
             sigma_u |  .40275174
             sigma_e |  .30127563
                 rho |  .64120306   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        
        
        .
        Kind regards,
        Carlo
        (Stata 19.0)


        • #5
          Hi Carlo, thanks for your answer (and thanks for your note regarding the references).

          As I mentioned in my opening of the thread, I tried to use the following code:

          reg abs_DACC_Jones_1991 covid ta rev lev roa bigfour g_ta g_rev i.year i.twodigit_sic, vce(cluster cid) (see results above)

          And if I compare it to your example, it seems like my dummy-variable (covid; 0 for 2018-2019; 1 for 2020-2021) of interest is also not omitted.

          But why is it like that? As I use year and industry fixed effects (i.year and i.twodigit_sic), I do not understand why the covid variable is not omitted, as it does not vary across industries within a year (it's always 0 for non-covid years, no matter which industry).


          • #6
            Oliver:
            year=2021 was omitted due to collinearity, in addition to year=2018, which was omitted as the base category to protect your regression from the dummy-variable trap.
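            To see what the surviving period coefficient then measures, the nlswork toy example from #4 can be rerun with and without the period dummy. This is only a sketch: the scalar name b_wanted is made up, and the final check assumes Stata drops 88.year, as it did in the output posted in #4.
            Code:
            use "https://www.stata-press.com/data/r17/nlswork.dta", clear
            generate byte wanted = (year > 78) if !missing(year)
            * period dummy plus year indicators: one year indicator is dropped
            xtreg ln_wage c.age##c.age i.wanted i.year, fe vce(cluster idcode)
            scalar b_wanted = _b[1.wanted]
            * year indicators only, nothing dropped beyond the base year
            xtreg ln_wage c.age##c.age i.year, fe vce(cluster idcode)
            * the "period effect" is the 88-versus-68 year contrast under another name
            display b_wanted " " _b[88.year]
            * holds only if 88.year was the term dropped in the first model
            assert reldif(b_wanted, _b[88.year]) < 1e-6
            In the outputs posted in this thread, 1.wanted in #4 and 88.year in #11 are both .1904977. By the same logic, with 2018 as the base and 2021 omitted, the covid coefficient in your -regress- output appears to be the 2021-versus-2018 contrast rather than a pooled 2020-2021 effect.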
            Kind regards,
            Carlo
            (Stata 19.0)


            • #7
              I took a brief look at the paper. As best I can tell, their analysis attempted to use both the oil crisis variable (representing years 2014-2016) and year fixed effects. In the actual results in Table 7 they do not really show how they represented year. If they omitted years 2014-2016, along with a base year, from the "year fixed effects", then this would be OK--but then that is not really using year fixed effects, and they should explain that they did it and exactly how. Certainly the regression equation they represent suggests that no care was taken along these lines. So I suspect they did it wrong. But they do not provide enough information for me to be sure.

              So I can't interpret the paper for you. If it is critical that you know, then you should contact one of its authors. Suffice it to say, it is mathematically impossible to use year fixed effects and the oil crisis variable together. The model would be unidentifiable and any results obtained by omitting some arbitrary variable(s) would be invalid. Lacking an adequate description of what they did, I would not wager a lot of money on this, but if I had to make a (small) bet I would bet that this paper is just presenting nonsense.


              Added: I misspoke above. Where I said "If they omitted years 2014-2016, along with a base year, from the "year fixed effects", then this would be OK..." that is not correct. You simply cannot have year fixed effects and another variable that indicates some subset of the years in an identifiable model.

              Last edited by Clyde Schechter; 17 Feb 2023, 15:36.


              • #8
                Carlo and Clyde, thanks for your helpful answers.

                I took some time to think about the approach that I'd like to use and believe that, as I am also interested in one (or more) dummy variables, it would be easiest to go with the OLS regression using clustered standard errors. (I saw another thread where Carlo mentioned that OLS rarely outperforms FE with panel data, but maybe my study is such a case: https://www.statalist.org/forums/for...ith-panel-data.)

                I also ran the FE:
                xtreg y covid x2 x3, fe robust

                and the OLS:
                reg y covid x2 x3, vce(cluster cid) -> cid is corporate id

                and my R2 for the reg is higher (0.09) compared to the FE approach (R2 within: 0.037; R2 between: 0.041) -> note that in my field of research, low R2 (below 0.1) are not unusual.

                However, I found another paper which focuses on similar research. I'm still confused about two things that we already discussed above. As you can see on p. 5, the authors say that "We also control for annual fixed effects (Year) and industry fixed effects (Industry)". And as you can see in Table 2 below, they use the covid variable as a dummy, as well as 5 dummy control variables.

                (1) How were they able to use year and industry fixed effects with a dummy for a subset of periods?
                (2) For example, the control variable "Big" will be nearly time-invariant, as only a few companies in each industry will change from a big to small auditor. So how can they compute unomitted results for these dummies?

                Reference for the paper:
                Huanmin Yan, Zhenyu Liu, Haoyu Wang, Xuehua Zhang, Xilei Zheng, How does the COVID-19 affect earnings management: Empirical evidence from China, Research in International Business and Finance, Volume 63. https://doi.org/10.1016/j.ribaf.2022.101772.

                Thanks in advance.

                Oliver







                • #9
                  (1) How were they able to use year and industry fixed effects with a dummy for a subset of periods?
                  Again, I have not carefully read this article, and perhaps I have missed something. But my response to your question is: they weren't. It appears that they have overlooked the invalidity of using time fixed effects and a variable that designates a subset of the time periods in the same regression. Their results with regard to the Covid variable are not valid, not even meaningful. Note that in their report of the regression coefficients (Table 4) all they say about year is "yes." They provide no detail about how they specified year in their regression, they do not show code nor give any explanation of that. Generally, then, one would assume they did the usual inclusion of single-year indicators for all but a single reference category. In order to get any results at all, their software must have eliminated another year indicator (or the covid indicator--but they clearly would have noticed that.) They probably either didn't notice the missing year indicator, or noticed it but failed to understand its implications. Either way, the results are simply invalid.

                  For example, the control variable "Big" will be nearly time-invariant, as only a few companies in each industry will change from a big to small auditor. So how can they compute unomitted results for these dummies?
                  Nearly time-invariant is not the same thing as fully time-invariant. The consequence of using a nearly time-invariant variable is that its effects are likely to be estimated with poor precision (high standard error, wide confidence interval). But the estimates are still unbiased. It's a bit like the joke about the statisticians who went deer hunting. One shot and missed by 50 yards to the left, and the other shot and missed by 50 yards to the right, but on average they got the deer. That kind of hunting is what you do when you use variables that are nearly collinear. And that problem can, at least in theory, be overcome by using a sufficiently large sample size. (Whether it is feasible in the real world to get a sufficiently large sample size for this purpose is a different question.) But nearly collinear variables do not present the unresolvable difficulty presented by fully collinear variables. With fully collinear variables, something must be omitted to identify the model, and the estimates obtained for the remaining variables involved in the exact collinearity are no longer meaningful.
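                  A small simulation can illustrate the difference. Everything here is made up for the purpose (variable names, sample size, noise scale); it is a sketch, not a model of the earnings-management data.
                  Code:
                  clear
                  set seed 2023
                  set obs 1000
                  generate x1 = rnormal()
                  * nearly collinear: x1 plus a little independent noise
                  generate x2_near = x1 + 0.05*rnormal()
                  * exactly collinear: a deterministic multiple of x1
                  generate x2_exact = 2*x1
                  generate y = 1 + x1 + rnormal()
                  * both coefficients are estimated, but with inflated standard errors
                  regress y x1 x2_near
                  estat vif
                  * here one of the two regressors has to be omitted to fit anything
                  regress y x1 x2_exact
                  In the first regression the imprecision shrinks as the number of observations grows; in the second, no sample size helps, because the two columns carry identical information.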
                  Last edited by Clyde Schechter; 19 Feb 2023, 10:54.


                  • #10
                    Thank you for your feedback again. It seems to me that I should read papers a little more critically in general, because so far I have taken the procedures of others to be correct without questioning them. I reread the paper and also can't see any reason or explanation for the approach the authors used. Again, thank you very much! Your and Carlo's replies were very helpful for me.


                    • #11
                      Oliver:
                      as an aside to Clyde's excellent guidance, I'd add:
                      1) the Rsq within from -xtreg, fe- and the Rsq from -regress- are calculated differently. Therefore a straight comparison between the two might be misleading;
                      2) while a low Rsq within can be totally reliable, I'd check the functional form of your regressand:
                      Code:
                      . use "https://www.stata-press.com/data/r17/nlswork.dta"
                      (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
                      
                      . xtreg ln_wage c.age##c.age i.year, fe vce(cluster idcode)
                      
                      Fixed-effects (within) regression               Number of obs     =     28,510
                      Group variable: idcode                          Number of groups  =      4,710
                      
                      R-squared:                                      Obs per group:
                           Within  = 0.1162                                         min =          1
                           Between = 0.1078                                         avg =        6.1
                           Overall = 0.0932                                         max =         15
                      
                                                                      F(16,4709)        =      79.11
                      corr(u_i, Xb) = 0.0613                          Prob > F          =     0.0000
                      
                                                   (Std. err. adjusted for 4,710 clusters in idcode)
                      ------------------------------------------------------------------------------
                                   |               Robust
                           ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                      -------------+----------------------------------------------------------------
                               age |   .0728746    .013687     5.32   0.000     .0460416    .0997075
                                   |
                       c.age#c.age |  -.0010113   .0001076    -9.40   0.000    -.0012224   -.0008003
                                   |
                              year |
                               69  |   .0647054   .0155249     4.17   0.000     .0342693    .0951415
                               70  |   .0284423   .0264639     1.07   0.283    -.0234395     .080324
                               71  |   .0579959   .0384111     1.51   0.131    -.0173078    .1332996
                               72  |   .0510671   .0502675     1.02   0.310    -.0474808     .149615
                               73  |   .0424104   .0624924     0.68   0.497    -.0801038    .1649247
                               75  |   .0151376    .086228     0.18   0.861    -.1539096    .1841848
                               77  |   .0340933   .1106841     0.31   0.758    -.1828994     .251086
                               78  |   .0537334   .1232232     0.44   0.663    -.1878417    .2953084
                               80  |   .0369475   .1473725     0.25   0.802    -.2519716    .3258667
                               82  |   .0391687   .1715621     0.23   0.819    -.2971733    .3755108
                               83  |    .058766   .1836086     0.32   0.749    -.3011928    .4187249
                               85  |   .1042758   .2080199     0.50   0.616    -.3035406    .5120922
                               87  |   .1242272   .2327328     0.53   0.594    -.3320379    .5804922
                               88  |   .1904977   .2486083     0.77   0.444    -.2968909    .6778863
                                   |
                             _cons |   .3937532   .2469015     1.59   0.111    -.0902893    .8777957
                      -------------+----------------------------------------------------------------
                           sigma_u |  .40275174
                           sigma_e |  .30127563
                               rho |  .64120306   (fraction of variance due to u_i)
                      ------------------------------------------------------------------------------
                      
                      . predict fitted, xb
                      (24 missing values generated)
                      
                      . g sq_fitted=fitted^2
                      (24 missing values generated)
                      
                      . xtreg ln_wage fitted sq_fitted , fe vce(cluster idcode)
                      
                      Fixed-effects (within) regression               Number of obs     =     28,510
                      Group variable: idcode                          Number of groups  =      4,710
                      
                      R-squared:                                      Obs per group:
                           Within  = 0.1164                                         min =          1
                           Between = 0.1094                                         avg =        6.1
                           Overall = 0.0941                                         max =         15
                      
                                                                      F(2,4709)         =     586.29
                      corr(u_i, Xb) = 0.0619                          Prob > F          =     0.0000
                      
                                                   (Std. err. adjusted for 4,710 clusters in idcode)
                      ------------------------------------------------------------------------------
                                   |               Robust
                           ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                      -------------+----------------------------------------------------------------
                            fitted |   2.012332   .5365254     3.75   0.000     .9604909    3.064172
                         sq_fitted |  -.3040363   .1616996    -1.88   0.060    -.6210431    .0129706
                             _cons |  -.8379964    .443929    -1.89   0.059    -1.708305    .0323122
                      -------------+----------------------------------------------------------------
                           sigma_u |  .40239556
                           sigma_e |  .30114591
                               rho |  .64099409   (fraction of variance due to u_i)
                      ------------------------------------------------------------------------------
                      
                      . test sq_fitted
                      
                       ( 1)  sq_fitted = 0
                      
                             F(  1,  4709) =    3.54
                                  Prob > F =    0.0601
                      
                      .
                      As the outcome of -test- does not reach statistical significance, there's no evidence of model misspecification.
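                      On point 1), one way to see that the R-squared figures are not comparable is to fit the same specification three ways on nlswork. This is a sketch only, and the specification is arbitrary.
                      Code:
                      use "https://www.stata-press.com/data/r17/nlswork.dta", clear
                      xtset idcode year
                      * within R-squared: variation explained after demeaning by person
                      xtreg ln_wage c.age##c.age, fe
                      display "xtreg, fe within R2 = " e(r2_w)
                      * same slopes, but this R-squared also credits the absorbed person dummies
                      areg ln_wage c.age##c.age, absorb(idcode)
                      display "areg R2 = " e(r2)
                      * pooled OLS: no person effects at all
                      regress ln_wage c.age##c.age
                      display "regress R2 = " e(r2)
                      The three figures answer different questions, so a higher R-squared from -regress- than the within R-squared from -xtreg, fe- says little about which model is preferable.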
                      Kind regards,
                      Carlo
                      (Stata 19.0)
