How to deal with ratios as dependent and independent variables in a panel data regression

Francesca Sossella

Join Date: Nov 2019

Posts: 29
#1


16 Dec 2019, 06:17

Hello everyone,

I am trying to run a panel data regression for my thesis aimed at testing the relation between R&D intensity (R&D expenditure/net sales) and Tobin's Q (Market Capitalization/Total Assets). My variables are the following:

-Dependent Variable: Tobin's Q, (ranging between 22,84 and 0,18)

-Controlling Variables:
-Debt/Equity (ranging between 0 and 9,59)
-Return on asset: Net income/Total assets (ranging between 48,78 and -73,50)
-YoY growth sales as a percentage (ranging between 5,33 and -0,70)
-LN (total assets): I apply the ln as the literature suggests

-Independent Variable that I want to test: R&D intensity (ranging between 0,97 and 0,069)

Apart from total assets, all other variables are ratios, some with values below 1. To have the correct unit of measurement in the regression, should I multiply the ratios by 100? To be clearer, should I keep 18% as 0,18 or transform it to 18? As far as I know, the regression estimates the effect that a 1-unit change in the independent variable has on the dependent variable.

Many thanks in advance,

Francesca

Nick Cox

Join Date: Mar 2014

Posts: 35438
#2

16 Dec 2019, 06:40

Multiplying ratios by 100 will not help much here. If your coefficients are very small or very large then changing the scale might add some clarity to a report but that's nothing to do with the correctness of a model.

The most important consideration is whether to model your response (you say dependent variable) as it comes or as say its logarithm. The main issue there is whether you are bringing the data closer to the functional form you're using.

Francesca Sossella

Join Date: Nov 2019

Posts: 29
#3

16 Dec 2019, 07:04

Thank you very much Nick! So you think that R&D intensity should not be multiplied by 100 either, even if the values are all lower than 1?

Thanks in advance,

Francesca

Nick Cox

Join Date: Mar 2014

Posts: 35438
#4

16 Dec 2019, 07:23

I am not an economist (barring Economics A level Grade A in 1969 -- only British readers are expected to understand) but -- even if I were -- I would say use the scale for R and D intensity that is customary in your field. Why should being below 1 be thought problematic at all? The coefficient will take care of itself.

As far as I can tell, all your predictors are on distinct scales anyway.

Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#5

16 Dec 2019, 09:05

To add to Nick's helpful responses, the range of Q is not logically restricted to be between any particular bounds; generally it can be negative, although not in your data set. Your R&D variable is, by definition, restricted to be in the [0,1] interval. But as Nick mentioned, multiplying by 100 will have the effect of reducing your coefficient by a factor of 100. In other words, nothing will change in terms of interpreting the model or goodness-of-fit.

The key will be interpreting your coefficients properly, and doing so means you must know how the R&D variable is scaled.

If you show some results you will get better feedback. The bigger questions are using fixed effects, allowing for time effects, and so on.
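
A minimal sketch of the rescaling point, run on the public nlswork data rather than on the data discussed in this thread (the dataset and the variable name tenure100 are assumptions made only for illustration). Multiplying a predictor by 100 should divide its coefficient by 100 and leave the t statistic and the within R-squared unchanged:

```stata
* Illustration only: public example data, not the data discussed in this thread
use "http://www.stata-press.com/data/r15/nlswork.dta", clear

* Fixed-effects regression with the predictor on its original scale
xtreg ln_wage tenure, fe
scalar b_raw   = _b[tenure]
scalar t_raw   = _b[tenure] / _se[tenure]
scalar r2w_raw = e(r2_w)

* Same regression with the predictor multiplied by 100 (hypothetical name)
generate double tenure100 = 100 * tenure
xtreg ln_wage tenure100, fe
scalar b_scaled   = _b[tenure100]
scalar t_scaled   = _b[tenure100] / _se[tenure100]
scalar r2w_scaled = e(r2_w)

* The coefficient shrinks by a factor of 100; t and within R-squared do not move
display "b_raw = " b_raw "   100 * b_scaled = " 100 * b_scaled
display "t_raw = " t_raw "   t_scaled = " t_scaled
display "within R2: " r2w_raw " vs " r2w_scaled
```

The same logic applies to a ratio such as R&D intensity: whether it is stored as 0.18 or as 18 only changes the unit in which the coefficient is read.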

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17675
#6

16 Dec 2019, 09:23

Francesca:
two giants have already given their usually helpful replies to your query.
Placing myself firmly on their shoulders, I would add that:
- you should probably consider -xtreg-;
- provided that you use default standard errors, you should compare -fe- vs -re- specification via -hausman-;
- if you invoke non-default standard errors, switch from -hausman- to the community-contributed programme -xtoverid- (just type -search xtoverid- from within Stata to spot and install it);
- check that your regression model is correctly specified (no predictor omitted, no endogeneity);
- as per FAQ, post what you typed and what Stata gave you back (within -CODE- delimiters, please). Thanks.

Kind regards,
Carlo
(Stata 19.0)

Francesca Sossella

Join Date: Nov 2019

Posts: 29
#7

16 Dec 2019, 10:06

Thank you very much everybody! This is extremely helpful. I performed the Hausman test on the model including my control variables to see whether I should use random effects or fixed effects. You can find the output attached. In your opinion, is there enough evidence to reject the null hypothesis and use fixed effects? Another concern I have is the R-squared: is it normal for it to be this low? Thank you again in advance.

Attachment: Hausmann Test.smcl

Nick Cox

Join Date: Mar 2014
Posts: 35438

#8

16 Dec 2019, 10:15

This is Francesca's output as it should be posted:

Code:

------------------------------------------------------------------------------------------------------------------------------------------------------------------
      name:  <unnamed>
       log:  /Users/francescasossella/Desktop/R&D and Stock Price/R&D ORBIS/R&D Cap2/databases/Hausmann Test.smcl
  log type:  smcl
 opened on:  16 Dec 2019, 17:56:18

. xtreg TobinsQ ROA DE LNTA YoYSales, fe

Fixed-effects (within) regression               Number of obs     =      1,060
Group variable: Company1                        Number of groups  =        212

R-sq:                                           Obs per group:
     within  = 0.1376                                         min =          5
     between = 0.0048                                         avg =        5.0
     overall = 0.0093                                         max =          5

                                                F(4,844)          =      33.67
corr(u_i, Xb)  = -0.6633                        Prob > F          =     0.0000

------------------------------------------------------------------------------
     TobinsQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ROA |    1.70617   .4240882     4.02   0.000     .8737784    2.538561
          DE |  -.1909054   .0678469    -2.81   0.005    -.3240738    -.057737
        LNTA |  -.9315958   .1143173    -8.15   0.000    -1.155975   -.7072163
    YoYSales |   .8974558   .1462864     6.13   0.000     .6103279    1.184584
       _cons |   21.70029   2.359716     9.20   0.000     17.06869    26.33189
-------------+----------------------------------------------------------------
     sigma_u |   2.499816
     sigma_e |  1.0771394
         rho |  .84340925   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(211, 844) = 13.67                   Prob > F = 0.0000

. estimates store fixed

. xtreg  TobinsQ ROA DE LNTA YoYSales, re

Random-effects GLS regression                   Number of obs     =      1,060
Group variable: Company1                        Number of groups  =        212

R-sq:                                           Obs per group:
     within  = 0.1004                                         min =          5
     between = 0.0301                                         avg =        5.0
     overall = 0.0436                                         max =          5

                                                Wald chi2(4)      =      96.31
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     TobinsQ |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ROA |   1.810335   .4094423     4.42   0.000     1.007843    2.612827
          DE |   -.227599   .0650743    -3.50   0.000    -.3551423   -.1000557
        LNTA |  -.2513169   .0559118    -4.49   0.000    -.3609019   -.1417319
    YoYSales |   .9126928   .1465755     6.23   0.000     .6254101    1.199975
       _cons |   7.619428   1.157371     6.58   0.000     5.351022    9.887833
-------------+----------------------------------------------------------------
     sigma_u |  1.6535504
     sigma_e |  1.0771394
         rho |  .70208199   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. estimates store random

. hausman fixed random

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |     fixed        random       Difference          S.E.
-------------+----------------------------------------------------------------
         ROA |     1.70617     1.810335        -.104165        .1104892
          DE |   -.1909054     -.227599        .0366936        .0191971
        LNTA |   -.9315958    -.2513169       -.6802789        .0997112
    YoYSales |    .8974558     .9126928        -.015237               .
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
            B = inconsistent under Ha, efficient under Ho; obtained from xtreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(4) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =       10.95
                Prob>chi2 =      0.0272
                (V_b-V_B is not positive definite)

. log close
      name:  <unnamed>
       log:  /Users/francescasossella/Desktop/R&D and Stock Price/R&D ORBIS/R&D Cap2/databases/Hausmann Test.smcl
  log type:  smcl
 closed on:  16 Dec 2019, 17:57:38
------------------------------------------------------------------------------------------------------------------------------------------------------------------


Francesca Sossella

Join Date: Nov 2019

Posts: 29
#9

16 Dec 2019, 10:23

Thank you Nick for showing the output in the appropriate way! I am still learning how to upload content.

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17675

#10

16 Dec 2019, 10:39

Francesca:
please familiarise yourself with the FAQ requirements, as they will save tons of everybody's time (yours included). Thanks.
That said, although it is a bit shaky (note the warning that V_b-V_B is not positive definite, a common finite-sample problem), the -hausman- outcome points you towards the -fe- specification.
I would use the community-contributed programme -xtoverid- as a sensitivity analysis for -hausman-.
Spotting it is easy:

Code:

search xtoverid

Then follow the instructions at the -xtoverid- link to install it.

Then:

Code:

quietly xtreg  TobinsQ ROA DE LNTA YoYSales, re
xtoverid

If -xtoverid- reaches statistical significance, go -fe-; otherwise, stick with -re-.

On top of that, you should make sure that your model is correctly specified (heteroskedasticity is a minor issue that you can manage with robust/clustered standard errors).
One approach to testing for model misspecification is to run an auxiliary regression on the linear and squared terms of the fitted values:

Code:

. use "http://www.stata-press.com/data/r15/nlswork.dta"
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. xtreg ln_wage i.race tenure, re

Random-effects GLS regression                   Number of obs     =     28,101
Group variable: idcode                          Number of groups  =      4,699

R-sq:                                           Obs per group:
     within  = 0.0972                                         min =          1
     between = 0.2079                                         avg =        6.0
     overall = 0.1569                                         max =         15

                                                Wald chi2(3)      =    3532.05
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        race |
      black  |  -.1345322   .0120866   -11.13   0.000    -.1582215   -.1108429
      other  |   .1039944   .0504227     2.06   0.039     .0051677    .2028211
             |
      tenure |   .0376405   .0006448    58.37   0.000     .0363767    .0389043
       _cons |    1.59266   .0066729   238.68   0.000     1.579581    1.605738
-------------+----------------------------------------------------------------
     sigma_u |  .33623102
     sigma_e |  .30357621
         rho |  .55090591   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. predict fitted, xb
(433 missing values generated)

. g sq_fitted=fitted^2
(433 missing values generated)

. xtreg ln_wage fitted sq_fitted, re

Random-effects GLS regression                   Number of obs     =     28,101
Group variable: idcode                          Number of groups  =      4,699

R-sq:                                           Obs per group:
     within  = 0.1049                                         min =          1
     between = 0.2051                                         avg =        6.0
     overall = 0.1612                                         max =         15

                                                Wald chi2(2)      =    3800.39
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      fitted |   4.440548     .22001    20.18   0.000     4.009336     4.87176
   sq_fitted |   -.956488   .0609592   -15.69   0.000    -1.075966   -.8370101
       _cons |   -3.05509   .1969136   -15.51   0.000    -3.441034   -2.669146
-------------+----------------------------------------------------------------
     sigma_u |  .33891595
     sigma_e |  .30228563
         rho |  .55694179   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. test sq_fitted

 ( 1)  sq_fitted = 0

           chi2(  1) =  246.20
         Prob > chi2 =    0.0000

As -test- outcome reaches statistical significance, the regression model is misspecified.

Last edited by Carlo Lazzaro; 16 Dec 2019, 10:42.

Kind regards,
Carlo
(Stata 19.0)


Francesca Sossella

Join Date: Nov 2019
Posts: 29

#11

17 Dec 2019, 01:43

Thank you Carlo. What is the exact meaning of misspecification?

This is the output I get:

Code:

.  quietly xtreg  TobinsQ ROA DE LNTA YoYSales, re

. 
. . xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re   
Sargan-Hansen statistic  66.960  Chi-sq(4)    P-value = 0.0000

. 
.  xtreg  TobinsQ ROA DE LNTA YoYSales, re

Random-effects GLS regression                   Number of obs     =      1,060
Group variable: Company1                        Number of groups  =        212

R-sq:                                           Obs per group:
     within  = 0.1004                                         min =          5
     between = 0.0301                                         avg =        5.0
     overall = 0.0436                                         max =          5

                                                Wald chi2(4)      =      96.31
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     TobinsQ |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ROA |   1.810335   .4094423     4.42   0.000     1.007843    2.612827
          DE |   -.227599   .0650743    -3.50   0.000    -.3551423   -.1000557
        LNTA |  -.2513169   .0559118    -4.49   0.000    -.3609019   -.1417319
    YoYSales |   .9126928   .1465755     6.23   0.000     .6254101    1.199975
       _cons |   7.619428   1.157371     6.58   0.000     5.351022    9.887833
-------------+----------------------------------------------------------------
     sigma_u |  1.6535504
     sigma_e |  1.0771394
         rho |  .70208199   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. predict fitted, xb
variable fitted already defined
r(110);

. g sq_fitted=fitted^2

. xtreg TobinsQ fitted sq_fitted, re

Random-effects GLS regression                   Number of obs     =      1,060
Group variable: Company1                        Number of groups  =        212

R-sq:                                           Obs per group:
     within  = 0.1025                                         min =          5
     between = 0.0297                                         avg =        5.0
     overall = 0.0431                                         max =          5

                                                Wald chi2(2)      =      98.78
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     TobinsQ |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      fitted |   1.270459   .2846544     4.46   0.000     .7125464    1.828371
   sq_fitted |  -.0478275   .0479585    -1.00   0.319    -.1418244    .0461694
       _cons |  -.3565275   .4406381    -0.81   0.418    -1.220162    .5071072
-------------+----------------------------------------------------------------
     sigma_u |  1.7547168
     sigma_e |  1.0974446
         rho |  .71882612   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. test sq_fitted

 ( 1)  sq_fitted = 0

           chi2(  1) =    0.99
         Prob > chi2 =    0.3186

Thanks a lot in advance,

Francesca
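
A note on the log above: the r(110) message means that a variable named fitted already existed from an earlier command, so sq_fitted may have been built from the predictions of a different model. A minimal sketch of a re-runnable version of the check, shown on the public nlswork data (an assumption for illustration; the two -capture drop- lines are the only addition to the steps in #10):

```stata
* Illustration only: public example data, not the data discussed in this thread
use "http://www.stata-press.com/data/r15/nlswork.dta", clear

* Model whose specification is being checked
xtreg ln_wage tenure, fe

* Remove leftovers from earlier runs so that -predict- cannot fail with r(110);
* -capture- suppresses the error when the variable does not exist yet
capture drop fitted
capture drop sq_fitted

* Fitted values from the model just estimated, and their square
predict double fitted, xb
generate double sq_fitted = fitted^2

* Auxiliary regression; a significant squared term hints at misspecification
xtreg ln_wage fitted sq_fitted, fe
test sq_fitted
```

Running the block twice in a row should give the same result both times, which is the point of dropping the old variables first.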


Carlo Lazzaro

Join Date: Apr 2014
Posts: 17675

#12

17 Dec 2019, 02:11

Francesca:
thanks for sharing your Stata codes and outcomes within CODE delimiters.
Some comments about your post:
-xtoverid- outcome confirms that you should go -fe-.
The example I provided in my previous reply can be applied to -fe- specification, too (please note that, as expected, the -fe- estimator gets rid of the time-invariant predictor, -race-):

Code:

. use "http://www.stata-press.com/data/r15/nlswork.dta"
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

.
.  xtreg ln_wage i.race tenure, fe
note: 2.race omitted because of collinearity
note: 3.race omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =     28,101
Group variable: idcode                          Number of groups  =      4,699

R-sq:                                           Obs per group:
     within  = 0.0972                                         min =          1
     between = 0.1966                                         avg =        6.0
     overall = 0.1373                                         max =         15

                                                F(1,23401)        =    2520.15
corr(u_i, Xb)  = 0.1395                         Prob > F          =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        race |
      black  |          0  (omitted)
      other  |          0  (omitted)
             |
      tenure |   .0341807   .0006809    50.20   0.000     .0328462    .0355153
       _cons |   1.570329   .0027935   562.14   0.000     1.564854    1.575805
-------------+----------------------------------------------------------------
     sigma_u |  .39172445
     sigma_e |  .30357621
         rho |  .62477177   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(4698, 23401) = 7.80                 Prob > F = 0.0000

. predict fitted, xb
(433 missing values generated)

. g sq_fitted=fitted^2
(433 missing values generated)

.  xtreg ln_wage fitted sq_fitted , fe

Fixed-effects (within) regression               Number of obs     =     28,101
Group variable: idcode                          Number of groups  =      4,699

R-sq:                                           Obs per group:
     within  = 0.1093                                         min =          1
     between = 0.2233                                         avg =        6.0
     overall = 0.1513                                         max =         15

                                                F(2,23400)        =    1435.15
corr(u_i, Xb)  = 0.1528                         Prob > F          =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      fitted |   6.735689   .3231599    20.84   0.000     6.102275    7.369104
   sq_fitted |  -1.594469   .0896669   -17.78   0.000    -1.770222   -1.418716
       _cons |  -5.108404   .2891933   -17.66   0.000    -5.675242   -4.541566
-------------+----------------------------------------------------------------
     sigma_u |    .387985
     sigma_e |  .30155209
         rho |  .62341011   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(4698, 23400) = 7.76                 Prob > F = 0.0000

. test sq_fitted

 ( 1)  sq_fitted = 0

       F(  1, 23400) =  316.20
            Prob > F =    0.0000

.

As the -test- outcome reaches statistical significance, the regression model is misspecified (that is, I omitted some predictors, or interactions between predictors, that are actually part of the data generating process; model misspecification should also be seen as a possible signal of endogeneity).

Kind regards,
Carlo
(Stata 19.0)


Francesca Sossella

Join Date: Nov 2019
Posts: 29

#13

17 Dec 2019, 03:41

Thank you Carlo. I did the same with fixed effects, and my model doesn't seem to be misspecified. My next steps are:

1-Test for multicollinearity
2-Test for normality
3-Test for heteroskedasticity
4-Insert dummy variable for years

Concerning point 1, I derived the correlation matrix. Is there also a way to run the VIF test, and is it necessary?
Correlation matrix gives me this output:

Code:

xtreg  TobinsQ ROA DE LNTA YoYSales, fe

Fixed-effects (within) regression               Number of obs     =      1,060
Group variable: Company1                        Number of groups  =        212

R-sq:                                           Obs per group:
     within  = 0.1376                                         min =          5
     between = 0.0048                                         avg =        5.0
     overall = 0.0093                                         max =          5

                                                F(4,844)          =      33.67
corr(u_i, Xb)  = -0.6633                        Prob > F          =     0.0000

------------------------------------------------------------------------------
     TobinsQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ROA |    1.70617   .4240882     4.02   0.000     .8737784    2.538561
          DE |  -.1909054   .0678469    -2.81   0.005    -.3240738    -.057737
        LNTA |  -.9315958   .1143173    -8.15   0.000    -1.155975   -.7072163
    YoYSales |   .8974558   .1462864     6.13   0.000     .6103279    1.184584
       _cons |   21.70029   2.359716     9.20   0.000     17.06869    26.33189
-------------+----------------------------------------------------------------
     sigma_u |   2.499816
     sigma_e |  1.0771394
         rho |  .84340925   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(211, 844) = 13.67                   Prob > F = 0.0000

. estat vce, corr

Correlation matrix of coefficients of xtreg model

        e(V) |      ROA        DE      LNTA  YoYSales     _cons 
-------------+--------------------------------------------------
         ROA |   1.0000                                         
          DE |   0.1391    1.0000                               
        LNTA |  -0.0705   -0.2073    1.0000                     
    YoYSales |  -0.1099   -0.0233   -0.1147    1.0000           
       _cons |   0.0684    0.1919   -0.9998    0.1081    1.0000 

.

With reference to point 4: for the accuracy of the regression, is it necessary to include year dummy variables, or is including them just a way to test for a time effect?

Thanks a lot again in advance.

Francesca
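
On the VIF question, a hedged sketch: -estat vif- is available after -regress- but, as far as I know, not after -xtreg-, so one rough workaround (an assumption, not something proposed in this thread) is to look at the pairwise correlations of the predictors and at the VIFs from a pooled regression with the same predictors. Note that -estat vce, corr- reports correlations between coefficient estimates, not between the predictors themselves. Shown on the public nlswork data with an arbitrary choice of predictors:

```stata
* Illustration only: public example data and an arbitrary set of predictors
use "http://www.stata-press.com/data/r15/nlswork.dta", clear

* Fixed-effects model; -estat vce, corr- shows correlations of the estimates
xtreg ln_wage tenure age ttl_exp, fe
estat vce, corr

* Pairwise correlations of the predictors themselves, in the estimation sample
correlate tenure age ttl_exp if e(sample)

* Rough check: VIFs from a pooled OLS regression with the same predictors
* (this ignores the panel structure, so treat the numbers as indicative only)
regress ln_wage tenure age ttl_exp
estat vif
```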


Carlo Lazzaro

Join Date: Apr 2014
Posts: 17675

#14

17 Dec 2019, 04:29

Francesca:
- multicollinearity, as often noted by Clyde Schechter, is probably oversold. If you do not see weird standard errors, your regression model is reasonably free from multicollinearity. A famous contribution on this topic can be found in https://www.hup.harvard.edu/catalog....=9780674175440, Chapter 23. Multicollinearity. That said, your -estat vce, corr- output does not seem to reveal a quasi-extreme multicollinearity issue;
- normality is (weakly) relevant for the residual distribution only (see https://www.wiley.com/en-us/Introduc...9780470032701, pages 66-67);
- heteroskedasticity, if detected (the -xtreg- suite does not include a command developed for that purpose; hence, you should judge homo/heteroskedasticity visually), can be dealt with by invoking robust/clustered standard errors (both non-default options do the very same job). The main consequence of imposing non-default standard errors is that you must switch from -hausman- to -xtoverid- (which, in turn, being glorious but a bit old-fashioned, does not support -fvvarlist- notation, such as -i.year-; in that instance, you should create the categorical variables by hand);
- including -i.year- makes sense if you want to investigate whether, within the same panel (as you're using the -fe- estimator), time has any effect in inducing variation in the regressand when adjusted for the other predictors. You can also test the joint statistical significance of -i.year- via -testparm-:

Code:

. use "http://www.stata-press.com/data/r15/nlswork.dta"
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. xtreg ln_wage i.race i.year tenure, fe
note: 2.race omitted because of collinearity
note: 3.race omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =     28,101
Group variable: idcode                          Number of groups  =      4,699

R-sq:                                           Obs per group:
     within  = 0.1328                                         min =          1
     between = 0.1830                                         avg =        6.0
     overall = 0.1428                                         max =         15

                                                F(15,23387)       =     238.70
corr(u_i, Xb)  = 0.1302                         Prob > F          =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        race |
      black  |          0  (omitted)
      other  |          0  (omitted)
             |
        year |
         69  |   .0834009    .012716     6.56   0.000     .0584767    .1083252
         70  |   .0581181   .0118878     4.89   0.000     .0348173     .081419
         71  |   .1039541   .0117593     8.84   0.000     .0809051     .127003
         72  |    .111102   .0121416     9.15   0.000     .0873036    .1349004
         73  |   .1190714   .0117941    10.10   0.000     .0959542    .1421885
         75  |    .125436   .0117121    10.71   0.000     .1024795    .1483926
         77  |   .1726586   .0117994    14.63   0.000      .149531    .1957863
         78  |    .198183   .0120938    16.39   0.000     .1744784    .2218877
         80  |   .2041747    .012265    16.65   0.000     .1801345    .2282149
         82  |   .2078031   .0121494    17.10   0.000     .1839895    .2316166
         83  |   .2198642   .0124184    17.70   0.000     .1955233    .2442052
         85  |   .2606739   .0124265    20.98   0.000     .2363171    .2850307
         87  |    .266248   .0125421    21.23   0.000     .2416646    .2908314
         88  |   .3096376   .0127421    24.30   0.000     .2846622     .334613
             |
      tenure |   .0210664   .0008028    26.24   0.000     .0194928      .02264
       _cons |   1.438391   .0093181   154.37   0.000     1.420127    1.456655
-------------+----------------------------------------------------------------
     sigma_u |  .39153681
     sigma_e |   .2976287
         rho |  .63377952   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(4698, 23387) = 8.08                 Prob > F = 0.0000

. testparm(i.year)

 ( 1)  69.year = 0
 ( 2)  70.year = 0
 ( 3)  71.year = 0
 ( 4)  72.year = 0
 ( 5)  73.year = 0
 ( 6)  75.year = 0
 ( 7)  77.year = 0
 ( 8)  78.year = 0
 ( 9)  80.year = 0
 (10)  82.year = 0
 (11)  83.year = 0
 (12)  85.year = 0
 (13)  87.year = 0
 (14)  88.year = 0

       F( 14, 23387) =   68.47
            Prob > F =    0.0000

.
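
A minimal sketch of the two points made above about non-default standard errors, again on the public nlswork data (the scalar names and the yr_ prefix are hypothetical, chosen only for illustration). It compares vce(robust) with vce(cluster idcode) after -xtreg, fe-, then builds the year indicators by hand for commands that do not accept -i.year-:

```stata
* Illustration only: public example data, not the data discussed in this thread
use "http://www.stata-press.com/data/r15/nlswork.dta", clear

* With -xtreg, fe-, robust and panel-clustered standard errors should coincide
xtreg ln_wage tenure i.year, fe vce(robust)
scalar se_robust = _se[tenure]
xtreg ln_wage tenure i.year, fe vce(cluster idcode)
scalar se_cluster = _se[tenure]
display "robust SE = " se_robust "   cluster SE = " se_cluster

* Year indicators created by hand (yr_1, yr_2, ...); the first one is dropped
* so that the earliest year serves as the base category
tabulate year, generate(yr_)
drop yr_1

* Same model without factor-variable notation, plus the joint test of the years
xtreg ln_wage tenure yr_*, fe vce(cluster idcode)
testparm yr_*
```

The hand-made indicators give the same model as -i.year-, so the last regression can be followed by a command that rejects factor-variable notation.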

Kind regards,
Carlo
(Stata 19.0)


Francesca Sossella

Join Date: Nov 2019
Posts: 29

#15

17 Dec 2019, 07:24

Thank you very much Carlo, this is extremely useful! I really appreciate it. This is the process I follow to visually detect heteroskedasticity:

Code:

xtreg  TobinsQ ROA DE LNTA YoYSales, fe

Fixed-effects (within) regression               Number of obs     =      1,060
Group variable: Company1                        Number of groups  =        212

R-sq:                                           Obs per group:
     within  = 0.1376                                         min =          5
     between = 0.0048                                         avg =        5.0
     overall = 0.0093                                         max =          5

                                                F(4,844)          =      33.67
corr(u_i, Xb)  = -0.6633                        Prob > F          =     0.0000

------------------------------------------------------------------------------
     TobinsQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ROA |    1.70617   .4240882     4.02   0.000     .8737784    2.538561
          DE |  -.1909054   .0678469    -2.81   0.005    -.3240738    -.057737
        LNTA |  -.9315958   .1143173    -8.15   0.000    -1.155975   -.7072163
    YoYSales |   .8974558   .1462864     6.13   0.000     .6103279    1.184584
       _cons |   21.70029   2.359716     9.20   0.000     17.06869    26.33189
-------------+----------------------------------------------------------------
     sigma_u |   2.499816
     sigma_e |  1.0771394
         rho |  .84340925   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(211, 844) = 13.67                   Prob > F = 0.0000

. predict residuals, e

. predict fitted
(option xb assumed; fitted values)

. scatter residuals fitted

.

and what I get is the following graph:

[Graph not preserved: scatter plot of residuals against fitted values]

In my humble opinion, I don't see a big heteroskedasticity problem. Do you think this is correct, or should I invoke robust/clustered standard errors?

Many thanks,

Francesca
