
  • #31
    This is the output of the heteroskedasticity check:
    [Attached image: Heteroskedasticity.jpg]



    • #32
      Francesca:
      the model looks ok to me.
      However, you might have a minor heteroskedasticity issue: just impose cluster-robust SEs and see whether the 95% CIs differ from those obtained with the default SEs.
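      A minimal sketch of that comparison (assuming the model and variable names from your earlier posts, and that the data are already -xtset-) might be:

      Code:
      * same -fe- model with default and with cluster-robust SEs
      quietly xtreg TobinsQ ROA DE LNTA YoYSales RDCS i.Years, fe
      estimates store fe_default
      quietly xtreg TobinsQ ROA DE LNTA YoYSales RDCS i.Years, fe vce(cluster Company1)
      estimates store fe_cluster
      * wider standard errors imply wider 95% confidence intervals
      estimates table fe_default fe_cluster, b(%9.4f) se(%9.4f)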
      Kind regards,
      Carlo
      (Stata 19.0)



      • #33
        Carlo, thank you very much again. I invoked robust standard errors; below you can find the output with and without them. It seems to me that the CIs are narrower in the model without robust standard errors; do you agree?

        Code:
        xtreg TobinsQ ROA DE LNTA YoYSales RDCS i.Years, fe vce(cluster Company1)
        
        Fixed-effects (within) regression               Number of obs     =      1,060
        Group variable: Company1                        Number of groups  =        212
        
        R-sq:                                           Obs per group:
             within  = 0.1893                                         min =          5
             between = 0.0031                                         avg =        5.0
             overall = 0.0070                                         max =          5
        
                                                        F(9,211)          =      10.82
        corr(u_i, Xb)  = -0.7982                        Prob > F          =     0.0000
        
                                     (Std. Err. adjusted for 212 clusters in Company1)
        ------------------------------------------------------------------------------
                     |               Robust
             TobinsQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 ROA |   1.677901   .5152768     3.26   0.001     .6621507    2.693651
                  DE |  -.2066595   .0689203    -3.00   0.003    -.3425202   -.0707989
                LNTA |  -1.350005   .2896686    -4.66   0.000     -1.92102   -.7789899
            YoYSales |   .7732031   .2754733     2.81   0.005     .2301706    1.316236
                RDCS |  -.7317888   .3324677    -2.20   0.029    -1.387173    -.076405
                     |
               Years |
               2014  |   .1331407    .082901     1.61   0.110    -.0302795    .2965609
               2015  |   .0866756   .1132684     0.77   0.445     -.136607    .3099582
               2016  |   .2196538   .1044505     2.10   0.037     .0137535    .4255541
               2017  |    .679877   .1435643     4.74   0.000     .3968728    .9628811
                     |
               _cons |   30.76325   6.044455     5.09   0.000     18.84799     42.6785
        -------------+----------------------------------------------------------------
             sigma_u |  3.1345762
             sigma_e |  1.0474911
                 rho |  .89954618   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        
        . 
        . xtreg TobinsQ ROA DE LNTA YoYSales RDCS i.Years, fe
        
        Fixed-effects (within) regression               Number of obs     =      1,060
        Group variable: Company1                        Number of groups  =        212
        
        R-sq:                                           Obs per group:
             within  = 0.1893                                         min =          5
             between = 0.0031                                         avg =        5.0
             overall = 0.0070                                         max =          5
        
                                                        F(9,839)          =      21.76
        corr(u_i, Xb)  = -0.7982                        Prob > F          =     0.0000
        
        ------------------------------------------------------------------------------
             TobinsQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 ROA |   1.677901   .4205996     3.99   0.000     .8523498    2.503452
                  DE |  -.2066595   .0667669    -3.10   0.002    -.3377094   -.0756097
                LNTA |  -1.350005    .129184   -10.45   0.000    -1.603567   -1.096443
            YoYSales |   .7732031   .1465645     5.28   0.000      .485527    1.060879
                RDCS |  -.7317888   .1967754    -3.72   0.000    -1.118019   -.3455589
                     |
               Years |
               2014  |   .1331407   .1034784     1.29   0.199    -.0699663    .3362476
               2015  |   .0866756   .1058546     0.82   0.413    -.1210953    .2944466
               2016  |   .2196538   .1098901     2.00   0.046     .0039619    .4353456
               2017  |    .679877   .1150542     5.91   0.000     .4540491    .9057048
                     |
               _cons |   30.76325   2.699273    11.40   0.000     25.46513    36.06137
        -------------+----------------------------------------------------------------
             sigma_u |  3.1345762
             sigma_e |  1.0474911
                 rho |  .89954618   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        F test that all u_i=0: F(211, 839) = 14.60                   Prob > F = 0.0000
        
        .

        However, I would like to ask your opinion on the following:

        - Why, if my observation years run from 2013 to 2017 (both included), do I see only the years from 2014 onwards when I add the year dummy variables?
        - How can I control for the company effect if I don't invoke robust standard errors?
        - When I do the Hausman test, do I need to include the year dummy variables, or is that not necessary?
        - If my F value increases, does that mean the model becomes more significant?

        Many thanks in advance,

        Francesca



        • #34
          Francesca:
          - I would keep the model with cluster-robust SEs (and this also answers your question #2), because those SEs are actually higher than the default ones (for some coefficients they almost double);
          - 2013 is omitted by default to shelter your regression from the so-called dummy variable trap (https://en.wikipedia.org/wiki/Dummy_...(statistics));
          - yes, include the year dummy variables in both the -fe- and -re- regressions, which should now be compared via the community-contributed command -xtoverid-, since you invoked non-default SEs, which -hausman- does not support (see the sketch below this list);
          - not quite: the F-test investigates whether your coefficients jointly differ from zero.
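          A minimal sketch of the -xtoverid- route (assuming the community-contributed command is installed, e.g. via -ssc install xtoverid-, and the same specification as in #33):

          Code:
          * estimate the -re- counterpart with the same cluster-robust SEs, then test
          xtreg TobinsQ ROA DE LNTA YoYSales RDCS i.Years, re vce(cluster Company1)
          xtoverid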
          Kind regards,
          Carlo
          (Stata 19.0)



          • #35
            Thank you very much as always, Carlo, and sorry for the delayed response!

            First of all, merry Christmas and happy holidays! I just realized that I did not take into account potential outliers that should perhaps be eliminated from my model. Do you think this is a major issue, or can I avoid checking for outliers?

            Many thanks again and my best wishes,

            Francesca



            • #36
              Francesca:
              I do reciprocate best wishes for the Xmas season to you and your dears.
              Sticking with statistics, eliminating outliers (unless you're 100% sure that they are the result of mistaken data entry) is, in general, a very bad idea. What we call outliers are, in general, expressions of the data-generating process underlying our samples. For instance, the statistical distribution of the total cost of a given activity is frequently positively skewed (gamma distribution).
              In sum, think very carefully about eliminating "weird" observations: personally, I do not advise that approach.
              Kind regards,
              Carlo
              (Stata 19.0)



              • #37
                Thank you very much Carlo! However, I have another concern. When I apply robust standard errors and plot the residuals again to check for heteroskedasticity, the plot seems to stay the same as before invoking robust standard errors (I attach the graph below). Is that normal?



                • #38
                  This is the output after invoking robust standard errors
                  Attached Files



                  • #39
                    Would it be better to apply the natural logarithm to some of the variables? If I do so and plot for heteroskedasticity again, this is the output I get:
                    Code:
                    xtreg LNTobinsQ LNTA ROA LNDE lnYoYSales i.Years, fe
                    
                    Fixed-effects (within) regression               Number of obs     =      1,049
                    Group variable: Company1                        Number of groups  =        210
                    
                    R-sq:                                           Obs per group:
                         within  = 0.2717                                         min =          4
                         between = 0.0011                                         avg =        5.0
                         overall = 0.0058                                         max =          5
                    
                                                                    F(8,831)          =      38.76
                    corr(u_i, Xb)  = -0.7684                        Prob > F          =     0.0000
                    
                    ------------------------------------------------------------------------------
                       LNTobinsQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                            LNTA |  -.3862058   .0367041   -10.52   0.000    -.4582493   -.3141622
                             ROA |   .9756964   .1211722     8.05   0.000     .7378569    1.213536
                            LNDE |  -.0886246   .0141089    -6.28   0.000    -.1163178   -.0609314
                      lnYoYSales |   .0513431    .009779     5.25   0.000     .0321487    .0705375
                                 |
                           Years |
                           2014  |   .0703752   .0295889     2.38   0.018     .0122975    .1284529
                           2015  |   .0508165    .030604     1.66   0.097    -.0092537    .1108866
                           2016  |   .1006498   .0316692     3.18   0.002     .0384888    .1628109
                           2017  |    .266613   .0331686     8.04   0.000     .2015089    .3317171
                                 |
                           _cons |   8.532238   .7666167    11.13   0.000     7.027505    10.03697
                    -------------+----------------------------------------------------------------
                         sigma_u |  .98246626
                         sigma_e |  .29886225
                             rho |  .91530234   (fraction of variance due to u_i)
                    ------------------------------------------------------------------------------
                    F test that all u_i=0: F(209, 831) = 19.11                   Prob > F = 0.0000
                    The R-squared increases from 17% to 27% and the heteroskedasticity improves:
                    Attached Files



                    • #40
                      Thank you very much in advance.



                      • #41
                        Francesca:
                        what you experienced (#37) happens to everybody when getting familiar with -regress- (and I lead the queue of those frightened by seeing the same graph before and after invoking robust standard errors).
                        Indeed, re-checking for heteroskedasticity after invoking cluster-robust standard errors is not useful at all: the graph will remain the same because the -robust- option corrects the standard errors, not the residuals.
                        As far as your second question is concerned, there actually does seem to be heteroskedasticity in your last graph (the spread of the residuals seems to widen as the fitted values increase). It is worth checking whether the heteroskedasticity depends on misspecification (a rough way to do that is sketched below).
                        Logging some variables at random makes little methodological sense (by the way, from your code I cannot tell which variables, if any, have already been logged): logging the regressand and/or the predictors creates different regression models, whose coefficients can be difficult to understand and/or disseminate.
                        If the heteroskedasticity is not due to model misspecification, invoking cluster-robust standard errors is, in general, enough.
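                        A rough sketch of such a check (my usual manual RESET-style approach, an assumption on my part rather than a formal panel RESET, using the -fe- specification from your #33):

                        Code:
                        quietly xtreg TobinsQ ROA DE LNTA YoYSales RDCS i.Years, fe vce(cluster Company1)
                        predict double fitted, xb
                        generate double fitted2 = fitted^2
                        generate double fitted3 = fitted^3
                        * if the powers of the fitted values are jointly significant, the functional form is suspect
                        xtreg TobinsQ ROA DE LNTA YoYSales RDCS i.Years fitted2 fitted3, fe vce(cluster Company1)
                        testparm fitted2 fitted3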
                        Kind regards,
                        Carlo
                        (Stata 19.0)



                        • #42
                          Just to add on to Carlo's helpful comments, you're fixating way too much on heteroskedasticity. It's much more likely that serial correlation is a bigger issue, if you think of these problems the way they are traditionally taught. How come you're testing for heteroskedasticity and not serial correlation?

                          The point is, you should test for neither. The clustering accounts for any kind of heteroskedasticity and any kind of serial correlation. You're not changing the estimates -- it's still standard fixed effects -- but you're computing robust standard errors. They're robust to heteroskedasticity and serial correlation, and that's why almost all empirical researchers compute them now and do not even bother to check if either is a problem. The standard errors work in either case.

                          Two things about taking the log of TobinsQ. First, note that you've lost 11 observations, which is due to TobinsQ <=0 in 11 cases. And you've lost two firms entirely. So you don't want to do this. And even if you did, it makes no sense to compare R-squareds across different transformations of the dependent variable. It's possible to compute an R-squared that is comparable, but, since you are losing data taking the logs, you shouldn't do that. It is often true that the R-squared using logs is higher but that doesn't mean you should do it, especially when the samples aren't comparable.

                          Do you have any negative values of TobinsQ, or just zeros?

                          Above I suggested that you stop with the results in post #33, using the results with clustered standard errors. That's still my suggestion. Is someone insisting you test for heteroskedasticity?



                          • #43
                            Dear Carlo and Jeff, thank you very much for your support, it is really important to me! First of all, apologies for my late response and wish you all the best for the new year!

                            I understand that heteroskedasticity is definitely not a big issue (especially after applying robust standard errors), and thank you for your patience in explaining to me why. I was using previous theses with regression analyses as a reference, and I noticed that they checked for heteroskedasticity before and after some transformations; that's why I was erroneously focusing on it too much. The minimum value of my Tobin's Q is 0.18, and now I also understand why it doesn't make sense to apply the log. So, really, thank you.

                            However, I would like to kindly ask you another clarification on my analysis:

                            - When I run the regression including time effects, i.e. the year dummy variables, what is a good explanation for the fact that only one year is statistically significant?

                            Below I provide an output for your reference:

                            Code:
                            . xtreg TobinsQ LNTA ROA DE YoYSales RDS i.Years, fe vce(cluster Company1)
                            
                            Fixed-effects (within) regression               Number of obs     =      1,060
                            Group variable: Company1                        Number of groups  =        212
                            
                            R-sq:                                           Obs per group:
                                 within  = 0.1768                                         min =          5
                                 between = 0.0030                                         avg =        5.0
                                 overall = 0.0071                                         max =          5
                            
                                                                            F(9,211)          =      10.54
                            corr(u_i, Xb)  = -0.7614                        Prob > F          =     0.0000
                            
                                                         (Std. Err. adjusted for 212 clusters in Company1)
                            ------------------------------------------------------------------------------
                                         |               Robust
                                 TobinsQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                            -------------+----------------------------------------------------------------
                                    LNTA |  -1.201734   .2806209    -4.28   0.000    -1.754914   -.6485543
                                     ROA |   1.799148   .4931036     3.65   0.000      .827107    2.771189
                                      DE |  -.2142444   .0701898    -3.05   0.003    -.3526075   -.0758813
                                YoYSales |   .8832866   .2885457     3.06   0.002     .3144848    1.452088
                                     RDS |  -.9375915   1.432355    -0.65   0.513     -3.76115    1.885967
                                         |
                                   Years |
                                   2014  |   .1154523   .0818341     1.41   0.160    -.0458648    .2767695
                                   2015  |   .0553842   .1094215     0.51   0.613    -.1603152    .2710835
                                   2016  |   .1661806   .0992611     1.67   0.096      -.02949    .3618511
                                   2017  |   .6106977   .1347841     4.53   0.000     .3450019    .8763936
                                         |
                                   _cons |   27.29121   5.722012     4.77   0.000     16.01158    38.57085
                            -------------+----------------------------------------------------------------
                                 sigma_u |  2.9067784
                                 sigma_e |  1.0554858
                                     rho |  .88350911   (fraction of variance due to u_i)
                            ------------------------------------------------------------------------------
                            
                            .
                            Many thanks in advance!

                            Francesca



                            • #44
                              Francesca:
                              best wishes for the newly begun 2020 to you, too.
                              You should rather consider the joint statistical significance of -i.Years- via:
                              Code:
                              testparm i.Years

                              It may well be that -i.Years-, once adjusted for the remaining predictors, does not play a relevant (jointly speaking) role in explaining the within-panel variation of the regressand (as you're dealing with an -fe- specification).
                              Kind regards,
                              Carlo
                              (Stata 19.0)



                              • #45
                                Thank you very much Carlo!

                                Now that I am going to interpret the coefficients, I am wondering if my interpretation is correct. This is the output I get, for example, from my first model:

                                Code:
                                . xtreg TobinsQ LNTA ROA DE YoYSales RDS i.Years, fe vce(cluster Company1)
                                
                                Fixed-effects (within) regression               Number of obs     =      1,060
                                Group variable: Company1                        Number of groups  =        212
                                
                                R-sq:                                           Obs per group:
                                     within  = 0.1768                                         min =          5
                                     between = 0.0030                                         avg =        5.0
                                     overall = 0.0071                                         max =          5
                                
                                                                                F(9,211)          =      10.54
                                corr(u_i, Xb)  = -0.7614                        Prob > F          =     0.0000
                                
                                                             (Std. Err. adjusted for 212 clusters in Company1)
                                ------------------------------------------------------------------------------
                                             |               Robust
                                     TobinsQ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                                -------------+----------------------------------------------------------------
                                        LNTA |  -1.201734   .2806209    -4.28   0.000    -1.754914   -.6485543
                                         ROA |   1.799148   .4931036     3.65   0.000      .827107    2.771189
                                          DE |  -.2142444   .0701898    -3.05   0.003    -.3526075   -.0758813
                                    YoYSales |   .8832866   .2885457     3.06   0.002     .3144848    1.452088
                                         RDS |  -.9375915   1.432355    -0.65   0.513     -3.76115    1.885967
                                             |
                                       Years |
                                       2014  |   .1154523   .0818341     1.41   0.160    -.0458648    .2767695
                                       2015  |   .0553842   .1094215     0.51   0.613    -.1603152    .2710835
                                       2016  |   .1661806   .0992611     1.67   0.096      -.02949    .3618511
                                       2017  |   .6106977   .1347841     4.53   0.000     .3450019    .8763936
                                             |
                                       _cons |   27.29121   5.722012     4.77   0.000     16.01158    38.57085
                                -------------+----------------------------------------------------------------
                                     sigma_u |  2.9067784
                                     sigma_e |  1.0554858
                                         rho |  .88350911   (fraction of variance due to u_i)
                                ------------------------------------------------------------------------------
                                
                                .
                                Is it correct to state the following (holding the other regressors fixed)?
                                • When the natural logarithm of total assets increases by one unit, Tobin's Q decreases by about 1.2 units;
                                • When the return on assets increases by one unit, Tobin's Q increases by about 1.8 units;
                                • When the debt-to-equity ratio increases by one unit, Tobin's Q decreases by about 0.2 units;
                                • When year-over-year sales growth increases by one unit, Tobin's Q increases by about 0.9 units;
                                • When research and development expenditure intensity increases by one unit, Tobin's Q decreases by about 0.9 units.
                                Or should I consider the increases/decreases in percentages rather than in units?
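                                For example (just my attempt, assuming -lincom- is the right tool here), to express the LNTA effect for a 1% increase in total assets I could run, right after the regression above:

                                Code:
                                * a 1% increase in total assets raises LNTA by roughly ln(1.01), i.e. about 0.01,
                                * so the implied change in Tobin's Q is about 0.01 times the LNTA coefficient
                                lincom 0.01*LNTA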

                                Many thanks in advance,

                                Francesca

