
  • Panel Negative Binomial Regression Multicollinearity

    Dear forum community,

    I am having some trouble carrying out the results part of my thesis. Since I am a complete beginner with Stata and even newer to this forum, I apologize in advance if something doesn't follow the guidelines; I have tried to read through all of them and made an effort to respect them.

    The research was initially focused on a sample of 300 biotech/pharma firms; due to lack of data, the final sample consists of 78 firms over a period of 6 years (2013-2018).

    Let me provide some more context on the model I am trying to run. The dependent variable, Female Board Presence (FBP), was calculated as 1 - gender ratio, where gender ratio = the number of male directors / the total number of directors on the board. (I had to use this measure because the tools provided by my university did not give me access to the number of women on the board, so this was my workaround.) The independent variable is the number of shared patents filed by company X in year y. I have two moderators: educational diversity (calculated through Blau's index) and outside director presence (calculated as the ratio of outside directors to the total number of directors). The control variables are firm age, board size (total number of directors), and a dummy variable I created that indicates the sector (there are 2 sectors, so sector1 = 1 and sector2 = 0).
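    To make the setup concrete, the variable construction described above could be sketched in Stata roughly as follows. The raw variable names (male_directors, total_directors, outside_directors, and the educational-category shares p_edu1-p_edu3) are hypothetical; only the formulas come from the post.

    ```stata
    * Female Board Presence: 1 minus the gender ratio (male/total directors)
    gen bfp = 1 - male_directors/total_directors

    * Outside director presence: outside directors over total directors
    gen odp = outside_directors/total_directors

    * Educational diversity via Blau's index: 1 minus the sum of squared
    * category shares (p_edu1..p_edu3 are assumed category proportions)
    gen ded = 1 - (p_edu1^2 + p_edu2^2 + p_edu3^2)
    ```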

    here's a summary of the descriptives:

    Variable | Obs Mean Std. dev. Min Max
    -------------+---------------------------------------------------------
    shared_pat~s | 468 2.728632 8.42482 0 108
    bfp | 468 .1582265 .0977436 0 .42
    ded | 468 .653953 .1531496 .18 .81
    odp | 468 .8656197 .219407 0 1.4
    firmage | 468 33.20513 39.73419 0 118
    -------------+---------------------------------------------------------
    boardsize | 468 9.138889 2.320181 5 16


    Now, because of the count nature of my dependent variable, I want to run a negative binomial regression with random effects.

    The problem is that when I run the full model:

    xtnbreg shared_patents firmage boardsize sector c.bfp##c.ded c.bfp##c.odp

    I get the error: note: bfp omitted because of collinearity.

    The way I tried to approach this issue was to check for multicollinearity and try to spot what is causing the problem. I went ahead and checked the VIFs:

    . regress shared_patents firmage boardsize sector sector c.bfp##c.ded c.bfp##c.odp
    note: sector omitted because of collinearity.
    note: bfp omitted because of collinearity.


    Source | SS df MS Number of obs = 468
    -------------+---------------------------------- F(8, 459) = 12.72
    Model | 6015.1853 8 751.898163 Prob > F = 0.0000
    Residual | 27131.351 459 59.1096972 R-squared = 0.1815
    -------------+---------------------------------- Adj R-squared = 0.1672
    Total | 33146.5363 467 70.9775938 Root MSE = 7.6883

    ------------------------------------------------------------------------------
    shared_pat~s | Coefficient Std. err. t P>|t| [95% conf. interval]
    -------------+----------------------------------------------------------------
    firmage | .0377138 .0116805 3.23 0.001 .0147599 .0606677
    boardsize | .739291 .2035207 3.63 0.000 .3393433 1.139239
    sector | .6688978 .8866354 0.75 0.451 -1.07347 2.411266
    sector | 0 (omitted)
    bfp | -4.11039 24.08792 -0.17 0.865 -51.44667 43.22589
    ded | -2.111455 5.097674 -0.41 0.679 -12.12913 7.906217
    |
    c.bfp#c.ded | -38.42171 27.7684 -1.38 0.167 -92.99067 16.14725
    |
    bfp | 0 (omitted)
    odp | -5.502319 3.071279 -1.79 0.074 -11.53783 .5331923
    |
    c.bfp#c.odp | 30.55852 17.83892 1.71 0.087 -4.497558 65.61459
    |
    _cons | .8158366 3.848849 0.21 0.832 -6.747713 8.379386
    ------------------------------------------------------------------------------

    . vif

    Variable | VIF 1/VIF
    -------------+----------------------
    firmage | 1.70 0.587612
    boardsize | 1.76 0.567651
    sector | 1.51 0.664055
    bfp | 43.80 0.022833
    ded | 4.82 0.207666
    c.bfp#c.ded | 30.24 0.033071
    odp | 3.59 0.278742
    c.bfp#c.odp | 21.50 0.046515
    -------------+----------------------
    Mean VIF | 13.61

    and I saw (I don't know if this interpretation is correct) that concerning VIF values are displayed for bfp, ded, and odp. Below is the correlation matrix:


    | shared~s bfp odp ded firmage boards~e sector
    -------------+---------------------------------------------------------------
    shared_pat~s | 1.0000
    bfp | 0.1223 1.0000
    odp | -0.0322 0.1396 1.0000
    ded | -0.2820 -0.0048 0.1336 1.0000
    firmage | 0.3380 0.3779 0.0759 -0.3637 1.0000
    boardsize | 0.3112 0.5049 0.0053 -0.1704 0.5164 1.0000
    sector | 0.2544 0.2086 -0.0894 -0.4237 0.4187 0.4305 1.0000



    Now, I know problems could arise because many of my variables are ratios that share the same components (total number of directors, etc.). Given these challenges and my limited Stata expertise, I would appreciate guidance on navigating this issue without excluding the key moderators. Any insights or suggested directions would be immensely valuable.

    Thank you for your time and expertise.

  • #2
    c.bfp##c.ded c.bfp##c.odp

    this is going to enter the bfp variable standing alone twice. Stata has ignored the duplicate and given you results, which is typical.

    try:

    bfp ded odp c.bfp#(c.ded c.odp)
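    For context, `##` expands to the main effects plus the interaction, so the original specification enters bfp twice; listing each main effect once and using `#` only for the interactions avoids the duplication. A sketch of the resulting full command, with the same variables as in the post:

    ```stata
    * c.bfp##c.ded expands to c.bfp c.ded c.bfp#c.ded, and likewise for
    * c.bfp##c.odp, so bfp appeared twice in the original command.
    * Main effects listed once, # used for the interactions only:
    xtnbreg shared_patents firmage boardsize sector bfp ded odp c.bfp#(c.ded c.odp)
    ```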



    Comment


    • #3
      Thank you so much @George.Ford and sorry for the super basic question! Just to be sure, since I want to run the following separate models to assess the impacts:

      Model 1: controls only
      Model 2: controls + moderators (direct)
      Model 3: controls + moderators (direct) + independent variable
      Model 4: controls + moderators (direct) + independent variable + interactions

      Is this code correct?

      xtnbreg shared_patents firmsize firmage sector
      xtnbreg shared_patents firmsize firmage sector bfp
      xtnbreg shared_patents firmsize firmage sector bfp ded odp
      xtnbreg shared_patents firmsize firmage sector bfp ded odp c.bfp#(c.ded c.odp)

      (attaching a picture of what my final model looks like)


      Thank you in advance for the help!

      Comment


      • #4
        Looks fine I suppose. I guess you don't need an interaction of ded & odp.

        What sort of variable is sector? If that's categorical, then you need i.sector or else absorb sector as a fixed effect.

        Comment


        • #5
          Thank you again. It's categorical: 0 for biopharma and 1 for biotech. I think i.sector should work fine?

          Comment


          • #6
            if a dummy, you're fine.

            I'd use xtpoisson with robust or clustered standard errors rather than xtnbreg.

            Comment


            • #7
              Thanks for the feedback, very much appreciated. Could I ask why you would suggest using Poisson? I ask because I previously ran a Poisson model with this dataset (xtpoisson shared_patents firmsize firmage sector bfp ded odp c.bfp#(c.ded c.odp), vce(robust)), but the significant likelihood ratio test for alpha (LR test of alpha=0: chibar2(01) = 1729.05, p < 0.000) indicated that overdispersion was present, so I opted for a negative binomial model over a Poisson model. Am I missing some extra information that would point in the direction of using Poisson?

              Thank you in advance for your reply.

              Comment


              • #8
                robust takes care of the overdispersion. Jeff Wooldridge has recommended poisson(robust) over negbin on Statalist numerous times, and I tend to take his advice.

                see, e.g.,
                https://www.statalist.org/forums/forum/general-stata-discussion/general/1587040-why-do-poisson-and-negative-binomial-regressions-yield-the-same-result

                Comment


                • #9
                  Thanks. After reviewing the relevant material, I ran both models; below are the code and results:

                  . xtnbreg shared_patents firmsize firmage sector bfp ded odp c.bfp#(c.ded c.odp)

                  Fitting negative binomial (constant dispersion) model:

                  Iteration 0: Log likelihood = -1628.2978
                  Iteration 1: Log likelihood = -1538.8523
                  Iteration 2: Log likelihood = -1538.1837
                  Iteration 3: Log likelihood = -1538.1835
                  Iteration 4: Log likelihood = -1538.1835

                  Iteration 0: Log likelihood = -1345.3932
                  Iteration 1: Log likelihood = -1064.5443
                  Iteration 2: Log likelihood = -1040.9417
                  Iteration 3: Log likelihood = -774.01387
                  Iteration 4: Log likelihood = -773.92482
                  Iteration 5: Log likelihood = -773.92481

                  Iteration 0: Log likelihood = -773.92481
                  Iteration 1: Log likelihood = -701.96162
                  Iteration 2: Log likelihood = -676.68493
                  Iteration 3: Log likelihood = -674.3759
                  Iteration 4: Log likelihood = -674.37241
                  Iteration 5: Log likelihood = -674.37241

                  Fitting full model:

                  Iteration 0: Log likelihood = -602.53497
                  Iteration 1: Log likelihood = -582.52742
                  Iteration 2: Log likelihood = -573.18202
                  Iteration 3: Log likelihood = -572.45289
                  Iteration 4: Log likelihood = -572.45174
                  Iteration 5: Log likelihood = -572.45174

                  Random-effects negative binomial regression Number of obs = 468
                  Group variable: firm_id Number of groups = 78

                  Random effects u_i ~ Beta Obs per group:
                  min = 6
                  avg = 6.0
                  max = 6

                  Wald chi2(8) = 82.93
                  Log likelihood = -572.45174 Prob > chi2 = 0.0000

                  --------------------------------------------------------------------------------
                  shared_patents | Coefficient Std. err. z P>|z| [95% conf. interval]
                  ---------------+----------------------------------------------------------------
                  firmsize | .1276628 .053743 2.38 0.018 .0223285 .2329971
                  firmage | .0022397 .0035592 0.63 0.529 -.0047361 .0092156
                  sector | 2.094113 .4263133 4.91 0.000 1.258554 2.929672
                  bfp | -5.93071 3.304826 -1.79 0.073 -12.40805 .5466297
                  ded | -2.748073 .8683171 -3.16 0.002 -4.449943 -1.046203
                  odp | -1.300383 .7269917 -1.79 0.074 -2.725261 .1244945
                  |
                  c.bfp#c.ded | 4.372284 4.303534 1.02 0.310 -4.062487 12.80705
                  |
                  c.bfp#c.odp | 3.235424 2.867049 1.13 0.259 -2.383888 8.854737
                  |
                  _cons | .3332097 .8653768 0.39 0.700 -1.362898 2.029317
                  ---------------+----------------------------------------------------------------
                  /ln_r | .3564353 .2214854 -.0776681 .7905386
                  /ln_s | -.3598179 .249906 -.8496246 .1299888
                  ---------------+----------------------------------------------------------------
                  r | 1.428229 .3163318 .9252715 2.204583
                  s | .6978034 .1743852 .4275754 1.138816
                  --------------------------------------------------------------------------------
                  LR test vs. pooled: chibar2(01) = 203.84 Prob >= chibar2 = 0.000

                  . xtpoisson shared_patents firmsize firmage sector bfp ded odp c.bfp#(c.ded c.odp), vce(robust)

                  Fitting Poisson model:

                  Iteration 0: Log pseudolikelihood = -1628.2978
                  Iteration 1: Log pseudolikelihood = -1538.8523
                  Iteration 2: Log pseudolikelihood = -1538.1837
                  Iteration 3: Log pseudolikelihood = -1538.1835
                  Iteration 4: Log pseudolikelihood = -1538.1835

                  Fitting full model:

                  Iteration 0: Log pseudolikelihood = -693.41018
                  Iteration 1: Log pseudolikelihood = -674.34085
                  Iteration 2: Log pseudolikelihood = -673.67798
                  Iteration 3: Log pseudolikelihood = -673.66078
                  Iteration 4: Log pseudolikelihood = -673.66073

                  Random-effects Poisson regression Number of obs = 468
                  Group variable: firm_id Number of groups = 78

                  Random effects u_i ~ Gamma Obs per group:
                  min = 6
                  avg = 6.0
                  max = 6

                  Wald chi2(8) = 105.95
                  Log pseudolikelihood = -673.66073 Prob > chi2 = 0.0000

                  (Std. err. adjusted for clustering on firm_id)
                  --------------------------------------------------------------------------------
                  | Robust
                  shared_patents | Coefficient std. err. z P>|z| [95% conf. interval]
                  ---------------+----------------------------------------------------------------
                  firmsize | .2510834 .0582504 4.31 0.000 .1369148 .365252
                  firmage | -.005319 .0323993 -0.16 0.870 -.0688205 .0581824
                  sector | 2.826403 2.416382 1.17 0.242 -1.909619 7.562425
                  bfp | -11.19526 5.19208 -2.16 0.031 -21.37155 -1.018971
                  ded | -2.485036 1.224982 -2.03 0.042 -4.885957 -.0841147
                  odp | -2.016418 1.046289 -1.93 0.054 -4.067107 .0342708
                  |
                  c.bfp#c.ded | 7.26593 8.022931 0.91 0.365 -8.458726 22.99059
                  |
                  c.bfp#c.odp | 7.261983 2.98975 2.43 0.015 1.402181 13.12178
                  |
                  _cons | -.4753957 1.577055 -0.30 0.763 -3.566367 2.615576
                  ---------------+----------------------------------------------------------------
                  /lnalpha | 1.046936 1.838643 -2.556738 4.65061
                  ---------------+----------------------------------------------------------------
                  alpha | 2.848909 5.238127 .0775573 104.6488
                  --------------------------------------------------------------------------------
                  LR test of alpha=0: chibar2(01) = 1729.05 Prob >= chibar2 = 0.000

                  However, I still have some questions regarding the robust standard errors: how can I take these into account when explaining whether the results confirm or reject my hypotheses?

                  Thank you in advance.

                  Comment


                  • #10
                    Thanks for the plug, George. I should clarify something: I wouldn't use the Poisson random effects estimator because its consistency rests on a set of very strong assumptions. These include the Poisson distribution being correct, the heterogeneity having a gamma distribution, and serial independence conditional on the covariates and heterogeneity. Poisson FE is a completely different matter: it's fully robust. So is Poisson regression in cross sections. I don't know how badly behaved Poisson RE is, but I wouldn't use it.

                    Giuditta: If your key explanatory variable changes over time, I'd try Poisson FE.
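                    A minimal sketch of that suggestion, using the variable names from the thread (a year variable is assumed to exist alongside firm_id; with fe, vce(robust) is clustered on the panel variable):

                    ```stata
                    * Declare the panel (firm identifier and year variable assumed to exist)
                    xtset firm_id year

                    * Fixed-effects Poisson with cluster-robust standard errors;
                    * time-invariant regressors such as sector are absorbed by the
                    * fixed effects, so sector is omitted from the regressor list
                    xtpoisson shared_patents firmsize firmage bfp ded odp c.bfp#(c.ded c.odp), fe vce(robust)
                    ```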

                    Comment


                    • #11
                      Thanks for the feedback to both of you!

                      Hi Jeff, my key explanatory variable is the number of females on a firm's board, which changes over time. Would you then say that unless my data respect the conditions mentioned above for the Poisson RE, I shouldn't use it?

                      Many thanks

                      Comment


                      • #12
                        In addition, I am concerned about my final sample size of 78 firms (over 6 years, so 468 observations). Is either of the models preferred for dealing with a small sample like this?

                        Comment
