
  • Panel Negative Binomial Regression Multicollinearity

    Dear forum community,

    I am having some trouble carrying out the results part of my thesis. Since I am a complete beginner with Stata and even newer to this forum, I apologize in advance if something doesn't follow the guidelines; I have tried to read through all of them and made an effort to respect them.

    The research was initially focused on a sample of 300 biotech/pharma firms; due to lack of data, the final sample consists of 78 firms over a period of 6 years (2013-2018).

    Let me provide some more context on the model I am trying to run. The dependent variable, Female Board Presence (FBP), was calculated as 1 - gender ratio, where gender ratio = the number of male directors / the total number of directors on the board. (I had to use this measure because the tools provided by my university did not give me access to the number of women on the board, so this was my workaround.) The independent variable is the number of shared patents filed by company X in year y. I have two moderators: educational diversity (calculated through Blau's index) and outside director presence (calculated as the ratio of outside directors to the total number of directors). The control variables are firm age, board size (total number of directors), and a dummy variable I created that indicates the sector (there are 2 sectors, so sector1 = 1 and sector2 = 0).
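    To make the setup concrete, the variable construction described above could be sketched in Stata roughly as follows. The raw variable names (male_directors, total_directors, outside_directors, and the educational-category shares p_edu1-p_edu3) are hypothetical; only the formulas come from the post.

    ```stata
    * Female Board Presence: 1 minus the gender ratio (male/total directors)
    gen bfp = 1 - male_directors/total_directors

    * Outside director presence: outside directors over total directors
    gen odp = outside_directors/total_directors

    * Educational diversity via Blau's index: 1 minus the sum of squared
    * category shares (p_edu1..p_edu3 are assumed category proportions)
    gen ded = 1 - (p_edu1^2 + p_edu2^2 + p_edu3^2)
    ```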

    here's a summary of the descriptives:

    Variable | Obs Mean Std. dev. Min Max
    -------------+---------------------------------------------------------
    shared_pat~s | 468 2.728632 8.42482 0 108
    bfp | 468 .1582265 .0977436 0 .42
    ded | 468 .653953 .1531496 .18 .81
    odp | 468 .8656197 .219407 0 1.4
    firmage | 468 33.20513 39.73419 0 118
    -------------+---------------------------------------------------------
    boardsize | 468 9.138889 2.320181 5 16


    Now, because of the count nature of my dependent variable, I want to run a negative binomial regression with random effects.

    The problem is that when I run the full model:

    xtnbreg shared_patents firmage boardsize sector c.bfp##c.ded c.bfp##c.odp

    I get the error: note: bfp omitted because of collinearity.

    The way I tried to approach this issue was to check for multicollinearity and try to spot what is causing the problem. I went ahead and checked the VIFs:

    . regress shared_patents firmage boardsize sector sector c.bfp##c.ded c.bfp##c.odp
    note: sector omitted because of collinearity.
    note: bfp omitted because of collinearity.


    Source | SS df MS Number of obs = 468
    -------------+---------------------------------- F(8, 459) = 12.72
    Model | 6015.1853 8 751.898163 Prob > F = 0.0000
    Residual | 27131.351 459 59.1096972 R-squared = 0.1815
    -------------+---------------------------------- Adj R-squared = 0.1672
    Total | 33146.5363 467 70.9775938 Root MSE = 7.6883

    ------------------------------------------------------------------------------
    shared_pat~s | Coefficient Std. err. t P>|t| [95% conf. interval]
    -------------+----------------------------------------------------------------
    firmage | .0377138 .0116805 3.23 0.001 .0147599 .0606677
    boardsize | .739291 .2035207 3.63 0.000 .3393433 1.139239
    sector | .6688978 .8866354 0.75 0.451 -1.07347 2.411266
    sector | 0 (omitted)
    bfp | -4.11039 24.08792 -0.17 0.865 -51.44667 43.22589
    ded | -2.111455 5.097674 -0.41 0.679 -12.12913 7.906217
    |
    c.bfp#c.ded | -38.42171 27.7684 -1.38 0.167 -92.99067 16.14725
    |
    bfp | 0 (omitted)
    odp | -5.502319 3.071279 -1.79 0.074 -11.53783 .5331923
    |
    c.bfp#c.odp | 30.55852 17.83892 1.71 0.087 -4.497558 65.61459
    |
    _cons | .8158366 3.848849 0.21 0.832 -6.747713 8.379386
    ------------------------------------------------------------------------------

    . vif

    Variable | VIF 1/VIF
    -------------+----------------------
    firmage | 1.70 0.587612
    boardsize | 1.76 0.567651
    sector | 1.51 0.664055
    bfp | 43.80 0.022833
    ded | 4.82 0.207666
    c.bfp#c.ded | 30.24 0.033071
    odp | 3.59 0.278742
    c.bfp#c.odp | 21.50 0.046515
    -------------+----------------------
    Mean VIF | 13.61

    and I saw (I don't know if this interpretation is correct) that concerning VIF values are displayed for bfp, ded, and odp. Below is the correlation matrix:


    | shared~s bfp odp ded firmage boards~e sector
    -------------+---------------------------------------------------------------
    shared_pat~s | 1.0000
    bfp | 0.1223 1.0000
    odp | -0.0322 0.1396 1.0000
    ded | -0.2820 -0.0048 0.1336 1.0000
    firmage | 0.3380 0.3779 0.0759 -0.3637 1.0000
    boardsize | 0.3112 0.5049 0.0053 -0.1704 0.5164 1.0000
    sector | 0.2544 0.2086 -0.0894 -0.4237 0.4187 0.4305 1.0000



    Now, I know problems could arise because many of my variables are ratios that share the same components (total number of directors, etc.). Given these challenges and my limited Stata expertise, I would appreciate guidance on navigating this issue without excluding the key moderators. Any insights or suggested directions would be immensely valuable.

    Thank you for your time and expertise.

  • #2
    c.bfp##c.ded c.bfp##c.odp

    this is going to enter the bfp variable standing alone twice. Stata has ignored the duplicate and given you results, which is typical.

    try:

    bfp ded odp c.bfp#(c.ded c.odp)
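    For context, `##` expands to the main effects plus the interaction, so the original specification enters bfp twice; listing each main effect once and using `#` only for the interactions avoids the duplication. A sketch of the resulting full command, with the same variables as in the post:

    ```stata
    * c.bfp##c.ded expands to c.bfp c.ded c.bfp#c.ded, and likewise for
    * c.bfp##c.odp, so bfp appeared twice in the original command.
    * Main effects listed once, # used for the interactions only:
    xtnbreg shared_patents firmage boardsize sector bfp ded odp c.bfp#(c.ded c.odp)
    ```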



    Comment


    • #3
      Thank you so much @George.Ford and sorry for the super basic question! Just to be sure, since I want to run the following separate models to assess the impacts:

      Model 1: controls only
      Model 2: controls + moderators (direct)
      Model 3: controls + moderators (direct) + independent variable
      Model 4: controls + moderators (direct) + independent variable + interactions

      Is this code correct?

      xtnbreg shared_patents firmsize firmage sector
      xtnbreg shared_patents firmsize firmage sector bfp
      xtnbreg shared_patents firmsize firmage sector bfp ded odp
      xtnbreg shared_patents firmsize firmage sector bfp ded odp c.bfp#(c.ded c.odp)

      (attaching a picture of what my final model looks like)


      Thank you in advance for the help!

      Comment


      • #4
        Looks fine I suppose. I guess you don't need an interaction of ded & odp.

        What sort of variable is sector? If that's categorical, then you need i.sector or else absorb sector as a fixed effect.

        Comment


        • #5
          Thank you again. It's categorical: 0 for biopharma and 1 for biotech. I think i.sector should work fine?

          Comment


          • #6
            if a dummy, you're fine.

            I'd use xtpoisson with robust or clustered standard errors rather than xtnbreg.

            Comment


            • #7
              Thanks for the feedback, very much appreciated. Could I ask why you would suggest using Poisson? I ask because I previously ran a Poisson model with this dataset (xtpoisson shared_patents firmsize firmage sector bfp ded odp c.bfp#(c.ded c.odp), vce(robust)), but the significant likelihood ratio test for alpha (LR test of alpha=0: chibar2(01) = 1729.05, p < 0.000) indicated that overdispersion was present, so I opted for a negative binomial model over a Poisson model. Am I missing some extra information that would point in the direction of using Poisson?

              Thank you in advance for your reply.

              Comment


              • #8
                robust takes care of the overdispersion. Jeff Wooldridge has recommended poisson(robust) over negbin on Statalist numerous times, and I tend to take his advice.

                see, e.g.,
                https://www.statalist.org/forums/forum/general-stata-discussion/general/1587040-why-do-poisson-and-negative-binomial-regressions-yield-the-same-result

                Comment


                • #9
                  Thanks. After reviewing the relevant material, I ran both models; below are the code and results:

                  . xtnbreg shared_patents firmsize firmage sector bfp ded odp c.bfp#(c.ded c.odp)

                  Fitting negative binomial (constant dispersion) model:

                  Iteration 0: Log likelihood = -1628.2978
                  Iteration 1: Log likelihood = -1538.8523
                  Iteration 2: Log likelihood = -1538.1837
                  Iteration 3: Log likelihood = -1538.1835
                  Iteration 4: Log likelihood = -1538.1835

                  Iteration 0: Log likelihood = -1345.3932
                  Iteration 1: Log likelihood = -1064.5443
                  Iteration 2: Log likelihood = -1040.9417
                  Iteration 3: Log likelihood = -774.01387
                  Iteration 4: Log likelihood = -773.92482
                  Iteration 5: Log likelihood = -773.92481

                  Iteration 0: Log likelihood = -773.92481
                  Iteration 1: Log likelihood = -701.96162
                  Iteration 2: Log likelihood = -676.68493
                  Iteration 3: Log likelihood = -674.3759
                  Iteration 4: Log likelihood = -674.37241
                  Iteration 5: Log likelihood = -674.37241

                  Fitting full model:

                  Iteration 0: Log likelihood = -602.53497
                  Iteration 1: Log likelihood = -582.52742
                  Iteration 2: Log likelihood = -573.18202
                  Iteration 3: Log likelihood = -572.45289
                  Iteration 4: Log likelihood = -572.45174
                  Iteration 5: Log likelihood = -572.45174

                  Random-effects negative binomial regression Number of obs = 468
                  Group variable: firm_id Number of groups = 78

                  Random effects u_i ~ Beta Obs per group:
                  min = 6
                  avg = 6.0
                  max = 6

                  Wald chi2(8) = 82.93
                  Log likelihood = -572.45174 Prob > chi2 = 0.0000

                  --------------------------------------------------------------------------------
                  shared_patents | Coefficient Std. err. z P>|z| [95% conf. interval]
                  ---------------+----------------------------------------------------------------
                  firmsize | .1276628 .053743 2.38 0.018 .0223285 .2329971
                  firmage | .0022397 .0035592 0.63 0.529 -.0047361 .0092156
                  sector | 2.094113 .4263133 4.91 0.000 1.258554 2.929672
                  bfp | -5.93071 3.304826 -1.79 0.073 -12.40805 .5466297
                  ded | -2.748073 .8683171 -3.16 0.002 -4.449943 -1.046203
                  odp | -1.300383 .7269917 -1.79 0.074 -2.725261 .1244945
                  |
                  c.bfp#c.ded | 4.372284 4.303534 1.02 0.310 -4.062487 12.80705
                  |
                  c.bfp#c.odp | 3.235424 2.867049 1.13 0.259 -2.383888 8.854737
                  |
                  _cons | .3332097 .8653768 0.39 0.700 -1.362898 2.029317
                  ---------------+----------------------------------------------------------------
                  /ln_r | .3564353 .2214854 -.0776681 .7905386
                  /ln_s | -.3598179 .249906 -.8496246 .1299888
                  ---------------+----------------------------------------------------------------
                  r | 1.428229 .3163318 .9252715 2.204583
                  s | .6978034 .1743852 .4275754 1.138816
                  --------------------------------------------------------------------------------
                  LR test vs. pooled: chibar2(01) = 203.84 Prob >= chibar2 = 0.000

                  . xtpoisson shared_patents firmsize firmage sector bfp ded odp c.bfp#(c.ded c.odp), vce(robust)

                  Fitting Poisson model:

                  Iteration 0: Log pseudolikelihood = -1628.2978
                  Iteration 1: Log pseudolikelihood = -1538.8523
                  Iteration 2: Log pseudolikelihood = -1538.1837
                  Iteration 3: Log pseudolikelihood = -1538.1835
                  Iteration 4: Log pseudolikelihood = -1538.1835

                  Fitting full model:

                  Iteration 0: Log pseudolikelihood = -693.41018
                  Iteration 1: Log pseudolikelihood = -674.34085
                  Iteration 2: Log pseudolikelihood = -673.67798
                  Iteration 3: Log pseudolikelihood = -673.66078
                  Iteration 4: Log pseudolikelihood = -673.66073

                  Random-effects Poisson regression Number of obs = 468
                  Group variable: firm_id Number of groups = 78

                  Random effects u_i ~ Gamma Obs per group:
                  min = 6
                  avg = 6.0
                  max = 6

                  Wald chi2(8) = 105.95
                  Log pseudolikelihood = -673.66073 Prob > chi2 = 0.0000

                  (Std. err. adjusted for clustering on firm_id)
                  --------------------------------------------------------------------------------
                  | Robust
                  shared_patents | Coefficient std. err. z P>|z| [95% conf. interval]
                  ---------------+----------------------------------------------------------------
                  firmsize | .2510834 .0582504 4.31 0.000 .1369148 .365252
                  firmage | -.005319 .0323993 -0.16 0.870 -.0688205 .0581824
                  sector | 2.826403 2.416382 1.17 0.242 -1.909619 7.562425
                  bfp | -11.19526 5.19208 -2.16 0.031 -21.37155 -1.018971
                  ded | -2.485036 1.224982 -2.03 0.042 -4.885957 -.0841147
                  odp | -2.016418 1.046289 -1.93 0.054 -4.067107 .0342708
                  |
                  c.bfp#c.ded | 7.26593 8.022931 0.91 0.365 -8.458726 22.99059
                  |
                  c.bfp#c.odp | 7.261983 2.98975 2.43 0.015 1.402181 13.12178
                  |
                  _cons | -.4753957 1.577055 -0.30 0.763 -3.566367 2.615576
                  ---------------+----------------------------------------------------------------
                  /lnalpha | 1.046936 1.838643 -2.556738 4.65061
                  ---------------+----------------------------------------------------------------
                  alpha | 2.848909 5.238127 .0775573 104.6488
                  --------------------------------------------------------------------------------
                  LR test of alpha=0: chibar2(01) = 1729.05 Prob >= chibar2 = 0.000

                  However, I still have some questions regarding the robust standard errors: how can I take these into account when explaining whether the results confirm or reject my hypotheses?

                  Thank you in advance.

                  Comment


                  • #10
                    Thanks for the plug, George. I should clarify something: I wouldn't use the Poisson random effects estimator because its consistency rests on a set of very strong assumptions. These include the Poisson distribution being correct, the heterogeneity having a gamma distribution, and serial independence conditional on the covariates and heterogeneity. Poisson FE is a completely different matter: it's fully robust. So is Poisson regression in cross sections. I don't know how badly behaved Poisson RE is, but I wouldn't use it.

                    Giuditta: If your key explanatory variable changes over time, I'd try Poisson FE.
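                    A minimal sketch of that suggestion, using the variable names from the thread (a year variable is assumed to exist alongside firm_id; with fe, vce(robust) is clustered on the panel variable):

                    ```stata
                    * Declare the panel (firm identifier and year variable assumed to exist)
                    xtset firm_id year

                    * Fixed-effects Poisson with cluster-robust standard errors;
                    * time-invariant regressors such as sector are absorbed by the
                    * fixed effects, so sector is omitted from the regressor list
                    xtpoisson shared_patents firmsize firmage bfp ded odp c.bfp#(c.ded c.odp), fe vce(robust)
                    ```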

                    Comment


                    • #11
                      Thanks for the feedback to both of you!

                      Hi Jeff, my key explanatory variable is the number of females on a firm's board, which changes over time. Would you then say that unless my data respect the conditions mentioned above for the Poisson RE, I shouldn't use it?

                      Many thanks

                      Comment


                      • #12
                        In addition, I am concerned about my final sample size of 78 firms (over 6 years, so 468 observations). Is either of the models preferred for dealing with a small sample like this?

                        Comment
