  • Model for non-linear regression (?)

    Hey everyone,

    I really appreciate the support here in this forum.
    The more I learn about statistics and Stata, the more I question the model I am trying to estimate.

    I have unbalanced panel data.
    I want to analyze whether Corporate Venture Capital (CVC) influences the financial performance of a company.
    I have decided to add zeros to my panel data whenever a company did not invest in a given year (2009-2019).
    The zeros have the same qualitative meaning as the other values, namely the amount invested: no investment = zero amount.

    Adding zeros makes my plot look non-linear.

    . plot tq cvc     
      5.1325 +  
             | * *
             | *                                                         *
             | *                          *
             | *
        T    | *
        o    | *   *
        b    | **                  *            *
        i    | *   *
        n    | *                                       *
        '    | *  *                *
        s    | ** *                         *                       *
             | ** * *
        Q    | ***                                                *         *
             | ***  * *                    *                                  *
             | ** *   *                                            *
             | **   **      *         *      *    *                       *
             | *  *  ** * * *   ** *  *  *    *  *                            *
             | * **  **   **     *    *     * * *                             *
     .222395 + * *     *                         *                 *
                    0    Fund Total Estimated Equity Invested in      76.6452

    As far as I am aware, I can use

    . local controls "fs lev itq rdi growth cap_exp"
    . xtreg tq cvc `controls' i.fyear, fe vce(cluster gvkey)  
    Fixed-effects (within) regression               Number of obs     =        353
    Group variable: gvkey                           Number of groups  =         34
    R-squared:                                      Obs per group:
         Within  = 0.5024                                         min =          2
         Between = 0.4515                                         avg =       10.4
         Overall = 0.4733                                         max =         11
                                                    F(17,33)          =      31.15
    corr(u_i, Xb) = 0.0740                          Prob > F          =     0.0000
                                     (Std. err. adjusted for 34 clusters in gvkey)
                 |               Robust
              tq | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
             cvc |  -.0004955   .0020686    -0.24   0.812    -.0047041     .003713
              fs |  -.1565618   .1077493    -1.45   0.156    -.3757794    .0626558
             lev |   .0371103   .0150445     2.47   0.019     .0065021    .0677186
             itq |    .685569   .1433776     4.78   0.000     .3938652    .9772729
             rdi |  -6.298757   5.390539    -1.17   0.251    -17.26589    4.668377
          growth |   .0939487   .1110024     0.85   0.403    -.1318875    .3197848
         cap_exp |   1.192223   .7782992     1.53   0.135    -.3912389    2.775684
           fyear |
           2010  |  -.1237752   .0610782    -2.03   0.051    -.2480398    .0004894
           2011  |  -.1557566   .0697817    -2.23   0.033    -.2977285   -.0137846
           2012  |  -.1343674    .075972    -1.77   0.086    -.2889337    .0201988
           2013  |  -.0314902   .0584656    -0.54   0.594    -.1504393    .0874589
           2014  |   .0033938   .0763937     0.04   0.965    -.1520302    .1588179
           2015  |   .0466549   .0763477     0.61   0.545    -.1086757    .2019855
           2016  |   .0999757   .0848034     1.18   0.247    -.0725581    .2725095
           2017  |   .0848351      .0936     0.91   0.371    -.1055954    .2752657
           2018  |   .0135201   .0985238     0.14   0.892    -.1869281    .2139683
           2019  |   .0607907    .094598     0.64   0.525    -.1316703    .2532517
           _cons |   2.321139   1.365598     1.70   0.099    -.4571896    5.099468
         sigma_u |  .74063035
         sigma_e |  .29752864
             rho |  .86104329   (fraction of variance due to u_i)
    to run this model.

    I detected heteroskedasticity and autocorrelation but no multicollinearity (the VIFs are small), and -fe- is appropriate.

    Is my model correct, or can someone recommend a better model/command for my case?
    I appreciate your support!

    Thank you
    Kind regards,

    I fail to get the title of your post, as you actually ran a linear model.
    The issue here seems to rest on the assumption that no investment in a given year = 0. But it may well be that the investment was simply unreported (missing). Unless this is traditional in your research field (corporate finance, I guess), it is a pretty strong assumption.
    I would double-check with your professor/teacher/supervisor/senior colleagues.
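    One way to see how much hinges on the zero-fill assumption is to flag the filled-in observations and compare the estimates with and without them. A minimal sketch, assuming -cvc- is still missing (.) in the years without a recorded investment and that the data are already -xtset-; the flag name cvc_filled is hypothetical:

    ```stata
    * Assumption: cvc is still missing (.) where no investment was recorded
    generate byte cvc_filled = missing(cvc)   // hypothetical flag: 1 = zero was filled in
    replace cvc = 0 if cvc_filled

    local controls "fs lev itq rdi growth cap_exp"

    * Zero-filled sample
    xtreg tq cvc `controls' i.fyear, fe vce(cluster gvkey)

    * Reported investments only, as a sensitivity check
    xtreg tq cvc `controls' i.fyear if !cvc_filled, fe vce(cluster gvkey)
    ```

    If the two sets of estimates differ markedly, the results depend on treating unreported investment as zero.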
    As far as the -xtreg, fe- regression is concerned, it looks OK to me (I assume that the -re- specification was outperformed by -fe-), although you do not mention any postestimation command/code aimed at investigating possible model misspecification.
    I would also test the joint statistical significance of -i.fyear- via -testparm-.
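    The joint test could look like the following sketch, which reuses the variable names from the original post and assumes the data are already -xtset-:

    ```stata
    local controls "fs lev itq rdi growth cap_exp"
    xtreg tq cvc `controls' i.fyear, fe vce(cluster gvkey)

    * H0: the coefficients on all year dummies are jointly zero
    testparm i.fyear
    ```

    A small p-value would support keeping the year dummies in the model.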
    Last edited by Carlo Lazzaro; 01 Dec 2021, 02:40.
    Kind regards,
    (StataNow 18.5)


      Thanks for all your help so far!
      Actually, that is one of my big uncertainties. Is my model linear or not? In the beginning I thought it was linear, and now I actually think it is not. I am pretty confused.
      I thought I had to check for linearity by looking at the plot, and from my point of view it does not look linear. Am I correct or wrong?

      So if it is not linear, which command do I have to switch to?

      Best regards,


        Linearity relates to the coefficients, not the variables.
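        Because linearity concerns the coefficients, a curved relationship between -tq- and -cvc- can still be estimated with -xtreg-, for instance by adding a squared term through factor-variable notation. This is only a sketch of one possible specification, not a recommendation that the quadratic form is the right one for these data; the -at()- values are arbitrary points inside the plotted range of -cvc-:

        ```stata
        local controls "fs lev itq rdi growth cap_exp"

        * Still a linear model: tq is linear in the coefficients on cvc and cvc^2
        xtreg tq c.cvc##c.cvc `controls' i.fyear, fe vce(cluster gvkey)

        * Is the squared term needed?
        test c.cvc#c.cvc

        * Marginal effect of cvc at several investment levels
        margins, dydx(cvc) at(cvc=(0(20)60))
        ```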
        Kind regards,
        (StataNow 18.5)


          Okay, maybe that explains my confusion.
          Can you give me advice on how I can test linearity in the coefficients?
          How do I check the linearity?

          Best regards,


            There's nothing to check here, as all the coefficients enter with power 1 (hence the model is linear in them).
            Moreover, your regressand is continuous and you have an N>T panel dataset: hence, -xtreg- is the way to go.
            Cluster-robust standard errors are also OK, as the number of your panels (N dimension) is large enough to support them.
            Kind regards,
            (StataNow 18.5)


              Thank you so much!
              Without you I would be so lost.

              Kind regards,


                Thanks, very flattering and, in all likelihood, a bit of a stretch.
                Kind regards,
                (StataNow 18.5)

