  • Logit regression on firms choosing to be audited or not

    Dear all,

    I am working on a dataset consisting of 214 thousand firms in Switzerland. My dependent variable, bOptingOut, is a binary variable which equals 1 if a firm chooses not to have its financial statements audited and 0 otherwise (meaning its financial statements are audited). Basically, I am trying to assess the impact of information asymmetries between owners and managers and among owners.
    The dataset consists only of firms that (by assumption) do have a choice between having their financial statements audited or not (for example, I exclude all large companies which are legally required to have their financial statements audited).
    The independent variables consist of firm capital, firm age, the size of management and various binary variables describing the ownership structure (single owner, family owned, corporate owners, etc.) as well as fixed effects for industry and canton (i.e. state).

    This is my regression model:
    Code:
    logit bOptingOut lncapital lnAge cntTotalManagement i.bSingleOwner i.bFamilyOwnedStrict i.bOnlyCorporateOwners i.bFullyOwnerManaged i.bOnlySwissOwners i.industry i.firmCanton, nolog vce(robust)
    The Stata output is as follows (for brevity I omit the two categorical variables for industry and canton):
    Code:
    Logistic regression                             Number of obs     =    214,343
                                                    Wald chi2(50)     =   11096.22
                                                    Prob > chi2       =     0.0000
    Log pseudolikelihood = -20750.008               Pseudo R2         =     0.2415
    
    ----------------------------------------------------------------------------------------
                           |               Robust
                bOptingOut |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -----------------------+----------------------------------------------------------------
                 lncapital |   -.635477   .0207321   -30.65   0.000    -.6761112   -.5948427
                     lnAge |  -.5767789    .020981   -27.49   0.000     -.617901   -.5356568
        cntTotalManagement |  -.5195666    .016879   -30.78   0.000    -.5526489   -.4864844
            1.bSingleOwner |    .336793     .04052     8.31   0.000     .2573753    .4162107
      1.bFamilyOwnedStrict |   .1779222       .049     3.63   0.000      .081884    .2739605
    1.bOnlyCorporateOwners |  -1.796396   .0488854   -36.75   0.000     -1.89221   -1.700583
      1.bFullyOwnerManaged |   .4281087    .047943     8.93   0.000     .3341422    .5220753
        1.bOnlySwissOwners |   .0567386   .0324741     1.75   0.081    -.0069094    .1203866

    To start with I have two basic questions:
    1) Is it okay to use a logit model if the subject (here, the firms) can actually choose the outcome (having their financial statements audited or not)? If it is a problem, how can I mitigate it (e.g. by using a different model)?
    In the traditional examples I encountered in the literature, the subjects usually did not have a direct influence on the outcome (e.g. do you get a mortgage or not, are you admitted to university or not).

    2) The key independent variables are all highly significant (except for bOnlySwissOwners and some of the fixed effects). Intuitively this makes sense, especially due to the large sample size. Or am I missing something?

    I would appreciate your thoughts as many of you probably have more experience with statistics than I do. If you need additional information, please let me know, I am happy to expand.

    Kind regards,
    Daniel

  • #2
    What logit requires is a binary outcome. Some people study binary outcomes over which the subjects have no direct control, but you can equally study the direct choices of individuals, firms, or countries. For example: what factors influence firms to establish subsidiaries abroad? (\(y_i = 1\) if firm \(i\) has a subsidiary abroad and 0 otherwise); what influences individuals to vote for the Democratic party? (\(y_i = 1\) if individual \(i\) votes Democrat and 0 if Republican); and so on. On the standard errors: the default in logit and other nonlinear models (except Poisson regression) should be the standard maximum likelihood variance estimator, unless you want to admit that your model is misspecified (see https://www.stata.com/support/faqs/s...nce-estimator/).
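    A minimal sketch of how the two variance estimators could be compared on the model from #1, assuming the variable names posted there. The stored-estimate names oim and rob are arbitrary labels chosen for this illustration, and the lines are meant to be run together from a do-file (the /// continuation and the local macro require that).
    Code:
    * right-hand side of the model in #1
    local rhs lncapital lnAge cntTotalManagement i.bSingleOwner i.bFamilyOwnedStrict ///
        i.bOnlyCorporateOwners i.bFullyOwnerManaged i.bOnlySwissOwners i.industry i.firmCanton

    * default maximum likelihood (observed information matrix) standard errors
    logit bOptingOut `rhs', nolog
    estimates store oim

    * same model with robust (sandwich) standard errors
    logit bOptingOut `rhs', nolog vce(robust)
    estimates store rob

    * coefficients and standard errors side by side, key regressors only
    estimates table oim rob, b(%9.4f) se(%9.4f) ///
        keep(lncapital lnAge cntTotalManagement 1.bSingleOwner 1.bFamilyOwnedStrict ///
             1.bOnlyCorporateOwners 1.bFullyOwnerManaged 1.bOnlySwissOwners)
    The point estimates are the same in both columns; only the standard errors can differ. A large gap between the two sets of standard errors is often read as a hint that the model may be misspecified.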
    Last edited by Andrew Musau; 18 Feb 2022, 10:42.


    • #3
      Daniel:
      as an aside to Andrew's helpful reply, as far as your non-default standard errors are concerned: do you need -robust- (heteroskedasticity) or -vce(cluster clusterid)- (correlation within clusters, assuming that firms belonging to the same -industry- or -firmCanton- are more similar in some respects)?
      Kind regards,
      Carlo
      (StataNow 18.5)


      • #4
        The p-values less than 1/1000 are not surprising, especially as the sample size is huge (about 214,000). I wonder, though, about the characteristics of 'bOnlySwissOwners'. I'd check whether 'bOnlySwissOwners' has more missing values or a more skewed 0/1 balance than 'bOnlyCorporateOwners'.
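        A short sketch of such a check, assuming the variable names from #1. It only uses standard commands for counting missing values and cross-tabulating each indicator against the outcome.
        Code:
        * count missing values in the two indicators
        misstable summarize bOnlySwissOwners bOnlyCorporateOwners

        * 0/1 balance of each indicator and the share opting out within each group
        tabulate bOnlySwissOwners bOptingOut, row missing
        tabulate bOnlyCorporateOwners bOptingOut, row missing
        The row percentages show how the opt-out rate differs between the 0 and 1 groups, and the missing option makes any missing values appear as their own row or column.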
        C


        • #5
          Dear all,

          first of all thank you very much for your fast and valuable responses.

          With respect to the standard errors, I must admit that, for the time being, I simply used robust ones "to be on the safe side". As a matter of fact, there is no big difference between the default standard errors and the robust ones (which according to some literature is apparently a good sign). I am also considering using clustered standard errors (this would have been one of my next questions), but I was not sure whether it makes more sense to cluster by industry or by canton. I am leaning towards clustering by industry, as I think this makes more sense in the context of firms opting out of an audit (or not). I use the variable canton mainly to take into account differences in tax rates and tax systems between cantons. Does this make sense? The results look as follows when I use clustered standard errors:

          Code:
          logit bOptingOut lncapital lnAge cntTotalManagement i.bSingleOwner i.bFamilyOwnedStrict i.bOnlyCorporateOwners i.bFullyOwnerManaged i.bOnlySwissOwners i.industry i.firmCanton, nolog vce(cluster industry)
          Code:
          Logistic regression                             Number of obs     =    214,343
                                                          Wald chi2(17)     =          .
                                                          Prob > chi2       =          .
          Log pseudolikelihood = -20750.008               Pseudo R2         =     0.2415
          
                                                  (Std. Err. adjusted for 18 clusters in industry)
          ----------------------------------------------------------------------------------------
                                 |               Robust
                      bOptingOut |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -----------------------+----------------------------------------------------------------
                       lncapital |   -.635477    .067299    -9.44   0.000    -.7673807   -.5035733
                           lnAge |  -.5767789   .0654286    -8.82   0.000    -.7050166   -.4485412
              cntTotalManagement |  -.5195666   .0250927   -20.71   0.000    -.5687474   -.4703859
                  1.bSingleOwner |    .336793   .0760881     4.43   0.000     .1876631    .4859228
            1.bFamilyOwnedStrict |   .1779222    .103443     1.72   0.085    -.0248224    .3806668
          1.bOnlyCorporateOwners |  -1.796396   .1076633   -16.69   0.000    -2.007413    -1.58538
            1.bFullyOwnerManaged |   .4281087   .0774435     5.53   0.000     .2763223    .5798952
              1.bOnlySwissOwners |   .0567386   .0646774     0.88   0.380    -.0700267     .183504
                                 |
                        industry |
                              B  |   -.854346   .0712018   -12.00   0.000    -.9938989   -.7147932
                              C  |  -.4938768   .0107136   -46.10   0.000     -.514875   -.4728786
                              D  |  -.3362446   .0429352    -7.83   0.000     -.420396   -.2520931
                              E  |  -.4997928   .0158796   -31.47   0.000    -.5309163   -.4686693
                              F  |  -.5611241   .0249478   -22.49   0.000     -.610021   -.5122272
                              G  |   .0194142   .0217862     0.89   0.373     -.023286    .0621143
                              H  |  -1.008594   .0306942   -32.86   0.000    -1.068754    -.948435
                              I  |  -.8183857   .0352556   -23.21   0.000    -.8874855   -.7492859
                              J  |  -.1897129   .0424713    -4.47   0.000    -.2729552   -.1064707
                              K  |  -.2890326   .0558031    -5.18   0.000    -.3984047   -.1796605
                              L  |   .8471999   .0288012    29.42   0.000     .7907505    .9036492
                              M  |   .1915364   .0331869     5.77   0.000     .1264913    .2565815
                              N  |   -.942812   .0347507   -27.13   0.000    -1.010922   -.8747019
                              P  |  -.3830779   .0345037   -11.10   0.000     -.450704   -.3154518
                              Q  |  -1.464395   .0548205   -26.71   0.000    -1.571842   -1.356949
                              R  |   .1793846   .0398343     4.50   0.000     .1013108    .2574585
                              S  |   .0734179   .0298216     2.46   0.014     .0149687    .1318671
          With clustered standard errors, bOnlySwissOwners becomes clearly insignificant (p = 0.380) and bFamilyOwnedStrict is no longer significant at the 5% level (p = 0.085).
          On the other hand, the coefficients for many industries become significant (compared to the version with robust standard errors, which I did not include above).
          By the way, if I cluster by canton, the results are broadly similar, but then the cantonal coefficients become more significant.
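          A note on the header of the clustered output: the missing Wald chi2 is most likely a consequence of having only 18 clusters. A cluster-robust covariance matrix cannot have rank above the number of clusters, so the 50 model coefficients cannot all be tested jointly. Smaller sets of coefficients can still be tested. A sketch, assuming the model from this post:
          Code:
          logit bOptingOut lncapital lnAge cntTotalManagement i.bSingleOwner i.bFamilyOwnedStrict i.bOnlyCorporateOwners i.bFullyOwnerManaged i.bOnlySwissOwners i.industry i.firmCanton, nolog vce(cluster industry)

          * number of clusters used for the standard errors (18 in the output above)
          display e(N_clust)

          * joint Wald test on a small subset of coefficients
          test 1.bFamilyOwnedStrict 1.bOnlySwissOwners
          With so few clusters, the clustered standard errors themselves should be read with some caution.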

          With respect to bOnlySwissOwners:
          This variable indicates whether a firm is owned only by Swiss individuals or by firms domiciled in Switzerland. The underlying hypothesis is that Swiss owners are more likely to opt out of an audit, as this is a bit of a specialty in Swiss law to which foreigners or foreign investors might not be accustomed. I am not very surprised that this variable turns out to be insignificant. On the other hand, the variable does not have any missing values. About 70% of firms are Swiss owned, whereas only about 7% have only corporate owners. See below a summary of the two variables:

          Code:
          . summarize bOnlySwissOwners bOnlyCorporateOwners
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
          bOnlySwiss~s |    214,343    .6772603    .4675252          0          1
          bOnlyCorpo~s |    214,343    .0687543    .2530364          0          1


          Kind regards,
          Daniel


          • #6
            Daniel:
            I would cluster on -industry-.
            Obviously, with such a large sample size, statistical significance is easy to reach.
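            With N this large, effect sizes are arguably more informative than p-values. A hedged sketch, assuming the model and variable names from #5: the fit is replayed as odds ratios, and average marginal effects on the probability of opting out are computed for the key regressors.
            Code:
            logit bOptingOut lncapital lnAge cntTotalManagement i.bSingleOwner i.bFamilyOwnedStrict i.bOnlyCorporateOwners i.bFullyOwnerManaged i.bOnlySwissOwners i.industry i.firmCanton, nolog vce(cluster industry)

            * replay the same fit as odds ratios, e.g. exp(-.635477) is about 0.53 for lncapital
            logit, or

            * average marginal effects on Pr(bOptingOut = 1) for the key regressors
            margins, dydx(lncapital lnAge cntTotalManagement bSingleOwner bFamilyOwnedStrict bOnlyCorporateOwners bFullyOwnerManaged bOnlySwissOwners)
            For the binary indicators, margins reports the average change in the predicted probability when the indicator switches from 0 to 1, which is easier to judge for practical relevance than a z statistic.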
            Kind regards,
            Carlo
            (StataNow 18.5)


            • #7
              Daniel:
              so, 'bOnlySwissOwners' has a mean of 0.68 while 'bOnlyCorporateOwners' has a mean of 0.07. That is a big difference in my opinion, but you know the data and it is what it is. I am still wondering, though, whether other binary predictors are also skewed and, in addition to the big sample size, distort the p-values (?).
              C
