Logit regression on firms choosing to be audited or not

Daniel Baettig

Join Date: Mar 2019

Posts: 9
#1


18 Feb 2022, 08:58

Dear all,

I am working on a dataset consisting of 214 thousand firms in Switzerland. My dependent variable, bOptingOut, is a binary variable which equals 1 if a firm chooses not to have their financial statements audited and 0 otherwise (meaning they have their financial statements audited). Basically, I am trying to assess the impact of information asymmetries between owners and managers and among owners.
The dataset consists only of firms that (by assumption) have a choice between having their financial statements audited or not (for example, I exclude all large companies, which are legally required to have their financial statements audited).
The independent variables consist of firm capital, firm age, the size of management and various binary variables describing the ownership structure (single owner, family owned, corporate owners, etc.) as well as fixed effects for industry and canton (i.e. state).

This is my regression model:

Code:

logit bOptingOut lncapital lnAge cntTotalManagement i.bSingleOwner i.bFamilyOwnedStrict i.bOnlyCorporateOwners i.bFullyOwnerManaged i.bOnlySwissOwners i.industry i.firmCanton, nolog vce(robust)

The Stata output is as follows (for brevity I omit the two categorical variables for industry and canton):

Code:

Logistic regression                             Number of obs     =    214,343
                                                Wald chi2(50)     =   11096.22
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -20750.008               Pseudo R2         =     0.2415

----------------------------------------------------------------------------------------
                       |               Robust
            bOptingOut |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
             lncapital |   -.635477   .0207321   -30.65   0.000    -.6761112   -.5948427
                 lnAge |  -.5767789    .020981   -27.49   0.000     -.617901   -.5356568
    cntTotalManagement |  -.5195666    .016879   -30.78   0.000    -.5526489   -.4864844
        1.bSingleOwner |    .336793     .04052     8.31   0.000     .2573753    .4162107
  1.bFamilyOwnedStrict |   .1779222       .049     3.63   0.000      .081884    .2739605
1.bOnlyCorporateOwners |  -1.796396   .0488854   -36.75   0.000     -1.89221   -1.700583
  1.bFullyOwnerManaged |   .4281087    .047943     8.93   0.000     .3341422    .5220753
    1.bOnlySwissOwners |   .0567386   .0324741     1.75   0.081    -.0069094    .1203866

To start with I have two basic questions:
1) Is it okay to use a logit model if the subject (here, the firms) can actually choose the outcome (having their financial statements audited or not)? If it is a problem, how can I mitigate it (e.g. by using a different model)?
In the traditional examples I encountered in the literature, the subjects usually did not have direct influence on the outcome (e.g. do you get a mortgage or not, are you admitted to university or not).

2) The key independent variables are all highly significant (except for bOnlySwissOwners and some of the fixed effects). Intuitively this makes sense, especially due to the large sample size. Or am I missing something?

I would appreciate your thoughts as many of you probably have more experience with statistics than I do. If you need additional information, please let me know, I am happy to expand.

Kind regards,
Daniel
Andrew Musau

Join Date: Oct 2014

Posts: 10195
#2

18 Feb 2022, 09:29

What logit requires is a binary outcome. Some people choose to study binary outcomes where the subjects do not have direct control over the outcome, but you can equally study the direct choices of individuals, firms, or countries. For example: what factors influence firms to establish subsidiaries abroad? (\(y_i = 1\) if firm \(i\) has a subsidiary abroad and 0 otherwise); what influences individuals to vote for the Democratic party? (\(y_i = 1\) if individual \(i\) votes Democrat and 0 if Republican); and so on. The default in logit and other nonlinear models (except Poisson regression) should be the standard maximum likelihood variance estimator, unless you want to admit that your model is misspecified (see https://www.stata.com/support/faqs/s...nce-estimator/).
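One way to see how much the choice of variance estimator matters here is to fit the same model twice and put the standard errors side by side. The following is only a sketch for a do-file: it reuses the variable names from the command in #1, and the stored-estimate names m_default and m_robust are arbitrary labels chosen for illustration.

```stata
* Sketch: same logit with default (ML) and robust standard errors.
* Variable names are taken from the command in #1; run from a do-file
* because of the /// line continuation.
local rhs lncapital lnAge cntTotalManagement i.bSingleOwner i.bFamilyOwnedStrict ///
    i.bOnlyCorporateOwners i.bFullyOwnerManaged i.bOnlySwissOwners ///
    i.industry i.firmCanton

* Default maximum likelihood variance estimator
quietly logit bOptingOut `rhs'
estimates store m_default

* Robust (sandwich) variance estimator
quietly logit bOptingOut `rhs', vce(robust)
estimates store m_robust

* Coefficients are identical; only the standard errors can differ
estimates table m_default m_robust, se keep(lncapital lnAge cntTotalManagement)
```

If the two sets of standard errors are close, the choice makes little practical difference for inference in this model.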

Last edited by Andrew Musau; 18 Feb 2022, 09:42.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#3

18 Feb 2022, 10:03

Daniel:
as an aside to Andrew's helpful reply, as far as your non-default standard errors are concerned, do you need -robust- (heteroskedasticity) or -vce(cluster clusterid)- (autocorrelation, assuming that firms belonging to the same -industry- or -firmCanton- are more similar in some respects)?

Kind regards,
Carlo
(Stata 19.0)
Chul Lee

Join Date: Apr 2019

Posts: 45
#4

18 Feb 2022, 10:28

P-values below 0.001 are not surprising, especially as the sample size is huge (about 214,000). I wonder, though, what the characteristics of bOnlySwissOwners are. I'd check whether bOnlySwissOwners has more missing cases or a more skewed 0/1 balance than bOnlyCorporateOwners.
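A minimal sketch of how such a check could be run, assuming the variable names used in the model from #1:

```stata
* Count missing values in the two indicators
* (Stata reports that the variables are nonmissing if there is nothing to show)
misstable summarize bOnlySwissOwners bOnlyCorporateOwners

* 0/1 balance of each indicator, cross-tabulated against the outcome;
* -missing- keeps any missing values as their own category,
* -row- adds row percentages
tabulate bOnlySwissOwners bOptingOut, missing row
tabulate bOnlyCorporateOwners bOptingOut, missing row
```

The row percentages show the share of firms opting out within each level of the indicator, which also gives a first impression of the size of the raw association.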
C

Daniel Baettig

Join Date: Mar 2019
Posts: 9
#5

18 Feb 2022, 10:55

Dear all,

first of all thank you very much for your fast and valuable responses.

With respect to the standard errors, I must admit that, for the time being, I simply used robust ones "to be on the safe side". As a matter of fact, there is no big difference between the default standard errors and the robust ones (which, according to some of the literature, is a good sign). I am also considering using clustered standard errors (this would have been one of my next questions), but I was not sure whether it makes more sense to cluster by industry or by canton. I am leaning towards clustering by industry, as I think this makes more sense in the context of firms opting out of an audit (or not). I use the variable canton mainly to take into account differences in tax rates and tax systems between cantons. Does this make sense? The results look as follows when I use clustered standard errors:

Code:

logit bOptingOut lncapital lnAge cntTotalManagement i.bSingleOwner i.bFamilyOwnedStrict i.bOnlyCorporateOwners i.bFullyOwnerManaged i.bOnlySwissOwners i.industry i.firmCanton, nolog vce(cluster industry)

Code:

Logistic regression                             Number of obs     =    214,343
                                                Wald chi2(17)     =          .
                                                Prob > chi2       =          .
Log pseudolikelihood = -20750.008               Pseudo R2         =     0.2415

                                        (Std. Err. adjusted for 18 clusters in industry)
----------------------------------------------------------------------------------------
                       |               Robust
            bOptingOut |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
             lncapital |   -.635477    .067299    -9.44   0.000    -.7673807   -.5035733
                 lnAge |  -.5767789   .0654286    -8.82   0.000    -.7050166   -.4485412
    cntTotalManagement |  -.5195666   .0250927   -20.71   0.000    -.5687474   -.4703859
        1.bSingleOwner |    .336793   .0760881     4.43   0.000     .1876631    .4859228
  1.bFamilyOwnedStrict |   .1779222    .103443     1.72   0.085    -.0248224    .3806668
1.bOnlyCorporateOwners |  -1.796396   .1076633   -16.69   0.000    -2.007413    -1.58538
  1.bFullyOwnerManaged |   .4281087   .0774435     5.53   0.000     .2763223    .5798952
    1.bOnlySwissOwners |   .0567386   .0646774     0.88   0.380    -.0700267     .183504
                       |
              industry |
                    B  |   -.854346   .0712018   -12.00   0.000    -.9938989   -.7147932
                    C  |  -.4938768   .0107136   -46.10   0.000     -.514875   -.4728786
                    D  |  -.3362446   .0429352    -7.83   0.000     -.420396   -.2520931
                    E  |  -.4997928   .0158796   -31.47   0.000    -.5309163   -.4686693
                    F  |  -.5611241   .0249478   -22.49   0.000     -.610021   -.5122272
                    G  |   .0194142   .0217862     0.89   0.373     -.023286    .0621143
                    H  |  -1.008594   .0306942   -32.86   0.000    -1.068754    -.948435
                    I  |  -.8183857   .0352556   -23.21   0.000    -.8874855   -.7492859
                    J  |  -.1897129   .0424713    -4.47   0.000    -.2729552   -.1064707
                    K  |  -.2890326   .0558031    -5.18   0.000    -.3984047   -.1796605
                    L  |   .8471999   .0288012    29.42   0.000     .7907505    .9036492
                    M  |   .1915364   .0331869     5.77   0.000     .1264913    .2565815
                    N  |   -.942812   .0347507   -27.13   0.000    -1.010922   -.8747019
                    P  |  -.3830779   .0345037   -11.10   0.000     -.450704   -.3154518
                    Q  |  -1.464395   .0548205   -26.71   0.000    -1.571842   -1.356949
                    R  |   .1793846   .0398343     4.50   0.000     .1013108    .2574585
                    S  |   .0734179   .0298216     2.46   0.014     .0149687    .1318671

With clustered standard errors, bOnlySwissOwners becomes completely insignificant and bFamilyOwnedStrict becomes less significant.
On the other hand, the coefficients for many industries become significant (compared to the version with robust standard errors, for which I did not include the industry coefficients above).
By the way, if I cluster by canton, the results are broadly similar, but then the cantonal coefficients become more significant (smaller p-values).

With respect to bOnlySwissOwners:
This variable indicates whether a firm is owned only by Swiss individuals or by firms domiciled in Switzerland. The underlying hypothesis is that Swiss owners are more likely to opt out of an audit, as this is a bit of a specialty in Swiss law to which foreigners or foreign investors might not be accustomed. I am not very surprised that this variable turns out to be insignificant. The variable does not have any missing values. About 68% of firms are Swiss owned, whereas only about 7% have only corporate owners. See below a summary of the two variables:

Code:

. summarize bOnlySwissOwners bOnlyCorporateOwners

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
bOnlySwiss~s |    214,343    .6772603    .4675252          0          1
bOnlyCorpo~s |    214,343    .0687543    .2530364          0          1

Kind regards,
Daniel


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#6

18 Feb 2022, 11:03

Daniel:
I would cluster on -industry-.
Obviously, with such a large sample size, statistical significance is easy to reach.
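Because significance is easy to reach at this sample size, it can help to look at effect sizes as well. A rough sketch, assuming the logit model from this thread has just been estimated and its results are still in memory:

```stata
* Replay the most recent logit results as odds ratios instead of coefficients
logit, or

* Average marginal effects on the probability of opting out;
* for the 0/1 factor variables this is the discrete change from 0 to 1
margins, dydx(lncapital lnAge cntTotalManagement bSingleOwner bOnlyCorporateOwners)
```

With more than 200,000 observations -margins- may take a while to run. Its output is on the probability scale, which makes it easier to judge whether a statistically significant coefficient is also economically meaningful.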

Kind regards,
Carlo
(Stata 19.0)
Chul Lee

Join Date: Apr 2019

Posts: 45
#7

18 Feb 2022, 17:41

Daniel:
so, the mean of bOnlySwissOwners is 0.68 while that of bOnlyCorporateOwners is 0.07. That is a big difference in my opinion, but you know the data and it is what it is. I am still wondering, though, whether other binary predictors are also skewed and overestimate the p-values (?) in addition to the big sample size.
C
