Scatterplot no correlation and fixed effects

Josephine Nicolai

Join Date: Jun 2021

Posts: 20
#1


14 Dec 2022, 04:12

Hi everyone,

I'm investigating the relationship between gender diversity in audit firm partnerships (GDR) and audit quality (abs_ModDACC). When I draw a two-way scatterplot, this is what I get (see attachment). Audit quality is measured by absolute discretionary accruals: the lower the accruals, the higher the audit quality. GDR is measured as the proportion female/total, but I still need to create a better variable that equals 1 (100%) when this ratio is 0.5 (50%), because a 50/50 split is the most diverse. However, this is not the problem per se for now, so I won't go deeper into it. The code I used for the scatterplot:

Code:

twoway(scatter abs_ModDACC GDR)
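The "better variable" described above, equal to 1 at a 50/50 split and 0 when a partnership is entirely male or entirely female, can be sketched in two ways. This is only an illustration: the names GDR_parity and GDR_blau are hypothetical, the toy values are made up, and which scaling suits the study is a substantive choice this sketch does not settle.

```stata
* Toy data (hypothetical): GDR = share of female partners
clear
input float GDR
0
0.25
0.5
0.75
1
end

* Option 1: linear distance from parity, 1 at GDR = 0.5 and 0 at GDR = 0 or 1
generate GDR_parity = 1 - 2*abs(GDR - 0.5)

* Option 2: Blau-type index 1 - [p^2 + (1-p)^2] = 2p(1-p),
* multiplied by 2 so that its maximum (at p = 0.5) is 1
generate GDR_blau = 4*GDR*(1 - GDR)

list, clean
```

The linear version treats every step toward parity equally; the Blau-type version rises steeply near 0 and 1 and flattens out near 0.5.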

I cannot really find a similar scatterplot on the internet, so how should I interpret this one? I've also tried to run a regression, but the gender diversity ratio is always insignificant. I'm not sure I am doing this correctly. Because I have panel data, I first did this:

Code:

egen id = group(CIK)
xtset id Year, yearly
order id

I'm not sure whether, to cluster at industry level, I should group on the CIK code (firm code) or on SIC2 (industry code). Below are some of the commands I've tried for the regression. I tried clustering the standard errors and adding industry fixed effects (SIC2) and year fixed effects (Year). All the other variables are my control variables.
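A sketch, on made-up data, of how the panel identifier and the cluster variable are separate choices: xtset always takes the firm id built from CIK, while the clustering level is chosen only inside vce(cluster ...). It also checks whether any firm ever changes SIC2, which matters because xtreg, fe with clustered standard errors requires panels to be nested within clusters. The firms, years, and industry codes below are hypothetical.

```stata
* Simulated data (hypothetical): 3 firms observed over 3 years
clear
input long CIK int Year int SIC2
1001 2019 28
1001 2020 28
1001 2021 28
2002 2019 35
2002 2020 35
2002 2021 35
3003 2019 28
3003 2020 28
3003 2021 28
end

* Panel identifier: one id per firm (CIK), whatever level you later cluster at
egen id = group(CIK)
xtset id Year, yearly

* Count the clusters you would get at each level
quietly tabulate id
display "firm clusters: " r(r)
quietly tabulate SIC2
display "industry clusters: " r(r)

* Stops with an error if any firm switches SIC2 over time
bysort id (SIC2): assert SIC2[1] == SIC2[_N]
```

Clustering at industry level then means writing vce(cluster SIC2) in the estimation command; the id used in xtset stays the firm id.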

Code:

areg abs_ModDACC i.Year GDR Leverage Restatement TotalAccruals ICW GEOSEG BUSSEG Busyness Loss SalesGrowth MTB lnCFO specialist Big4 lnAuditFees lnNonAuditFees lnClientOfficeSize, absorb(SIC2) vce(cluster SIC2)
areg abs_ModDACC GDR Leverage Restatement TotalAccruals ICW GEOSEG BUSSEG Busyness Loss SalesGrowth MTB lnCFO specialist Big4 lnAuditFees lnNonAuditFees lnClientOfficeSize, absorb(SIC2) vce(cluster SIC2)
regress abs_ModDACC i.Year i.SIC2 GDR Leverage Restatement TotalAccruals ICW GEOSEG BUSSEG Busyness Loss SalesGrowth MTB lnCFO specialist Big4 lnAuditFees lnNonAuditFees lnClientOfficeSize, robust
regress abs_ModDACC i.Year i.SIC2 GDR Leverage Restatement TotalAccruals ICW GEOSEG BUSSEG Busyness Loss SalesGrowth MTB lnCFO specialist Big4 lnAuditFees lnNonAuditFees lnClientOfficeSize
regress abs_ModDACC i.Year GDR Leverage Restatement TotalAccruals ICW GEOSEG BUSSEG Busyness Loss SalesGrowth MTB lnCFO specialist Big4 lnAuditFees lnNonAuditFees lnClientOfficeSize
xtreg abs_ModDACC GDR Leverage Restatement TotalAccruals ICW GEOSEG BUSSEG Busyness Loss SalesGrowth MTB lnCFO specialist Big4 lnAuditFees lnNonAuditFees lnClientOfficeSize, fe vce(cluster CIK)

Prior literature mostly uses industry and year fixed effects, with standard errors clustered at the client or audit firm level. I think I have to run a fixed effects model because I have panel data.
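A minimal sketch of the specification described above (industry fixed effects absorbed, year dummies, standard errors clustered by client firm), run on simulated data so that it executes on its own. Only Leverage stands in for the control variables; the simulated values and the choice of areg are assumptions, and whether the client or the audit firm is the right cluster level is a modelling decision this code does not make.

```stata
* Simulated data (hypothetical): 200 client firms x 5 years, 10 industries
clear
set seed 12345
set obs 200
generate CIK = _n
generate SIC2 = 10 + mod(_n, 10)
expand 5
bysort CIK: generate Year = 2016 + _n
generate GDR = runiform()
generate Leverage = runiform()
generate abs_ModDACC = abs(rnormal(0.05, 0.05))

* Keep the controls in one local macro so every specification uses the same list
* (the remaining controls from the post would be added here)
local controls Leverage

* Industry FE absorbed, year FE as dummies, SEs clustered by client firm
areg abs_ModDACC GDR `controls' i.Year, absorb(SIC2) vce(cluster CIK)
```

Keeping the controls in a local macro means each variant (different fixed effects, different cluster variable) changes only the part after the comma.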
If I need to add more information so that someone can help me with this, please let me know! I'm really struggling with it...

Thanks in advance and kind regards,
Josephine

Nick Cox

Join Date: Mar 2014

Posts: 35433
#2

14 Dec 2022, 06:08

I will focus on trying to get a more informative scatter plot. You have some outliers there that need a story.

First, I recommend ms(Oh) as a default for scatter plots with a result that is less heavy on the eye and will often make scanning a plot easier, or at worst not less difficult.

If there are no zeros in abs_ModDACC, then use a log scale. If there are zeros, plot using

Code:

log(1 + abs_ModDACC)

and if that strikes any reader (e.g. John Mullahy) as ad hoc,

1. Yes; it is.

2. Here I translate "ad hoc" as "for the purpose".

3. log (1 + 0) is naturally zero and log (1 + x) for small x is close to x and that kind of behaviour often seems about right in a transformation of a non-negative variable. Here log() means ln() if you prefer that notation.

4. This is just to get a better graph and doesn't rule out less adhockery in modelling, such as a generalized linear model or a two-part model.

5. What's your alternative suggestion?
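The plotting advice above (hollow markers, and a log(1 + y) scale that keeps zeros at zero) can be sketched on simulated data like this. The variable log1_abs_ModDACC and the simulated values are hypothetical.

```stata
* Simulated data (hypothetical) with many zeros in both variables
clear
set seed 2022
set obs 500
generate GDR = cond(runiform() < 0.4, 0, runiform())
generate abs_ModDACC = cond(runiform() < 0.1, 0, abs(rnormal(0, 0.1)))

* log(1 + y): zeros stay at zero and small values are barely changed
generate log1_abs_ModDACC = log(1 + abs_ModDACC)

* Hollow markers make overlapping points easier to see
twoway scatter log1_abs_ModDACC GDR, ms(Oh)
```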
Josephine Nicolai

Join Date: Jun 2021

Posts: 20
#3

14 Dec 2022, 07:38

Thank you, Nick!
I did try to handle the outliers. This gave me the following scatterplots (see attachments). These are the two commands I used to winsorize:

Code:

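* winsorize abs_ModDACC at the 5th and 95th percentiles (winsor2 is from SSC)
winsor2 abs_ModDACC, replace cut(5 95)
* trim instead: values outside the 5th to 95th percentile range are set to missing
winsor2 abs_ModDACC, replace cut(5 95) trim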

There were zeros in my dataset, so I didn't use the code you suggested. Also, if you think I handled my outliers correctly, could you help me with the other commands in my previous post, including the clustered standard errors and fixed effects?
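What the two winsor2 calls above do can be approximated with built-in commands, which may make the difference between winsorizing and trimming easier to see. This is a rough sketch on simulated data; winsor2's exact percentile handling may differ slightly, and the names w_abs_ModDACC and t_abs_ModDACC are hypothetical.

```stata
* Simulated, right-skewed outcome (hypothetical)
clear
set seed 7
set obs 1000
generate abs_ModDACC = rnormal(0, 0.1)^2

* 5th and 95th percentiles
_pctile abs_ModDACC, percentiles(5 95)
scalar lo = r(r1)
scalar hi = r(r2)

* Winsorize: values beyond the cutoffs are replaced by the cutoffs,
* so no observations are lost
generate double w_abs_ModDACC = min(max(abs_ModDACC, scalar(lo)), scalar(hi))

* Trim: values beyond the cutoffs become missing,
* so those observations drop out of any later analysis
generate double t_abs_ModDACC = abs_ModDACC if inrange(abs_ModDACC, scalar(lo), scalar(hi))
```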

Nick Cox

Join Date: Mar 2014

Posts: 35433
#4

14 Dec 2022, 09:20

I wasn't recommending Winsorizing; I was recommending a particular transformation that copes with zeros. What is more evident on your latest plot is, unsurprisingly, that you have many zeros on GDR too.

I can't easily predict any results when you bring in more predictors. The problem seems inevitably one where it is hard, if not impossible, to identify the role of one predictor among many.

Last edited by Nick Cox; 14 Dec 2022, 09:37.
Josephine Nicolai

Join Date: Jun 2021

Posts: 20
#5

14 Dec 2022, 09:54

I winsorized because that helps with outliers, right? I saw that in a YouTube video. You said my outliers need a story, but what do you mean by that?

So, following your advice, should I generate a new variable using the log expression you sent? This is the code I used for that, without winsorizing beforehand:

Code:

gen Logabs_ModDACC = log(1 + abs_ModDACC)
twoway (scatter Logabs_ModDACC GDR, ms(Oh))

Is this correct? Is this the log transformation you meant? I still have zeros (see attachment). And what should I do with GDR: log-transform it too? A GDR of 0 is possible, because that just means there are zero women in the partnership of an audit firm.

What do you mean by "I can't easily predict any results when you bring in more predictors."? These are commonly used control variables; should I send you the code without them? The independent variable is still GDR and the dependent variable is audit quality.
Nick Cox

Join Date: Mar 2014

Posts: 35433
#6

14 Dec 2022, 10:41

I mean that you should seek an explanation of those outliers, which I guess starts with seeing what else you know about those data points: which firm, industry, sector, or whatever else you know.
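One way to start on the outliers' story is to list the most extreme observations together with their identifiers. A sketch on simulated data with one planted extreme value; the data are hypothetical, and with the real dataset only the last two commands would be needed.

```stata
* Simulated data (hypothetical) standing in for the real dataset
clear
set seed 99
set obs 300
generate CIK = ceil(_n/3)
generate SIC2 = 10 + mod(CIK, 8)
generate Year = 2018 + mod(_n, 3)
generate GDR = runiform()
generate abs_ModDACC = abs(rnormal(0, 0.05))
* plant one extreme value in observation 17
replace abs_ModDACC = 2 in 17

* Sort from largest to smallest outcome and inspect the top observations
* alongside the identifiers that might explain them
gsort -abs_ModDACC
list CIK SIC2 Year GDR abs_ModDACC in 1/5, noobs
```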

My stance is that outliers should be omitted if and only if they turn out to be wrong, irrelevant to your goals, or both. Winsorizers often disagree.

Indeed, you have zeros after transformation as well as before. An important detail of that transformation is that zeros map to zeros, as was explicit in #2.

I am just focusing on exploratory graphics. You need economic advice on how best to model your data, and I am not an economist.