Propensity Score Matching to estimate racial discrimination

Lei Jin



14 Dec 2021, 05:06

Hello everyone,

I am doing research on racial discrimination in loan approval decisions, i.e., whether minority borrowers have a lower approval probability than similar white borrowers, ceteris paribus (c.p.).

Previous research has already found evidence of discrimination against minorities using a logit regression of the following form:

P(approval) = f(minority status, loan features, borrower characteristics, ..., other controls)

A negative and statistically significant coefficient from this logit model reveals that minority status reduces the loan approval probability, c.p.

Nevertheless, this model cannot distinguish between differential treatment discrimination and disparate impact discrimination. Under differential treatment discrimination, lenders treat two borrowers differently even though they are identical except for race and ethnicity. Disparate impact discrimination, by contrast, arises from a policy that is legal on its face and applied to all borrowers, but that unintentionally has a disproportionate adverse impact on minority borrowers. For example, lenders could set a minimum income level for all borrowers. This seemingly race-blind requirement would most likely affect minority borrowers more than white borrowers, because minorities on average have lower incomes than white borrowers.
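To make the minimum-income example concrete, here is a small sketch with made-up incomes (hypothetical numbers, not taken from the poster's data). The same race-blind threshold is applied to both groups, yet the share of applicants who pass differs because the income distributions differ; that gap is what disparate impact refers to.

```python
# Hypothetical incomes (in $1,000s); illustrative only, not the poster's data.
white_incomes = [45, 52, 60, 75, 90, 110, 38, 66]
minority_incomes = [32, 41, 48, 55, 70, 36, 29, 62]

MIN_INCOME = 50  # hypothetical race-blind minimum-income rule applied to everyone


def pass_rate(incomes, threshold):
    """Share of applicants whose income meets the minimum."""
    return sum(inc >= threshold for inc in incomes) / len(incomes)


white_rate = pass_rate(white_incomes, MIN_INCOME)
minority_rate = pass_rate(minority_incomes, MIN_INCOME)

print(f"white pass rate:    {white_rate:.3f}")
print(f"minority pass rate: {minority_rate:.3f}")
```

No rule mentions race, but with these numbers 6 of 8 white applicants and 3 of 8 minority applicants clear the threshold.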

The best and only way to isolate differential treatment discrimination in loan approvals is the paired testing methodology. Specifically, two applicants with the same credit histories who need the same type of loan apply for a mortgage at the same lender. In this setting, any observed difference in treatment reflects only differential treatment discrimination, because the two applicants are identically qualified. However, paired testing is hardly practical in real life, because extending it to the loan approval stage might be illegal and could incur high legal bills.

I noticed that propensity score matching is used to balance the distribution of covariates; in other words, it matches observations so that they are as similar as possible in the covariates, except for the treatment indicator (in our case, the minority indicator). Propensity score matching therefore seems to imitate paired testing perfectly. The minority-status effect is then simply the difference between the observed outcome of one observation and the observed outcome of its match. Treating race as a treatment may seem unreasonable, but perhaps we can assume that a borrower was enrolled in a "minority program" at birth, and that enrollment in this program might lead to lower income or other disadvantages later in life.
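One caveat worth keeping in mind: propensity score matching pairs observations that are close on the estimated score, a single number, so an individual matched pair need not be similar on every covariate; balance is expected to hold on average across the matched sample. A common way to check this is the standardized mean difference of each covariate between groups, before and after matching. The sketch below uses one common definition (pooled standard deviation from the two sample variances); the incomes and the "matched" controls are hypothetical. In Stata, -tebalance summarize- after -teffects- reports standardized differences for the covariates.

```python
import math
from statistics import mean, variance


def standardized_difference(treated, control):
    """(mean_t - mean_c) / sqrt((var_t + var_c) / 2), using sample variances."""
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


# Hypothetical incomes (in $1,000s).
minority = [40, 55, 62, 48]
all_controls = [42, 54, 60, 50, 95, 120, 110, 88]
matched_controls = [42, 54, 60, 50]  # suppose matching selected these controls

before = standardized_difference(minority, all_controls)
after = standardized_difference(minority, matched_controls)
print(f"standardized difference before matching: {before:.3f}")
print(f"standardized difference after matching:  {after:.3f}")
```

Values close to zero after matching suggest the covariate is balanced; they say nothing about covariates (observed or not) that were left out of the propensity model.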

Here is the baseline logit model:

Code:

logit approval minority income_w dti20 dti20_30 dti30_36 dti36_49 dti50_60  fico680_699 fico700_719 fico720_739 ltv80 ltv80_85 ltv85_90 ltv90_95  origination_2019  refinance female age62 lender_top100 shadowbank fintech aus tract_minority_population_percen tract_owner_occupied_units tract_one_to_four_family_homes tract_median_age_of_housing_unit cra fhfa_index

I get the following result; the coefficient on the minority indicator is -.391 (p < 0.0001):

Code:

Logistic regression                                   Number of obs =  250,000
                                                      LR chi2(28)   = 55744.90
                                                      Prob > chi2   =   0.0000
Log likelihood = -88966.138                           Pseudo R2     =   0.2386

--------------------------------------------------------------------------------------------------
                        approval | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
---------------------------------+----------------------------------------------------------------
                        minority |   -.391179   .0152328   -25.68   0.000    -.4210346   -.3613233
                        income_w |    .004729   .0001977    23.93   0.000     .0043416    .0051163
                           dti20 |   2.967021   .0540445    54.90   0.000     2.861096    3.072947
                        dti20_30 |   3.664266   .0424642    86.29   0.000     3.581038    3.747495
                        dti30_36 |   3.892662   .0413462    94.15   0.000     3.811625    3.973699
                        dti36_49 |   3.960401   .0378111   104.74   0.000     3.886293    4.034509
                        dti50_60 |   3.709353    .038279    96.90   0.000     3.634328    3.784378
                     fico680_699 |   .0205687   .0424997     0.48   0.628    -.0627291    .1038665
                     fico700_719 |     .11979   .0419051     2.86   0.004     .0376574    .2019225
                     fico720_739 |   .0570352   .0466314     1.22   0.221    -.0343607    .1484311
                           ltv80 |  -.2817957   .0251547   -11.20   0.000     -.331098   -.2324933
                        ltv80_85 |   -.043908   .0258024    -1.70   0.089    -.0944797    .0066637
                        ltv85_90 |  -.2121639   .0289899    -7.32   0.000    -.2689831   -.1553448
                        ltv90_95 |  -.3095127   .0256459   -12.07   0.000    -.3597778   -.2592476
                origination_2019 |   .2395657   .0132166    18.13   0.000     .2136617    .2654698
                       refinance |   -1.23423   .0219976   -56.11   0.000    -1.277345   -1.191116
                          female |    -.02576   .0135202    -1.91   0.057    -.0522592    .0007392
                           age62 |  -.3451483   .0198167   -17.42   0.000    -.3839883   -.3063083
                   lender_top100 |  -.4454505   .0154397   -28.85   0.000    -.4757118   -.4151892
                      shadowbank |  -.0205853   .0167077    -1.23   0.218    -.0533318    .0121612
                         fintech |  -.1228223   .0212574    -5.78   0.000    -.1644859   -.0811586
                             aus |   2.048448   .0218263    93.85   0.000     2.005669    2.091226
tract_minority_population_percen |    .003672    .000289    12.71   0.000     .0031055    .0042384
      tract_owner_occupied_units |   .0001755   .0000218     8.07   0.000     .0001329    .0002182
  tract_one_to_four_family_homes |  -.0000885   .0000165    -5.35   0.000    -.0001209   -.0000561
tract_median_age_of_housing_unit |  -.0005254   .0004295    -1.22   0.221    -.0013672    .0003164
                             cra |  -.1248721   .0165081    -7.56   0.000    -.1572274   -.0925168
                      fhfa_index |   .0417232   .0042321     9.86   0.000     .0334285    .0500179
                           _cons |  -3.884013   .0679459   -57.16   0.000    -4.017184   -3.750841
--------------------------------------------------------------------------------------------------

Next, we run propensity score matching on the same sample using -teffects psmatch-:

Code:

teffects psmatch (approval) (minority income_w dti20 dti20_30 dti30_36 dti36_49 dti50_60 fico680_699 fico700_719 fico720_739 ltv80 ltv80_85 ltv85_90 ltv90_95 origination_2019 refinance female age62 lender_top100 shadowbank fintech aus tract_minority_population_percen tract_owner_occupied_units tract_one_to_four_family_homes tract_median_age_of_housing_unit cra fhfa_index)

This gives an average treatment effect (ATE) of only -.044 (p < 0.0001):

Code:

Treatment-effects estimation                   Number of obs      =    250,000
Estimator      : propensity-score matching     Matches: requested =          1
Outcome model  : matching                                     min =          1
Treatment model: logit                                        max =          3
------------------------------------------------------------------------------
             |              AI robust
    approval | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATE          |
    minority |
   (1 vs 0)  |   -.043562   .0026974   -16.15   0.000    -.0488489   -.0382751
------------------------------------------------------------------------------

Comparing the two results, the minority coefficient shrinks from -.391 to only -.044, both significant at p < 0.0001. If propensity score matching imitates paired testing well, then we can conclude that differential treatment discrimination is not the major concern, and that disparate impact discrimination plays the main role in discrimination in loan origination decisions.

Can we use propensity score matching to imitate paired testing and isolate differential treatment discrimination?
Is this method feasible?

Thanks!


Lei Jin


14 Dec 2021, 05:11

The sample data are attached below. I split the data into two pieces because of -dataex-'s limit on the number of variables.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(approval minority) long income_w float(dti20 dti20_30 dti30_36 dti36_49 dti50_60 fico680_699 fico700_719 fico720_739 ltv80 ltv80_85 ltv85_90 ltv90_95 origination_2019 refinance)
0 0  36 0 0 1 0 0 1 0 0 1 0 0 0 1 1
1 1  80 0 0 0 0 1 0 1 0 1 0 0 0 1 1
1 0 127 0 1 0 0 0 0 1 0 1 0 0 0 0 0
1 1  75 0 0 0 1 0 0 1 0 1 0 0 0 1 1
0 0  28 0 0 0 1 0 1 0 0 1 0 0 0 1 1
end

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(female age62 lender_top100 shadowbank fintech aus tract_minority_population_percen) int(tract_owner_occupied_units tract_one_to_four_family_homes) byte tract_median_age_of_housing_unit float cra double fhfa_index
1 0 0 0 0 1  7.86 1544 2072 26 0 6.66
1 0 0 0 0 1 51.04 1263 1758 33 0 4.23
1 0 0 0 0 1 10.89 1341 1393 59 0 5.02
0 0 1 0 0 1 70.06  914 1245 57 1  6.4
1 1 1 1 0 0 11.23 3191 3557 15 0 5.67
end


Fei Wang

#3

14 Dec 2021, 08:09

I think it's reasonable to improve your estimation using PSM. But the -0.04 obtained from PSM is essentially a linear (probability-scale) estimate and cannot be directly compared with the original logit coefficient of -0.39, which is on the log-odds scale. You need to compute the marginal effect from the logit model and compare that with the -0.04.
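A sketch of what "the marginal effect from logit" means for a 0/1 regressor: for each observation, compute the predicted approval probability with minority set to 1 and to 0, holding the other covariates at their observed values, and average the difference. In Stata this is what -margins, dydx(minority)- reports after the logit if minority is entered as a factor variable (i.minority); without the i. prefix, -margins- averages the derivative instead, which is usually close. The coefficient below is the one posted above; the linear indices are hypothetical stand-ins for the fitted values of the other covariates.

```python
import math


def logistic(z):
    """Logistic CDF: maps a logit linear index to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def average_discrete_change(xb0, beta):
    """Average of P(y=1 | d=1) - P(y=1 | d=0) over observations.

    xb0: each observation's linear index with the 0/1 regressor set to 0
         (intercept plus the other covariates times their coefficients).
    beta: logit coefficient on the 0/1 regressor.
    """
    diffs = [logistic(z + beta) - logistic(z) for z in xb0]
    return sum(diffs) / len(diffs)


beta_minority = -0.391179  # minority coefficient from the posted logit output
# Hypothetical linear indices at minority = 0 for five borrowers.
xb0 = [0.5, 1.2, 1.8, 2.5, 3.1]

ame = average_discrete_change(xb0, beta_minority)
print(f"logit coefficient:       {beta_minority:.3f}")
print(f"average marginal effect: {ame:.3f}")
```

Because the logistic density never exceeds 0.25, the probability-scale effect is always much smaller in magnitude than the log-odds coefficient, so a drop from -.39 to around -.04 is expected even with no change in the underlying model.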
Lei Jin

#4

15 Dec 2021, 05:55

Hi Fei, thanks for the comment!

I am not clear on why the coefficient reported by -teffects psmatch- is a marginal effect. Here is my understanding of how -teffects psmatch- works:

1) estimate the propensity score (PSC) using a logit model, where the dependent variable is the treatment variable, in our case, the minority status,
i.e. P(minority status)= f (loan features, borrower characteristics, ..., and some other controls)

2) match the observations with the nearest PSC into pairs

3) compute the difference of the outcome of observations within each pair. The outcome is estimated by a logit model, where the dependent variable is the loan approval decision,
i.e. P(approval)= f (minority status, loan features, borrower characteristics, ..., and some other controls)

4) average the differences among all pairs, and we get the average treatment effect of the minority status in terms of log odds of loan approval
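Step 1 of this list is what -teffects psmatch- does by default (the output reports "Treatment model: logit"). As an illustration only, the following sketch fits a one-covariate logit for P(minority | x) by Newton-Raphson and returns the fitted propensity scores; the covariate values and treatment indicators are made up, and a real treatment model would include all the controls.

```python
import math


def fit_logit(x, d, iterations=25):
    """Fit P(d=1 | x) = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = 0.0             # score (gradient of the log-likelihood)
        h_aa = h_ab = h_bb = 0.0    # information matrix (negative Hessian)
        for xi, di in zip(x, d):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g_a += di - p
            g_b += (di - p) * xi
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        # Newton step: solve [[h_aa, h_ab], [h_ab, h_bb]] @ step = gradient.
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


def propensity_scores(x, a, b):
    """Fitted P(minority = 1 | x) for each observation."""
    return [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]


# Hypothetical data: x = income in $10,000s, d = 1 if minority.
x = [3.2, 4.1, 4.8, 5.5, 7.0, 3.6, 2.9, 6.2, 4.5, 5.2, 6.0, 7.5, 9.0, 11.0, 3.8, 6.6]
d = [1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0]

a, b = fit_logit(x, d)
ps = propensity_scores(x, a, b)
print(f"intercept = {a:.3f}, slope = {b:.3f}")
```

Note that the outcome (approval) plays no role in this step; only the treatment indicator is modeled.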

Is my understanding problematic?

Meanwhile, the average marginal effect of the minority coefficient from the logit model is -.043, almost identical to the PSM result -.044.
If the PSM result reveals the marginal effect, then I guess the only takeaway from the PSM result is that the baseline logit model should not be concerned with confounding variables?

Last edited by Lei Jin; 15 Dec 2021, 05:58.
Fei Wang

#5

15 Dec 2021, 06:53

3) compute the difference of the outcome of observations within each pair. The outcome is estimated by a logit model, where the dependent variable is the loan approval decision,
i.e. P(approval)= f (minority status, loan features, borrower characteristics, ..., and some other controls)

This part is incorrect. The outcome is estimated by a linear model on the matched pairs. So the ATE is simply the effect of being minority on the probability of approval.
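A simplified sketch of the matching estimator helps show why the result is on the probability scale. No outcome model is fitted here: each observation's missing potential outcome is imputed with the observed 0/1 outcome of its nearest neighbor on the propensity score in the other group, and the ATE is the average of the resulting differences, i.e., a difference in approval probabilities. Simplifications and assumptions: the propensity scores are taken as given, there is a single match with replacement, ties are broken by taking the first nearest match (whereas -teffects psmatch- can keep tied matches, which is why the output above reports max = 3), and no Abadie-Imbens standard errors are computed. The data are hypothetical.

```python
def psmatch_ate(ps, treat, y):
    """ATE by 1-nearest-neighbor matching on the propensity score (with replacement).

    ps: propensity scores; treat: 0/1 treatment (minority) indicators;
    y: observed 0/1 outcomes (approval).
    """
    n = len(y)
    total = 0.0
    for i in range(n):
        # Candidate matches are the observations in the opposite treatment group.
        others = [j for j in range(n) if treat[j] != treat[i]]
        # Nearest neighbor on the score; min() keeps the first one on ties.
        j = min(others, key=lambda k: abs(ps[k] - ps[i]))
        # Own observed outcome for own group, match's observed outcome for the other.
        y1, y0 = (y[i], y[j]) if treat[i] == 1 else (y[j], y[i])
        total += y1 - y0
    return total / n


# Hypothetical data: propensity scores, minority indicator, approval (0/1).
ps = [0.20, 0.30, 0.60, 0.70, 0.25, 0.65]
treat = [0, 0, 1, 1, 0, 1]
y = [1, 1, 0, 1, 1, 1]

ate = psmatch_ate(ps, treat, y)
print(f"ATE = {ate:.3f}")  # a difference in approval probabilities
```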

Meanwhile, the average marginal effect of the minority coefficient from the logit model is -.043, almost identical to the PSM result -.044.
If the PSM result reveals the marginal effect, then I guess the only takeaway from the PSM result is that the baseline logit model should not be concerned with confounding variables?

Essentially, matching only improves how the model handles observables; it cannot solve the problem of omitted unobservables. At most, I would conclude that the marginal effect is robust to the functional form in which the observables enter the model.
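A tiny made-up example of this point: treatment has no effect on the outcome by construction, and the observed covariate x is identical across groups, so exact matching pairs them perfectly. An unobserved trait u, however, differs between groups and drives the outcome. The matched estimate is then nonzero, and no amount of matching on x can remove that gap.

```python
def outcome(u):
    """Approval depends only on the unobserved trait u, not on treatment."""
    return 1 if u == 1 else 0


# Hypothetical (observed x, unobserved u) for each group; x is identical across groups.
treated = [(1, 0), (2, 0), (3, 1), (4, 0)]
control = [(1, 1), (2, 1), (3, 1), (4, 0)]

# Exact matching on x: each treated unit is paired with the control that has the same x.
control_u_by_x = {x: u for x, u in control}
diffs = [outcome(u) - outcome(control_u_by_x[x]) for x, u in treated]
matched_estimate = sum(diffs) / len(diffs)

print(matched_estimate)  # -0.5, although the true treatment effect is 0
```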

Last edited by Fei Wang; 15 Dec 2021, 07:02.
Weiwen Ng

#6

15 Dec 2021, 07:34

Originally posted by Fei Wang:

This part is incorrect. The outcome is estimated by a linear model on the matched pairs. So the ATE is simply the effect of being minority on the probability of approval.
...

I think this bit needs to be emphasized more. Logistic regression, as we all know, works on the log-odds scale, and gives odds ratios when you exponentiate the coefficients. If you run margins, you get results on the probability scale, i.e. you get a risk difference.
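To see the two scales side by side, the sketch below exponentiates the posted minority coefficient to get an odds ratio and then converts it into a risk difference at a hypothetical baseline approval probability of 0.80. The risk difference depends on that baseline, which is why -margins- averages predicted probabilities over the actual sample rather than reporting a single conversion.

```python
import math

beta_minority = -0.391179          # minority coefficient from the posted logit output
odds_ratio = math.exp(beta_minority)

# Hypothetical baseline approval probability for a non-minority borrower.
p0 = 0.80
odds0 = p0 / (1 - p0)
odds1 = odds_ratio * odds0         # the odds ratio multiplies the odds, not the probability
p1 = odds1 / (1 + odds1)
risk_difference = p1 - p0

print(f"odds ratio:      {odds_ratio:.3f}")
print(f"risk difference: {risk_difference:.3f}")
```

Changing p0 leaves the odds ratio unchanged but moves the risk difference, which is the quantity -teffects psmatch- estimates directly.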

With a binary outcome, -teffects psmatch- inherently works on the probability scale, i.e., the coefficients are risk differences.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Lei Jin

#7

15 Dec 2021, 18:51

Thanks, Fei and Weiwen, I will go through the mechanism of the PSM again!