Hello everyone,
I am doing research on racial discrimination in mortgage interest rates.
Many researchers use an OLS model to investigate whether minority borrowers pay a higher interest rate than comparable White borrowers, controlling for borrower demographic characteristics, creditworthiness, and loan features. In addition, researchers apply "year fixed effects" and "county/MSA fixed effects" to control for unobserved housing market conditions in the year of loan origination and for unobserved effects of the property's geographic location. As an example, please see Table 4 on page 45 of the paper at https://faculty.haas.berkeley.edu/mo...rs/discrim.pdf
However, the data used in these papers are in fact cross-sectional rather than panel data. For example, a sample might contain 20,000 loan-level observations originated between 2009 and 2019, where each observation is a loan originated by a unique borrower.
As an example of the data:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input int as_of_year str10 respondent_id long loan_amount_000s byte(loan_purpose loan_type) int county_code long msamd byte(applicant_race_1 applicant_ethnicity)
2017 "0000068601" 175 1 1 7 . 6 3
2017 "0000063194" 196 1 2 5 38540 5 2
2017 "0000451965" 1079 1 1 1 36084 6 3
2017 "41-1842999" 199 3 1 53 33460 6 1
2017 "0000613307" 800 3 1 47 35614 6 3
2017 "0000451965" 170 3 1 141 21340 5 1
2017 "0000016450" 157 1 2 209 28140 6 2
2017 "73-1577221" 183 3 1 39 26420 6 3
2017 "0000068490" 62 3 1 31 27260 5 2
2017 "0000451965" 196 1 2 91 33874 5 2
2017 "0000060806" 304 1 1 113 19124 6 3
2017 "33-0975529" 82 1 2 115 . 5 2
2017 "7197000003" 112 1 1 189 41180 6 3
2017 "0000504713" 287 1 1 193 43580 5 2
2017 "7197000003" 350 3 1 7 15540 6 3
2017 "0001189117" 158 3 1 119 16740 5 2
2017 "36-4327855" 416 1 1 101 37964 5 2
2017 "62-1532940" 152 1 1 213 . 5 2
2017 "0003303298" 705 1 1 61 35614 6 3
2017 "0000852218" 128 3 1 86 33124 5 1
end
In cross-sectional data, each borrower is observed only once rather than over T time periods. As a result, borrower-level unobserved heterogeneity cannot be eliminated by demeaning the variables with the within transformation. Also, the explanatory variable of interest, "race of borrower", is time-constant, so it would be swept away by a borrower-level within transformation.
Hence, my question is: are the "year fixed effects" and "county/MSA fixed effects" in these papers actually just two sets of dummies?
To be more specific, one set of dummies for the year of loan origination between 2009 and 2019, e.g. if a loan was originated in 2009, then the dummy for 2009 is 1.
And one set of dummies for all counties/MSAs, e.g. if a loan was originated in county 86, then the dummy for county 86 is 1.
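To make the question concrete, here is a rough sketch in Stata of the specification I have in mind. Note that interest_rate is a hypothetical outcome name (it is not in the -dataex- extract above), the other controls are omitted, and I am assuming county_code identifies a county uniquely (if the code is only unique within a state, it would need to be combined with a state identifier first).

```stata
* Sketch only: interest_rate is a hypothetical variable, not in the extract above.

* (a) Pooled OLS with explicit year and county dummies (factor-variable notation)
regress interest_rate i.applicant_race_1 i.as_of_year i.county_code, vce(cluster county_code)

* (b) Same model, but the county dummies are absorbed instead of reported
areg interest_rate i.applicant_race_1 i.as_of_year, absorb(county_code) vce(cluster county_code)
```

If the "fixed effects" in these papers are just dummies, I would expect (a) and (b) to give the same point estimates on the race indicators, since absorbing a group is numerically the same as including a dummy for each of its levels.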
Thank you!
Lei