Assistance Needed with Stata Code for Forecasting Loan Default Using Simulated Data and Saved Coefficients

Sagarika Mishra

Join Date: Sep 2024

Posts: 10
#1

Assistance Needed with Stata Code for Forecasting Loan Default Using Simulated Data and Saved Coefficients

22 Sep 2024, 07:52

Hello Colleagues,

I require some help in debugging a Stata code. I know where the error is but I am not sure how to fix it. The error is coming from how I am handling ch_hpi, neg_ch_hpi and int_hpi_ch.

I want to do the following:

I want to forecast loan default.

I have the "CA2016" dataset where the dependent variable is default and the control variables are fico, fico_680, fico_720, fico_760, int_fico680, int_fico720, int_fico760, orig7, orig8, purchase, retail, ch_hpi, neg_ch_hpi, int_hpi_ch, ltv, ltv_dummy, int_ltv, CA, NY, CO, WA, TX, FL, and the intercept.

First, I estimate a logit model using the dependent and control variables.

There are about 1723 observations in this dataset.

I need to save the coefficients from this model to be used later in the session.

In the "sim_ch_hpi_vh" dataset, I have a date variable that goes from Oct 2022 to Sept 2042 and variables Sim1 to Sim100. This dataset has 240 rows and 101 columns.

First, I want to take the first observation from the "CA2016" dataset and forecast default using the saved coefficients estimated earlier. The data points for the corresponding coefficients are already there in the first observation. However, the only variables that will change are ch_hpi, neg_ch_hpi (this is an indicator variable if ch_hpi is negative), and int_hpi_ch (it is the interaction of neg_ch_hpi and ch_hpi). The ch_hpi values will come from the Sim1 to Sim100 variables in the "sim_ch_hpi_vh" dataset.

I want to take the first row from "sim_ch_hpi_vh" and use Sim1 to Sim100 as ch_hpi, and then calculate neg_ch_hpi and int_hpi_ch based on ch_hpi, and forecast default.

I should get 240 rows and 100 columns of default forecasts for the first observation from the CA2016 dataset.

I need to repeat this for 1723 observations in the CA2016 dataset.

However, for each observation in the CA2016 dataset, I only want to keep the 37th, 97th, 157th, 229th, and 240th rows.

In my final dataset, I want to keep the observation ID from the CA2016 dataset, row numbers from "sim_ch_hpi_vh," and the default values corresponding to Sim1 to Sim100. So the final dataset will have 1723*5 rows and 102 columns.

I hope this is not too confusing. I have attached the code. I am facing difficulty in changing the values of ch_hpi, neg_ch_hpi, and int_hpi_ch. The error is "local not found."

Your help is appreciated.

My code is attached.

Thank you.
Sagarika

Attached Files

Forecasting default1.do (3.9 KB, 2 views)
Tags: None
Nick Cox

Join Date: Mar 2014

Posts: 35212
#2

23 Sep 2024, 06:28

Your do-file was not too large to post directly as CODE. I'll guess that you got no replies before now partly because many long-term members won't open attachments either on principle or because it's too much like work.

That said,

Code:

local int_hpi_ch = local neg_ch_hpi * local ch_hpi

is presumably the problem, which perhaps should be

Code:

local int_hpi_ch = `neg_ch_hpi' * `ch_hpi'

I've checked nothing else, just looked for what might produce that error.
Comment
Sagarika Mishra

Join Date: Sep 2024

Posts: 10
#3

23 Sep 2024, 18:29

Thank you for the tip, Nick. I didn't know. I am posting for the first time. I will check and let you know. Thank you again.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35212
#4

24 Sep 2024, 04:52

Being new here is fine, and we post a great deal of advice for members new and not so new.

See https://www.statalist.org/forums/help#stata for advice encouraging the posting of code and discouraging the posting of attachments. We should make explicit that that means .do files too.
Comment
Sagarika Mishra

Join Date: Sep 2024

Posts: 10
#5

24 Sep 2024, 19:55

Hi Nick, thank you. I will keep this in mind for my future posts.

I tried what you suggested and now the error is "invalid syntax". I am pasting the code below. Could you please advise what I am doing wrong?

use final3, clear

logit default fico fico_680 fico_720 fico_760 int_fico680 int_fico720 int_fico760 orig7 orig8 purchase retail ch_hpi neg_ch_hpi int_hpi_ch ltv ltv_dummy int_ltv CA NY CO WA TX FL

* Save the coefficients
matrix coeffs = e(b)

* Load subsample of final3 to start looping through observations
use CA2016, clear

* Initialize a matrix to store results for all observations
* (5 rows per observation) * (number of observations) with 102 columns (1 for observation ID, 1 for row ID, and 100 for Sim1 to Sim100)
matrix all_forecasts = J(1723 * 5, 102, .)

* Counter for tracking the rows in the matrix
local matrix_row = 1

* Preserve CA2016 so that it can be restored in each iteration
preserve

* Loop over each observation in CA2016
forvalues obs = 1/1723 {

* Extract current observation's control variables
local fico = fico[`obs']
local fico_680 = fico_680[`obs']
local fico_720 = fico_720[`obs']
local fico_760 = fico_760[`obs']
local int_fico680 = int_fico680[`obs']
local int_fico720 = int_fico720[`obs']
local int_fico760 = int_fico760[`obs']
local orig7 = orig7[`obs']
local orig8 = orig8[`obs']
local purchase = purchase[`obs']
local retail = retail[`obs']
local ch_hpi = ch_hpi[`obs']
local neg_ch_hpi = neg_ch_hpi[`obs']
local int_hpi_ch = int_hpi_ch[`obs']
local ltv = ltv[`obs']
local ltv_dummy = ltv_dummy[`obs']
local int_ltv = int_ltv[`obs']
local CA = CA[`obs']
local NY = NY[`obs']
local CO = CO[`obs']
local WA = WA[`obs']
local TX = TX[`obs']
local FL = FL[`obs']

* Restore sim_ch_hpi_vh for the simulations
*restore
use sim_ch_hpi_vh, clear

* Loop over the 100 simulations (Sim1 to Sim100)
forvalues sim = 1/100 {
* Replace ch_hpi with the value for the current simulation
local ch_hpi = Sim`sim'

* Calculate neg_ch_hpi and int_hpi_ch for the current simulation
if `ch_hpi' < 0 {
local neg_ch_hpi = 1
}
local int_hpi_ch = `neg_ch_hpi' * `ch_hpi'

* Only keep 37th, 97th, 157th, 229th, and 240th rows
foreach row in 37 97 157 229 240 {
* Calculate the linear prediction using the coefficients and current values
gen linear_pred = `fico'*`coeffs[1, "fico"]' + `fico_680'*`coeffs[1, "fico_680"]' + `fico_720'*`coeffs[1, "fico_720"]' + `fico_760'*`coeffs[1, "fico_760"]' + `int_fico680'*`coeffs[1, "int_fico680"]' + `int_fico720'*`coeffs[1, "int_fico720"]' + `int_fico760'*`coeffs[1, "int_fico760"]' + `orig7'*`coeffs[1, "orig7"]' + `orig8'*`coeffs[1, "orig8"]' + `purchase'*`coeffs[1, "purchase"]' + `retail'*`coeffs[1, "retail"]' + `ch_hpi'*`coeffs[1, "ch_hpi"]' + `neg_ch_hpi'*`coeffs[1, "neg_ch_hpi"]' + `int_hpi_ch'*`coeffs[1, "int_hpi_ch"]' + `ltv'*`coeffs[1, "ltv"]' + `ltv_dummy'*`coeffs[1, "ltv_dummy"]' + `int_ltv'*`coeffs[1, "int_ltv"]' + `CA'*`coeffs[1, "CA"]' + `NY'*`coeffs[1, "NY"]' + `CO'*`coeffs[1, "CO"]' + `WA'*`coeffs[1, "WA"]' + `TX'*`coeffs[1, "TX"]' + `FL'*`coeffs[1, "FL"]'

* Calculate the logit probability of default
gen default_prob = invlogit(linear_pred)

* Store the forecast for the selected row and current simulation
* Store the forecast, row_id from dataset 2, and observation_id from dataset 1
matrix all_forecasts[`matrix_row', 1] = `obs'
matrix all_forecasts[`matrix_row', 2] = `row'
matrix all_forecasts[`matrix_row', `sim' + 2] = default_prob

* Move to the next row in the matrix
local matrix_row = `matrix_row' + 1
}
}

* Restore dataset1 for the next observation
preserve
}

* Now that the matrix is populated, output it to Excel
putexcel set "all_default_forecasts.xlsx", sheet("Sheet1") replace
putexcel A1 = matrix(all_forecasts)
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35212
#6

25 Sep 2024, 04:33

Which command produces that error?

Please use CODE delimiters for code as explained at 12.3 in https://www.statalist.org/forums/help#stata
Comment
Sagarika Mishra

Join Date: Sep 2024

Posts: 10
#7

25 Sep 2024, 18:17

Hi Nick,
I used delimiters. I got the following error

. forvalues obs = 1/1723 {
> * Extract current observation's control variables ;
program error: code follows on the same line as open brace
r(198);

I don't think any code following on the same line as open brace.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35212
#8

26 Sep 2024, 01:12

That could arise from a non-printable character, e.g. ASCII 160.

You’re not yet using CODE delimiters in the sense of this forum. Please study FAQ Advice #12.
Comment
Sagarika Mishra

Join Date: Sep 2024

Posts: 10
#9

26 Sep 2024, 03:27

Hi Nick,
Please see if this is what you mean

use final3, clear

logit default fico fico_680 fico_720 fico_760 int_fico680 int_fico720 int_fico760 orig7 orig8 purchase retail ch_hpi neg_ch_hpi int_hpi_ch ltv ltv_dummy int_ltv CA NY CO WA TX FL

* Save the coefficients
matrix coeffs = e(b)

* Load subsample of final3 to start looping through observations
use CA2016, clear

* Initialize a matrix to store results for all observations
* (5 rows per observation) * (number of observations) with 102 columns (1 for observation ID, 1 for row ID, and 100 for Sim1 to Sim100)
matrix all_forecasts = J(1723 * 5, 102, .)

* Counter for tracking the rows in the matrix
local matrix_row = 1

* Preserve CA2016 so that it can be restored in each iteration
preserve

* Loop over each observation in CA2016
forvalues obs = 1/1723 {

* Extract current observation's control variables
local fico = fico[`obs']
local fico_680 = fico_680[`obs']
local fico_720 = fico_720[`obs']
local fico_760 = fico_760[`obs']
local int_fico680 = int_fico680[`obs']
local int_fico720 = int_fico720[`obs']
local int_fico760 = int_fico760[`obs']
local orig7 = orig7[`obs']
local orig8 = orig8[`obs']
local purchase = purchase[`obs']
local retail = retail[`obs']
local ch_hpi = ch_hpi[`obs']
local neg_ch_hpi = neg_ch_hpi[`obs']
local int_hpi_ch = int_hpi_ch[`obs']
local ltv = ltv[`obs']
local ltv_dummy = ltv_dummy[`obs']
local int_ltv = int_ltv[`obs']
local CA = CA[`obs']
local NY = NY[`obs']
local CO = CO[`obs']
local WA = WA[`obs']
local TX = TX[`obs']
local FL = FL[`obs']

* Restore sim_ch_hpi_vh for the simulations
*restore
use sim_ch_hpi_vh, clear

* Loop over the 100 simulations (Sim1 to Sim100)
forvalues sim = 1/100 {
* Replace ch_hpi with the value for the current simulation
local ch_hpi = sim`sim'

* Calculate neg_ch_hpi and int_hpi_ch for the current simulation
if `ch_hpi' < 0 {
local neg_ch_hpi = 1
}
local int_hpi_ch = `neg_ch_hpi' * `ch_hpi'

* Only keep 37th, 97th, 157th, 229th, and 240th rows
foreach row in 37 97 157 229 240 {
* Calculate the linear prediction using the coefficients and current values
gen linear_pred = `fico'*`coeffs[1, "fico"]' + `fico_680'*`coeffs[1, "fico_680"]' + `fico_720'*`coeffs[1, "fico_720"]' + `fico_760'*`coeffs[1, "fico_760"]' + `int_fico680'*`coeffs[1, "int_fico680"]' + `int_fico720'*`coeffs[1, "int_fico720"]' + `int_fico760'*`coeffs[1, "int_fico760"]' + `orig7'*`coeffs[1, "orig7"]' + `orig8'*`coeffs[1, "orig8"]' + `purchase'*`coeffs[1, "purchase"]' + `retail'*`coeffs[1, "retail"]' + `ch_hpi'*`coeffs[1, "ch_hpi"]' + `neg_ch_hpi'*`coeffs[1, "neg_ch_hpi"]' + `int_hpi_ch'*`coeffs[1, "int_hpi_ch"]' + `ltv'*`coeffs[1, "ltv"]' + `ltv_dummy'*`coeffs[1, "ltv_dummy"]' + `int_ltv'*`coeffs[1, "int_ltv"]' + `CA'*`coeffs[1, "CA"]' + `NY'*`coeffs[1, "NY"]' + `CO'*`coeffs[1, "CO"]' + `WA'*`coeffs[1, "WA"]' + `TX'*`coeffs[1, "TX"]' + `FL'*`coeffs[1, "FL"]'

* Calculate the logit probability of default
gen default_prob = invlogit(linear_pred)

* Store the forecast for the selected row and current simulation
* Store the forecast, row_id from dataset 2, and observation_id from dataset 1
matrix all_forecasts[`matrix_row', 1] = `obs'
matrix all_forecasts[`matrix_row', 2] = `row'
matrix all_forecasts[`matrix_row', `sim' + 2] = default_prob

* Move to the next row in the matrix
local matrix_row = `matrix_row' + 1
}
}

* Restore dataset1 for the next observation
preserve
}

* Now that the matrix is populated, output it to Excel
putexcel set "all_default_forecasts.xlsx", sheet("Sheet1") replace
putexcel A1 = matrix(all_forecasts)
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35212
#10

26 Sep 2024, 04:33

No; it's not. It's more readable than what you had before, but it's not ideal.

Code:

sysuse auto, clear scatter mpg weight

is the kind of display you should get if you follow instructions.

Now in terms of your error message. There may be gunk following the code line

Code:

forvalues obs = 1/1723 {

either on the same line or on the next line which looks like one or more spaces followed by some end-of-line delimiter(s).

There are at least four things to try.

1. Are you editing code with MS Word or some other word processor? If so, don't do that. Word processors leave formatting characters in documents; that is how they work, but it doesn't help coding.

2. A good text editor will show you surprising characters in some way.

3. chartab from SSC will tell you about surprising characters in your do-file.

4. Just retype the command and the following lines in your do-file using Stata's do-file editor or any text editor as good or better.
Comment
Sagarika Mishra

Join Date: Sep 2024

Posts: 10
#11

26 Sep 2024, 05:29

Thank you. I will do as you suggested.
Comment
Sagarika Mishra

Join Date: Sep 2024

Posts: 10
#12

29 Sep 2024, 03:24

There were other issues with my code related to how I was accessing the coefficients saved in the matrix and the structure of the nested loops. These issues have now been fixed, and the code is running. Thank you.
Comment

Announcement

Assistance Needed with Stata Code for Forecasting Loan Default Using Simulated Data and Saved Coefficients

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment