Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Assistance Needed with Stata Code for Forecasting Loan Default Using Simulated Data and Saved Coefficients

    Hello Colleagues,

    I require some help in debugging a Stata code. I know where the error is but I am not sure how to fix it. The error is coming from how I am handling ch_hpi, neg_ch_hpi and int_hpi_ch.

    I want to do the following:

    I want to forecast loan default.

    I have the "CA2016" dataset where the dependent variable is default and the control variables are fico, fico_680, fico_720, fico_760, int_fico680, int_fico720, int_fico760, orig7, orig8, purchase, retail, ch_hpi, neg_ch_hpi, int_hpi_ch, ltv, ltv_dummy, int_ltv, CA, NY, CO, WA, TX, FL, and the intercept.

    First, I estimate a logit model using the dependent and control variables.

    There are about 1723 observations in this dataset.

    I need to save the coefficients from this model to be used later in the session.

    In the "sim_ch_hpi_vh" dataset, I have a date variable that goes from Oct 2022 to Sept 2042 and variables Sim1 to Sim100. This dataset has 240 rows and 101 columns.

    First, I want to take the first observation from the "CA2016" dataset and forecast default using the saved coefficients estimated earlier. The data points for the corresponding coefficients are already there in the first observation. However, the only variables that will change are ch_hpi, neg_ch_hpi (this is an indicator variable if ch_hpi is negative), and int_hpi_ch (it is the interaction of neg_ch_hpi and ch_hpi). The ch_hpi values will come from the Sim1 to Sim100 variables in the "sim_ch_hpi_vh" dataset.

    I want to take the first row from "sim_ch_hpi_vh" and use Sim1 to Sim100 as ch_hpi, and then calculate neg_ch_hpi and int_hpi_ch based on ch_hpi, and forecast default.

    I should get 240 rows and 100 columns of default forecasts for the first observation from the CA2016 dataset.

    I need to repeat this for 1723 observations in the CA2016 dataset.

    However, for each observation in the CA2016 dataset, I only want to keep the 37th, 97th, 157th, 229th, and 240th rows.

    In my final dataset, I want to keep the observation ID from the CA2016 dataset, row numbers from "sim_ch_hpi_vh," and the default values corresponding to Sim1 to Sim100. So the final dataset will have 1723*5 rows and 102 columns.

    I hope this is not too confusing. I have attached the code. I am facing difficulty in changing the values of ch_hpi, neg_ch_hpi, and int_hpi_ch. The error is "local not found."

    Your help is appreciated.

    My code is attached.

    Thank you.
    Sagarika

    Attached Files

  • #2
    Your do-file was not too large to post directly as CODE. I'll guess that you got no replies before now partly because many long-term members won't open attachments either on principle or because it's too much like work.

    That said,

    Code:
     local int_hpi_ch = local neg_ch_hpi * local ch_hpi
    is presumably the problem, which perhaps should be

    Code:
     local int_hpi_ch = `neg_ch_hpi' * `ch_hpi'
    I've checked nothing else, just looked for what might produce that error.

    Comment


    • #3
      Thank you for the tip, Nick. I didn't know. I am posting for the first time. I will check and let you know. Thank you again.

      Comment


      • #4
        Being new here is fine, and we post a great deal of advice for members new and not so new.

        See https://www.statalist.org/forums/help#stata for advice encouraging the posting of code and discouraging the posting of attachments. We should make explicit that that means .do files too.

        Comment


        • #5
          Hi Nick, thank you. I will keep this in mind for my future posts.

          I tried what you suggested and now the error is "invalid syntax". I am pasting the code below. Could you please advise what I am doing wrong?

          use final3, clear

          logit default fico fico_680 fico_720 fico_760 int_fico680 int_fico720 int_fico760 orig7 orig8 purchase retail ch_hpi neg_ch_hpi int_hpi_ch ltv ltv_dummy int_ltv CA NY CO WA TX FL

          * Save the coefficients
          matrix coeffs = e(b)

          * Load subsample of final3 to start looping through observations
          use CA2016, clear

          * Initialize a matrix to store results for all observations
          * (5 rows per observation) * (number of observations) with 102 columns (1 for observation ID, 1 for row ID, and 100 for Sim1 to Sim100)
          matrix all_forecasts = J(1723 * 5, 102, .)

          * Counter for tracking the rows in the matrix
          local matrix_row = 1

          * Preserve CA2016 so that it can be restored in each iteration
          preserve

          * Loop over each observation in CA2016
          forvalues obs = 1/1723 {

          * Extract current observation's control variables
          local fico = fico[`obs']
          local fico_680 = fico_680[`obs']
          local fico_720 = fico_720[`obs']
          local fico_760 = fico_760[`obs']
          local int_fico680 = int_fico680[`obs']
          local int_fico720 = int_fico720[`obs']
          local int_fico760 = int_fico760[`obs']
          local orig7 = orig7[`obs']
          local orig8 = orig8[`obs']
          local purchase = purchase[`obs']
          local retail = retail[`obs']
          local ch_hpi = ch_hpi[`obs']
          local neg_ch_hpi = neg_ch_hpi[`obs']
          local int_hpi_ch = int_hpi_ch[`obs']
          local ltv = ltv[`obs']
          local ltv_dummy = ltv_dummy[`obs']
          local int_ltv = int_ltv[`obs']
          local CA = CA[`obs']
          local NY = NY[`obs']
          local CO = CO[`obs']
          local WA = WA[`obs']
          local TX = TX[`obs']
          local FL = FL[`obs']

          * Restore sim_ch_hpi_vh for the simulations
          *restore
          use sim_ch_hpi_vh, clear

          * Loop over the 100 simulations (Sim1 to Sim100)
          forvalues sim = 1/100 {
          * Replace ch_hpi with the value for the current simulation
          local ch_hpi = Sim`sim'

          * Calculate neg_ch_hpi and int_hpi_ch for the current simulation
          if `ch_hpi' < 0 {
          local neg_ch_hpi = 1
          }
          local int_hpi_ch = `neg_ch_hpi' * `ch_hpi'

          * Only keep 37th, 97th, 157th, 229th, and 240th rows
          foreach row in 37 97 157 229 240 {
          * Calculate the linear prediction using the coefficients and current values
          gen linear_pred = `fico'*`coeffs[1, "fico"]' + `fico_680'*`coeffs[1, "fico_680"]' + `fico_720'*`coeffs[1, "fico_720"]' + `fico_760'*`coeffs[1, "fico_760"]' + `int_fico680'*`coeffs[1, "int_fico680"]' + `int_fico720'*`coeffs[1, "int_fico720"]' + `int_fico760'*`coeffs[1, "int_fico760"]' + `orig7'*`coeffs[1, "orig7"]' + `orig8'*`coeffs[1, "orig8"]' + `purchase'*`coeffs[1, "purchase"]' + `retail'*`coeffs[1, "retail"]' + `ch_hpi'*`coeffs[1, "ch_hpi"]' + `neg_ch_hpi'*`coeffs[1, "neg_ch_hpi"]' + `int_hpi_ch'*`coeffs[1, "int_hpi_ch"]' + `ltv'*`coeffs[1, "ltv"]' + `ltv_dummy'*`coeffs[1, "ltv_dummy"]' + `int_ltv'*`coeffs[1, "int_ltv"]' + `CA'*`coeffs[1, "CA"]' + `NY'*`coeffs[1, "NY"]' + `CO'*`coeffs[1, "CO"]' + `WA'*`coeffs[1, "WA"]' + `TX'*`coeffs[1, "TX"]' + `FL'*`coeffs[1, "FL"]'

          * Calculate the logit probability of default
          gen default_prob = invlogit(linear_pred)

          * Store the forecast for the selected row and current simulation
          * Store the forecast, row_id from dataset 2, and observation_id from dataset 1
          matrix all_forecasts[`matrix_row', 1] = `obs'
          matrix all_forecasts[`matrix_row', 2] = `row'
          matrix all_forecasts[`matrix_row', `sim' + 2] = default_prob

          * Move to the next row in the matrix
          local matrix_row = `matrix_row' + 1
          }
          }

          * Restore dataset1 for the next observation
          preserve
          }

          * Now that the matrix is populated, output it to Excel
          putexcel set "all_default_forecasts.xlsx", sheet("Sheet1") replace
          putexcel A1 = matrix(all_forecasts)

          Comment


          • #6
            Which command produces that error?

            Please use CODE delimiters for code as explained at 12.3 in https://www.statalist.org/forums/help#stata

            Comment


            • #7
              Hi Nick,
              I used delimiters. I got the following error

              . forvalues obs = 1/1723 {
              > * Extract current observation's control variables ;
              program error: code follows on the same line as open brace
              r(198);

              I don't think any code following on the same line as open brace.

              Comment


              • #8
                That could arise from a non-printable character, e.g. ASCII 160.

                You’re not yet using CODE delimiters in the sense of this forum. Please study FAQ Advice #12.

                Comment


                • #9
                  Hi Nick,
                  Please see if this is what you mean

                  use final3, clear

                  logit default fico fico_680 fico_720 fico_760 int_fico680 int_fico720 int_fico760 orig7 orig8 purchase retail ch_hpi neg_ch_hpi int_hpi_ch ltv ltv_dummy int_ltv CA NY CO WA TX FL

                  * Save the coefficients
                  matrix coeffs = e(b)

                  * Load subsample of final3 to start looping through observations
                  use CA2016, clear

                  * Initialize a matrix to store results for all observations
                  * (5 rows per observation) * (number of observations) with 102 columns (1 for observation ID, 1 for row ID, and 100 for Sim1 to Sim100)
                  matrix all_forecasts = J(1723 * 5, 102, .)

                  * Counter for tracking the rows in the matrix
                  local matrix_row = 1

                  * Preserve CA2016 so that it can be restored in each iteration
                  preserve

                  * Loop over each observation in CA2016
                  forvalues obs = 1/1723 {

                  * Extract current observation's control variables
                  local fico = fico[`obs']
                  local fico_680 = fico_680[`obs']
                  local fico_720 = fico_720[`obs']
                  local fico_760 = fico_760[`obs']
                  local int_fico680 = int_fico680[`obs']
                  local int_fico720 = int_fico720[`obs']
                  local int_fico760 = int_fico760[`obs']
                  local orig7 = orig7[`obs']
                  local orig8 = orig8[`obs']
                  local purchase = purchase[`obs']
                  local retail = retail[`obs']
                  local ch_hpi = ch_hpi[`obs']
                  local neg_ch_hpi = neg_ch_hpi[`obs']
                  local int_hpi_ch = int_hpi_ch[`obs']
                  local ltv = ltv[`obs']
                  local ltv_dummy = ltv_dummy[`obs']
                  local int_ltv = int_ltv[`obs']
                  local CA = CA[`obs']
                  local NY = NY[`obs']
                  local CO = CO[`obs']
                  local WA = WA[`obs']
                  local TX = TX[`obs']
                  local FL = FL[`obs']

                  * Restore sim_ch_hpi_vh for the simulations
                  *restore
                  use sim_ch_hpi_vh, clear

                  * Loop over the 100 simulations (Sim1 to Sim100)
                  forvalues sim = 1/100 {
                  * Replace ch_hpi with the value for the current simulation
                  local ch_hpi = sim`sim'

                  * Calculate neg_ch_hpi and int_hpi_ch for the current simulation
                  if `ch_hpi' < 0 {
                  local neg_ch_hpi = 1
                  }
                  local int_hpi_ch = `neg_ch_hpi' * `ch_hpi'

                  * Only keep 37th, 97th, 157th, 229th, and 240th rows
                  foreach row in 37 97 157 229 240 {
                  * Calculate the linear prediction using the coefficients and current values
                  gen linear_pred = `fico'*`coeffs[1, "fico"]' + `fico_680'*`coeffs[1, "fico_680"]' + `fico_720'*`coeffs[1, "fico_720"]' + `fico_760'*`coeffs[1, "fico_760"]' + `int_fico680'*`coeffs[1, "int_fico680"]' + `int_fico720'*`coeffs[1, "int_fico720"]' + `int_fico760'*`coeffs[1, "int_fico760"]' + `orig7'*`coeffs[1, "orig7"]' + `orig8'*`coeffs[1, "orig8"]' + `purchase'*`coeffs[1, "purchase"]' + `retail'*`coeffs[1, "retail"]' + `ch_hpi'*`coeffs[1, "ch_hpi"]' + `neg_ch_hpi'*`coeffs[1, "neg_ch_hpi"]' + `int_hpi_ch'*`coeffs[1, "int_hpi_ch"]' + `ltv'*`coeffs[1, "ltv"]' + `ltv_dummy'*`coeffs[1, "ltv_dummy"]' + `int_ltv'*`coeffs[1, "int_ltv"]' + `CA'*`coeffs[1, "CA"]' + `NY'*`coeffs[1, "NY"]' + `CO'*`coeffs[1, "CO"]' + `WA'*`coeffs[1, "WA"]' + `TX'*`coeffs[1, "TX"]' + `FL'*`coeffs[1, "FL"]'

                  * Calculate the logit probability of default
                  gen default_prob = invlogit(linear_pred)

                  * Store the forecast for the selected row and current simulation
                  * Store the forecast, row_id from dataset 2, and observation_id from dataset 1
                  matrix all_forecasts[`matrix_row', 1] = `obs'
                  matrix all_forecasts[`matrix_row', 2] = `row'
                  matrix all_forecasts[`matrix_row', `sim' + 2] = default_prob

                  * Move to the next row in the matrix
                  local matrix_row = `matrix_row' + 1
                  }
                  }

                  * Restore dataset1 for the next observation
                  preserve
                  }

                  * Now that the matrix is populated, output it to Excel
                  putexcel set "all_default_forecasts.xlsx", sheet("Sheet1") replace
                  putexcel A1 = matrix(all_forecasts)

                  Comment


                  • #10
                    No; it's not. It's more readable than what you had before, but it's not ideal.

                    Code:
                    sysuse auto, clear 
                    
                    scatter mpg weight
                    is the kind of display you should get if you follow instructions.

                    Now in terms of your error message. There may be gunk following the code line

                    Code:
                    forvalues obs = 1/1723 {
                    either on the same line or on the next line which looks like one or more spaces followed by some end-of-line delimiter(s).

                    There are at least four things to try.

                    1. Are you editing code with MS Word or some other word processor? If so, don't do that. Word processors leave formatting characters in documents; that is how they work, but it doesn't help coding.

                    2. A good text editor will show you surprising characters in some way.

                    3. chartab from SSC will tell you about surprising characters in your do-file.

                    4. Just retype the command and the following lines in your do-file using Stata's do-file editor or any text editor as good or better.

                    Comment


                    • #11
                      Thank you. I will do as you suggested.

                      Comment


                      • #12
                        There were other issues with my code related to how I was accessing the coefficients saved in the matrix and the structure of the nested loops. These issues have now been fixed, and the code is running. Thank you.

                        Comment

                        Working...
                        X