Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating the Dataset for a Binary Probability model with Panel Data

    Hi Statalisters,

    I am a novice user in Stata. I'm working with Stata.14 and Windows 7.

    I'm working on a Panel Data Set for all commerical banks (name2) in the U.S. for the period 1995 - 2018 (time variable). So I have data on a bank-year level. I created the ID Variable with the variables name2 and cert I already calculated four bank risk proxies: Z-Score, NPA (non-performing assets), LLP (loan loss provisions) and LLR (loan loss reserves) on a bank-year level. Here is the "Dataset 1":

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long name2 float date2 long cert int(year id) float(llr llp npa z_score)
      1 143 15448 2007  1 .011630812   .018261097  .015062087 -1.5716723
      2 143 23309 2014  2 .007704217    .00026063   .01757166 -1.6909508
      3 143 16419 2015  2 .004713984            0   .02558387  -1.714116
      4 143  4788 2016  2 .005010014   .000539208   .01931906 -1.9513158
      5 143 23406 2017  2 .005206013   .000905271   .01653177  -1.886552
      6 143 33619 2018  2 .005082208   .000445157  .012838008 -1.8675916
      7 143 27552 2014  3          0            0           0  -3.228743
      8 143  1673 2015  3          0            0           0  -3.197962
      9 143 19713 2006  4 .007627714   .000923452  .002693402 -1.8396757
     10 143  6646 2007  4 .007800257   .001282144  .001100233 -1.9295822
     11 143 26977 2008  4 .010616484   .003585204  .013777885  -1.963218
     12 143 20662 2009  4 .016729187   .016583314   .02593907 -1.9244528
     13 143 19593 2010  4 .014617286     .0030025  .017214797 -1.9180492
     14 143  9087 2011  4 .010910933   .002482072  .018298596  -1.586883
     15 143 22536 2005  6 .007931727   .007931727           0 -2.2908888
     16 143 22543 2006  6 .007908663    .00720697           0 -1.8879156
     17 143 27136 2007  6 .008464329   .000998414   .02762278 -1.6747932
     18 143  8841 2008  6 .008355642   .003057135  .019206095  -1.449307
     25 143 19795 2009  6  .02964397    .03421423   .09687161  -.8901772
     26 143 25002 2010  6 .029940894   .035797488    .1661957  -.1181851
     27 143 26668 2011  6 .021382164    .01136998   .17405207  -.9856699
     28 143 21109 2012  6  .01443208   .004534268    .1530683 -1.1408491
     29 143 14936 2013  6 .016162274   .005791808   .15281764 -1.3834363
     30 143 23064 2014  6 .012107523   .000682854   .13720109 -1.4684038
     31 143 18489 2015  6 .008448168   .001030842   .08406697 -1.4148947
     32 143 14311 2016  6 .007447635   .001197504   .06336267  -1.474553
    end
    format %tq!Qq-CCYY date2
    label values name2 name2
    label def name2 1 "1st American State Bank of Minnesota", modify
    label def name2 2 "1st Bank", modify
    label def name2 3 "1st Bank & Trust", modify
    label def name2 4 "1st Bank of Troy", modify
    label def name2 5 "1st Business Bank", modify
    label def name2 6 "1st Choice Bank", modify
    label def name2 7 "1st Constitution Bank", modify
    label def name2 8 "1st Financial Bank South Dakota", modify
    label def name2 9 "1st Floyd Bank", modify
    label def name2 10 "1st National Bank", modify
    label def name2 11 "1st National Community Bank", modify
    label def name2 12 "1st Security Bank of Laurel", modify
    label def name2 13 "1st Security Bank of West Yellowstone, Montana", modify
    label def name2 14 "1st Source Bank", modify
    label def name2 15 "1st State Bank and Trust Company of Palos Hills", modify
    label def name2 16 "1st State Bank of Mason City", modify
    label def name2 17 "1st United Bank", modify
    label def name2 18 "21st Century Bank", modify
    label def name2 25 "Abbotsford State Bank", modify
    label def name2 26 "Abrams Centre National Bank", modify
    label def name2 27 "Acadia State Bank", modify
    label def name2 28 "Acadian Bank", modify
    label def name2 29 "Ackley State Bank", modify
    label def name2 30 "Adair State Bank", modify
    label def name2 31 "Adams Bank & Trust", modify
    label def name2 32 "Adams County Bank", modify
    Now I would like to use the four calculated risk proxies (llr, llp, npa, z_score). I use a Probability Model explaining the occurrence of a bank failure (0/1) with the four proxies (lagged by one year). The dependent variable equals 1 if a bank failed and its equal 0 if a bank is active.
    For the dependent variable I have the "Failed Bank List" provided by the FDIC. With the names of the failed banks (namefb), the status that they failed (failure), the certification number (certfb) and their failure date (datefb) (I will convert the string variable to a numeric variable). The failed banks from the "Failed Bank List" are also included in my first Dataset.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long(namefb failure certfb) str11 datefb
    419 1 17719 "10/13/2017 "
    145 1  1802 "5/26/2017 " 
    184 1 58302 "4/28/2017 " 
    341 1 35495 "03.03.2017" 
    359 1 19328 "1/27/2017 " 
    224 1 34951 "1/13/2017 " 
     14 1    91 "9/23/2016 " 
    440 1 11297 "8/19/2016 " 
    164 1 35312 "05.06.2016" 
    447 1  9956 "4/29/2016 " 
    303 1 20364 "03.11.2016" 
    240 1 35156 "10.02.2015" 
    411 1 35259 "10.02.2015" 
    339 1 34112 "07.10.2015" 
    138 1 57772 "05.08.2015" 
     71 1 33938 "2/13/2015 " 
    234 1 20290 "1/23/2015 " 
    178 1 17557 "1/16/2015 " 
    304 1 34983 "12/19/2014 "
    434 1   916 "10/24/2014 "
    288 1  4862 "10/17/2014 "
    137 1 58125 "7/18/2014 " 
    430 1 12483 "6/27/2014 " 
    454 1 21793 "6/20/2014 " 
    454 1 10450 "6/20/2014 " 
     28 1 57866 "5/16/2014 " 
    end
    label values namefb namefailedbank2
    label def namefailedbank2 14 "ALLIED BANK", modify
    label def namefailedbank2 28 "AZTECAMERICA BANK", modify
    label def namefailedbank2 71 "CAPITOL CITY BANK & TRUST COMPANY", modify
    label def namefailedbank2 137 "EASTSIDE COMMERCIAL BANK", modify
    label def namefailedbank2 138 "EDGEBROOK BANK", modify
    label def namefailedbank2 145 "FAYETTE COUNTY BANK", modify
    label def namefailedbank2 164 "FIRST CORNERSTONE BANK", modify
    label def namefailedbank2 178 "FIRST NATIONAL BANK OF CRESTVIEW", modify
    label def namefailedbank2 184 "FIRST NBC BANK", modify
    label def namefailedbank2 224 "HARVEST COMMUNITY BANK", modify
    label def namefailedbank2 234 "HIGHLAND COMMUNITY BANK", modify
    label def namefailedbank2 240 "HOMETOWN NATIONAL BANK", modify
    label def namefailedbank2 288 "NBRS FINANCIAL BANK", modify
    label def namefailedbank2 303 "NORTH MILWAUKEE STATE BANK", modify
    label def namefailedbank2 304 "NORTHERN STAR BANK", modify
    label def namefailedbank2 339 "PREMIER BANK", modify
    label def namefailedbank2 341 "PROFICIO BANK", modify
    label def namefailedbank2 359 "SEAWAY BANK AND TRUST COMPANY", modify
    label def namefailedbank2 411 "THE BANK OF GEORGIA", modify
    label def namefailedbank2 419 "THE FARMERS AND MERCHANTS STATE BANK OF ARGONIA", modify
    label def namefailedbank2 430 "THE FREEDOM STATE BANK", modify
    label def namefailedbank2 434 "THE NATIONAL REPUBLIC BANK OF CHICAGO", modify
    label def namefailedbank2 440 "THE WOODBURY BANKING COMPANY", modify
    label def namefailedbank2 447 "TRUST COMPANY BANK", modify
    label def namefailedbank2 454 "VALLEY BANK", modify
    label values failure failure
    label def failure 1 "FAILURE", modify
    To run my regeression, my goal is to set up my Stata Dataset with the dependent variable with its binary outcome: Failure = 1 and Active = 0

    How can I combine the two datasets?

    Thank you so much for your input!

  • #2
    The relevant command here is -merge-, which uses a so-called "key" variable to match observations from one file with another. That key variable must have the same name in both files. You would use merge to put the "failure" and perhaps other variables onto your master bank file. What variable is the key that links your two files? Perhaps this is obvious and I am overlooking it.

    Perhaps you want to use the names, but note that the "name" variables in the two files have different coding schemes.

    Comment


    • #3
      What you have here is a mess. Your bank name variables use different numerical coding schemes in the two data sets, so if you try to merge them using that variable you will get data salad. (This is most likely the result of having used -encode- separately in the two data sets.) So the first thing you have to do in each data set is -decode- the name variable to get the original string variables back. If, after you combine the data sets you want to do a fresh -encode- to create a new numeric variable as a panel identifier that's OK.

      The other thing I don't get is that the bank name, or even the combination of namefb and datefb do not uniquely identify observations in the failed bank data set: the same bank fails twice on the same date. What's that about? I'm going to assume that it's either an error or that the distinction between the two simultaneous failures of the same bank is not relevant to the issues at hand.

      You also state that you have panel data, but in the first data set you actually have only one observation per bank. Perhaps this is just not representative of your complete data set. But it raises a question: when you put the data sets together, do you want to match the bank failure observations with all observations of the same bank in dataset 1, or just with those that are in the same quarter as the failure date? I will assume the former, but if you prefer the latter change the -merge- command to -merge 1:1 bank_name year using `cleaned_failed_banks', keep(master match)-.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long(namefb failure certfb) str11 datefb
      419 1 17719 "10/13/2017 "
      145 1  1802 "5/26/2017 " 
      184 1 58302 "4/28/2017 " 
      341 1 35495 "03.03.2017" 
      359 1 19328 "1/27/2017 " 
      224 1 34951 "1/13/2017 " 
       14 1    91 "9/23/2016 " 
      440 1 11297 "8/19/2016 " 
      164 1 35312 "05.06.2016" 
      447 1  9956 "4/29/2016 " 
      303 1 20364 "03.11.2016" 
      240 1 35156 "10.02.2015" 
      411 1 35259 "10.02.2015" 
      339 1 34112 "07.10.2015" 
      138 1 57772 "05.08.2015" 
       71 1 33938 "2/13/2015 " 
      234 1 20290 "1/23/2015 " 
      178 1 17557 "1/16/2015 " 
      304 1 34983 "12/19/2014 "
      434 1   916 "10/24/2014 "
      288 1  4862 "10/17/2014 "
      137 1 58125 "7/18/2014 " 
      430 1 12483 "6/27/2014 " 
      454 1 21793 "6/20/2014 " 
      454 1 10450 "6/20/2014 " 
       28 1 57866 "5/16/2014 " 
      end
      label values namefb namefailedbank2
      label def namefailedbank2 14 "ALLIED BANK", modify
      label def namefailedbank2 28 "AZTECAMERICA BANK", modify
      label def namefailedbank2 71 "CAPITOL CITY BANK & TRUST COMPANY", modify
      label def namefailedbank2 137 "EASTSIDE COMMERCIAL BANK", modify
      label def namefailedbank2 138 "EDGEBROOK BANK", modify
      label def namefailedbank2 145 "FAYETTE COUNTY BANK", modify
      label def namefailedbank2 164 "FIRST CORNERSTONE BANK", modify
      label def namefailedbank2 178 "FIRST NATIONAL BANK OF CRESTVIEW", modify
      label def namefailedbank2 184 "FIRST NBC BANK", modify
      label def namefailedbank2 224 "HARVEST COMMUNITY BANK", modify
      label def namefailedbank2 234 "HIGHLAND COMMUNITY BANK", modify
      label def namefailedbank2 240 "HOMETOWN NATIONAL BANK", modify
      label def namefailedbank2 288 "NBRS FINANCIAL BANK", modify
      label def namefailedbank2 303 "NORTH MILWAUKEE STATE BANK", modify
      label def namefailedbank2 304 "NORTHERN STAR BANK", modify
      label def namefailedbank2 339 "PREMIER BANK", modify
      label def namefailedbank2 341 "PROFICIO BANK", modify
      label def namefailedbank2 359 "SEAWAY BANK AND TRUST COMPANY", modify
      label def namefailedbank2 411 "THE BANK OF GEORGIA", modify
      label def namefailedbank2 419 "THE FARMERS AND MERCHANTS STATE BANK OF ARGONIA", modify
      label def namefailedbank2 430 "THE FREEDOM STATE BANK", modify
      label def namefailedbank2 434 "THE NATIONAL REPUBLIC BANK OF CHICAGO", modify
      label def namefailedbank2 440 "THE WOODBURY BANKING COMPANY", modify
      label def namefailedbank2 447 "TRUST COMPANY BANK", modify
      label def namefailedbank2 454 "VALLEY BANK", modify
      label values failure failure
      label def failure 1 "FAILURE", modify
      tempfile failed_banks
      save `failed_banks'
      
      
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long name2 float date2 long cert int(year id) float(llr llp npa z_score)
        1 143 15448 2007  1 .011630812   .018261097  .015062087 -1.5716723
        2 143 23309 2014  2 .007704217    .00026063   .01757166 -1.6909508
        3 143 16419 2015  2 .004713984            0   .02558387  -1.714116
        4 143  4788 2016  2 .005010014   .000539208   .01931906 -1.9513158
        5 143 23406 2017  2 .005206013   .000905271   .01653177  -1.886552
        6 143 33619 2018  2 .005082208   .000445157  .012838008 -1.8675916
        7 143 27552 2014  3          0            0           0  -3.228743
        8 143  1673 2015  3          0            0           0  -3.197962
        9 143 19713 2006  4 .007627714   .000923452  .002693402 -1.8396757
       10 143  6646 2007  4 .007800257   .001282144  .001100233 -1.9295822
       11 143 26977 2008  4 .010616484   .003585204  .013777885  -1.963218
       12 143 20662 2009  4 .016729187   .016583314   .02593907 -1.9244528
       13 143 19593 2010  4 .014617286     .0030025  .017214797 -1.9180492
       14 143  9087 2011  4 .010910933   .002482072  .018298596  -1.586883
       15 143 22536 2005  6 .007931727   .007931727           0 -2.2908888
       16 143 22543 2006  6 .007908663    .00720697           0 -1.8879156
       17 143 27136 2007  6 .008464329   .000998414   .02762278 -1.6747932
       18 143  8841 2008  6 .008355642   .003057135  .019206095  -1.449307
       25 143 19795 2009  6  .02964397    .03421423   .09687161  -.8901772
       26 143 25002 2010  6 .029940894   .035797488    .1661957  -.1181851
       27 143 26668 2011  6 .021382164    .01136998   .17405207  -.9856699
       28 143 21109 2012  6  .01443208   .004534268    .1530683 -1.1408491
       29 143 14936 2013  6 .016162274   .005791808   .15281764 -1.3834363
       30 143 23064 2014  6 .012107523   .000682854   .13720109 -1.4684038
       31 143 18489 2015  6 .008448168   .001030842   .08406697 -1.4148947
       32 143 14311 2016  6 .007447635   .001197504   .06336267  -1.474553
      end
      format %tq!Qq-CCYY date2
      label values name2 name2
      label def name2 1 "1st American State Bank of Minnesota", modify
      label def name2 2 "1st Bank", modify
      label def name2 3 "1st Bank & Trust", modify
      label def name2 4 "1st Bank of Troy", modify
      label def name2 5 "1st Business Bank", modify
      label def name2 6 "1st Choice Bank", modify
      label def name2 7 "1st Constitution Bank", modify
      label def name2 8 "1st Financial Bank South Dakota", modify
      label def name2 9 "1st Floyd Bank", modify
      label def name2 10 "1st National Bank", modify
      label def name2 11 "1st National Community Bank", modify
      label def name2 12 "1st Security Bank of Laurel", modify
      label def name2 13 "1st Security Bank of West Yellowstone, Montana", modify
      label def name2 14 "1st Source Bank", modify
      label def name2 15 "1st State Bank and Trust Company of Palos Hills", modify
      label def name2 16 "1st State Bank of Mason City", modify
      label def name2 17 "1st United Bank", modify
      label def name2 18 "21st Century Bank", modify
      label def name2 25 "Abbotsford State Bank", modify
      label def name2 26 "Abrams Centre National Bank", modify
      label def name2 27 "Acadia State Bank", modify
      label def name2 28 "Acadian Bank", modify
      label def name2 29 "Ackley State Bank", modify
      label def name2 30 "Adair State Bank", modify
      label def name2 31 "Adams Bank & Trust", modify
      label def name2 32 "Adams County Bank", modify
      tempfile dataset1
      save `dataset1'
      
      //    PREPARE THE DATA SETS
      use `failed_banks', clear
      decode namefb, gen(bank_name)
      drop namefb
      drop certfb // IRRELEVFANT VARIABLE LEADING TO DUPLICATE RECORDS
      duplicates drop
      gen failure_date = daily(datefb, "MDY")
      assert missing(failure_date) == missing(datefb)
      format failure_date %td
      gen year = yofd(failure_date)
      tempfile cleaned_failed_banks
      save `cleaned_failed_banks'
      
      use `dataset1'
      decode name2, gen(bank_name)
      drop name2
      merge m:1 bank_name using `cleaned_failed_banks', keep(master match)
      gen byte status = (_merge == 3)
      label define status  0     "Active"    1    "Failed"
      label values status status
      
      encode bank_name, gen(bank_num)
      It also would be more helpful, in the future, when asking for help with putting data sets together, to show examples where some of the observations in the two data sets belong together and some don't. In your example, there are no actual matches.

      Comment


      • #4
        Thank you very much, Mike and Clyde!

        What you have here is a mess. Your bank name variables use different numerical coding schemes in the two data sets, so if you try to merge them using that variable you will get data salad. (This is most likely the result of having used -encode- separately in the two data sets.) So the first thing you have to do in each data set is -decode- the name variable to get the original string variables back. If, after you combine the data sets you want to do a fresh -encode- to create a new numeric variable as a panel identifier that's OK.
        Yes, it is a mess. Thank you for the very usefull hint with the encoding.

        You also state that you have panel data, but in the first data set you actually have only one observation per bank. Perhaps this is just not representative of your complete data set. But it raises a question: when you put the data sets together, do you want to match the bank failure observations with all observations of the same bank in dataset 1, or just with those that are in the same quarter as the failure date? I will assume the former, but if you prefer the latter change the -merge- command to -merge 1:1 bank_name year using `cleaned_failed_banks', keep(master match)-.
        It is just not representative of my complete data set, because there are 172 431 observations. And I would like to match the bank failure observations with all observations. I checked it in Excel, hat if the bankX failed in 2016 (so its listed in the bank failure list with FAILER in 2016) and the bankX is listed in dataset1 from 1995-2015. So I guess the bankX should be listed in the final dataset in the category FAILURE to see with my probability regression with a binary outcome, if the risk proxies could explaining the occurrence of a bank failure (0/1)

        Here are all variables I would like to use:
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str72 original_name float original_year long original_cert int(calculated_year calculated_id) float(calculated_llr calculated_llp failure_year) long(failure_name1 failure_cert) str7 failure
        "1st American State Bank of Minnesota"                1995 15448 2007  1 .011630812   .018261097 2017 419 17719 "FAILURE"
        "1st Bank"                                            1995 23309 2014  2 .007704217    .00026063 2017 145  1802 "FAILURE"
        "1st Bank & Trust"                                    1995 16419 2015  2 .004713984            0 2017 184 58302 "FAILURE"
        "1st Bank of Troy"                                    1995  4788 2016  2 .005010014   .000539208 2017 341 35495 "FAILURE"
        "1st Business Bank"                                   1995 23406 2017  2 .005206013   .000905271 2017 359 19328 "FAILURE"
        "1st Choice Bank"                                     1995 33619 2018  2 .005082208   .000445157 2017 224 34951 "FAILURE"
        "1st Constitution Bank"                               1995 27552 2014  3          0            0 2016  14    91 "FAILURE"
        "1st Financial Bank South Dakota"                     1995  1673 2015  3          0            0 2016 440 11297 "FAILURE"
        "1st Floyd Bank"                                      1995 19713 2006  4 .007627714   .000923452 2016 164 35312 "FAILURE"
        "1st National Bank"                                   1995  6646 2007  4 .007800257   .001282144 2016 447  9956 "FAILURE"
        "1st National Community Bank"                         1995 26977 2008  4 .010616484   .003585204 2016 303 20364 "FAILURE"
        "1st Security Bank of Laurel"                         1995 20662 2009  4 .016729187   .016583314 2015 240 35156 "FAILURE"
        "1st Security Bank of West Yellowstone, Montana"      1995 19593 2010  4 .014617286     .0030025 2015 411 35259 "FAILURE"
        "1st Source Bank"                                     1995  9087 2011  4 .010910933   .002482072 2015 339 34112 "FAILURE"
        "1st State Bank and Trust Company of Palos Hills"     1995 22536 2005  6 .007931727   .007931727 2015 138 57772 "FAILURE"
        "1st State Bank of Mason City"                        1995 22543 2006  6 .007908663    .00720697 2015  71 33938 "FAILURE"
        "1st United Bank"                                     1995 27136 2007  6 .008464329   .000998414 2015 234 20290 "FAILURE"
        "21st Century Bank"                                   1995  8841 2008  6 .008355642   .003057135 2015 178 17557 "FAILURE"
        "Abbotsford State Bank"                               1995 19795 2009  6  .02964397    .03421423 2014 304 34983 "FAILURE"
        "Abrams Centre National Bank"                         1995 25002 2010  6 .029940894   .035797488 2014 434   916 "FAILURE"
        "Acadia State Bank"                                   1995 26668 2011  6 .021382164    .01136998 2014 288  4862 "FAILURE"
        "Acadian Bank"                                        1995 21109 2012  6  .01443208   .004534268 2014 137 58125 "FAILURE"
        "Ackley State Bank"                                   1995 14936 2013  6 .016162274   .005791808 2014 430 12483 "FAILURE"
        "Adair State Bank"                                    1995 23064 2014  6 .012107523   .000682854 2014 454 21793 "FAILURE"
        "Adams Bank & Trust"                                  1995 18489 2015  6 .008448168   .001030842 2014 454 10450 "FAILURE"
        end
        label values failure_name1 failure_name1
        label def failure_name1 14 "ALLIED BANK", modify
        label def failure_name1 71 "CAPITOL CITY BANK & TRUST COMPANY", modify
        label def failure_name1 137 "EASTSIDE COMMERCIAL BANK", modify
        label def failure_name1 138 "EDGEBROOK BANK", modify
        label def failure_name1 145 "FAYETTE COUNTY BANK", modify
        label def failure_name1 164 "FIRST CORNERSTONE BANK", modify
        label def failure_name1 178 "FIRST NATIONAL BANK OF CRESTVIEW", modify
        label def failure_name1 184 "FIRST NBC BANK", modify
        label def failure_name1 224 "HARVEST COMMUNITY BANK", modify
        label def failure_name1 234 "HIGHLAND COMMUNITY BANK", modify
        label def failure_name1 240 "HOMETOWN NATIONAL BANK", modify
        label def failure_name1 288 "NBRS FINANCIAL BANK", modify
        label def failure_name1 303 "NORTH MILWAUKEE STATE BANK", modify
        label def failure_name1 304 "NORTHERN STAR BANK", modify
        label def failure_name1 339 "PREMIER BANK", modify
        label def failure_name1 341 "PROFICIO BANK", modify
        label def failure_name1 359 "SEAWAY BANK AND TRUST COMPANY", modify
        label def failure_name1 411 "THE BANK OF GEORGIA", modify
        label def failure_name1 419 "THE FARMERS AND MERCHANTS STATE BANK OF ARGONIA", modify
        label def failure_name1 430 "THE FREEDOM STATE BANK", modify
        label def failure_name1 434 "THE NATIONAL REPUBLIC BANK OF CHICAGO", modify
        label def failure_name1 440 "THE WOODBURY BANKING COMPANY", modify
        label def failure_name1 447 "TRUST COMPANY BANK", modify
        label def failure_name1 454 "VALLEY BANK", modify

        To make the mess perfect I describe what I did before:

        1) I started with the Dataset "Original" with the variables original_name original_year and original_cert and 172431 observations. Within the original_name are duplicates, because there are several banks with the same name in several cities / states of the U.S. - so I actually have not "bank" names in my data set, rather "branch" names. So I need the variable original_cert to make the branch names unique and to create the ID variable calculated_id :

        Code:
        isid original_name original_cert original_year
        egen calculated_id = group original_name original_cert
        To calculating the risk proxies (llr, llp) I have had additional balance sheet variables like "asset" in my original dataset. Before calculating I collapsed the "asset" variable on a bank-year level:

        Code:
        collapse(mean) asset, by (original_name original_year)
        The result of the calculation (which doesnt matter in my current question) is the Data Set "Calculated" with the variables caluclated_year calculated_id calculated_llr calculated_llp

        And now I would like to merge the Dataset "Failed Bank List" with 495 observations and the variables failure_year failure_name failure_cert failure

        My goal is still to do a probability model with a binary outcome 0 "active" and 1 "failure" with the calculated risk proxies llr llp. But my problem is, that the calculated proxies are on a calculated_id calculated_year level and my Failed Bank Data on a faulure_name1 failure_cert failre_year level.

        I guess I should go back to my original data set and insert / merge the Data Set Original and Failed Bank List with the merging key variable cert before I start any calculation ?

        Thank you very much!
        Last edited by Katharina Maier; 19 Jun 2019, 11:43.

        Comment


        • #5
          Well, I have to say I'm totally confused by this now.

          My suggestion is that you clean up the problem of the discrepant -encode-ing of the bank/branch names and create new data sets that keep these as string variables. Then post a short example of each data set, including some banks (or branches?) that failed and some that did not, and then show what you want the end result to look like.

          Comment


          • #6
            Hello,

            the Bank Failure List is my first Dataset in which all failed commercial banks from 1995-2017 of the U.S. are listed.
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str53 institutionname long cert float year str7 status
            "THE FARMERS AND MERCHANTS STATE BANK OF ARGONIA"       17719 2017 "FAILURE"
            "FAYETTE COUNTY BANK"                                    1802 2017 "FAILURE"
            "FIRST NBC BANK"                                        58302 2017 "FAILURE"
            "PROFICIO BANK"                                         35495 2017 "FAILURE"
            "SEAWAY BANK AND TRUST COMPANY"                         19328 2017 "FAILURE"
            "HARVEST COMMUNITY BANK"                                34951 2017 "FAILURE"
            "ALLIED BANK"                                              91 2016 "FAILURE"
            "THE WOODBURY BANKING COMPANY"                          11297 2016 "FAILURE"
            "FIRST CORNERSTONE BANK"                                35312 2016 "FAILURE"
            "TRUST COMPANY BANK"                                     9956 2016 "FAILURE"
            "NORTH MILWAUKEE STATE BANK"                            20364 2016 "FAILURE"
            "HOMETOWN NATIONAL BANK"                                35156 2015 "FAILURE"
            "THE BANK OF GEORGIA"                                   35259 2015 "FAILURE"
            "PREMIER BANK"                                          34112 2015 "FAILURE"
            "EDGEBROOK BANK"                                        57772 2015 "FAILURE"
            "CAPITOL CITY BANK & TRUST COMPANY"                     33938 2015 "FAILURE"
            "HIGHLAND COMMUNITY BANK"                               20290 2015 "FAILURE"
            "FIRST NATIONAL BANK OF CRESTVIEW"                      17557 2015 "FAILURE"
            "NORTHERN STAR BANK"                                    34983 2014 "FAILURE"
            "THE NATIONAL REPUBLIC BANK OF CHICAGO"                   916 2014 "FAILURE"
            "NBRS FINANCIAL BANK"                                    4862 2014 "FAILURE"
            "EASTSIDE COMMERCIAL BANK"                              58125 2014 "FAILURE"
            "THE FREEDOM STATE BANK"                                12483 2014 "FAILURE"
            "VALLEY BANK"                                           21793 2014 "FAILURE"
            "VALLEY BANK"                                           10450 2014 "FAILURE"
            "AZTECAMERICA BANK"                                     57866 2014 "FAILURE"
            end
            This is my second Dataset with all U.S. commercial banks included (the active and the failured banks). I use a Probability Model explaining the occurrence of a bank failure (0/1) with a binaray outcome. The dependent variable equals 1 if a bank failed and its equal 0 if a bank is active. My goal is to merge the Dataset 1 and the Dataset2 and to create / add the column "status" for the dependent variable with "Active = 0" and "Failure = 1" [the first examples of the column "status" I did bymyself for demonstration reasons].

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str72 name long cert float year str7 status double asset
            "1st American State Bank of Minnesota"                15448 1995 "Failure"   13783
            "1st Bank"                                            23309 1995 "Active"    87207
            "1st Bank & Trust"                                    16419 1995 "Active"    49277
            "1st Bank of Troy"                                     4788 1995 "Active"    14142
            "1st Business Bank"                                   23406 1995 ""         881285
            "1st Choice Bank"                                     33619 1995 ""          88965
            "1st Constitution Bank"                               27552 1995 ""          68645
            "1st Financial Bank South Dakota"                      1673 1995 ""          33405
            "1st Floyd Bank"                                      19713 1995 ""          20554
            "1st National Bank"                                    6646 1995 ""          60055
            "1st National Community Bank"                         26977 1995 ""          34036
            "1st Ozark National Bank"                             21170 1995 ""          52737
            "1st Security Bank of Laurel"                         20662 1995 ""          18494
            "1st Security Bank of West Yellowstone, Montana"      19593 1995 ""          18900
            "1st Source Bank"                                      9087 1995 ""        1541738
            "1st Source Bank of Starke County"                    14181 1995 ""          43377
            "1st State Bank and Trust Company of Palos Hills"     22536 1995 ""          50986
            "1st State Bank of Mason City"                        22543 1995 ""          14447
            "1st United Bank"                                     27136 1995 ""         257312
            "21st Century Bank"                                    8841 1995 ""          67435
            "Abbotsford State Bank"                               19795 1995 ""          89250
            "Abrams Centre National Bank"                         25002 1995 ""          56945
            "Acadia State Bank"                                   26668 1995 ""          20755
            "Acadian Bank"                                        21109 1995 ""          62900
            "Ackley State Bank"                                   14936 1995 ""          38010
            "Adair State Bank"                                    23064 1995 ""          11752
            end
            On your advice from your post #3 I tryed to merge the datasets once with the key variable bank_name:

            Code:
            merge m:1 bank_name using `cleaned_failed_banks', keep(master match)
            variable bank_name does not uniquely identify observations in the using data
            r(459);
            And on time with the key variable cert (certification number)

            Code:
            merge m:1 cert using `cleaned_failed_banks', keep(master match)
            
                Result                           # of obs.
                -----------------------------------------
                not matched                       675,229
                    from master                   675,229  (_merge==1)
                    from using                          0  (_merge==2)
            
                matched                            22,986  (_merge==3)
                -----------------------------------------
            Thank you very much for your patience.
            Last edited by Katharina Maier; 24 Jun 2019, 10:08.

            Comment


            • #7
              Each time you post I get a different impression of how these data are supposed to be paired with each other.

              You state that your data set of failed banks contains both the active and the failed banks, but the example you show has only failed banks. And your illustration of the first four observations in the second data set isn't really helpful, because the banks in those observations don't appear in the example from the failed banks data set, so nobody can really see what is going on here.

              I think we're going around in circles here, as your examples have not been helpful so far. So let's take a different approach.

              Imagine you were going to do this task by hand. Write down step by step how you would go about it. You would start with an observation from the second data set. You know its name, year, and cert. (You also know its assets, but that is presumably irrelevant here because assets do not appear in the other data set.) How would you pick out from the first data set the observation(s), if any, that correspond to this observation from the second data set? What would have to match up with what? Is it possible that one observation from the second data set might match up with more than one observation in the first?

              Comment


              • #8
                You state that your data set of failed banks contains both the active and the failed banks, but the example you show has only failed banks.
                I am sorry for being so unclear. In the Dataset1 are only the failed banks listed with their cert number and their failure_year and their status "failure".

                Dataset1:
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str53 failure_name long cert float failure_year str7 status
                "1ST AMERICAN STATE BANK OF MINNESOTA"     15448 2010 "FAILURE"
                "1ST CENTENNIAL BANK"                      33025 2009 "FAILURE"
                "1ST COMMERCE BANK"                        58358 2013 "FAILURE"
                "1ST PACIFIC BANK OF CALIFORNIA"           35517 2010 "FAILURE"
                "1ST REGENTS BANK"                         57157 2013 "FAILURE"
                "ACCESS BANK"                              16476 2010 "FAILURE"
                "ADVANTA BANK CORP."                       33535 2010 "FAILURE"
                "AFFINITY BANK"                            27197 2009 "FAILURE"
                "ALABAMA TRUST BANK NATIONAL ASSOCIATION"  35224 2012 "FAILURE"
                "ALL AMERICAN BANK"                        57759 2011 "FAILURE"
                "ALLEGIANCE BANK OF NORTH AMERICA"         35078 2010 "FAILURE"
                "ALLENDALE COUNTY BANK"                    15062 2014 "FAILURE"
                "ALLIANCE BANK"                            23124 2009 "FAILURE"
                "ALLIED BANK"                                 91 2016 "FAILURE"
                "ALPHA BANK & TRUST"                       58241 2008 "FAILURE"
                "AMCORE BANK NATIONAL ASSOCIATION"          3735 2010 "FAILURE"
                "AMERICA WEST BANK"                        35461 2009 "FAILURE"
                "AMERICAN MARINE BANK"                     16730 2010 "FAILURE"
                "AMERICAN NATIONAL BANK"                   18806 2010 "FAILURE"
                "AMERICAN SOUTHERN BANK"                   57943 2009 "FAILURE"
                "AMERICAN TRUST BANK"                      57432 2011 "FAILURE"
                "AMERICAN UNITED BANK"                     57794 2009 "FAILURE"
                "AMERICANFIRST BANK"                       57724 2010 "FAILURE"
                "AMTRADE INTERNATIONAL BANK OF GEORGIA"    33784 2002 "FAILURE"
                "ANB FINANCIAL NATIONAL ASSOCIATION"       33901 2008 "FAILURE"
                "APPALACHIAN COMMUNITY BANK"               33989 2010 "FAILURE"
                end
                And your illustration of the first four observations in the second data set isn't really helpful, because the banks in those observations don't appear in the example from the failed banks data set, so nobody can really see what is going on here.
                Thank you for this hint. I sorted the Dataset1 by their bank names. The "1st American State Bank of Minnesota" with its cert number 15448 is listed in the Dataset1, because it failed in the year 2010 and it is in the Dataset2, because it's existing since the year 1995. The bank "ADVANTA BANK CORP." with its cert number 33535 is in the Dataset1, because it failed in the year 2010. And it's in the Dataset2 with the name "Advanta Financial Corporation" with the cert number 33535, because it's existing since the year 1995. [Sometimes the banks are changing their names over the period, so the cert number is the variable to identify them uniquely].

                In the Dataset2 are all banks. The banks which failed and the banks which are still existing.

                Dataset2:
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str72 name long cert float year
                "1st American State Bank of Minnesota"                15448 1995
                "1st Bank"                                            23309 1995
                "1st Bank & Trust"                                    16419 1995
                "1st Bank of Troy"                                     4788 1995
                "1st Business Bank"                                   23406 1995
                "1st Choice Bank"                                     33619 1995
                "1st Constitution Bank"                               27552 1995
                "1st Financial Bank South Dakota"                      1673 1995
                "1st Floyd Bank"                                      19713 1995
                "1st National Bank"                                    6646 1995
                "1st National Community Bank"                         26977 1995
                "1st Ozark National Bank"                             21170 1995
                "1st Security Bank of Laurel"                         20662 1995
                "1st Security Bank of West Yellowstone, Montana"      19593 1995
                "1st Source Bank"                                      9087 1995
                "1st Source Bank of Starke County"                    14181 1995
                "1st State Bank and Trust Company of Palos Hills"     22536 1995
                "1st State Bank of Mason City"                        22543 1995
                "1st United Bank"                                     27136 1995
                "21st Century Bank"                                    8841 1995
                "Abbotsford State Bank"                               19795 1995
                "Abrams Centre National Bank"                         25002 1995
                "Acadia State Bank"                                   26668 1995
                "Acadian Bank"                                        21109 1995
                "Ackley State Bank"                                   14936 1995
                "Adair State Bank"                                    23064 1995
                "Adams Bank & Trust"                                  18489 1995
                "Adams County Bank"                                   14311 1995
                "Adams County National Bank"                           7506 1995
                "Adams National Bank"                                 22658 1995
                "Adams State Bank"                                     9812 1995
                "Addison National Bank"                               26202 1995
                "Adel Banking Company"                                 5682 1995
                "Adell State Bank"                                     9531 1995
                "Admiralty Bank"                                      26973 1995
                "Admire Bank and Trust"                               10922 1995
                "Adrian Bank"                                          8251 1995
                "Adrian State Bank"                                   13435 1995
                "Adrian State Bank"                                    8822 1995
                "Advanta Financial Corporation"                       33535 1995
                "Advanta National Bank"                               33979 1995
                "Aetna Bank"                                          13693 1995
                "AFBA Industrial Bank"                                91005 1995
                "Aiken County National Bank"                          27318 1995
                "Air Academy National Bank"                           19600 1995
                "Alabama Exchange Bank"                               15696 1995
                "Alameda First National Bank"                         19158 1995
                "Alamo Bank of Texas"                                 10361 1995
                "Albany Bank and Trust Company National Association"  17230 1995
                "Albert City Savings Bank"                              230 1995
                "Alden State Bank"                                    17622 1995
                "Alden State Bank"                                    16321 1995
                "Alden State Bank"                                     9831 1995
                "Alfalfa County Bank"                                  4061 1995
                "Algonquin State Bank"                                13080 1995
                "Alice Bank of Texas"                                 10326 1995
                "Alief Alamo Bank"                                    23032 1995
                "All American Bank of Chicago"                        20153 1995
                "Allegheny Valley Bank of Pittsburgh"                  6058 1995
                end
                This my goal Dataset. I would like to merge the Dataset1 and the Dataset2. I would like to have the coulumn "status" in which the status is saying "Failure = 1" for the failed banks and "Active = 0) for the still existing, active banks. As example, there is the bank with the cert number 15448 and the bank 33535 with the status "Failure" in the merged list and the others as "Active". [ I generated this "status" example in excel].

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str72 name long cert float year str7 status
                "1st American State Bank of Minnesota"                15448 1995 "Failure"
                "1st Bank"                                            23309 1995 "Active"
                "1st Bank & Trust"                                    16419 1995 "Active"
                "1st Bank of Troy"                                     4788 1995 "Active"
                "1st Business Bank"                                   23406 1995 "Active"
                "1st Choice Bank"                                     33619 1995 "Active"
                "1st Constitution Bank"                               27552 1995 "Active"
                "1st Financial Bank South Dakota"                      1673 1995 "Active"
                "1st Floyd Bank"                                      19713 1995 "Active"
                "1st National Bank"                                    6646 1995 "Active"
                "1st National Community Bank"                         26977 1995 "Active"
                "1st Ozark National Bank"                             21170 1995 "Active"
                "1st Security Bank of Laurel"                         20662 1995 "Active"
                "1st Security Bank of West Yellowstone, Montana"      19593 1995 "Active"
                "1st Source Bank"                                      9087 1995 "Active"
                "1st Source Bank of Starke County"                    14181 1995 "Active"
                "1st State Bank and Trust Company of Palos Hills"     22536 1995 "Active"
                "1st State Bank of Mason City"                        22543 1995 "Active"
                "1st United Bank"                                     27136 1995 "Active"
                "21st Century Bank"                                    8841 1995 "Active"
                "Abbotsford State Bank"                               19795 1995 "Active"
                "Abrams Centre National Bank"                         25002 1995 "Active"
                "Acadia State Bank"                                   26668 1995 "Active"
                "Acadian Bank"                                        21109 1995 "Active"
                "Ackley State Bank"                                   14936 1995 "Active"
                "Adair State Bank"                                    23064 1995 "Active"
                "Adams Bank & Trust"                                  18489 1995 "Active"
                "Adams County Bank"                                   14311 1995 "Active"
                "Adams County National Bank"                           7506 1995 "Active"
                "Adams National Bank"                                 22658 1995 "Active"
                "Adams State Bank"                                     9812 1995 "Active"
                "Addison National Bank"                               26202 1995 "Active"
                "Adel Banking Company"                                 5682 1995 "Active"
                "Adell State Bank"                                     9531 1995 "Active"
                "Admiralty Bank"                                      26973 1995 "Active"
                "Admire Bank and Trust"                               10922 1995 "Active"
                "Adrian Bank"                                          8251 1995 "Active"
                "Adrian State Bank"                                   13435 1995 "Active"
                "Adrian State Bank"                                    8822 1995 "Active"
                "Advanta Financial Corporation"                       33535 1995 "Failure"
                "Advanta National Bank"                               33979 1995 ""      
                "Aetna Bank"                                          13693 1995 ""      
                "AFBA Industrial Bank"                                91005 1995 ""      
                "Aiken County National Bank"                          27318 1995 ""      
                "Air Academy National Bank"                           19600 1995 ""      
                "Alabama Exchange Bank"                               15696 1995 ""      
                "Alameda First National Bank"                         19158 1995 ""      
                "Alamo Bank of Texas"                                 10361 1995 ""      
                "Albany Bank and Trust Company National Association"  17230 1995 ""      
                "Albert City Savings Bank"                              230 1995 ""      
                "Alden State Bank"                                    17622 1995 ""      
                "Alden State Bank"                                    16321 1995 ""      
                "Alden State Bank"                                     9831 1995 ""      
                "Alfalfa County Bank"                                  4061 1995 ""      
                "Algonquin State Bank"                                13080 1995 ""      
                "Alice Bank of Texas"                                 10326 1995 ""      
                "Alief Alamo Bank"                                    23032 1995 ""      
                "All American Bank of Chicago"                        20153 1995 ""      
                "Allegheny Valley Bank of Pittsburgh"                  6058 1995 ""      
                end
                I hope these Datasamples are more helpful

                How would you pick out from the first data set the observation(s), if any, that correspond to this observation from the second data set? What would have to match up with what?
                All observations from the Dataset1 (Failed Banks) are in the the Dataset2 (Failed and still active banks). The key variable to merge is cert.
                I use a probability model explaining the occurrence of a bank failure (0 = active / 1 = failed).
                Last edited by Katharina Maier; 25 Jun 2019, 07:59.

                Comment


                • #9
                  Thank you for the clarifications. Based on your descriptions, this should work:

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input str53 failure_name long cert float failure_year str7 status
                  "1ST AMERICAN STATE BANK OF MINNESOTA"     15448 2010 "FAILURE"
                  "1ST CENTENNIAL BANK"                      33025 2009 "FAILURE"
                  "1ST COMMERCE BANK"                        58358 2013 "FAILURE"
                  "1ST PACIFIC BANK OF CALIFORNIA"           35517 2010 "FAILURE"
                  "1ST REGENTS BANK"                         57157 2013 "FAILURE"
                  "ACCESS BANK"                              16476 2010 "FAILURE"
                  "ADVANTA BANK CORP."                       33535 2010 "FAILURE"
                  "AFFINITY BANK"                            27197 2009 "FAILURE"
                  "ALABAMA TRUST BANK NATIONAL ASSOCIATION"  35224 2012 "FAILURE"
                  "ALL AMERICAN BANK"                        57759 2011 "FAILURE"
                  "ALLEGIANCE BANK OF NORTH AMERICA"         35078 2010 "FAILURE"
                  "ALLENDALE COUNTY BANK"                    15062 2014 "FAILURE"
                  "ALLIANCE BANK"                            23124 2009 "FAILURE"
                  "ALLIED BANK"                                 91 2016 "FAILURE"
                  "ALPHA BANK & TRUST"                       58241 2008 "FAILURE"
                  "AMCORE BANK NATIONAL ASSOCIATION"          3735 2010 "FAILURE"
                  "AMERICA WEST BANK"                        35461 2009 "FAILURE"
                  "AMERICAN MARINE BANK"                     16730 2010 "FAILURE"
                  "AMERICAN NATIONAL BANK"                   18806 2010 "FAILURE"
                  "AMERICAN SOUTHERN BANK"                   57943 2009 "FAILURE"
                  "AMERICAN TRUST BANK"                      57432 2011 "FAILURE"
                  "AMERICAN UNITED BANK"                     57794 2009 "FAILURE"
                  "AMERICANFIRST BANK"                       57724 2010 "FAILURE"
                  "AMTRADE INTERNATIONAL BANK OF GEORGIA"    33784 2002 "FAILURE"
                  "ANB FINANCIAL NATIONAL ASSOCIATION"       33901 2008 "FAILURE"
                  "APPALACHIAN COMMUNITY BANK"               33989 2010 "FAILURE"
                  end
                  tempfile dataset1
                  save `dataset1'
                  
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input str72 name long cert float year
                  "1st American State Bank of Minnesota"                15448 1995
                  "1st Bank"                                            23309 1995
                  "1st Bank & Trust"                                    16419 1995
                  "1st Bank of Troy"                                     4788 1995
                  "1st Business Bank"                                   23406 1995
                  "1st Choice Bank"                                     33619 1995
                  "1st Constitution Bank"                               27552 1995
                  "1st Financial Bank South Dakota"                      1673 1995
                  "1st Floyd Bank"                                      19713 1995
                  "1st National Bank"                                    6646 1995
                  "1st National Community Bank"                         26977 1995
                  "1st Ozark National Bank"                             21170 1995
                  "1st Security Bank of Laurel"                         20662 1995
                  "1st Security Bank of West Yellowstone, Montana"      19593 1995
                  "1st Source Bank"                                      9087 1995
                  "1st Source Bank of Starke County"                    14181 1995
                  "1st State Bank and Trust Company of Palos Hills"     22536 1995
                  "1st State Bank of Mason City"                        22543 1995
                  "1st United Bank"                                     27136 1995
                  "21st Century Bank"                                    8841 1995
                  "Abbotsford State Bank"                               19795 1995
                  "Abrams Centre National Bank"                         25002 1995
                  "Acadia State Bank"                                   26668 1995
                  "Acadian Bank"                                        21109 1995
                  "Ackley State Bank"                                   14936 1995
                  "Adair State Bank"                                    23064 1995
                  "Adams Bank & Trust"                                  18489 1995
                  "Adams County Bank"                                   14311 1995
                  "Adams County National Bank"                           7506 1995
                  "Adams National Bank"                                 22658 1995
                  "Adams State Bank"                                     9812 1995
                  "Addison National Bank"                               26202 1995
                  "Adel Banking Company"                                 5682 1995
                  "Adell State Bank"                                     9531 1995
                  "Admiralty Bank"                                      26973 1995
                  "Admire Bank and Trust"                               10922 1995
                  "Adrian Bank"                                          8251 1995
                  "Adrian State Bank"                                   13435 1995
                  "Adrian State Bank"                                    8822 1995
                  "Advanta Financial Corporation"                       33535 1995
                  "Advanta National Bank"                               33979 1995
                  "Aetna Bank"                                          13693 1995
                  "AFBA Industrial Bank"                                91005 1995
                  "Aiken County National Bank"                          27318 1995
                  "Air Academy National Bank"                           19600 1995
                  "Alabama Exchange Bank"                               15696 1995
                  "Alameda First National Bank"                         19158 1995
                  "Alamo Bank of Texas"                                 10361 1995
                  "Albany Bank and Trust Company National Association"  17230 1995
                  "Albert City Savings Bank"                              230 1995
                  "Alden State Bank"                                    17622 1995
                  "Alden State Bank"                                    16321 1995
                  "Alden State Bank"                                     9831 1995
                  "Alfalfa County Bank"                                  4061 1995
                  "Algonquin State Bank"                                13080 1995
                  "Alice Bank of Texas"                                 10326 1995
                  "Alief Alamo Bank"                                    23032 1995
                  "All American Bank of Chicago"                        20153 1995
                  "Allegheny Valley Bank of Pittsburgh"                  6058 1995
                  end
                  tempfile dataset2
                  save `dataset2'
                  
                  use `dataset2', clear
                  merge m:1 cert using `dataset1',   assert(match master)
                  replace status = "ACTIVE" if _merge == 1
                  assert status == "FAILURE" if _merge == 3
                  Now, you state that every bank that appears in dataset1 (the failed banks) also appears in dataset2. That is not true of your data example, but the code shown here will verify that this is the case in your full data. If you have any banks in dataset1 that do not match (by cert) anything in dataset2, you will get an error message from the -merge- command saying that. In that case -browse if _merge == 2- will show you the offending observations and then you can explore what went wrong during data set creation to lead to those problem cases.

                  This code also assumes that any single bank (identified by cert) appears only once in dataset1. In your earlier examples this has not been the case--perhaps you have since cleaned up the data to eliminate those duplications. If your dataset1 does contain any banks more than once you will, again, get an error message from the -merge- command. If that happens, run
                  Code:
                  use `dataset1', clear
                  duplicates tag cert, gen(flag)
                  browse if flag
                  to see these duplicated banks. Then, again, investigate what went wrong in the data management process that led to the same bank being listed more than once here, and fix that.

                  Comment


                  • #10
                    Code:
                    merge m:1 cert using `dataset1', assert(match master)
                    
                        Result                           # of obs.
                        -----------------------------------------
                        not matched                       675,229
                            from master                   675,229  (_merge==1)
                            from using                          0  (_merge==2)
                    
                        matched                            22,986  (_merge==3)
                        -----------------------------------------
                    
                    . replace status = "ACTIVE" if _merge == 1
                    (675,229 real changes made)
                    
                    . assert status == "FAILURE" if _merge == 3
                    Thank you so much! It worked perfectly and now I can continue my work

                    Comment

                    Working...
                    X