Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating a sample of pseudo deals matched to real ones by certain parameters

    Dear community,

    Help is needed with the following task. I need to create a sample of matched pseudo international merger deals based on the sample of the real deals (example attached by dataex). Description of the data set I have: ANATION is acquirer country, TNATION is target country (and their ISO codes as ANATIONCODE TNATIONCODE), RANKVAL is the deal value, YearANN is the year of the deal, TSICP is target's industry code.

    Based on this sample of "real" deals (pairs are created in a way that each row = one real deal, which is uniquely defined by ANATION, TNATION and YearANN), I need to create a matched sample of pseudo-pairs in a way that each real deal pair in a given year has up to 5 matched pseudo-pairs (the exact number would depend on the number of suitable candidates in the entire data set), which are matched based on target's TSICP (exact match) and RANKVAL (approximate match). These pseudo-pairs are then assigned 0 and real pairs assigned 1 in a new dummy variable created to distinguish real from pseudo.

    Any advice is highly appreciated! I am new to Stata, so am not even sure how to approach this task.

    Thanks in advance!


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str42(ANATION TNATION) str3(ANATIONCODE TNATIONCODE) double RANKVAL float YearANN str40(ATF_MID_DESC TTF_MID_DESC) str4 TSICP
    "Canada"         "Afghanistan"    "CAN" "AFG"  47.355 1989 "Aerospace & Defense"                    "Aerospace & Defense"                  "3721"
    "Greece"         "Albania"        "GRC" "ALB"      85 2000 "Other Financials"                       "Telecommunications Services"          "4812"
    "Canada"         "Albania"        "CAN" "ALB"  33.996 2011 "Oil & Gas"                              "Oil & Gas"                            "1311"
    "Australia"      "Albania"        "AUS" "ALB"    .916 2012 "Metals & Mining"                        "Metals & Mining"                      "1061"
    "Italy"          "Albania"        "ITA" "ALB"   6.139 2000 "Food and Beverage"                      "Food and Beverage"                    "2082"
    "Czech Republic" "Albania"        "CZE" "ALB" 144.047 2008 "Power"                                  "Power"                                "4911"
    "Greece"         "Albania"        "GRC" "ALB"     4.7 1999 "Petrochemicals"                         "Oil & Gas"                            "5541"
    "Italy"          "Albania"        "ITA" "ALB"   156.9 2006 "Banks"                                  "Banks"                                "6000"
    "United Kingdom" "Albania"        "GBR" "ALB"   1.278 1999 "Supranational"                          "Banks"                                "6000"
    "Turkey"         "Albania"        "TUR" "ALB"       6 2000 "Banks"                                  "Banks"                                "6000"
    "Italy"          "Albania"        "ITA" "ALB"    40.8 2005 "Banks"                                  "Banks"                                "6000"
    "Canada"         "Albania"        "CAN" "ALB"   4.144 2014 "Metals & Mining"                        "Metals & Mining"                      "1021"
    "Saudi Arabia"   "Albania"        "SAU" "ALB"    7.95 2009 "Brokerage"                              "Banks"                                "6000"
    "Kuwait"         "Albania"        "KWT" "ALB"    4.79 2018 "Other Financials"                       "Real Estate Management & Development" "6531"
    "Turkey"         "Albania"        "TUR" "ALB" 144.456 2005 "Other Financials"                       "Telecommunications Services"          "4813"
    "Netherlands"    "Albania"        "NLD" "ALB"  17.082 2007 "Other Financials"                       "Insurance"                            "6351"
    "Greece"         "Albania"        "GRC" "ALB"   3.009 2008 "Healthcare Providers & Services (HMOs)" "Hospitals"                            "8062"
    "Austria"        "Albania"        "AUT" "ALB"     126 2003 "Credit Institutions"                    "Banks"                                "6000"
    "Bulgaria"       "Albania"        "BGR" "ALB"   57.07 2019 "Other Financials"                       "Telecommunications Services"          "4812"
    "Italy"          "Albania"        "ITA" "ALB"   1.519 2018 "Professional Services"                  "Professional Services"                "7211"
    "Israel"         "Albania"        "ISR" "ALB"  11.614 2008 "Food and Beverage"                      "Food and Beverage"                    "2095"
    "Germany"        "Albania"        "DEU" "ALB"      80 1996 "Travel Services"                        "Metals & Mining"                      "1081"
    "Turkey"         "Albania"        "TUR" "ALB"      10 1999 "Other Financials"                       "Banks"                                "6000"
    "Poland"         "Albania"        "POL" "ALB"  10.306 2009 "IT Consulting & Services"               "IT Consulting & Services"             "7376"
    "Canada"         "Albania"        "CAN" "ALB"    .905 2008 "Other Financials"                       "Metals & Mining"                      "1021"
    "Greece"         "Albania"        "GRC" "ALB"  62.137 2009 "Telecommunications Services"            "Telecommunications Services"          "4812"
    "Turkey"         "Albania"        "TUR" "ALB" 161.117 2007 "Other Financials"                       "Telecommunications Services"          "4813"
    "Canada"         "Albania"        "CAN" "ALB"      .5 2012 "Metals & Mining"                        "Petrochemicals"                       "2911"
    "Canada"         "Albania"        "CAN" "ALB"    .159 2009 "Other Financials"                       "Metals & Mining"                      "1231"
    "Netherlands"    "Albania"        "NLD" "ALB"       1 1994 "Transportation & Infrastructure"        "Transportation & Infrastructure"      "4512"
    "United Kingdom" "Albania"        "GBR" "ALB"     2.5 1994 "Supranational"                          "Banks"                                "6000"
    "Cyprus"         "Albania"        "CYP" "ALB"   2.863 2007 "Asset Management"                       "Insurance"                            "6399"
    "British Virgin" "Albania"        "VGB" "ALB"  44.699 2016 "Oil & Gas"                              "Oil & Gas"                            "1311"
    "Canada"         "Albania"        "CAN" "ALB"    .785 2018 "Oil & Gas"                              "Oil & Gas"                            "1311"
    "Greece"         "Albania"        "GRC" "ALB"   2.287 2003 "Banks"                                  "Banks"                                "6000"
    "Italy"          "Albania"        "ITA" "ALB"   6.712 2005 "Other Financials"                       "Banks"                                "6000"
    "France"         "Algeria"        "FRA" "DZA"    .692 1991 "Other Financials"                       "Other Financials"                     "6799"
    "Canada"         "Algeria"        "CAN" "DZA"   1.035 2008 "Healthcare Equipment & Supplies"        "Healthcare Equipment & Supplies"      "5047"
    "Egypt"          "Algeria"        "EGY" "DZA"  30.621 1999 "Brokerage"                              "Pharmaceuticals"                      "2834"
    "Kuwait"         "Algeria"        "KWT" "DZA"   24.99 2014 "Insurance"                              "Insurance"                            "6321"
    "Jordan"         "Algeria"        "JOR" "DZA"    18.5 2010 "Pharmaceuticals"                        "Pharmaceuticals"                      "2834"
    "United States"  "Algeria"        "USA" "DZA"      55 2000 "Oil & Gas"                              "Oil & Gas"                            "1311"
    "Spain"          "Algeria"        "ESP" "DZA"   18.26 2006 "Food and Beverage"                      "Food and Beverage"                    "2086"
    "United States"  "Algeria"        "USA" "DZA"    13.8 2009 "Other Financials"                       "Other Real Estate"                    "6552"
    "France"         "Algeria"        "FRA" "DZA"    21.8 2007 "Building/Construction & Engineering"    "Building/Construction & Engineering"  "8711"
    "Spain"          "Algeria"        "ESP" "DZA"    9.77 2013 "Other Financials"                       "Biotechnology"                        "2836"
    "Netherlands"    "Algeria"        "NLD" "DZA"    42.3 1999 "Oil & Gas"                              "Oil & Gas"                            "1311"
    "Egypt"          "Algeria"        "EGY" "DZA"  46.853 2007 "Construction Materials"                 "Construction Materials"               "3241"
    "Italy"          "Algeria"        "ITA" "DZA"  65.879 2006 "Construction Materials"                 "Construction Materials"               "3241"
    "France"         "Algeria"        "FRA" "DZA"  68.009 2008 "Construction Materials"                 "Construction Materials"               "3241"
    "Saudi Arabia"   "Algeria"        "SAU" "DZA"      50 2000 "Food & Beverage Retailing"              "Food and Beverage"                    "2079"
    "India"          "Algeria"        "IND" "DZA"     .15 2020 "Automobiles & Components"               "Automobiles & Components"             "3711"
    "United Kingdom" "Algeria"        "GBR" "DZA"      55 2016 "Other Financials"                       "Paper & Forest Products"              "2675"
    "Utd Arab Em"    "Algeria"        "ARE" "DZA"     230 2006 "Alternative Financial Investments"      "Other Telecom"                        "4813"
    "Jordan"         "Algeria"        "JOR" "DZA"    .576 2010 "Other Financials"                       "Other Consumer Products"              "5122"
    "France"         "Algeria"        "FRA" "DZA"    4900 2019 "Oil & Gas"                              "Oil & Gas"                            "1311"
    "Utd Arab Em"    "Algeria"        "ARE" "DZA"      20 2018 "Oil & Gas"                              "Oil & Gas"                            "3533"
    "Indonesia"      "Algeria"        "IDN" "DZA" 1752.87 2012 "Oil & Gas"                              "Oil & Gas"                            "1311"
    "Egypt"          "Algeria"        "EGY" "DZA"     178 2014 "Telecommunications Services"            "Other Telecom"                        "4813"
    "Italy"          "Algeria"        "ITA" "DZA"   73.48 2006 "Construction Materials"                 "Construction Materials"               "3241"
    "Italy"          "Algeria"        "ITA" "DZA"  43.546 2000 "Oil & Gas"                              "Oil & Gas"                            "1311"
    "Egypt"          "Algeria"        "EGY" "DZA"   2.514 2003 "Construction Materials"                 "Containers & Packaging"               "2674"
    "Norway"         "Algeria"        "NOR" "DZA"     740 2003 "Oil & Gas"                              "Petrochemicals"                       "2911"
    "France"         "Algeria"        "FRA" "DZA"   14.31 2008 "Chemicals"                              "Machinery"                            "3548"
    "Egypt"          "Algeria"        "EGY" "DZA"     399 2006 "Telecommunications Services"            "Other Telecom"                        "4813"
    "Australia"      "Algeria"        "AUS" "DZA"    22.5 2000 "Oil & Gas"                              "Oil & Gas"                            "1311"
    "Utd Arab Em"    "Algeria"        "ARE" "DZA"      40 2018 "Oil & Gas"                              "Oil & Gas"                            "3533"
    "United Kingdom" "Algeria"        "GBR" "DZA"      25 2004 "Food and Beverage"                      "Food and Beverage"                    "2086"
    "Jordan"         "Algeria"        "JOR" "DZA"   1.694 2010 "Other Financials"                       "Other Consumer Products"              "5122"
    "United States"  "Algeria"        "USA" "DZA"     135 1986 "Oil & Gas"                              "Transportation & Infrastructure"      "4412"
    "Taiwan"         "American Somoa" "TWN" "ASM"  23.929 2018 "Textiles & Apparel"                     "Textiles & Apparel"                   "5137"
    "Australia"      "American Somoa" "AUS" "ASM"      10 2000 "Banks"                                  "Banks"                                "6000"
    "Taiwan"         "American Somoa" "TWN" "ASM"   5.371 2017 "Other Financials"                       "Other Consumer Products"              "5099"
    "Hong Kong"      "American Somoa" "HKG" "ASM"  242.06 2018 "Other Financials"                       "Food and Beverage"                    "2033"
    "Hong Kong"      "American Somoa" "HKG" "ASM"  15.478 2015 "Other Financials"                       "Food and Beverage"                    "2033"
    "British Virgin" "American Somoa" "VGB" "ASM"    2.74 2015 "Other Financials"                       "IT Consulting & Services"             "7379"
    "China"          "American Somoa" "CHN" "ASM"   6.782 2015 "Computers & Peripherals"                "Professional Services"                "6794"
    "South Korea"    "American Somoa" "KOR" "ASM"  25.556 2014 "Building/Construction & Engineering"    "Containers & Packaging"               "3411"
    "United States"  "Andorra"        "USA" "AND"  32.768 2016 "Alternative Financial Investments"      "Other Financials"                     "6282"
    "Spain"          "Andorra"        "ESP" "AND"   4.982 2018 "Machinery"                              "Building/Construction & Engineering"  "1799"
    "United Kingdom" "Andorra"        "GBR" "AND"   13.26 1999 "Banks"                                  "Banks"                                "6000"
    "Switzerland"    "Andorra"        "CHE" "AND"   11.82 2012 "Tobacco"                                "Tobacco"                              "2131"
    "Spain"          "Andorra"        "ESP" "AND"   6.067 2000 "Transportation & Infrastructure"        "Transportation & Infrastructure"      "4111"
    "South Africa"   "Angola"         "ZAF" "AGO"    15.3 2012 "Construction Materials"                 "Water and Waste Management"           "9511"
    "Philippines"    "Angola"         "PHL" "AGO"    8.07 2003 "Transportation & Infrastructure"        "Transportation & Infrastructure"      "4491"
    "Italy"          "Angola"         "ITA" "AGO" 663.745 2006 "Petrochemicals"                         "Oil & Gas"                            "1311"
    "Switzerland"    "Angola"         "CHE" "AGO"      30 2019 "Food and Beverage"                      "Food and Beverage"                    "2026"
    "Cayman Islands" "Angola"         "CYM" "AGO" 215.482 2006 "Oil & Gas"                              "Oil & Gas"                            "1311"
    "Australia"      "Angola"         "AUS" "AGO"    .526 2018 "Metals & Mining"                        "Motion Pictures / Audio Visual"       "7812"
    "Cayman Islands" "Angola"         "CYM" "AGO" 313.428 2006 "Oil & Gas"                              "Oil & Gas"                            "1311"
    "Canada"         "Angola"         "CAN" "AGO"   5.801 2015 "Metals & Mining"                        "Metals & Mining"                      "1499"
    "Cayman Islands" "Angola"         "CYM" "AGO"    1520 2013 "Oil & Gas"                              "Oil & Gas"                            "1311"
    "United States"  "Angola"         "USA" "AGO"  34.366 2007 "Oil & Gas"                              "Oil & Gas"                            "1311"
    "United Kingdom" "Angola"         "GBR" "AGO"  15.843 2001 "Food and Beverage"                      "Food and Beverage"                    "2086"
    "Portugal"       "Angola"         "PRT" "AGO"  40.734 2014 "Banks"                                  "Banks"                                "6000"
    "France"         "Angola"         "FRA" "AGO"     105 2018 "Oil & Gas"                              "Oil & Gas"                            "1311"
    "Japan"          "Angola"         "JPN" "AGO" 254.998 1986 "Oil & Gas"                              "Oil & Gas"                            "1311"
    "Austria"        "Angola"         "AUT" "AGO" 125.837 1993 "Banks"                                  "Banks"                                "6000"
    "United States"  "Angola"         "USA" "AGO" 124.009 2005 "Oil & Gas"                              "Oil & Gas"                            "1311"
    "Australia"      "Angola"         "AUS" "AGO"    .536 2006 "Oil & Gas"                              "Metals & Mining"                      "1094"
    end

  • #2
    Which deals are the real ones and which are the pseudo-deals? The data you show give no obvious indication of this.

    Also, how do you want the "approximate" match on RANKVAL to work? Do you want the 5 closest values, even if some of them are not very close? (And, if so, closest meaning by difference or by ratio?) Or do you have some maximum difference (or ratio) that is acceptable for the match? (And if so, what is that maximum?)

    Added: Thank you for using -dataex- on your very first post!
    Last edited by Clyde Schechter; 06 Apr 2022, 13:16.

    Comment


    • #3
      Dear Clyde,

      Many thanks for your comments and interest to my topic! Some clarifications as per yor questions:

      The attached deals are all real ones, so the overall idea is to take each real one row by row (one row = one real deal) and create up to 5 pseudo deals, which are matched to a real one by RANKVAL and Target industry. For example, if the 1st real deal in the example is between CAN-AFG with RANKVAL = 47.355 and target's industry SIC is 3721, the goal is to find in the same dataset (so, among real deals) up to 5 observations that have same target's industry code and similar RANKVAL and mark them as pseudo deals. The total data set has over 200,000 observations, so there should potentially be at least 1-2 observations to qualify for matching as pseudo deal..

      The RANKVAL questions: thank you fir this! I was thinking on what would be the best approximation factor. The final answer would depend on how many deals would be available for matching in the overall dataset I have. I am thinking to start with + - 15% of real deal's RANKVAL. So if pseudo deal's RANKVAL would be in the range of +-15% of the real deals' RANKVAL.

      Again, thank you so much in advance for any guidance!

      Anastassia.

      Comment


      • #4
        OK. In the example data, there are 54 distinct values of TSICP, and 84 distinct pairs of TSICP YearANN values, out of 100 observations. Since you require exact matching on TSICP and YearANN, this already means that most observations will find no match at all. On top opf that, the dispersion of RANKVAL is very wide, so that the further restriction to + or - 15% agreement on that turns out to eliminate almost all of the otherwise conceivable matches. But perhaps in your larger data set you will not encounter this same problem.

        Code:
        gen long obs_no = _n
        label var obs_no "Observation Number in Original Data"
        
        tempfile copy
        save `copy'
        
        gen lower = 0.85*RANKVAL
        gen upper = 1.15*RANKVAL
        
        rangejoin RANKVAL lower upper using `copy', by(TSICP YearANN)
        
        set seed 1234 // OR YOUR PREFERRED RANDOM NUMBER SEED
        gen double shuffle1 = runiform()
        gen double shuffle2 = runiform()
        gen byte non_self_match = (obs_no != obs_no_U) // TO ENABLE KEEPING SELF-MATCH
        by obs_no (non_self_match shuffle1 shuffle2), sort: keep if _n <= 6 // 6 == 5 + 1
        
        rename obs_no group_num
        ds group_num *_U, not
        drop `r(varlist)'
        rename *_U *
        order obs_no, after(group_num)
        -rangejoin- is written by Robert Picard and is available from SSC. To use it you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.

        At the end of this code, there will be a variable group_num which corresponds to the obs_no of the "real" deal in the original data. The same value of group_num will appear with up to 5 matched "pseudo-deals" picked from the original data according to your criteria. Any original real deal that could not find any appropriate matches will appear with its group_num as a singleton.

        Comment


        • #5
          Dear Clyde,

          Thank you so much for working on it and for additional explanations! The code worked perfectly, and when I used it on an actual dataset, it returns enough observations to be marked as pseudo-deals for my research. It is also very clear, so I can make necessary adjustments for any future projects requiring similar approach.

          Sincerely appreciate your help!

          Best,
          Anastassia.

          Comment

          Working...
          X