Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Code for creation of counterfactual dataset

    Dear all,

    I would like to ask for your advice on creating a dataset of counterfactuals based on several criteria.

    Some background information on the data structure: The initial dataset contains investors (variable: investor_id) who invested in certain companies (variable: company_id) in different industries (variable: industry) at a specific point of time (variable: date). An investment into a specific company at a specific time is marked with an investment_id. Each line in the dataset represents a unique combination of one investor with a specific company. However, there are always multiple co-investors investing together in a specific company (several investors having the same investment_id indicates that they are co-investing together).

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int investment_id str2 company int(investor_id date) str18 industry str2 investor_firm
     190 "AA"  317 15414 "Tech"               "II"
     190 "AA"  479 15414 "Tech"               "II"
     268 "BB"  169 19330 "Consumer Goods"     "JJ"
     268 "BB"  106 19330 "Consumer Goods"     "KK"
     533 "CC"  837 18067 "Consumer Goods"     "LL"
     533 "CC" 2400 18067 "Consumer Goods"     "MM"
     542 "DD" 1460 15400 "Tech"               "NN"
     542 "DD"  279 15400 "Tech"               "OO"
     747 "EE" 2020 19299 "Tech"               "PP"
     747 "EE"   45 19299 "Tech"               "QQ"
     889 "FF"  318 14728 "Financial Services" "II"
     889 "FF"  479 14728 "Financial Services" "II"
     889 "FF"  317 14728 "Financial Services" "II"
     927 "GG" 2026 20114 "Healthcare"         "XX"
     927 "GG" 4129 20114 "Healthcare"         "XX"
    1107 "HH"   72 17439 "Healthcare"         "II"
    1107 "HH"  418 17439 "Healthcare"         "NN"
    end
    format %tdnn/dd/CCYY date
    I would like to create a set of counterfactual co-investors for each actual co-investment (i.e. other investors who could have co-invested but did not) based on the following criteria:
    (1) both co-investors do not work at the same firm (variable: investor_firm)
    (2) the counterfactual co-investor must have invested in the same industry within 30 days (variable: date) of the actual co-investment
    (3) the counterfactual co-investors must not have co-invested with the other investor previously

    The structure of the final dataset would then look like this (example for investment_id 190 from above):

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int investment_id str2 company int(investor_id date) str4 industry str2 investor_firm int coinvestor_id byte actual_coinvestor
    190 "AA" 317 15414 "Tech" "II"  479 1
    190 "AA" 317 15414 "Tech" "NN" 1460 0
    190 "AA" 317 15414 "Tech" "OO"  279 0
    end
    format %tdnn/dd/CCYY date

    How would you recommend to go about this?
    Conceptually, I was thinking to create a matrix of all possible investor - co-investor combinations (both actual and counterfactual) and generate a dummy that indicates if it is an actual co-investment (i.e. same investment_id) (however, this step would be computationally intensive since the overall dataset is quite large). Next I was thinking to generate a dummy variable for each of the 3 conditions above, indicating if the condition is met. Only if all 3 conditions are met, then the observations qualifies as a counterfactual.

    I would very much appreciate your ideas and help with implementing the Stata code. Many thanks in advance!
    Last edited by Diane Smith; 15 Nov 2019, 17:06.

  • #2
    Dear all, would anyone have an idea how to approach this? I would greatly appreciate any help or ideas, also regarding just parts of the problem. Thanks a lot in advance!

    Comment


    • #3
      Hi, Diane, have you solved this problem? I'm trying to replicate a paper by using the methodology in Gompers et al. (2016). I'm little bit confused about the investment date about counterfactual groups. Is this method actually trying to identify potential investors who could have participated in a co-investment with a given investor at the time of an actual investment event but did not. For example, if investor 1 co-invested with investor 2 on March 24, the method seeks to explore hypothetical scenarios where investor 1 might have co-invested with another investor, such as investor 3, on the same date (March 24). The pair investor1-investor3-March 24 is the hypothetical co-investment? Look forward to your reply. Many thanks!

      Comment

      Working...
      X