Dear stata users and experts,
Below I will display the structure of my two datasets.
Data 1 reports how much a supplier sells to one (or multiple ) of its customer in year t.
Data 2, the master file, reports each firm’s financial data (e.g. annual sales), the master file has firm year observations, that is, customers and suppliers both show up in this data 2 . the data source I retrieve data from could have missing records for customers, so not all customers may show up in Data 2 in my real dataset.
Goal: merge data1 into data 2. Need to account for 1 customer buys from 1 or multiple supplier and 1 supplier sells to multiple customers.
Data 1:
Firms in the market buy and sell to one another, my data reports customer identifier (customerid) and how much it purchases from the corresponding supplier (supplierid) in that year.
in my real data, some customers buy from 1 suppliers , some from multiple suppliers. similarly, some suppliers have 1, some have more than 1 customers.
Saletocustomer indicates how much supplierid (e.g. 201) sells to this customer(e.g. id 100) in year 1990.
Data 2:
Id: is the firm id, it could be customers or suppliers in data 1. Sales , not the same variable as customersale in data1. This data could have observations that do not appear in data 1, e.g. id 103.
My the real data are over 10000 panel observations.
Can someone help me with merging Data 1 and 2, if I want to analyze
1. The paired data subsample, i.e. both customerid and supplierid has a match in data2
2. Customerid data, i.e. as long as customerid is matched with data2, keep it, otherwise, delete it.
Thank you,
Below I will display the structure of my two datasets.
Data 1 reports how much a supplier sells to one (or multiple ) of its customer in year t.
Data 2, the master file, reports each firm’s financial data (e.g. annual sales), the master file has firm year observations, that is, customers and suppliers both show up in this data 2 . the data source I retrieve data from could have missing records for customers, so not all customers may show up in Data 2 in my real dataset.
Goal: merge data1 into data 2. Need to account for 1 customer buys from 1 or multiple supplier and 1 supplier sells to multiple customers.
Data 1:
Firms in the market buy and sell to one another, my data reports customer identifier (customerid) and how much it purchases from the corresponding supplier (supplierid) in that year.
in my real data, some customers buy from 1 suppliers , some from multiple suppliers. similarly, some suppliers have 1, some have more than 1 customers.
Saletocustomer indicates how much supplierid (e.g. 201) sells to this customer(e.g. id 100) in year 1990.
Data 2:
Id: is the firm id, it could be customers or suppliers in data 1. Sales , not the same variable as customersale in data1. This data could have observations that do not appear in data 1, e.g. id 103.
My the real data are over 10000 panel observations.
Can someone help me with merging Data 1 and 2, if I want to analyze
1. The paired data subsample, i.e. both customerid and supplierid has a match in data2
2. Customerid data, i.e. as long as customerid is matched with data2, keep it, otherwise, delete it.
Thank you,
Code:
clear input int(year customerid supplierid) byte saletocustomer 1990 100 201 50 1990 101 201 100 1990 101 202 50 1991 101 201 30 1991 100 200 40 1992 100 202 30 end
Code:
clear input int(year id sales) 1991 200 100 1990 201 150 1991 201 200 1990 202 80 1992 202 80 1990 100 120 1991 100 110 1992 100 110 1990 101 80 1991 101 100 1990 103 110 1991 103 110 end