Hi,
I'm trying to generate a new variable named "group_id" that groups connected corporations together. In Canada, each of these connected corporations has to file a Schedule 23 with its T2 corporate tax return. The code below shows an example of how the data are laid out in the database of all completed Schedules 23, together with the values I would like my new variable "group_id" to take. (In this example the business numbers, "BN", are one digit; in reality the business number is a nine-digit number that gives each business its own unique identifier.)
The first column (BN) is the business number of the corporation filing the Schedule 23. The second column (related_BN) is the business number of a connected corporation, so a corporation connected to more than one other corporation has several observations (BN == 1, for example, has two). The third column (group_id) is the one I want to generate. As you can see in my example, the corporation with BN == 1 is connected to corporations 2 and 3, corporation 4 is connected to corporation 7, and corporations 5 and 6 are connected to each other.
Q: How can I get Stata to recognize that, when it reaches the first observation with BN == 2, it should assign the value 1 to "group_id"? In short, what code do I use to generate the variable "group_id"?
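One possible approach, offered as a minimal sketch rather than a tested production solution: treat each (BN, related_BN) pair as a link in a network, give every pair the smallest BN that can be reached through those links (that is, the connected component it belongs to), and then renumber the components 1, 2, 3, and so on. It assumes BN and related_BN are numeric and never missing, and that the real nine-digit BNs are stored as long or double (a float cannot hold every nine-digit integer exactly). The variable names min_BN and min_BN_prev are made up for this illustration. Adding each pair in reverse is a precaution in case a link is reported by only one of the two corporations; in the example every pair already appears in both directions. One side effect is that a corporation that appears only as related_BN will also end up as a BN.

```stata
* Sketch only. Assumes BN and related_BN are numeric, non-missing, and
* stored as long or double for real nine-digit business numbers.
* Example data from the post, without the wanted group_id column.
clear
input byte(BN related_BN)
1 2
1 3
2 1
2 3
3 1
3 2
4 7
5 6
6 5
7 4
end

* 1. Add every pair in reverse, so a link reported by only one of the
*    two corporations still connects them; then drop repeated pairs.
preserve
keep BN related_BN
rename (BN related_BN) (related_BN BN)
tempfile reversed
save `reversed'
restore
append using `reversed'
duplicates drop BN related_BN, force

* 2. Start each pair with the smaller of its two business numbers
*    (min_BN is a hypothetical helper variable).
generate long min_BN = min(BN, related_BN)

* 3. Spread the smallest label within each BN and within each
*    related_BN, repeating until a full pass changes nothing.
local changed = 1
while `changed' {
    generate long min_BN_prev = min_BN
    quietly bysort BN (min_BN): replace min_BN = min_BN[1]
    quietly bysort related_BN (min_BN): replace min_BN = min_BN[1]
    quietly count if min_BN != min_BN_prev
    local changed = r(N)
    drop min_BN_prev
}

* 4. Number the groups 1, 2, 3, ... in order of their smallest BN.
egen group_id = group(min_BN)
drop min_BN
sort BN related_BN
list, sepby(group_id) noobs
```

Labels can only decrease, so the loop is guaranteed to stop; on the example data it stops after the second pass. The smallest BNs in the three groups are 1, 4 and 5, and `egen`'s `group()` turns them into 1, 2 and 3, which matches the group_id column in the example.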
Once "group_id" is properly generated, I would use the following code to drop "related_BN" and keep one observation for each BN:
This should give me something like the data below, which is what I want in the end:
Thanks for any advice.
Example data (the group_id values shown are the ones I want to generate):
```stata
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(BN related_BN group_id)
1 2 1
1 3 1
2 1 1
2 3 1
3 1 1
3 2 1
4 7 2
5 6 3
6 5 3
7 4 2
end
```
Code to drop related_BN and keep one observation per BN:
```stata
drop related_BN
duplicates drop BN, force
```
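Because `duplicates drop BN, force` keeps one observation per BN without looking at the other variables, it may be worth confirming first that group_id never varies within a BN; otherwise the observation that survives would carry an arbitrary one of the conflicting values. A small sketch of such a check, run on the example data from the post:

```stata
* Sketch only: example data from the post, including the wanted group_id.
clear
input byte(BN related_BN group_id)
1 2 1
1 3 1
2 1 1
2 3 1
3 1 1
3 2 1
4 7 2
5 6 3
6 5 3
7 4 2
end

* Stop with an error if group_id is missing anywhere
assert !missing(group_id)
* Stop with an error if any BN carries more than one group_id value
bysort BN (group_id): assert group_id[1] == group_id[_N]

drop related_BN
duplicates drop BN, force

* After collapsing, BN should identify the observations uniquely
isid BN
list, noobs
```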
Desired result:
```stata
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(BN group_id)
1 1
2 1
3 1
4 2
5 3
6 3
7 2
end
```