Dear Statalists,
I have a dataset with a (to me, at least) complex form of duplicates. The variables PRODUCTSEGMENT`i'SICCODE (i = 1, 2, 3, 4, 5) indicate if a firm has several different product segments by giving the SIC codes of the industries the firm operates in. The variables PRODUCTSEGMENT`i'SALES (i = 1, 2, 3, 4, 5) give the sales revenue the firm obtains from the corresponding segment.
However, there are duplicates, as the example below shows. Look at row 2. The same SIC code (3728) appear three times, and the corresponding sales variables have different values. Thus, I would need a code that combines the sales into one value if the same SIC code appears multiple times. For example, for row 2, I would like it to look something like this (or with new variables combining the duplicates):
The challenge is that it might be several values that I need to combine. For example, there might be two different SIC codes that both appear multiple times for a single firm.
Thank you in advance!
I have a dataset with a (to me, at least) complex form of duplicates. The variables PRODUCTSEGMENT`i'SICCODE (i = 1, 2, 3, 4, 5) indicate if a firm has several different product segments by giving the SIC codes of the industries the firm operates in. The variables PRODUCTSEGMENT`i'SALES (i = 1, 2, 3, 4, 5) give the sales revenue the firm obtains from the corresponding segment.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float NUM int(PRODUCTSEGMENT1SICCODE PRODUCTSEGMENT2SICCODE PRODUCTSEGMENT3SICCODE PRODUCTSEGMENT4SICCODE PRODUCTSEGMENT5SICCODE) long(PRODUCTSEGMENT1SALES PRODUCTSEGMENT2SALES PRODUCTSEGMENT3SALES PRODUCTSEGMENT4SALES PRODUCTSEGMENT5SALES) 1 5942 5961 5722 7375 . 1688752 661374 547190 225117 . 2 3728 3728 3728 . . 419313 357788 97154 . . 3 2834 3841 3845 2834 . 8177000 2929000 2778000 2088000 . 4 4911 4911 4911 4911 . 2729000 2466000 2444000 1688000 . 5 3577 5045 . . . 2464386 760004 . . . 6 3841 8731 . . . 22017 2879 . . . 7 5651 . . . . 1364853 . . . . 8 7363 . . . . 4373244 . . . . 9 3674 3674 . . . 3793962 97792 . . . 10 2835 8731 2834 . . 4640 700 633 . . 11 7372 7379 7372 7372 . 543419 291886 289624 104791 . 12 7372 . . . . 12887 . . . . 13 3523 3523 3523 3523 3523 1470300 472200 250300 195300 153400 14 2813 2869 3569 . . 3944000 1522800 250400 . . 15 3841 8731 2834 . . 17443 14946 9859 . . 16 4581 4513 4522 . . 1973000 86300 81600 . . 17 2297 3442 2297 . . 698187 102273 36236 . . 18 . . . . . . . . . . 19 8399 8742 7379 . . 272785 146826 67649 . . 20 3724 3724 3724 3724 . 9653000 7185000 3457000 3313000 . end
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float NUM int(PRODUCTSEGMENT1SICCODE PRODUCTSEGMENT2SICCODE PRODUCTSEGMENT3SICCODE PRODUCTSEGMENT4SICCODE PRODUCTSEGMENT5SICCODE) long(PRODUCTSEGMENT1SALES PRODUCTSEGMENT2SALES PRODUCTSEGMENT3SALES PRODUCTSEGMENT4SALES PRODUCTSEGMENT5SALES) 1 5942 5961 5722 7375 . 1688752 661374 547190 225117 . 2 3728 . . . . 874255 . . . . end
Thank you in advance!
Comment