I have a list of firms that I am comparing to each other using their NAIC codes (6-digit industry IDs). I want to compare the NAIC code of each firm_id with that of each firm_id2, digit by digit across all 6 digits. If the digit in position X is the same, I treat the two firms as similar at the "X-level."
The problem is with this line from the code below:
cross using `temp'
It fails with error r(459): "sum of expand values exceed 2,147,483,620. The dataset may not contain more than 2,147,483,620 observations." I think this is because firms are duplicated. The data record each firm's deal activity by year (1990-2016), so a firm can appear up to 27 times, once per year (in the example data, firm_id = 3 is listed 4 times). There are 373,088 unique firms but 1,210,053 observations in total. I believe this is where the error is coming from.
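For scale, here is a rough check of how many rows cross would try to create. It assumes cross pairs every observation in memory with every observation in the using file, and it uses only the counts quoted above; the local names are just for illustration.

```stata
// Rough size check for crossing a dataset with itself
local limit  = 2147483620      // observation limit quoted in the r(459) message
local n_obs  = 1210053         // total observations (firm-years)
local n_firm = 373088          // unique firms

display %20.0fc `n_obs'^2      // rows if the full data are crossed with themselves
display %20.0fc `n_firm'^2     // rows if each firm appeared only once
display `n_firm'^2 > `limit'   // 1 means the one-row-per-firm cross is still too large
```

The first figure is about 1.46 trillion rows and the second about 139 billion, so if this arithmetic is right, removing duplicate firms may not be enough on its own to get under the limit.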
Thanks in advance for your help!
Code:
// Example data
clear
input firm_id M_Acq_Naic str2 M_Acq_Reg
1 511210 "JP"
2 236116 "EU"
3 451120 "AM"
3 451120 "AM"
3 451120 "AM"
3 451120 "AM"
4 441110 "AM"
4 441110 "AM"
5 811310 "EU"
6 221119 "EU"
7 813212 "JP"
end

// NAIC codes should be strings, especially for current purposes.
tostring M_Acq_Naic*, replace

// Make a file to pair with itself.
preserve
tempfile temp
rename * *_2
save `temp'
restore

// Suffix the copy in memory, then form every pair of rows.
rename * *_1
cross using `temp' // the workhorse here

// Remove pairs of a firm with itself.
drop if (firm_id_1 == firm_id_2) // no self pairs

// Create indicator variables for digit matches on NAIC codes
forval i = 1/6 {
    gen PairSameDigit`i' = substr(M_Acq_Naic_1, `i', 1) == substr(M_Acq_Naic_2, `i', 1)
}

// Drop duplicate firm pairs
gen min = min(firm_id_1, firm_id_2)
gen max = max(firm_id_1, firm_id_2)
bysort min max: keep if _n == 1
drop min max
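
One thing I could try before cross is reducing the data to one row per firm. This is only a minimal sketch on a cut-down version of the example data, and it assumes a firm's NAIC code does not change across its yearly rows; whether it is enough at full scale depends on how many unique firms remain.

```stata
// Sketch: keep one row per firm before pairing.
// Assumption: NAIC code (and region) are constant within firm_id.
clear
input firm_id M_Acq_Naic str2 M_Acq_Reg
1 511210 "JP"
3 451120 "AM"
3 451120 "AM"
4 441110 "AM"
4 441110 "AM"
end

// Check the assumption: after sorting by NAIC within firm,
// the first and last codes must agree.
bysort firm_id (M_Acq_Naic): assert M_Acq_Naic[1] == M_Acq_Naic[_N]

// Keep the first row of each firm.
bysort firm_id: keep if _n == 1
count
```

If the assert fails, some firm has more than one NAIC code over time, and keeping an arbitrary row would silently pick one of them.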