Basically, I use a vector approach to match strings against another dataset with "regexm".
That is:
1. reshape wide (one string per column)
2. keep one column
3. set the number of observations equal to the number of rows in the other dataset
4. fill the column with a single string (every observation the same)
5. merge on _n with the other dataset
6. use regexm to drop the rows that do not match
7. save the result
8. repeat the loop with the next string
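To make the workflow concrete, here is a minimal Python analogue of the loop above, with hypothetical data standing in for the two datasets (re.search plays the role of regexm; this is a sketch of the logic, not the Stata code itself):

```python
import re

# Hypothetical stand-ins for the two Stata datasets.
patterns = ["foo", "ba+r"]                  # strings from the dataset in memory
master = ["foobar", "bar", "baz", "bbaar"]  # rows of the dataset in `using`

matches = []
for pat in patterns:
    # Steps 3-5: pair one pattern with every row of the other dataset
    # (the cross join that reshape + merge on _n emulates in Stata).
    # Step 6: keep only the rows where the pattern matches (regexm analogue).
    matches.extend((pat, text) for text in master if re.search(pat, text))

# Step 7: `matches` accumulates the saved results; step 8 is the next
# iteration of the outer loop.
```

Note the cost: each pattern is tested against every row of the other dataset, so the work grows as patterns × rows, which is why the approach slows down as the datasets grow.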
However, as the datasets grow, this becomes inefficient.
Sizes: the dataset in memory has hundreds of thousands of rows; the dataset in `using` has millions of rows.
Any thoughts on how to improve this?