I need to extract codes for risk factors (RF) from long strings, e.g. "RF1mild RF2mild RF3mod RF4sev.." where each risk factor may have several grades of severity (e.g. mild-moderate-severe). Within the string the codes are randomly interspersed with irrelevant codes. I plan to find the codes with strpos and encode separate variables for each risk factor with the severity, e.g. 0 for absent, 1 for mild, 2 for moderate etc.
A tedious way of encoding the new variables would be to use if's, like:
gen RiskFactor1=0
replace RiskFactor1=1 if strpos(variable,"RF1mild")
replace RiskFactor1=2 if strpos(variable,"RF1mod")
replace RiskFactor1=3 if strpos(variable,"RF1sev")
...
But it seems a little primitive and - since I have several risk factors with up to 20 grades/variants and thousands of observations - very cumbersome.
Is there a slicker way to encode the new variables, for example by picking the new codes from a list?
Thank you for any ideas!
Hans
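One possible approach is to loop over the risk factors and over a list of grade suffixes, so that each replace line is generated rather than typed. The following is only a minimal sketch under stated assumptions: that the string variable is called variable (as in the post), that the risk factors are numbered 1 to 4, and that the grade suffixes are mild, mod and sev in increasing order of severity. The local macro names grades and g are hypothetical.

```stata
* Assumption: grade suffixes listed in increasing severity; extend to the real list
local grades mild mod sev

* Assumption: risk factors are numbered 1 to 4; adjust the range as needed
forvalues i = 1/4 {
    * 0 means the risk factor is absent
    generate byte RiskFactor`i' = 0
    local g = 0
    foreach grade of local grades {
        * g is the numeric code for the current grade (1, 2, 3, ...)
        local ++g
        replace RiskFactor`i' = `g' if strpos(variable, "RF`i'`grade'") > 0
    }
}
```

With more grades, only the grades list needs to grow. If a string happens to contain more than one grade for the same risk factor, the grade that comes later in the list is the one that is kept.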