Combining multiple binary variables into a single categorical variable

Eileen Choi

#1

31 Jul 2022, 01:26

Hi

I'm trying to create a new variable that counts how often a particular value occurs across several binary variables.

So basically I have 12 variables for chronic diseases, each containing the participants' binary responses, e.g. 1 = yes, 2 = no ('yes' means having the disease, 'no' means not having it).
I want to generate a new variable, 'number of chronic diseases', with 3 categories (0, 1, and ≥ 2) according to how many of these 12 variables have the value 1.

For example, let's say the binary variables are: 'hypertension', 'diabetes', 'heart disease', 'asthma', 'cancer', 'depression', and 'stroke'.
Each of them has dichotomous outputs: 1 = 'diagnosed', 2 = 'not diagnosed'
Then I need the number of diseases present in each participant, grouped into three categories: 'none', 'one', and 'two or more' (i.e. 0, 1, 2+).

I have no idea how to create a new variable according to how often an outcome occurs across different variables. I would appreciate it if someone could help me out with this.

Thanks

Carlo Lazzaro

31 Jul 2022, 02:54

Eileen:
Welcome to this forum.
Why not consider something along the following lines?

Code:

. set obs 2
Number of observations (_N) was 0, now 2.

. g id=_n

. g disease_A=1 in 1


. replace disease_A=1 in 2


. g disease_B=1 in 1


. replace disease_B=0 in 2


. egen wanted=rowtotal(disease_*)


. label define wanted 0 "None" 1 "One disease" 2 "Two or more diseases"

. label val wanted wanted

. list

     +-------------------------------------------------+
     | id   diseas~A   diseas~B                 wanted |
     |-------------------------------------------------|
  1. |  1          1          1   Two or more diseases |
  2. |  2          1          0            One disease |
     +-------------------------------------------------+

.

Kind regards,
Carlo
(Stata 19.0)

Nick Cox

31 Jul 2022, 04:16

The explanation in #1 is clear but a data example would have been even better. Please note 12.2 at https://www.statalist.org/forums/help#stata

Indicators coded 1 or 2 are not nearly so useful as those coded 0 or 1 (or missing), as Carlo Lazzaro is hinting. See for example https://www.stata-journal.com/articl...article=dm0099 for some detailed discussion.
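
To make that point concrete, here is a minimal sketch of recoding 1/2 indicators to 0/1, after which a plain row sum is the number of diseases. It assumes the seven example variable names from #1 and that every non-missing value is either 1 or 2; the data are typed in by hand and n_diseases is a made-up name.

```stata
* Sketch only: variable names follow the example in #1; n_diseases is a made-up name.
* Assumes each indicator holds only 1 (diagnosed), 2 (not diagnosed) or missing.
clear
input hypertension diabetes heart_disease asthma cancer depression stroke
2 1 2 2 2 2 2
1 2 1 2 1 2 1
2 2 2 2 2 2 2
end

* recode 2 -> 0 so that 1 = diagnosed and 0 = not diagnosed
foreach v of varlist hypertension-stroke {
    replace `v' = 0 if `v' == 2
}

* with 0/1 coding the row sum is the number of diseases
egen n_diseases = rowtotal(hypertension-stroke)

list
```

Note that rowtotal() treats missing values as 0, so a participant with missing indicators would still get a count.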

Still, we can just loop over such variables and count instances of 1. egen offers convenient implementations of such loops.

Code:

clear 
set obs 5 
set seed 2803 
foreach v in hypertension diabetes heart_disease asthma cancer depression stroke {
    gen `v' = cond(runiform() < 0.2, 1, 2)
}

* start here   
egen count = anycount(hyper-stroke), value(1)

gen wanted = min(count, 2)

label def wanted 2 "2 or more"
label val wanted wanted 

list 

     +------------------------------------------------------------------------------------------+
     | hypert~n   diabetes   heart_~e   asthma   cancer   depres~n   stroke   count      wanted |
     |------------------------------------------------------------------------------------------|
  1. |        2          1          2        2        2          2        2       1           1 |
  2. |        2          1          2        1        2          2        2       2   2 or more |
  3. |        2          2          1        1        2          2        2       2   2 or more |
  4. |        1          2          1        2        1          2        1       4   2 or more |
  5. |        2          2          1        2        2          1        2       2   2 or more |
     +------------------------------------------------------------------------------------------+
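
For comparison, the counting can also be sketched as an explicit loop, with value labels attached to all three categories rather than only to the top one. This is an illustration under the same 1/2 coding as in #1, not a replacement for the egen call above; the data are typed in by hand, and n_yes and category are made-up names.

```stata
* Sketch only: n_yes and category are made-up names; data typed in by hand.
clear
input hypertension diabetes heart_disease asthma cancer depression stroke
2 1 2 2 2 2 2
1 2 1 2 1 2 1
2 2 2 2 2 2 2
end

* explicit loop: add 1 for every indicator equal to 1
* (a missing indicator is not equal to 1, so it adds 0)
gen n_yes = 0
foreach v of varlist hypertension-stroke {
    replace n_yes = n_yes + (`v' == 1)
}

* collapse to 0, 1, 2+ and label all three categories
gen category = min(n_yes, 2)
label define category 0 "None" 1 "One" 2 "Two or more"
label values category category

list n_yes category
```

Keeping the raw count in its own variable alongside the three-level version makes it easy to check the grouping with tabulate n_yes category.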
