Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Adding up variables with the same value

    Hi All,

    Seeking help on adding up variables (columns) with the same values.

    I have variables diagnosis_1, diagnosis_2, diagnosis_3, and diagnosis_4, and all the values that can be entered are the same (e.g. "1"=appendicitis, "2"=sepsis, "3"=pneumonia, etc.)
    I am trying to find a way to add up the frequency of those values across those variables together to find the total frequency of "1," "2," "3," etc. For example, the frequency of "1" across the diagnosis variables is 12, 7, 5, and 3, respecitvely, so I want to create a variable where the frequency of "1" for diagnosis_total is 12+7+5+3=27.

    I figured I could do the following but realized it would take me a long time to manually edit each value from 1 to 77.

    gen diagnosis_total=.
    replace diagnosis_total=1 if diagnosis_1 == 1 | diagnosis_2 == 1 | diagnosis_3 == 1 | diagnosis_4 == 1
    replace diagnosis_total=2 if diagnosis_2 == 2 | diagnosis_2 == 2 | diagnosis_3 == 2 | diagnosis_4 == 2
    ...
    replace diagnosis_total=77 if diagnosis_2 == 77 | diagnosis_2 == 77 | diagnosis_3 == 77 | diagnosis_4 == 77
    Any thoughts on a more efficient way to do this would be greatly appreciated.

  • #2
    Paul:
    welcome to this forum.
    Do you mean something along the following lines?:
    Code:
    . set obs 10
    
    . g Diagn_A=_n
    
    . g Diagn_B=_n in 1/5
    
    . replace Diagn_B=_n+1 if Diagn_B==.
    
    . egen Diagn_Tot=rowmax( Diagn_A Diagn_B) if Diagn_A==Diagn_B
    
    . list
    
         +------------------------------+
         | Diagn_A   Diagn_B   Diagn_~t |
         |------------------------------|
      1. |       1         1          1 |
      2. |       2         2          2 |
      3. |       3         3          3 |
      4. |       4         4          4 |
      5. |       5         5          5 |
         |------------------------------|
      6. |       6         7          . |
      7. |       7         8          . |
      8. |       8         9          . |
      9. |       9        10          . |
     10. |      10        11          . |
         +------------------------------+
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Starting with @Carlo Lazzaro's data example, you can get the counts in one table this way,

      Code:
      . clear 
      
      . set obs 10
      number of observations (_N) was 0, now 10
      
      . g diagnosis_1 = _n
      
      . g diagnosis_2 = _n in 1/5
      (5 missing values generated)
      
      .  ssc install tab_chi 
      
      . tabm diagnosis_* , transpose 
      
                 |       variable
          values | diagnosis  diagnosis |     Total
      -----------+----------------------+----------
               1 |         1          1 |         2 
               2 |         1          1 |         2 
               3 |         1          1 |         2 
               4 |         1          1 |         2 
               5 |         1          1 |         2 
               6 |         1          0 |         1 
               7 |         1          0 |         1 
               8 |         1          0 |         1 
               9 |         1          0 |         1 
              10 |         1          0 |         1 
      -----------+----------------------+----------
           Total |        10          5 |        15 
      
      .
      But for many purposes you need a different data structure. You may already have an identifier.

      Code:
       
      . gen long id = _n 
      
      . 
      . reshape long diagnosis_, i(id)
      (note: j = 1 2)
      
      Data                               wide   ->   long
      -----------------------------------------------------------------------------
      Number of obs.                       10   ->      20
      Number of variables                   3   ->       3
      j variable (2 values)                     ->   _j
      xij variables:
                      diagnosis_1 diagnosis_2   ->   diagnosis_
      -----------------------------------------------------------------------------
      
      . 
      . tab diagnosis_  
      
       diagnosis_ |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                1 |          2       13.33       13.33
                2 |          2       13.33       26.67
                3 |          2       13.33       40.00
                4 |          2       13.33       53.33
                5 |          2       13.33       66.67
                6 |          1        6.67       73.33
                7 |          1        6.67       80.00
                8 |          1        6.67       86.67
                9 |          1        6.67       93.33
               10 |          1        6.67      100.00
      ------------+-----------------------------------
            Total |         15      100.00

      Comment


      • #4
        Originally posted by Nick Cox View Post
        Starting with @Carlo Lazzaro's data example, you can get the counts in one table this way,

        Code:
        . clear
        
        . set obs 10
        number of observations (_N) was 0, now 10
        
        . g diagnosis_1 = _n
        
        . g diagnosis_2 = _n in 1/5
        (5 missing values generated)
        
        . ssc install tab_chi
        
        . tabm diagnosis_* , transpose
        
        | variable
        values | diagnosis diagnosis | Total
        -----------+----------------------+----------
        1 | 1 1 | 2
        2 | 1 1 | 2
        3 | 1 1 | 2
        4 | 1 1 | 2
        5 | 1 1 | 2
        6 | 1 0 | 1
        7 | 1 0 | 1
        8 | 1 0 | 1
        9 | 1 0 | 1
        10 | 1 0 | 1
        -----------+----------------------+----------
        Total | 10 5 | 15
        
        .
        But for many purposes you need a different data structure. You may already have an identifier.

        Code:
        . gen long id = _n
        
        .
        . reshape long diagnosis_, i(id)
        (note: j = 1 2)
        
        Data wide -> long
        -----------------------------------------------------------------------------
        Number of obs. 10 -> 20
        Number of variables 3 -> 3
        j variable (2 values) -> _j
        xij variables:
        diagnosis_1 diagnosis_2 -> diagnosis_
        -----------------------------------------------------------------------------
        
        .
        . tab diagnosis_
        
        diagnosis_ | Freq. Percent Cum.
        ------------+-----------------------------------
        1 | 2 13.33 13.33
        2 | 2 13.33 26.67
        3 | 2 13.33 40.00
        4 | 2 13.33 53.33
        5 | 2 13.33 66.67
        6 | 1 6.67 73.33
        7 | 1 6.67 80.00
        8 | 1 6.67 86.67
        9 | 1 6.67 93.33
        10 | 1 6.67 100.00
        ------------+-----------------------------------
        Total | 15 100.00
        @Nick Cox Dear Nick,Can you help me with this problem?https://www.statalist.org/forums/for...est-of-xtabond
        Best regards.

        Raymond Zhang
        Stata 17.0,MP

        Comment


        • #5
          Originally posted by Carlo Lazzaro View Post
          Paul:
          welcome to this forum.
          Do you mean something along the following lines?:
          Code:
          . set obs 10
          
          . g Diagn_A=_n
          
          . g Diagn_B=_n in 1/5
          
          . replace Diagn_B=_n+1 if Diagn_B==.
          
          . egen Diagn_Tot=rowmax( Diagn_A Diagn_B) if Diagn_A==Diagn_B
          
          . list
          
          +------------------------------+
          | Diagn_A Diagn_B Diagn_~t |
          |------------------------------|
          1. | 1 1 1 |
          2. | 2 2 2 |
          3. | 3 3 3 |
          4. | 4 4 4 |
          5. | 5 5 5 |
          |------------------------------|
          6. | 6 7 . |
          7. | 7 8 . |
          8. | 8 9 . |
          9. | 9 10 . |
          10. | 10 11 . |
          +------------------------------+
          
          .
          Thank you for your response. After reading your response, I realize I am mistaken in trying to create a variable to count the total diagnoses. I guess I am simply trying find a way to count up frequencies of values 1 to 77 across diagnosis_1 to diagnosis_4.

          For example:
          diagnosis_1 diagnosis_2 diagnosis_3 diagnosis_4
          1 2 1 2
          1 3 1 3
          2 3 1 1
          3 1 2 2
          Total # of diagnosis for "1" would be 2+1+3+1=7, "2" would be 1+1+1+2=5, and "3" would be 1+2+0+1=4.

          As such, I am looking to produce a frequency list or table that shows me something like:
          Diagnosis code Total Frequency
          1 7
          2 5
          3 4
          Any further thoughts would be greatly appreciated and appreciate the welcome!





          Comment


          • #6
            I think #3 is already an answer.

            Comment


            • #7
              Originally posted by Nick Cox View Post
              I think #3 is already an answer.
              Hi Nick, sorry I missed this response. Could you elaborate on the reshape code? I am unsure how to properly use the code with the variables I have.

              Comment


              • #8
                The reshape code should work with the variables you have -- so long as you use an identifier that works. I can't tell you anything about your data that isn't explicit in #1.

                Otherwise put, it is for you to elaborate on what code you tried if it doesn't work.

                Comment


                • #9
                  Originally posted by Nick Cox View Post
                  The reshape code should work with the variables you have -- so long as you use an identifier that works. I can't tell you anything about your data that isn't explicit in #1.

                  Otherwise put, it is for you to elaborate on what code you tried if it doesn't work.
                  I was confused what you meant by "diagnosis_" and realized you were creating the variable I wanted, which is diagnosis_total. It worked, much appreciated!

                  Comment

                  Working...
                  X