Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Count when multiple rows match on variable values

    Hello all

    I need to be able to return a count of how many rows match on multiple variable values. For instance, with the following 3 variables: ID, varA, and varB...

    sort varA

    list ID varA varB

    1. | 1 2 0 |
    2. | 2 3 0 |
    3. | 1 3 0 |

    4. | 1 5 0 |
    5. | 1 6 0 |
    |-----------------------------|
    6. | 1 7 0 |
    7. | 1 8 0 |
    8. | 2 8 0 |

    9. | 1 10 0 |
    10. | 2 11 0 |


    I can see that 4 total rows (2 different IDs) where the match is the same on both varA and varB (bolded above for clarity). Any advice on a line of code to return this value instead of having to manually count it myself?

    Thanks in advance.






  • #2
    Welcome to Statalist. Please remember to check out the FAQ (link at the top of the forum) and learn more about providing ready-to-use data example using -dataex-. I've done that below for this round.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(id vara varb)
    1  2 0
    2  3 0
    1  3 0
    1  5 0
    1  6 0
    1  7 0
    1  8 0
    2  8 0
    1 10 0
    2 11 0
    end
    You may also want to clarify on a few things regarding your algorithm:

    1) It seems you're counting just varA and varB being the same, do they also have to be from a different ID to be considered as a unique count?
    2) Where would you like those count of "2"s to locate? It'd be nice to create a data showing how you envisioned the end product.

    Comment


    • #3
      Thanks.

      1) Yes, they need to be from different IDs
      2) Ideally after a command is entered. I don't need to save this within the dataset.

      Comment


      • #4
        There are many ways to do that.. if you don't need the data, then I'd perhaps use a "preserve-restore" pair to first drop any duplicates with the same id, varA and varB. And then use a duplicate report to see how many of them have extra copies. Here is an example:

        Code:
        preserve
        duplicates drop id vara varb, force
        duplicates report vara varb
        restore
        Results:
        Code:
        --------------------------------------
           Copies | Observations       Surplus
        ----------+---------------------------
                1 |            6             0
                2 |            4             2
        --------------------------------------
        6 observations have no copies (unique), and 4 observations are of two copies (aka, 2 surpluses.)

        To learn more, check out -help duplicates-.
        Last edited by Ken Chui; 14 Feb 2022, 17:40.

        Comment


        • #5
          Thanks. This is pretty clunky but it works.

          Comment

          Working...
          X