Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Count unique groups/panels based on a certain characteristic

    Hello,

    This is my first time posting on Statalist and I am looking for some help with code. I am using Stata 17.0 on Windows.

    I have a dataset where I would like to count for each country (hv000), the total number of IDs (team2), that have a certain characteristic (whz==0).
    In the dataset
    • whz is a binary variable 1/0
    • the team IDs are not unique to each country and the number of teams per country vary
    • there are some observations where the characteristic is missing, but for my purposes I can ignore missing
    Below is a random example from the datatset. And for example I would count for "ET7a" 5 teams with my characteristic, "SN6a" would be 2 teams etc.



    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str4 hv000 float(team2 whz)
    "ET7a" 1502 0
    "ET7a" 3302 0
    "ET7a" 2002 0
    "ET7a" 2602 0
    "ET7a"  402 0
    "ET7a" 1902 .
    "SN6a"  916 .
    "SN6a"  914 0
    "SN6a"  904 0
    "SN6b"   20 0
    "SN6c"   10 0
    "SN6c"   30 0
    "SN6d"   20 0
    "SN6d"   20 0
    "SN7a"  100 0
    "SN7a"  170 0
    "SN7a"  100 0
    "SN7a"  110 0
    "SN7b"  120 0
    "SN7b"  120 0
    end

  • #2
    Code:
    g whz1 = 1-whz
    bys hv000: egen new = sum(whz1)

    Comment


    • #3
      Code:
      bys hv000: egen new2 = sum(cond(whz==0,1,0))

      Comment


      • #4
        The advice George Ford gives is correct. I suggest a slightly different syntax for reasons of better coding style.

        The -egen- function -sum()- has an alias, -total()-. It is better to use the -total()- name because that will not be confused with the -gen- function -sum()- which does something different. It is unfortunate that in earlier versions of Stata, both of these functions had the same name. Some time back, Stata fixed that problem by providing the -total()- name for the -egen- function. The older -sum()- name still works, but it is deprecated, and is now undocumented. Given that it is easily confused with the -gen- function by that bane, it is best to avoid using it. Make it happen to always use -total()- with -egen- and use -sum()- only with -gen- (and only use -gen ..., sum()- when its different functionality is what you need.)

        Comment


        • #5
          Originally posted by George Ford View Post
          Code:
          bys hv000: egen new2 = sum(cond(whz==0,1,0))
          Specifying the condition directly should work as the condition evaluates to 1 if satisfied and 0 otherwise.

          Code:
          bys hv000: egen wanted= total(!whz)

          Comment


          • #6
            Thanks George Ford and Andrew Musau for the code and @Clye Schechter the explanation re: better syntax. However, this doesn't quite solve my issue.

            With the current code I will get the total number of observations in the specified country (hv000) that meet my criteria (whz==0). If I ran the below code I could get this information.
            Code:
             ta whz if hv000=="ET7a"

            Instead, what I want is a count of the number of IDs (team2), in a country (hv000) that met my criteria (whz==0). My apologies, that the random sample of data I provided is a bit misleading. My dataset is based on individuals, but I want a count by team ID (team2). So for example in country X, there are 10 teams and each team ID can have up to 1000+ observations. I only want to count a team if at least one observation meets my criteria. Does that make sense? I've thought about using collapse, but not sure how to make it work since I want to collapse based on both the team ID (team2) and country (hv000). Any thoughts?
            Last edited by Ruxy Benedict; 19 Jan 2022, 16:44.

            Comment


            • #7
              Code:
              egen flag = group(team2)
              tab whz if hv000 == "ET7a" & flag

              Comment


              • #8
                Thanks all for your help. I figured out the code I needed in the end. It's probably not the best syntax, but it works. Thanks again.
                Code:
                collapse whz, by(hv000 team2)
                bys hv000: count if whz==0

                Comment

                Working...
                X