Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to -tab while accounting for distinct participant IDs?

    Dear Statalist,

    I am trying to use the tab command that accounts for the different participant IDs. However, tab sums across all participants.

    Please see below for the dataex example:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int ID float action
     1 2
     1 2
     1 2
     1 2
     2 0
     2 0
     2 0
     2 0
     3 2
     3 2
     4 2
     4 2
     5 2
     5 2
     6 1
     6 1
     7 0
     7 0
     7 0
     7 0
     8 0
     8 0
     9 1
     9 1
     9 1
     9 1
    10 2
    10 2
    end
    Now, when I tab I get

    Code:
     tab action
    Code:
         action |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |         10       35.71       35.71
              1 |          6       21.43       57.14
              2 |         12       42.86      100.00
    ------------+-----------------------------------
          Total |         28      100.00
    while I intend to obtain a table that acknowledges that participants should only be counted once.
    For example, it's true that action 0 occurred 10 times, but only by 3 different participants (2,7,8). Likewise, action 1 occurred 6 times, but only with participants 6 and 9. Lastly, action 3 happened 12 times but only with participants 1,3,4,5, and 10.


    I would like a table that reflects this accordingly:

    action Frequency Percent
    0 3 30%
    1 2 20%
    2 5 50%
    Total 10 100%

    I have tried:

    tab action, by ID (which led to the error: option by not allowed)
    tab action if ID == ID (led to the same first outcome) (I tried to state that ID has to be the same)
    distinct action (which showed the 3 distinct values of action)
    distinct action ID (which showed the distinct values of action and ID)
    bysort ID: tab action (which led to a list of all IDs) (not in the right format and not the goal since I have more than 10 distinct IDs)

    Unfortunately, I have run out of options for what else to try.

    I would be most grateful for any input, help, or comments.

    Thank you very much!

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int ID float action
     1 2
     1 2
     1 2
     1 2
     2 0
     2 0
     2 0
     2 0
     3 2
     3 2
     4 2
     4 2
     5 2
     5 2
     6 1
     6 1
     7 0
     7 0
     7 0
     7 0
     8 0
     8 0
     9 1
     9 1
     9 1
     9 1
    10 2
    10 2
    end
    
    egen tag = tag(ID) 
    
    tab action if tag 
    
        action |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |          3       30.00       30.00
              1 |          2       20.00       50.00
              2 |          5       50.00      100.00
    ------------+-----------------------------------
          Total |         10      100.00

    Comment


    • #3
      Thank you very much, Nick! I appreciate your help greatly!

      Comment


      • #4
        The problem is to select just one observation for each identifier. An old trick going back at least to Statalist in the 1990s is to select either the first or the last -- either is guaranteed to work even if the number of observations for each identifier is only one.

        Code:
        bysort id : gen tag = _n == 1
        or

        Code:
        bysort id : gen tag = _n == _N
        Either produces an indicator variable that is 1 just once for each identifier and 0 otherwise. So if tag works because if tag selects precisely the same observations as if tag == 1 because the only non-zero value possible for tag is 1.

        This was wrapped up as an egen function by 1999, itself folded into official Stata in Stata 7.

        Comment


        • #5
          Thank you very much for your addition, Nick!

          Comment


          • #6
            Hi Nick Cox ,

            Please allow me to return to this question and kindly ask for your help (again):

            In my dataex example, I have added a few participants.

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input int ID float action
             1 2
             1 2
             1 2
             1 2
             2 0
             2 0
             2 0
             2 0
             3 2
             3 2
             4 2
             4 2
             5 2
             5 2
             6 1
             6 1
             7 0
             7 0
             7 0
             7 0
             8 0
             8 0
             9 1
             9 1
             9 1
             9 1
            10 2
            10 2
            11 .
            12 2
            12 1
            13 0
            end

            Code:
            egen tag = tag(ID)


            Now if I do
            Code:
             tab tag if (action ==1)
            I receive the following

            Code:
                tag(ID) |      Freq.     Percent        Cum.
            ------------+-----------------------------------
                      0 |          5       71.43       71.43
                      1 |          2       28.57      100.00
            ------------+-----------------------------------
                  Total |          7      100.00
            while participants 6,9, and 12 have action ==1. But since the egen tab function only counts each individual once (as per my understanding), it counts participant ID 12 only for action 2 but not for action 1.

            In my opinion, the correct result from egen tab for action == 1 should be 3/13 = 23% (as participants 6, 9, and 12 have an action == 1). Is there any way to implement this?

            I am asking because I have a panel data set of 400 participants, each with 4 years of data—so 1600 observations. I was asked to tab some variables but have them per individual and not per observation (as tab would usually do) to avoid confusion.


            So instead of saying in 22.58% of the observations, individuals choose action 1, I am asked to say 23% (3/13) of the individuals in the sample picked action 1- bearing in mind that they first might choose action 2 and, in a later year, action 1.

            Code:
            tab action
            
                 action |      Freq.     Percent        Cum.
            ------------+-----------------------------------
                      0 |         11       35.48       35.48
                      1 |          7       22.58       58.06
                      2 |         13       41.94      100.00
            ------------+-----------------------------------
                  Total |         31      100.00

            I appreciate any help or comments.


            Last edited by Matthew Berg; 14 Mar 2024, 08:05.

            Comment


            • #7
              Using the same data example as in #6 (thanks!) I get this:

              Code:
              . egen tag = tag(ID)
              
              .. egen ever1 = max(action == 1), by(ID)
              
              . tab ever1 if tag
              
                    ever1 |      Freq.     Percent        Cum.
              ------------+-----------------------------------
                        0 |         10       76.92       76.92
                        1 |          3       23.08      100.00
              ------------+-----------------------------------
                    Total |         13      100.00
              See also https://www.stata.com/support/faqs/d...ble-recording/

              Comment


              • #8
                Thank you very much! I appreciate the help greatly!

                Comment

                Working...
                X