
  • List first X observations that meet a condition?

    I'd like to list the first 5 observations in a dataset that match a specific condition. For example, in the auto dataset I want to see the first 5 makes that are "foreign." I would like to preserve the order of the dataset and not use a preserve/restore statement. (My real-world dataset is very large, so sorting, dropping many observations or listing out all observations instead of just a few examples would be cumbersome.)



    I've tried:
    Code:
    list in 1/5 if foreign==1
    But this gives me all cars that are foreign in the first 5 observations (not the first 5 observations that have a foreign car).

    Thanks and please let me know if I can clarify.

    Also, apologies that my username is not my real name -- I made this account many years ago and am not sure how to update it.
    Last edited by Dan theMan; 04 Aug 2022, 10:02.

  • #2
    Code:
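    sysuse auto, clear
    * running count of foreign cars up to and including each observation
    gen long foreign_count = sum(foreign)
    * the first 5 foreign cars are those with a running count of 5 or less
    list if foreign_count <= 5 & foreign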
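    A minimal variant, assuming the condition is written as in the question (foreign == 1) and that the helper variable should not stay in the data afterwards; the name n_ok is just illustrative. sum() gives a running sum in the current sort order, so nothing is sorted, and because the helper is dropped at the end, no preserve/restore is needed. Any other true/false expression could replace foreign == 1.

    ```stata
    sysuse auto, clear
    * n_ok (illustrative name) = running count of observations so far that satisfy the condition
    generate long n_ok = sum(foreign == 1)
    * first 5 qualifying observations, in the current order of the data
    list make mpg if n_ok <= 5 & foreign == 1
    * remove the helper variable so the dataset is left as it was
    drop n_ok
    ```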
    Quote:
    Also, apologies that my username is not my real name -- I made this account many years ago and am not sure how to update it.
    Click on Contact Us in the lower right corner of this page and message the system administrator requesting the name change.



    • #3
      See also this program from 2008: https://www.stata.com/statalist/arch.../msg00448.html

      Later, Robert Picard (2014) put listsome on SSC with a related but not identical aim, so this one should probably be called something like listfirst.



      • #4
        I rewrote that.

        Code:
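        *! 2.0.0 NJC 5 August 2022
        * 1.0.0 NJC 10 April 2008
        program listfirst
            version 8
            syntax [varlist(def=all)] [if/]  [, First(int 10) Last(numlist int >0) * ]
        
            quietly {
                * indicator: 1 if the observation satisfies the condition, else 0
                tempvar OK
                if `"`if'"' == "" local if 1
                gen byte `OK' = `if'
                * number qualifying observations 1, 2, 3, ... in current order; others stay 0
                replace `OK' = cond(`OK', sum(`OK'), 0)
        
                if "`last'" == "" {
                    local min = 1
                    local max = `first'
                }
                else {
                    * r(max) is the number of qualifying observations
                    su `OK', meanonly
                    * floor at 1 so non-qualifying observations (coded 0) are never listed
                    local min = max(1, r(max) - `last' + 1)
                    local max = r(max)
                }  
            }
            
            list `varlist' if inrange(`OK', `min', `max'), `options'
        end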
        The syntax for Dan's problem would be

        Code:
        listfirst mpg if foreign, first(5) 
        As the link in #3 shows, the 2008 program parsed in as Dan wants, so that in 1/5 refers to the first 5 observations satisfying the if condition. I tried extending that so that syntax such as in -10/L would also be accepted, but that is harder, because what the syntax command does automatically with in subverts the particular intention here. So I fell back on options.

        By default, listfirst lists the first 10 observations that satisfy the condition.

        A different value in first() changes the number shown.

        (Both will show fewer observations if fewer exist.)

        Specifying last(#) instead shows the last # observations that satisfy the condition.
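        The idea behind last() can also be done by hand, without listfirst. This is a minimal sketch for the auto data only, assuming the condition foreign == 1 and that the last 3 matches are wanted; n_ok and total are illustrative names. The largest value of a running count is the number of qualifying observations, so the last 3 matches are those whose count exceeds that total minus 3.

        ```stata
        sysuse auto, clear
        * running count of foreign cars (n_ok is an illustrative name)
        generate long n_ok = sum(foreign == 1)
        * the final, largest value of the running count is the number of foreign cars
        summarize n_ok, meanonly
        local total = r(max)
        * last 3 foreign cars: running count among the top 3 values
        list make mpg if foreign == 1 & n_ok > `total' - 3
        * remove the helper variable so the dataset is left as it was
        drop n_ok
        ```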
        Last edited by Nick Cox; 04 Aug 2022, 19:27.



        • #5
          Working on it… update later.



          • #6
            Update now at https://www.statalist.org/forums/for...6832-listfirst
