  • listfirst

    Thanks as always to Kit Baum, a small utility listfirst has been posted on SSC.

    listfirst lists the first # observations, either generally or (more usefully) those satisfying an if
    condition.

    Optionally it will also list the last # observations, either generally or (more usefully) those satisfying
    an if condition.

    # defaults to 10. Optionally it may be changed, and need not be equal for first and last subsets.

    If a variable list is not specified, it defaults to all variables in the dataset. Otherwise one or more variables
    may be specified.

    Output may be limited by what exists in the dataset. In particular, no observations will be listed if none
    exist that satisfy a specified if condition. That is not considered an error.


    Many readers will be familiar with utilities in Unix or other operating systems that let you see the head (top or first lines) or tail (bottom or last lines) of text files. Similar features have been folded into various statistical software. In Stata, when the concern is with a dataset, most but not quite all of the possibilities yield easily to list or edit.

    You may have special interest in what is at either end of a dataset. Perhaps more commonly the point is just to see a small sample, especially of a large dataset, when a full list or opening edit or browse seems over the top.

    Examples use the auto dataset bundled with Stata, which has 74 observations.

    The simplest applications of listfirst are trivial.

    listfirst by itself is equivalent to list in 1/10.

    listfirst mpg by itself is equivalent to list mpg in 1/10.

    listfirst mpg, first(5) is equivalent to list mpg in 1/5.

    Such examples don't take you beyond what is already easy with list. However,

    listfirst mpg, last

    is equivalent to

    list mpg if inrange(_n, 1, 10) | inrange(_n, 65, 74)

    or more generally to

    list mpg if inrange(_n, 1, 10) | inrange(_n, _N - 9, _N).

    Either is harder to work out or to type.

    listfirst mpg, first(5) last(5)

    is similarly more challenging.
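
    For comparison, here is a sketch of what the last example would look like with list alone. It assumes the auto dataset and is just the pattern above with 5 in place of 10; it is not a claim about how listfirst works internally.

    ```stata
    sysuse auto, clear
    * first 5 and last 5 observations of mpg, selected by observation number
    list mpg if inrange(_n, 1, 5) | inrange(_n, _N - 4, _N)
    ```

    Note that the lower limit for the last # observations is _N - # + 1, which is easy to get wrong by one.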

    The use of an if condition is where listfirst scores.

    listfirst mpg if foreign lists the first 10 observations satisfying the condition,

    which is otherwise more difficult unless you work out where they are in the dataset, or know that for
    another reason. However, a useful trick is

    list mpg if foreign & sum(foreign) <= 10

    given that foreign is a (0, 1) indicator variable. That generalises to any true-or-false expression. See
    (e.g.)

    Cox, N.J. 2007. How can I identify first and last occurrences systematically in panel data?
    http://www.stata.com/support/faqs/da...t-occurrences/


    for more on such ideas.
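
    As a sketch of that generalisation, the same running-sum trick works with an expression in place of an indicator variable. The condition mpg > 25 and the limit of 5 are arbitrary choices for illustration only.

    ```stata
    sysuse auto, clear
    * sum() is a running sum, so it counts how many observations so far satisfy the condition
    * !missing(mpg) guards against missing values, which Stata treats as larger than any number
    list make mpg if mpg > 25 & !missing(mpg) & sum(mpg > 25 & !missing(mpg)) <= 5
    ```

    In the auto dataset mpg has no missing values, so the guard makes no difference here, but it is a good habit with inequalities of this kind.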

    listfirst mpg if foreign, last

    shows the last 10 observations satisfying the condition too.
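
    One way to get the same listing by hand is to count matches from both ends of the dataset. This is only a sketch under my own assumptions: the variable names ok, fromfirst and fromlast are made up for illustration, and this is not necessarily how listfirst does it.

    ```stata
    sysuse auto, clear
    * indicator for the condition of interest
    gen byte ok = foreign == 1
    * number of matching observations from the start up to and including this one
    gen long fromfirst = sum(ok)
    * number of matching observations from this one to the end
    gen long fromlast = fromfirst[_N] - fromfirst + ok
    * first 10 and last 10 observations satisfying the condition
    list mpg if ok & (fromfirst <= 10 | fromlast <= 10)
    ```

    If fewer than 20 observations satisfy the condition, the two subsets overlap and each matching observation is listed just once.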

    The history here deserves a little note.

    A command listsome was posted on Statalist on 10 April 2008 in

    https://www.stata.com/statalist/arch.../msg00448.html

    in response to a question from Malcolm Wardlaw earlier that day.
    But that command was never documented or made public beyond Statalist.

    Independently Robert Picard posted a listsome command on SSC that was first announced on 18 August 2014 in

    https://www.statalist.org/forums/for...f-observations

    Robert's command has a strong feature of offering random samples, which is not attempted here.

    This listfirst command has two small virtues: being limited, and therefore simple; and showing "last" values
    too if that is also wanted. I happily yield the command name to Robert.

    More or less the same question arises from time to time, most recently in https://www.statalist.org/forums/for...et-a-condition

    The version now on SSC differs slightly from that posted in the thread just mentioned.

  • #2
    Heh. Thanks for this, and thanks as always to Kit Baum.

    You wrote this response to a question I posted when I was a graduate student back in 2008! I was astounded and immediately packaged your program as listif.ado and have been using it almost constantly ever since. It has been ridiculously useful for rapidly inspecting data. At the time I was a pretty young graduate student and I didn't have the skills to properly document it and submit it to SSC.

    I'm sure you get this ten times a day, but I deeply appreciate the work you (and Kit) do for the Stata community. I was sort of language agnostic at this point in my career, and this was one of a couple of Statalist interactions that made me choose Stata and really start digging into Stata programming in a serious way.



    • #3
      Malcolm Wardlaw Thanks very much for the gracious and illuminating thanks. You have lasting fame through a mention in the help file.

      Not ten times a day, but enough positive messages to keep me going....



      • #4
        Incidentally, I re-wrote the help file a couple of days after the announcement, so anyone interested enough to download should see if they need to update.
        Last edited by Nick Cox; 11 Aug 2022, 16:41.
