  • Help! How do I use the collapse command?


    Hi, I have a question about an exercise we did in my econometrics class. I have a dataset of teachers and schools with about 62,000 observations, and I'm asked to use the collapse command to count how many schools and how many teachers there are. But when I run collapse (count) teacher_id school_id and then list, each variable shows more than 60,000. I know that cannot be the right answer, since each school should have several teachers. I've been searching the forums and I see that some people generate a constant (something like gen cons = 1), but I don't really know how to use it. If anyone knows how to do this or has any suggestions, I would greatly appreciate it.

  • #2
    The (count) operator in -collapse- does not give you the number of distinct values of a variable. It gives you the number of non-missing values of the variable (a value that occurs more than once is counted as many times as it occurs). To get a count of the number of distinct teachers using -collapse-, the code would look like this:
    Code:
    collapse school_id, by(teacher_id)   // any numeric variable other than teacher_id will do
    count
    This will work, but I would not do it this way.
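A small worked sketch of the point above, on made-up data: eight rows in which some teachers appear more than once (an assumption about how the real data might be laid out; the sketch also assumes no id is ever missing). The first -collapse- reproduces the problem in #1, since (count) returns the number of non-missing values, 8 for both variables. The second follows the approach above with school_id as the placeholder variable, leaving one observation per distinct teacher, so -count- gives 6. The scalar name n_teachers is hypothetical.

```stata
* Hypothetical toy data: some teachers appear on more than one row
clear
input school_id teacher_id
1 101
1 101
1 102
2 201
2 202
2 202
2 203
3 301
end

* The attempt in #1: (count) gives non-missing values, not distinct values
preserve
collapse (count) teacher_id school_id
list                                   // both variables show 8, the number of rows
restore

* The approach above: one observation per distinct teacher, then count
preserve
collapse school_id, by(teacher_id)     // school_id is only a placeholder numeric variable
count                                  // 6 distinct teachers
scalar n_teachers = r(N)
restore
```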

    If you simply want to display the number of distinct teachers in the Results window, I would install the -distinct- command (by Nick Cox, from SSC) and just run -distinct teacher_id-.

    If you need something more elaborate, such as the number of distinct teachers in each school, the code gets a bit more complicated:
    Code:
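    * Within each school (sorted by teacher_id), add 1 each time a new teacher_id appears
    by school_id (teacher_id), sort: gen wanted = sum(teacher_id != teacher_id[_n-1])
    * Copy the school's final running count to all of its observations
    by school_id (teacher_id): replace wanted = wanted[_N]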
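To see what the per-school code produces, here is a sketch on hypothetical toy data (eight rows, no missing ids). After the two -by- lines, every row of a school carries that school's number of distinct teachers. Tagging the first row of each school (the indicator name first_in_school is hypothetical) then lets you list one line per school and count the schools.

```stata
* Hypothetical toy data, same layout as in the question
clear
input school_id teacher_id
1 101
1 101
1 102
2 201
2 202
2 202
2 203
3 301
end

* Running count of new teacher_ids within each school, then spread the total
by school_id (teacher_id), sort: gen wanted = sum(teacher_id != teacher_id[_n-1])
by school_id (teacher_id): replace wanted = wanted[_N]

* Data are now sorted by school_id, so tag the first row of each school
by school_id: gen byte first_in_school = _n == 1
list school_id wanted if first_in_school, noobs   // 2, 3 and 1 teachers
count if first_in_school                          // 3 distinct schools
```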
    Added: It is the norm in this community to use our real first and last names as our user ID, to promote collegiality and professionalism. The Forum software will not permit you to edit your user ID, but you can request the system administrator to do that. Click on Contact Us (lower right corner of the page) and send him a message to that effect. Thank you in advance for your cooperation.



    • #3
      Thanks to Clyde Schechter for mentioning distinct. The distinct command has first author Gary Longton. The most up-to-date public version may be obtained from the Stata Journal, as below (editing out related mentions not directly pertinent to #1):


      Code:
      . search distinct, sj
      
      Search of official help files, FAQs, Examples, and Stata Journals
      
      SJ-20-4 dm0042_3  . . . . . . . . . . . . . . . . Software update for distinct
              (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
              Q4/20   SJ 20(4):1028--1030
              sort() option has been added
      
      
      SJ-15-3 dm0042_2  . . . . . . . . . . . . . . . . Software update for distinct
              (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
              Q3/15   SJ 15(3):899
              improved table format and display of large numbers of
              observations
      
      SJ-12-2 dm0042_1  . . . . . . . . . . . . . . . . Software update for distinct
              (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
              Q2/12   SJ 12(2):352
              options added to restrict output to variables with a minimum
              or maximum of distinct values
      
      SJ-8-4  dm0042  . . . . . . . . . . . .  Speaking Stata: Distinct observations
              (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
              Q4/08   SJ 8(4):557--568
              shows how to answer questions about distinct observations
              from first principles; provides a convenience command
      The 2008 paper remains the fullest discussion of related issues that I know and includes a suggestion equivalent to Clyde's code in #2.

      All that said, my guess is that your teachers intended something quite different; you can solve these questions using collapse (repeatedly; to do that you may need to read in the original dataset twice), although I don't think that is an especially good way to do it!
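One possible reading of the repeated-collapse suggestion, sketched on hypothetical toy data: generate a constant (the "gen cons" idea mentioned in #1, here called one) and -collapse- it, with -preserve- and -restore- standing in for reading the dataset in again. Collapsing by school_id leaves one row per school; collapsing first by school_id teacher_id and then by school_id gives the number of distinct teachers per school. The names one, n_rows, n_teachers and n_schools are hypothetical, and the sketch assumes neither id is ever missing.

```stata
* Hypothetical toy data
clear
input school_id teacher_id
1 101
1 101
1 102
2 201
2 202
2 202
2 203
3 301
end

* A constant to collapse
gen byte one = 1

* Number of schools: one observation per school after collapsing
preserve
collapse (sum) n_rows = one, by(school_id)        // n_rows = rows per school
count                                             // 3 schools
scalar n_schools = r(N)
restore

* Distinct teachers per school: collapse twice
preserve
collapse (sum) one, by(school_id teacher_id)      // one observation per teacher-school pair
collapse (count) n_teachers = one, by(school_id)  // pairs per school = distinct teachers
list, noobs                                       // 2, 3 and 1 teachers
restore
```

As the post says, this is not an especially good way to do it; the -by- approach in #2 avoids the repeated reshaping of the data.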
      Last edited by Nick Cox; 04 Mar 2023, 03:12.
