  • Stata: Count distinct values of a variable by another one?

    My little Stata Problem:
    I have a table like this:

    I want to create a variable that counts the number of different cat values for each citing. For example, for citing A there are 2 distinct cat values, 3 and 6, so I want another variable (dif_cat) that holds 2 on both of A's rows.
    For this sample it would look something like this:

    Can you help me?
    PS: I know this has nothing to do with Stata (but it may inspire someone): with an actual programming language I would try something like this. Have one loop go down the citing column, checking whether each value equals the one before, and keep an auxiliary empty vector. Have a second loop inside the first that checks whether the current cat is already in the vector and, if not, adds it. When citing changes, count the length of the auxiliary vector, reset it, and start again. The problem is that I need this in Stata code :S
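The loop idea in the PS can be mimicked fairly directly in Stata. What follows is only a minimal sketch, not a recommended solution: it assumes citing is a string variable and cat is numeric (as in the data typed out in #6 of this thread), uses the target name dif_cat from the question, and swaps the hand-built auxiliary vector for levelsof, which returns the distinct non-missing values of a variable.

```stata
clear
input str1 citing int cat
"A" 3
"A" 6
"B" 5
"B" 2
"B" 5
"B" 2
"C" 2
"C" 4
"C" 3
"D" 5
"E" 1
"E" 1
end

generate dif_cat = .

// outer loop: one pass per distinct value of citing
quietly levelsof citing, local(groups)
foreach g of local groups {
    // "auxiliary vector": the distinct non-missing cat values for this citing
    quietly levelsof cat if citing == "`g'", local(cats)
    // its length is the number of distinct cat values
    local n : word count `cats'
    quietly replace dif_cat = `n' if citing == "`g'"
}

list, sepby(citing)
```

This loops once per group, so it will be slow with many distinct citing values; it is meant only to show that the described algorithm has a Stata counterpart.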

  • #2
    there are several ways; but here is one: "egen newvar=count(cat), by(citing)"



    • #3
      Rich's code counts non-missing values, regardless of whether they are distinct. A solution with similar flavour is

      Code:
      egen tag = tag(cat citing) 
      egen distinct = total(tag), by(citing)
      For a review of this territory, see http://www.stata-journal.com/sjpdf.h...iclenum=dm0042
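To see the difference between the two suggestions, here is a small sketch that runs both on the sample data typed out in #6 of this thread. The variable name n_nonmissing is made up for illustration; tag and distinct are the names used in #3.

```stata
clear
input str1 citing int cat
"A" 3
"A" 6
"B" 5
"B" 2
"B" 5
"B" 2
"C" 2
"C" 4
"C" 3
"D" 5
"E" 1
"E" 1
end

// #2: counts non-missing values of cat within citing, duplicates included
egen n_nonmissing = count(cat), by(citing)

// #3: tag exactly one observation per distinct (cat, citing) pair,
// then add up the tags within citing
egen tag = tag(cat citing)
egen distinct = total(tag), by(citing)

list citing cat n_nonmissing distinct, sepby(citing)
```

For citing B, which holds the values 5, 2, 5, 2, n_nonmissing is 4 while distinct is 2. The two variables agree only for groups with no repeated cat values, such as A, C and D here.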



      • #4
        Thanks!... but both codes are generating just a simple count column of the number of different citings....
        Last edited by Rui Duarte; 01 Jul 2014, 09:39. Reason: unsolved yet.



        • #5
          You've cross-posted this at http://stackoverflow.com/questions/2...by-another-one
          Please do read the FAQ Advice, as was requested of you before posting, to see our policy on that and other points.

          But on your question: isn't that exactly what you asked for? If not, you need to explain the difference.
          Last edited by Nick Cox; 01 Jul 2014, 09:55.



          • #6
            Written without much thinking, just to reproduce the results in the initial "want" table:
            Code:
            clear all
            // it is the job of topic starter to write the data generation part, is it so difficult??
            input str1 citing int cat
            "A" 3
            "A" 6
            "B" 5
            "B" 2
            "B" 5
            "B" 2
            "C" 2
            "C" 4
            "C" 3
            "D" 5
            "E" 1
            "E" 1
            end
            
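            // start working here
            
            preserve
            
            // reduce the data to one row per distinct (citing, cat) pair
            generate total=1
            collapse (sum) total, by(citing cat)
            drop total
            
            // count those distinct pairs within each citing
            gen total=1
            collapse (sum) total, by(citing)
            list
            
            // save the per-citing counts, then return to the original data
            tempfile tmp
            save `"`tmp'"'
            restore
            
            // attach the count to every original row
            // (merge m:1 replaces the old "merge citing using" syntax and
            // does not need the using data sorted beforehand)
            sort citing
            merge m:1 citing using `"`tmp'"', nogenerate
            list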
            Best, Sergiy Radyakin



            • #7
              If someone drops by this old question:

              bys citing (cat): egen distinct_cat=total(cat!=cat[_n-1])



              • #8
                Re #7: I flag the advice in the help for egen:

                Explicit subscripting (using _N and _n), which is commonly used with generate, should not be used with egen; see subscripting.


                In practice that will work, so long as you don't mind counting distinct kinds of missing value when they occur, but that is fortuitous as well as fortunate.
                Last edited by Nick Cox; 12 Jun 2020, 07:48.
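One way to keep the idea of #7 while following the egen advice quoted above is to do the subscripting with generate instead. This is a sketch rather than the definitive answer: it runs on the sample data from #6, reuses the name distinct_cat from #7, and the `!missing(cat)` condition is an added assumption that missing values should not be counted at all.

```stata
clear
input str1 citing int cat
"A" 3
"A" 6
"B" 5
"B" 2
"B" 5
"B" 2
"C" 2
"C" 4
"C" 3
"D" 5
"E" 1
"E" 1
end

// within citing, sorted by cat: running count of observations whose cat
// differs from the previous one (cat[0] is missing, so the first
// non-missing value in each group is counted)
bysort citing (cat): generate distinct_cat = sum(cat != cat[_n-1] & !missing(cat))

// the last running total in each group is the number of distinct values;
// copy it to every observation of that group
by citing: replace distinct_cat = distinct_cat[_N]

list, sepby(citing)
```

Dropping the `& !missing(cat)` condition gives a variant that, like #7, also counts distinct kinds of missing value when they occur.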
