
  • Developing a Count Measure Using Five Measures with Multiple Values

    I want to develop a count variable that assesses the number of unique values across five variables: NG6, NG7, NG8, NG9, and NG10. I used tablist to get a sense for how many unique combinations I have (a total of 365!). An excerpt from that output is attached.

    For the first line, I want the new count variable to assign a value of "1" (because the same value is reported: 12 and 12, and I don't care about missingness). When I code the second line, the value would be "1" (because only the value of 12 is reported, and I don't care about missingness). When I code the third line, I want the value to be "1" because the same number (12) is being reported across NG6, NG7, and NG8. For the fourth line, which has values of NG6 = 1 and NG7 = 12, I want the value of my new variable to be "2" because two unique values are reported. When I code the very last line shown here, the values are 3, 12, and 8, so I want the value of my new variable to be 3 to indicate 3 unique values are reported for NG6, NG7, and NG8. Help!

    [Attached image: STATA Forum.PNG (excerpt of the tablist output)]

  • #2
    Please review FAQ Advice #12 for details on how to use dataex to present data examples.

    Code:
    clear
    input float NG6 NG7 NG8 NG10
    12 12 . .
    12 . 13 14
    . . . 11
    1 2 3 4
    end
    
    gen long obs_no=_n
    reshape long NG, i(obs_no) j(which)
    drop if missing(NG)
    bys obs_no (NG): gen wanted=sum(NG!=NG[_n-1])
    by obs_no: replace wanted= wanted[_N]
    reshape wide NG, i(obs_no) j(which)
    sort obs_no
    Result:

    Code:
    . l
    
         +------------------------------------------+
         | obs_no   NG6   NG7   NG8   NG10   wanted |
         |------------------------------------------|
      1. |      1    12    12     .      .        1 |
      2. |      2    12     .    13     14        3 |
      3. |      3     .     .     .     11        1 |
      4. |      4     1     2     3      4        4 |
         +------------------------------------------+
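
    For readers unfamiliar with the running-sum idiom used above, here is a minimal sketch that splits the one-line sum(NG != NG[_n-1]) into separate steps on the same example data. The helper names is_new and running are illustrative only and do not appear in the original code.

    ```stata
    clear
    input float NG6 NG7 NG8 NG10
    12 12 . .
    12 . 13 14
    . . . 11
    1 2 3 4
    end

    gen long obs_no = _n
    reshape long NG, i(obs_no) j(which)
    drop if missing(NG)

    * Sort values within each observation so that equal values are adjacent.
    * A value is "new" if it differs from the previous one; for the first row
    * of each group, NG[_n-1] is missing, so the comparison is true.
    bysort obs_no (NG): gen byte is_new = NG != NG[_n-1]

    * The running total of is_new counts distinct values seen so far.
    by obs_no: gen running = sum(is_new)

    * The last running total in each group is the number of distinct values.
    by obs_no: gen wanted = running[_N]

    list obs_no which NG is_new running wanted, sepby(obs_no)
    ```

    Note that an observation whose NG variables are all missing has every long-layout row dropped, so it would not come back after reshape wide.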



    • #3
      Try something like
      Code:
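      generate long row_id = _n
      
      frame copy default Distincts
      frame Distincts {
          quietly reshape long NG, i(row_id) j(col)
          quietly drop if mi(NG)
          * one row per distinct value within each observation
          contract row_id NG
          * then count those rows, giving one row per observation
          contract row_id, freq(count)
      }
      
      frlink 1:1 row_id, frame(Distincts)
      frget count, from(Distincts)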
      Untested.



      • #4
        Thanks so much, Andrew! This worked beautifully.

        Originally posted by Andrew Musau
        Please review FAQ Advice #12 for details on how to use dataex to present data examples.



        • #5
          Calculating the number of distinct (*) values in each observation was discussed in 2009

          Code:
          SJ-9-1  pr0046  . . . . . . . . . . . . . . . . . . .  Speaking Stata: Rowwise
                  (help rowsort, rowranks if installed) . . . . . . . . . . .  N. J. Cox
                  Q1/09   SJ 9(1):137--157
                  shows how to exploit functions, egen functions, and Mata
                  for working rowwise; rowsort and rowranks are introduced

          and dedicated egen functions were added afterwards to egenmore on SSC.


          Code:
          clear
          input float NG6 NG7 NG8 NG10
          12 12 . .
          12 . 13 14
          . . . 11
          1 2 3 4
          end
          
          * rownvals() comes from egenmore: ssc install egenmore
          egen wanted = rownvals(NG*)
          
          list
          
               +---------------------------------+
               | NG6   NG7   NG8   NG10   wanted |
               |---------------------------------|
            1. |  12    12     .      .        1 |
            2. |  12     .    13     14        3 |
            3. |   .     .     .     11        1 |
            4. |   1     2     3      4        4 |
               +---------------------------------+
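
          If installing egenmore is not an option, the same count can be sketched with official Stata commands only. This is an illustrative alternative rather than the method from the column cited above; the local macro seen and the temporary variable new are names made up for this sketch.

          ```stata
          clear
          input float NG6 NG7 NG8 NG10
          12 12 . .
          12 . 13 14
          . . . 11
          1 2 3 4
          end

          gen byte wanted = 0
          local seen
          foreach v of varlist NG6 NG7 NG8 NG10 {
              * a value counts as new if it is nonmissing ...
              tempvar new
              gen byte `new' = !missing(`v')
              * ... and differs from every variable already processed
              foreach w of local seen {
                  quietly replace `new' = 0 if `v' == `w'
              }
              quietly replace wanted = wanted + `new'
              drop `new'
              local seen `seen' `v'
          }

          list
          ```

          On the example data this gives 1, 3, 1, and 4, matching the results above. An observation with all values missing would get 0.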

          (*) I (we) recommend the term distinct, not unique, as discussed in

          SJ-8-4 dm0042 . . . . . . . . . . . . Speaking Stata: Distinct observations
          (help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
          Q4/08 SJ 8(4):557--568
          shows how to answer questions about distinct observations
          from first principles; provides a convenience command

          See especially Section 2.

          https://journals.sagepub.com/doi/pdf...867X0800800408

          Dictionaries typically still explain the primary meaning of unique as occurring once only.

          It's true that in computing circles unique often really means distinct, but in that case distinct is still the better word.

          The waters were perhaps muddied by the early Unix utility uniq, which reduces a list of possibly repeated values so that each occurs once and once only.
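
          The difference between the two terms can be made concrete with a small sketch on toy data. The variable names x, first, and once are illustrative only.

          ```stata
          clear
          input x
          1
          2
          2
          3
          end

          * tag one observation per distinct value of x
          bysort x: gen byte first = _n == 1

          * tag values that occur once only (unique in the dictionary sense)
          bysort x: gen byte once = _N == 1

          count if first
          count if once
          ```

          Here there are three distinct values (1, 2, 3) but only two values that occur once only (1 and 3).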
