Developing a Count Measure Using Five Measures with Multiple Values

christy erving

Join Date: Oct 2024

Posts: 4
#1

24 Oct 2024, 17:38

I want to develop a count variable that assesses the number of unique values across five variables: NG6, NG7, NG8, NG9, and NG10. I used tablist to get a sense for how many unique combinations I have (a total of 365!). An excerpt from that output is attached.

For the first line, I want the new count variable to take the value 1, because the same value is reported twice (12 and 12) and I don't care about missingness. The second line would also be 1, because only the value 12 is reported. The third line should be 1 as well, because the same number (12) is reported across NG6, NG7, and NG8. For the fourth line, which has NG6 = 1 and NG7 = 12, I want the new variable to be 2, because two unique values are reported. For the very last line shown here, the values are 3, 12, and 8, so I want the new variable to be 3, indicating that 3 unique values are reported for NG6, NG7, and NG8. Help!

Andrew Musau

Join Date: Oct 2014
Posts: 10190

24 Oct 2024, 17:59

Please review FAQ Advice #12 for details on how to use dataex to present data examples.

Code:

clear
input float NG6 NG7 NG8 NG10
12 12 . .
12 . 13 14
. . . 11
1 2 3 4
end

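* Tag each observation so the wide layout can be rebuilt later
gen long obs_no=_n
reshape long NG, i(obs_no) j(which)
* Missing values do not count as values; note that an observation with
* all NG variables missing drops out of the data here
drop if missing(NG)
* Within each observation, sorted by NG, count the changes in value
bys obs_no (NG): gen wanted=sum(NG!=NG[_n-1])
by obs_no: replace wanted= wanted[_N]
reshape wide NG, i(obs_no) j(which)
sort obs_no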

Res.:

Code:

. l

     +------------------------------------------+
     | obs_no   NG6   NG7   NG8   NG10   wanted |
     |------------------------------------------|
  1. |      1    12    12     .      .        1 |
  2. |      2    12     .    13     14        3 |
  3. |      3     .     .     .     11        1 |
  4. |      4     1     2     3      4        4 |
     +------------------------------------------+
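
For readers who would rather not reshape, the same count can be sketched with a loop in official Stata alone. This is an illustrative alternative, not code from the post above; the names n_distinct, is_new, and seen are made up for the example. A value is counted only if it is nonmissing and differs from every variable already visited, so an observation with all values missing gets 0 instead of being lost in the drop and reshape steps.

```stata
clear
input float NG6 NG7 NG8 NG10
12 12 . .
12 . 13 14
. . . 11
1 2 3 4
end

* n_distinct, is_new and seen are hypothetical names for this sketch
gen byte n_distinct = 0
local seen
foreach v of varlist NG6 NG7 NG8 NG10 {
    * a candidate value must be nonmissing ...
    gen byte is_new = !missing(`v')
    * ... and must not repeat any variable visited earlier
    foreach u of local seen {
        replace is_new = 0 if `v' == `u'
    }
    replace n_distinct = n_distinct + is_new
    drop is_new
    local seen `seen' `v'
}

list
```

On the four example observations this should reproduce the wanted column shown above (1, 3, 1, 4). The number of comparisons grows with the square of the number of variables, which is negligible for five variables but could matter for very long variable lists.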


Joseph Coveney

Join Date: Apr 2014
Posts: 4410

24 Oct 2024, 18:02

Try something like

Code:

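generate long row_id = _n

frame copy default Distincts
frame Distincts {
    quietly reshape long NG, i(row_id) j(col)
    quietly drop if mi(NG)
    * one row per distinct value within each original observation
    contract row_id NG
    * collapse again: count = number of distinct values per observation
    contract row_id, freq(count)
}

* observations with all NG variables missing stay unmatched (count missing)
frlink 1:1 row_id, frame(Distincts)
frget count, from(Distincts)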

Untested.


christy erving

Join Date: Oct 2024
Posts: 4

24 Oct 2024, 21:04

Thanks so much, Andrew! This worked beautifully.


Nick Cox

Join Date: Mar 2014

Posts: 35696
#5

25 Oct 2024, 02:45

Calculating the number of distinct (*) values in each observation was discussed in 2009

Code:

SJ-9-1 pr0046 . . . . . . . . . . . . . . . . . . . Speaking Stata: Rowwise
(help rowsort, rowranks if installed) . . . . . . . . . . . N. J. Cox
Q1/09 SJ 9(1):137--157
shows how to exploit functions, egen functions, and Mata
for working rowwise; rowsort and rowranks are introduced

and dedicated egen functions were added afterwards to egenmore on SSC.

Code:

clear
input float NG6 NG7 NG8 NG10
12 12 . .
12 . 13 14
. . . 11
1 2 3 4
end

egen wanted = rownvals(NG*)

list

     +---------------------------------+
     | NG6   NG7   NG8   NG10   wanted |
     |---------------------------------|
  1. |  12    12     .      .        1 |
  2. |  12     .    13     14        3 |
  3. |   .     .     .     11        1 |
  4. |   1     2     3      4        4 |
     +---------------------------------+
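
Because egenmore is a community-contributed package, rownvals() is only available once the package has been installed from SSC. A minimal sketch of applying it to all five variables from the original question follows. The data rows are an assumption: they place the values described in post #1 into NG6 to NG8 for illustration (the attached excerpt is not reproduced here), and n_distinct is a made-up variable name.

```stata
* install once; egenmore is community-contributed and hosted on SSC
ssc install egenmore

clear
* illustrative rows only, loosely following the cases described in post #1
input float NG6 NG7 NG8 NG9 NG10
12 12  .  .  .
12  .  .  .  .
12 12 12  .  .
 1 12  .  .  .
 3 12  8  .  .
end

* name the variables explicitly so that NG* cannot pick up unrelated ones
egen n_distinct = rownvals(NG6 NG7 NG8 NG9 NG10)

list
```

The counts requested in post #1 for these five cases are 1, 1, 1, 2, and 3, which gives something to check the new variable against.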

(*) I (we) recommend the term distinct, not unique, as discussed in

SJ-8-4 dm0042 . . . . . . . . . . . . Speaking Stata: Distinct observations
(help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
Q4/08 SJ 8(4):557--568
shows how to answer questions about distinct observations
from first principles; provides a convenience command

See especially Section 2.

https://journals.sagepub.com/doi/pdf...867X0800800408

Dictionaries typically still explain the primary meaning of unique as occurring once only.

It's true that in computing circles unique often really means distinct, but in that case distinct is still the better word.

The waters were perhaps muddied by the early Unix utility uniq, which reduces a list of possibly repeated values so that each occurs once and once only.