Getting the Mode

Cyrus Muriithi

Join Date: Jan 2016
Posts: 50

14 Aug 2018, 10:20

I have a dataset as shown below:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(g2calcA2 g2calc1 g2B4 g2B5 g2B6 g2B7 g2B8 g2calc2_f)
. 1 1 1 1 1 1 1
. 1 1 1 1 1 2 1
. . 1 1 1 . . .
. 1 1 1 1 1 1 1
. 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
. 1 1 1 1 1 1 1
. 1 1 1 1 1 2 1
. 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
. 1 1 1 1 1 1 1
. 2 . . . . 1 .
. 1 1 1 1 1 1 1
. 1 1 1 1 1 2 1
1 1 1 1 1 1 1 1
. 1 1 1 1 1 1 1
. 1 2 . . . 1 1
. 1 1 1 1 1 1 1
. 1 1 1 1 1 1 1
. 1 2 . . . 1 1
end
label values g2B4 yesno
label values g2B5 yesno
label values g2B6 yesno
label values g2B8 yesno
label def yesno 1 "Yes", modify
label def yesno 2 "No", modify
label values g2B7 HIV_status
label def HIV_status 1 "HIV negative", modify

I would like to get the row-wise mode of the following variables: g2B4 g2B5 g2B6.


Nick Cox

Join Date: Mar 2014

Posts: 35438
#2

14 Aug 2018, 11:02

You need to be explicit on whether missings should be ignored.

I would reshape, run egen and reshape back. Note that for your example data, a quick glance suggests that the row median would give the same result.
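A minimal sketch of the reshape route, assuming missing values are to be ignored. The names id, which and rowmode are made up for this illustration and are assumed not to exist in the data already. Rather than reshaping the whole dataset back to wide, this variant reshapes a copy of the three variables and merges the result back in. It relies on temporary names, so it should be run as one block from a do-file.

```stata
* Sketch only: id, which and rowmode are hypothetical names
gen long id = _n                    // row identifier needed by reshape and merge

preserve
keep id g2B4 g2B5 g2B6
reshape long g2B, i(id) j(which)    // which takes the values 4, 5, 6

* mode() ignores missing values by default; minmode picks the
* smallest value when two or more values tie for the mode
egen rowmode = mode(g2B), by(id) minmode

bysort id: keep if _n == 1          // one observation per original row
keep id rowmode
tempfile modes
save `modes'
restore

merge 1:1 id using `modes', nogenerate
```

Without minmode (or maxmode), egen's mode() returns missing when there is a tie, which may be preferable if ties should be flagged rather than resolved.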
Cyrus Muriithi

Join Date: Jan 2016

Posts: 50
#3

14 Aug 2018, 11:16

Yes, missing values should be ignored.
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#4

15 Aug 2018, 12:34

While Nick's suggestion is probably the right way to do this, you can always create a loop that counts the number of 1's, 2's, 3's, etc. in the variables and then pick the max. If you really only want to do it for three variables, you can do it with a few generate and replace statements:
generate mode = 1 if (v1==1 & v2==1) | (v1==1 & v3==1) | (v2==1 & v3==1)
replace mode = 2 if (v1==2 & v2==2) | (v1==2 & v3==2) | (v2==2 & v3==2)

If you can have conditions where you have two values appearing equally frequently or missing data, you'd need to allow for that as well.
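The counting loop mentioned above could look something like the following sketch. It assumes the only non-missing values are 1 and 2, as in the example data, and the names rowmode2, best, n1 and n2 are hypothetical. Missing values are simply not counted, and a tie goes to the smaller value because the comparison is strict.

```stata
* Sketch only: rowmode2, best, n1, n2 are hypothetical names
gen byte rowmode2 = .
gen byte best = 0                   // highest count seen so far in each row

forvalues k = 1/2 {
    * number of the three variables equal to `k' in each row
    egen byte n`k' = anycount(g2B4 g2B5 g2B6), values(`k')
    replace rowmode2 = `k' if n`k' > best
    replace best = n`k' if n`k' > best
}

drop best n1 n2
```

Rows in which all three variables are missing keep rowmode2 missing, since no count exceeds zero. To cover more values, widen the range in forvalues and the drop list accordingly.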
Nick Cox

Join Date: Mar 2014

Posts: 35438
#5

15 Aug 2018, 12:59

Looking at this again a little more carefully, I see that your variables seem to have only values 1 and 2 (apart from missing).

That being so, the row median is exactly what you want, with the proviso -- really a bonus -- that a row median of 1.5 tells you that 1s and 2s are equally abundant, so that either or neither is the mode, according to taste.

Otherwise put, if there is a majority of 1s the row median can't fail to be 1 and similarly with a majority of 2s. So, in each case the median is the mode, or to paraphrase McLuhan, the median is the message (*).
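As a sketch of the row median approach, assuming the three variables take only the values 1 and 2 (the names rowmed and rowmode_med are made up here):

```stata
* rowmedian() ignores missing values within each row
egen rowmed = rowmedian(g2B4 g2B5 g2B6)

* keep the median as the mode only when it is a real value;
* 1.5 (a tie between 1s and 2s) is left as missing
gen byte rowmode_med = rowmed if inlist(rowmed, 1, 2)
```

With three variables a tie can only arise when exactly two of them are non-missing and they differ. How to treat that case is a matter of taste, so this sketch leaves it missing rather than choosing.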

(*) An old joke. Not original.

All that said, if this were my problem, I would want to use all the (non-missing) information and summarize in terms of the fraction saying Yes (1), which is just 2 MINUS the row mean. (Check: if all the answers are 1, the row mean is 1; if all the answers are 2, the row mean is 2.)
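A short sketch of that summary, with rmean and fracyes as hypothetical names. It again assumes the codes are 1 for Yes and 2 for No:

```stata
* rowmean() uses only the non-missing values in each row
egen rmean = rowmean(g2B4 g2B5 g2B6)

* with codes 1 = Yes and 2 = No, the fraction of Yes is 2 minus the mean
gen fracyes = 2 - rmean
```

For example, a row with values 1, 1, 2 has a row mean of 4/3, so the fraction saying Yes is 2/3. A row with no non-missing values has a missing mean and therefore a missing fraction.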

Footnote: I strongly recommend indicators that have values 0 and 1 (not e.g. 1 and 2; where did this habit spring from? My prejudice is SPSS). Not only are these in the right form for modelling, whether as responses or predictors, their means have direct interpretation and meaning.
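One way to move to 0/1 indicators is sketched below. The _yes suffix and the name fracyes01 are assumptions made for illustration, and missing values are kept missing.

```stata
* Sketch only: creates 0/1 copies, leaving the originals untouched
foreach v of varlist g2B4 g2B5 g2B6 {
    gen byte `v'_yes = (`v' == 1) if !missing(`v')
}

* the row mean of 0/1 indicators is directly the fraction of Yes
egen fracyes01 = rowmean(g2B4_yes g2B5_yes g2B6_yes)
```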

Last edited by Nick Cox; 15 Aug 2018, 13:05.
Cyrus Muriithi

Join Date: Jan 2016

Posts: 50
#6

16 Aug 2018, 00:34

Thanks Nick, I used rowmedian, since reshaping a large dataset with many variables could have taken some time.