  • Getting the Mode

    I have a dataset as shown below:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(g2calcA2 g2calc1 g2B4 g2B5 g2B6 g2B7 g2B8 g2calc2_f)
    . 1 1 1 1 1 1 1
    . 1 1 1 1 1 2 1
    . . 1 1 1 . . .
    . 1 1 1 1 1 1 1
    . 1 1 1 1 1 1 1
    1 1 1 1 1 1 1 1
    . 1 1 1 1 1 1 1
    . 1 1 1 1 1 2 1
    . 1 1 1 1 1 1 1
    1 1 1 1 1 1 1 1
    . 1 1 1 1 1 1 1
    . 2 . . . . 1 .
    . 1 1 1 1 1 1 1
    . 1 1 1 1 1 2 1
    1 1 1 1 1 1 1 1
    . 1 1 1 1 1 1 1
    . 1 2 . . . 1 1
    . 1 1 1 1 1 1 1
    . 1 1 1 1 1 1 1
    . 1 2 . . . 1 1
    end
    label values g2B4 yesno
    label values g2B5 yesno
    label values g2B6 yesno
    label values g2B8 yesno
    label def yesno 1 "Yes", modify
    label def yesno 2 "No", modify
    label values g2B7 HIV_status
    label def HIV_status 1 "HIV negative", modify
    I would like to get the row-wise mode of the variables g2B4, g2B5, and g2B6.


  • #2
    You need to be explicit about whether missing values should be ignored.

    I would reshape long, run egen, and reshape back. Note that for your example data, a quick glance suggests that the row median would give the same result.
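    A minimal sketch of the reshape route, run on a few hypothetical toy rows rather than the posted data (it starts with clear, so use a fresh session). The names rowid, item, and mode_B are illustrative. It relies on egen's mode() function, which ignores missing values and, by default, returns missing when there is a tie. Instead of reshaping all the way back, this version merges the modes onto the original rows, which leaves the other variables untouched.

    ```stata
    * Hypothetical toy rows using three of the variable names from the question
    clear
    input byte(g2B4 g2B5 g2B6)
    1 1 1
    1 1 .
    2 . .
    . . .
    1 2 .
    2 2 1
    end

    * rowid is a hypothetical identifier that -reshape- needs
    generate long rowid = _n

    preserve
    keep rowid g2B4 g2B5 g2B6
    * one observation per row-and-variable; item takes the values 4, 5, 6
    reshape long g2B, i(rowid) j(item)
    * mode() ignores missing values; ties and all-missing rows give missing
    bysort rowid: egen mode_B = mode(g2B)
    bysort rowid: keep if _n == 1
    keep rowid mode_B
    tempfile modes
    save `modes'
    restore

    * attach the row modes to the original wide data
    merge 1:1 rowid using `modes', nogenerate
    list, noobs
    ```

    Adding the missing option to mode() would count missing as a value; the default matches "missing should be ignored".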


    • #3
      Yes, missing values should be ignored.


      • #4
        While Nick's suggestion is probably the right way to do this, you can always write a loop that counts the number of 1s, 2s, 3s, etc. in the variables and then picks the value with the largest count. If you really only want to do this for three variables, you can do it with a few generate and replace statements:
        generate mode = 1 if (g2B4==1 & g2B5==1) | (g2B4==1 & g2B6==1) | (g2B5==1 & g2B6==1)
        replace mode = 2 if (g2B4==2 & g2B5==2) | (g2B4==2 & g2B6==2) | (g2B5==2 & g2B6==2)

        If two values can occur equally often in a row, or if data can be missing, you would need to allow for that as well.
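        A minimal sketch of the counting loop described above, assuming the only non-missing values are 1 and 2 (extend the list in foreach val for more codes). It uses hypothetical toy rows and starts with clear; the names n1, n2, and mode are illustrative. Missing values never equal 1 or 2, so they are ignored automatically, and ties or all-missing rows are left as missing.

        ```stata
        * Hypothetical toy rows
        clear
        input byte(g2B4 g2B5 g2B6)
        1 1 1
        1 1 .
        2 . .
        . . .
        1 2 .
        2 2 1
        end

        * n1 and n2 count how often 1 and 2 occur in each row
        foreach val in 1 2 {
            generate byte n`val' = 0
            foreach v of varlist g2B4 g2B5 g2B6 {
                * a comparison is 1 if true, 0 if false (including missing)
                quietly replace n`val' = n`val' + (`v' == `val')
            }
        }

        * the mode is the value with the strictly larger count;
        * ties and rows with no answers stay missing
        generate byte mode = .
        replace mode = 1 if n1 > n2
        replace mode = 2 if n2 > n1
        list, noobs
        ```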


        • #5
          Looking at this again a little more carefully, I see that your variables seem to have only the values 1 and 2 (apart from missing).

          That being so, the row median is exactly what you want, with the proviso -- really a bonus -- that a row median of 1.5 tells you that 1s and 2s are equally abundant, so that either or neither is the mode, according to taste.

          Otherwise put, if there is a majority of 1s, the row median can't fail to be 1, and similarly with a majority of 2s. So in each case the median is the mode, or, to paraphrase McLuhan, the median is the message(*).

          (*) An old joke. Not original.
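          A minimal sketch of the row-median approach, using hypothetical toy rows (it starts with clear). egen's rowmedian() ignores missing values, so with only 1s and 2s present a median of 1 or 2 is the majority value and 1.5 flags a tie. The names med_B and mode_B are illustrative; setting ties to missing in mode_B is just one of the "according to taste" choices.

          ```stata
          * Hypothetical toy rows
          clear
          input byte(g2B4 g2B5 g2B6)
          1 1 1
          1 1 .
          2 . .
          . . .
          1 2 .
          2 2 1
          end

          * median of the non-missing values in each row
          egen med_B = rowmedian(g2B4 g2B5 g2B6)
          * one choice: treat a tie (median 1.5) as "no mode"
          generate mode_B = cond(med_B == 1.5, ., med_B)
          list, noobs
          ```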

          All that said, if this were my problem, I would want to use all the (non-missing) information and summarize it as the fraction saying Yes (1), which is just 2 MINUS the row mean. (Check: if all the answers are 1, the row mean is 1, so the fraction is 2 - 1 = 1; if all the answers are 2, the row mean is 2, so the fraction is 2 - 2 = 0.)
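          The identity behind this: if a fraction p of the non-missing answers are 1 and the rest are 2, the row mean is p(1) + (1 - p)(2) = 2 - p, so p = 2 - mean. A minimal sketch on hypothetical toy rows (it starts with clear); n_ans, mean_B, and frac_yes are illustrative names, and n_ans records how many answers each fraction is based on.

          ```stata
          * Hypothetical toy rows
          clear
          input byte(g2B4 g2B5 g2B6)
          1 1 1
          1 1 .
          2 . .
          . . .
          1 2 .
          2 2 1
          end

          * number of non-missing answers in each row
          egen n_ans = rownonmiss(g2B4 g2B5 g2B6)
          * mean of the non-missing 1/2 answers
          egen mean_B = rowmean(g2B4 g2B5 g2B6)
          * with 1 = Yes and 2 = No, the fraction saying Yes is 2 minus the mean
          generate frac_yes = 2 - mean_B
          list, noobs
          ```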

          Footnote: I strongly recommend indicators that take the values 0 and 1 (not, e.g., 1 and 2; where did this habit spring from? My prejudice says SPSS). Not only are these in the right form for modelling, whether as responses or predictors, but their means also have a direct interpretation: the proportion of 1s.
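          A minimal sketch of the recoding the footnote recommends, on hypothetical toy rows (it starts with clear). For 1 = Yes and 2 = No, the expression 2 - x maps 1 to 1 and 2 to 0 and leaves missing as missing. The _yes suffix and frac_yes are illustrative names; once the indicators are 0/1, their row mean is the fraction saying Yes with no further arithmetic.

          ```stata
          * Hypothetical toy rows
          clear
          input byte(g2B4 g2B5 g2B6)
          1 1 1
          1 1 .
          2 . .
          . . .
          1 2 .
          2 2 1
          end

          * 0/1 indicators: Yes (1) stays 1, No (2) becomes 0, missing stays missing
          foreach v of varlist g2B4 g2B5 g2B6 {
              generate byte `v'_yes = 2 - `v'
          }
          * the row mean of 0/1 indicators is the fraction saying Yes
          egen frac_yes = rowmean(g2B4_yes g2B5_yes g2B6_yes)
          list, noobs
          ```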
          Last edited by Nick Cox; 15 Aug 2018, 13:05.


          • #6
            Thanks, Nick. I used rowmedian, considering that reshaping a huge amount of data and variables could have taken a long time.
