  • How to generate a variable that takes the maximum value of two variables and names it accordingly

    Hi, I have two variables, A and B. I want to generate a variable C that, for each row, identifies the larger of A and B and holds the name of that variable rather than its value.
    This is shown in the table below.
    A B C
    2 5 B
    1 0 A
    What command should I use?
    Last edited by MD Kamruzzaman; 12 Sep 2018, 21:52.

  • #2
    Code:
    gen C = cond(A > B, "A", "B")


    • #3
      Romalpa's command in #2 will work fine if:
      • you have no ties
      • you have no missing values
      If you have either of those, then the following might be helpful.

      In words, the -cond()- function evaluates the expression A>B. If that expression is true for a given observation, it will fill C with "A". If it is false, it will go to another -cond()- function and evaluate the expression B>A. If this is true, then it will fill C with "B". If this is false, it means that they are equal and will fill C with "neither".

      Stata treats numeric missing values as larger than any nonmissing number, so an observation with a missing value in A or B would be evaluated as if that value were the largest. You probably don't want that, so the -replace- command sets C to missing (the empty string) if either A or B is missing.


      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(A B)
      2 5
      1 0
      3 3
      . 1
      4 .
      . .
      end
      
      gen str C=cond(A>B, "A", cond(B>A, "B", "neither"))
      replace C="" if mi(A) | mi(B)
      list
      Stata/MP 14.1 (64-bit x86-64)
      Revision 19 May 2016
      Win 8.1


      • #4
        Hi, thanks a lot, Wilson and Akzo. I was trying to keep it simple and so gave only two variables. But I actually have a series of variables (A, B, A1, A2, A3....), and I want to generate a variable C that finds the largest row value among A, B, A1, A2, A3.... and holds the name of the relevant variable instead of the value. How do I do it in this case?


        • #5
          See e.g. https://www.statalist.org/forums/for...les-max-in-row

          Finding the maximum is easy as it is returned by the rowmax() function of egen. Finding which variable contains the maximum is trickier as a full approach requires checking for ties.
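
The approach described above could be sketched roughly as follows. This is only an illustration under assumptions: the variable list (A B A1 A2), the example data, the helper variable name maxval, and the choice to list tied variable names separated by spaces are all made up for the example and do not come from the thread or the linked post.

```stata
clear
input float(A B A1 A2)
2 5 1 3
1 0 0 0
3 3 1 2
. 1 4 .
. . . .
end

* Row-wise maximum. rowmax() ignores missing values and returns missing
* only when every variable in the list is missing.
egen double maxval = rowmax(A B A1 A2)

* Collect the name of every variable that attains the maximum,
* so that ties show up as several names (e.g. "A B").
gen C = ""
foreach v of varlist A B A1 A2 {
    replace C = trim(C + " " + "`v'") if `v' == maxval & !missing(maxval)
}

list
```

Rows where all variables are missing are left with C empty. If ties should instead resolve to a single name, the loop condition would need an extra check such as C == "", which keeps only the first variable in the list that reaches the maximum.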
