  • Finding the maximum value in a list of variables (max in row)

    Dear Statalist,

    I have data on the JEL codes (KW_A, KW_B, ...) declared by each individual (ref). I would like to create a new variable named max_kw that holds the name of the variable with the maximum value.
    More specifically, I would like
    max_kw = "KW_A" if, for an individual ref, KW_A is the max of (KW_A KW_B KW_C KW_D KW_E KW_F KW_G KW_H KW_I KW_J KW_K KW_L KW_M KW_N KW_O KW_P KW_Q KW_R KW_T KW_Y KW_Z)

    I have tried to implement code that Nick Cox posted on the forum last year, but I get no results.

    My code is the following:


    unab xvars: KW_A KW_B KW_C KW_D KW_E KW_F KW_G KW_H KW_I KW_J KW_K KW_L KW_M KW_N KW_O KW_P KW_Q KW_R KW_T KW_Y KW_Z
    gen max_kw = ""
    gen max = 0
    quietly foreach x of local xvars {
    replace max_kw = "`x'" if `x' > `max'
    replace max = `x' if `x' > `max'
    }

    I get an "invalid syntax" error message.

    Could someone help me fix the code?
    Thanks in advance

    Francisco

    NB: I do not want to do this in long format.

  • #2
    Your question is not fully specified. What will you do if there are two or more variables among KW_A through KW_Z that tie for the maximum value? How will you break the tie?

    And why are you opposed to going into long layout? Not only is this easier to do that way, so is nearly everything else in Stata. Would you be willing to go to long layout to solve this and then go back to wide?
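
    On the tie question above, here is a minimal sketch (not posted by anyone in this thread) that counts how many variables attain the row maximum, so ties can be inspected before a rule is chosen. It assumes the wildcard KW_* matches only the numeric JEL variables; rowmax_kw and n_at_max are hypothetical names.

    ```stata
    * Row maximum across the JEL variables; egen's rowmax() ignores missing values
    egen rowmax_kw = rowmax(KW_*)

    * Count how many variables equal that maximum in each row
    gen n_at_max = 0
    foreach x of varlist KW_* {
        replace n_at_max = n_at_max + 1 if `x' == rowmax_kw & !missing(rowmax_kw)
    }

    * Rows with n_at_max > 1 contain ties
    count if n_at_max > 1
    ```

    In a loop like the one in the original post, a strict > keeps the first variable in list order among tied maxima, whereas >= would keep the last.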



    • #3
      Welcome to Statalist!

      Please read the Statalist FAQ linked to at the top of this page, note the preference for the use of real names on this forum, and take the time to click "Contact Us" below and request your registration name be changed to include both your personal and family names.

      With that said, addressing the specific question of your syntax error: the following code
      Code:
      replace max_kw = "`x'" if `x' > `max'
      replace max = `x' if `x' > `max'
      will cause the error, because `max' is the syntax for the value of a local macro named max, and what you have is a variable named max. I expect the following will solve your problem.
      Code:
      replace max_kw = "`x'" if `x' > max
      replace max = `x' if `x' > max
      Last edited by William Lisowski; 16 Dec 2015, 08:54.
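
      For reference, a sketch of the complete loop with that correction applied. The variable list is copied from the original post. The !missing() guard is an addition not discussed in the thread: Stata treats missing values as larger than any number, so without it a missing value would count as the maximum.

      ```stata
      * Expand and check the variable list (copied from the original post)
      unab xvars: KW_A KW_B KW_C KW_D KW_E KW_F KW_G KW_H KW_I KW_J KW_K KW_L KW_M KW_N KW_O KW_P KW_Q KW_R KW_T KW_Y KW_Z
      gen max_kw = ""
      gen max = 0
      quietly foreach x of local xvars {
          * max is a variable, so it is referenced without macro quotes
          replace max_kw = "`x'" if `x' > max & !missing(`x')
          replace max = `x' if `x' > max & !missing(`x')
      }
      ```

      Because max starts at 0, rows in which no variable exceeds 0 keep max_kw empty.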



      • #4
        William has helpfully explained that a local macro is being invoked that is never defined and would be inappropriate anyway.

        Sorry, but I can't remember all the details of all the posts I made in 2014. Otherwise put, a precise URL would be helpful here.

        A search points to http://www.statalist.org/forums/foru...value-in-a-row as the likely source, where indeed the macro reference

        Code:
        `max'
        appears. It always should be

        Code:
        max
        My mistake. Sorry about that.






        • #5
          Thank you Nick and William. The corrected code works very well.
          William, I have requested a change to my registration name.
          Clyde, as a non-computer scientist, I find the code proposed by Nick more intuitive than the one proposed by Maarten in long format at the following source: http://www.statalist.org/forums/foru...value-in-a-row. I have no objection to going to long layout. I just need to understand my code!

          Francisco Serranito



          • #6
            I don't think any of us (Clyde, Maarten, myself) would claim to be a computer scientist.

            In order to advise on whether long structure (personally I find the word "format" overloaded, but "structure" is overloaded too) or wide is better, I would need to know what JEL codes are, as I don't think I've met them.
            Last edited by Nick Cox; 17 Dec 2015, 09:53.



            • #7
              personally I find the word "format" overloaded, but structure is overloaded too
              Yes, they are both overloaded. That's why I've taken to referring to long and wide layouts recently. I don't know if it will catch on, but it works for me.

              ...the code proposed by Nick seems to me more intuitive than the one proposed by Maarten in long format...
              Transparent code that can be understood when revisited months or years later is very important, and I would endorse going that route in nearly every circumstance. And I agree that the use of -by- with data in long layout is not, initially, so intuitive. But if you are going to be using Stata regularly and for the medium or long term, it will be well worth the trouble of getting accustomed to it. Once you are used to that way of thinking about data management, the long layout approach nearly always leads to more intuitive and more readable code than working with wide data.

              And, as Nick wisely points out, which is better also depends on the particular data and the particular problem. So I'm glad that you got a good solution from Nick and William.
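
              To make the -by- idea concrete, here is a rough sketch of a long-layout version. It is not Maarten's code from the linked thread. It assumes ref uniquely identifies individuals, that the KW_ variables are numeric with no missing values, and that no other variable names begin with KW_. The name code is hypothetical.

              ```stata
              * Go long: one row per individual and JEL code; code holds the suffix (A, B, ...)
              reshape long KW_, i(ref) j(code) string

              * Within each ref, sort by value and then by code; the last row holds the maximum
              bysort ref (KW_ code): gen max_kw = "KW_" + code[_N]

              * Return to wide layout; max_kw is constant within ref, so it is carried along
              reshape wide KW_, i(ref) j(code) string
              ```

              With this sort order, ties go to the code that comes last alphabetically, and a row of all zeros still gets a max_kw value, unlike in the wide loop.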



              • #8
                Layout is a great term. The only small problem remaining is persuading the rest of the world to adopt it.
