  • #16
    I often use the long-neglected -matrix- facility of regular (non-Mata) Stata to recode small integers:

    Code:
    matrix input agerangemin =(20,35,45,55,65,75)
    gen  agemin=agerangemin[1,agerange]
    which runs at about 5 million observations per second, independent of the number of possible values. Once a recode involves states of the US (51 values) or countries in the world (>200 values), that can be important. -Recode- appears to be dominated by -if-then-else- for speed, generality, and simplicity of learning and remembering.
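
    A minimal self-contained sketch of the same lookup, assuming agerange holds integer codes 1 to 6 (the three observations below are made up purely for illustration):

    Code:
    clear
    input byte agerange
    1
    3
    6
    end
    * 1 x 6 row vector: column k holds the minimum age for code k
    matrix input agerangemin = (20,35,45,55,65,75)
    * the code in agerange is used directly as the column subscript
    generate agemin = agerangemin[1, agerange]
    list
    Here each observation simply picks up the matrix element in the column given by its code, so no chain of comparisons is evaluated.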


    • #17
      I like

      Code:
      matrix input range2agemin = (20,35,45,65,75)
      gen agerangemin=range2agemin[1,AgeRange]
      which uses the long-neglected matrix facility of regular (non-Mata) Stata. This runs at 5 million observations per second, and that speed is independent of the number of codes. Of course, it is limited to small integers for the code, but I have plenty of situations like that in my tax calculator.


      • #18
        For more on the approach commended in #16 and #17 by Daniel Feenberg, see

        https://www.stata-journal.com/articl...article=pr0054

        as referred to by daniel klein in #7.


        • #19
          Thanks for a very helpful discussion! I did not know about the recode command. In addition to its readability, one of recode's virtues is that it can handle recodings where more than one value of x can map onto the same value of y. Here's an example from the code I'm maintaining right now.

          Code:
          generate YearRange = .
          replace YearRange = 1 if year>=1971 & year<=1974
          replace YearRange = 2 if year>=1976 & year<=1980
          replace YearRange = 3 if year>=1988 & year<=1994
          replace YearRange = 4 if year>=1999 & year<=2002
          replace YearRange = 5 if year>=2003 & year<=2006
          replace YearRange = 6 if year>=2007 & year<=2010
          replace YearRange = 7 if year>=2013 & year<=2016
          drop if YearRange == .
          Using recode, this becomes
          Code:
          recode year ///
           ( 1971 / 1974  = 1) ///
           ( 1976 / 1980  = 2) ///
           ( 1988 / 1994  = 3) ///
           ( 1999 / 2002  = 4) ///
           ( 2003 / 2006  = 5) ///
           ( 2007 / 2010  = 6) ///
           ( 2013 / 2016  = 7) ///
           (else = .), gen(YearRange)
          drop if missing(YearRange)
          Here recode has less redundancy and about half as many characters as the gen/replace approach. Like Nick, I had to consult help recode, but maybe I wouldn't after a few uses.

          I believe the matrix and cond() solutions cannot do this as neatly.
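
          For comparison, here is a sketch of what a nested cond() version might look like, using inrange() for each interval. The variable name YearRange2 is made up here just to keep it separate from the recode result.

          Code:
          * each cond() falls through to the next interval; the last one returns missing
          generate YearRange2 = cond(inrange(year, 1971, 1974), 1, ///
                                cond(inrange(year, 1976, 1980), 2, ///
                                cond(inrange(year, 1988, 1994), 3, ///
                                cond(inrange(year, 1999, 2002), 4, ///
                                cond(inrange(year, 2003, 2006), 5, ///
                                cond(inrange(year, 2007, 2010), 6, ///
                                cond(inrange(year, 2013, 2016), 7, .)))))))
          It gives the same mapping, but the trailing parentheses have to be counted by hand, which is part of why recode reads more easily here.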

          Thanks again.
          Last edited by paulvonhippel; 03 Jan 2020, 16:05.


          • #20
            An additional virtue of the recode statement is that it can include value labels -- for example,
            Code:
            recode year ///
             ( 1971 / 1974  = 1 "1971-1974") ///
             ( 1976 / 1980  = 2 "1976-1980") ///
             ( 1988 / 1994  = 3 "1988-1994") ///
             ( 1999 / 2002  = 4 "1999-2002") ///
             ( 2003 / 2006  = 5 "2003-2006") ///
             ( 2007 / 2010  = 6 "2007-2010") ///
             ( 2013 / 2016  = 7 "2013-2016") ///
             (else = .), gen(YearRange)
            drop if missing(YearRange)
            However, now there is some redundancy. It would be better if I didn't have to specify the year range twice in each line (e.g., 1971/1974 and then "1971-1974").
            Bonus points if you know a way to do that (without macros).

            It might arguably be even better if I didn't have to specify the arbitrary recodes 1 2 3, but recode chose them for me by default, as encode does. Then taking advantage of the defaults, the code could look something like this:
            Code:
            recode year ///
             ( 1971 / 1974) ///
             ( 1976 / 1980) ///
             ( 1988 / 1994)  ///
             ( 1999 / 2002) ///
             ( 2003 / 2006) ///
             ( 2007 / 2010) ///
             ( 2013 / 2016) ///
             (else = .), gen(YearRange)
            drop if missing(YearRange)
            and the labels and recoded values would be assigned by default. That would certainly be compact.


            Thanks again!
            Last edited by paulvonhippel; 03 Jan 2020, 16:28.


            • #21
              While I understand the request for an even simpler (more compact) notation in #20, I have difficulty imagining situations, other than the specific problem described here, where such behavior would be useful or wanted. It would certainly be programmable; I am not sure whether it is worth the effort, though.

              Best
              Daniel


              • #22
                Here is how you get the labels; requires either labmask (from labutil, SSC) or elabel (SSC).

                Code:
                recode year ///
                 ( 1971 / 1974  = 1) ///
                 ( 1976 / 1980  = 2) ///
                 ( 1988 / 1994  = 3) ///
                 ( 1999 / 2002  = 4) ///
                 ( 2003 / 2006  = 5) ///
                 ( 2007 / 2010  = 6) ///
                 ( 2013 / 2016  = 7) ///
                 (else = .), gen(YearRange)
                drop if missing(YearRange)
                
                bysort YearRange (year) : ///
                    generate label = strofreal(year[1]) + "-" + strofreal(year[_N])
                
                *ssc install labutil
                labmask YearRange , values(label)
                
                *ssc install elabel
                *elabel define YearRange:YearRange = levels(YearRange label)
                
                drop label
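
                If installing from SSC is not an option, something along these lines should work with built-in commands only. This is a sketch under assumptions: the codes are 1 to 7 as above, every code has at least one observation, and the label name YearRangeLbl is made up here. It does use local macros, so it does not meet the "without macros" wish in #20.

                Code:
                forvalues k = 1/7 {
                    * first and last observed year within code k
                    quietly summarize year if YearRange == `k', meanonly
                    label define YearRangeLbl `k' "`r(min)'-`r(max)'", modify
                }
                label values YearRange YearRangeLbl
                Like the labmask approach, this labels each code with the years actually observed in the data, which need not be the nominal endpoints given to recode.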
                Best
                Daniel
