  • rowfirst for string data

    Name2002  Name2003  Name2004  Name2005  (ExpectedCurrentName)
    a                   b         c         c
              a                             a
    x         a         x                   x
              a                             a
    ::::
    I have string data like this example and want to store the latest available name in a new variable, CurrentName. I have tried:

    egen CurrentName = rowfirst(Name*)

    only to find that it cannot be applied to string values (it fails with a type mismatch error). Is there a simple command that does this without a lengthy loop? Any suggestions would be highly appreciated.
    Last edited by Romalpa Akzo; 16 Oct 2017, 03:35.

  • #2
    Given your example*, you want rowlast(), not rowfirst(). The loop might not be as lengthy as you think:

    Code:
    generate CurrentName = ""
    forvalues j = 2005(-1)2002 {
        replace CurrentName = Name`j' if !mi(Name`j')
    }
    I am sure there are plenty of other ways to get what you want, but I am not sure whether what you want is all you want. If you plan on doing more data management and/or analysis, alternative approaches might be better suited.

    Best
    Daniel

    *Edit

    Thanks for providing an example. You can even improve it by using dataex (SSC) on your real data next time you post. Note that I am assuming above that the empty cells are indeed missing string values (i.e., ""). They could represent something else. With dataex we could be sure.
    Last edited by daniel klein; 16 Oct 2017, 03:46.

    • #3
      Just to underline that, as you want the last non-missing value, not the first, the solution would be rowlast() if it supported strings.

      Daniel's solution is not quite right. I think you need


      Code:
      generate CurrentName = ""
      forvalues j = 2005(-1)2002 {    
          replace CurrentName = Name`j' if mi(CurrentName)  
      }
      as otherwise you may just keep overwriting with previous names.

      • #4
        Thank you so much for your advice; your loop does help me (and yes, my target is the last name, not the first).

        However, my problem is that there are many variables containing names, and they are not systematically named, since they were collected from many sources. For example: NameFull02 NameOK12 FullName15. My only clue is that they were collected in chronological order, so the last non-missing name is my target. That is why I feel that a command similar to rowlast and rowfirst, but for string data, would be perfect for my needs.

        Please kindly advise me.

        Regards,

        Romalpa
        Last edited by Romalpa Akzo; 16 Oct 2017, 04:02.

        • #5
          Nick is right. You would need to reverse the numlist in my code to get it to work. The more interesting thing is that egen does work. Watch:

          Code:
          clear 
          inp str1 (Name2002 Name2003 Name2004 Name2005)
          "a" "" "b" "c"
          "" "a" "" ""
          "x" "a" "x" ""
          "" "a" "" ""
          end
          
          egen str1 CurrentName = rowlast(Name*)
          gives

          Code:
          . list
          
               +------------------------------------------------------+
               | Name2002   Name2003   Name2004   Name2005   Curren~e |
               |------------------------------------------------------|
            1. |        a                     b          c          c |
            2. |                   a                                a |
            3. |        x          a          x                     x |
            4. |                   a                                a |
               +------------------------------------------------------+
          Now, I would not call this a bug - on the contrary, it is completely consistent behavior. But, it is also certainly inconvenient. The problem is that the code in rowlast() by default creates the new variable according to c(type). I would rather have it create the new variable according to the type of the variables specified.

          Best
          Daniel

          • #6
            Thanks so much for your advice.

            Code:
             egen str1 CurrentName = rowlast(*)
            This command (suggested by Daniel) is exactly what I need.

            • #7
              Just to confirm that both rowfirst() and rowlast() work fine with strings so long as you ask explicitly for a string variable result.

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str1(var1 var2)
              "a" "c"
              "b" "d"
              end
              
              . egen first = rowfirst(var*)
              type mismatch
              r(109);
              
              . egen str1 first = rowfirst(var*)
              
              . egen last = rowlast(var*)
              type mismatch
              r(109);
              
              . egen str1 last = rowlast(var*)
              
              . l
              
                   +----------------------------+
                   | var1   var2   first   last |
                   |----------------------------|
                1. |    a      c       a      c |
                2. |    b      d       b      d |
                   +----------------------------+
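
              For readers who prefer a loop when the variable names share no pattern (as in #4), the following is a minimal sketch rather than a definitive solution. It assumes that the name variables are stored in the dataset in chronological order and that empty cells are missing strings (""). The variable names are taken from #4; the data values are made up for illustration.

              ```stata
              * Hypothetical data: names from #4, values invented for this sketch
              clear
              input str1(NameFull02 NameOK12 FullName15)
              "a" ""  "c"
              ""  "a" ""
              "x" "a" ""
              end

              generate CurrentName = ""
              * Visit the variables in dataset order (assumed chronological).
              * Each later non-missing value overwrites the earlier one,
              * so the last non-missing name is what remains.
              foreach v of varlist NameFull02-FullName15 {
                  replace CurrentName = `v' if !missing(`v')
              }
              list
              ```

              Because the loop runs forward and overwrites, no numeric suffix is needed; only the storage order of the variables matters.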
