  • Count total number of non-missing strings in a row

    Hi there!

    I am working on extracting information from text data and was wondering if there is a way to count the number of non-missing string variables in each row. The following is what my dataset looks like. I would like a new variable that shows how many variables are non-empty in each row (e.g. 2 in row 1, 2 in row 2, and so forth).

    One option I considered is to encode the string variables and use egen's -rownonmiss-. However, that seems a bit roundabout. Is there an alternative?

    Thanks very much!
    Krishna

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str7 var1 str6 var2 str5 var3
    "asdfdgf" ""       "mnbvc"
    ""        "qwerty" "mnbvc"
    ""        "qwerty" ""     
    "asdfdgf" "qwerty" "mnbvc"
    "asdfdgf" "qwerty" "mnbvc"
    end

  • #2
    Thanks for the data example. You missed the strok option of egen's rownonmiss() function, which makes your work-around unnecessary.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str7 var1 str6 var2 str5 var3
    "asdfdgf" ""       "mnbvc"
    ""        "qwerty" "mnbvc"
    ""        "qwerty" ""     
    "asdfdgf" "qwerty" "mnbvc"
    "asdfdgf" "qwerty" "mnbvc"
    end
    
    egen wanted = rownonmiss(var?) , strok 
    
    list 
    
         +-----------------------------------+
         |    var1     var2    var3   wanted |
         |-----------------------------------|
      1. | asdfdgf            mnbvc        2 |
      2. |           qwerty   mnbvc        2 |
      3. |           qwerty                1 |
      4. | asdfdgf   qwerty   mnbvc        3 |
      5. | asdfdgf   qwerty   mnbvc        3 |
         +-----------------------------------+
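
    For comparison, the same count can also be built by hand with a loop. This is only a minimal sketch, assuming the variables are var1-var3 as in the example and that wanted already exists from the egen call above; the name wanted2 is hypothetical and chosen just for illustration.

    ```stata
    * Hand-rolled count of non-empty strings per row (sketch; wanted2 is a hypothetical name)
    generate byte wanted2 = 0
    foreach v of varlist var1-var3 {
        * missing() returns 1 for the empty string "", so !missing() adds 1 per non-empty value
        replace wanted2 = wanted2 + !missing(`v')
    }
    * Stops with an error if the hand-rolled count ever disagrees with egen's result
    assert wanted2 == wanted
    ```

    The egen call remains the shorter route; the loop is mainly useful if the rule for what counts as "non-missing" needs to be customized.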


    • #3
      Oh, that's right! Thanks very much, Nick Cox and Jean-Claude Arbaut!
