  • How to avoid missing values with unique function?

    Hi!

    I'm using the very helpful unique command, but I have come across one issue. The command creates a new variable that is coded missing except for the first record in each group defined by the levels of the by variable. I don't want the missing values; I want every record in a group to hold the same value that is created for the first record.

    The code I used was:

    Code:
    unique aanbodid3, by(kokpc_41)
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float aanbodid3 int kokpc_41
    204230   26
    157300   26
    179122   26
    178531   38
    179290   38
    179293   38
    179289   38
    161012  611
     29566 1003
     29127 1003
     29562 1003
     29136 1003
     29565 1003
     29557 1003
     29564 1003
     29135 1003
     29560 1003
     29561 1003
     29137 1003
    end

  • #2
    I guess -unique- is some user written program? It's not part of official Stata and I don't know where it comes from.

    Anyway, if the problem is just that you want to overwrite the missing values with the value placed in the first observation of each by-group, that can be done from first principles without knowing anything in particular about -unique-. Let's say the new variable created by -unique- is called newvar:

    Code:
    by kokpc_41 (newvar), sort: replace newvar = newvar[1]
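
    To see why this works wherever the nonmissing value happens to sit within a group, here is a minimal sketch on toy data. The variable name group and the values are made up for illustration. It relies on the fact that Stata sorts numeric missing (.) after every nonmissing number, so after sorting on newvar within each group, newvar[1] is the nonmissing value.

    ```stata
    * Toy data (hypothetical): one nonmissing value per group, not always first
    clear
    input int group float newvar
    1 .
    1 3
    1 .
    2 4
    2 .
    end

    * Within each group, sorting on newvar puts the nonmissing value first,
    * because numeric missing (.) sorts after all nonmissing numbers
    by group (newvar), sort: replace newvar = newvar[1]

    * Check: newvar is now constant within each group
    bysort group: assert newvar == newvar[1]
    list, sepby(group)
    ```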

    • #3
      Code:
      bysort kokpc_41: egen uniquevalues = mean(_Unique)
      Is that what you're looking for?

      If you have a huge dataset and timing is an issue, you could also edit the unique command itself. I think that if you remove the if condition from this line, it will put the number in every observation.

      Code:
      qui by `by': gen `generate' = `uniq'[_N] if _n==1
      I think you could save even more time by replacing

      Code:
      * Original code
      qui by `by': replace `uniq' = sum(`uniq')
      qui by `by': gen `generate' = `uniq'[_N] if _n==1
      
      * Alternative code
      qui by `by': egen `generate' = total(`uniq')
      But chances are that those edits will cost you more time than just using the original command...

      • #4
        unique is a user-written (community contributed) command (not function) from SSC, so following FAQ Advice #12 you are asked to specify that.


        Running your example makes your problem evident, and then there is a one-line fix, given at the end.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float aanbodid3 int kokpc_41
        204230   26
        157300   26
        179122   26
        178531   38
        179290   38
        179293   38
        179289   38
        161012  611
         29566 1003
         29127 1003
         29562 1003
         29136 1003
         29565 1003
         29557 1003
         29564 1003
         29135 1003
         29560 1003
         29561 1003
         29137 1003
        end
        
        unique aanbodid3, by(kokpc_41)
        
        list, sepby(kokpc_41)
        
             +-------------------------------+
             | aanbod~3   kokpc_41   _Unique |
             |-------------------------------|
          1. |   204230         26         . |
          2. |   157300         26         3 |
          3. |   179122         26         . |
             |-------------------------------|
          4. |   178531         38         4 |
          5. |   179290         38         . |
          6. |   179293         38         . |
          7. |   179289         38         . |
             |-------------------------------|
          8. |   161012        611         1 |
             |-------------------------------|
          9. |    29566       1003         . |
         10. |    29127       1003        11 |
         11. |    29562       1003         . |
         12. |    29136       1003         . |
         13. |    29565       1003         . |
         14. |    29557       1003         . |
         15. |    29564       1003         . |
         16. |    29135       1003         . |
         17. |    29560       1003         . |
         18. |    29561       1003         . |
         19. |    29137       1003         . |
             +-------------------------------+
        
        
        bysort kokpc_41 (_Unique): replace _Unique = _Unique[1]
        This variable is the number of distinct values in a group, and there are several other ways to get it without any add-ons. Here's one:


        Code:
        egen tag = tag(kokpc_41 aanbodid3)
        egen ndistinct = total(tag), by(kokpc_41)
        For much more discussion, see http://www.stata-journal.com/sjpdf.h...iclenum=dm0042
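
        As a further illustration, here is a sketch of a by-based alternative that uses no egen functions at all. It uses the first eight rows of the example data from this thread; the variable name ndistinct2 is only a placeholder chosen for this sketch.

        ```stata
        * First eight rows of the example data from this thread
        clear
        input float aanbodid3 int kokpc_41
        204230   26
        157300   26
        179122   26
        178531   38
        179290   38
        179293   38
        179289   38
        161012  611
        end

        * Flag the first observation of each distinct (group, value) pair
        by kokpc_41 aanbodid3, sort: gen long ndistinct2 = _n == 1
        * Running count of the flags within each group
        by kokpc_41: replace ndistinct2 = sum(ndistinct2)
        * The last running count is the number of distinct values in the group
        by kokpc_41: replace ndistinct2 = ndistinct2[_N]

        list, sepby(kokpc_41)
        ```

        On these rows the result should agree with the _Unique values in the listing above: 3 for group 26, 4 for group 38 and 1 for group 611.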

        • #5
          Perhaps Maarten refers to the user-written programme -unique- (-ssc inst unique-; https://www.stata.com/support/faqs/d...-observations/).

          PS: at the time of writing, Nick's comprehensive reply had not yet appeared.
          Last edited by Carlo Lazzaro; 22 Sep 2017, 08:55.
          Kind regards,
          Carlo
          (Stata 19.0)

          • #6
            Thanks for your answers. I was indeed referring to the user-written programme unique. Apologies for omitting this info. Your solution to the problem does indeed work, Nick, thanks.
