Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Dummy variables for table values for percentile

    Hey guys,

    I want to create one dummy variable which assigns a value of 1 for values above 90 percentile cites for each journal.
    What I mean is that I want a dummy variable = ( if journal = "A", cites >107 AND if journal = "B", cites >92 AND if journal = "C", cites >55 AND if journal = "D", cites >51 AND if journal = "E", cites >106)
    How do I do it?

    journal N(cites) p90(cites)
    ----
    A 152 107
    B 227 92
    C 181 55
    D 156 51
    E 144 106
    ----

    Thanks!




  • #2
    See if this works. I created some toy data, so my p90 won't match yours.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str1 journal byte article int cites
    "A" 1  38
    "A" 2 119
    "A" 3  76
    "A" 4  34
    "A" 5  73
    "A" 6  89
    "A" 7 210
    "A" 8 164
    "B" 1 189
    "B" 2 176
    "B" 3 148
    "B" 4 130
    "B" 5  26
    "C" 1 124
    "C" 2 182
    "C" 3 139
    end


    Code:
    . table journal, c(N cites p90 cites)
    
    ----------------------------------
      journal |   N(cites)  p90(cites)
    ----------+-----------------------
            A |          8         210
            B |          5         189
            C |          3         182
    ----------------------------------
    
    bysort journal: egen journal_90 = pctile(cites), p(90)
    gen article_90 = 0
    replace article_90 = 1 if cites >= journal_90 & cites!=.  // these two could be combined into single line to create indicator variable
    label var article_90 "1 if article in 90th percentile in cites for journal"
    
    . list journal article cites journal_90 article_90, sepby(journal)
    
         +-------------------------------------------------+
         | journal   article   cites   journ~90   artic~90 |
         |-------------------------------------------------|
      1. |       A         1      38        210          0 |
      2. |       A         2     119        210          0 |
      3. |       A         3      76        210          0 |
      4. |       A         4      34        210          0 |
      5. |       A         5      73        210          0 |
      6. |       A         6      89        210          0 |
      7. |       A         7     210        210          1 |
      8. |       A         8     164        210          0 |
         |-------------------------------------------------|
      9. |       B         1     189        189          1 |
     10. |       B         2     176        189          0 |
     11. |       B         3     148        189          0 |
     12. |       B         4     130        189          0 |
     13. |       B         5      26        189          0 |
         |-------------------------------------------------|
     14. |       C         1     124        182          0 |
     15. |       C         2     182        182          1 |
     16. |       C         3     139        182          0 |
         +-------------------------------------------------+
    
    .

    Comment


    • #3
      Thank you David! It works perfectly! Exactly what I was looking for.

      I just changed this replace article_90 = 1 if cites >= journal_90 & cites!=. to this replace article_90 = 1 if cites >= journal_90 cause I got an error, but the solution still holds.

      Comment


      • #4
        Hi Laksh,

        Glad I could help! Just check to make sure you don't have observations where cites is missing, because Stata treats missing (cites==. ) as larger than any number. And you probably don't want obs with missing cites as listed in the top 10% . (You probably already know that, but just wanted to make sure.)

        Comment


        • #5
          Thank you for the tip! Thankfully there were no missing data in my data set, so I haven't faced the issue.

          Also, is there anyway to sum up duplicate data?


          Code:

          Code:
           * Example generated by -dataex-.
          input str3 author str1 journal byte article int cites
          "Lak" "A" 1  38
          "Tak" "A" 2 119
          "Sak" "A" 3  16
          "Lak" "A" 4  34
          "Sak" "B" 1 189
          "Mak" "B" 2 176
          "Dak" "C" 1 14
          "Tak" "C" 2 10
          end

          So, here the author might make the cut off in one journal and be assigned 1 (article_90 variable) and be assigned 0 for another journal. The author's name may or may not be repeated multiple times. Is there any way to sum up the cites for each author, and give them 1 if they get assigned it once at least, and then remove all duplicate names?

          Comment


          • #6
            Sure, the only challenge would be how to handle multiple authors on the paper. Also, I would create a numeric author_id (rather than try to do this by author name).

            Code:
            NOTE: This includes the count_article_90 variable created
            * dataex, varlabel
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str1 journal byte article int cites float(journal_90 article_90) str7 author float count_article_90
            "A" 1  38 210 0 "Amanda"  0
            "B" 2 176 189 0 "Amanda"  0
            "A" 2 119 210 0 "Brooke"  1
            "B" 1 189 189 1 "Brooke"  1
            "A" 3  76 210 0 "Charles" 0
            "B" 3 148 189 0 "Charles" 0
            "A" 4  34 210 0 "David"   0
            "A" 8 164 210 0 "David"   0
            "B" 4 130 189 0 "David"   0
            "A" 5  73 210 0 "Emily"   0
            "B" 5  26 189 0 "Emily"   0
            "C" 3 139 182 0 "Frank"   0
            "A" 6  89 210 0 "Laksh"   0
            "C" 1 124 182 0 "Laksh"   0
            "A" 7 210 210 1 "Preeti"  2
            "C" 2 182 182 1 "Preeti"  2
            end
            label var article_90 "1 if article in 90th percentile in cites for journal"
            label var count_article_90 "Count of articles in top 10% an author has"
            Code:
            egen count_article_90 = total(article_90), by(author)  // this is what creates it
            label var count_article_90 "Count of articles in top 10% an author has"
            
            . list journal article cites journal_90 article_90 author count_article_90, sepby(journal)
            
                 +----------------------------------------------------------------------+
                 | journal   article   cites   journ~90   artic~90    author   count~90 |
                 |----------------------------------------------------------------------|
              1. |       A         1      38        210          0    Amanda          0 |
              2. |       A         2     119        210          0    Brooke          1 |
              3. |       A         3      76        210          0   Charles          0 |
              4. |       A         4      34        210          0     David          0 |
              5. |       A         5      73        210          0     Emily          0 |
              6. |       A         6      89        210          0     Laksh          0 |
              7. |       A         7     210        210          1    Preeti          2 |
              8. |       A         8     164        210          0     David          0 |
                 |----------------------------------------------------------------------|
              9. |       B         1     189        189          1    Brooke          1 |
             10. |       B         2     176        189          0    Amanda          0 |
             11. |       B         3     148        189          0   Charles          0 |
             12. |       B         4     130        189          0     David          0 |
             13. |       B         5      26        189          0     Emily          0 |
                 |----------------------------------------------------------------------|
             14. |       C         1     124        182          0     Laksh          0 |
             15. |       C         2     182        182          1    Preeti          2 |
             16. |       C         3     139        182          0     Frank          0 |
                 +----------------------------------------------------------------------+
            
            
            sort author journal
            list journal article cites journal_90 article_90 author count_article_90, sepby(author)
            
                 +----------------------------------------------------------------------+
                 | journal   article   cites   journ~90   artic~90    author   count~90 |
                 |----------------------------------------------------------------------|
              1. |       A         1      38        210          0    Amanda          0 |
              2. |       B         2     176        189          0    Amanda          0 |
                 |----------------------------------------------------------------------|
              3. |       A         2     119        210          0    Brooke          1 |
              4. |       B         1     189        189          1    Brooke          1 |
                 |----------------------------------------------------------------------|
              5. |       A         3      76        210          0   Charles          0 |
              6. |       B         3     148        189          0   Charles          0 |
                 |----------------------------------------------------------------------|
              7. |       A         4      34        210          0     David          0 |
              8. |       A         8     164        210          0     David          0 |
              9. |       B         4     130        189          0     David          0 |
                 |----------------------------------------------------------------------|
             10. |       A         5      73        210          0     Emily          0 |
             11. |       B         5      26        189          0     Emily          0 |
                 |----------------------------------------------------------------------|
             12. |       C         3     139        182          0     Frank          0 |
                 |----------------------------------------------------------------------|
             13. |       A         6      89        210          0     Laksh          0 |
             14. |       C         1     124        182          0     Laksh          0 |
                 |----------------------------------------------------------------------|
             15. |       A         7     210        210          1    Preeti          2 |
             16. |       C         2     182        182          1    Preeti          2 |
                 +----------------------------------------------------------------------+

            Comment


            • #7

              How can I sum up the cites for each author, and consolidate it to one author entry like the one given below? Would it be possible? (Thank you for your time - really sorry if the questions seem silly)

              Code:
                  
                +----------------------------------------------------------------------  
                      cites    author   count~90                                            
                  ----------------------------------------------------------------------
              1. |      214   Amanda       0                                            
              2. |      308   Brooke       1                                                
              3. |      224   Charles      0                                                
              4. |      328    David       0                                                                                      
              5. |      392    Preeti      2                                              
                 +---------------------------------------------------------------------



              Last edited by Laksh Noori; 03 Dec 2018, 17:23.

              Comment


              • #8
                You'll use the collapse command. NOTE: Save the data before collapsing, because collapsing deletes data and creates a new dataset. (For example, save it as "One line per author.dta"). You will probably also want to rename the variable cites in the collapsed data to total_cites or something like that because it now represents the total cites received by an author (over *all* his or her articles), whereas in the original data it represented cites a *specific article* received.

                Code:
                preserve   // allows you to do restore afterwards
                collapse (sum) cites (max) count_article_90, by(author)
                
                . list author cites count_article_90 , noobs
                
                  +----------------------------+
                  |  author   cites   count~90 |
                  |----------------------------|
                  |  Amanda     214          0 |
                  |  Brooke     308          1 |
                  | Charles     224          0 |
                  |   David     328          0 |
                  |   Emily      99          0 |
                  |----------------------------|
                  |   Frank     139          0 |
                  |   Laksh     213          0 |
                  |  Preeti     392          2 |
                  +----------------------------+
                
                . desc
                
                Contains data
                  obs:             8                          
                 vars:             3                          
                 size:           152                          
                ---------------------------------------------------------------------------------------------------------------------------------------------
                              storage   display    value
                variable name   type    format     label      variable label
                ---------------------------------------------------------------------------------------------------------------------------------------------
                author          str7    %9s                  
                cites           double  %8.0g                 (sum) cites
                count_articl~90 float   %9.0g                 (max) count_article_90
                ---------------------------------------------------------------------------------------------------------------------------------------------
                Sorted by: author

                Comment


                • #9
                  After I posted the previous, I thought you might want to keep more data in the collapsed file (like how many articles the person had, the max cites any one of their papers received, etc). So from the data in post #6, I did the following:

                  Code:
                  preserve
                  collapse (sum) cites (count) article_count = cites (mean) avg_cites = cites (min) min_cites = cites (max) max_cites=cites count_article_90, by(author)
                  * restore (if desired)
                  * Obviously, you could also create avg_cites = cites / article_count
                  
                  list author cites article_count avg_cites min_cites max_cites count_article_90, noobs
                  
                    +------------------------------------------------------------------------+
                    |  author   cites   articl~t   avg_ci~s   min_ci~s   max_ci~s   count~90 |
                    |------------------------------------------------------------------------|
                    |  Amanda     214          2        107         38        176          0 |
                    |  Brooke     308          2        154        119        189          1 |
                    | Charles     224          2        112         76        148          0 |
                    |   David     328          3    109.333         34        164          0 |
                    |   Emily      99          2       49.5         26         73          0 |
                    |------------------------------------------------------------------------|
                    |   Frank     139          1        139        139        139          0 |
                    |   Laksh     213          2      106.5         89        124          0 |
                    |  Preeti     392          2        196        182        210          2 |
                    +------------------------------------------------------------------------+
                  
                  
                  . desc
                  
                  Contains data
                    obs:             8                          
                   vars:             7                          
                   size:           248                          
                  ---------------------------------------------------------------------------------------------------------------------------------------------
                                storage   display    value
                  variable name   type    format     label      variable label
                  ---------------------------------------------------------------------------------------------------------------------------------------------
                  author          str7    %9s                  
                  cites           double  %8.0g                 (sum) cites
                  article_count   long    %8.0g                 (count) cites
                  avg_cites       float   %8.0g                 (mean) cites
                  min_cites       int     %8.0g                 (min) cites
                  max_cites       int     %8.0g                 (max) cites
                  count_articl~90 float   %9.0g                 (max) count_article_90
                  ---------------------------------------------------------------------------------------------------------------------------------------------
                  Sorted by: author
                  Last edited by David Benson; 03 Dec 2018, 18:03.

                  Comment


                  • #10
                    Thank you so, so, so much! Thank you for taking time to help me out with this.

                    Comment

                    Working...
                    X