Calculating cumulative h-index with Stata?

Hyeonjin Cha

Join Date: Feb 2020
Posts: 2

19 Feb 2020, 15:45

Hello Statalisters,

I've been trying to calculate the h-index for a large dataset of scientists. The h-index is defined as the maximum value of h such that the given author/journal has published h papers that have each been cited at least h times. The dataset looks somewhat like this:

author_id	year	article_id	citation	hindex	c_hindex
A	1990	1	7
A	1990	2	5
A	1990	3	13
A	1990	4	12
A	1990	5	17
A	1991	6	11
A	1991	7	9
A	1991	8	19
A	1991	9	15
A	1992	10	14
A	1992	11	4
A	1992	12	3
A	1992	13	7
A	1992	14	5
A	1992	15	4
A	1992	16	11

With a little bit of help from the Stata forum (https://www.stata.com/statalist/arch.../msg00625.html), I could calculate the h-index of each author-year (hindex, column 5) using the following commands:

bysort author_id year : egen temp = rank(-citation), unique
bysort author_id year citation : egen rank = max(temp)
by author_id year : egen hindextemp = max(rank) if citation >= rank
bysort author_id year : egen hindex = max(hindextemp)
drop rank temp hindextemp

What I'm having a hard time with is calculating the cumulative h-index of each author-year (c_hindex, column 6).

For instance, among the articles published from 1990 to 1991, 7 have been cited at least 7 times, so the cumulative h-index for A in 1991 would be 7. As of 1992, the cumulative h-index would be 9.

Could anybody help me with the command to generate the cumulative h-index? Thank you very much in advance!

Hyeonjin


Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

19 Feb 2020, 18:03

I believe the following does what you want:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str1 author_id int year byte(article_id citation)
"A" 1990  1  7
"A" 1990  2  5
"A" 1990  3 13
"A" 1990  4 12
"A" 1990  5 17
"A" 1991  6 11
"A" 1991  7  9
"A" 1991  8 19
"A" 1991  9 15
"A" 1992 10 14
"A" 1992 11  4
"A" 1992 12  3
"A" 1992 13  7
"A" 1992 14  5
"A" 1992 15  4
"A" 1992 16 11
end

capture program drop one_author
program define one_author
    // rank papers from most to least cited
    gsort -citation
    // the h-index is the largest value of min(rank, citations at that rank);
    // taking the citation count at the first rank with _n >= citation instead
    // would undercount cases such as 10, 9, 8, 1 (h-index 3, not 1)
    gen capped = min(_n, citation)
    egen index = max(capped)
    drop capped
    keep in L
    exit
end

// CALCULATE CUMULATIVE H-INDEX FOR EACH AUTHOR
rangerun one_author, by(author_id) interval(year . 0)
rename index c_hindex

To use this code you need the -rangerun- program, written by Robert Picard and available from SSC. To use -rangerun- you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.
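If installing community-contributed packages is not an option, the same cumulative window can be sketched with official commands only. This is a rough sketch rather than a replacement for -rangerun-: it re-sorts the data once per author-year, so it will be slow on a large dataset. The names grp, in_window, and c_hindex_check are made up for this illustration, and the variables author_id, year, article_id, and citation are assumed to be as in the example data.

```stata
* Sketch only: loop over author-years with built-in commands.
gen c_hindex_check = .                   // hypothetical name for the result
egen long grp = group(author_id)         // hypothetical numeric author id
quietly levelsof grp, local(authors)
foreach a of local authors {
    quietly levelsof year if grp == `a', local(years)
    foreach y of local years {
        * flag this author's papers published up to and including year `y'
        gen byte in_window = (grp == `a') & (year <= `y')
        * put the flagged papers first, most cited on top, so that _n is
        * the citation rank within the window
        gsort -in_window -citation
        * papers whose citations are at least their rank form a prefix of
        * the sorted window, so their count is the h-index
        quietly count if in_window & citation >= _n
        quietly replace c_hindex_check = r(N) if grp == `a' & year == `y'
        drop in_window
    }
}
sort author_id year article_id
drop grp
```

On the example data this should give 5, 7, and 9 for 1990, 1991, and 1992, in line with the 1991 and 1992 values worked out in the original post.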

In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
Romalpa Akzo

Join Date: Oct 2017

Posts: 369
#3

20 Feb 2020, 03:08

Inspired by sensei Clyde Schechter's solution, which evokes the beauty of the wonderful package -rangerun-, I would like to contribute my part: a shorter route.

Code:

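capture program drop one_author2
program define one_author2
    // unique ranks 1, ..., N with the most-cited paper first; with the field
    // option, tied papers would share a rank and be undercounted
    egen f = rank(-citation), unique
    egen c = max(f*(citation >= f))
    drop f
end
rangerun one_author2, by(author_id) interval(year . 0)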

Notice that the h-index for each author-year (as in your original post) can also be obtained with the same mechanism.

Code:

bysort author_id year: egen f2 = rank(-citation), unique
bysort author_id year: egen h = max(f2*(citation >= f2))
drop f2
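One detail worth checking in any rank-based version is how ties are ranked. The following is a minimal sketch on a made-up three-paper dataset (the variable names f_field, f_unique, h_field, and h_unique are only for illustration). It clears the data in memory, so run it in a separate session. With the field option of -egen-'s rank() function, tied papers share a rank, whereas the unique option gives distinct ranks 1, ..., N.

```stata
* Sketch only: three hypothetical papers with 3 citations each.
clear
input byte citation
3
3
3
end

* field rank = 1 + number of strictly higher values, so all three tie at 1
egen f_field  = rank(citation), field
* unique ranks are 1, 2, 3, with ties broken arbitrarily
egen f_unique = rank(-citation), unique

egen h_field  = max(f_field  * (citation >= f_field))
egen h_unique = max(f_unique * (citation >= f_unique))
list
```

Three papers with 3 citations each have an h-index of 3. Here h_unique is 3, while h_field is 1 because all three papers share field rank 1.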