Hello Statalisters,
I've been trying to calculate the h-index for a large dataset of scientists. The h-index is defined as the largest value of h such that the given author (or journal) has published h papers that have each been cited at least h times. The dataset looks somewhat like this:
author_id | year | article_id | citation | hindex | c_hindex
A         | 1990 |          1 |        7 |      5 |        5
A         | 1990 |          2 |        5 |      5 |        5
A         | 1990 |          3 |       13 |      5 |        5
A         | 1990 |          4 |       12 |      5 |        5
A         | 1990 |          5 |       17 |      5 |        5
A         | 1991 |          6 |       11 |      4 |        7
A         | 1991 |          7 |        9 |      4 |        7
A         | 1991 |          8 |       19 |      4 |        7
A         | 1991 |          9 |       15 |      4 |        7
A         | 1992 |         10 |       14 |      4 |        9
A         | 1992 |         11 |        4 |      4 |        9
A         | 1992 |         12 |        3 |      4 |        9
A         | 1992 |         13 |        7 |      4 |        9
A         | 1992 |         14 |        5 |      4 |        9
A         | 1992 |         15 |        4 |      4 |        9
A         | 1992 |         16 |       11 |      4 |        9
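In symbols, if an author's $n$ papers have citation counts sorted so that $c_{(1)} \ge c_{(2)} \ge \dots \ge c_{(n)}$, the definition above reads as follows (with $h = 0$ when no paper has been cited at least once):

$$
h = \max\{\, i \in \{1, \dots, n\} : c_{(i)} \ge i \,\}
$$

Because $c_{(i)}$ never increases while $i$ increases, the condition $c_{(i)} \ge i$ holds for ranks $1, \dots, h$ and fails for every rank after that, so the h-index is simply the largest rank whose paper has been cited at least that many times. For A's 1990 papers the sorted counts are 17, 13, 12, 7, 5; since $c_{(5)} = 5 \ge 5$, the 1990 h-index is 5.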
With a little help from the Stata forum (https://www.stata.com/statalist/arch.../msg00625.html), I was able to calculate the h-index for each author-year (hindex, column 5) using the following commands:
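* Rank each author-year's papers from most to least cited (1 = most cited);
* -unique- breaks ties arbitrarily, so ranks run 1, 2, ..., n
bysort author_id year : egen rank = rank(-citation), unique
* h-index = largest rank whose paper has been cited at least rank times
bysort author_id year : egen hindextemp = max(rank) if citation >= rank
bysort author_id year : egen hindex = max(hindextemp)
* If no paper qualifies (e.g. all uncited), set h = 0 instead of missing
replace hindex = 0 if missing(hindex)
drop rank hindextemp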
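As a cross-check on the sample data, here is a sort-based way to get the per-year h-index. It is one possible variant, not the code from the linked thread. Sorting on the negated citation count and using `_n` as the rank gives tied papers distinct ranks 1, 2, ..., n, and `cond()` returns 0 for an author-year in which no paper qualifies, so no separate missing-value fix is needed. The helper variable `negcit` is a hypothetical name, the `input` rows are the 1990 and 1991 rows from the table above, and the block starts with `clear`, so run it in a fresh session.

```stata
clear
input str1 author_id year article_id citation
A 1990 1  7
A 1990 2  5
A 1990 3 13
A 1990 4 12
A 1990 5 17
A 1991 6 11
A 1991 7  9
A 1991 8 19
A 1991 9 15
end

* Negate citations so an ascending sort puts the most-cited paper first
gen negcit = -citation
* Rank = position in that order within each author-year (ties get distinct ranks)
bysort author_id year (negcit): gen rank = _n
* h-index = largest rank whose paper has at least that many citations, else 0
bysort author_id year: egen hindex = max(cond(citation >= rank, rank, 0))
drop negcit rank
sort author_id year article_id
list, sepby(year) noobs
```

On these rows it should give an h-index of 5 for 1990 and 4 for 1991.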
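What I'm having a hard time with is calculating the cumulative h-index for each author-year (c_hindex, column 6), that is, the h-index computed over all of the author's papers published up to and including that year.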
For instance, among A's articles from 1990 to 1991, at least 7 have been cited 7 or more times, but only 7 have been cited 8 or more times, so A's cumulative h-index in 1991 is 7. As of 1992, the cumulative h-index is 9.
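One possible approach to the cumulative version, sketched here under stated assumptions, is to loop over the years and, for each year, rank every paper an author has published up to and including that year; the same "largest rank with citation >= rank" rule then gives the cumulative h-index, which is stored on that author's rows for that year. It assumes, as in the example, that `citation` is each paper's citation count as recorded in the data (not its count as of the year in question). `crank` and `ch` are hypothetical helper names, and the block starts with `clear` and the example rows from the table above, so run it in a fresh session.

```stata
clear
input str1 author_id year article_id citation
A 1990  1  7
A 1990  2  5
A 1990  3 13
A 1990  4 12
A 1990  5 17
A 1991  6 11
A 1991  7  9
A 1991  8 19
A 1991  9 15
A 1992 10 14
A 1992 11  4
A 1992 12  3
A 1992 13  7
A 1992 14  5
A 1992 15  4
A 1992 16 11
end

gen c_hindex = .
levelsof year, local(years)
foreach y of local years {
    * Rank each author's papers published in or before the current year
    * (1 = most cited; -unique- gives tied papers distinct ranks 1, 2, ..., n)
    bysort author_id: egen crank = rank(-citation) if year <= `y', unique
    * Largest rank whose paper has at least that many citations (0 if none),
    * spread across all of the author's rows
    bysort author_id: egen ch = max(cond(!missing(crank) & citation >= crank, crank, 0))
    * Keep the result only on the rows for the current year
    replace c_hindex = ch if year == `y'
    drop crank ch
}
sort author_id year article_id
list, sepby(year) noobs
```

On the example data it should give 5 for 1990, 7 for 1991 and 9 for 1992. Because it re-ranks the data once per distinct year, it may be slow on a very large dataset with many years, but each pass uses the same ranking rule as the per-year calculation.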
Could anybody help me with the commands to generate the cumulative h-index? Thank you very much in advance!
Hyeonjin