Two h-Index Benchmarks for Evaluating the Publication Performance of Medical Informatics Researchers

Background The h-index is a commonly used metric for evaluating the publication performance of researchers. However, in a multidisciplinary field such as medical informatics, interpreting the h-index is a challenge because researchers tend to have diverse home disciplines, ranging from clinical areas to computer science, basic science, and the social sciences, each with different publication performance profiles. Objective To construct a reference standard for interpreting the h-index of medical informatics researchers based on the performance of their peers. Methods Using a sample of authors with articles published over the 5-year period 2006–2011 in the 2 top journals in medical informatics (as determined by impact factor), we computed their h-index using the Scopus database. Percentiles were computed to create a 6-level benchmark, similar in scheme to one used by the US National Science Foundation, and a 10-level benchmark. Results The 2 benchmarks can be used to place medical informatics researchers in an ordered category based on the performance of their peers. A validation exercise mapped the benchmark levels to the ranks of medical informatics academic faculty in the United States. The 10-level benchmark tracked academic rank better (with no ties) and is therefore more suitable for practical use. Conclusions Our 10-level benchmark provides an objective basis to evaluate and compare the publication performance of medical informatics researchers with that of their peers using the h-index.


Evaluations of the h-Index
Criticisms of the h-index include that it reduces a researcher's output to a single measure and consequently could produce an unfair rating of an individual. Therefore, other factors should be taken into account in the evaluation of an individual, as there can be exceptions to the rule [3]. For example, the h-index is insensitive to the impact of an author with a small number of highly cited papers [1-3], which has been seen as both an advantage and a disadvantage of the metric.
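To make this insensitivity concrete, the h-index rule (the largest h such that an author has h papers with at least h citations each) can be sketched in a few lines; the citation counts below are hypothetical:

```python
def h_index(citations):
    """Largest h such that h of the papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # highest-cited papers first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:  # the paper at this rank still supports h = rank
            h = rank
        else:
            break
    return h

# An author with two blockbuster papers and one modest one:
h_index([1200, 950, 4])   # -> 3: the huge counts cannot raise h past 3

# An author with ten moderately cited papers:
h_index([12] * 10)        # -> 10
```

Because h is capped by the number of papers, the two highly cited papers in the first example contribute no more to h than two papers cited exactly three times would, which is the insensitivity noted above.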
The h-index has also been criticized because authors could use self-citation to inflate their h values. However, studies of this issue have generally shown that removing self-citations does not change h values greatly, while making the calculation of h much more difficult. One of the largest effects of self-citation was reported by Kosmulski in his study of chemistry professors [1,4], where he found that self-citations increased a professor's h-index by 26% on average. Self-citation also appears to have a greater effect on the h-index values of younger scientists [1]. On the other hand, it has been argued that within a field, self-citation habits are expected to be more or less uniform [1], so no single author in a given field should gain a great advantage over the others. Moreover, the impact of self-citation has been shown to decrease over time as external citations to an author's work accumulate [1,5].
The h-index has been said to be unfair to early career researchers, as their scores will invariably be low for a period of time while they build up their body of work. On the other hand, a strong correlation has been found between the original h-index and variants adjusted for age and career stage [6]. Similarly, the h-index has been shown to be biased against those who have interruptions in their work and those who work only part-time [1]. As a result, a gender bias has been observed, because women have generally produced fewer papers than men owing to such interruptions and/or reduced work schedules [1].
Another criticism of the h-index is that it is insensitive to the number and ranking of co-authors [1]. All co-authors of a given paper receive the same h-index credit for that paper, from the primary author(s) to the last author in the list. Some have proposed methods to correct for this insensitivity. For example, Egghe proposes two methods for dealing with the co-author issue: (a) "fractional citation counts" and (b) "fractional paper counts" [7]. Methods such as these are useful in lessening the effect that papers with many co-authors have on any individual author's h-index, but they could undervalue the contributions of first authors who may have been more involved in the work than subsequent co-authors, and they introduce the practical difficulty of quantifying individual contributions to a paper.
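As a rough sketch of the two corrections Egghe describes, one can either divide each paper's citation count by its number of authors before applying the h rule, or let each paper advance the rank by only 1/(number of authors). The exact formulations in [7] may differ in detail, and the paper data below are hypothetical:

```python
def h_fractional_citations(papers):
    """papers: list of (citations, n_authors) pairs.
    Fractional citation counts: each paper's citations are split evenly
    among its authors, then the usual h rule is applied."""
    frac = sorted((c / n for c, n in papers), reverse=True)
    h = 0
    for rank, f in enumerate(frac, start=1):
        if f >= rank:
            h = rank
        else:
            break
    return h

def h_fractional_papers(papers):
    """Fractional paper counts: papers are ranked by citations as usual,
    but each paper advances the rank by only 1/n_authors, so a
    multi-author paper counts as a fraction of a paper."""
    ranked = sorted(papers, key=lambda p: p[0], reverse=True)
    rank, h = 0.0, 0.0
    for cites, n_authors in ranked:
        rank += 1.0 / n_authors
        if cites >= rank:
            h = rank
        else:
            break
    return h

papers = [(40, 4), (20, 2), (9, 1), (4, 4)]  # hypothetical (citations, authors)
# The plain h-index of [40, 20, 9, 4] is 4; both corrections lower it:
h_fractional_citations(papers)  # -> 3   (fractional counts 10, 10, 9, 1)
h_fractional_papers(papers)     # -> 2.0 (effective ranks 0.25, 0.75, 1.75, 2.0)
```

Both variants discount heavily co-authored papers, which is why the hypothetical author's score drops relative to the plain h-index.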

Citation Databases
The source of the citation counts used to calculate the h-index is integral to its accuracy [3]. Currently, the three most common interdisciplinary databases used to calculate the h-index are Web of Science (WoS), Scopus, and Google Scholar (GS) [1]. H-index results vary between these databases because the sources they index also vary, and each database has its own benefits and drawbacks. Let us begin with GS, a relatively new resource for finding citations introduced in 2004. In terms of benefits, GS indexes a wide range of sources, some of which are not covered by Scopus and WoS, such as books, theses, reports, some conference proceedings, and preprints [8]. This can be important for disciplines such as engineering, computer science, and mathematics, where such publications are more common [1,9-12]. GS could be used to supplement materials found in Scopus and/or WoS for authors who publish in formats other than journal papers. However, GS indexes only materials that are digitally available and therefore misses many relevant references and citations, such as older works that have not yet been converted to digital format [10].
Furthermore, GS may not be able to access some publishers' electronic data, limiting its coverage [8,13,14]. The evidence is inconsistent as to whether GS finds more citations than WoS and Scopus [14-16], and the explanation for such inconsistencies is likely discipline specific [17]. While a strong correlation in citation counts was reported between WoS and GS [17], Meho and Yang found that the overlap between GS and the union of WoS and Scopus was 30.8%, and that GS was lacking 40.4% of the citations found by the other two databases [10].
Another study reported that GS lacked between 47% and 50% of the citations found by the other databases [16]. Many authors have also found that the citation counts in GS are inflated by duplicate citations, false positives, and unscholarly citations [1,9,10,12]. To ensure an accurate rating, citations should be verified to weed out any extraneous records [14]. This can be an arduous process, as Meho and Yang found when they spent some 3,000 hours verifying GS citations in their study of LIS faculty [10].
Scopus is also a recent addition, launched by Elsevier in 2004. Scopus has been held up as an excellent resource for calculating the h-index [1,2,18]. According to its Content Coverage Guide, Scopus currently holds over 44 million records, indexed from 18,500 active serial titles, including peer-reviewed journals, trade journals, book series, and conference materials [19]. Jacso, in his study of the calculation of the h-index across the three databases, indicates that Scopus has "the best software module for presenting results lists" as it sorts articles by decreasing citation counts [18]. Jacso also asserts that Scopus has the most accurate automated calculation of the h-index [18]. The automated calculation in Scopus may still exclude some citations that cannot be properly matched by the software, but it has been found to be more accurate than the WoS automated h-index calculator [18]. However, Scopus is limited in that it only indexes cited references from 1996 onward. Meho and Yang found that for the group they were studying, the overlap between citations found in WoS and Scopus was not considerable (58.2%), and that Scopus had a higher percentage of unique citations than did WoS (26% vs. 15.8%) [10]. Another study found that the overlap in citations between Scopus and WoS was between 41% and 59%, depending on the discipline [16]. Scopus was found to have greater coverage of conference proceedings than WoS, which may account for much of the difference between these two databases [10].
However, we should note that WoS has expanded its coverage of conference proceedings since Meho and Yang's study was conducted.
WoS originated from the Science Citation Index developed by Eugene Garfield in the early 1960s [20]. Now part of the Thomson ISI Web of Knowledge product, it has been a trusted resource for academics over the years [3]. Jacso holds WoS above all other databases as the most complete [18]. Currently, the Thomson Reuters website indicates that WoS content covers 12,000 journals and 150,000 conference proceedings dating back as far as 1900, for a total of over 46 million records [21]. It has been found, however, that to obtain accurate citation counts in WoS, one must seek out and identify all stray and orphan citations that could not be properly matched by its software [1,9,18]. The automatic h-index calculations, as well as citation searches by author, can exclude many citations that are integral to the h-index score of an individual [1]. Jacso found that the h-index calculated in WoS doubled from the automated version to the manually verified version [18]. WoS indexes ISI journals and therefore seems to have a wider range of sources relating to the natural sciences than to other disciplines such as computer science [9]. Consequently, the results will vary depending on the discipline [9]. Although WoS is held up as a gold standard of sorts, and contains the widest range of historical records, it is susceptible to similar problems and limitations as Scopus and GS. Jacso warns, "Corroborative tests must be done in every database for important research whose results may affect people" [12].
It could be argued that a union of results from all databases may present the most complete picture of an individual's impact, but does combining citation counts from different databases affect h-index scores in a significant way?
Combining the citations for faculty in library and information science departments from WoS and Scopus significantly altered the ranking of individuals who "appear in the middle of the rankings", but when the group was considered as a whole, the overall ranking did not change significantly [10]. In addition, adding GS citations increased the total number of citations for the group by 93.4%; however, these citations did not significantly change the ranking of the group members [10]. One study found very high correlations between GS and Scopus h-index values for US neurosurgeons [22]. Another study found that h-index results were ≥30% higher when calculated using GS for individuals in the areas of mathematics and computer science [11]. Therefore, the impact of citations from GS on h-index values may be discipline specific. Unfortunately, we are not aware of attempts to combine citations from all three databases, so the effects of such a compilation are not known. However, to the extent that rankings are not affected, the computation of percentiles in our benchmarks should not be affected by the choice of database.
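If such a compilation were attempted, the key step would be deduplicating citing documents across databases, so that a citation indexed by more than one database is counted only once. The following is a minimal sketch, with hypothetical paper and citing-document identifiers; matching records across real databases is considerably harder:

```python
def merged_h_index(per_db):
    """per_db: {db_name: {paper_id: set of citing-document ids}}.
    Union the citing-document sets per paper across databases, so that
    a citation found by more than one database is counted only once,
    then apply the usual h rule to the merged citation counts."""
    merged = {}
    for paper_citers in per_db.values():
        for paper, citers in paper_citers.items():
            merged.setdefault(paper, set()).update(citers)
    counts = sorted((len(c) for c in merged.values()), reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical example: the two databases share some citing documents.
wos    = {"p1": {"d1", "d2"}, "p2": {"d3"}}
scopus = {"p1": {"d2", "d4"}, "p2": {"d3", "d5"}}
merged_h_index({"WoS": wos, "Scopus": scopus})  # -> 2 (WoS alone gives 1)
```

In this toy case the merged h-index exceeds the single-database value only because the databases contribute unique citations; where their coverage overlaps entirely, merging changes nothing, which is consistent with the rank-stability observations above.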