Four problems of the h-index for assessing the research productivity and impact of individual authors

This paper reveals that when the h-index is used to assess the research productivity and impact of individual authors, four major problems exist because the h-index does not take into account the number of authors in each publication. This paper shows that the fractional h-index (or the individual h i -index in this paper), which distributes each publication’s received citations among its authors, can solve these problems effectively. This paper recommends that websites (such as scholar.google.com and researchgate.net) should add the h i -index for the sake of building a fairer and more ethical research community.

The h-index (Hirsch, 2005) has been widely used to assess the productivity and impact of journals (such as www.scimagojr.com) and authors (such as scholar.google.com and researchgate.net). Because a citation is a reference in a publication to another publication, it is appropriate to use the h-index to assess a journal's productivity and impact based on its number of publications (h) that have received at least h citations each.
However, because the h-index does not take into account the number of authors in each publication (Schubert & Schubert, 2019), the following four problems exist in the research community when the h-index is used to assess each author's research productivity and impact:
Problem #1-individually taking full credit for all contributions to a multiauthored publication. A multiauthored publication is generated by intellectual contributions from multiple authors. Each citation cites a multiauthored publication as a whole, not each of its authors. When the total number of citations that a multiauthored publication has received is used in computing an individual author's h-index, this author takes full credit for all contributions to this publication (including those from other authors), which is fundamentally and ethically improper.
Problem #2-creating inflation in counting citations. When the h-index is used to assess the research productivity and impact of authors, it also inflates citation counts. For instance, suppose that a three-authored publication has received 60 citations. If each of these three authors credits all 60 of this publication's citations to herself or himself when computing her or his h-index, then these 60 citations will be redundantly counted three times on these authors' separate webpages (e.g., scholar.google.com). This generates 180 citation counts in total across these three authors, thus inappropriately inflating their research productivity and impact from one publication.
Here is an example from Google Scholar: The publication "Guidelines for the use and interpretation of assays for monitoring autophagy (4th edition)" had 10,477 citations as of December 18, 2021. This publication has 2931 authors. If Google Scholar listed this publication's 10,477 citations on the webpage of each of those 2931 authors, then each author would have over 10,000 citations on Google Scholar, and Google Scholar would redundantly generate 30,708,087 citation counts in total from this one publication, drastically exaggerating research productivity in the research community.
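The arithmetic behind the two examples above can be checked with a minimal sketch. The function name `redundant_citation_counts` is hypothetical and introduced only for illustration; it assumes that every coauthor is credited with the publication's full citation count.

```python
def redundant_citation_counts(citations: int, num_authors: int) -> int:
    """Total citation counts generated when each of the coauthors
    is credited with all of the publication's citations."""
    return citations * num_authors


# Three-authored publication with 60 citations (example in the text).
print(redundant_citation_counts(60, 3))

# Autophagy guidelines: 10,477 citations and 2931 authors (example in the text).
print(redundant_citation_counts(10_477, 2931))
```

The second call reproduces the total of 30,708,087 citation counts quoted above.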
Problem #3-creating unfairness in evaluating research contributions. Each coauthor makes a partial contribution to a multiauthored publication. Because the h-index does not take into account the number of authors in publications, the h-index is computed by ignoring the difference between partial contributions in multiauthored publications and full contributions in single-authored publications, thus creating unfairness in assessing research contributions.
Problem #4-driving the unethical behavior of gift authorship. The h-index has been used for decision-making on appointments, promotion and tenure, etc. Its popularity and importance may drive some people to unethically increase their h-index through gift authorship. Because the h-index does not take the number of authors in each publication into account, some people may add each other's names to the author lists of publications to which they have made little or no contribution, in order to unethically boost their h-index.

Solution: taking the number of authors in each publication into account
Although many variants of the h-index have been proposed (Alonso et al., 2009; Batista et al., 2006; Hirsch, 2019; Schreiber, 2008a, b; Schubert & Schubert, 2019; Todeschini & Baccini, 2016), the h-index is still widely used for two reasons: it is easy for almost all users to understand, compute, and interpret when evaluating their research productivity and impact; and it can be used across different disciplines, different types of publications, etc.

Assessing the research productivity and impact of individual authors more accurately
Among the proposed variants of the h-index, "the fractional h-index" has taken into account the number of authors in publications by "giving an author of an m-authored paper only a credit of c/m if the paper received c citations" (Egghe, 2008). This paper renames "the fractional h-index" to "the h i -index"; one reason is that although a fractional number of the citations that each multiauthored publication has received is used to compute each author's h i -index, the subscript "i" emphasizes the individual contributions to publications for which each author should be credited. Note that this paper does not propose any new variant of the h-index, but attempts to use the h i -index (i.e., the fractional h-index) to address the aforementioned four problems.
Note that the h i -index is different from the h I -index (Batista et al., 2006), the h m -index (Schreiber, 2008a, b), and the h α index (Hirsch, 2019). The h I -index is obtained by dividing "h by the mean number of researchers in the h publications" (Batista et al., 2006); the h I -index "disfavours people with some papers with a large number of co-authors" (Schreiber, 2008a). The h m -index is determined by comparing "an effective rank" with the number of citations that publications have received (Schreiber, 2008a, b); it seems more difficult for users (especially non-technical users) to interpret their research productivity and impact by using "an effective rank" than by directly using the number of citations that publications have received. The h α index is proposed as "a measure of the scientific production of a scientist that counts only those papers where the scientist is the leading author", with an assumption that "the coauthor with the highest h-index is the most likely" leading author in a multiauthored publication (Hirsch, 2019); in reality, however, this assumption is questionable for many publications.
To discuss how the h i -index can address the aforementioned four problems, it is necessary to first explain how it works. The h i -index is used to assess the research productivity and impact of individual authors. For an author who has published $k$ publications in total, the author's h i -index is defined as the maximum value of $n$ such that the author has $n$ publications, each of which has $c_j / m_j \geq n$, where the $j$th publication has $m_j$ authors and has received $c_j$ citations ($c_j \geq 1$, $m_j \geq 1$, $1 \leq j \leq k$, $1 \leq n \leq k$), or is 0 if each publication has $c_j / m_j < 1$ ($c_j \geq 0$, $m_j \geq 1$, $1 \leq j \leq k$).
When computing the h i -index, it is recommended that if the percentages of contributions from the authors to a publication are known, then these percentages should be used to distribute the number of citations that the publication has received among the authors (Tscharntke et al., 2007); otherwise, the computation of the h i -index assumes equal contributions from the authors by default.
Mathematically, suppose that an author has $k$ publications in total, and these $k$ publications have $m_1, m_2, \ldots, m_k$ ($m_j \geq 1$, $1 \leq j \leq k$) authors and have received $c_1, c_2, \ldots, c_k$ ($c_j \geq 0$, $1 \leq j \leq k$) citations, respectively. For each publication, let $f$ be the function that corresponds to the number of citations per author, i.e., $f(j) = c_j / m_j$ ($1 \leq j \leq k$). If the values $f(j)$ are ordered in descending order (i.e., the highest value $f(1)$ in the 1st position and the lowest value $f(k)$ in the $k$th position), then the h i -index is computed as follows:
$$h_i = \max \{\, j : 1 \leq j \leq k,\ f(j) \geq j \,\}, \quad \text{or } h_i = 0 \text{ if } f(1) < 1.$$
Table 1 An example of the h-index
As an illustrative example to compare the h-index and the h i -index, Table 1 shows the number of citations ($c_j$) and the number of authors ($m_j$, in parentheses) of an author's 15 publications (i.e., $k = 15$). According to Table 1, this author's h-index = 9 based on the publications highlighted in bold. Next, Table 2 calculates the division $f(j) = c_j / m_j$ for each publication. Then, Table 3 reorders the values of $f(j)$ in Table 2 from high to low, along with the corresponding publications. According to Table 3, this author's h i -index = 7 based on the publications highlighted in bold.
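The computation described above can be sketched in a few lines of Python. This is a minimal illustration under the equal-contribution default, not the implementation used by any website or tool; the function names `h_index` and `hi_index` and the five-publication data set are hypothetical and are not the data of Table 1.

```python
from typing import Sequence


def h_index(values: Sequence[float]) -> int:
    """Largest n such that n of the values are each at least n."""
    ranked = sorted(values, reverse=True)
    h = 0
    for rank, value in enumerate(ranked, start=1):
        if value >= rank:
            h = rank
        else:
            break
    return h


def hi_index(citations: Sequence[int], authors: Sequence[int]) -> int:
    """Fractional h-index: apply the h-index rule to citations per author,
    assuming equal contributions from all authors of each publication."""
    per_author = [c / m for c, m in zip(citations, authors)]
    return h_index(per_author)


# Hypothetical author with five publications (illustrative data only).
citations = [10, 8, 5, 4, 3]
authors = [1, 4, 1, 2, 3]

print(h_index(citations))            # h-index from raw citation counts
print(hi_index(citations, authors))  # h i -index from citations per author
```

For this hypothetical author the h-index is 4, while the citations per author are 10, 2, 5, 2, and 1, which gives an h i -index of 2.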
Here are two observations from this example. First, publication #11 does not contribute to the h-index = 9 in Table 1, but it contributes to the h i -index = 7 in Table 3. It is a single-authored publication. Because all of #11's citations are credited to this single author's intellectual contribution, it makes sense that #11 boosts this author's individual h i -index.
Second, publications #4 and #5 contribute to the h-index = 9 in Table 1, but they do not contribute to the h i -index = 7 in Table 3. Although #4's 37 citations and #5's 34 citations are higher than most of the other publications in Table 1, these two publications have the highest numbers of authors among all publications. Having a relatively large number of coauthors probably implies that this author's contributions in these two publications are relatively small, and thus it makes sense that #4 and #5 do not contribute to increasing this author's individual h i -index in Table 3.

Solving the four problems
The aforementioned four problems can be effectively addressed by using the h i -index: The h i -index prevents each author from taking full credit for all contributions to a multiauthored publication (Problem #1). The h i -index eliminates inflation in counting citations, because it ensures that the portions of a publication's received citations distributed among its authors add up to its total number of received citations (Problem #2). By taking all of a publication's authors into account when assessing their contributions to the publication, the h i -index promotes fairness in assessing the research contributions of authors (Problem #3).
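The claim for Problem #2 can be made explicit with a short identity for a single publication with $c$ citations and $m$ authors. The symbol $p_a$, denoting the disclosed share of author $a$, is notation introduced here for illustration only.

```latex
\[
\underbrace{\sum_{a=1}^{m} c}_{\text{full counting}} = m\,c,
\qquad
\underbrace{\sum_{a=1}^{m} \frac{c}{m}}_{\text{equal shares}} = c,
\qquad
\underbrace{\sum_{a=1}^{m} p_a\,c}_{\text{disclosed shares}} = c
\quad \text{when } \sum_{a=1}^{m} p_a = 1 .
\]
```

Full counting multiplies the publication's citations by the number of authors, whereas either form of fractional counting returns exactly the publication's own citation total.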
To discuss how the h i -index addresses Problem #4, assume that one more person was added to Publication #7 in Table 1 Table 3. In general, adding more people to a publication's author list through gift authorship will slow down this publication's potential contribution to each author's h i -index. The h i -index can make it difficult to "inflate results with coauthorship of documents for reasons other than good scientific performance" (Vieira & Gomes, 2011). Due to this effect, the use of the h i -index will discourage authors from adding people with little or no research contribution to their publications, and thus can effectively curb the unethical practice of gift authorship.
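The slowing effect of an added author can be illustrated with a small sketch under the equal-contribution default. The helper names and the numbers (a publication with 24 citations and a target threshold of 7) are hypothetical and chosen only for illustration; they are not taken from Table 1 or Table 3.

```python
def citations_per_author(citations: int, num_authors: int) -> float:
    """Citations credited to each author under an equal split."""
    return citations / num_authors


def citations_needed(threshold: int, num_authors: int) -> int:
    """Minimum citations a publication needs so that its citations
    per author reach the given threshold under an equal split."""
    return threshold * num_authors


# Hypothetical publication with 24 citations and a growing author list.
for m in (3, 4, 5):
    print(m, citations_per_author(24, m), citations_needed(7, m))
```

With three authors, each is credited with 8 citations; adding a fourth author lowers this to 6, and the publication then needs 28 rather than 21 citations before it can count toward an h i -index of 7 for any of its authors.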

Potential impact and causes
To study how the change from the h-index to the h i -index may potentially impact authors in different fields, this paper compares the h-index and the h i -index of 12 Nobel laureates in four scientific fields. As shown in Table 4, the citation data of three Nobel laureates in each field are obtained from the Google Scholar webpages.
For each author in Table 4, (1) where the author has k cited publications, and the jth publication has m j authors and has received c j citations (c j ≥ 1, m j ≥ 1, 1 ≤ j ≤ k).
Column (8) in Table 4 shows that if the citations in column (1) were redundantly counted for each author listed in these publications, then Google Scholar would generate much higher total citation counts for all authors. Such huge inflation in counting citations may create misleading impressions of research productivity and impact in the research community.
Based on the data in Table 4, Fig. 1 shows that for these four scientific fields, the authors in Economic Sciences (i.e., E1, E2, and E3) generally have lower average numbers of authors per cited publication; the authors in Physiology or Medicine (i.e., M1, M2, and M3) generally have higher average numbers of authors per cited publication; however, P2 has the highest average number of authors per cited publication. Figure 1 reveals that different fields, as well as different research areas within the same field (e.g., physics), may have different practices of research collaboration and publication.
Although Fig. 1 does not show a simple linear relationship between the blue bars and the red line or the green line, it shows that in general, when the average number of authors per cited publication becomes smaller, or when the percentage of an author's fractional number of citations becomes larger, the author's percentage decrease from the h-index to the h i -index becomes smaller.
Based on the data in Table 4, Table 5 shows the rankings of 12 authors when the h-index is used; Table 6 shows how their rankings change when the h i -index is used instead of the h-index.
There are a few observations from the analysis of these 12 authors in four research fields. First, the rankings of the authors from different fields are all mixed in both Table 5 and Table 6; no field is ranked definitely higher or lower than all other fields based on the h-index or the h i -index. Second, when the h i -index is used instead of the h-index, the rankings of all three authors in Economic Sciences are increased (see Table 6); this is probably because these three authors have lower average numbers of authors per cited publication than the authors in the other three fields (see Fig. 1). Lastly, when the h i -index is used instead of the h-index (see Table 6), for each field other than Economic Sciences, some authors' rankings are increased while some authors' rankings are decreased; the rankings of M1 and P1 remain unchanged. These observations reveal that changing from the h-index to the h i -index may have different impacts on the rankings of authors in different fields or in different research areas within the same field.
Fig. 1 The relationship between Table 4's column (6) (in blue) and columns (3) (in green) and (7) (in red)

Determining the percentages of partial contributions of coauthors
To calculate an author's h i -index, it is necessary to determine the author's percentage of contributions in each multiauthored publication. This paper considers the following two methods: The first method is that the authors disclose the percentages of their respective contributions in a multiauthored publication. Because different fields may have different practices of research collaboration and publication, it should be the authors who decide their publication's author list. In general, any individuals who have contributed significantly to a publication should be individually named in the author list. When the authors decide the author list and the order of their names, they typically have some sense of their respective contributions. It is thus possible to translate this sense into percentages of contributions, which sum to 100% for each publication.
The second method is to assume equal contributions from coauthors. When the authors do not disclose the percentages of their respective contributions in a multiauthored publication, this method is desirable for a few reasons: First, because different fields may have different practices of determining the author list, the contributions of authors, the order of authors, etc., this second method practically simplifies and standardizes the implementation of the h i -index across all fields. Second, this method does not force the authors to fight over agreeing on the percentages of their respective contributions in a multiauthored publication, thus encouraging productive research collaboration. Lastly, a prior study showed that for overcoming the h-index's problem of ignoring the number of authors in each publication, the improvement of the authorship-weighted methods (e.g., first-author-emphasis, corresponding-author-emphasis) compared to the equal-contribution method "is not as high as one would expect" (Vavryčuk, 2018). It is also worth mentioning that while the authorship-weighted methods "may be very useful as applied to a particular field or discipline, they cannot be used across the board because of the very different practices in different disciplines regarding order of authors, significance of authorship position in the author's list, etc." (Hirsch, 2019).
For calculating the h i -index, this paper recommends that if the authors disclose the percentages of their respective contributions in a multiauthored publication, then these percentages should be used for allocating the publication's received citations among its authors; if the authors do not disclose the percentages of their respective contributions in a publication, then the second method can be used. The second method achieves a good balance of overcoming the h-index's problem of ignoring the number of authors in each publication, making the h i -index implementable across different fields, encouraging productive research collaboration among coauthors, and keeping it easy for authors to interpret their research productivity and impact in a straightforward way based on the number of citations that each publication has received.
Table 6 From the h-index to the h i -index: Change of rankings (based on Table 4)
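The recommendation above, disclosed percentages when available and equal shares otherwise, can be sketched as follows. The function name `allocate_citations`, its parameters, and the example shares of 50%, 30%, and 20% are hypothetical and serve only as an illustration.

```python
from typing import List, Optional, Sequence


def allocate_citations(
    citations: int,
    num_authors: int,
    shares: Optional[Sequence[float]] = None,
) -> List[float]:
    """Distribute a publication's citations among its authors.

    If `shares` (fractions summing to 1) are disclosed, use them;
    otherwise assume equal contributions from all authors.
    """
    if shares is None:
        return [citations / num_authors] * num_authors
    if len(shares) != num_authors:
        raise ValueError("one share is required per author")
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1 (i.e., 100%)")
    return [citations * share for share in shares]


# Three-authored publication with 60 citations.
print(allocate_citations(60, 3))                   # equal contributions
print(allocate_citations(60, 3, [0.5, 0.3, 0.2]))  # disclosed percentages
```

In both cases the allocated citations add up to the publication's 60 citations, so either method avoids the redundant counting described in Problem #2.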

Discussion and conclusion
This paper revealed that because the h-index does not take into account the number of authors in each publication, four major problems exist when the h-index is used to assess the research productivity and impact of authors: individually taking full credit for all contributions to a multiauthored publication, creating inflation in counting citations, creating unfairness in evaluating research contributions, and driving the unethical behavior of gift authorship. This paper showed that the h i -index (i.e., the fractional h-index), which distributes each publication's received citations among its authors, can help solve these four problems effectively.
The h i -index has several advantages. First, the h i -index assesses each author's research productivity and impact more accurately and fairly than the h-index. Second, like the original h-index, the h i -index is still easy for almost all users (including non-technical users) to understand, compute, and interpret, in comparison with many other variants of the h-index such as the h I -index and the h m -index. Lastly, the h i -index can be used across different disciplines, different types of publications, etc.
This paper used the Google Scholar data of 12 Nobel laureates in four scientific fields to show what happens when the h i -index is used instead of the h-index. These examples demonstrated that the existing h-index drastically exaggerates research productivity and impact. In addition, these examples showed that the percentage decreases from the h-index to the h i -index are generally smaller for the authors whose average number of authors per publication is smaller (or whose portions of contributions to publications are larger). This finding provides a useful implication that the use of the h i -index can potentially motivate more effort to make research contributions.
Although software has been developed for authors to install and compute their own h i -index, such as Publish or Perish (Harzing, 2021), the h i -index (i.e., the fractional h-index) is still not widely used. This is probably because nowadays, most authors rely on websites where they can find their h-index instantly. Therefore, this paper recommends that websites (such as scholar.google.com and researchgate.net) should add the h i -index for the sake of building a fairer and more ethical research community, assessing each author's research productivity and impact more accurately, and encouraging more contributions to research and publication.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.