Academic metrics and their potential distortion by citing retracted literature

There are many metrics in science used to quantify quality and impact, and thereby to evaluate the scientific value of a journal, a researcher, or even an institution or research group (Alonso et al. 2009; Kim and Chung 2018). Academic metrics can be fundamentally divided into two groups, journal-based metrics (JBMs) and author-based metrics (ABMs). The most widespread JBM is still the Clarivate Analytics journal impact factor (JIF) (Garfield 1972), for example in Sweden (Hammarfelt and Rushforth 2017), despite its limitations (Teixeira da Silva and Dobránszki 2017a), abuses (Teixeira da Silva and Bernès 2018), unwarranted use in academic policy and decision making (Paulus et al. 2018), and the need to complement it with corrective measures (Winkmann and Schweim 2000; Aixelá and Rovira-Esteva 2015; Liu et al. 2016). CiteScore was introduced in December 2016 by Elsevier as an alternative to, and thus a direct competitor of, the JIF. It is more transparent than the JIF and plays an increasing role in journal rating and evaluation because it addresses some of the JIF's limitations: it is freely accessible, uses a larger database [Scopus vs. Web of Science (WoS)] and applies a longer evaluation window than the JIF (Courtney 2017; Teixeira da Silva and Memon 2017). What the JIF and CiteScore have in common is that both rely on citations when evaluating scientific impact, prestige and quality, making them citation-based metrics or citation impact indicators (Waltman 2016; Walters 2017), just like the h-index (Hirsch 2005), which is the most commonly used ABM, either in its original form or in its modified forms or derivatives (Alonso et al. 2009; Hammarfelt and Rushforth 2017), despite some flaws and limitations (Teixeira da Silva 2018). Since such metrics are used by funding bodies and tenure committees (Roldan-Valadez et al. 2019), corrections that accommodate changes in the publishing landscape are needed.

A spike in the number of retractions (Grieneisen and Zhang 2012; Fanelli 2013; Steen et al. 2013; Kuroki and Ukawa 2018), in large part a result of post-publication peer review, for example anonymous and named commentary on PubPeer leading to retractions (Coudert 2019), as well as an increase in retraction policies (Resnik et al. 2015a), has also cast light on the heightened risks of post-retraction citations, including inflated and undeserved citations (Teixeira da Silva and Bornemann-Cimenti 2017). The numbers themselves may appear insignificant, for example, 331 retracted papers from a pool of 1,114,476 papers published in the fields of chemistry, materials science, and chemical engineering in 2017 and 2018 (i.e., a rate of about 3 retractions per 10,000 publications, or 0.03%) (Coudert 2019). Among 16 open access mega journals, PLOS ONE displayed the highest rate of corrections (3.16%), but its rate of retractions was much lower (0.023%), the same as that of Scientific Reports (Erfanmanesh and Teixeira da Silva 2019). In some cases, where the literature has not been sufficiently corrected, possibly as a result of the variation in retraction policies even among leading publishers (Resnik et al. 2015b; Teixeira da Silva and Dobránszki 2017b), citations continue to be assigned to faulty or error-laden papers (Teixeira da Silva and Dobránszki 2018a), further accentuating the need for corrective factors or measures for JBMs and ABMs. This is because a number of highly cited papers continue to be cited even though they have been retracted (Teixeira da Silva and Dobránszki 2017c). Kuroki and Ukawa (2018) noted that about 10% of retractions came from the 1 to 2% of retracting authors who had retracted five or more papers. There is also a body of retractions based on unintentional error (Hosseini et al. 2018), yet retraction is still widely equated with punishment, resulting in stigmatization and a reluctance to correct the literature (Teixeira da Silva and Al-Khatib 2019). Finally, a whole host of new and experimental corrective measures, such as retract and replace, is complicating the publishing landscape (Teixeira da Silva 2017), including how to deal with citations to papers that are retracted and then republished, a topic that merits greater analysis.

Citation impact indicators, including citation-based JBMs and ABMs, may be distorted if the scientific literature and databases on which they are based contain citations to retracted papers. How does a journal accommodate, for example, retractions that are based on fake peer reviews (Qi et al. 2017)? Using false, skewed or distorted indicators in academic rating and evaluation may result in unfair rewards both for journals whose papers are cited after retraction (i.e., via JBMs) and for academics whose retracted papers are cited undeservedly after retraction (i.e., via ABMs) (Teixeira da Silva et al. 2016; Bar-Ilan and Halevi 2017, 2018). High-profile journals with high JIFs have higher rates of retraction.Footnote 1 Therefore, if retracted papers are cited, there is a need not only to correct the downstream literature (Teixeira da Silva 2015) but also to correct the academic metrics that count those citations, so that they regain or reflect their true value.

In this paper, we propose and describe simple models by which citation-based JBMs and ABMs can be corrected by adjusting their equations to account for citations to retracted literature. We propose applying our models to the two most widespread JBMs, the JIF and CiteScore, to the h-index, which is the most widely used ABM for evaluating the scientific achievement of a researcher, as well as to additional JBMs, such as the WoS-based Eigenfactor Score (ES) and Article Influence Score (AIS) and the Scopus-based Raw Impact per Paper (RIP). Moreover, we show the practical use of this correction using the JIF in two actual cases. We caution readers and others who may eventually apply these corrective measures, and/or derivatives or improvements thereof, not to use them as punitive measures or shaming tools, but rather as academic tools purely for the correction of JBMs and ABMs, so as to make citation-based evaluation of the scientific literature fairer by taking retractions and citations to retracted papers into account.

Proposals to correct citation-based journal- and author-based metrics

Correction of two journal-based metrics, JIF and CiteScore

In our recent paper using the JIF as a model metric, we briefly described a prototype concept of how to restore academic metrics that may be distorted by undeserved citations (Teixeira da Silva and Dobránszki 2018b). We introduced the corrected JIF (cJIF) and described its theoretical basis. The correction was based on the use of a corrective factor (c), defined as the ratio of the number of citations to retracted papers to the number of citable published items, as follows:

$$ c = \frac{rc}{n} $$

where rc indicates the number of citations to retracted papers, and n indicates the total number of citable publications in the journal in the previous 2 years. The cJIF is calculated accordingly as cJIF = JIF(1 − c). In extreme cases, if rc equals or exceeds the number of citable published items (n), then 1 − c ≤ 0 and the cJIF drops to 0 or below; such negative values are almost never attained, except in extreme cases of retractions (Table 1). For practical purposes, we recommend that negative values (** in Table 1) be assigned a cJIF of 0 (equivalent to losing the JIF). Separate examples are also provided in Teixeira da Silva and Dobránszki (2018b).
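As a brief hypothetical illustration (all values here are assumed for clarity and are not drawn from any real journal), consider a journal with a JIF of 4.000, n = 200 citable items published in the previous 2 years, and rc = 10 citations to its retracted papers:

$$ c = \frac{10}{200} = 0.05,\quad c\text{JIF} = 4.000 \times (1 - 0.05) = 3.800 $$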

Table 1 Hypothetical outcomes of the cJIF, a measure to correct a journal's JIF when that JIF is partly based on citations to literature that the journal has retracted

In such a case, it could be argued that a journal could or should lose its JIF, even though we affirm throughout this paper that these corrective factors should not be used as punitive factors. This is because, as indicated in Table 1, it is not unreasonable to expect citations to retracted papers to make up a certain fraction of the total citations in the two previous years, which we label as “realistic” in Table 1. Such values could be viewed as bad, poor, non-reproducible or failed science being removed from the main body of reproducible science, which should be cited, leaving behind science that has not yet been challenged, or that was challenged but remained valid or intact (i.e., the cJIF). Consequently, a journal with a large or excessive number of citations (which we refer to as “unrealistic” in Table 1) to retracted papers (i.e., rc) could be viewed as a journal that is not fulfilling its academic or scholarly responsibilities, and is publishing bad, poor, erroneous or irreproducible science, and thus does not deserve to be cited. Even if, in an extreme hypothetical scenario, a journal were discovered in which most (≥ 50–80%) research findings were “false” (Ioannidis 2005; Colquhoun 2014), leading to a high number of retractions, the correction of the JIF would remain valid, since the JIF reflects the total number of citations and not the spread of papers that receive them. It is unclear what distribution citations to retracted papers follow, or whether that distribution is skewed or resembles the distribution of regular citations, i.e., citations to unretracted literature (Blanford 2016),Footnote 2Footnote 3 simply because the body of retracted literature is still small, albeit growing; this has yet to be tested for the retracted literature. We believe it may thus be irrelevant whether, for example, an rc value of 50 derives from one highly cited (i.e., cited 50 times) retracted paper or from 50 retracted papers that are each cited only once, because ultimately the journal’s cJIF will remain the same.

The JIF and CiteScore are calculated in a similar manner. A journal’s JIF for a given year is the ratio of the number of citations received in that year to the number of citable items (only articles and reviews) published by the journal in the previous 2 years (Garfield 1972), as assessed from the Clarivate Analytics WoS database, i.e. “a ratio between citations and recent citable items published”.Footnote 4 CiteScore, which is calculated from the Scopus (Elsevier) database, is the ratio of the number of citations to a journal’s documents in a given year to the number of citable items (all documents) published by the journal in the previous 3 years (Kim and Chung 2018). Therefore, the corrected CiteScore (cCiteScore) is:

$$ c\text{CiteScore} = \text{CiteScore}\,(1 - c) $$

where the corrective factor is interpreted over the same period in which citations are counted under CiteScore’s definition, i.e. the previous 3 years.Footnote 5
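As a comparable hypothetical example (values again assumed for illustration only), a journal with a CiteScore of 5.0, 600 citable documents published in the previous 3 years, and 15 citations to its retracted papers would have:

$$ c = \frac{15}{600} = 0.025,\quad c\text{CiteScore} = 5.0 \times (1 - 0.025) = 4.875 $$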

Extending the definition of the corrective factor (c)

Using the same form of calculation as described for the cJIF and cCiteScore, we suggest a corrective factor for correcting a range of citation-based JBMs. A universal corrective factor (cu) would thus be defined as:

$$ c_{u} = \frac{rc_{i}}{n_{i}} $$

where rci is the number of citations to retracted papers, and ni indicates the total number of citable publications in the journal during the period i considered by the given metric. As for the JIF, in extreme cases, if rci equals or exceeds the number of citable published items (ni), and thus \( 1 - c_{u} \le 0 \), the value of the JBM should be set to 0, i.e., the journal loses its value for that metric.

The cu can be applied to other JBMs used in practice (for examples, see Walters 2017) that are calculated similarly to the JIF or CiteScore, such as the WoS-based 5-year impact factor, immediacy index and cited half-life, and additional Scopus-based JBMs such as the RIP. In these cases, the corrected metric is obtained by a simple correction, i.e. by multiplying the original indicator by (1 − cu), as sketched below.
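A minimal computational sketch of this general correction is shown below. It assumes only the definition of cu given above; the function name and the example values are ours and purely illustrative, and do not correspond to any existing bibliometric tool. Following the treatment of extreme cases described above, the corrected value is floored at 0.

```python
def corrected_jbm(metric: float, retracted_citations: int, citable_items: int) -> float:
    """Apply the universal corrective factor cu = rc_i / n_i to a citation-based JBM.

    The corrected metric is metric * (1 - cu); if cu >= 1 (citations to retracted
    papers equal or exceed the number of citable items), the metric is set to 0.
    """
    if citable_items <= 0:
        raise ValueError("citable_items must be positive")
    cu = retracted_citations / citable_items
    return max(0.0, metric * (1.0 - cu))

# Hypothetical values: a 5-year impact factor of 3.2, with 8 citations to
# retracted papers and 400 citable items in the corresponding window.
print(corrected_jbm(3.2, 8, 400))  # 3.136
```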

Next, we describe our proposal for using cu to correct additional JBMs, such as the ES and AIS. “The Eigenfactor calculation is based on the number of times articles from the journal published in the past 5 years have been cited in the JCR year, but it also considers which journals have contributed these citations so that highly cited journals will influence the network more than lesser cited journals. References from one article in a journal to another article from the same journal are removed, so that Eigenfactors are not influenced by journal self-citation”.Footnote 6

The calculation of the AIS is based on the ES:

$$ \text{AIS} = \frac{0.01 \times \text{ES}}{X} $$

where X is the ratio of the 5-year article count of journal j to the 5-year article count of all journals.Footnote 7

The calculation of both the ES and the AIS begins with the determination of a cross-citation matrix Zij:

$$ Z_{ij} = \frac{\text{cit}_{\text{Year}[X]}}{n_{\text{Years}[X-1:X-5]}} $$

where Zij indicates the citations from journal j in year X to documents published in journal i during years X − 1 to X − 5. After constructing that citation matrix, the ES can be obtained by additional steps that exclude self-citations and involve some normalizations.Footnote 8

Therefore, we propose a correction for both scores (ES and AIS) at the first step of their calculation, i.e. the Zij value should be corrected. The corrected Zij (cZij) value is calculated by applying the cu corrective factor over the five years considered, as follows:

$$ cZ_{ij} = \left( 1 - c_{u} \right)\frac{\text{cit}_{\text{Year}[X]}}{n_{\text{Years}[X-1:X-5]}} = \left( 1 - c_{u} \right)Z_{ij} $$
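As a hypothetical illustration (all values assumed): suppose journal j cited documents of journal i 40 times in year X, journal i published 500 citable documents during years X − 1 to X − 5, and 25 of the citations received by journal i over that window were to papers that it has since retracted; then:

$$ Z_{ij} = \frac{40}{500} = 0.08,\quad c_{u} = \frac{25}{500} = 0.05,\quad cZ_{ij} = (1 - 0.05) \times 0.08 = 0.076 $$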

Since the calculation of the Scopus-based SCImago Journal Rank (SJR) is very similar to that of the ES,Footnote 9 it can be corrected in the same way, i.e., by correcting the number of citations with the cu corrective factor, as described above.

In this way, a set of JBMs can be corrected by incorporating cu, which removes the potential distorting effect of citations to retracted literature.

In contrast to 1 − cu, which is an acute form of correction, we also propose a milder form of correction, and apply it to two 2015 retractions: a highly cited paper (928 citations on the journal website) published in Wiley’s The Plant Journal (Voinnet et al. 2003), and a less cited paper (zero citations on the journal website) published in Elsevier’s Experimental Cell Research (Yin et al. 2015). Citation data for both papers for 2016 and 2017 were drawn from Clarivate Analytics’ Journal Citation Reports. Citations to these two retracted papers were compared (Table 2). That analysis reveals that when the mild form of correction is used to discount the citations accredited to Voinnet et al. (2003), the 2018 JIF of The Plant Journal is reduced by 4.9%, or by 30% when the more acute form of correction, JIF(1 − c) [i.e., 5.726(1 − 0.294)], is applied. Similarly, correcting the 2018 JIF of Experimental Cell Research for the citations accredited to Yin et al. (2015) in 2016 (the paper received no citations in 2017) results in a reduction of 0.00286, or 0.09%. Therefore, highly cited papers alone strongly impact the weighting of a journal’s JIF, and thus merit a stronger correction to compensate for that illegitimate credit. Antonoyiannakis (2019) showed that in over 200 journals, the 2017 JIF was influenced by citations to the most cited paper.
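For clarity, the acute form of the correction for The Plant Journal, using the values stated above, works out as:

$$ c\text{JIF} = 5.726 \times (1 - 0.294) \approx 4.04 $$

i.e., a reduction of roughly 30% relative to the unadjusted JIF of 5.726.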

Table 2 A comparison of how the 2018 JIF of Wiley’s The Plant Journal and Elsevier’s Experimental Cell Research could be adjusted to compensate for citations to papers retracted from those journals in 2015 (Voinnet et al. 2003; Yin et al. 2015, respectively), using 2016 and 2017 Clarivate Analytics’ Journal Citation Reports data

Correction of author-based metrics

The most commonly used ABM in formal academic evaluations, and for measuring the productivity and impact of a researcher, is the h-index, despite its limitations (Hirsch 2005; Alonso et al. 2009; Costas and Bordons 2007; Bornmann and Daniel 2009) and the practical risks of its use (Teixeira da Silva and Dobránszki 2018c, d).

The h-index indicates that an academic has published h papers, each of which has been cited at least h times (Hirsch 2005). Citations to retracted papers, before or after the actual retraction, may result in a biased or skewed h-index score and therefore undeserved rewards, salaries or grants for academics. In some cases, the citations a paper receives after retraction can even exceed those received before it was retracted (Teixeira da Silva and Dobránszki 2017b), such as Fukuhara et al. (2005), which was cited 233 (WoS) or 282 (Scopus) times until its retraction in 2007, but has since been cited 887 (WoS) or 1072 (Scopus) times from 2008 until April 18, 2018.Footnote 10 Therefore, the simplest way to prevent distortion of the h-index is to separate citations to retracted papers from valid ones (see possible exceptions in the “Possible limitations” section below) and to exclude them from the evaluation (e.g., a job interview), although they should be retained, but clearly indicated, in a curriculum vitae (Teixeira da Silva and Tsigaris 2018). Invalid citations (either pre- or post-retraction) should be eliminated from all ABMs that use citation counts as an indicator of scientific output, e.g., from the different derivatives of the h-index (Alonso et al. 2009).
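The sketch below illustrates, with hypothetical citation counts, how an h-index could be recomputed after excluding retracted papers. It is not an existing tool, and the decision as to which citations remain valid is left to the evaluator, as discussed in the “Possible limitations” section below.

```python
def h_index(citation_counts):
    """Return h: the largest h such that h papers each have at least h citations."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical record: (citations, retracted?) per paper.
papers = [(120, False), (45, True), (30, False), (12, False), (9, False), (3, False)]

raw_h = h_index([c for c, _ in papers])                                # counts retracted papers
adjusted_h = h_index([c for c, retracted in papers if not retracted])  # excludes retracted papers

print(raw_h, adjusted_h)  # 5 4 for these assumed counts
```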

A relatively new ABM, the Relative Citation Ratio (RCR), provides a score relative to NIH-funded research, giving relative weighting and serving as a measure of relative influence (Hutchins et al. 2016). Using the free online tool iCite,Footnote 11 the RCR for a high-profile medical researcher, Paolo Macchiarini, was calculated as 335.26 for the period 1995–2019. Macchiarini currently (assessed on March 11, 2018) has seven retractions, two expressions of concern and two corrections.Footnote 12 Using a simple user-controlled method to deselect the seven retracted publications, an adjusted RCR of 303.48 was obtained. Although we do not debate the pros and cons of the RCR, it has several notable limitations: (1) papers, including retractions, published before 1995 cannot be assessed; (2) only PubMed-listed papers are considered, thus skewing the RCR towards medicine and/or PubMed-listed articles; (3) citations made prior to or after a retraction has been issued cannot be assessed separately, and are instead handled in a binary manner (0 or 1). The RCR may have some practical value for assessing funding in a highly competitive field such as biomedical research, and the ability to adjust for retractions thus becomes an important application.

Possible limitations

We wish to point out several possible weaknesses or limitations of our study and/or the potential future use of such corrective measures:

  1. As indicated in the introduction, we urge that these corrective factors, either in the form that we have used or in any derivatives thereof, be used cautiously and responsibly. By caution, we mean that much more testing, and wider acceptance by a broad group of academics, should first occur before such corrective metrics are applied to a large mass of journals or publishers. If the citation of invalid literature is one day accepted to be an ethical issue, corrective measures for JBMs and ABMs could be adopted by organizations with global reach and recognition within the publishing workflow, such as COPE (Committee on Publication Ethics), the ICMJE (International Committee of Medical Journal Editors) or WAME (World Association of Medical Editors). By responsibility, we mean that such corrective measures should not be used by science watchdog groups or anti-science lobbyists to shame academics or the publishing-related infrastructure, but rather purely as a means of correcting an already skewed and/or imperfect set of citation-based metrics in wide use by global academia.

  2. A crucial issue is deciding, preferably by the academic community rather than by ethics policy-makers or for-profit commercial publishers, which citations of the retracted literature constitute “valid” citations and which are invalid. For example, in Teixeira da Silva and Dobránszki (2017b), we discuss a number of highly cited retracted papers across several academic disciplines. Among the total citations to those retracted papers, there are several which we believe are valid citations because they discuss, within a bibliometric context, the wider use of such retracted papers. Therefore, we propose that such bibliometric-based citations to retracted papers be considered “valid” citations to the retracted literature when calculating different JBMs, but never when calculating ABMs. In contrast, a citation to a study that has been proved to be methodologically flawed and was retracted as a result (e.g., Fukuhara et al. 2005), or that may have been retracted due to fraud and/or misconduct, should not be considered a “valid” citation. Furthermore, we also consider that the citation of a retracted paper by its retraction notice should be considered an “invalid” citation, and should not be used to calculate any JBM.Footnote 13

  3. In some cases, citations are awarded unfairly to papers that should have been retracted but were not, such as duplicate or near-duplicate publications that violate formally stated retraction policies, offering an unfair advantage to the authors of the duplicate paper, and an equally unfair advantage to the journal that published it, a phenomenon for which we coined the term “citation inflation” (Teixeira da Silva and Dobránszki 2018a). The corrective factors that we propose in this paper are not applicable to such cases, simply because the duplicate or near-duplicate paper has not been retracted, although they should become applicable once such papers are retracted.

  4. It is plausible that, given the weaknesses in the system of corrections, including retractions (Wiedermann 2018), publishers might not consider the adjustment of JBMs and ABMs to be a priority.