Basic independence axioms for the publication-citation system

a KHBO (Association KU Leuven), Faculty of Engineering Technology, Zeedijk 101, B-8400 Oostende, Belgium
b KU Leuven, Dept. Mathematics, Celestijnenlaan 200B, B-3000 Leuven (Heverlee), Belgium
c Universiteit Antwerpen, IBW, Venusstraat 35, B-2000 Antwerpen, Belgium
d Dept. Information Resources Management, Zhejiang University, Hangzhou 310027, China
e School of Information Management, Nanjing University, Nanjing 210093, China


INTRODUCTION
In recent years it has come to the attention of informetricians that basic indicators used for research evaluation do not always have the properties one might expect. A well-known example is the discussion on ratios of averages versus averages of ratios (in other words, between the original Leiden crown indicator and the Karolinska indicator) (Lundberg, 2007; Opthof & Leydesdorff, 2010; Larivière & Gingras, 2011). Even the well-known impact factor may yield some surprises (Rousseau & Leydesdorff, 2011).
In order to solve these problems and to derive meaningful rankings based on indicators, Bouyssou and Marchant, as well as Waltman and van Eck, initiated a series of studies aimed at clarifying the properties of indicators (Marchant, 2009; Bouyssou & Marchant, 2010, 2011a, b; Waltman & van Eck, 2009, 2012). We will not provide all details but focus on the independence axiom proposed by Bouyssou and Marchant (2011b). For an arbitrary indicator f, this independence axiom can be formulated as follows. Let f(S) ≤ f(T), where f(S), resp. f(T), denotes the value of the indicator f derived from the set of articles written by scientist S, resp. scientist T. If an article with n citations (n a natural number) is now added to both sets, the axiom requires that f(S) ≤ f(T) still holds. Note that the above-mentioned sets of articles can be restricted to a specific publication window, and if the indicator f involves citations, it can likewise be restricted to a specific citation window; see Liang & Rousseau (2009).
We refer to this requirement as the B-M independence axiom. Bouyssou and Marchant (2011b) show that the h-index, the average number of citations per publication and even the median number of citations do not satisfy the B-M independence axiom. So-called scoring rules (defined further on) do satisfy it.
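The failure of the h-index under the B-M axiom is easy to reproduce. The following Python sketch (Python 3.9 or later) uses two small publication lists chosen purely for illustration; they are not taken from Bouyssou and Marchant, and `h_index` and `add_article` are hypothetical helper names.

```python
def h_index(counts: list[int]) -> int:
    """h-index of a publication list given as citation counts."""
    ranked = sorted(counts, reverse=True)
    # In a decreasing list, "citations >= rank" holds exactly on the first h articles.
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def add_article(counts: list[int], n: int) -> list[int]:
    """B-M step: add one article with n citations and keep the list ranked."""
    return sorted(counts + [n], reverse=True)


S, T = [3, 3], [2, 2, 2]
print(h_index(S), h_index(T))      # 2 2
S2, T2 = add_article(S, 3), add_article(T, 3)
print(S2, T2)                      # [3, 3, 3] [3, 2, 2, 2]
print(h_index(S2), h_index(T2))    # 3 2
```

Before the step h(S) = h(T) = 2, so h(S) ≤ h(T) holds; after an article with three citations is added to both sets, h(S) = 3 > h(T) = 2.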
Clearly, the B-M independence axiom is a reasonable requirement. Yet it does not describe how authors' article sets actually grow. The aim of this article is to introduce independence axioms that are, in a sense, more basic (more natural) than Bouyssou and Marchant's, and to investigate which indicators satisfy these new axioms.
Notation. If scientist S has published three articles that are cited two times, once and four times, respectively, then we denote this in decreasing order as S = [4, 2, 1].

Basic Steps and Basic Independence Axioms
We first define two types of basic steps in a scientist's career:

Definition: two basic steps
Step (P): A basic publication step (P) occurs if a scientist publishes a new article, with no citations.
Step (C): A basic citation step (C) occurs if a scientist receives one citation to an already existing publication.
One may say that at any moment in a scientist's career either nothing happens in terms of publications and citations, or one of these basic steps occurs.
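To make the two basic steps concrete, the sketch below represents a publication list, as in the notation above, by its citation counts in decreasing order. The function names `step_p` and `step_c` are illustrative only and not part of any existing library.

```python
def step_p(counts: list[int]) -> list[int]:
    """Basic publication step (P): add a new article with no citations.

    Assumes `counts` is already in decreasing order; an uncited article
    can always be appended at the end without breaking that order.
    """
    return counts + [0]


def step_c(counts: list[int], i: int) -> list[int]:
    """Basic citation step (C): the article at 0-based position i gets one citation."""
    if not 0 <= i < len(counts):
        raise IndexError("step (C) requires an existing publication")
    new = counts.copy()
    new[i] += 1
    return sorted(new, reverse=True)  # restore the decreasing order


S = [4, 2, 1]
print(step_p(S))     # [4, 2, 1, 0]
print(step_c(S, 2))  # [4, 2, 2]
```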

The basic independence axiom for indicator f
If f(S) ≤ f(T) and the same type of basic step occurs to both scientists, then f(S) ≤ f(T) still holds.
Note that this axiom is not a weak version of the B-M independence axiom. In the B-M independence axiom one always adds a new publication. When considering basic steps one may add a publication, but this publication must then have no citations. It is, however, also possible to add a citation without adding a new publication, a case that the B-M axiom does not consider.
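For a given pair of publication lists, the basic independence axiom can be checked by enumerating every outcome of a (P)-step and of a (C)-step. The following Python sketch does this by brute force; `basic_violations` and its helpers are hypothetical names. Because the premise is f(S) ≤ f(T), the case f(S) = f(T) should also be checked with S and T swapped.

```python
from itertools import product
from typing import Callable

Indicator = Callable[[list[int]], float]


def p_successors(counts: list[int]) -> list[list[int]]:
    """Step (P): the only outcome is one extra uncited article."""
    return [counts + [0]]


def c_successors(counts: list[int]) -> list[list[int]]:
    """Step (C): every way in which one existing article gains a citation."""
    out = []
    for i in range(len(counts)):
        new = counts.copy()
        new[i] += 1
        out.append(sorted(new, reverse=True))
    return out


def basic_violations(f: Indicator, S: list[int], T: list[int]):
    """List the (step, S', T') with f(S) <= f(T) but f(S') > f(T')."""
    if f(S) > f(T):
        return []  # the premise of the axiom does not apply
    found = []
    for step, succ in (("P", p_successors), ("C", c_successors)):
        for S2, T2 in product(succ(S), succ(T)):
            if f(S2) > f(T2):
                found.append((step, S2, T2))
    return found


print(basic_violations(sum, [3, 1], [3, 2]))  # []
print(basic_violations(max, [3, 1], [3, 2]))  # [('C', [4, 1], [3, 3])]
```

Here `sum` plays the role of CIT and `max` the role of TOP; the single violation found for TOP corresponds to the discussion of that indicator further on.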
Before we turn our attention to the basic independence axiom, we study the effect of a basic step on some indicators, namely the total number of publications (PUB), the total number of citations, counted as whole numbers (CIT), the h-index (h) (Hirsch, 2005), the average number of citations per publication (AVG) (this includes the journal impact factor), the median number of citations (MED), the number of citations received by the most-cited article (TOP), and the number of citations received by the three most-cited articles (TOP3). If a scientist has fewer than three publications, TOP3 counts the citations received by all the articles he/she did publish. Of course one can also consider TOP5, TOP10 and so on; these are not discussed here, as the results are essentially the same as for TOP3.
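The seven indicators can be computed from a publication list as follows. This is a minimal Python sketch assuming citation counts are non-negative integers; MED uses the standard-library `statistics.median`, and TOP3 simply sums over all articles when there are fewer than three, as stated above. The function name `indicators` is hypothetical.

```python
from statistics import median


def indicators(counts: list[int]) -> dict[str, float]:
    """PUB, CIT, h, AVG, MED, TOP and TOP3 for one publication list."""
    c = sorted(counts, reverse=True)
    if not c:
        # Without publications only PUB = 0 and CIT = 0 are defined.
        raise ValueError("indicators other than PUB and CIT are undefined")
    return {
        "PUB": len(c),
        "CIT": sum(c),
        "h": sum(1 for rank, x in enumerate(c, start=1) if x >= rank),
        "AVG": sum(c) / len(c),
        "MED": median(c),
        "TOP": c[0],
        "TOP3": sum(c[:3]),  # with fewer than 3 articles, all of them are counted
    }


print(indicators([4, 2, 1]))
```

For S = [4, 2, 1] this gives PUB = 3, CIT = 7, h = 2, AVG = 7/3, MED = 2, TOP = 4 and TOP3 = 7.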
There is, however, an exceptional case we have to deal with first. If a scientist has no publications, then PUB = 0, CIT = 0 and all other indicators are undefined. If this scientist now publishes one article (step (P)), then PUB = 1, while CIT, h, AVG, MED, TOP and TOP3 are all equal to zero. For a scientist without publications, step (C) cannot occur. From now on we always assume that a scientist has at least one publication, and we discuss the influence of the two basic steps under this assumption.

1) A basic publication step (P)
a) The total number of publications (PUB) increases by one.
b) The total number of citations (CIT) stays the same.
c) The h-index. Adding one uncited publication never changes the h-index.
d) The average number of citations per publication (AVG). If the average was AVG = CIT/PUB, then it decreases to CIT/(PUB + 1), unless CIT = 0, in which case AVG stays zero.
e) The median number of citations (MED). We recall that the median of a finite sequence $(x_1, x_2, \ldots, x_n)$, ranked in decreasing order, is defined as:
$$\mathrm{MED} = \begin{cases} x_{(n+1)/2} & \text{if } n \text{ is odd},\\[4pt] \dfrac{x_{n/2} + x_{n/2+1}}{2} & \text{if } n \text{ is even}. \end{cases}$$
As step (P) does not add any citation, the new article can be placed at the end of the sequence as $x_{n+1} = 0$. Hence, if n was odd, the length of the sequence is now even and MED becomes
$$\frac{x_{(n+1)/2} + x_{(n+3)/2}}{2};$$
if n was even, the length of the sequence is now odd and MED becomes $x_{n/2+1}$.
f) The highest number of received citations (TOP) is never changed by a basic step (P).
g) The number of citations received by the three most-cited articles (TOP3) is never changed by a basic step (P).
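The closed-form expressions for MED after a (P)-step can be checked against a direct median computation. In this Python sketch (the function name `med_after_p` is hypothetical), the new uncited article is appended as x_{n+1} = 0, and the 1-based indices of the formulas are translated to Python's 0-based lists; the loop tests all decreasing lists of up to six articles with at most five citations each.

```python
from itertools import combinations_with_replacement
from statistics import median


def med_after_p(x: list[int]) -> float:
    """MED after a (P)-step via the closed form; x is decreasing, n = len(x) >= 1."""
    n = len(x)
    y = x + [0]  # y[n] is x_{n+1} = 0, the new uncited article (0-based index n)
    if n % 2 == 1:
        # n odd: the new length n + 1 is even -> (x_{(n+1)/2} + x_{(n+3)/2}) / 2
        return (y[(n + 1) // 2 - 1] + y[(n + 3) // 2 - 1]) / 2
    # n even: the new length n + 1 is odd -> x_{n/2+1}, i.e. 0-based index n/2
    return y[n // 2]


# Compare with a direct median computation on all small decreasing lists.
for n in range(1, 7):
    for combo in combinations_with_replacement(range(5, -1, -1), n):
        x = list(combo)
        assert med_after_p(x) == median(x + [0]), x
print("closed form agrees with statistics.median")
```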

2) A basic citation step (C)
a) The total number of publications (PUB) stays the same.
b) The total number of citations (CIT) increases by one.
c) The h-index. The h-index may stay the same, but it may also increase by 1. Consider S = [2, 1, 0], with h-index equal to 1. If the second article receives one more citation, then the h-index becomes 2. If, however, the first or the last article receives one more citation, then the h-index stays equal to 1.
d) The average number of citations per publication (AVG). As PUB stays the same and CIT increases by one, AVG increases by 1/PUB, from CIT/PUB to (CIT + 1)/PUB.
e) The median number of citations (MED).
An article that receives one more citation may leave the median unchanged, increase it by 0.5, or increase it by 1. We provide examples. If S = [2, 1, 0], with MED(S) = 1, and the first article receives one more citation, then the median stays the same. If the second article receives one more citation, then the median becomes 2 (an increase by 1). If T = [2, 1, 1, 0], with MED(T) = 1, and the second article receives one more citation, then MED(T) becomes 1.5, an increase by 0.5.
f) The highest number of received citations (TOP). This indicator stays the same, unless it is the most-cited article that receives the extra citation; in that case it increases by one.
g) TOP3. Again this indicator stays the same, unless one of the three most-cited articles receives the extra citation; in that case TOP3 increases by one.
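The examples for step (C) can be reproduced directly. The Python sketch below (helper names hypothetical) gives one extra citation to each article of S = [2, 1, 0] in turn and reports the resulting h-index and median, and then repeats the median example for T = [2, 1, 1, 0].

```python
from statistics import median


def h_index(counts: list[int]) -> int:
    ranked = sorted(counts, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def cite(counts: list[int], i: int) -> list[int]:
    """Step (C): the article at 0-based position i receives one citation."""
    new = counts.copy()
    new[i] += 1
    return sorted(new, reverse=True)


S = [2, 1, 0]
print("S:", S, "h =", h_index(S), "MED =", median(S))
for i in range(len(S)):
    S2 = cite(S, i)
    print(f"article {i + 1} cited:", S2, "h =", h_index(S2), "MED =", median(S2))

T = [2, 1, 1, 0]
print(median(T), median(cite(T, 1)))  # 1.0 1.5
```

Only a citation to the second article of S raises the h-index (to 2) and the median (to 2); citing the first or the third article leaves both at 1.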

A Discussion of the Basic Independence Axiom and the Indicators PUB, CIT, AVG, MED, TOP and TOP3

1) The total number of publications (PUB)
If PUB(S) = PUB(T) and step (P) occurs for both S and T, then trivially PUB(S) is still equal to PUB(T). Similarly, if PUB(S) < PUB(T), this inequality is preserved by a (P)-step.
Step (C) has no influence on the number of publications; hence PUB always satisfies the basic independence axiom.

2) The total number of citations (CIT)
If CIT(S) = CIT(T) or CIT(S) < CIT(T) and step (P) occurs, then trivially both total numbers of citations stay the same, and hence so does their equality or inequality.
Similarly, a step (C) increases both total numbers of citations by one, so the equality or inequality between the total numbers of citations of scientists S and T is preserved.

5) The number of citations received by the most-cited article
If TOP(S) = TOP(T) and step (P) occurs, then TOP(S) is still equal to TOP(T). The same trivially holds for TOP(S) < TOP(T).
If TOP(S) = TOP(T) and step (C) occurs, then it is possible that TOP(S) > TOP(T), namely if the most-cited article of S receives one more citation, while the most-cited article of T does not. If TOP(S) < TOP(T), then the only possible change is that TOP(S) becomes equal to TOP(T), namely if TOP(S) was equal to TOP(T) − 1 and the most-cited article of S receives the extra citation, while the most-cited article of T does not. In any case, if TOP(S) < TOP(T) there is no violation of the basic independence axiom.
If TOP3(S) = TOP3(T) and step (C) occurs, then it is possible that TOP3(S) > TOP3(T), namely if one of the three most-cited articles of S receives one more citation, while none of the three most-cited articles of T does. If TOP3(S) < TOP3(T), then step (C) leads either to TOP3(S) = TOP3(T), or the inequality TOP3(S) < TOP3(T) remains true. In either case there is no violation of the basic independence axiom.
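The conclusions for PUB, CIT, TOP and TOP3 under step (C) can also be confirmed by exhaustive search over small publication lists. The Python sketch below is a brute-force check, not a proof: it only covers lists with at most four articles and at most three citations per article (bounds chosen for illustration), and it counts the pairs (S, T) with f(S) ≤ f(T) for which some (C)-step yields f(S) > f(T). All function names are hypothetical.

```python
from itertools import combinations_with_replacement, product


def ranked_lists(max_len: int, max_cit: int):
    """All decreasing citation lists with 1..max_len articles and at most max_cit citations each."""
    values = range(max_cit, -1, -1)  # decreasing input -> decreasing tuples
    for k in range(1, max_len + 1):
        for combo in combinations_with_replacement(values, k):
            yield list(combo)


def c_successors(counts: list[int]):
    """Every outcome of a basic citation step (C)."""
    for i in range(len(counts)):
        new = counts.copy()
        new[i] += 1
        yield sorted(new, reverse=True)


def top3(counts: list[int]) -> int:
    return sum(sorted(counts, reverse=True)[:3])


def count_c_violations(f, max_len: int = 4, max_cit: int = 3) -> int:
    """Number of pairs (S, T) with f(S) <= f(T) for which some (C)-step gives f(S) > f(T)."""
    lists = list(ranked_lists(max_len, max_cit))
    bad = 0
    for S, T in product(lists, repeat=2):
        if f(S) <= f(T) and any(
            f(S2) > f(T2) for S2 in c_successors(S) for T2 in c_successors(T)
        ):
            bad += 1
    return bad


for name, f in {"PUB": len, "CIT": sum, "TOP": max, "TOP3": top3}.items():
    print(name, count_c_violations(f))
```

PUB and CIT yield no violating pair, while TOP and TOP3 do. For TOP3, for instance, S = [3] and T = [1, 1, 1, 0] have TOP3(S) = TOP3(T) = 3; after S's article and T's fourth article each receive a citation, TOP3(S) = 4 > TOP3(T) = 3.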
Because we would like TOP and TOP3 to satisfy our basic independence axiom (or at least some version of it), we consider weaker versions of the axiom.

Two Weaker Axioms
For the reason mentioned above, it may be useful to weaken the part of the axiom related to (C). We propose two weaker forms.
Axiom WCR: a weak (C) rank form. If f(S) ≤ f(T) and either a new, uncited article is added to both S and T, or an article of S and an article of T that have the same rank each receive one citation (where, for this axiom, publications with the same number of citations are considered to have the same rank), then f(S) ≤ f(T) still holds.
Axiom WCS: a weak (C) size form. If f(S) ≤ f(T) and either a new, uncited article is added to both S and T, or an article of S and an article of T that have the same number of citations each receive one citation, then f(S) ≤ f(T) still holds.
Clearly, if an indicator f satisfies the basic independence axiom, then it also satisfies the weaker forms, but the converse does not hold.
The indicators TOP and TOP3 do not satisfy the (C)-part of the basic independence axiom, but TOP does satisfy the two weaker axioms. TOP3 does not satisfy WCR, although it trivially satisfies WCS. The following example shows that TOP3 does not satisfy WCR. Let S = [6, 5, 1, 0] and T = [5, 4, 3, 3]; then TOP3(S) = TOP3(T) = 12. Adding one citation to the fourth-ranked article of each yields S = [6, 5, 1, 1] and T = [5, 4, 4, 3], so that TOP3(S) = 12 and TOP3(T) = 13. Hence TOP3(T) > TOP3(S), although TOP3(T) ≤ TOP3(S) held before the step, contradicting the WCR requirement. We also note that if S and T collaborated on the article that received one extra citation, then we are automatically in the case where the size form of the weak axiom applies.
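The arithmetic of this counterexample can be verified with a few lines of Python. As in the example, the sketch (helper names hypothetical) treats "rank" as the position of an article in the decreasingly ordered list.

```python
def top3(counts: list[int]) -> int:
    return sum(sorted(counts, reverse=True)[:3])


def cite_rank(counts: list[int], rank: int) -> list[int]:
    """Give one citation to the article at the given 1-based position in the ranked list."""
    ranked = sorted(counts, reverse=True)
    ranked[rank - 1] += 1
    return sorted(ranked, reverse=True)


S, T = [6, 5, 1, 0], [5, 4, 3, 3]
print(top3(S), top3(T))    # 12 12
S2, T2 = cite_rank(S, 4), cite_rank(T, 4)
print(S2, T2)              # [6, 5, 1, 1] [5, 4, 4, 3]
print(top3(S2), top3(T2))  # 12 13
```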
As the requirement related to the (P)-part has not changed, every indicator that fails the (P)-part of the basic independence axiom also fails the weaker axioms. This is the case for AVG and MED.
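A small example shows how AVG and MED fail the (P)-part. In this Python sketch the lists S = [2] and T = [2, 2] are our own illustration: both indicators are equal for S and T, so in particular f(T) ≤ f(S), but after one uncited article is added to each list, f(T) > f(S).

```python
from statistics import median


def avg(counts: list[int]) -> float:
    return sum(counts) / len(counts)


S, T = [2], [2, 2]
S2, T2 = S + [0], T + [0]  # step (P): one uncited article for each scientist

for name, f in (("AVG", avg), ("MED", median)):
    before_ok = f(T) <= f(S)   # premise of the axiom (with the roles of S and T swapped)
    after_ok = f(T2) <= f(S2)  # required conclusion
    print(name, f(T), f(S), "->", f(T2), f(S2), "violation:", before_ok and not after_ok)
```

After the step AVG(S) = 1 and AVG(T) = 4/3, while MED(S) = 1 and MED(T) = 2, so both indicators report a violation.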

A Discussion of the Basic Independence Axiom and the h-index
In order to discuss this aspect we introduce the notion of an h-critical publication.

Definition: an h-critical publication
A publication is h-critical if the h-index increases when this publication receives one more citation.
Such an increase is necessarily an increase by exactly one. Moreover, an h-critical publication always has exactly h citations, since a single extra citation changes the number of articles with at least h+1 citations only if the cited article had exactly h citations. In practice one might expect more h-critical publications for junior scientists than for senior ones, as young scientists in general may have many articles with similarly low numbers of citations. Senior scientists are probably less likely to have h-critical publications in their publication set.
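The definition suggests a direct way to find h-critical publications: give each article one extra citation in turn and check whether the h-index increases. The Python sketch below does exactly that (function names hypothetical); positions are 1-based ranks in the decreasingly ordered list.

```python
def h_index(counts: list[int]) -> int:
    ranked = sorted(counts, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def h_critical_ranks(counts: list[int]) -> list[int]:
    """1-based ranks of the h-critical publications, found directly from the definition."""
    ranked = sorted(counts, reverse=True)
    h = h_index(ranked)
    critical = []
    for i in range(len(ranked)):
        bumped = ranked.copy()
        bumped[i] += 1  # this article receives one more citation
        if h_index(bumped) > h:
            critical.append(i + 1)
    return critical


print(h_critical_ranks([2, 1, 0]))     # [2]
print(h_critical_ranks([3, 3, 2, 2]))  # [3, 4]
print(h_critical_ranks([2, 2, 2]))     # []
```

In the first two lists the h-critical articles have exactly h citations (h = 1 and h = 2, respectively), and [2, 2, 2] shows that a list need not contain any h-critical publication.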

Proposition
An actor's publication list (with h-index h) contains h-critical publications if and only if the following two requirements are satisfied: (1) no article in the h-core has exactly h citations; (2) some article in the h-tail has exactly h citations.
Proof. We first note that h-critical publications never belong to the h-core. If the article ranked h+1 has h citations and happens to be the article that receives the extra citation, then it enters the h-core. Yet, if the article ranked h has only h citations, then it drops out of the h-core and the h-index stays the same. Only if the article ranked h has at least h+1 citations does the h-index increase by one. Further, if the article ranked h+1 has strictly fewer than h citations, then no article of the h-tail can reach h+1 citations through a single extra citation, so the h-index cannot increase to h+1.
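The proposition can also be checked against the definition by brute force on small lists. In this Python sketch (names hypothetical), `has_h_critical` applies the definition directly, while `proposition_predicts` tests conditions (1) and (2), taking the h-core to be the h highest-ranked articles. The search covers all decreasing lists of up to six articles with at most six citations each, which supports but does not replace the proof.

```python
from itertools import combinations_with_replacement


def h_index(counts: list[int]) -> int:
    ranked = sorted(counts, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def has_h_critical(ranked: list[int]) -> bool:
    """Definition: some single extra citation raises the h-index."""
    h = h_index(ranked)
    return any(
        h_index(ranked[:i] + [ranked[i] + 1] + ranked[i + 1:]) > h
        for i in range(len(ranked))
    )


def proposition_predicts(ranked: list[int]) -> bool:
    """Conditions (1) and (2), with the h-core taken as the h highest-ranked articles."""
    h = h_index(ranked)
    core, tail = ranked[:h], ranked[h:]
    return h not in core and h in tail


def counterexamples(max_len: int = 6, max_cit: int = 6) -> list[list[int]]:
    found = []
    for n in range(1, max_len + 1):
        for combo in combinations_with_replacement(range(max_cit, -1, -1), n):
            ranked = list(combo)  # decreasing order
            if has_h_critical(ranked) != proposition_predicts(ranked):
                found.append(ranked)
    return found


print(counterexamples())  # []
```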