Bounds and Inequalities Relating h-Index, g-Index, e-Index and Generalized Impact Factor

Finding relationships among different indices such as h-index, g-index, e-index, and generalized impact factor is a challenging task. In this paper, we describe some bounds and inequalities relating h-index, g-index, e-index, and generalized impact factor. We derive the bounds and inequalities relating these indexing parameters from their basic definitions and without assuming any continuous model to be followed by any of them.


Introduction
A lot of research is carried out by people working in different areas. Sometimes, one needs to evaluate the quality of the research produced by individual authors or groups of authors. The quality of research produced by authors is generally evaluated in terms of a ranking parameter, which is typically based on the number of citations received by the papers produced by the authors. Many ranking parameters have been presented in the literature for evaluating the quality of research, such as the h-index [1], g-index [2], e-index [3], and impact factor [4]. In the long term, the impact factor becomes the average number of citations per paper. This long-term impact factor is termed the generalized impact factor.
Once one has computed an index for evaluating the quality of research, one would like to get an indication of the other types of indices. To have such an indication, one needs to know how an index is related to the other indices. The relationships among h-index, g-index, and e-index are described in [5]. However, in [5], the indices are assumed to follow a continuous distribution. A relation between h-index and impact factor is described in [6] using a power law model called Lotka's model.
In this paper, we describe bounds for the h-index and g-index in terms of the other indices and the generalized impact factor. We derive these bounds from the very basic definitions of the indices and the generalized impact factor without assuming any model or any continuous distribution to be followed by any of these indices. We verify the theorems for citation records of five Price Medalists.
Also, we compare the values of h-index with those obtained using the Schubert-Glanzel formula and Egghe-Liang-Rousseau's power law model. Further, we discuss the tightness of the upper bound on g-index for Price Medalists.
In what follows, we present an analysis of the indices and the generalized impact factor.

Analysis
In this section, we wish to analyze the relationships among the indices and the generalized impact factor. To do so, we first present an overview of the indices and the generalized impact factor, and then we shall analyze the relationships among them.

Overview of Indices and Impact Factor
In this subsection, we briefly define the generalized impact factor and different types of indices.
The h-Index. Suppose the papers are arranged in descending order of the number of citations. Let $c_i$ be the number of citations of the paper numbered $i$, where $1 \le i \le P$ and $P$ is the number of papers. The h-index [1], when papers are arranged in descending order of their citations, can be defined as follows.
$$h = \max\{\, i : c_i \ge i \,\}. \tag{1}$$
By definition, the h-index is the largest number, $h$, such that the first $h$ papers, arranged in decreasing order of citations, each have at least $h$ citations.
The g-Index. According to the definition of g-index, if the papers are arranged in descending order of their number of citations, $g$ is the largest number such that the sum of the citations of the first $g$ papers is at least $g^2$. In other words, when papers are arranged in descending order of their citations, the g-index can be defined as follows.
$$g = \max\Big\{\, i : \sum_{j=1}^{i} c_j \ge i^2 \,\Big\}. \tag{2}$$
Note that the g-index is the largest number $i$ such that $\sum_{j=1}^{i} c_j \ge i^2$.
The e-Index. The e-index is defined in [3] to serve as a complement for the h-index. The definition of e-index is as follows.
$$e = \sqrt{\sum_{i=1}^{h} (c_i - h)}. \tag{3}$$
Alternatively, (3) can be written as follows.
$$e^2 = \sum_{i=1}^{h} c_i - h^2. \tag{4}$$
Remark: In the definitions of h-index (as given by (1)) and g-index (as given by (2)), we have intentionally ignored the time $T$ at which their values are considered. This keeps the definitions simple, and there is no loss of generality as far as the discussion in this work is concerned. For precise definitions of the indices incorporating the time, the reader is referred to [7]. The same is true for the e-index. Secondly, while defining the indices and the impact factor, we assume that the number of papers $P \ge 1$ and the number of citations received by the $i$th paper $c_i \ge 1$. This is also true for the theorems proved in this paper.
Generalized Impact Factor. Let $c_i \ge 1$ be the number of citations of the paper numbered $i$, and let $P \ge 1$ be the number of papers. The generalized impact factor is defined as follows.
$$I_f = \frac{1}{P} \sum_{i=1}^{P} c_i. \tag{5}$$
Note that the generalized impact factor is simply called impact factor in [6]. We have added the prefix "generalized" to differentiate it from the impact factor that uses a time window constraint. In fact, the impact factor given by (5) (and also that given in [6]) denotes the average number of citations received per paper.
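The definitions in this subsection translate directly into a few lines of code. The following Python sketch is illustrative only: the function names and the toy citation list are assumptions made for this example, and the g-index is restricted to at most $P$ (the number of papers), which is one reading of definition (2).

```python
import math


def h_index(citations):
    """Largest h such that the h most-cited papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    # In descending order, c_i >= i holds exactly for ranks 1..h.
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def g_index(citations):
    """Largest g (at most P) such that the g most-cited papers total at least g^2 citations."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g


def e_index(citations):
    """Square root of the citations in the h-core in excess of h^2."""
    ranked = sorted(citations, reverse=True)
    h = h_index(ranked)
    return math.sqrt(sum(ranked[:h]) - h * h)


def generalized_impact_factor(citations):
    """Average number of citations per paper."""
    return sum(citations) / len(citations)


# Hypothetical citation record used only for illustration.
toy = [10, 8, 5, 4, 3]
print(h_index(toy), g_index(toy), e_index(toy) ** 2, generalized_impact_factor(toy))
```

For the toy record, the four most-cited papers each have at least four citations, so the h-index is 4; all five papers together have 30 citations, which is at least 25, so the g-index is 5.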

Analysis of Relationships
In this subsection, we describe how indices and generalized impact factor are related to one another.
Impact Factor, h-Index and e-Index. We state the following theorem that relates these parameters.
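Theorem 1. Let $P \ge 1$ be the number of papers and let $c_i \ge 1$ be the number of citations received by the $i$th paper. The h-index, e-index and impact factor are related by the following inequality.
$$h \ge \left\lfloor I_f - \frac{e^2}{P} \right\rfloor. \tag{6}$$
Proof. Using (5), the total number of citations can be written as follows.
$$\sum_{i=1}^{P} c_i = P\, I_f. \tag{7}$$
The citations appearing in the L.H.S. of (7) can be broken into two parts, one from $1$ to $h$ and the other from $h+1$ to $P$, as given below.
$$\sum_{i=1}^{h} c_i + \sum_{i=h+1}^{P} c_i = P\, I_f. \tag{8}$$
Using (4) and (8), we have,
$$e^2 + h^2 + \sum_{i=h+1}^{P} c_i = P\, I_f. \tag{9}$$
Now, we have,
$$c_{h+1} \le h, \quad c_{h+2} \le h, \quad \ldots, \quad c_P \le h. \tag{10}$$
Therefore, we have,
$$\sum_{i=h+1}^{P} c_i \le (P - h)\, h. \tag{11}$$
Using (9) and (11), we have,
$$P\, I_f \le e^2 + h^2 + (P - h)\, h = e^2 + P h.$$
In other words, we have,
$$h \ge I_f - \frac{e^2}{P}.$$
Since $h$ is a whole number, we can write,
$$h \ge \left\lfloor I_f - \frac{e^2}{P} \right\rfloor.$$
In other words, we can say that
$$h = \Omega\!\left( \left\lfloor I_f - \frac{e^2}{P} \right\rfloor \right),$$
where $\Omega$ denotes the lower bound. For definitions of different types of bounds, we refer the readers to [8].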
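As a numerical sanity check of Theorem 1, the sketch below evaluates the bound in the form $h \ge \lfloor I_f - e^2/P \rfloor$, which is what the proof yields when the citations are split at rank $h$ and each remaining paper is bounded by $h$. The function names, the random test data, and the toy record are assumptions made for illustration; the check is not part of the original verification.

```python
import random


def h_index(ranked):
    """h-index of a citation list already sorted in descending order."""
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def theorem1_lower_bound(citations):
    """floor(I_f - e^2 / P), computed with integers to avoid rounding issues."""
    ranked = sorted(citations, reverse=True)
    P = len(ranked)
    h = h_index(ranked)
    e_squared = sum(ranked[:h]) - h * h
    # I_f - e^2/P equals (total - e^2) / P, so floor division gives the floor exactly.
    return (sum(ranked) - e_squared) // P


def check_theorem1(trials=1000, seed=0):
    """Return True if h >= bound holds on every randomly generated record."""
    rng = random.Random(seed)
    for _ in range(trials):
        citations = [rng.randint(1, 200) for _ in range(rng.randint(1, 60))]
        h = h_index(sorted(citations, reverse=True))
        if h < theorem1_lower_bound(citations):
            return False
    return True


# Hypothetical record: h = 4, e^2 = 11, I_f = 6, so the bound is floor(6 - 11/5) = 3.
print(theorem1_lower_bound([10, 8, 5, 4, 3]), check_theorem1())
```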
The g-Index, h-Index, and e-Index. We state the following theorem that provides an inequality relating these indices.
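Theorem 2. The h-index, g-index, and e-index are related by the following inequality.
$$h \ge \left\lfloor g - \frac{e^2}{g} \right\rfloor.$$
Proof. Let the papers be arranged in descending order of their citations. From the definition of g-index, as given in (2), we have,
$$\sum_{j=1}^{i} c_j \ge i^2.$$
At $i = g$, we have,
$$\sum_{i=1}^{g} c_i \ge g^2. \tag{17}$$
Breaking the number of citations in the L.H.S. of (17) into parts, we have,
$$\sum_{i=1}^{h} c_i + \sum_{i=h+1}^{g} c_i \ge g^2. \tag{18}$$
Using (4) and (18), we have,
$$e^2 + h^2 + \sum_{i=h+1}^{g} c_i \ge g^2. \tag{19}$$
In other words,
$$\sum_{i=h+1}^{g} c_i \ge g^2 - h^2 - e^2. \tag{20}$$
Now, we have,
$$c_{h+1} \le h, \quad c_{h+2} \le h, \quad \ldots, \quad c_g \le h. \tag{21}$$
Therefore, we have,
$$\sum_{i=h+1}^{g} c_i \le (g - h)\, h. \tag{22}$$
Using (20) and (22), we have,
$$(g - h)\, h \ge g^2 - h^2 - e^2. \tag{23}$$
Or,
$$g h \ge g^2 - e^2. \tag{24}$$
Rearranging (24), we have,
$$h \ge g - \frac{e^2}{g}. \tag{25}$$
Since the h-index is a whole number, (25) can be written as follows.
$$h \ge \left\lfloor g - \frac{e^2}{g} \right\rfloor.$$
In other words, Theorem 2 provides a lower bound for h-index in terms of the g-index and the e-index.
$$h = \Omega\!\left( \left\lfloor g - \frac{e^2}{g} \right\rfloor \right).$$
We have the following lemma that provides a bound for the g-index.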
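Lemma 1. An upper bound for g-index is as follows.
$$g \le \left\lfloor h + \frac{e^2}{h} \right\rfloor.$$
Proof. From (20), we have,
$$\sum_{i=h+1}^{g} c_i \ge g^2 - h^2 - e^2.$$
In (21), if we put $g$ at the R.H.S., i.e., $c_i \le g$ for $h+1 \le i \le g$, we get,
$$\sum_{i=h+1}^{g} c_i \le (g - h)\, g.$$
Therefore, from (20), we have,
$$(g - h)\, g \ge g^2 - h^2 - e^2.$$
Or,
$$-g h \ge -h^2 - e^2.$$
Or,
$$g h \le h^2 + e^2.$$
This gives us,
$$g \le h + \frac{e^2}{h}.$$
Again, since $g$ is a whole number, we can write,
$$g \le \left\lfloor h + \frac{e^2}{h} \right\rfloor.$$
Alternatively,
$$g = O\!\left( \left\lfloor h + \frac{e^2}{h} \right\rfloor \right).$$
We now prove another theorem that provides an upper bound for the g-index in terms of h-index and e-index.
Theorem 3. An upper bound for g-index in terms of h-index and e-index is as follows.
$$g \le h + e.$$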
Proof. Using (24), we have,
$$g^2 - h g - e^2 \le 0.$$
This resembles the quadratic equation $ax^2 + bx + c = 0$, whose roots are as follows.
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$
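For this quadratic, the roots and the remaining steps of the argument can be sketched as below. This is a reconstruction, not necessarily the authors' exact derivation: it assumes that the inequality in question is $g^2 - hg - e^2 \le 0$ (which follows from $\sum_{i=1}^{g} c_i \ge g^2$, the bound $c_i \le h$ for $i > h$, and $e^2 = \sum_{i=1}^{h} c_i - h^2$) and that the target bound is $g \le h + e$.

```latex
\begin{align*}
g^2 - hg - e^2 &\le 0
  && \text{(quadratic in } g \text{ with } a = 1,\ b = -h,\ c = -e^2\text{)} \\
g_{\pm} &= \frac{h \pm \sqrt{h^2 + 4e^2}}{2}
  && \text{(roots of } g^2 - hg - e^2 = 0\text{)} \\
g &\le \frac{h + \sqrt{h^2 + 4e^2}}{2}
  && \text{(the inequality holds only between the two roots)} \\
\sqrt{h^2 + 4e^2} &\le h + 2e
  && \text{(since } (h + 2e)^2 = h^2 + 4he + 4e^2 \ge h^2 + 4e^2\text{)} \\
g &\le \frac{h + (h + 2e)}{2} = h + e.
\end{align*}
```

The last step trades tightness for simplicity: the larger root itself is a sharper bound than $h + e$ whenever $h > 0$ and $e > 0$.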
The h-Index, g-Index, and Impact Factor. We state the following theorem that relates these parameters.
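Theorem 4. The generalized impact factor, g-index, and h-index are related as per the following inequality.
$$I_f \le \frac{1}{P} \left[ \sum_{i=1}^{g} c_i + (P - g)\, h \right].$$
Proof. From (5), we have,
$$\sum_{i=1}^{P} c_i = P\, I_f. \tag{42}$$
Breaking the number of citations in the L.H.S. of (42), we have,
$$\sum_{i=1}^{g} c_i + \sum_{i=g+1}^{P} c_i = P\, I_f. \tag{43}$$
Now, we have,
$$c_{g+1} \le h, \quad c_{g+2} \le h, \quad \ldots, \quad c_P \le h.$$
Therefore, we have,
$$\sum_{i=g+1}^{P} c_i \le (P - g)\, h. \tag{44}$$
Using (43) and (44), we have,
$$P\, I_f \le \sum_{i=1}^{g} c_i + (P - g)\, h.$$
Or,
$$I_f \le \frac{1}{P} \left[ \sum_{i=1}^{g} c_i + (P - g)\, h \right].$$
In other words, Theorem 4 states an upper bound for the generalized impact factor which is as follows.
$$I_f = O\!\left( \frac{1}{P} \left[ \sum_{i=1}^{g} c_i + (P - g)\, h \right] \right).$$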
Utility of Bounds. Lower and upper bounds are very common in Computer Science and Engineering. They are useful when exact expressions either cannot be found or are difficult to derive. Using a bound, one can say that the parameter lies above it (for a lower bound) or below it (for an upper bound). To the best of our knowledge, the exact relationships among the h-index, g-index, e-index, and impact factor have not been described by any researcher to date. In the absence of such exact expressions, we suggest using the lower and upper bounds, and this forms the motivation behind the derivation of the bounds and inequalities presented in this paper. In our view, given the values of some indexing parameters, one can tell where the value of another indexing parameter lies without going through the whole citation database (of an author, a journal, an institution, a country or a region).
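The inequalities relating the g-index to the other parameters can be sanity-checked numerically. The sketch below tests them in the cleared-of-fractions forms $gh \ge g^2 - e^2$ (Theorem 2), $gh \le h^2 + e^2$ (Lemma 1), $(g - h)^2 \le e^2$ (equivalent to $g \le h + e$ for Theorem 3, since $g \ge h$), and $\sum_{i=1}^{P} c_i \le \sum_{i=1}^{g} c_i + (P - g)h$ (Theorem 4). These forms, the function names, and the random data are assumptions of this illustration; the g-index is taken to be at most $P$.

```python
import random


def h_index(ranked):
    """h-index of a citation list already sorted in descending order."""
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def g_index(ranked):
    """g-index (at most P) of a citation list already sorted in descending order."""
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g


def check_bounds(citations):
    """Return one boolean per inequality, using exact integer arithmetic."""
    ranked = sorted(citations, reverse=True)
    P = len(ranked)
    h, g = h_index(ranked), g_index(ranked)
    e2 = sum(ranked[:h]) - h * h  # e^2 = (citations of the top h papers) - h^2
    top_g = sum(ranked[:g])
    return {
        "theorem2": g * h >= g * g - e2,                  # h >= g - e^2/g
        "lemma1": g * h <= h * h + e2,                    # g <= h + e^2/h
        "theorem3": (g - h) ** 2 <= e2,                   # g <= h + e
        "theorem4": sum(ranked) <= top_g + (P - g) * h,   # bound on I_f, times P
    }


def check_random(trials=1000, seed=0):
    """Return True if all four inequalities hold on every random record."""
    rng = random.Random(seed)
    for _ in range(trials):
        citations = [rng.randint(1, 200) for _ in range(rng.randint(1, 60))]
        if not all(check_bounds(citations).values()):
            return False
    return True


print(check_bounds([10, 8, 5, 4, 3]), check_random())
```

Working with integers avoids floating-point rounding when a bound is met with equality, as happens for Theorem 4 whenever $g = P$.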

Existing Relationship Models
In this subsection, we briefly describe the existing models that relate some of the indices.
Schubert-Glanzel Formula. Let $P$ be the number of papers referenced and $C$ be the number of citations. According to the Schubert-Glanzel model [9], the h-index is given by the following expression.
$$h = c\, C^{2/3} P^{-1/3}, \tag{48}$$
where $c$ is a proportionality constant. Another form of the Schubert-Glanzel formula is,
$$h = c\, P^{1/3} I_f^{2/3}, \tag{49}$$
which is equivalent to that given by (48); however, (49) is in terms of the generalized impact factor. The major drawback of the Schubert-Glanzel formula is that it does not say anything about the value of the proportionality constant. In [10], the proportionality constant $c$ is assumed to be 0.9 for journals and 1 for other sources. In the absence of a specific value of the proportionality constant, we assume it to be equal to 1.
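A minimal sketch of the two forms of the formula follows, assuming the commonly quoted expression $h \approx c\,C^{2/3}P^{-1/3}$ and its rewriting $h \approx c\,P^{1/3} I_f^{2/3}$ with $I_f = C/P$. The function names and the sample numbers are hypothetical and are not taken from the Price Medalist data.

```python
def schubert_glanzel_h(total_citations, papers, c=1.0):
    """Estimate of the h-index from total citations C and number of papers P."""
    return c * total_citations ** (2 / 3) * papers ** (-1 / 3)


def schubert_glanzel_h_from_impact(impact_factor, papers, c=1.0):
    """Same estimate expressed through the generalized impact factor I_f = C / P."""
    return c * papers ** (1 / 3) * impact_factor ** (2 / 3)


# Hypothetical record: C = 1000 citations over P = 125 papers, so I_f = 8.
C, P = 1000, 125
estimate_from_counts = schubert_glanzel_h(C, P)
estimate_from_impact = schubert_glanzel_h_from_impact(C / P, P)
print(estimate_from_counts, estimate_from_impact)
```

Both calls give approximately 20 for this record, which illustrates that the two forms differ only in how the inputs are expressed.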
Egghe-Liang-Rousseau Model. A relationship between the h-index and the generalized impact factor, $I_f$, is presented by Egghe, Liang and Rousseau in [6], which is based on a power law model and is as follows.
Since the h-index is an integer, it is better to consider the ceiling of the R.H.S. of (50). In [6], it has been argued that when $I_f$ tends to $\infty$, $h$ tends to $\sqrt{C}$.
In what follows, we verify the theorems and lemma proved in the previous section and compare them with the existing models.

Results and Discussion
In this section, we first verify our theorems using citation data for a set of scientists, namely five Price Medalists, and then compare them with the existing models. We collected the citation data for the given set of Price Medalists using scHolar index [11], which is based on Google Scholar. The numbers of citations of each referenced paper of the Price Medalists are given in Medalist S1, S2, S3, S4, and S5. Table 1 shows the number of citations ($C$), the number of papers referenced ($P$), h-index, g-index, and generalized impact factor ($I_f$) for the Price Medalists as per the citation data given in Medalist S1, S2, S3, S4, and S5. The values of h-index, g-index, and generalized impact factor shown in Table 1 are the actual values.
In what follows, we verify the theorems for the Price Medalists. Table 2 shows a verification of the theorems for the Price Medalists. The first row of the table shows the statement of each theorem and lemma. A check mark (✓) under a bound shows that the given theorem is verified. For example, consider Medalist S1, for whom $\sum_{i=1}^{h} c_i = 8567$, and therefore $e^2 = 6542$. Theorem 1 gives $h \ge 11$, and the value of h-index for Medalist S1 is 45. Since $11 \le 45$, Theorem 1 is verified. Theorem 2 gives $h \ge 36$, which is less than 45; therefore, Theorem 2 is also verified. Lemma 1 gives $g \le 191$, and the value of g-index for Medalist S1 is 101. Since 101 is less than 191, Lemma 1 is verified. Theorem 3 gives $g \le 126$, and since 101 is less than 126, Theorem 3 is verified. For verification of Theorem 4, we have $\sum_{i=1}^{h} c_i = 8567$ and $\sum_{i=h+1}^{g} c_i = 1521$. Therefore, $\sum_{i=1}^{g} c_i = \sum_{i=1}^{h} c_i + \sum_{i=h+1}^{g} c_i = 8567 + 1521 = 10088$. Theorem 4 gives $I_f \le 55.66$, and since $I_f$ for Medalist S1 is 24.37, which is smaller than 55.66, Theorem 4 is verified. Similarly, we can verify the theorems and lemma proved in this paper for the other Price Medalists.
The supplementary data, in terms of the values of the intermediate parameters needed to verify the theorems and lemma, are shown in Table 3.
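The arithmetic for Medalist S1 can be retraced from the values quoted in the text alone ($h = 45$, $g = 101$, and 8567 citations for the $h$ most-cited papers). The sketch below does so for the bounds that do not need the number of papers $P$, which is listed in Table 1 and not repeated here. Rounding the two upper bounds up is an assumption, chosen because it reproduces the reported values 191 and 126; the variable names are hypothetical.

```python
import math

# Values quoted in the text for Medalist S1.
h, g = 45, 101
top_h_citations = 8567  # citations of the h most-cited papers

# e^2 = (citations of the top h papers) - h^2
e_squared = top_h_citations - h * h

# Theorem 2: lower bound on h from g and e.
theorem2_lower_h = math.floor(g - e_squared / g)

# Lemma 1 and Theorem 3: upper bounds on g from h and e
# (rounded up here, an assumption that matches the reported figures).
lemma1_upper_g = math.ceil(h + e_squared / h)
theorem3_upper_g = math.ceil(h + math.sqrt(e_squared))

print(e_squared, theorem2_lower_h, lemma1_upper_g, theorem3_upper_g)
```

The unrounded values are about 36.23 for Theorem 2, 190.38 for Lemma 1, and 125.88 for Theorem 3.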

Tightness of Bounds
Note that there are two lower bounds for h-index, one given by Theorem 1 and the other given by Theorem 2. Using Table 2, we see that the lower bound on h-index given by Theorem 2 is closer to the actual values than that given by Theorem 1. Similarly, there are two upper bounds for g-index, one given by Lemma 1 and the other given by Theorem 3. We observe from Table 2 that the upper bound on g-index given by Theorem 3 is closer to the actual values of g-index than that given by Lemma 1. In other words, the bounds given by Theorem 2 and Theorem 3 are tighter than those given by Theorem 1 and Lemma 1, respectively.
Table 4 shows the actual values of g-index and the values of g-index obtained using Theorem 3. We also computed the errors in the values given by Theorem 3 relative to the actual values of g-index for the Price Medalists. We observe that the upper bound on the g-index given by Theorem 3 is reasonably tight.

Improvements over Schubert-Glanzel and Egghe-Liang-Rousseau Models
We computed the h-index using Theorem 2. We also computed the values of h-index for the Price Medalists using the Schubert-Glanzel formula given by (48) and using Egghe-Liang-Rousseau's power law model given by (50). Note that the values of h-index obtained using any of these three methods are approximate. To study the closeness of these approximate values to the exact values, we computed the percentage errors in the approximate values of h-index with respect to the exact values, which are shown in Table 5. We observe that the percentage error for the values obtained using Theorem 2 is significantly smaller than for those obtained using either the Schubert-Glanzel formula or the Egghe-Liang-Rousseau power law model. For example, for Medalist S1, the exact value of h-index is 45 and the lower bound given by Theorem 2 is 36. The value of h-index obtained using the Schubert-Glanzel formula is 68 and that obtained using Egghe-Liang-Rousseau's power law model is 100. The error using Theorem 2 is 20% and the error in the value obtained using the Schubert-Glanzel formula is 51%. The error in the value of h-index using Egghe-Liang-Rousseau's model is 122.22%. Similarly, one can see from Table 5 that Theorem 2 provides a significant improvement over both the Schubert-Glanzel formula and Egghe-Liang-Rousseau's power law model.
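The percentage errors quoted for Medalist S1 follow from the usual relative-error formula, $100 \times |\text{approximate} - \text{exact}| / \text{exact}$. The short sketch below reproduces them from the values given in the text; the function and variable names are assumptions made for illustration.

```python
def percentage_error(approximate, exact):
    """Absolute error of an estimate relative to the exact value, in percent."""
    return abs(approximate - exact) * 100 / exact


exact_h = 45  # actual h-index of Medalist S1
estimates = {
    "Theorem 2": 36,
    "Schubert-Glanzel": 68,
    "Egghe-Liang-Rousseau": 100,
}

errors = {name: round(percentage_error(value, exact_h), 2)
          for name, value in estimates.items()}
print(errors)
```

This gives 20.0, 51.11, and 122.22 percent, respectively, in agreement with the figures quoted for Medalist S1.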

Conclusion
Finding the relationships among indexing parameters for determining the quality of research is a challenging task. In this paper, we described some inequalities relating h-index, g-index, e-index, and generalized impact factor. We derived the inequalities from the very basic definitions of these indexing parameters and without assuming any continuous model to be followed by any of them. The relationships in the form of bounds and inequalities among the indices are not trivial, and to the best of our knowledge, we are the first to present such relationships.
We verified the theorems and lemma presented in this paper for citation records of Price Medalists. We observed that the lower bound on h-index given by Theorem 2 is tighter than that given by Theorem 1. The upper bound on g-index given by Theorem 3 is tighter than that given by Lemma 1.
We compared the values of h-index obtained using Theorem 2 with the values of h-index obtained using either the Schubert-Glanzel formula or the Egghe-Liang-Rousseau model. We observed that the values of h-index obtained using Theorem 2 are significantly closer to the exact values than those obtained using either the Schubert-Glanzel formula or Egghe-Liang-Rousseau's power law model. This enables us to conclude that Theorem 2 provides significant improvements over both the Schubert-Glanzel formula and Egghe-Liang-Rousseau's model.
Further, we computed the upper bound given by Theorem 3, which states that $g \le (h + e)$, where $e$ denotes the e-index. We observed that the upper bound on g-index given by Theorem 3 is reasonably tight for the given citation record of Price Medalists. In future, one may propose tighter bounds for either the h-index or the g-index.
Table 4. Errors in the g-index using Theorem 3 for the given set of Price Medalists.

Supporting Information
Medalist S1 Citation data for Price Medalist 1 using scHolar index [11], which is based on Google Scholar. Includes the numbers of citations of each referenced paper of Price Medalist 1. (DOC)
Medalist S2 Citation data for Price Medalist 2 using scHolar index [11], which is based on Google Scholar. Includes the numbers of citations of each referenced paper of Price Medalist 2. (DOC)
Medalist S3 Citation data for Price Medalist 3 using scHolar index [11], which is based on Google Scholar. Includes the numbers of citations of each referenced paper of Price Medalist 3. (DOC)
Medalist S4 Citation data for Price Medalist 4 using scHolar index [11], which is based on Google Scholar. Includes the numbers of citations of each referenced paper of Price Medalist 4. (DOC)
Medalist S5 Citation data for Price Medalist 5 using scHolar index [11], which is based on Google Scholar. Includes the numbers of citations of each referenced paper of Price Medalist 5. (DOC)