The world of science according to performance indicators based on percentile ranking normalization

Academic research assessment is mainly based on direct publication and citation data from aggregators such as Web of Science, Scopus or Google Scholar. If only quantity comparisons are required, it is a simple matter to add the numbers of publications; the whole is the sum of the parts, but with one caveat. Most publications are now multi-authored, and so each publication must be fractionally attributed to countries based on institutional authorship so that the whole becomes simply the sum of the fractional parts. However, when the quality angle has to be factored in, the number of citations each publication has earned is taken as a proxy for quality. But citation practices differ enormously across disciplines. Thus, if a portfolio comprises publications from various disciplines, it becomes difficult to compare the integrated research performance of large groups unless an appropriate aggregation of the product of normalized quality and quantity of the various components is used. The challenges are: (i) a suitable protocol for normalizing for quality, and (ii) a suitable protocol for identifying which integrated or aggregated product of quantity and quality to use.


INTRODUCTION
Academic research assessment is mainly based on direct publication and citation data from aggregators such as Web of Science, Scopus or Google Scholar. If only quantity comparisons are required, it is a simple matter to add the numbers of publications; the whole is the sum of the parts, but with one caveat. Most publications are now multi-authored, and so each publication must be fractionally attributed to countries based on institutional authorship so that the whole becomes simply the sum of the fractional parts. However, when the quality angle has to be factored in, the number of citations each publication has earned is taken as a proxy for quality. But citation practices differ enormously across disciplines. Thus, if a portfolio comprises publications from various disciplines, it becomes difficult to compare the integrated research performance of large groups unless an appropriate aggregation of the product of normalized quality and quantity of the various components is used. The challenges are: (i) a suitable protocol for normalizing for quality, and (ii) a suitable protocol for identifying which integrated or aggregated product of quantity and quality to use.
In this paper, we use data in six percentile rank classes as categorised in the recently issued report of the National Science Board of the United States (Science and Engineering Indicators 2012). The Science and Engineering Indicators (SEI) uses 13 broad fields of PatentBoards/NSF for the normalization. This may not be the best protocol for normalizing bibliometric information for differences in citation practices between fields. Indeed, SEI's use of percentiles further enhances the differences in citation practices between fields. The SEI classes are created over the total set of papers without any field differentiation, and this could mean that fields where intense citation practices are common will be present in the top classes, while other fields like mathematics fall at the lower end of the percentile rank class system. Also, by applying different weighting terms (6 for the highest, 1 for the lowest class), the effect of the field differentiation is intensified. However, the present paper is not about the delineation of reference sets, but about the statistics to be used after this is done. For this purpose, we combine the percentile ranking normalization scheme proposed by Bornmann and Mutz (2011) and the understanding in terms of higher-order performance indicators like quasity and exergy (Prathap 2011a,b) to assess the progress of academic research performance of the world over the decade from 2000 to 2010.

THE QUALITY NORMALIZATION CHALLENGE
The need for some form of normalization for quality has been recognised for a long time, and many competing schemes are currently available, using different methodologies for normalizing citation scores (see, for example, the concise list of 'bibliometric indicators', Karolinska Institutet, 2007). An influential contribution to the normalization mechanism in evaluative bibliometric practice was the so-called crown indicator (originally introduced by the Centre for Science and Technology Studies at Leiden (Moed et al. 1995, Moed 2010), and hence known as the CWTS approach).
The "crown indicator" itself is a variation of Schubert & Braun's (1986) relative citation rate. In this paper, we shall restrict attention to the percentile rank normalization scheme (Leydesdorff & Bornmann 2011).
The major advantage of this measure is that non-parametric statistics can be applied. Here, the shape of the "citation curve" is the basis for normalization for quality, i.e., non-parametric statistics use the percentiles of the distribution as the basis for assigning quality values. When normalization is introduced using the percentile rank approach, the quality values are modified to take into account the shape of the underlying distributions of citations ("the citation curves"). This can be implemented, following Leydesdorff, Bornmann, Mutz, & Opthof (2011), as quality values attached to percentile-rank classes. The quality along the skewed citation curve is first normalized in terms of percentiles. The results can be aggregated in terms of six percentile-rank classes, as shown in this paper, but the more general case is normalization in terms of quantiles as a continuous variable.
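The percentile-rank step can be illustrated with a minimal sketch. The class boundaries used here (50th, 75th, 90th, 95th and 99th percentiles) are an assumption made for illustration, since this paper does not list them, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Assumed percentile boundaries between the six classes (not stated in this paper).
THRESHOLDS = [50, 75, 90, 95, 99]


def percentile_ranks(citations):
    """Percentile of each paper: share of papers in the reference set with fewer citations."""
    n = len(citations)
    ordered = sorted(citations)
    return [100.0 * bisect_left(ordered, c) / n for c in citations]


def quality_class(percentile):
    """Map a percentile (0-100) to a quality value from 1 (lowest) to 6 (highest)."""
    return bisect_right(THRESHOLDS, percentile) + 1


def quality_values(citations):
    """Quality value (1..6) for every paper in a reference set."""
    return [quality_class(p) for p in percentile_ranks(citations)]
```

Because only the rank of a paper within its reference set matters, the skewness of the citation curve does not distort the assigned quality values.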

AGGREGATED INDICATORS FOR PERFORMANCE ASSESSMENT
However, once quality is assigned, there is still the question of how best to measure performance. Here, the choice is to go beyond simple quantity (the number of papers published) to aggregated measures like quasity and energy/exergy (Prathap 2011a,b, 2012) or to a measure like the Integrated Impact Indicator (I3) proposed by Leydesdorff & Bornmann (2011). Tables 1 and 2 provide an overview of how indicators of various orders can be generated from the generalized quality-quantity parameter space.
Table 1 deals specifically with the bibliometrics problem. Where P is the number of articles published and C is the number of citations received by these P articles, the impact i = C/P becomes the simplest measure of the overall quality of the portfolio. If P is viewed as a quantity or size indicator, we see from Table 1 that, in parameter space terms, P is a zeroth-order indicator. C is then a first-order indicator. Prathap (2011b, 2012) has shown that the product of C and C/P is an energy-like term called the exergy X. This can also be expressed as X = i²P, and is thus a second-order term.
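As a small illustration of these definitions, the sketch below computes P, C, i and X for a portfolio given as a list of per-paper citation counts. The function name and the sample numbers are hypothetical and are not taken from Table 1.

```python
def bibliometric_indicators(citations_per_paper):
    """Return (P, C, i, X) for a portfolio of papers."""
    P = len(citations_per_paper)   # zeroth-order indicator: number of papers
    C = sum(citations_per_paper)   # first-order indicator: number of citations
    i = C / P                      # impact: the simplest quality proxy
    X = i * C                      # exergy, second order: X = C * (C/P) = i**2 * P
    return P, C, i, X


# Hypothetical portfolio of four papers.
P, C, i, X = bibliometric_indicators([10, 0, 5, 5])
print(P, C, i, X)
```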
Table 2 is a generalization of Table 1, in that it can be used in any context where performance is to be rated.
Where Q is a generalized quantity indicator and q is a generalized quality indicator, the product qQ is called the quasity term and is the first order indicator of performance.
The second-order indicator is therefore q²Q.
One can therefore think of the aggregated indicators as belonging to the first-order and second-order moments of quantity, as shown in the Tables. Leydesdorff & Bornmann (2011) introduced a scalar measure, the Integrated Impact Indicator (I3), which, in the terminology of Prathap (2011a,b), is a normalized quasity term. We shall see later that I3 is a first-order indicator, like quasity, whereas energy and exergy are second-order indicators. Note that although I3 is initially defined as ∑q(i)P(i), generalised quality and quantity indicators of the form qQ can be obtained from it. In this paradigm, the raw quantity value becomes a zeroth-order performance indicator. The quality profile is generalized from the citation curve using the analogy shown in Table 2, and a scalar sum of quasity, the first-order indicator, can be obtained.
By continuing the same operations further, one can use a thermodynamic metaphor and get to the "energy" terms (when differentiated at the country and percentile class levels) and the "exergy" terms (now at the system level, that is, in this case, the country or region level) based on the same quality classes.
A further clarification of the thermodynamic analogy based on the zeroth-, first- and second-order indicators may be useful here. Note that the traditional bibliometric zeroth- and first-order indicators are the numbers of papers and citations. Because of the "normalization" implied by the use of percentile rank classes, the first- and second-order indicators are integrated values of impact based on the quality values assigned to each class.
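The hierarchy of indicators described in this section can be summarised in symbols. This is a sketch under assumed notation: the class index k, the per-class quality value q_k and article count P_k, and the symbol E for energy are introduced here for illustration and do not appear in the original tables.

```latex
\begin{aligned}
Q &= \sum_{k} P_k && \text{(zeroth order: quantity)} \\
I3 = qQ &= \sum_{k} q_k P_k && \text{(first order: quasity)} \\
q &= \frac{\sum_{k} q_k P_k}{\sum_{k} P_k} && \text{(average quality)} \\
E &= \sum_{k} q_k^{2} P_k && \text{(second order: energy, summed over classes)} \\
X &= q^{2} Q = \frac{\left(\sum_{k} q_k P_k\right)^{2}}{\sum_{k} P_k} && \text{(second order: exergy, system level)}
\end{aligned}
```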

THE RESEARCH PERFORMANCE OF THE WORLD ACCORDING TO SCIENCE AND ENGINEERING INDICATORS 2012
Every two years, by Presidential mandate, the National Science Board (2012) of the United States releases its Science and Engineering Indicators. The latest release can therefore be accepted as the most authoritative assessment of the overall research performance, as well as the preparedness for research, of the country. To complete the assessment, comprehensive benchmarks are made against indicators from leading countries (like China and Japan) and leading regions like Asia-8 (comprising India, Indonesia, Malaysia, Philippines, Singapore, South Korea, Taiwan and Thailand) and the European Union (now comprising 27 member countries).
A key feature of the comparative assessment is the adoption of percentile class ranking by SEI in its recent reports. The share of a country's articles that are highly cited is taken as a proxy for academic research performance. The share is dispersed across six percentile classes. In the words of SEI 2012, "a country whose global research influence was high would have higher proportions of articles in higher citation percentiles, whereas a country whose influence was low would have greater proportions of articles in lower citation percentiles. In other words, a country whose research is highly influential would have higher shares of its articles in higher citation percentiles." A fractional-count basis has been used, i.e., for articles with collaborating institutions from multiple countries/regions, each country/region receives fractional credit in proportion to its participating institutions. The fractional numbers shown in Table 4 are therefore accepted as being meaningful, and, as seen, the whole is obtained from the sum of the parts (i.e., completeness and consistency are maintained within the round-off errors that are part of this accounting process). We can think of this table as displaying the zeroth-order performance indicator, i.e., the number of S&E articles counted on a fractional basis for 2000 and 2010. Thus, from 2000 to 2010, the number of articles world-wide has grown by 29.1% (see Table 7), while the US share has grown only modestly (10.8%). Both China and Asia-8 show triple-digit progress.
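A minimal sketch of fractional counting is given below, assuming each article is represented by the list of countries of its participating institutions (one entry per institution). The function name and the sample data are hypothetical; exact fractions are used so that the parts sum to the whole without round-off.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def fractional_counts(articles):
    """Credit each country in proportion to its share of participating institutions."""
    totals = defaultdict(Fraction)
    for institutions in articles:
        n = len(institutions)
        for country, k in Counter(institutions).items():
            totals[country] += Fraction(k, n)
    return dict(totals)


# Hypothetical example: three articles, one entry per participating institution.
articles = [["US", "US", "CN"], ["JP"], ["US", "IN"]]
counts = fractional_counts(articles)
# The fractional parts add up to the number of articles.
assert sum(counts.values()) == len(articles)
```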
To go from here to the first-order performance indicator, which is a product of quality and quantity, there is a need to assign quality values to each percentile class. Following Leydesdorff et al. (2011), quality values are assigned as shown in the last column of Table 5. The product of quantity and quality gives the quasity values (Prathap 2011a,b), where "quasity" is the first-order performance indicator. It is assumed here that quasity values can be added up across percentile impact classes, as shown in Table 5. The result is exactly what Leydesdorff and Bornmann (2011) proposed as their Integrated Impact Indicator (I3) on the basis of six percentile rank classes. The ratio of the Integrated Impact Indicator to the number of articles (i.e., the ratio of quasity to quantity) can be viewed as the average impact, or a proxy for quality. These "average" values are obtained not as a statistical mean, but can be considered a thermodynamic statement of both conservation of momentum (or, in the present context, conservation of quasity) of the system and conservation of mass. Table 7 offers some interesting insights. From 2000 to 2010, the "average" quality of world output has remained nearly the same (1.74 on the scale 1-6). There has been a mild drop (-2%) in the quality of US academic research. On this measure, both China and Asia-8 have shown double-digit increases in percentage terms.
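The first-order calculation can be sketched as follows, assuming quality values 1 to 6 for the six classes from lowest to highest. The article counts are hypothetical and are not the values of Table 4 or Table 5.

```python
# Quality values per percentile class, lowest to highest (scale 1-6).
QUALITY = [1, 2, 3, 4, 5, 6]


def quasity(counts):
    """First-order indicator (I3 on six classes): sum of quality x quantity."""
    return sum(q * p for q, p in zip(QUALITY, counts))


def average_quality(counts):
    """Ratio of quasity to quantity, on the scale 1-6."""
    return quasity(counts) / sum(counts)


# Hypothetical fractional article counts in the six classes, lowest class first.
counts = [500, 250, 150, 50, 40, 10]
print(quasity(counts), average_quality(counts))
```

A country that shifts articles from lower to higher classes raises its quasity and its average quality even if its total number of articles is unchanged.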
The next stage of evaluation follows in an identical fashion: the values in Table 5 are multiplied once more by the quality value in the last column. The energy values for each percentile class for each country/region are obtained as the product of the quality (values ranging from 1 to 6) and the quasity, as shown in Table 6. Energy is the first of the second-order indicators (Prathap 2011a,b, 2012) and, as a scalar term, can be added. There is yet another second-order term: exergy, also a scalar term (Prathap 2011a,b, 2012). Exergy is meaningful at the system level (in this case, the country or region level), and is simply the product of the average quality and the total quasity for the system. Thus, apart from the average quality term, which links quantity to quasity to exergy at the system level, there is the possibility of another higher-order measure of quality. The square root of the ratio of total energy to quantity (that is, the number of publications) can be viewed as a root mean square quality proxy (hence RMS quality). From the thermodynamic perspective, this conserves energy and mass at the same time. By this quality measure, Japan, the US and the World overall are in decline, and the EU barely stays ahead. China and Asia-8 are improving their research ecosystems at a relatively remarkable pace. These trends are captured in Table 7.
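The second-order quantities described above can be sketched in the same way, again assuming quality values 1 to 6 and hypothetical article counts that are not taken from Table 6. The function and key names are illustrative only.

```python
import math


def second_order_indicators(counts, quality=(1, 2, 3, 4, 5, 6)):
    """Energy, exergy and RMS quality for one country/region."""
    quantity = sum(counts)
    quasity = sum(q * p for q, p in zip(quality, counts))
    # Energy: quality x quasity in each class, summed over the classes.
    energy = sum(q * q * p for q, p in zip(quality, counts))
    # Exergy: average quality x total quasity, defined at the system level.
    avg_quality = quasity / quantity
    exergy = avg_quality * quasity
    # RMS quality: square root of total energy per article.
    rms_quality = math.sqrt(energy / quantity)
    return {"energy": energy, "exergy": exergy,
            "avg_quality": avg_quality, "rms_quality": rms_quality}


# Hypothetical fractional article counts in the six classes, lowest class first.
result = second_order_indicators([500, 250, 150, 50, 40, 10])
print(result)
```

For any distribution across the classes, the RMS quality is at least as large as the average quality, and the energy is at least as large as the exergy; the two coincide only when all articles fall in a single class.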

CONCLUDING REMARKS
We use data from the recently issued report of the National Science Board of the United States (Science and Engineering Indicators 2012) and combine the percentile ranking normalization scheme proposed by Leydesdorff and Bornmann (2011) and the understanding in terms of higher-order performance indicators like quasity and exergy (Prathap 2011a,b) to assess the progress of academic research performance of the world over the decade from 2000 to 2010. Both first-order and second-order indicators show a decline of the US and Japan and an increase in impact of China and Asia-8. The EU can be considered as marginally increasing its impact at the aggregated level of all fields.
The relationship linking relative citation rate (RCR) to mean observed citation rate (MOCR) and mean expected citation rate (MECR) is RCR = MOCR/MECR. The development of indicators that can capture the latent parameter accurately, and the problem of the inter-relation between the indicators and the latent variables, remain central in scientometrics. There has been a long tradition of research on this in bibliometrics, and it has contributed to a novel understanding and refinement of indicators. The recent issue of the journal Scientometrics (Volume 92, Issue 2, August 2012) provides a good account of the current debate in this field.

Table 3 is an extract from the original appendix table 5-44 showing citation percentiles for 2000 and 2010 by field for the top five S&E article-producing countries/regions, dispersed in this six-class format (available at http://www.nsf.gov/statistics/seind12/appendix.htm#c5). To this, we have added one more group, called Rest of the World, to account for the remaining regions of the world.

Downloaded free from http://www.jscires.org on Friday, May 16, 2014, IP: 145.18.109.185
Table 4 reconstructs the number of articles in a country or region from the percentages in Table 3.