Model for estimating Academic Ranking of World Universities (Shanghai Ranking) scores

The interest in global university rankings has grown significantly in the last 10 years. The use of just a handful of indicators, the ease of interpretation of the information they contain and the furtherance of inter-university competition and comparability are some of the factors that have popularised their implementation. However, at the same time their critics have identified some conceptual, technical and methodological issues. This paper addresses three such issues that have prompted intense methodological debate around university rankings: replicability of results, relevance of indicators and data retrieval. It also proposes a tool for estimating the scores for the two indicators of the greatest interest for most universities (Papers published in Nature or Science and Papers listed in the WoS). It reports on an alternative method developed to calculate any university’s score in the two most significant Shanghai ranking indicators. One of the foremost features of the proposed method is that the inputs needed are readily available to policymakers, academic authorities, students and other stakeholders and can be applied directly. Furthermore, with this model, scores can also be estimated for universities not listed among the first 500 in the Shanghai ranking.


INTRODUCTION
The interest in global university rankings has risen exponentially in recent years (Fowles et al., 2016; Gonzalez-Riano et al., 2014; Safón, 2013). Their popularity is largely due to the fact that they furnish readily understandable, simple and synthetic information, favouring comparisons among intra- or cross-country higher education institutions (Hazelkorn, 2014). This approach has also been criticised, however, for it assumes that academic performance can be assessed with the same simplicity as football teams (van Raan, 2005a). For other authors (Pusser and Marginson, 2013), rankings may be selective, for they '...serve as a particularly useful lens for studying power in higher education, as they are used to confer prestige, in the allocation of resources, as a form of agenda setting, as a means of stratifying national higher education systems...'
The appearance of international university rankings may be said to have had beneficial implications for higher education systems, for it has required universities to adapt to a reality fully assumed by other social sectors, namely the need to survive in increasingly complex and competitive global environments. Regarding this, Andersson and Mayer (2017) have pointed out that rankings can provide a "picture of the university concerned combining a variety of pieces of information and, before rankings, there was no measure or semiquantitative comparison available and there was an opacity in the system". Nonetheless, the harsh criticism and heated debate to which rankings are subject must not be overlooked. The most prominent beneficial factors include increasing competition among universities and a more attentive focus on measuring higher education institutions' performance for assessment (Rauhvargers, 2013; Sanz-Casado, 2015; Olcay & Bulu, 2016). Some of the issues addressed by critics are conceptual, questioning, among others, the capacity of such tools to measure and rank very heterogeneous universities with the same parameters (Rauhvargers, 2011; Andersson & Mayer, 2017), the trend to assess all universities to the same criteria as the top 500 (Rauhvargers, 2011), to rank institutions by size (van Raan, 2005a; Docampo and Cram, 2015; Docampo et al., 2015), favouring those heavy on research (Rauhvargers, 2011) or to prioritise English-speaking establishments (Marginson & van der Wende, 2007; van Raan et al., 2011; Rauhvargers, 2011). In addition, the absence of teaching measures and the lack of recognition of the local or national contribution of universities are shortcomings that international ranking organizations have long acknowledged (O'Leary, 2017).
Another conceptual problem is the inequality among different subject areas. Ways of organizing researchers and resources, ways of publication and citation or ways of embedding research teams are not equal in different subject areas, and general (Turner, 2017).
One of the technical factors called into question is the capacity of databases to correctly assign authors' names and affiliations (van Raan, 2005a; Moed, 2002). The solution to this problem would call for careful data cleansing as well as clear and standardised guidelines. As these technical problems affect 60 % of the Academic Ranking of World Universities (hereafter ARWU) indicators, they introduce considerable uncertainty (van Raan, 2005a). Problems have also arisen around the differences in the values accorded to journals, citations and so on, depending on the database queried (Fiala, 2012; Lascurain et al., 2015).
Regarding methodology, Robinson and Jimenez-Contreras (2017) have pointed out that "the development of proper methodologies for the elaboration of research rankings is an on-going research front in which many variables unexplored and questions still remain unsolved". In this sense, methodological problems have also been identified in connection with selection criteria (Zornic et al., 2014) and indicator weighting (Stewart, 2014). The scant relevance and discriminatory capacity of factors based on Nobel Prizes or Fields Medals have been challenged, for instance, along with the justification for the choice of indicator time periods or the bias toward research-based criteria (van Raan, 2005a; Billaut et al., 2010). The reliability of the findings (van Raan, 2005a, 2005b; Saisana et al., 2011) and the scant replicability of ranking results (Florian, 2007; Docampo, 2013; van Raan, 2005a, 2005b; Jovanovic et al., 2012) have also been criticised.
The perspective adopted here is that solving formulation-related problems holds the key to strengthening the prestige and credibility of rankings among the scholastic community, as well as their 'practical utility' as an intra-university assessment tool and a benchmarking instrument for inter-university comparisons. This paper addresses three controversial matters in connection with the methodology used in the Shanghai ranking. Firstly, the debate revolves around the suitability of the most relevant indicators.
Here criticism has focused on the arbitrariness of evaluations based on Nobel Prizes or Fields Medals, a debate that has apparently been internalised, for the ARWU has now launched the possibility of establishing a 'related' ranking excluding these two indicators (Dobrota & Dobrota, 2016). The arguments against Nobel Prize- or Fields Medal-based indicators include the following:

• These indicators are geared to measuring the prestige of only a few universities but are scantly able to evaluate the quality of their student bodies or academic staff. They are barely related to the object of the measurement.
• Some authors find the time period for assigning points unjustified (Billaut et al., 2010).
• Both indicators may be measuring university characteristics that are better assessed by other variables.
• These indicators fail to quantify universities' efforts to improve their quality, they limit recently founded (under 50 years) universities' visibility and consequently are overly conservative, essentially benefiting universities of consolidated prestige.

In another vein, factors related to the strategies followed to retrieve papers published and their subsequent formulation to compute the scores in the respective indicators are also debated. Problems have been detected in the assignment of points under the indicators based on article counts.
Lastly, inasmuch as some authors have pointed out the difficulty of replicating the findings following the methodology described in the ranking (Florian, 2007;Billaut et al., 2010;Docampo, 2013), this study puts forward a proposal as an alternative to the one recently suggested by Docampo (2013).
This paper puts forward a simpler alternative for estimating the points with which universities are scored under two of the ranking's key indicators: the number of articles in Nature and Science (N&S) and the number listed in the Web of Science (PUB), based on raw data (non-fractionated counts) obtained directly from the Web of Science (hereafter WoS) using the WoS search field 'Organisation-enhanced'.
It addresses three aspects of the methodological debate on university rankings: the relevance of the indicators, data retrieval and replicability of results. It proposes a tool for estimating the points assigned under these two indicators of highest interest for universities.

ACADEMIC RANKING OF WORLD UNIVERSITIES (ARWU)
The ARWU ranking appeared in 2003 in the wake of the success obtained with the '985 project' that analysed a number of Chinese institutions to help them attain 'world class university' status.
In other words, this first global university ranking was created to determine the worldwide position of Chinese higher education institutions.
According to the methodology described by the founders, the values of each indicator are assigned proportionally, with the highest ranking institution scoring 100 (Liu et al., 2005; Florian, 2007). However, Florian (2007) was unable to replicate the findings when applying that methodology. Docampo (2013) proposed a formula for estimating an institution's final score for each indicator further to the methodology laid down in the ranking. Although that formula is simple and operational, a number of factors hinder the estimation of an institution's score, per indicator and overall: the difficulty of obtaining fractionated counts for articles and determining the search strategy for each institution involved are two such impediments.

ARWU methodology and indicators
Every year, the ARWU analyses 2 000 universities and ranks 500 in a list published annually. Its methodology focuses on universities' teaching and research performance, assessing six indicators which it weights and classifies under four criteria.
The quality of the education delivered by the institution is based on the number of Nobel Prizes or Fields Medals earned by its alumni. As noted in the introduction, this indicator has been highly criticised.
Academic staff quality is determined by two indicators. The first is the number of professors holding a Nobel Prize or a Fields Medal, weighted depending on the time lapsing since the award. The objections raised to this indicator are likewise discussed in the introduction (Billaut et al., 2010). The second indicator is the number of the institution's researchers heavily cited in the 21 Thomson-Reuters scientific categories (Bornmann & Bauer, 2015). One of the problems with this indicator identified by some authors is the lack of justification for applying a 10-year timeframe (Billaut et al., 2010).
Institutions' scientific production is also measured with two indicators. The first is the number of articles published in Nature and Science in the last five years. This indicator weights authors by the order of appearance (corresponding author 100 %, first 50 %, second 50 % and all others 10 %). The lack of justification for this type of count as well as the problems surrounding institutional affiliation are two factors that have been criticised in connection with this indicator (Billaut et al., 2010). The second is the number of papers published by the institution's academic staff in journals listed in the two WoS indices, Science Citation Index-Expanded (SCI-E) and Social Sciences Citation Index (SSCI). Here one of the problems is that the number of papers in the SCI-E doubles the number in the SSCI.
Lastly, the ARWU ranking assesses institution productivity relative to size, calculated from the values of the five indicators described, divided by the number of full-time equivalent academic staff members.
Each institution is ranked in accordance with the findings for the aforementioned indicators and a value of 100 is assigned to the university with the highest score in each indicator. The percentage values of the remaining institutions are then recalculated on that basis.
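The proportional rule just described can be sketched as follows. This is only an illustration of the stated rule: the function name and the sample counts are hypothetical, and, as noted above, the published scores could not be replicated by applying this proportionality alone.

```python
def proportional_scores(raw_values):
    """Scale raw indicator values so that the top institution scores 100."""
    top = max(raw_values.values())
    if top <= 0:
        raise ValueError("at least one institution needs a positive value")
    return {name: 100.0 * value / top for name, value in raw_values.items()}


# Hypothetical raw counts, for illustration only.
raw = {"University A": 400, "University B": 100, "University C": 0}
scores = proportional_scores(raw)
print(scores)
```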

RESEARCH OBJECTIVE
This article aims primarily to establish a reliable method for quickly and easily estimating universities' scores in the Shanghai ranking. The model proposed calculates the scores for research output indicators N&S and PUB based on simple Web of Science queries, with no need to standardise the raw data. It can consequently be used to find the score for any university, whether or not it is included in the Shanghai ranking.

METHODOLOGY
Firstly, the PUB score was estimated for all the institutions ranked in the 2015 edition of the ARWU based on the number of articles listed in the Web of Science indices in 2014 under each university's standardised name. The same procedure was followed to estimate the N&S indicator, except that the interval used was 5 years (2010-2014), i.e., the same period applied by the Shanghai ranking to calculate this indicator. The flow chart in Figure 1 shows the steps followed. In both cases gross (i.e., non-fractionated) counts were used. The data were obtained on 15 May 2016.
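The two retrievals described above might be expressed as Web of Science Advanced Search strings along the following lines. This is a hedged sketch: the paper states only that the 'Organisation-enhanced' field was used, so the field tags OG, PY and SO, the helper name build_wos_query and the institution name are assumptions, and any restriction to the SCI-E and SSCI indices or to particular document types would have to be set separately in the search interface.

```python
def build_wos_query(org_name, first_year, last_year, journals=None):
    """Return an Advanced Search style query string for one institution."""
    # OG: Organisation-enhanced field (assumed tag for the standardised name).
    parts = [f'OG=("{org_name}")']
    # PY: publication year, either a single year or a closed range.
    if first_year == last_year:
        parts.append(f"PY=({first_year})")
    else:
        parts.append(f"PY=({first_year}-{last_year})")
    # SO: source title, used here only for the Nature and Science count.
    if journals:
        titles = " OR ".join(f'"{j}"' for j in journals)
        parts.append(f"SO=({titles})")
    return " AND ".join(parts)


# Hypothetical institution name, for illustration only.
pub_query = build_wos_query("Hypothetical University", 2014, 2014)
ns_query = build_wos_query("Hypothetical University", 2010, 2014,
                           journals=["Nature", "Science"])
print(pub_query)
print(ns_query)
```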
A two-step approach was followed to develop a mathematical model able to minimise the difference between the actual and the predicted points found for PUB and N&S:

• A number of mathematical models were first considered, most based on exponential or logarithmic functions. When, as in this case, the regression curve is non-linear, the problem is approached with a non-linear regression function of the form $y = f(x, \theta)$, where $y$ is the output (the indicator), $x$ is the data vector (the regressors, here the number of papers published by the institution and the number by the highest scoring institution) and $\theta$ is the vector of model parameters ($\alpha$ and $\beta$). The error is defined as the difference between the actual output, $y_T$, and the model output, $y$: $e = y_T - y$.
• Secondly, the model parameters ($\alpha$ and $\beta$) were selected to minimise the total error. Whilst any of a number of error functions can be used, the quadratic cost, $C_T = \sum_i e_i^2$, is the one most commonly applied. An alternative less sensitive to outliers, the absolute value of the error, $C_T = \sum_i |e_i|$, was also considered here.
• Several optimisation methods have been developed to determine the parameters that minimise the total error. Typically, the derivative of the quadratic cost function is equated to zero and the resulting equation solved, an approach that leads to the well-known least squares method. This solution cannot be applied to the absolute value of the error, however, because the derivative is not properly defined at the minimum. As the aim sought in this study was to find an optimisation method compatible with either function, quadratic cost error or the absolute value of the error, the differential evolution optimisation method was adopted. As a sampling-based optimisation method, this procedure does not require the use of the derivative of the total error function and is consequently suitable for determining the minima or maxima of a wide spectrum of non-linear, non-differentiable and multivariate functions.
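The fitting procedure outlined in the list above can be illustrated with a small, self-contained sketch. Several elements are assumptions made purely for illustration: the power-law form in model() is a hypothetical placeholder and not the function fitted in this study, the data are synthetic, and the optimiser is a basic DE/rand/1/bin variant with arbitrarily chosen settings rather than the implementation used by the authors.

```python
import random


def model(x, c, alpha, beta):
    # Hypothetical placeholder form, NOT the function fitted in the paper.
    return alpha * ((1 + x) / c) ** beta


def total_error(params, data, c, absolute=True):
    """Sum of |e_i| (absolute=True) or e_i**2, with e_i = y_T - y."""
    alpha, beta = params
    errors = [y_true - model(x, c, alpha, beta) for x, y_true in data]
    return sum(abs(e) if absolute else e * e for e in errors)


def differential_evolution(cost, bounds, pop_size=30, generations=300,
                           f=0.7, cr=0.9, seed=0):
    """Basic DE/rand/1/bin: a derivative-free, population-based search."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [cost(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct population members other than i.
            a, b, d = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one mutated coordinate
            trial = []
            for k in range(dim):
                if k == forced or rng.random() < cr:
                    v = pop[a][k] + f * (pop[b][k] - pop[d][k])
                    lo, hi = bounds[k]
                    v = min(max(v, lo), hi)  # clip to the search bounds
                else:
                    v = pop[i][k]
                trial.append(v)
            trial_cost = cost(trial)
            if trial_cost <= costs[i]:  # greedy selection
                pop[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]


# Synthetic data generated from the placeholder model (alpha=100, beta=0.5).
counts = [5, 20, 80, 150, 400, 900, 1600, 2500]
c_max = max(counts)
data = [(x, model(x, c_max, 100.0, 0.5)) for x in counts]

best_params, best_cost = differential_evolution(
    lambda p: total_error(p, data, c_max, absolute=True),
    bounds=[(0.0, 200.0), (0.0, 2.0)],
)
print(best_params, best_cost)
```

Because the search only compares cost values, the same routine can be run with absolute=False to minimise the quadratic cost instead.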
After testing several non-linear functions, the expression that provided the best fit to the problem posed was as shown in the following paragraph. The same basic form of the mathematical function was used to fit the input data to the output scores considered.
The input data for the N&S non-linear function were the number of articles published by institution (and the university with the largest number) in Nature and Science, and for PUB the total number of papers listed in the SSCI and SCI-E databases. The output consisted of the scores obtained for the indicator, i.e., the gross values for each institution, including the values for the highest-ranking university under that indicator and the scores obtained by the institution at issue in the respective year. The resulting function estimated the scores for each institution in the two indicators. After analysing a number of non-linear functions, the one that afforded the best fit was as shown below:

The functions were seasonally adjusted to provide a model that could be applied to several years. The term (1+x)/c represents the relationship between the total articles published (N&S or PUB) by an institution and the institution with the largest number of publications (N&S or PUB). This averts any dependence on a specific year's values. The term (+1), in turn, excludes the possibility of zero values in the denominator of the expression (log c/(1+x)).
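The role of the two terms just described can be checked numerically. The sketch below is illustrative only: the helper names and the counts are hypothetical, and reading the expression 'log c/(1+x)' as the logarithm of c/(1+x) is an assumption, since the fitted function itself is not reproduced here.

```python
import math


def relative_output(x, c):
    """Ratio (1 + x) / c of an institution's count to the largest count."""
    return (1 + x) / c


def log_term(x, c):
    # Assumed reading of "log c/(1+x)" as log(c / (1 + x)).
    return math.log(c / (1 + x))


# The ratio depends on relative, not absolute, output: two hypothetical
# years with different totals give the same value.
same_ratio = relative_output(499, 1000) == relative_output(999, 2000)

# An institution with no papers (x = 0) still yields a finite log term,
# because the +1 keeps the denominator away from zero.
zero_case = log_term(0, 1000)
print(same_ratio, zero_case)
```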
Table II shows the summary statistics of the coefficients values obtained after having applied the model to the years 2008-2015.
Figures 2 and 3 show the results: the actual scores published by ARWU 2015 (in blue) and the points estimated after fitting the non-linear function for each indicator (in red).
Whilst the same non-linear function was applied to both indicators to find the regression curve, the coefficients that afforded the best fit logically varied.
The choice of N&S and PUB was justified by the fact that the former was the indicator most closely correlated to the overall ranking (N&S, r = 0.935), as shown in Figure 4, and the latter the sole indicator affecting all universities (see Figure 5). Most of the institutions assessed showed no values for the other three indicators (Alumni, Award and HiCi) or the correlation between the respective scores and the final score was very low. Consequently, only the values for previous years were factored into the equation as a constant to improve the fit.

The goodness of fit for the model proposed was tested with the Kolmogorov-Smirnov procedure. The test findings revealed the similarity between the shapes of the two distributions (the estimated and the actual points), as shown in Table III and Figure 6. The calculations and graphics were performed with 'rgr' data exploration analysis software (Garret, 2015).
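For readers unfamiliar with the procedure, the two-sample Kolmogorov-Smirnov statistic is the largest vertical distance between the empirical distribution functions of the two samples. The following is a minimal pure-Python sketch of that statistic only; it does not compute a p-value, it is not the 'rgr' implementation used in the study, and the sample values are hypothetical.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(t) - F_b(t)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        t = min(a[i], b[j])
        # Advance both empirical CDFs past every observation equal to t.
        while i < n and a[i] == t:
            i += 1
        while j < m and b[j] == t:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


# Hypothetical actual and estimated scores, for illustration only.
actual = [10.0, 20.0, 30.0, 40.0]
estimated = [30.0, 40.0, 50.0, 60.0]
print(ks_statistic(actual, estimated))
```

A value close to 0 indicates that the two samples have similar distributions, while a value close to 1 indicates that they barely overlap.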

Nature and Science (N&S) indicator
The determination coefficient (R²) was computed to determine the quality of the model used for the estimate. Table IV lists

WoS listing indicator (PUB)
Table VI lists the determination coefficients for the number of papers listed in the WoS in 2014, the PUB scores obtained in the ARWU 2015 and the scores estimated for the indicator in this study (Estimated PUB score, ARWU 2015). The model used here delivered a better fit than the gross values for this indicator also.
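A determination coefficient of the kind reported for both indicators can be computed in a few lines. The sketch below assumes that R² is taken as the squared Pearson correlation between two series (for example, published and estimated scores); the paper does not state which definition was used, and the function name and sample values are hypothetical.

```python
def determination_coefficient(xs, ys):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical published and estimated scores, for illustration only.
published = [10.0, 20.0, 30.0, 40.0]
estimated_scores = [20.0, 40.0, 60.0, 80.0]
print(determination_coefficient(published, estimated_scores))
```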

DISCUSSION AND CONCLUSIONS
Since the Academic Ranking of World Universities (ARWU) was first published in 2003, the number of university rankings has multiplied, each using different criteria and indicators to classify higher education institutions. The so-called league tables (ARWU, THE, QS) that have headed this trend are the best known among the public at large. Hand-in-hand with their growing popularity, considerable research has been conducted to analyse their conceptual grounds, techniques and methodologies. Reproducibility of their findings is one of the areas addressed.

Florian (2007), attempting to calculate universities' scores for the 2005 ARWU, unexpectedly found that the Shanghai 2005 findings could not be reproduced using the methodology outlined by the ranking. A plot of the paper counts and actual PUB scores obtained by listed institutions showed that the relationship between the two variables was not proportional, but rather non-linear. Inasmuch as the function is purported to fit the end scores to the raw paper count, the author reported that non-reproducibility of the results for the ranking could only be justified by the indication in the methodology to the effect that 'the data distribution for each indicator is reviewed for any significant distortion: standard statistical techniques are used to adjust the indicator as necessary' (ARWU, 2015). Docampo (2013), in turn, addressed the difficulties encountered when attempting to reproduce the ARWU results. His paper included a procedure for estimating an institution's score for each of the ranking indicators with which he confirmed 'that the results of the Shanghai ranking are in fact reproducible'. The discrepancies found by the author between the scores estimated with his procedure and the actual scores published for each university in the ranking were justified by difficulties in assigning published papers to institutions (Docampo, 2013).
The aforementioned approaches used by Florian (2007) to estimate PUB scores and Docampo (2013) to calculate the scores for all the indicators in the ranking entail search (the search strategy applied to identify an institution's output) and paper assignment (fractionated counts and article weighting) processes that appear to be overly complex for non-experts.
The model introduced here is geared to replicating and predicting the ARWU results simply and with readily accessible data. It was successfully applied to estimate scores for the indicator that measures papers listed in WoS databases (PUB) as well as the indicator for the number of articles published in Nature and Science (N&S), as shown in the results described above.
The prediction model developed delivered a high correlation between the scores predicted and the scores published by the Shanghai ranking. The determination coefficients for the observed and estimated values were R² = 0.952 for N&S in the 2015 edition of the ARWU and a similarly high R² = 0.953 for PUB (Tables IV and VI). The Kolmogorov-Smirnov test was run to verify whether the median, mean or shape of the distribution of the two populations (model-estimated and actual points shown in the ranking) differed. The results showed that the two distributions had the same shape for the indicator PUB, as well as for the indicator N&S (Table III).
The model proposed supports the reproducibility of the Shanghai ranking for, while the findings could not be reproduced with absolute accuracy, they were estimated very reliably. It may, then, constitute a very useful tool for estimating an institution's PUB and N&S scores with a view to enhancing both its scientific strategy and its benchmarking skills. It may also be applied in the context of a country's research policy to ascertain the role of its universities from an international perspective, for the model can be used to estimate the scores for universities not ranked among the ARWU's top 500. Lastly, the alternative method proposed in the paper contributes to the theoretical understanding of the methodology used to build rankings and the calculation mechanisms applied, as well as to interpreting the indicators. Practically speaking, the simplicity of the calculations involved makes the model accessible to any user (including policy makers, students and other stakeholders) seeking to obtain significant information on an institution. It consequently favours the development of strategic plans and benchmarking policies and is particularly useful for assessing universities not listed in the Shanghai ranking.

Figure 1. Flow chart for estimating N&S and PUB

Figure 4. Correlation coefficient (r) between ARWU indicators and Shanghai ranking scores (2015). P-values: < 0.0001

Table IV lists the determination coefficients for the number of N&S papers in 2010-14 compared to the N&S scores published in the ARWU 2015 and the scores estimated for the indicator with the methodology proposed. The model used here delivered a better fit than the gross values. By way of example of the fit obtained with the proposed model, Table V shows the estimated N&S scores (E_N&S ARWU-2015) for 50 universities chosen randomly from the top 500, selecting 10 from each range of 100 institutions.

Figure 5. Number and percentage of institutions per ARWU 2015 indicator

Table I. ARWU Ranking for 2015: indicators and weights

Table IV. Determination coefficients (R²) for gross values, actual and estimated scores for the N&S indicator

Table V. Papers published in Nature and Science (N&S), actual and estimated scores

Table VII, in turn, gives the estimated PUB scores for the same universities as listed in Table V. The first column shows the number of papers (WoS papers in 2014), the second lists the scores published for each institution in the ranking (PUB score, ARWU 2015) and the third the estimated scores (Estimated PUB score, ARWU 2015).

Table VI. Determination coefficients (R²) for gross values, actual and estimated scores for the PUB indicator

Table VII. Papers listed in WoS (PUB), actual and estimated scores