Aggregation of rankings produced by different multi-criteria decision-making methods

One of the essential problems in multi-criteria decision-making (MCDM) is ranking a set of alternatives based on a set of criteria. In this regard, there exist several MCDM methods which rank the alternatives in different ways. As such, it would be worthwhile to try and arrive at a consensus on this important subject. In this paper, a new approach is proposed based on the half-quadratic (HQ) theory. The proposed approach determines an optimal weight for each of the MCDM ranking methods, which are used to compute the aggregated final ranking. The weight of each ranking method is obtained via a minimizer function that is inspired by the HQ theory, which automatically fulfills the basic constraints of weights in MCDM. The proposed framework also provides a consensus index and a trust level for the aggregated ranking. To illustrate the proposed approach, the evaluation and comparison of ontology alignment systems are modeled as an MCDM problem and the proposed framework is applied to the ontology alignment evaluation initiative (OAEI) 2018, for which the ranking of participating systems is of the utmost importance. © 2020 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license. ( http://creativecommons.org/licenses/by/4.0/ )


Introduction
Multi-criteria decision-making (MCDM) is a branch of Operations Research that has numerous applications in a variety of areas involving real decision-making problems. In a typical MCDM problem, K alternatives are evaluated on the basis of n criteria, and the outcome of the evaluation is summarized in a so-called performance matrix, within which MCDM methods are used to select the best, sort, or rank the alternative(s). The focus of this study is on ranking, where a set of K alternatives needs to be ranked. There exist several MCDM methods which can be used for the ranking problem, including value and utility-based methods such as AHP (analytic hierarchy process) [48] , ANP (analytic network process) [49] , BWM (best-worst method) [47] , SMART (simple multiattribute rating technique) [14] , and Swing [36] , and also the outranking methods like ELECTRE (ELimination and Choice Expressing REality) and its extensions [17] , and PROMETHEE (Preference Ranking Organization METHod for Enrichment of Evaluations) and its extensions [7] . For more information about popular MCDM methods, see [55] . One of the main controversial issues in this area is that different MCDM methods, even when they use the same input, produce different and potentially conflicting rankings, which means that finding an overall aggregated ranking of alternatives is of the essence. Some studies ignore the existence of such a conflict [29] , or use a simple ranking statistic, like averages [43] , while yet other methods attempt to reconcile the difference and work out a compromise [28,42] . Ku et al. [28] estimate the weight for each MCDM method based on the Spearman's correlation coefficient. The underlying idea is that if the ranking of an MCDM method deviates from those of other methods, it would then be assigned a lower weight. As such, the weight of each MCDM ranking is computed using the correlation coefficient. By the same token, Ping et al. 
[42] have proposed an optimization problem to determine the weight of each individual MCDM method and then aggregate them accordingly. The optimization problem assumes that the final aggregated ranking is a weighted linear combination of the rankings provided by different MCDM methods, and it tries to determine the weights accordingly. Although these methods do come up with a final aggregated ranking, they do not provide any further information about the consensus or reliability of the aggregated ranking.
In this paper, a new ensemble method is proposed based on the half-quadratic (HQ) theory [18,19,37]. In this regard, a new model is proposed based on a general non-convex HQ function, and the procedure involved in determining the optimal solution to the given minimization is provided with guaranteed convergence. Although no weights for the MCDM methods are considered explicitly, the proposed model estimates a weight for each of the MCDM methods by using the so-called minimizer function inspired by the HQ theory, whose estimation improves adaptively throughout the optimization procedure. An MCDM method whose ranking differs from those of most of the other MCDM methods being used is treated as an outlier in the proposed framework and, as such, is assigned a lower weight. The aggregated final ranking is then obtained by the weighted combination of the rankings of the MCDM methods being used, which means that the methods whose rankings deviate from the others have a lower impact on the final ranking. Although the proposed model is unconstrained, interestingly, the weights computed by the minimizer function preserve the non-negativity and unit-sum properties that are required for the MCDM methods. The proposed compromise method is also objective, since it does not need to elicit preferences from decision-makers. However, the MCDM methods being used in the framework could belong to either class of MCDM methods (subjective or objective).
For some of the HQ functions, there are parameters that have to be tuned. To that end, we take advantage of several recent studies to tune the parameters efficiently [22,24] . Having such parameters helps compute a consensus index and trust level based on the computed weights. The outcome of the proposed method is to determine the weights of MCDM methods and compute the final aggregated ranking of alternatives, as well as two indicators showing the level of agreement and reliability of the final aggregated ranking.
As a real-world implementation, we study the evaluation and comparison of ontology alignment systems by using different MCDM methods. Such a comparison is of the essence for two major reasons. First, there are numerous ontology alignment systems in the existing literature [13,16,25,35,46,59], each claiming to be superior to the other available systems. To support that claim, the developers of the systems involved typically look at only one performance score, on which the claim of superiority is based. If there are multiple benchmarks, the average of these scores is computed and regarded as the overall performance representation. However, the main drawback of using averages is that it only allows a comparison on the basis of one performance score. As a result, it is not possible to take into account the different facets of a system measured by several metrics. For instance, an important criterion for alignment is execution time, which also has to be included in an evaluation and comparison. Here, we formulate the comparison of ontology alignment systems as an MCDM problem, where the performance metrics are the criteria and the ontology alignment systems are the alternatives. Consequently, the decision as to which system is superior is transformed into an MCDM problem, making it possible to compare the systems based on multiple metrics. The second reason for using MCDM methods to assess alignment systems is the competition that exists in the ontology alignment evaluation initiative (OAEI), with several standard benchmarks divided into tracks, each with an available reference (or gold standard) alignment. Within that competition, the participating systems conduct the alignment on the given ontologies, and their outcome is then juxtaposed with the reference for evaluation.
In addition, there are various performance metrics for different benchmarks, making the final ranking of the systems, which is potentially one of the principal goals of the competition in the first place, much more difficult. In this paper, we review the performance metrics for five OAEI tracks, and apply the MCDM methods along with the proposed ensemble method to determine the final ranking of the systems. The methodology proposed in this paper can also be used by the OAEI organizers to evaluate the participating systems with respect to multiple performance metrics.
In summary, this paper makes the following contributions:
• A new approach for ensemble ranking is proposed based on the HQ theory.
• The proposed method can assign weights objectively to the MCDM methods being used, since no decision-maker is involved in determining the weights of the final aggregated ranking.
• The proposed method can also be used to compute a consensus index and a trust level for the final aggregated ranking.
• As a real-world implementation, we study the ranking of ontology alignment systems with respect to multiple performance metrics. Such a ranking is of the utmost importance, particularly for the OAEI, where there is a competition involving several standard benchmarks. The proposed ensemble method can be used in other ontology alignment benchmarks, as well as in any other MCDM problem that uses multiple MCDM methods.
The remainder of this article is structured as follows. In Section 2 , we present the proposed ensemble method, followed by an overview of MCDM methods being used in Section 3 . Sections 4 and 5 are devoted to our real-world implementation of the proposed method in ontology alignment, while the lessons learned are discussed in Section 6 , and conclusions and future research directions are presented in Section 7 . The MATLAB code and the MS Excel solver of the proposed method are freely available at https://github.com/Majeed7/EnsembleRanking .

Ensemble ranking: A half-quadratic programming approach
The MCDM methods may provide different rankings for the same problem because they use different mechanisms, making it hard to provide sufficient support for the ranking of one MCDM method over the others. As such, in this section, a compromise method is developed to estimate the final ranking of all alternatives based on the rankings of different MCDM methods. The proposed method utilizes the HQ theory, which results in estimating a weight for each of the MCDM methods. The weights obtained by the method satisfy the non-negativity and unit-sum properties, which are necessary for the MCDM methods. In addition, the proposed method is objective, since the weights are computed without any expert input. Another important property of the proposed method is that, in contrast to averaging, it is insensitive to outliers, owing to the use of the robust HQ functions. For aggregating MCDM rankings, outliers are the rankings that differ from the majority of rankings, which means that they are expected to contribute less to the final aggregated ranking. In addition to the aggregated ranking, a consensus index and a trust level are calculated for the aggregated ranking. In the following, we first explain the notation used in the study and then review the fundamentals of the HQ theory.
We begin by explaining the notation used in this article. The alternatives are referred to as $A_i$, $i = 1, 2, \ldots, K$, while the performance metrics or criteria are denoted by $P_j$, $j = 1, 2, \ldots, n$. Thus, there are $K$ alternatives which are evaluated with respect to $n$ criteria (or performance metrics). Furthermore, the matrix containing all performance scores is denoted by $X$, with $X_{i.}$, $X_{.j}$, and $X_{ij}$ referring to the $i$th row, the $j$th column, and the element at the $i$th row and $j$th column, respectively. By the same token, the $i$th element of a vector such as $s$ is denoted $s_i$. The Euclidean norm is written $\|e\|_2 = \sqrt{\sum_{i=1}^{s} e_i^2}$, $\forall e \in \mathbb{R}^s$. The ranking of the alternatives computed by the $m$th MCDM method is denoted $R^m$, $m = 1, \ldots, M$, and the final aggregated ranking is denoted $R^*$. In addition, the rankings of alternative $k$ obtained by method $m$ and by the aggregated ranking are denoted $R^m_k$ and $R^*_k$, respectively.

Half-Quadratic minimization
In this section, we review the fundamental theory of the HQ minimization, introduce the appropriate HQ functions and look at the minimization procedure of the HQ programming.
The Euclidean norm is arguably the most popular loss function used in various circumstances, and least-squares fitting is the most popular regression technique that uses the Euclidean norm as its loss function. Although it is simple and yields a closed-form solution, it is highly sensitive to outliers and shows diminished performance in noisy environments. A viable way to address that sensitivity is to use robust estimators. In robust statistics, M-estimators form a family of robust estimators by which the HQ functions are inspired. Although these functions are not convex, their optimum can be obtained using HQ minimization with guaranteed convergence. Table 1 tabulates the HQ functions $g(\cdot)$ along with their minimizer functions $\delta(\cdot)$, which are used in the optimization procedure.
Consider the following minimization, where $g(\cdot)$ is one of the HQ functions tabulated in Table 1:
$$\min_{s} \sum_{j} g(s_j). \qquad (1)$$
To solve problem (1), there are two forms of HQ programming (multiplicative [18] and additive [19]) that can efficiently find a local optimal solution. Both forms have been applied to different areas, including robust estimation [34,57], signal processing [33,38,58], image processing [21,23], and machine learning [22,24]. In this paper, we use the multiplicative form, since its optimization procedure can be interpreted meaningfully within MCDM. Based on the multiplicative form of HQ programming [18,37], problem (1) can be rewritten as
$$\min_{s, w} \sum_{j} \left( w_j s_j^2 + \psi(w_j) \right), \qquad (2)$$
where $w_j > 0$ is the HQ auxiliary variable, and $\psi(\cdot)$ is the convex conjugate of $g(\cdot)$, defined such that $g(s) = \min_{w} \left( w s^2 + \psi(w) \right)$ [5]. To solve minimization (2), the variables $w$ and $s$ must be updated iteratively until convergence is reached. Based on the multiplicative HQ theory [18], the updates are
$$w_j^{l+1} = \delta(s_j^l), \qquad (3)$$
$$s^{l+1} = \arg\min_{s} \sum_{j} w_j^{l+1} s_j^2, \qquad (4)$$
where $\delta(\cdot)$ is the minimizer function associated with $g(\cdot)$ (see Table 1), and $l$ and $l+1$ represent the iteration counter.
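As an illustration of the multiplicative HQ loop, the following sketch estimates a robust mean of noisy values with the Welsch estimator. The function names, the fixed $\sigma$, and the scaling $\delta(e) = \exp(-e^2/\sigma^2)$ are illustrative assumptions, not taken from this paper; the point is that the auxiliary variables $w$ act as data-dependent weights that shrink toward zero for outliers.

```python
import numpy as np

def welsch_minimizer(e, sigma):
    # delta(e) = exp(-e^2 / sigma^2): weight assigned to residual e
    # (one common scaling of the Welsch minimizer function)
    return np.exp(-(e ** 2) / (sigma ** 2))

def hq_robust_mean(y, sigma=1.0, tol=1e-10, max_iter=200):
    """Robust location estimate via multiplicative HQ with the Welsch estimator.

    Alternates w_j <- delta(s - y_j) with the closed-form weighted
    least-squares step s <- sum_j w_j y_j / sum_j w_j.
    """
    s = np.mean(y)                      # initialize with the ordinary average
    for _ in range(max_iter):
        w = welsch_minimizer(s - y, sigma)
        s_new = np.sum(w * y) / np.sum(w)
        if abs(s_new - s) < tol:
            s = s_new
            break
        s = s_new
    return s, w / np.sum(w)             # estimate and normalized weights

y = np.array([1.0, 1.1, 0.9, 1.05, 10.0])   # one gross outlier
s_hat, w = hq_robust_mean(y)
# The outlier receives a near-zero weight, so s_hat stays near 1,
# whereas the plain average np.mean(y) is pulled up toward 2.8.
```

Note how the non-negativity of the weights is automatic here: the minimizer function only produces values in $(0, 1]$, and the normalization enforces the unit sum.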
In the next section, a new compromise method is developed based on the multiplicative HQ minimization, and it is shown that the auxiliary variable $w$ plays the role of weights in the MCDM problems. Since the value of $w$ depends on the type of HQ function $g(\cdot)$, different HQ functions result in different weights and a different final aggregated ranking. We particularly consider the Welsch M-estimator, for two reasons. First, it has shown promising performance in a variety of problems and is known to be the most promising and outlier-robust estimator among the HQ functions [23]. Second, we can calculate a consensus index and a trust level if the Welsch estimator is used.

An HQ-based compromise method
The proposed ensemble method can be used for any number of MCDM methods. In this regard, assume that there are M MCDM methods which rank K alternatives on the basis of n criteria.
A simple yet practical way to estimate the overall ranking $R^*$ is to minimize its Euclidean distance to each computed ranking. The corresponding minimization is
$$\min_{R^*} \sum_{m=1}^{M} \| R^* - R^m \|_2^2, \qquad (5)$$
where $M$ is the number of MCDM methods and $R^m$ is the ranking of the $m$th MCDM method. Minimization (5) has the closed-form solution
$$R^* = \frac{1}{M} \sum_{m=1}^{M} R^m, \qquad (6)$$
which is indeed the average of the rankings produced by different methods. However, averages are not reliable estimators, since they are sensitive to outliers [11], like other methods using the Euclidean norm as their basic loss function. In aggregating rankings, this means that if one MCDM method has a ranking distinct from those of the other methods, it can significantly influence the aggregated ranking. Instead, we utilize the HQ functions, which are insensitive to outliers [26] and also allow us to compute a consensus index and a trust level for the final aggregated ranking. The proposed optimization problem to estimate $R^*$ is
$$\min_{R^*} \sum_{m=1}^{M} g\left( \| R^* - R^m \|_2 \right), \qquad (7)$$
where $g(\cdot)$ is an HQ function. Although minimization (7) is not convex, it can be solved efficiently using half-quadratic programming [18,37]. Using the HQ multiplicative form as in equation (2), minimization (7) can be restated as
$$\min_{R^*, \alpha} \sum_{m=1}^{M} \left( \alpha_m \| R^* - R^m \|_2^2 + \psi(\alpha_m) \right), \qquad (8)$$
where $\alpha \in \mathbb{R}^M$ is the half-quadratic auxiliary variable. According to the HQ programming, the following two steps must be iterated until convergence of the two variables is reached:
$$\alpha_m^{l+1} = \delta\left( \| R^{*,l} - R^m \|_2 \right), \qquad R^{*,l+1} = \arg\min_{R^*} \sum_{m=1}^{M} \alpha_m^{l+1} \| R^* - R^m \|_2^2. \qquad (9)$$
The solution to the first step is given by the minimizer function tabulated in Table 1, and the optimum of the second step is obtained by setting the derivative of the objective function to zero, i.e.,
$$R^* = \sum_{m=1}^{M} w_m R^m, \qquad w_m = \frac{\alpha_m}{\sum_{m'=1}^{M} \alpha_{m'}}. \qquad (10)$$
Thus, the final aggregated ranking is computed as the weighted sum of all the MCDM rankings, with the weights given by the minimizer function. Interestingly, the weights of the MCDM rankings in (10) are non-negative and fulfill the unit-sum property, which are the requirements for the MCDM methods.
Note that the optimization problem is unconstrained and these properties are fulfilled, thanks to the use of the HQ functions. Algorithm 1 summarizes the overall procedure of the proposed ensemble ranking of MCDM methods.

Algorithm 1 Ensemble Ranking.
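A minimal Python sketch of the iterative procedure follows (the MATLAB implementation linked in the introduction is the authoritative one). The Welsch minimizer with a fixed $\sigma$ is a simplifying assumption here, whereas the paper tunes $\sigma$ adaptively; the function name is illustrative.

```python
import numpy as np

def ensemble_ranking(R, sigma=1.0, tol=1e-9, max_iter=100):
    """Aggregate M rankings (rows of R, shape M x K) by multiplicative HQ.

    Alternates the auxiliary-variable update alpha_m = delta(||R* - R^m||)
    with the weighted-average update R* = sum_m w_m R^m, w_m = alpha_m / sum alpha.
    Returns the aggregated ranking scores and the method weights.
    """
    R = np.asarray(R, dtype=float)
    r_star = R.mean(axis=0)                         # start from the plain average
    for _ in range(max_iter):
        d = np.linalg.norm(r_star - R, axis=1)      # residual norms ||R* - R^m||
        alpha = np.exp(-(d ** 2) / (sigma ** 2))    # Welsch minimizer function
        w = alpha / alpha.sum()                     # non-negative, unit-sum weights
        r_new = w @ R                               # weighted combination of rankings
        if np.linalg.norm(r_new - r_star) < tol:
            r_star = r_new
            break
        r_star = r_new
    return r_star, w
```

For example, with two agreeing rankings and one reversed outlier, the outlier method receives a near-zero weight and the aggregate follows the majority:

```python
r_star, w = ensemble_ranking([[1, 2, 3], [1, 2, 3], [3, 2, 1]])
# w[2] is tiny; r_star is close to [1, 2, 3]
```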
The following lemma guarantees the convergence of this algorithm.

Lemma 2.1. The sequence of objective values $J(R^{*,l}, \alpha^l)$, $l = 1, 2, \ldots$, generated by Algorithm 1 converges.

Proof. The minimizer function $\delta(\cdot)$ has the property [37] that $J(R^{*,l}, \alpha^{l+1}) \leq J(R^{*,l}, \alpha^{l})$, where $R^*$ is assumed to be fixed. Similarly, the update of $R^*$ does not increase the objective, since $J$ is convex in $R^*$ for fixed $\alpha$, i.e., $J(R^{*,l+1}, \alpha^{l+1}) \leq J(R^{*,l}, \alpha^{l+1})$. Thus, the sequence $\{J(R^{*,l}, \alpha^l)\}$ is non-increasing and bounded from below, so it converges. □

Remark 2.2. The proposed ensemble method is predicated on the assumption that proper ranking methods are used, since the final aggregated ranking naturally depends on the ranking methods in question. If we add or remove a ranking method, the aggregated ranking is likely to change. However, in cases involving a significant number of methods, the proposed method is much less sensitive to adding or removing a ranking method. As such, the proposed method can be particularly useful in voting systems, which usually involve a considerable number of votes.

Remark 2.3.
The methods for ensemble ranking are useful for the case where there is no prior information about the suitability of one specific ranking method. In this situation, the rankings of different methods are treated equally a priori, and finding an aggregated ranking is desired, typically by working out a compromise between different rankings.

Consensus index and trust level
The weight of each MCDM method differs with respect to the HQ function in question, since $\delta(\cdot)$ depends on the $g(\cdot)$ function. Consequently, different HQ functions result in different weights and a different final aggregated ranking. Among the HQ functions, the Welsch estimator has shown promising performance in a number of domains [22,24]. Interestingly, it is possible to obtain a consensus index and a trust level using this estimator, owing to its use of the Gaussian distribution in its formulation. Prior to obtaining the consensus index and trust level, we first need to discuss tuning the parameter $\sigma$ in the Welsch estimator. As a recent study has indicated [24], this parameter can be tuned recursively in each iteration from the current residuals $\|R^* - R^m\|_2$, $m = 1, \ldots, M$. After computing $\sigma$ within the optimization procedure, we now discuss the consensus index and the trust level of the final ranking obtained by Algorithm 1.

Definition 2.4 (Consensus Index)
. A consensus index C shows the extent to which all MCDM methods agree upon the final ranking.
The key element in this definition is that the consensus index reflects the agreement among all the ranking methods being used, allowing us to compute the similarity of each ranking to the final aggregated ranking, thanks to the Welsch estimator. As a result, the consensus index $C$ of a given final ranking $R^*$ with respect to the rankings $R^m$, $m = 1, 2, \ldots, M$, can be computed as
$$C = \frac{1}{M} \sum_{m=1}^{M} \frac{\mathcal{N}_\sigma\left( \| R^* - R^m \|_2 \right)}{\mathcal{N}_\sigma(0)}, \qquad (14)$$
where $\mathcal{N}_\sigma(\cdot)$ is the probability density function of the Gaussian distribution with a mean of zero and a standard deviation of $\sigma$, and $\mathcal{N}_\sigma(0)$ is used to normalize the similarity computation, so that $C \in (0, 1]$. Complete agreement between the different rankings results in a consensus index of one; as the rankings deviate from one another, the consensus index decreases. The consensus index is thus an indicator of the agreement among different rankings. This means that a single ranking method that differs from the rest can adversely affect the consensus index. At the same time, such a distinct ranking method is treated as an outlier by the HQ functions being used. As a result, it has little impact on the final ranking, while it can profoundly influence the consensus index.

Definition 2.5 (Trust Level) .
A trust level T for ensemble ranking is the degree to which one can accredit the final aggregated ranking.
The trust level is an indicator of the reliability of the final ranking. For instance, if an MCDM ranking deviates significantly from the majority of rankings, it receives a lower weight in Algorithm 1 and, consequently, has less of an impact on the final ranking. Since the weight of such a method is lower than that of the other methods, it should also have less impact on the trust level. Taking this into account, the trust level can be computed as
$$T = \sum_{m=1}^{M} w_m \frac{\mathcal{N}_\sigma\left( \| R^* - R^m \|_2 \right)}{\mathcal{N}_\sigma(0)}, \qquad (15)$$
where $w_m$, $m = 1, \ldots, M$, is computed in Algorithm 1. Thus, the trust level is distorted to a lesser extent by rankings that differ from the majority, and it measures the reliability of the aggregated ranking $R^*$ computed by Algorithm 1. It is evident from equation (15) that the trust level is equivalent to the consensus index if the weights of the MCDM methods, i.e., $w_m$, $m = 1, 2, \ldots, M$, are identical. Fig. 1 summarizes the implementation process of the proposed ensemble ranking for a decision-making problem.
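Both indicators can be computed directly from the residual norms. The sketch below assumes the Gaussian similarity $\mathcal{N}_\sigma(d)/\mathcal{N}_\sigma(0) = \exp(-d^2/(2\sigma^2))$; the helper name is illustrative.

```python
import numpy as np

def consensus_and_trust(R, r_star, w, sigma):
    """Consensus index C and trust level T of an aggregated ranking.

    Similarity of ranking m to the aggregate is N_sigma(||R* - R^m||) / N_sigma(0)
    = exp(-||R* - R^m||^2 / (2 sigma^2)); C averages it over methods,
    T weights it by the method weights w_m.
    """
    d = np.linalg.norm(np.asarray(r_star, float) - np.asarray(R, float), axis=1)
    sim = np.exp(-(d ** 2) / (2 * sigma ** 2))   # normalized Gaussian similarity
    C = sim.mean()                               # equal-weight agreement
    T = np.dot(np.asarray(w, float), sim)        # weight-adjusted reliability
    return C, T
```

With identical rankings both indicators equal one, and with equal weights $w_m = 1/M$ the two indicators coincide, matching the discussion above.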

Three MCDM methods for illustrating the proposed approach
There exist several MCDM methods which can be used for the ranking problem (see [55] for an overview). In this study, three different MCDM methods (TOPSIS, VIKOR, and PROMETHEE) are selected to illustrate the proposed ensemble ranking method. These methods are used (in the next section) to rank alignment systems with respect to several performance metrics (criteria). We selected these three methods for three reasons. First, they are among the most popular methods in the MCDM field (see, for instance, [12,32,44] for applications of TOPSIS, [2,4,50] for applications of VIKOR, and [3,20,31] for applications of PROMETHEE). Second, compared to many other MCDM methods, they can be used in an objective way, without having to include the opinions of experts or users. Third, they were selected for their ability to rank alternatives, which implies that MCDM methods devised for other purposes (such as sorting or selecting) are not appropriate for this study. That said, the three MCDM methods used in this study are not the only usable methods, nor does the proposed method rely on the number of MCDM methods.

Technique for order preference by similarity to ideal solution (TOPSIS)
TOPSIS is one of the popular MCDM methods for ranking alternatives with respect to a set of criteria [56] . It first identifies the positive-ideal and negative-ideal solutions and then ranks the alternatives based on their distances to the two computed solutions. The alternatives are ranked based on their closeness to the positive-ideal solution and their distance from the negative-ideal solution.
While TOPSIS has many variations and extensions [1,8,10], in this study we adopt the original version proposed in [41]. The ranking process in TOPSIS includes the following steps:

Step 1: Normalize the performance matrix. The elements of the normalized matrix $\hat{X}$ are calculated as
$$\hat{X}_{kj} = \frac{X_{kj}}{\sqrt{\sum_{k'=1}^{K} X_{k'j}^2}}.$$

Step 2: Find the positive-ideal solution $S^+ = (S^+_1, \ldots, S^+_n)$, where $S^+_j = \max_k \hat{X}_{kj}$ for benefit criteria, e.g., profit, and $S^+_j = \min_k \hat{X}_{kj}$ for cost criteria, e.g., time.

Step 3: Find the negative-ideal solution $S^- = (S^-_1, \ldots, S^-_n)$, where $S^-_j = \min_k \hat{X}_{kj}$ for benefit criteria, and $S^-_j = \max_k \hat{X}_{kj}$ for cost criteria.

Step 4: Calculate the Euclidean distance of each alternative to the positive-ideal and negative-ideal solutions. For the $k$th alternative, the distances are
$$D^+_k = \| \hat{X}_{k.} - S^+ \|_2, \qquad D^-_k = \| \hat{X}_{k.} - S^- \|_2.$$

Step 5: Calculate the ratio $L_k$ for each alternative as
$$L_k = \frac{D^-_k}{D^+_k + D^-_k}.$$

Step 6: Rank the alternatives according to their ratios $L_k$ in descending order.
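The six steps above can be sketched as follows (vector normalization is assumed, criterion weights are omitted for simplicity, and the function name is illustrative):

```python
import numpy as np

def topsis(X, benefit):
    """TOPSIS closeness ratios for a K x n performance matrix X.

    benefit[j] is True for benefit criteria and False for cost criteria.
    Returns the ratios L (larger is better).
    """
    X = np.asarray(X, float)
    benefit = np.asarray(benefit)
    Xn = X / np.linalg.norm(X, axis=0)                        # Step 1: normalize columns
    S_pos = np.where(benefit, Xn.max(axis=0), Xn.min(axis=0))  # Step 2: positive ideal
    S_neg = np.where(benefit, Xn.min(axis=0), Xn.max(axis=0))  # Step 3: negative ideal
    D_pos = np.linalg.norm(Xn - S_pos, axis=1)                 # Step 4: distances
    D_neg = np.linalg.norm(Xn - S_neg, axis=1)
    return D_neg / (D_pos + D_neg)                             # Step 5: closeness ratio

# Step 6: rank by L in descending order, e.g. ranking = np.argsort(-topsis(X, benefit))
```

For a dominating alternative (best on every criterion) the ratio is exactly one, and for a dominated one it is exactly zero.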

Vlsekriterijumska optimizacija i kompromisno resenje (VIKOR)
VIKOR is another MCDM method that ranks the alternatives based on a set of possibly conflicting criteria. The procedure used in VIKOR can be summarized as follows [39,40] .
Step 1: Find the best value $f^+_j$ and the worst value $f^-_j$ among the alternatives for each criterion. For the benefit criteria, we have
$$f^+_j = \max_k X_{kj}, \qquad f^-_j = \min_k X_{kj},$$
where the maximum and minimum are interchanged for cost criteria.

Step 2: For each alternative, compute $S_i$ and $R_i$ as
$$S_i = \sum_{j=1}^{n} w_j \frac{f^+_j - X_{ij}}{f^+_j - f^-_j}, \qquad R_i = \max_j \left[ w_j \frac{f^+_j - X_{ij}}{f^+_j - f^-_j} \right],$$
where $w_j$ is the weight of the $j$th criterion.

Step 3: For each alternative, calculate $Q_i$ as
$$Q_i = \nu \, \frac{S_i - \min_i S_i}{\max_i S_i - \min_i S_i} + (1 - \nu) \, \frac{R_i - \min_i R_i}{\max_i R_i - \min_i R_i},$$
where $\nu \in [0, 1]$ is a trade-off parameter. It is common practice to set $\nu = 0.5$.

Step 4: Rank the alternatives based on their corresponding $Q_i$ values in descending order.

Step 5: For two alternatives $A_i$ and $A_k$, $A_i$ is given a better ranking than $A_k$ if: (a) $Q_i - Q_k > 1/(K - 1)$; and (b) $A_i$ has a better ranking according to $S_i$ and/or $R_i$.
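Steps 1-3 can be sketched as follows (equal criterion weights by default, non-degenerate data assumed so the normalizations are well defined, and the acceptable-advantage check of Step 5 omitted for brevity; by the usual convention the minimum $Q$ corresponds to the best compromise solution):

```python
import numpy as np

def vikor(X, benefit, weights=None, nu=0.5):
    """VIKOR Q values for a K x n performance matrix X.

    benefit[j] is True for benefit criteria; weights defaults to equal weights.
    """
    X = np.asarray(X, float)
    K, n = X.shape
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, float)
    benefit = np.asarray(benefit)
    f_pos = np.where(benefit, X.max(axis=0), X.min(axis=0))    # best values f+
    f_neg = np.where(benefit, X.min(axis=0), X.max(axis=0))    # worst values f-
    # The same formula covers cost criteria automatically, since f+ < f-
    # flips the sign of both numerator and denominator.
    ratio = w * (f_pos - X) / (f_pos - f_neg)
    S = ratio.sum(axis=1)                                      # group utility
    R = ratio.max(axis=1)                                      # individual regret
    Q = (nu * (S - S.min()) / (S.max() - S.min())
         + (1 - nu) * (R - R.min()) / (R.max() - R.min()))
    return Q
```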

Preference ranking organization METHod for enrichment of evaluations (PROMETHEE)
PROMETHEE uses pairwise comparisons between different alternatives to establish a ranking. While PROMETHEE I [6] conducts partial pairwise comparison and computes the ranking accordingly, PROMETHEE II [54] uses complete pairwise comparison, which is required for the proposed ensemble method and also makes it more suitable for ranking the alignment systems. The ranking procedure used by PROMETHEE II is as follows.
Step 1: For $i, k = 1, 2, \ldots, K$, compute the function $\pi_{ik}$ as the number of criteria on which $A_i$ performs better than $A_k$, i.e.,
$$\pi_{ik} = \sum_{j=1}^{n} I\left( X_{ij} > X_{kj} \right),$$
where $I(\cdot)$ is the indicator function, which is 1 when the condition in parentheses is satisfied and 0 otherwise (for cost criteria, the inequality is reversed).

Step 2: Calculate the positive outranking flow $\phi^+$, the negative outranking flow $\phi^-$, and the net flow $\phi$ for each alternative as
$$\phi^+(A_i) = \frac{1}{K - 1} \sum_{k \neq i} \pi_{ik}, \qquad \phi^-(A_i) = \frac{1}{K - 1} \sum_{k \neq i} \pi_{ki}, \qquad \phi(A_i) = \phi^+(A_i) - \phi^-(A_i).$$

Step 3: Rank the alternatives in decreasing order of their net flows.
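The procedure can be sketched as follows (the usual-criterion preference function, i.e., a strict-inequality indicator, and equal criterion importance are assumed; the function name is illustrative):

```python
import numpy as np

def promethee_ii(X, benefit):
    """PROMETHEE II net flows for a K x n performance matrix X.

    pi[i, k] counts the criteria on which alternative i strictly beats
    alternative k; larger net flow means a better alternative.
    """
    X = np.asarray(X, float)
    K, n = X.shape
    pi = np.zeros((K, K))
    for j in range(n):
        col = X[:, j] if benefit[j] else -X[:, j]   # flip cost criteria
        pi += (col[:, None] > col[None, :])         # strict-preference indicator
    phi_pos = pi.sum(axis=1) / (K - 1)              # positive outranking flow
    phi_neg = pi.sum(axis=0) / (K - 1)              # negative outranking flow
    return phi_pos - phi_neg                        # net flow
```

Ranking is then `np.argsort(-promethee_ii(X, benefit))`, i.e., decreasing net flow.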

Fundamentals of ontology alignment evaluation
In this section, we first review the basic concepts of ontology and ontology alignment, and then discuss the metrics to evaluate the alignment systems.

Ontology and ontology alignment
An ontology contains the concepts of a domain, along with their properties and relationships. The following definition explains the ontology in a formal manner. All the classes, properties, and object properties are called the entities of an ontology. The design of an ontology is subjective, so two ontologies describing the same domain can have a distinct structure/terminology, which means that ontology alignment is required to deal with this discrepancy. We now consider the rudimentary concepts of ontology alignment.

Performance metrics
Alignment is the typical outcome of ontology alignment systems, based on which different systems are evaluated and compared. In addition, several standard benchmarks with a known reference alignment are used, so that the evaluation can be made by juxtaposing the reference with the alignment generated by a system. The three widely used performance metrics for ontology alignment are precision, recall, and F-measure. Given an alignment $A$ and the reference $A^*$, precision is the ratio of true positives to the total number of correspondences in the alignment generated by a system; thus, it can be written as
$$Pr = \frac{|A \cap A^*|}{|A|},$$
where $Pr$ is the precision and $|.|$ is the cardinality operator.
Recall is another popular metric, computed as the ratio of true positives to the total number of correspondences in the reference. Thus, it can be computed as
$$Re = \frac{|A \cap A^*|}{|A^*|},$$
where $Re$ is recall. Both precision and recall represent only one aspect of the alignment systems; the former considers only the correctness of the alignment, while the latter accentuates the completeness of an alignment with respect to the reference. As a combination of both, the F-measure is often used. It is the harmonic mean of precision and recall and is computed as
$$F = \frac{2 \cdot Pr \cdot Re}{Pr + Re}.$$
We do not include the F-measure in this study, since it is derived from precision and recall, which violates the independence of criteria required by the MCDM methods. Aside from these popular performance metrics, there are two important principles for a given alignment. The first is conservativity [52,53], which states that, with regard to the alignment being generated, the system must not impose any new semantic relationship between the concepts of the ontologies involved. The second is consistency, which states that the discovered correspondences should not lead to unsatisfiable classes in the merged ontology [53].
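Precision and recall over alignments represented as sets of correspondences can be computed as follows (representing each correspondence as a hashable item is an assumption made for this sketch):

```python
def precision_recall(A, A_ref):
    """Precision and recall of an alignment A against the reference A_ref.

    Both arguments are collections of correspondences; true positives are
    the correspondences found in both.
    """
    A, A_ref = set(A), set(A_ref)
    tp = len(A & A_ref)        # true positives: shared correspondences
    pr = tp / len(A)           # correctness of the produced alignment
    re = tp / len(A_ref)       # completeness w.r.t. the reference
    return pr, re

pr, re = precision_recall({"c1", "c2", "c3"}, {"c1", "c2", "c4", "c5"})
# pr = 2/3 (two of three produced correspondences are correct),
# re = 2/4 (two of four reference correspondences are found)
```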
There is also a metric called Recall+, which indicates the portion of correspondences that a system cannot readily detect. A higher value of this metric indicates that the associated system is better able to identify non-trivial (i.e., not syntactically identical) correspondences between two given ontologies. In addition, the execution time is another important indicator of the performance of the alignment systems that also has to be taken into account.

Participating systems and standard benchmarks: Five OAEI tracks
To determine some of the performance metrics, we need the underlying true alignment of the ontologies in question, for which we use the benchmarks of five different tracks of the OAEI whose reference alignments are available. The tracks are anatomy, conference, largeBioMed (large biomedical track), disease and phenotype, and SPIMBENCH. By reviewing the history of the tracks in the OAEI competition 1 , as well as asking the organizers of the tracks, the appropriate performance metrics for each of the tracks listed above were obtained. Table 2 tabulates the performance metrics for all five tracks. According to Table 2, the execution time is essential in all tracks, with the exception of conference, since the ontologies in this track are small (i.e., < 100 entities) and the systems are therefore able to perform the alignment swiftly. Furthermore, precision and recall are important in all tracks. However, we did not include F-measure, since it is the harmonic mean of precision and recall: as the evaluation based on MCDM already includes both precision and recall, using F-measure would be redundant. In addition, the criteria must be independent of each other in MCDM, which means that using F-measure would invalidate the overall ranking computed by the various MCDM methods.

Experiments
In this section, the MCDM methods and the proposed aggregated methodology are applied to five tracks of the OAEI, and the systems participating in 2018 are compared and ranked accordingly. The alignments produced by various systems are available on the OAEI website. 2

Large BioMed Track
The aim of this track is to find alignments between the Foundational Model of Anatomy (FMA), SNOMED CT, and the National Cancer Institute Thesaurus (NCI) ontologies. The ontologies are large and contain tens of thousands of classes. The performance metrics used to rank the systems that participated in this track are execution time, precision, and recall. Table 4 tabulates the ranking of seven systems that participated in matching FMA to NCI. This is an interesting case, since the MCDM rankings are conflicting. In particular, the rankings of VIKOR and PROMETHEE are in line for LogMapBio and FCAMAPX and both differ from the ranking of TOPSIS, while the rankings of TOPSIS and VIKOR agree with regard to LogMapLite and XMap and differ from the ranking of PROMETHEE. When considering the weights of the MCDM methods, it is interesting to see that the weight of VIKOR is relatively high and close to one, while the weights of the other two methods are lower and close to zero, which means that the proposed ensemble method favors the middle-ground ranking among these three MCDM methods. Since two methods have rankings different from the aggregated final ranking, the consensus index is not high, at around 0.80. At the same time, the trust level is 1.00, because the weights of two MCDM methods are nearly zero, so they cannot affect this indicator. The table shows that AML, LogMap, and XMap are the top three systems in this task.
In addition, Table 5 shows the ranking of participants in matching FMA to SNOMED. This table is similar to Table 4, since VIKOR has a higher weight compared to the other methods, with its ranking situated between the other rankings. The consensus index for the final ranking is 0.80, while the trust level is 0.98. Similarly, Table 6 shows the ranking of the seven systems that participated in matching NCI to SNOMED. According to this table, VIKOR once more has a higher weight, and as a result, the final consensus index is 0.80, with a trust level of 0.98. According to Tables 5 and 6, AML and LogMap are the top two systems in aligning FMA to SNOMED as well as NCI to SNOMED.
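Throughout these tables, the final ranking is obtained as a weighted sum of the individual MCDM rank vectors, with the weights coming from the HQ-based minimizer described earlier in the paper. The following sketch uses hypothetical weights and rank vectors (not the paper's data) purely to illustrate the aggregation step:

```python
# Hypothetical rank vectors (1 = best) for four alternatives from three
# MCDM methods; in the paper these come from TOPSIS, VIKOR, and PROMETHEE.
rankings = {
    "TOPSIS":    [1, 2, 3, 4],
    "VIKOR":     [2, 1, 3, 4],
    "PROMETHEE": [1, 3, 2, 4],
}
# Illustrative weights; the actual weights are computed by the HQ minimizer
# and already satisfy w >= 0 and sum(w) = 1.
weights = {"TOPSIS": 0.2, "VIKOR": 0.7, "PROMETHEE": 0.1}

n = 4
scores = [sum(weights[m] * rankings[m][i] for m in rankings) for i in range(n)]

# Convert the weighted scores back into rank positions (1 = best).
order = sorted(range(n), key=lambda i: scores[i])
final = [0] * n
for pos, i in enumerate(order):
    final[i] = pos + 1

print(final)  # [2, 1, 3, 4]
```

Note how VIKOR's high weight pulls the aggregated ranking toward its own: the alternative VIKOR ranks first ends up first overall.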

Disease and Phenotype Track
The OAEI disease and phenotype track comprises matching various disease and phenotype ontologies. The OAEI 2018 edition consisted of two tasks: the first was to align the human phenotype (HP) ontology to the mammalian phenotype (MP) ontology, and the second to align the human disease ontology (DOID) to the Orphanet and Rare Diseases ontology (ORDO). The performance metrics used for this track are execution time, precision, and recall.
In the OAEI 2018, eight systems were able to align HP and MP, while nine systems could match DOID and ORDO. Table 7 illustrates the ranking of the systems that participated in the OAEI 2018 disease and phenotype track for mapping the HP and MP ontologies. According to this table, the weights of TOPSIS and VIKOR are significantly higher than that of PROMETHEE, because the rankings obtained by PROMETHEE deviate more from those of the other two methods. For instance, PROMETHEE puts AML in fourth place, while the other two consider it to be the best alignment system. As a result, the weight of PROMETHEE becomes insignificant. The consensus index for this ranking is 0.85 and its trust level is 0.95. This table also indicates that AML, LogMapLite, and LogMap are the top systems in this mapping task.
Another matching task in this track involves the alignment of the DOID and ORDO ontologies. Table 8 shows the ranking of the participating systems for this task. According to this table, TOPSIS takes the highest weight, since it is a compromise between the other two MCDM methods. In particular, the TOPSIS ranking of DOME lies between those of VIKOR and PROMETHEE. Also, TOPSIS rankings occasionally agree with one of the other ranking methods: TOPSIS agrees with VIKOR on ranking LogMap, LogMapLite, and XMap, while it is in line with PROMETHEE with regard to POMap++. Given these rankings, TOPSIS has a higher weight compared to the other MCDM methods. The consensus index and trust level of this ranking are 0.87 and 0.95, respectively. Accordingly, LogMap, LogMapLite, and XMap are the top systems in this task with regard to all the performance metrics.

Anatomy track
This track consists of matching the adult mouse anatomy to the part of the NCI thesaurus describing the human anatomy. In the OAEI 2018, 14 systems participated in the anatomy track. The systems are compared based on execution time, precision, recall, consistency, and recall+. Table 9 shows the ranking of the systems in the anatomy track.

Conference Track
The conference track involves matching and aligning seven ontologies from different conferences. For this track, there are two different reference alignments, i.e., certain and uncertain. Table 10 tabulates the result of the analysis of the 12 systems that participated in this track at the OAEI 2018 with the certain reference alignment, with a consensus index of 0.91 and a trust level of 0.95. Based on this table, LogMap, AML, and Alin are the top systems. For the uncertain version of the reference alignment, as Table 11 shows, AML, LogMap, and Holontology are the top three systems. The consensus index and trust level for this track are 0.93 and 0.95, respectively.

SPIMBENCH Track
The SPIMBENCH task is another matching task, the aim of which is to determine whether two OWL instances describe the same Creative Work. There are two datasets, called Sandbox and Mainbox, each of which has a TBox as the source ontology and an ABox as the target. The TBox contains the ontology and instances, and it has to be aligned to the ABox, which only contains instances. The difference between Sandbox and Mainbox is that the reference alignment of the former is available to the participants, while the latter is a blind matching task, so participants do not know the true alignment in advance.
Only three systems took part in this track at the OAEI 2018. Tables 12 and 13 list the rankings of the systems for the Sandbox and Mainbox tasks, respectively. The Sandbox task is interesting, since two MCDM methods have identical rankings, while the third, i.e., TOPSIS, differs in its ranking of two systems. As a result, its weight becomes insignificant, while the weights of the other two rankings are each about 0.50. The consensus index for this ranking is 0.77, while its trust level is 1.00, since the final ranking is identical to the ranking (or average) of the other two MCDM methods.
For the Mainbox task, Table 13 shows the ranking of the three systems on this task. Interestingly, the rankings of the MCDM methods are identical and they all take on a similar weight in the proposed method. As expected, the consensus index and trust level are also one. According to these tables, Lily performs best in both tasks, followed by LogMap and AML.
Remark 5.1. We discussed the rankings of TOPSIS, VIKOR, and PROMETHEE for different OAEI tracks. Each of them had higher weights in some tracks and lower weights in others. However, the aim of this study is not to compare MCDM methods or discuss their suitability. These methods can take on higher or lower weights in different decision-making problems, and their weights depend entirely on the rankings computed from the performance matrix of the decision-making problem in question.

Remark 5.2. In this study, we used three MCDM methods that do not require the opinion of an expert/decision-maker to produce the final ranking. This, however, does not mean that we cannot use MCDM methods in which expert/decision-maker opinion is used to make the ranking (such as AHP/ANP or BWM). In fact, the rankings (which are the input for our ensemble method) could come from any set of MCDM methods, with or without expert/decision-maker opinion. It is important to note, however, that regardless of the MCDM methods used in our proposed ensemble method, there is no need for an expert/decision-maker to compare the rankings produced by the different MCDM methods.

Discussion
As we discussed earlier, the consensus index and the trust level indicate two different aspects of the final aggregated ranking. Generally speaking, higher values are desirable for both indicators. The consensus index is an indicator of the agreement among all the MCDM methods being used, while the trust level shows the reliability of the final aggregated ranking. Below, based on the main properties of the proposed approach and the findings of the experiments, we elaborate on some general possible outcomes of the proposed method.
• Consensus index high, trust level high: If all the MCDM methods being used have identical rankings, their weights are identical and equal to 1/M, where M is the number of ranking methods. In this case, the final aggregated ranking is precisely the average of the individual rankings. As a result, the proposed ensemble method reduces to the average, or equivalently, the HQ functions operate as the Euclidean norm. This is indeed acceptable, since there are no outliers when all the rankings are identical. In this case, because there is full agreement among all the MCDM methods being used, both the consensus index and the trust level are one.
• Consensus index low, trust level high: A low consensus index combined with a high trust level can mean either of two things. First, if a small fraction of the MCDM methods being used deliver rankings that deviate from the other rankings, the proposed ensemble method treats them as outliers, assigning them lower weights, which reduces their impact on the final aggregated ranking. The presence of such methods can be detected by inspecting the weights obtained by the proposed ensemble method. Methods that have a lower weight deviate from the majority of the MCDM rankings, as well as from the final ranking, which means they are treated as outliers. The second possibility is that the number of methods with lower weights is significant compared to the overall number of MCDM methods being used. In this case, the MCDM rankings with higher weights are the intermediates of all the methods. As a result, the intermediate rankings take on higher weights and have a more profound impact on the final aggregated ranking. In both of these cases, the agreement among the MCDM methods being used is low, while the final ranking is fully captured by a fraction of the MCDM methods involved, which is why the consensus index is insignificant and the trust level is high.
• Consensus index low, trust level low: If all the MCDM rankings in question deviate significantly from each other, the consensus index will be low. In that case, there is no subset of the MCDM methods with significantly higher weights, which means that the trust level is also low.
• Consensus index high, trust level low: This scenario does not occur, because the trust level is high whenever there is consensus among the MCDM methods being used.
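The full-agreement case can be verified directly: with identical rankings and equal weights of 1/M, the weighted sum degenerates to a plain average that reproduces the shared ranking unchanged. A small sketch with made-up ranks:

```python
M = 3                                 # number of MCDM methods
ranking = [2, 1, 4, 3]                # the ranking shared by all methods (1 = best)
rankings = [ranking[:] for _ in range(M)]
weights = [1.0 / M] * M               # equal weights, as in the fully agreeing case

aggregated = [
    sum(w * r[i] for w, r in zip(weights, rankings))
    for i in range(len(ranking))
]

# The weighted sum is just the average, i.e., the common ranking itself.
print([round(a, 9) for a in aggregated])  # [2.0, 1.0, 4.0, 3.0]
```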
This is a general discussion framework; what counts as a high or low level could be defined by the decision-makers for a particular problem.

Conclusion
In this paper, a new compromise ensemble method was proposed, based on the half-quadratic (HQ) theory. The proposed method can be used to compute a final aggregated ranking in the form of a weighted sum of the MCDM rankings. The weights in the proposed method were computed using minimizer functions inspired by the HQ theory, and they satisfy the basic properties of weights in MCDM. In addition, using multiple performance metrics, the ranking of ontology alignment systems was modeled as an MCDM problem, where the systems and the performance metrics served as alternatives and criteria, respectively. In this regard, appropriate MCDM methods were reviewed, each of which can assign a ranking to the systems on a benchmark with respect to its performance metrics.
We also introduced two indicators, the consensus index and the trust level: the former indicates the level of agreement among the MCDM ranking methods, while the latter reflects the reliability of the final aggregated ranking. It became clear in the cases we examined that, when a ranking method deviates from the others, the aggregated ranking has a low consensus index but a high trust level. As a result, these two indicators delineate different properties of the final aggregated ranking.
Since evaluating and ranking ontology alignment systems are important activities, in particular in light of the ontology alignment evaluation initiative (OAEI) competition, the approach discussed in this article can be used to produce a final ranking of the ontology alignment systems in each of the OAEI tracks. The outcome can provide greater insight into the overall performance of the systems and enrich the report provided annually by the OAEI organizers.
This study can be extended in various ways. To begin with, the performance metrics used to rank the alignment systems were treated as equally important, but different performance metrics may in fact not be equally important. One area of future research therefore involves eliciting, from domain experts, preferences over the performance metrics for different OAEI tracks, and then ranking the systems accordingly. To that end, a broad range of MCDM methods could be used.
The proposed approach has the potential to be used in many real-world applications where a number of MCDM methods are used to rank a set of alternatives and a consensus among the methods is needed to arrive at a final aggregated ranking. Finally, we think it would be interesting to use the proposed method to aggregate votes in voting systems.