On a pairwise comparison-based consistent non-numerical ranking

We discuss a consistent model of pairwise comparison-based non-numerical ranking. An algorithm that enforces consistency for raw or partially organized ranking data is presented and its properties are analysed. The concept of testing subjective rankings is also discussed.


Introduction
A ranking or preference is usually defined as a weakly ordered relationship between a set of items such that, for any two items, the first is either 'less preferred', 'more preferred' or 'indifferent' to the second one [10]. The ranking is numerical if numbers are used to measure importance and to create the ranking relation. Numerical rankings are usually totally ordered. Various kinds of global indexes are popular examples of numerical rankings.
The pairwise comparisons method is based on the observation that it is much easier to rank the importance of two objects than it is to rank the importance of several objects [3]. The problem is then reduced to constructing a global ranking from the set of partially ordered pairs. The method can be traced to the 1785 paper of the Marquis de Condorcet [1,5]; it was explicitly mentioned and analysed by Fechner in 1860 [7], made popular by Thurstone in 1927 [23], and transformed into a kind of semi-formal methodology by Saaty in 1977 (called AHP, Analytic Hierarchy Process, see [6,10,21]).
At present, pairwise comparisons are practically identified with Saaty's controversial AHP. On the one hand, AHP has respected practical applications; on the other, it is still considered by many to be a flawed procedure that produces arbitrary rankings. For more details, the reader is referred to [6,11,16].
Pairwise comparison-based non-numerical solutions were proposed and discussed in [11,13,14,15]. The model presented in this article stems from [11] and [13] and can be applied to improve the quality of decision making in eHealth systems [4,18].
The model presented below uses no numbers and is entirely based on the concept of partial orders. Non-numerical rankings should be weak orders, and sometimes total orders, but the initial empirical data may not even be a partial order; in general, they are just arbitrary relations. This raises two questions: what is the 'best' partial order approximation of an arbitrary relation, and what is the 'best' weak order approximation of an arbitrary partial order? The latter problem is discussed in detail in [8,9], while four different solutions to the former were proposed and analysed in [12,13]. In this article, we will use the approximation denoted (R^+)^• (calculate the transitive closure and then remove all cycles), which was first proposed by Schröder in 1895 [22]. Parts of this article are based on Yun Zhai's PhD thesis [24].

Relations and partial orders
In this section, we recall some well-known concepts and results that will be used in the following sections (cf. [8,20]).
Let X be a finite set, fixed for the rest of this article. For every relation A partial order is We will call R^• the acyclic refinement of R.
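The approximation (R^+)^• used in this article can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation from [24]: the function names are hypothetical, relations are represented as sets of pairs, and 'removing all cycles' is assumed to mean dropping every pair (a, b) of the transitive closure for which (b, a) is also present.

```python
from itertools import product


def transitive_closure(rel, universe):
    """Return R^+, the smallest transitive relation containing rel (Warshall's scheme)."""
    closure = set(rel)
    # product() varies its first component slowest, so k is the outermost loop.
    for k, i, j in product(list(universe), repeat=3):
        if (i, k) in closure and (k, j) in closure:
            closure.add((i, j))
    return closure


def acyclic_refinement(rel):
    """Assumed definition of R^•: keep (a, b) only if (b, a) is not in rel."""
    return {(a, b) for (a, b) in rel if (b, a) not in rel}


def partial_order_approximation(rel, universe):
    """Compute (R^+)^•: transitive closure first, then remove all cycles."""
    return acyclic_refinement(transitive_closure(rel, universe))


# Illustrative (made-up) relation with a cycle between b and c.
X = ["a", "b", "c", "d"]
R = {("a", "b"), ("b", "c"), ("c", "b"), ("c", "d")}
approximation = partial_order_approximation(R, X)
```

In this made-up example the cycle between b and c disappears, while the pairs forced by transitivity, such as (a, d), are kept.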
Approximations of partial orders by weak orders are just proper extensions. Various methods were proposed and discussed in [8] and especially in [9]. For our purposes, the best seems to be the method based on the concept of a global score function [8], which is defined as (for every finite set X, |X| denotes its number of elements): Given the global score function g_<(x), we define the relation <_w ⊆ X × X as We will use this technique in our model.
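The global score technique can be illustrated with a short Python sketch. It rests on two assumptions that should be checked against [8]: the score of x is taken to be the number of elements below x minus the number of elements above x, and a <_w b is taken to hold exactly when the score of a is smaller than the score of b. The function names and the sample order are hypothetical.

```python
def global_score(order, universe):
    """Assumed score: g(x) = |{z : z < x}| - |{z : x < z}| for a strict partial order."""
    return {
        x: sum((z, x) in order for z in universe) - sum((x, z) in order for z in universe)
        for x in universe
    }


def weak_extension(order, universe):
    """Assumed rule: a <_w b if and only if g(a) < g(b)."""
    g = global_score(order, universe)
    return {(a, b) for a in universe for b in universe if g[a] < g[b]}


# Illustrative (made-up) partial order: b and c are incomparable.
X = ["a", "b", "c", "d"]
order = {("a", "b"), ("a", "c"), ("a", "d"), ("b", "d"), ("c", "d")}
scores = global_score(order, X)
extended = weak_extension(order, X)
```

Under these assumptions every pair of the original partial order survives in the extension, and elements with equal scores (here b and c) end up in the same indifference class.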

Consistency-driven non-numerical ranking: the model
A pairwise comparisons ranking data [13] is a tuple R = (X, R_0, R_1, ..., R_k), where X is the set of objects to be ranked, k ≥ 1, and the R_i's are relations satisfying X × X and R_i ∩ R_j = ∅ unless i = j. The relation R_0, interpreted as indifference, is symmetric and reflexive, while the relations R_1, ..., R_k, interpreted as preferences, are asymmetric and irreflexive.
The relations (R_0, R_1, ..., R_k) are based on empirical data or judgements, so no other specific properties are expected.
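The stated requirements on ranking data can be checked mechanically. The following Python sketch is a hypothetical helper, not part of the model itself: it represents each relation as a set of pairs and tests only the properties named above (k ≥ 1, R_0 symmetric and reflexive, preferences asymmetric and therefore irreflexive, all relations pairwise disjoint).

```python
def is_ranking_data(universe, indifference, preferences):
    """Check the properties required of (X, R_0, R_1, ..., R_k) as sets of pairs."""
    relations = [indifference, *preferences]
    has_preference = len(preferences) >= 1  # k >= 1
    reflexive = all((x, x) in indifference for x in universe)
    symmetric = all((b, a) in indifference for (a, b) in indifference)
    # Asymmetry implies irreflexivity: (a, a) would be its own reverse.
    asymmetric = all((b, a) not in p for p in preferences for (a, b) in p)
    disjoint = all(
        relations[i].isdisjoint(relations[j])
        for i in range(len(relations))
        for j in range(i + 1, len(relations))
    )
    return has_preference and reflexive and symmetric and asymmetric and disjoint


# Illustrative (made-up) data on three objects.
X = {"a", "b", "c"}
R0 = {("a", "a"), ("b", "b"), ("c", "c"), ("a", "b"), ("b", "a")}
R1 = {("a", "c")}
R2 = {("b", "c")}
valid = is_ranking_data(X, R0, [R1, R2])
```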
A tuple R = (X, R_0, R_1, ..., R_k) is called a pairwise comparison consistent ranking system when some additional consistency properties are satisfied, and is called We will devote the rest of this section to this problem. Quite often we will use the same symbol to denote both R_i and R_i; our algorithm presented later will take R and produce R.
In [11], the case R = (X, ≈, <, ⊂, <, ≺) was proposed, together with some (incomplete) axioms, with the following interpretation: a ≈ b: a and b are indifferent; a < b: slightly in favour of b; a ⊂ b: in favour of b; a < b: b is strongly better; a ≺ b: b is extremely better. For all practical applications, the list <, ⊂, <, ≺ may be shorter or longer, but not empty and not much longer (due to limitations of the human mind [2,19]).
In this article, we consider only R = (X, ≈, <, ⊂, <, ≺), leaving the generalizations and special cases to the reader (see [13] for more comments on this subject).
Let X be a finite set of objects to be 'ranked', and let ≈, <, ⊂, < and ≺ be a family of disjoint relations on X such that We define the relations <, ⊂, < and ≺ as follows: The relations <, ⊂, < and ≺ are interpreted as combined preferences, i.e. a < b: at least slightly in favour of b; a ⊂ b: at least in favour of b; a < b: at least strongly in favour of b; and a ≺ b: b is at least far superior to a.

DEFINITION 3.1
A tuple R = (X, ≈, <, ⊂, <, ≺) is a pairwise comparison ranking system if it is a pairwise comparison ranking data and the relations <, ⊂, <, ≺ are partial orders.
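The 'at least' reading of the combined preferences suggests a simple construction, sketched below in Python. It is an assumption, not a definition taken from this article: each combined preference is taken to be the union of the corresponding basic preference with all stronger ones. The function names and sample pairs are hypothetical, and the partial order test checks irreflexivity and transitivity only.

```python
def combined_preferences(preferences):
    """Given basic preferences ordered weakest to strongest, return cumulative unions.

    combined[i] is assumed to be preferences[i] united with every stronger preference.
    """
    combined = []
    stronger = set()
    for rel in reversed(preferences):
        stronger = stronger | rel  # builds a new set, earlier results stay untouched
        combined.append(stronger)
    combined.reverse()
    return combined


def is_strict_partial_order(rel):
    """Irreflexive and transitive relation given as a set of pairs."""
    irreflexive = all(a != b for (a, b) in rel)
    transitive = all(
        (a, d) in rel for (a, b) in rel for (c, d) in rel if b == c
    )
    return irreflexive and transitive


# Illustrative (made-up) basic preferences, weakest first.
slightly, in_favour, strongly, extremely = (
    {("a", "b")}, {("b", "c")}, {("a", "c")}, {("a", "d")}
)
combined = combined_preferences([slightly, in_favour, strongly, extremely])
all_orders = all(is_strict_partial_order(rel) for rel in combined)
```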

DEFINITION 3.2
The tuple R = (X, ≈, <, ⊂, <, ≺) is a pairwise comparison consistent ranking system if it is a pairwise comparison ranking data and the following rules (called consistency rules) are also satisfied:
(1) and (4) From step 3 of the algorithm, we always 'increase the disorder'. In the worst case, we may get that ≈ = X × X, but the procedure always stops.
(2) We need to show that step 5 does not introduce new rule violations. But since <_w is an extension of <, i.e. a < b ⟹ a <_w b, only ≈ and < may be changed. Hence no rule is violated, which can be verified by inspection.
(3) This is, in principle, an analysis of triples (step 1 of the algorithm), so we have O(n^3). Because of step 3 of the algorithm, each triple can violate each rule only once and the number of rules to violate is finite, so it remains O(n^3). Calculating the global score and the other operations from step 5 are O(n^2), so the algorithm is O(n^3).
Proposition 3.3 might suggest that we do not need Algorithm 1 at all, but this suggestion is wrong. On the contrary, we recommend the following procedure for deriving a pairwise comparison consistent ranking system from a given set of raw pairwise comparison ranking data.
Procedure 1 (1) Apply Algorithm 1 but without insisting on weak ordering of the relation <.
Procedure 1 is clearly O(n^3). The reasons for not using Algorithm 2 on raw pairwise comparison ranking data are the following. Algorithm 1 employs procedures for partial order approximations of arbitrary relations (see [13]). Those procedures both extend and prune the initial relations, but they do not always lower preferences, and may also increase them. Algorithm 2 always lowers preferences when there is any inconsistency. While Algorithm 1 produces partial order approximations (in the sense of [13]), Algorithm 2 alone does not. If the initial data are 'almost' a ranking system or 'almost' a consistent ranking system, Algorithm 2 alone may produce the same result as Procedure 1. However, if the initial data are 'far away' from a ranking system, Algorithm 2 alone will produce a huge ≈ and few higher preferences, and the result may be of no practical value. Using Procedure 1, where raw data are first made into partial orders and later made consistent, should give better results. Our tests discussed in the next section support this argument.

Testing
How can we test the results of the algorithms presented in Sections 3 and 4? How do we know if they produce results that make any sense? How can we compare them with algorithms constructed using numerical ranking paradigms?
Testing means that there are some data and results that are known to be correct, and then the technique is applied to the same data. The differences between the correct results and those obtained by a given technique are used to judge the value of the technique. Hence testing models such as the one presented above is problematic, since it is not obvious what should be tested against. What are the correct results for given data? If the object has measurable attributes and there is a precise algorithm to calculate the value, the whole problem disappears. Nevertheless, we think we have designed a proper test (suggested in [17]) for these kinds of ranking techniques.
A blindfolded person compared the weights of stones. The person put one stone in his left hand and another in his right hand, and then decided which of the relations ≈, <, ⊂, < or ≺ (interpreted as described in the previous section) held. The experiment was repeated for the same set of stones by various people; then again for different stones and different numbers of stones; and again for various subsets of {≈, <, ⊂, <, ≺}. Experiments of this kind were most likely carried out by prehistoric man. Our ancestors probably used this technique to decide which stone was better for killing an enemy or an animal.
TABLE 1. The initial ranking data and its numbers of consistency violations. Left: the initial ranking data, not a consistent ranking. Right: numbers of violations for the initial ranking data; the grey cells indicate the maximum number of violations.
In this experiment, the stones can be weighed using a precise scale, so we have precise results to test against.
Both the results of those experiments and their analysis, including the comparison with numerical ranking techniques, can be found in [24]. They support all the claims made in this article. In general, inconsistencies occur quite often, but after using Procedure 1 we always obtained a ranking that did not contradict the real weights of the stones.
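The test criterion, a ranking that does not contradict the measured weights, can be expressed as a small check. The Python sketch below is one possible reading of 'does not contradict' and is not the test harness from [24]: a pair (a, b) of a strict preference is counted as a contradiction when a is not actually lighter than b. The function name, stone labels and weights are hypothetical.

```python
def contradictions(order, weights):
    """Return the pairs (a, b) ranked 'a below b' although a is not lighter than b."""
    return {(a, b) for (a, b) in order if weights[a] >= weights[b]}


# Hypothetical weights in grams and a hypothetical strict ranking.
weights = {"s1": 100.0, "s2": 250.0, "s3": 250.0}
ranking = {("s1", "s2"), ("s1", "s3")}
agrees_with_scale = not contradictions(ranking, weights)
```

Indifference between s2 and s3 is not tested here; only strict preferences are compared with the scale.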

An example
We will now illustrate Algorithm 2. Examples of applying Algorithm 1 can be found in [13].
The following experiment was conducted. A blindfolded person compared the weights of eight different stones, named A, B, C, D, E, F, G, H. Table 1 (left) presents the results of one such experiment [11,24]. The raw ranking data (X, ≈, <, ⊂, <, ≺), where X = {A, B, C, D, E, F, G, H}, described by Table 1 (left) is not a consistent ranking, and its violation numbers are presented in Table 1 (right). From Table 1 (right), we can see that the pair (C, H) and its symmetric counterpart (H, C) have five violations (grey cells in Table 1 (right)), and all other pairs have fewer. In Table 1 (left), we have C < H; however, according to the consistency rules, the relationship (C, H) violates the following rules: by rule 16.3) similarly for a symmetric pair H < C. The total number of violations is 5. Algorithm 2 revises the pairs (C, H) and (H, C) to ≈, and the new relationship is presented in Table 2 (left).
TABLE 2. The relationship after revising pairs (C, H) and (H, C) to ≈. Left: after the first revision; the grey cells were revised. Right: numbers of consistency rules violations; the grey cells have maximum inconsistencies.
This is still not a consistent ranking, as Table 2 (right) indicates. From Table 2 (right), we can see that the pair (D, E) and its symmetric counterpart (E, D) have the maximum number of violations (grey cells in Table 2 (right)). In this case we have D ≻ E, and the pair (D, E) violates the following rules: similarly for a symmetric pair E ≺ D. Here the total number of violations is 2. Algorithm 2 revises the pair (D, E) to > and (E, D) to <. The results are presented in Table 3 (left), and this is a consistent ranking. Its numbers of consistency violations are given in Table 3 (right). From it, we can see that no pair violates the consistency rules.
TABLE 3. The case after revising (D, E) to > and (E, D) to <. This is a consistent ranking system. Left: after the second revision; this is a consistent ranking; the grey cells were revised. Right: number of consistency rules violations (none!).
The relationship presented in Table 3 (left) is now a consistent ranking. However, it is not a weakly ordered ranking. Using the global score function, we obtain the global scores for the partial orders <, ⊂, < and ≺ presented in Table 4. The ranking orders of the weak extensions are described in Table 5. Usually we are only interested in finding a weak extension of < (see step 5 of Algorithm 2); however, in this case we have found weak extensions of all preferences to illustrate a phenomenon discussed in detail in [13], namely that less precise preferences often work as well as, or better than, finer preferences.
TABLE 4. Global scores for partial orders <, ⊂, < and ≺.
The stones were weighed, and their weights created an increasing total order E, H, C, A, F, D, B, G. Note that this is the same order as weak extensions <_w, ⊂_w, <_w. The fact that <_w correctly describes the real ordering is very interesting, since it means that the very rough ranking was sufficient to produce a correct total ordering. This problem is discussed in detail in [13].
TABLE 5. Ranking orders for weak extensions <_w, ⊂_w, <_w and ≺_w (columns: weak extension, ranking order).

Final comment
The concepts of consistent ranking and pairwise comparison ranking data have been defined and analysed in the setting of partial orders. Some algorithms have been presented. No numbers were used whatsoever, which we believe is a fairer and more objective approach. A method of testing has been proposed. The approach presented in this article is an extension of models proposed in [11,13]. Implementation of Algorithms 1 and 2, and much more, can be found in [24].