Towards secure judgments aggregation in AHP

In decision-making methods, it is common to assume that the experts are honest and professional. However, this is not the case when one or more experts in a group decision making framework, such as the group analytic hierarchy process (GAHP), try to manipulate results in their favor. The aim of this paper is to introduce two heuristics in the GAHP setting, allowing one to detect the manipulators and minimize their effect on the group consensus by diminishing their weights. The first heuristic is based on the assumption that manipulators provide judgments which can be considered outliers with respect to those of the rest of the experts in the group. The second heuristic assumes that dishonest judgments are less consistent than the average consistency of the group. Both approaches are illustrated with numerical examples and simulations.


Introduction
Group decision making refers to situations where the problem of selecting the best alternative (option, solution, etc.) is handled collectively by a set of individuals, preferably experts in the field. Usually, such situations involve complex problems too difficult to be handled by an individual, or they concern actions that the law requires to be taken collectively. These situations include government meetings, various board negotiations, policy making, business dealings, complex laboratory experiments, elections, jury trials, reaching consensus in social networks, and many others. The fundamentals of group decision making can be found e.g. in [6,11,18,26,28,37,38,46].
One of the problems associated with group decision making (GDM) is that it is susceptible to manipulation if one or more experts try to influence the group outcome in their favor by, for example, providing dishonest judgments. Manipulation, especially in the political context, can be traced back at least to the ancient societies of Greece and Rome, see e.g. [7]. More recently, a description of political manipulation in a GDM setting can be found, for example, in the work of Maoz [40], who investigated examples of U.S. and Israeli foreign policy choices under crisis conditions, or Hoyt [25], who examined the American decision process during the Iranian revolution. Analyses of manipulation in selected voting methods can be found e.g. in [4,19,20,39,49,51]. Further on, Faliszewski et al. [15,14] proposed different approaches to manipulation protection in the context of elections.
Another area of group decision making vulnerable to manipulation is social networks. An approach to preventing weight manipulation by the minimum adjustment and maximum entropy method in social network group decision making can be found in [50]. Similarly, Wu et al. [52] introduced a novel framework to prevent manipulative behavior in the consensus reaching process under social network group decision making. They considered two means of manipulation: individual manipulation, where each expert manipulates his/her own behavior to achieve higher importance (weight); and group manipulation, where a group of experts forces inconsistent experts to adopt specific recommendation advice. They then investigated models to counteract both kinds of manipulation. Manipulation in multiple-criteria group decision making has attracted the attention of several recent studies. Dong et al. [12] presented a new strategic manipulation called trust relationship manipulation and discussed clique-based strategies to manipulate trust relationships in order to obtain the desired ranking of the alternatives. Hnatiienko [24] studied the problem of manipulating the choice of decision options in peer review and proposed a classification of selection manipulation problems in experts' evaluation. Lev and Lewenberg [36] investigated cases when agents may wish to redraw the organizational chart of a company, or markets (which is called 'reverse gerrymandering'), to maximize their influence across the company's sub-units, or to allocate resources to the desired areas. Yager [53,54] studied methods of strategic manipulation of preferential data. He proposed modifying the preference aggregation function in such a way that the attempts of individual agents to manipulate the data are penalized. Dong et al.
[10] defined the concept of the ranking range of an alternative in multiple attribute decision making and proposed a series of mixed binary linear programming models to show the process of designing a strategic attribute weight vector. Moreover, the authors studied the conditions under which a strategic attribute weight can be manipulated, based on the ranking range and the proposed model. Sasaki [48] discussed the issue of strategic manipulation in the context of group decision-making with pairwise comparisons. The author considered group decision-making situations formulated as strategic games, and his theoretical results show that truthful judgments (pairwise comparisons) can be a dominant strategy only in very limited situations.
Apart from the last study, the problem of manipulation in pairwise comparisons methods has not been studied thoroughly as of yet. Therefore, this paper fills the aforementioned gap and focuses on group decision making in the analytic hierarchy process (GAHP) and the problem of a possible manipulation of its outcome. In the GAHP setting, a group of experts provides pairwise comparisons of the alternatives under consideration with the aim of selecting the best alternative, see e.g. Dong and Saaty [9], Ramanathan and Ganesh [42], or Saaty [46]. The aim of the paper is to introduce two heuristics in the GAHP setting allowing one to detect the manipulators and minimize their effect on the group consensus by diminishing their weights. The first heuristic is based on the assumption that manipulators will provide judgments which can be considered outliers with respect to the judgments of the rest of the experts in the group. The second heuristic assumes that dishonest judgments are less consistent than the average consistency of the group. Both approaches are illustrated with numerical examples and simulations.
The paper is composed of five sections, where the Introduction (Sec. 1) and Preliminaries (Sec. 2) aim to introduce the reader to the literature on the subject and recall the necessary concepts and definitions of the quantitative and qualitative pairwise comparisons method. The next section (Sec. 3), Inconsistency Driven Pairwise Ranking Aggregation, identifies the problem of ranking manipulation and introduces the proposed robust methods for aggregating results coming from various experts. The last but one (Sec. 4) contains two Monte Carlo experiments allowing us to assess the effectiveness of the proposed methods. The paper ends with a short summary of the achieved results (Sec. 5).

Pairwise comparisons
Comparing alternatives in pairs underlies many decision-making methods including AHP, BWM, HRE, MACBETH and others [45,43,30,3]. In these methods, the results of the comparisons constitute decision-making data that are subject to further processing. Let A = {a_1, ..., a_n} be a finite set of alternatives (available options that each expert can choose) and E = {e_1, ..., e_k} be the set of experts involved in the decision-making process. Similarly, let C_q = {c_ijq ∈ R+ : i, j = 1, ..., n} be the set of pairwise judgments provided by the q-th expert, so that c_ijq is the relative importance of a_i with respect to a_j according to the opinion of e_q. It is convenient to represent the set of judgments as a pairwise comparisons (PC) matrix C_q = (c_ijq). For the sake of readability, however, we will omit the additional index q wherever it is not necessary, i.e. when the expert's number is irrelevant. In such a case the PC matrix takes the form C = (c_ij). PC matrix entries can be interpreted as ratios of individual priorities. Thus, when for some PC matrix C it holds that c_ij = x, we mean that our expert decided that a_i is x times more important than a_j. For the same reason c_ij = 1 means that both compared alternatives are equally preferred. The diagonal of C contains the results of comparisons of alternatives with themselves, i.e. it is filled with 1's. Similarly, in most cases we may expect that c_ij = 1/c_ji. This allows us to formally define this property.

Definition 1.
A PC matrix C = (c_ij) is said to be reciprocal if for every i, j it holds that c_ij = 1/c_ji.

The purpose of decision-making methods is to prepare recommendations. A recommendation usually takes the form of a numerical ranking that assigns real values to the alternatives.

Definition 2. Let A be a set of alternatives. The numerical ranking function for A is a mapping w : A → R+ assigning a real and positive number to each alternative.
The numerical ranking takes the form of a weight (priority) vector w = [w(a_1), ..., w(a_n)]^T. In the literature we may find more than a dozen methods allowing us to determine the priority vector [32,41]. The most popular ones are EVM (Eigenvalue Method) and GMM (Geometric Mean Method) [45,8]. According to the first of these methods, the ranking vector is calculated as the normalized principal eigenvector of C, i.e. the solution of

Cw = λ_max w,

where λ_max is the principal eigenvalue of C. The entries of the priority vector w_ev = [w_ev(a_1), ..., w_ev(a_n)]^T are then given as w_ev(a_i) = w(a_i) / (w(a_1) + ... + w(a_n)).
In the case of the GMM method, although the assumptions of the procedure are similar [33], the calculations are simpler. In this approach the entries of the priority vector w_gm = [w_gm(a_1), ..., w_gm(a_n)]^T have the form

w_gm(a_i) = (c_i1 · c_i2 · ... · c_in)^(1/n),

usually normalized so that they sum up to one. Thus, the individual priorities of alternatives are just geometric means of the rows of a PC matrix. Both of the above methods have their incomplete versions [22,31], i.e. procedures that allow one to calculate the priority vector even if not all entries of C are known.
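Both priority-derivation methods can be sketched in a few lines of Python. This is a minimal illustrative implementation (the power-iteration routine for EVM is our own choice of technique; any eigensolver would do):

```python
import math

def gmm_priorities(C):
    """Geometric Mean Method: the priority of a_i is the geometric mean
    of row i of the PC matrix, normalized to sum to 1."""
    n = len(C)
    g = [math.prod(row) ** (1.0 / n) for row in C]
    s = sum(g)
    return [x / s for x in g]

def evm_priorities(C, iters=200):
    """Eigenvalue Method: the ranking is the normalized principal right
    eigenvector of C, estimated here by power iteration."""
    n = len(C)
    w = [1.0 / n] * n
    for _ in range(iters):
        w = [sum(C[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(w)
        w = [x / s for x in w]
    return w

# A consistent 3x3 PC matrix induced by the weights (0.6, 0.3, 0.1):
C = [[1, 2, 6], [1/2, 1, 3], [1/6, 1/3, 1]]
print([round(x, 3) for x in gmm_priorities(C)])  # [0.6, 0.3, 0.1]
print([round(x, 3) for x in evm_priorities(C)])  # [0.6, 0.3, 0.1]
```

For a consistent matrix both methods recover the inducing weights, which is a convenient sanity check.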

Inconsistency
Comparing alternatives pairwise is easier than comparing more alternatives at the same time. However, if one makes comparisons independently, it may lead (and usually does) to inconsistency.

Definition 3. A PC matrix C is consistent [32] if for every i, j, k = 1, ..., n it holds that c_ik = c_ij · c_jk.

It is fairly easy to prove that this is equivalent to the existence of a positive vector w such that c_ij = w(a_i)/w(a_j) for every i, j = 1, ..., n. In real applications inconsistent PC matrices appear naturally. Nonetheless, the level of inconsistency should not be too high, as it may affect the sensitivity of the data [34] or be a reason for questioning the competence of the experts, and thus for considering their judgments unreliable. Hence, plenty of inconsistency indicators have been defined in the literature. One of the most popular is the Consistency Index introduced by Saaty [45]:

Definition 4. The Consistency Index of an n × n PC matrix C = (c_ij) is given by

CI(C) = (λ_max − n) / (n − 1),

where λ_max is the principal right eigenvalue of C (i.e. the maximum one according to the absolute value).
Another interesting inconsistency indicator has been proposed by Koczkodaj [29].

Definition 5. The Koczkodaj inconsistency index of a PC matrix C = (c_ij) is given by

K(C) = max over i < j < k of min { |1 − c_ik/(c_ij · c_jk)| , |1 − (c_ij · c_jk)/c_ik| }.
The difference between the two indices is that the latter is not related to any priority deriving method, while the former contains a reference to the principal eigenvector of C. Both indices have their versions for incomplete matrices [35]. Very often, K(C) is considered a local inconsistency indicator, while CI(C) is called a global consistency index [32].
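Both indices are straightforward to compute. The sketch below is our own illustrative Python: λ_max is estimated by power iteration, and the triad-based Koczkodaj formula is scanned over all i < j < k:

```python
from itertools import combinations

def principal_eigenvalue(C, iters=200):
    """Estimate lambda_max of a positive matrix by power iteration."""
    n = len(C)
    w = [1.0 / n] * n
    for _ in range(iters):
        v = [sum(C[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(v)
        w = [x / s for x in v]
    v = [sum(C[i][j] * w[j] for j in range(n)) for i in range(n)]
    return sum(v) / sum(w)

def saaty_ci(C):
    """Saaty's Consistency Index: CI = (lambda_max - n) / (n - 1)."""
    n = len(C)
    return (principal_eigenvalue(C) - n) / (n - 1)

def koczkodaj_k(C):
    """Koczkodaj's index: the worst relative triad error over all i < j < k."""
    worst = 0.0
    for i, j, k in combinations(range(len(C)), 3):
        ratio = C[i][k] / (C[i][j] * C[j][k])
        worst = max(worst, min(abs(1 - ratio), abs(1 - 1 / ratio)))
    return worst

C = [[1, 2, 6], [1/2, 1, 3], [1/6, 1/3, 1]]  # consistent, so both indices vanish
print(abs(saaty_ci(C)) < 1e-9, koczkodaj_k(C) < 1e-9)  # True True
```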
Ranking vectors can be compared in many ways. Depending on whether the comparison is quantitative or qualitative, an appropriate metric is used. A convenient way to compare two ordinal ranking vectors is the Kendall Tau distance [27,34].

Definition 6. The Kendall Tau distance K_d(u, v) for two ordinal ranking vectors u and v is the number of pairwise swaps that distinguish the two vectors. For example, if u = (1, 2, 3) and v = (2, 1, 3), then, as only one swap between 1 and 2 is needed to get vector v from u, in this case K_d(u, v) = 1. Due to the similarity of this idea to the so-called bubble sort algorithm, this value is sometimes called the bubble sort distance.
It is easy to see that for the two most distant vectors u and v, where |u| = |v| = n, the value K_d(u, v) = n(n − 1)/2 (in such a case, u contains the elements of v in reverse order). Thus, the normalized Kendall distance of two ordinal vectors takes its final form

K_rd(u, v) = 2 K_d(u, v) / (n(n − 1)),   (2)

where for the most distant vectors u, v the value K_rd(u, v) = 1. It is worth noting that K_rd does not depend on the number of alternatives in the ranking.
For quantitative rankings, any measure of vector distance can be used. For the purpose of this article we use the Manhattan distance; however, one can also find the Chebyshev distance in the literature [23].

Definition 7. The Manhattan distance between two cardinal ranking vectors u and v is defined as

M_d(u, v) = |u_1 − v_1| + ... + |u_n − v_n|.

Provided that the entries of both u and v sum up to 1, the result satisfies M_d(u, v) ≤ 2.
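Both distances can be coded directly (illustrative Python; function names are ours):

```python
from itertools import combinations

def kendall_tau(u, v):
    """K_d: number of discordant pairs between two ordinal rankings."""
    pairs = combinations(range(len(u)), 2)
    return sum(1 for i, j in pairs if (u[i] - u[j]) * (v[i] - v[j]) < 0)

def kendall_tau_normalized(u, v):
    """K_rd = 2 * K_d / (n * (n - 1)); equals 1 for reversed rankings."""
    n = len(u)
    return 2 * kendall_tau(u, v) / (n * (n - 1))

def manhattan(u, v):
    """M_d: Manhattan distance between two cardinal ranking vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

print(kendall_tau((1, 2, 3), (2, 1, 3)))                # 1  (a single swap)
print(kendall_tau_normalized((1, 2, 3), (3, 2, 1)))     # 1.0  (reversed order)
print(manhattan([0.5, 0.25, 0.25], [0.25, 0.5, 0.25]))  # 0.5
```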

Group Decision Making
In a situation where many experts work on a recommendation and each of them presents their own PC matrix, these data must be aggregated. Typically, weighted arithmetic or geometric means are used for aggregation, although there are strong axiomatic arguments for using the geometric mean for this purpose [1]. Hence, in our further considerations, we will focus on this very method.
We can aggregate either the entire PC matrices or the priority vectors resulting from these matrices. The first approach is called AIJ (Aggregation of Individual Judgments), while the second is AIP (Aggregation of Individual Priorities) [16]. Let us consider a group of experts E = {e_1, ..., e_k} whose task is to compare a set A = {a_1, ..., a_n} of alternatives pairwise. Each of them provides a PC matrix C_q = (c_ijq) containing their personal judgments on the elements of A. In the AIJ approach we first create the aggregated matrix C = (c_ij) with entries

c_ij = c_ij1^r_1 · c_ij2^r_2 · ... · c_ijk^r_k,

where r_1, ..., r_k ∈ [0, 1] and r_1 + ... + r_k = 1. Then, adopting C as input, we calculate the final priority vector using the method we prefer. The values r_1, ..., r_k are the priorities assigned to individual experts. They correspond to the strength of the influence of individual experts' opinions on the final result. In a situation where the opinion of each expert counts the same (this is most often the case), r_q = 1/k for q = 1, ..., k.
In the AIP approach, we first calculate a priority vector w_q = [w_q(a_1), ..., w_q(a_n)]^T for each C_q. Then, we aggregate the vectors so that the resulting ranking is given as

w(a_i) = (w_1(a_i)^r_1 · ... · w_k(a_i)^r_k) / Σ_j (w_1(a_j)^r_1 · ... · w_k(a_j)^r_k).   (4)
Similarly as before, the higher the value of r_q, the stronger the impact of the q-th expert on the final recommendation.
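The two aggregation schemes can be sketched as follows (illustrative Python; the weighted geometric mean is applied entry-wise in AIJ and alternative-wise in AIP):

```python
import math

def aij(matrices, r):
    """AIJ: entry-wise weighted geometric mean of the experts' PC matrices."""
    n = len(matrices[0])
    return [[math.prod(C[i][j] ** rq for C, rq in zip(matrices, r))
             for j in range(n)] for i in range(n)]

def aip(vectors, r):
    """AIP: weighted geometric mean of the experts' priority vectors,
    renormalized so the result sums to one."""
    n = len(vectors[0])
    w = [math.prod(v[i] ** rq for v, rq in zip(vectors, r)) for i in range(n)]
    s = sum(w)
    return [x / s for x in w]

# Two equally weighted experts with mirror-image preferences cancel out:
print(aip([[0.6, 0.4], [0.4, 0.6]], [0.5, 0.5]))  # [0.5, 0.5]
```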

Problem statement
In decision-making methods, the common assumptions are the honesty and professionalism of the experts. According to the first of these assumptions, each of the experts will try to express opinions that are as close to the actual state as possible. In other words, they will not express an opinion that is contrary to their own knowledge and inner conviction. Their assumed professionalism, on the other hand, allows us to believe that the assessments made by the experts will be reliable and based on a possibly objective comparison of the various options under consideration. Both of these assumptions allow us to hope that the judgments of different but honest and professional experts should roughly coincide. Therefore, outliers are likely either dishonest or unprofessional. In either case, there is a good reason to reduce the impact of such opinions on the final result.
Describing the facts and fiction about AHP [17, p. 22], Forman notes that "It is possible to be perfectly consistent but consistently wrong". Similarly, to paraphrase Forman, one might say that it is possible to be perfectly consistent and completely dishonest. Nevertheless, in practice, when there are many alternatives and little time to decide, the questions are asked in a random order, and the expert does not have the opportunity to learn about the set of alternatives beforehand, the chance of giving consistent but insincere answers does not seem very high. Hence, we can expect that an insincere expert will give less consistent answers than the average. This leads to the formulation of a second possible heuristic that may point to dishonest or incompetent experts. Too high an inconsistency in an expert's responses may be a good reason for reducing their impact on the final ranking values.
The above observations allow us to propose two procedures for prioritizing experts, so that potentially dishonest experts receive a lower priority than others. The first procedure is based on the deviation from the average (i.e. it detects outliers in terms of the preferences presented) (Sec. 3.2). The second takes into account the inconsistency of an expert relative to a certain average inconsistency (Sec. 3.3). We also consider a combination of both of the above heuristics (Sec. 3.4).

Example
One of the common variants of manipulation in social choice theory is control [4]. It is usually carried out by the election organizer. Paradigmatic examples of control are adding or deleting voters. A similar effect can be seen with the pairwise comparison method. Hence, by adding or removing experts, one can try to affect the ranking results. Let us consider an example where the opinions of six experts e_1, ..., e_6 were taken into account in order to develop recommendations on the four alternatives considered. The experts' opinions were in the form of 4 × 4 matrices, and the normalized weight vectors obtained by GMM gave a_1 the highest priority. However, a dishonest organizer (process facilitator) added two more experts, e_7 and e_8, who lobby for a_2. As they know that a_1 is its main competitor, they proposed opinions (matrices C_7 and C_8) strongly favoring a_2 over a_1. Then, applying the standard aggregation process [16,21], the final ranking determines the order of alternatives a_2, a_1, a_3, a_4, which is in line with the expectations of experts e_7 and e_8.

Preferential distance-driven expert prioritization
Let w_i = [w_i(a_1), ..., w_i(a_n)]^T be the ranking vector calculated using either EVM or GMM (Sec. 2.1) based on the matrix C_i provided by the expert e_i for i = 1, ..., k. Similarly, let w = [w(a_1), ..., w(a_n)]^T be the ranking vector calculated using AIP (Sec. 2.3) based on w_1, ..., w_k. According to the adopted heuristic, the more the opinion of the i-th expert differs from that of the team of experts, the greater the risk of manipulation. Thus, let

d_i = M_d(w, w_i)   (5)

be the quantitative distance between the vectors w and w_i. Then we need to map the individual distances d_i (5) to priority values over a certain numerical scale. For this purpose, let us denote the minimum and maximum of the values D = {d_1, ..., d_k} as d_min and d_max, correspondingly. As d_min corresponds to the most preferred expert and d_max corresponds to the least preferred expert, we assign the value h ∈ R+ to d_min and l ∈ R+ to d_max, where of course h > l. The ratio h/l should correspond to the comparison of the reliability of the expert corresponding to d_min with the reliability of the expert corresponding to d_max.
Values l and h form the scale on which all the distances d 1 , . . . , d k will be transformed.
Let f : R+ → R be a mapping transforming the distances d_i to priorities, which after normalization can be used in the weighted AIP procedure. The function f should pass through the two points X = (d_min, h) and Y = (d_max, l), so that the highest priority value h is assigned to the expert whose opinion was closest to the mean, and the lowest priority value l is assigned to the expert whose opinion was the farthest from the mean.
As the mapping f, let us use a linear function passing through the two points X = (x_1, x_2) and Y = (y_1, y_2), in the form

f(x) = ((y_2 − x_2)/(y_1 − x_1)) · (x − x_1) + x_2.   (6)

This allows us to calculate the values f(d_1), ..., f(d_k), which determine the priorities of experts e_1, ..., e_k. In order to satisfy the form of the weighted geometric mean, one needs to rescale the priority values so that they sum up to one. Hence, the final experts' priorities take the form:

r_i = f(d_i) / (f(d_1) + ... + f(d_k)).   (7)

After computing r_1, ..., r_k we calculate the final priority vector w = [w(a_1), ..., w(a_n)]^T using the weighted version of AIP (4). For the purpose of the example (end of Section 3.1), let us assume that the expert with the lowest value d_min is strongly more credible than the expert with the highest value d_max. Following the fundamental scale, the ratio h/l = 5, so we may assign h = 5 and l = 1. As a result we get the linear mapping function f(x) = −15.0607x + 7.0075 determining the expert weights (Fig. 1). Rescaling produces the values that can be used as input to the AIP procedure. After re-aggregating the results, the alternative a_1, with the priority w_1−8(a_1) = 0.327, which is preferred by the majority of the experts, returns to the winner's position.
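Putting the pieces together, the distance-driven weighting can be sketched as follows. This is our illustrative Python, assuming GMM-derived input vectors, the Manhattan distance, and an unweighted geometric mean as the group reference:

```python
import math

def apdd_weights(vectors, h=5.0, l=1.0):
    """Preferential-distance-driven expert priorities (a sketch of Sec. 3.2).

    Experts whose rankings are close to the group mean get weights near h,
    distant experts get weights near l; the result is rescaled to sum to 1.
    """
    k, n = len(vectors), len(vectors[0])
    # unweighted (equal r_q) geometric-mean aggregate of the individual rankings
    mean = [math.prod(v[i] for v in vectors) ** (1.0 / k) for i in range(n)]
    s = sum(mean)
    mean = [x / s for x in mean]
    # Manhattan distance of each expert from the aggregate
    d = [sum(abs(v[i] - mean[i]) for i in range(n)) for v in vectors]
    d_min, d_max = min(d), max(d)
    if d_max == d_min:               # all experts equally distant
        return [1.0 / k] * k
    # linear map through (d_min, h) and (d_max, l), rescaled to sum to one
    f = [h + (l - h) * (di - d_min) / (d_max - d_min) for di in d]
    s = sum(f)
    return [x / s for x in f]

# The third expert is an outlier and receives the lowest weight:
print(apdd_weights([[0.5, 0.5], [0.5, 0.5], [0.9, 0.1]]))
```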

Inconsistency-driven expert prioritization
Instead of the distance between individual judgment vectors and the mean, we may use the distance between the inconsistencies of individual experts and the average inconsistency of their judgments. Thus, let

d_i = I(C_i) − (1/k) · (I(C_1) + ... + I(C_k)),

where I is some selected inconsistency index [5,35]. Contrary to the previous case, in which the distance of the individual ranking from the average value was important, here we have to distinguish whether the inconsistency of a given expert is below or above the average. If it is above average, it may mean either an attempt (perhaps naive) at manipulation or the deficiencies of the expert himself (lack of experience, lack of firmness, distraction, time pressure, etc.). However, if it is below average, i.e. the expert's consistency is higher than the average, this may, to some extent, speak in favor of the expert. On the other hand, a certain degree of inconsistency is considered desirable [47, p. 265] or [44, p. 172]. In other words, both too high and too small a degree of inconsistency may indicate strategic decision-making, although it is quite difficult to "punish" too much consistency.
These two observations lead to the conclusion that the mapping of distances to priorities should differ depending on the sign of d_i. Provided that there exist at least two matrices C_i and C_j such that I(C_i) ≠ I(C_j), there must exist two matrices C_p and C_q such that I(C_p) < (1/k) Σ_{j=1,...,k} I(C_j) < I(C_q). Let the inconsistency values for the most consistent expert e_min and the most inconsistent expert e_max be I_min = min_{j=1,...,k} I(C_j) and I_max = max_{j=1,...,k} I(C_j), respectively. Additionally, let I_mid be the inconsistency value of the expert whose opinions' consistency is closest to the average, i.e. I_mid = I(C_i) such that |I(C_i) − (1/k) Σ_j I(C_j)| is minimal. In the next step, we should set the weights high, middle and low, i.e. h, m and l, corresponding to the values I_min, I_mid and I_max. Thus, we perform three comparisons of the experts' credibility, which results in the following matrix:

[ 1            c_min,mid    c_min,max ]
[ 1/c_min,mid  1            c_mid,max ]   (8)
[ 1/c_min,max  1/c_mid,max  1         ]
It is worth noting that the heuristic "the smaller the inconsistency, the better" results in the constraint according to which c_min,mid, c_mid,max and c_min,max cannot be smaller than 1.
Calculating the ranking based on (8) provides the weights h, m and l assigned to I_min, I_mid and I_max, i.e. the points A = (I_min, h), B = (I_mid, m) and C = (I_max, l). As the mapping f : R+ → R transforming inconsistencies into priorities, let us use a piecewise linear function composed of two segments: A−B and B−C. Similarly as before, f = f_ABC allows us to calculate the values f(d_1), ..., f(d_k), which after appropriate rescaling (7) form the experts' priorities r_1, ..., r_k. Finally, the ranking is calculated taking into account the priorities of the individual experts. In our example, it is easy to see that I_min = 0.0026 (expert e_3) and I_max = 0.0528 (expert e_7). The average inconsistency is 0.02, thus the nearest inconsistency result was achieved by the expert e_2 with I_mid = 0.0152. In the next step we need to compare the credibility of e_3, e_2 and e_7. After rescaling the resulting weights so that they sum up to 1, we obtain the values that can be used as input to the AIP procedure. The alternative a_1 with the priority w_1−8(a_1) = 0.329 has the highest rank, whilst a_2 takes the proper second place. Thanks to the introduction of priorities, it was once again possible to avoid manipulation.
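The inconsistency-driven weighting can be sketched analogously (illustrative Python; the weights h, m and l are assumed to come from the credibility comparisons, and the inconsistency values are assumed to be distinct):

```python
def aid_weights(inconsistencies, h=9.0, m=4.0, l=1.0):
    """Inconsistency-driven expert priorities (a sketch of Sec. 3.3).

    A piecewise linear map through A = (I_min, h), B = (I_mid, m) and
    C = (I_max, l), where I_mid is the inconsistency value closest to the
    group average; assumes I_min < I_mid < I_max.
    """
    k = len(inconsistencies)
    avg = sum(inconsistencies) / k
    i_min, i_max = min(inconsistencies), max(inconsistencies)
    i_mid = min(inconsistencies, key=lambda x: abs(x - avg))

    def f(x):
        if x <= i_mid:  # segment A-B
            return h + (m - h) * (x - i_min) / (i_mid - i_min)
        return m + (l - m) * (x - i_mid) / (i_max - i_mid)  # segment B-C

    w = [f(x) for x in inconsistencies]
    s = sum(w)
    return [x / s for x in w]

# The most consistent expert gets the largest share (proportional to 9:4:1):
print(aid_weights([0.01, 0.02, 0.05]))
```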

Mixed expert prioritization
It is possible to use both of the above heuristics simultaneously. Let r^(1)_i be the weight of the i-th expert calculated on the basis of the distance-from-the-mean heuristic (Sec. 3.2), while r^(2)_i is the weight of the same expert calculated from the difference in inconsistencies (Sec. 3.3). The mixed expert weight is a linear combination of both:

r_i = β · r^(1)_i + (1 − β) · r^(2)_i,

where 0 ≤ β ≤ 1 is a coefficient determining the impact of both heuristics. Since r^(1)_1 + ... + r^(1)_k = 1 and r^(2)_1 + ... + r^(2)_k = 1, also r_1 + ... + r_k = 1. Thus, the obtained weights r_1, ..., r_k fit the definition of the weighted geometric mean.
In the case of the Example 3.1, and assuming that both heuristics contribute equally to the weights of the experts, i.e. β = 0.5, we get the corresponding mixed experts' weights.
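In code, combining the two sets of weights is a one-liner (illustrative):

```python
def mixed_weights(r1, r2, beta=0.5):
    """Mixed expert weights (a sketch of Sec. 3.4): a convex combination of
    the distance-driven weights r1 and the inconsistency-driven weights r2.
    Since both inputs sum to one, so does the output."""
    return [beta * a + (1 - beta) * b for a, b in zip(r1, r2)]

print(mixed_weights([0.6, 0.4], [0.2, 0.8]))  # approximately [0.4, 0.6]
```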

The degree of expert's credibility
In each of the two heuristics described above (Sections 3.2 and 3.3), two or three key experts are first selected for a credibility assessment. Then, based on this result, a mapping is proposed to prioritize all experts. The credibility evaluation must be made in accordance with the adopted heuristic, i.e. experts less consistent in their judgments, or more distant from the average than their competitors, have to get a smaller score. This may bring a certain inconvenience for those assessing the credibility of experts. They may focus on their attitude towards individual experts, and not on the quality of their expertise. As a result, a person who is liked and popular in the community may get a better assessment than a reliable but not sociable expert.
The way to avoid this trap is to provide a procedure that allows us to prioritize the key experts without explicit comparisons. For example, we can assume a priori that the ratio of the best to the worst expert (the first heuristic, Section 3.2) is, e.g., 5 : 1, or that the ratios of the best, average and worst expert (the second heuristic, Section 3.3) are, e.g., 9 : 4 : 1.
It is also possible to determine these relations in a procedural/functional way. For instance, in the case of our example and the second heuristic, we may assume that the expert priority should be linearly correlated with inconsistency. Thus, as d_min = 0.0026, d_mid = 0.0152 and d_max = 0.0528 (Section 3.3), the priorities may take the values h = α · d_max/d_min, m = α · d_mid/d_min and l = 1, where α ≥ 1 is a gain factor.

Data preparation
For the purpose of both experiments (Sections 4.2 and 4.3) we prepared 4,000 sets of 20 matrices, corresponding to different decision scenarios. For this purpose, we first drew 34 priority vectors for 5 alternatives, 33 vectors for 6 alternatives, and another 33 vectors for 7 alternatives. Then, as every priority vector corresponds to exactly one consistent PC matrix [32], we created 100 consistent PC matrices of sizes 5 × 5, 6 × 6 and 7 × 7. Thus, if w = [w(a_1), ..., w(a_5)]^T is a priority vector corresponding to a certain five alternatives, then the consistent matrix corresponding to w is C_w = (w(a_i)/w(a_j)). In the next step, we disturbed the elements of these matrices by multiplying them by a random factor ε ∈ [1/α, α], where α = 1.1, 1.2, ..., 5. Thus, the disturbed version of C_w takes the form C = (ε_ij · w(a_i)/w(a_j)), where ε_ij ∈ [1/α, α] and ε_ij = 1/ε_ji for i, j = 1, ..., 5. For every consistent PC matrix C_wx (where x = 1, ..., 100) and α_k ∈ {1.1, ..., 5} we randomly generated 20 matrices C(q)_wx,αk, where q = 1, ..., 20. Thus, every set S_wx,αk = {C(1)_wx,αk, ..., C(20)_wx,αk} corresponds to a single group decision scenario in which 20 experts make decisions as to the priorities of 5, 6 or 7 alternatives, convergent (to some extent) with the vector w_x. The input to each experiment were 4,000 such sets S_wx,αk, corresponding to different sets of alternatives (100 vectors w_x) and different average inconsistencies of the experts (40 different values of α_k).
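The data-generation step can be sketched as follows (illustrative Python; `random.uniform` draws the disturbance factor and reciprocity is restored explicitly, as in the construction above):

```python
import random

def consistent_matrix(w):
    """The (unique) consistent PC matrix induced by a priority vector:
    c_ij = w_i / w_j."""
    n = len(w)
    return [[w[i] / w[j] for j in range(n)] for i in range(n)]

def disturb(C, alpha, rng=random):
    """Multiply every above-diagonal entry by a random factor drawn from
    [1/alpha, alpha] and mirror it so the matrix stays reciprocal."""
    n = len(C)
    D = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = C[i][j] * rng.uniform(1 / alpha, alpha)
            D[j][i] = 1.0 / D[i][j]
    return D

# one simulated scenario: 20 experts, 4 alternatives, disturbance alpha = 1.5
w = [0.4, 0.3, 0.2, 0.1]
scenario = [disturb(consistent_matrix(w), 1.5) for _ in range(20)]
```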
For every S_wx,αk we determined the average level of inconsistency as the arithmetic mean of the inconsistencies of its components, i.e.

I(S_wx,αk) = (1/q) · ( I(C(1)_wx,αk) + ... + I(C(q)_wx,αk) ),

where I denotes the inconsistency indicator (for the purpose of the experiments we used Saaty's consistency index CI) and q = |S_wx,αk| is the number of experts involved in the decision-making process (for the purposes of the experiments, we adopted the number 20). As a general rule, an increase in α_k causes an increase in I(S_wx,αk).
For the purpose of the experiments we used GMM (Section 2.1) for calculating the priorities. In such a case, it is easy to show that AIP and AIJ (Section 2.3) lead to the same results, so we do not need to consider both aggregation methods separately.

Defense against manipulation

Model of manipulation
In the first experiment, we will assume that a certain number of experts are bribed to submit manipulated matrices. For the purposes of the experiment, we assume that the grafter's goal is to make the originally second alternative the winner of the ranking. For this purpose, he bribes several experts who are most in favor of the original winner (bribing experts who do not support the current winner seems to be a less effective strategy). In exchange for a bribe, the experts undertake to indicate that the comparison of the second best alternative with any other is 9 (the largest value of the fundamental scale), and the comparison of the current leader with any other alternative is 1/9.
Let us see how this somewhat simple group decision-making manipulation scheme works on a simple example. In order to evaluate five alternatives, four different experts prepared four pairwise comparison matrices, for which the winner is the second alternative with the score w(a_2) = 0.417. Therefore, in order to push through the candidature of the vice-leader of the current ranking, i.e. alternative a_5 with the score w(a_5) = 0.233, the grafter bribes a_2's strongest supporter, i.e. expert no. 1. Hence, in fact, the first expert submits a manipulated matrix C_1 in which every comparison of a_5 with another alternative equals 9 and every comparison of a_2 with another alternative equals 1/9.
After aggregating C_1, C_2, C_3 and C_4, it turns out that the final priorities are w = [0.148, 0.183, 0.08, 0.113, 0.31]^T, which means that the manipulation was successful. The new winner is alternative a_5 with the score w(a_5) = 0.31, which without manipulation would have taken the second position. If bribing one expert were not enough, the grafter would bribe the next strongest supporter of a_2, and so on. In the above procedure, we assume that the grafter knows who the winner's strongest supporter is (and whose bribery could potentially be most disadvantageous to the winner and beneficial to the preferred alternative). In practice, the grafter usually does not have such knowledge and must rely on his intuition and knowledge of the experts' preferences. So one can hope that in practice the real grafter will work less efficiently than the one in our experiment.
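The bribed expert's matrix from this model can be generated programmatically. A sketch (illustrative Python; `leader` and `favourite` are the indices of the current winner and the promoted alternative):

```python
def bribe(C, leader, favourite, scale_max=9.0):
    """Manipulation model sketch: the bribed expert reports that the promoted
    alternative beats every other one 9:1, while the current leader loses
    9:1 against every other alternative."""
    n = len(C)
    M = [row[:] for row in C]          # keep the honest judgments elsewhere
    for j in range(n):
        if j != favourite:             # favourite vs anything -> 9
            M[favourite][j] = scale_max
            M[j][favourite] = 1.0 / scale_max
    for j in range(n):
        if j != leader and j != favourite:  # leader vs anything -> 1/9
            M[leader][j] = 1.0 / scale_max
            M[j][leader] = scale_max
    return M
```

In the experiment, this transformation would be applied to the matrices of the bribed experts, starting from the strongest supporter of the original winner.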

Experiment results
The input data for the experiment were 4,000 sets S_wx,αk composed of twenty PC matrices, each set corresponding to a given initial priority vector w_x and a range of the disturbance factor α_k. For each set S_wx,αk we first calculate the aggregated priorities w, and based on them we carry out a simulated manipulation attack in accordance with the method described in Section 4.2.1. We also determine the average inconsistency of the experts' responses I(S_wx,αk). As expected, along with the increase of the α_k coefficient, the average inconsistency also gets higher. In most cases, it is enough to "bribe" one to three experts, i.e. to manipulate from one to three matrices from the entire S_wx,αk set. In each of the analyzed cases, the attack method is effective. This means that in each considered case it is possible to "improve" the experts' answers so as to achieve the intended goal, i.e. to promote the second best alternative in the original ranking to the leader. Let us denote the manipulated set of expert answers as S̃_wx,αk, and the manipulated priority vector for S̃_wx,αk aggregated using the AIP method as w̃_x = [w̃(a_2), w̃(a_{i_1}), ..., w̃(a_{i_{n−1}})]^T, where i_1, ..., i_{n−1} is some permutation of the indices from the set {1, 3, 4, ..., n}.
Of course, according to the assumption of the manipulation (Section 4.2.1), even though w_x(a_1) > w_x(a_2), in the manipulated ranking w̃_x(a_1) < w̃_x(a_2). Then we estimated the priority vector with the help of APDD (Aggregation of Preferential Distance-Driven expert prioritization, Section 3.2), AID (Aggregation of Inconsistency-Driven expert prioritization, Section 3.3) and MX, the mixed method based on a linear combination of the priorities of APDD and AID (Section 3.4). As a result, for each S̃_{w_xα_k} we obtained the vectors w^APDD_x, w^AID_x and w^MX_x, ordered so that w^APDD_x(a_{p_1}) ≥ … ≥ w^APDD_x(a_{p_n}), w^AID_x(a_{q_1}) ≥ … ≥ w^AID_x(a_{q_n}) and w^MX_x(a_{r_1}) ≥ … ≥ w^MX_x(a_{r_n}), where (p_1, …, p_n), (q_1, …, q_n) and (r_1, …, r_n) are some permutations of {1, 2, …, n}. We consider w^APDD_x a WR (winner restoration) case if the order of the first two alternatives has been restored, i.e. p_1 = 1 and p_2 = 2, and an RR (ranking restoration) case if the order of all the alternatives is the same as before the manipulation, i.e. p_1 = 1, p_2 = 2, …, p_n = n. The cases of w^APDD_x with p_1 ≠ 1 or p_2 ≠ 2 are considered a "failure". We denote the results for w^AID_x and w^MX_x similarly. Figure 3 shows how the ratios of WR and RR cases to the number of considered sets S̃_{w_xα_k} change with the increase in the average inconsistency of S̃_{w_xα_k} for the APDD method. In particular, we can observe that for an average inconsistency CI ≤ 0.1 there are 89% of cases (value 0.89 on the plot) in which APDD was able to restore the correct winner. Similarly, there are 86% of cases in which APDD reconstructed the complete ranking.
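The WR/RR classification described above can be sketched as follows: a minimal helper, assuming no ties between priorities, that derives the ranking permutation from a priority vector and checks whether the top two positions (WR) or all positions (RR) match the original order. The numeric vectors are hypothetical.

```python
def restoration_flags(restored, original):
    """Return (WR, RR) flags: WR if the top-two order of `restored`
    matches that of `original`, RR if the whole ranking matches."""
    rank = lambda w: sorted(range(len(w)), key=lambda i: w[i], reverse=True)
    perm, orig_perm = rank(restored), rank(original)
    wr = perm[:2] == orig_perm[:2]   # winner restoration
    rr = perm == orig_perm           # full ranking restoration
    return wr, rr

# Hypothetical vectors: winner restored, but positions 3 and 4 swapped
original = [0.31, 0.25, 0.18, 0.16, 0.10]
restored = [0.30, 0.26, 0.16, 0.18, 0.10]
print(restoration_flags(restored, original))  # (True, False)
```

Counting these flags over all simulated sets and bucketing by average inconsistency yields the WR/RR ratios plotted in Figure 3.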
The results of the AID and MX methods are presented in Figures 4 and 5.
For CI ≤ 0.1, AID restored the winner in 85% of cases and the whole ranking in 83%. The combined MX method reconstructed the winner in 88% of cases and the complete ranking in 86%. In all cases, the number of decision models in which the winner was restored is slightly larger than the number of those in which the entire ranking was restored.
Despite their good efficiency in restoring the order of alternatives, the APDD, AID and MX methods are not able to ensure that the resulting ranking will be exactly the same as before the manipulation. Here, the quality of the obtained result also depends on the average inconsistency of S̃_{w_xα_k}. Figures 6, 7 and 8 show how the restored results differ from the non-manipulated rankings, expressed as the average Manhattan distance M_d(w_x, w*_x), where * denotes the APDD, AID or MX method. It can be seen that with a relatively small average inconsistency of experts (say, around 0.1), the difference between the original and the reconstructed ranking is also reasonably small. In the case of APDD it is M_d(u, v) = 0.0336, for AID M_d(u, v) = 0.047, and for MX M_d(u, v) = 0.0327. Therefore, in a large number of cases, such a result can be considered acceptable.
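The Manhattan distance used above can be computed as in the following small sketch, with hypothetical vectors standing in for an original and a reconstructed priority vector:

```python
def manhattan(u, v):
    """Manhattan (L1) distance between two priority vectors."""
    assert len(u) == len(v)
    return sum(abs(a - b) for a, b in zip(u, v))

# Hypothetical original vs. reconstructed priority vectors
u = [0.310, 0.250, 0.180, 0.160, 0.100]
v = [0.300, 0.262, 0.178, 0.158, 0.102]
print(round(manhattan(u, v), 4))  # 0.028
```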

Vulnerability to original ranking perturbation
In the second experiment, we assumed that all experts acted honestly. Since the proposed methods of aggregating expert opinions are designed to minimize the effects of manipulation, perceived as disturbances, our goal this time is to check to what extent these methods "disturb" the actual ranking when no manipulation has occurred. For this purpose, we took the same dataset as in the previous experiment, but this time we did not manipulate the individual data sets S_{w_xα_k} to modify the original result. As before, w̃_x denotes the aggregated ranking (9) calculated using the standard AIP procedure (see Sec. 2.3). The results aggregated using the modified APDD, AID and MX procedures will be denoted as ŵ_x with the appropriate superscripted acronym. This time, however, both ranking vectors w̃_x and ŵ_x are calculated from the same data set S_{w_xα_k}. Therefore, the distance between them can be understood as an indicator of the disturbance of the original ranking caused by the unnecessary use of the APDD, AID and MX aggregation methods.
As expected (Figs. 9, 10 and 11), the size of the ranking disturbances depends on the degree of inconsistency. Basically, the greater the inconsistency, the greater the difference between the two vectors w̃_x and ŵ_x. Interestingly, the best results are achieved by the mixed method (Fig. 11), which is a combination of the two other strategies (Figs. 9, 10). The advantage of the MX method can be seen not only in the figures. Indeed, the average distances M_d(w̃_x, ŵ_x) between rankings aggregated using the standard and the modified procedures are M_d(w̃_x, ŵ^APDD_x) = 0.017, M_d(w̃_x, ŵ^AID_x) = 0.011 and M_d(w̃_x, ŵ^MX_x) = 0.009. From the above, it is easy to see that, on average, the results of the MX method (value 0.009) are the least distant from those of the unmodified aggregation method.
Figure 10: Distances between non-manipulated rankings calculated in the standard way (w̃_x) and with the help of the AID method.
The disruption of the ranking may be not only quantitative but also qualitative. This means that the modified method may propose a ranking that differs from the "original" in the order of the alternatives. As the standard method for determining the ordinal difference between rankings is the Kendall Tau distance (3), we calculated K_d(w̃_x, ŵ_x) for every S_{w_xα_k} for which I(S_{w_xα_k}) ≤ 0.1. The obtained values (vertical axis) can be interpreted as the expected probability that, assuming a not too high inconsistency of the group of experts (I(S_{w_xα_k}) ≤ 0.1), both rankings will differ by a given number of transpositions (horizontal axis). For example, in the case of the APDD method we can see (Fig. 12) that if the average inconsistency in the group of experts is not too high, i.e. I(S_{w_xα_k}) ≤ 0.1, then there is a 92% chance that both rankings remain identical. Going further, there is a 2.5% chance that they will differ by one transposition, etc.
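The Kendall Tau distance between the rankings induced by two priority vectors can be sketched as counting discordant pairs, which equals the number of adjacent transpositions needed to turn one ranking into the other; the vectors below are hypothetical.

```python
from itertools import combinations

def kendall_tau_distance(w1, w2):
    """Number of discordant alternative pairs between the rankings
    induced by two priority vectors (assuming no ties)."""
    n = len(w1)
    return sum(
        1
        for i, j in combinations(range(n), 2)
        if (w1[i] - w1[j]) * (w2[i] - w2[j]) < 0
    )

a = [0.31, 0.25, 0.18, 0.16, 0.10]
b = [0.25, 0.31, 0.18, 0.16, 0.10]  # first two alternatives swapped
print(kendall_tau_distance(a, b))   # 1
```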
Figure 11: Distances between non-manipulated rankings calculated in the standard way (w̃_x) and with the help of the MX method.
Similarly, for the AID grouping method, the chance that the ranking does not change (i.e. K_d(w̃_x, ŵ^AID_x) = 0) is 94.4% (Fig. 13). A difference of one transposition (i.e. K_d(w̃_x, ŵ^AID_x) = 1) occurred in 1.5% of cases, etc.
Figure 13: Estimated probability that for I(S_{w_xα_k}) ≤ 0.1 the Kendall Tau distance K_d(w̃_x, ŵ^AID_x) will be: 0, 1, …, 12.
Finally, in the case of the mixed method, the chance that the ranking remains unchanged (i.e. K_d(w̃_x, ŵ^MX_x) = 0) is 95.1% (Fig. 14). Analogously, a difference of one transposition (i.e. K_d(w̃_x, ŵ^MX_x) = 1) occurred in 1.4% of cases, three transpositions are needed to transform w̃_x into ŵ^MX_x in 1.2% of cases, and so on.
Figure 14: Estimated probability that for I(S_{w_xα_k}) ≤ 0.1 the Kendall Tau distance K_d(w̃_x, ŵ^MX_x) will be: 0, 1, …, 12.
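The estimated probabilities reported above can be obtained from a batch of simulated Kendall Tau distances as in the following sketch; the sample counts below are hypothetical, not the experimental data.

```python
from collections import Counter

def kd_distribution(distances):
    """Empirical probability of each Kendall Tau distance value
    observed across a batch of simulated decision models."""
    counts = Counter(distances)
    total = len(distances)
    return {d: counts[d] / total for d in sorted(counts)}

# Hypothetical batch of distances from low-inconsistency models
sample = [0] * 95 + [1] * 3 + [2] * 2
print(kd_distribution(sample))  # {0: 0.95, 1: 0.03, 2: 0.02}
```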

Experiments summary and discussion
The conducted experiments used 4,000 sets, each containing 20 matrices simulating the opinions of experts in a group decision-making process. The simulated scenarios contained from 5 to 7 alternatives. The first of the Monte Carlo experiments consisted of simulating a simple manipulation and checking the robustness of the aggregation of the experts' rankings. Three proposed methods were tested: APDD (Aggregation of Preferential Distance-Driven expert prioritization), AID (Aggregation of Inconsistency-Driven expert prioritization) and MX (the mixed method), a combination of the previous two. For all methods, their efficiency depended on the average inconsistency of the matrices in the tested 20-element set: the higher the inconsistency, the lower the effectiveness. In the case of small inconsistencies (close to 0), the effectiveness of the APDD method was close to 95%, i.e. in 95 cases out of a hundred this method was able to mitigate the negative effects of the attack (Fig. 3). The AID method fared slightly worse, with an effectiveness of around 88% (Fig. 4). The mixed approach seems to lie between APDD and AID, with an effectiveness of around 90% (Fig. 5). For an inconsistency around 0.1, the effectiveness of APDD drops to 89% (WR – winner restoration) and 86% (RR – ranking restoration). The results for AID and MX are, respectively, 85% (WR) and 83% (RR), and 88% (WR) and 86% (RR). The quantitative differences between the original (non-manipulated) ranking and the "fixed" ranking were not large (Figs. 6, 7 and 8) and varied from 0.0327 to 0.047. Interestingly, the mixed method performed as well as the first method, APDD.
The purpose of the second experiment was to test the proposed aggregation methods on non-manipulated data. In this case, these methods are unnecessary and superfluous; hence, the differences between the ranking obtained using these methods and the ranking obtained using the classical method can be treated as an unnecessary disturbance. We examined the generated data for quantitative and qualitative differences between the two ranking vectors. We achieved the best quantitative result for the MX method: the average Manhattan distance for MX was 0.009 (12). The AID method fared slightly worse, with a Manhattan distance of 0.011 (11), followed by APDD with 0.017 (10). In all cases, it can be seen that this distance depends on the average inconsistency of the experts and increases along with it.
To examine the qualitative difference between the classic aggregation method and the modified methods, we used Kendall's tau distance measure and the subset of data for which the average inconsistency is not too high (less than 0.1). Under these assumptions, the MX method turned out to be the best again, as in 95.1% of cases the ranking did not change (Fig. 14). The AID method fared slightly worse, since in 94.4% of cases the order remained unchanged. The last position was taken by the APDD method, with 92% of rankings untouched.
The proposed methods are not perfect. The almost 90% effectiveness in eliminating the negative effects of manipulation is paid for with a 5–8% risk of introducing disturbances when no manipulation has occurred. The negative impact in the second case is much less severe when the ranking result is interpreted quantitatively, i.e., when the rank value is more important than the position on the list. Then the chance to eliminate or mitigate the potentially significant changes introduced by manipulation is paid for with relatively small quantitative changes in the non-manipulated ranking. However, is it worth using the modified aggregation methods when the ranking result is ultimately given an ordinal meaning? That depends on the subjective assessment of the people responsible for the decision process. In other words, if the risk of manipulation is not negligible, then it may be worth using the presented aggregation methods. In this article, we proposed three heuristic ranking aggregation methods, the third of which combines the other two. Based on the conducted experiments, the third is the most effective in practice. This observation suggests that adding further heuristics for identifying possible manipulations could improve the results. In particular, effective methods of mitigating the effects of manipulation should be based on several mutually complementary approaches simultaneously.

Summary
The article presents three modified procedures for aggregating expert opinions that can be used in group decision-making with the AHP method. They allow for mitigating (or eliminating) the adverse effects of manipulation with a small risk of distorting the ranking. The first two methods are based on heuristics that make the weight of a given expert dependent on the level of their inconsistency and their distance from the group's average opinion. The third method, perhaps the most promising, combines the other two. Developing more secure and tamper-resistant methods based on pairwise comparisons of alternatives will require further study of attack methods and of defenses against manipulation. The presented solution is a step towards this goal.