An Examination of Ranking Quality for Simulated Pairwise Judgments in relation to Performance of the Selected Consistency Measure

An overview of current debates and contemporary research devoted to modeling decision making processes and their facilitation directs attention to techniques based on pairwise judgments. At the core of these techniques are various judgment consistency measures which, in a sense, control the prioritization process that leads to the establishment of decision makers' unknown preferences. If judgments expressed by decision makers were perfectly consistent (cardinally transitive), all available prioritization techniques would deliver the same solution. However, human judgments are consistently inconsistent, as it were; thus the preference estimation quality varies significantly. The scale of these variations depends, among other things, on the chosen consistency measure of pairwise judgments. That is why it seems important to examine the relations between various consistency measures and the preference estimation quality. This research reveals that there are consistency measures whose performance may mislead decision makers about the quality of their ranking outcome. Thus, it introduces a measure which is directly related to the quality of the preference estimation process. The main problem of the research is studied via Monte Carlo simulations executed in Wolfram Mathematica software. The research results indicate that although the performance of the examined consistency measures deviates from the exemplary one in relation to the estimation quality of decision makers' preferences, the solutions proposed in this paper can significantly improve that quality.


Introduction
Overwhelming scientific evidence indicates that the unaided human brain is simply not capable of simultaneous analysis of many different, competing factors and then synthesizing the results for the purpose of making a rational decision. Indeed, numerous psychological experiments, e.g., [1], including the well-known Miller [2] study put forth the notion that humans are not capable of dealing accurately with more than about seven (±2) things at a time (the human brain is limited in its short term memory capacity, its discrimination ability, and its range of perception).
Humans learn about anything by two means: the first involves examining and studying some phenomenon from the perspective of its various properties and then synthesizing findings and drawing conclusions; the second entails studying some phenomenon in relation to other similar phenomena and relating them by making comparisons. The latter method leads directly to the essence of the matter, i.e., judgments regarding a phenomenon. Judgments can be relative or absolute. An absolute judgment is the relation between a single stimulus and some information held in short or long term memory. A relative judgment, on the other hand [3], is defined as identification of some relation between two stimuli both present to the observer. Certainly humans can make much better relative than absolute judgments. Thus, a pairwise comparisons method was proposed in order to facilitate the process of relative judgments.
Some authors claim [4,5] that this method dates back to the beginning of the 20th century and was first applied by Thurstone [6]; however, its first scientific applications can be found in Fechner [7]. In reality, the method itself is much older and its idea goes back to Ramon Lull, who lived at the end of the 13th century. Its popularity stems from an influential paper of Marquis de Condorcet [8] (see, e.g., [9,10]), who used this method in the election process where voters rank candidates based on their preference. It has been refined in many papers, e.g., [4, 11-20].
Advances in Operations Research
The fundamental objective of this research is to answer the following question: Does the reduction of Pairwise Comparison Matrix (PCM) inconsistency lead to improvement of the priority ratios estimation process quality?

Background
The Analytic Hierarchy Process (AHP) uses the hierarchical structure of the decision problem, pairwise relative comparisons of the elements in the hierarchy, and a series of redundant judgments which enable measurement of judgment consistency. To make a proposed solution possible, i.e., to derive ratio scale priorities on the basis of verbal judgments, a scale is utilized to evaluate the preferences for each pair of items. Probably the best-known scales are Saaty's numerical scale, which comprises the integers from one (equivalent to the verbal judgment "equally preferred") to nine (equivalent to the verbal judgment "extremely preferred") and their reciprocals, and a geometric scale, which usually consists of the numbers computed in accordance with the formula $g(k) = c^{k/2}$ for $k \in \{-8, \ldots, 8\} \wedge k \in \mathbb{Z}$, where $c$ denotes its parameter, which commonly equals 2. Other arbitrarily defined numerical scales are also available, e.g., composed of arbitrary integers from one to n and their reciprocals.
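To make the two scales concrete, the following minimal Python sketch lists their admissible values. It assumes that the geometric scale formula reads c^(k/2) for integer k from -8 to 8; the helper names (`saaty_scale`, `geometric_scale`, `round_to_scale`) and the nearest-value rounding rule are illustrative assumptions, not part of the original text.

```python
# Saaty's scale: reciprocals 1/9, ..., 1/2 followed by the integers 1, ..., 9.
saaty_scale = [1 / k for k in range(9, 1, -1)] + [float(k) for k in range(1, 10)]


def geometric_scale(c=2.0, k_max=8):
    """Values c**(k/2) for integer k in [-k_max, k_max] (assumed reading of the formula)."""
    return [c ** (k / 2) for k in range(-k_max, k_max + 1)]


def round_to_scale(value, scale):
    """Nearest scale value by absolute difference (one possible rounding rule)."""
    return min(scale, key=lambda s: abs(s - value))


print(len(saaty_scale), len(geometric_scale()))
print(round_to_scale(7 / 3, saaty_scale))
```

Both scales have 17 admissible values under these assumptions; with c = 2 the geometric scale spans 1/16 to 16, whereas Saaty's scale spans 1/9 to 9.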
The key issue in AHP is priority ranking on the basis of true or approximate weights, i.e., judgments. If the relative weights of a set of activities are known, they can be expressed as a Pairwise Comparison Matrix (PCM): $A(w) = (w_i/w_j)$, i, j = 1, ..., n. The PCM in the AHP reflects decision makers' preferences (their relative judgments) about the considered activities (criteria, scenarios, players, alternatives, etc.). On the basis of $A(w)$, it is possible to derive the true weights, i.e., the decision makers' priority ratios $w_i$, i = 1, ..., n, which are selected to be positive and normalized to unity: $\sum_{i=1}^{n} w_i = 1$. For uniformity, $w$ hereafter refers to its normalized form. If the elements $a_{ij}$ of a matrix $A(w)$ satisfy the condition $a_{ij} = 1/a_{ji}$ for all i, j = 1, ..., n, then the matrix $A(w)$ is called reciprocal. If the elements of a matrix $A(w)$ satisfy the condition $a_{ik} a_{kj} = a_{ij}$ for all i, j, k = 1, ..., n, and the matrix is reciprocal, then it is called consistent or cardinally transitive.
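The definitions above can be checked mechanically. The following minimal sketch builds a PCM from a known weight vector and tests reciprocity and cardinal transitivity with exact rational arithmetic. The function names are illustrative assumptions; the weight vector is the one used in the worked example of this paper.

```python
from fractions import Fraction
from itertools import product


def pcm_from_weights(w):
    """Return the matrix with entries w_i / w_j for a list of positive weights."""
    return [[wi / wj for wj in w] for wi in w]


def is_reciprocal(a):
    """True when a_ij * a_ji == 1 for every pair (i, j)."""
    n = len(a)
    return all(a[i][j] * a[j][i] == 1 for i in range(n) for j in range(n))


def is_consistent(a):
    """True when the matrix is reciprocal and a_ik * a_kj == a_ij for all i, j, k."""
    n = len(a)
    return is_reciprocal(a) and all(
        a[i][k] * a[k][j] == a[i][j] for i, j, k in product(range(n), repeat=3)
    )


w = [Fraction(7, 20), Fraction(1, 4), Fraction(1, 4), Fraction(3, 20)]
A = pcm_from_weights(w)
print(is_reciprocal(A), is_consistent(A))
```

Using `Fraction` avoids floating-point round-off, so the equality tests are exact; with floats a small tolerance would be needed instead.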
Certainly, in real life situations when AHP is utilized, there is no $A(w)$ which would reflect weights given by the vector of priority ratios. As was stated earlier, the human mind is not a reliable measurement device. Assignments such as "Compare, applying a given ratio scale, your feelings concerning alternative 1 versus alternative 2" do not produce accurate outcomes. Thus, $A(w)$ is not established but only its estimate $A(x)$ containing intuitive judgments, more or less close to $A(w)$ in accordance with experience, skills, specific knowledge, personal taste, and even temporary mood or overall disposition. In such a case, the consistency property does not hold and the relation between the elements of $A(x)$ and $A(w)$ can be expressed as $a_{ij} = e_{ij} w_i / w_j$, where $e_{ij}$ is a perturbation factor fluctuating near unity. In the statistical approach, $e_{ij}$ reflects a realization of a random variable with a given probability distribution.
Besides the prioritization procedure (PP) proposed by the creator of AHP, the right principal eigenvector method (REV), there are alternative PPs devised to cope with the priority ratios estimation problem; an illustrative review can be found, e.g., in [47]. Many of them are optimization based and seek a vector $w$ as a solution of the minimization problem $\min D(A(x), A(w))$ subject to some assigned constraints, such as positive coefficients and the normalization condition. Because the distance function $D$ measures the distance between the matrices $A(x)$ and $A(w)$, different definitions of the distance function lead to various prioritization concepts and prioritization results. As an example, eighteen PPs are described and compared for ranking purposes in [48], although some authors suggest that only fifteen of them are different. Furthermore, since the publication of the above-mentioned article, a few additional procedures have been introduced to the literature; see, e.g., [43, 49-51]. Probably the most popular alternative to the REV is the Logarithmic Least Squares Method (LLSM) developed by Crawford and Williams [26,52]. It is given by the following formula: $w^*(LLSM) = \arg\min_{w} \sum_{i=1}^{n} \sum_{j=1}^{n} \left( \ln a_{ij} - \ln(w_i/w_j) \right)^2$. The LLSM solution also has the following closed form, given by the normalized n-th roots of the products of the elements in each row: $w_i^*(LLSM) = \left( \prod_{j=1}^{n} a_{ij} \right)^{1/n} \Big/ \sum_{k=1}^{n} \left( \prod_{j=1}^{n} a_{kj} \right)^{1/n}$. Thus, it is also known as the geometric mean method, and it is utilized in this research, which strives to improve the reliability of the pairwise comparisons process, which is also the core element of AHP.
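As an illustration of the geometric mean method, the sketch below computes the LLSM priorities as normalized row geometric means and evaluates the logarithmic least squares objective for comparison. The function names and the 3x3 matrix are hypothetical and serve only to show the mechanics.

```python
import math


def llsm(a):
    """Normalized row geometric means of a positive square matrix."""
    n = len(a)
    g = [math.prod(row) ** (1.0 / n) for row in a]
    total = sum(g)
    return [x / total for x in g]


def log_ls_objective(a, w):
    """Sum over all (i, j) of (ln a_ij - ln(w_i / w_j))**2."""
    n = len(a)
    return sum(
        (math.log(a[i][j]) - math.log(w[i] / w[j])) ** 2
        for i in range(n)
        for j in range(n)
    )


# Hypothetical reciprocal but inconsistent PCM (2 * 3 != 4).
A = [[1, 2, 4], [1 / 2, 1, 3], [1 / 4, 1 / 3, 1]]
w = llsm(A)
print([round(x, 4) for x in w])
print(log_ls_objective(A, w) < log_ls_objective(A, [0.5, 0.3, 0.2]))
```

For a reciprocal matrix the geometric mean vector minimizes this objective, so any other candidate vector (here an arbitrary one) yields a value that is at least as large.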

Problem and Research Methodology
There are several PCM consistency measures (PCM-CMs) provided in the literature, called consistency or inconsistency indices (CIs). The most popular one is the PCM-CM proposed by Saaty [21], determined by the formula $CI_{REV} = (\lambda_{\max} - n)/(n - 1)$, where n indicates the number of alternatives within the particular PCM and $\lambda_{\max}$ denotes its maximal eigenvalue. The significant disadvantage of this PCM-CM is the fact that it can operate exclusively with reciprocal PCMs. In the case of nonreciprocal PCMs, this measure is useless (its values are meaningless), which in consequence seriously diminishes the value of the whole approach; see, e.g., [33]. It was also recently found to be incorrect; see, e.g., [4,10,53,54]. However, as mentioned earlier, there are a number of additional PCM-CMs. Some of them, as in the case of $CI_{REV}$, originate from the PPs devised for the purpose of the priority ratios estimation process. Their distinct feature is the fact that all of them can operate equally efficiently in conditions where reciprocal and nonreciprocal PCMs are accepted. Probably the best-known example from that set of propositions is the PCM-CM proposed by Aguaron and Moreno-Jimenez [55], given by the following formula: $CI_{LLSM} = \frac{2}{(n-1)(n-2)} \sum_{i<j} \ln^2 \left( a_{ij} w_j / w_i \right)$, where $w$ is the priority vector obtained with the LLSM. Notably, there are a few definitions of PCM-CMs which are not connected with any PP and are devised on the basis of the PCM consistency definition. Koczkodaj's idea [56] attracts attention and is the first to be scrutinized. Koczkodaj's PCM-CM is grounded in his concept of triad consistency.
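In order to clarify this, for any three distinct decision alternatives $A_1$, $A_2$, and $A_3$, there are three meaningful priority ratios, i.e., $a_{ik}$, $a_{kj}$, and $a_{ij}$, which have their different locations in a particular $A = [a_{ij}]_{n \times n}$. For some different i ≤ n, j ≤ n, and k ≤ n, the tuple $(a_{ik}, a_{kj}, a_{ij})$ is called a triad. If the matrix $A = [a_{ij}]_{n \times n}$ is consistent, then $a_{ik} a_{kj} = a_{ij}$ for all triads.
In consequence, the equations $1 - a_{ij}/(a_{ik} a_{kj}) = 0$ and $1 - a_{ik} a_{kj}/a_{ij} = 0$ both have to be true. Taking the above into consideration, Koczkodaj proposed his measure for triad inconsistency by the following formula: $TI(a_{ik}, a_{kj}, a_{ij}) = \min \left\{ \left| 1 - a_{ij}/(a_{ik} a_{kj}) \right|, \left| 1 - a_{ik} a_{kj}/a_{ij} \right| \right\}$. Following his idea, he then proposed the following PCM-CM of any reciprocal PCM = A: $K(A) = \max \, TI(a_{ik}, a_{kj}, a_{ij})$, where the maximum value for K(A) is taken from the set of all possible triads in the upper triangle of a given PCM. On the basis of Koczkodaj's idea of triad inconsistency, Grzybowski [5] presented his PCM-CM determined by the following formula: $ATI(A) = \frac{1}{m} \sum_{t=1}^{m} TI_t$, i.e., the average of the triad inconsistencies over the m triads considered. Finally, following the idea that $\ln(a_{ik} a_{kj}/a_{ij}) = -\ln(a_{ij}/(a_{ik} a_{kj}))$, Kazibudzki [57] redefined triad inconsistency and proposed two formulae for its measurement, $LTI_1(a_{ik}, a_{kj}, a_{ij}) = \left| \ln(a_{ik} a_{kj}/a_{ij}) \right|$ and $LTI_2(a_{ik}, a_{kj}, a_{ij}) = \ln^2(a_{ik} a_{kj}/a_{ij})$, and one meaningful formula for the PCM-CM: $ALTI_x(A) = \frac{1}{m} \sum_{t=1}^{m} x_t$, where x denotes the formula for triad inconsistency measurement, i.e., $LTI_1$ or $LTI_2$.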
It behooves us to mention that ALTI(A) can be calculated on the basis of triads from the upper triangle of the given PCM when it is reciprocal or all triads within the given PCM when it is nonreciprocal.
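The triad-based measures can be sketched in a few lines of Python. The sketch assumes the triad convention (a_ik, a_kj, a_ij) with i < k < j (upper triangle only), Koczkodaj's triad inconsistency as the smaller of |1 - a_ij/(a_ik a_kj)| and |1 - a_ik a_kj/a_ij|, and ATI and ALTI as plain averages over those triads. All function names and the 3x3 matrix are illustrative.

```python
import math
from itertools import combinations


def triads(a):
    """Yield (a_ik, a_kj, a_ij) for i < k < j, i.e., triads in the upper triangle."""
    for i, k, j in combinations(range(len(a)), 3):
        yield a[i][k], a[k][j], a[i][j]


def ti(a_ik, a_kj, a_ij):
    """Triad inconsistency in the sense of Koczkodaj (assumed form)."""
    return min(abs(1 - a_ij / (a_ik * a_kj)), abs(1 - a_ik * a_kj / a_ij))


def koczkodaj(a):
    """K(A): the largest triad inconsistency."""
    return max(ti(*t) for t in triads(a))


def ati(a):
    """ATI(A): the average triad inconsistency."""
    values = [ti(*t) for t in triads(a)]
    return sum(values) / len(values)


def alti(a, power=1):
    """ALTI(A) based on |ln(a_ik a_kj / a_ij)| (power=1) or its square (power=2)."""
    values = [abs(math.log(x * y / z)) ** power for x, y, z in triads(a)]
    return sum(values) / len(values)


# Hypothetical reciprocal PCM with a single triad (2, 3, 4).
A = [[1, 2, 4], [1 / 2, 1, 3], [1 / 4, 1 / 3, 1]]
print(koczkodaj(A), ati(A), alti(A, 1), alti(A, 2))
```

For n = 3 there is only one triad, so K(A) and ATI(A) coincide; they start to differ from n = 4 onward, where the maximum and the mean are taken over four triads.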
As was already stated earlier, the fundamental question which should be asked by researchers who deal with the problem of priority ratios estimation quality in relation to a PCM consistency measure is as follows: Does the reduction of PCM inconsistency lead to improvement of the priority ratios estimation process quality?
The common reason why one strives to improve the consistency of the PCM, when it seems unsatisfactory, is to increase the quality of the priority ratios estimation process. However, the above question remains open and the answer to it is not evident. Even the creator of AHP stated once that improving consistency does not mean getting an answer closer to the "real" life solution [21]. It can be illustrated in the following example.
Consider the true PV (denoting the true weights of the examined alternatives), i.e., $w$ = [7/20, 1/4, 1/4, 3/20], and $A(w)$ derived from that PV, which can be presented as follows: $A(w) = \begin{bmatrix} 1 & 7/5 & 7/5 & 7/3 \\ 5/7 & 1 & 1 & 5/3 \\ 5/7 & 1 & 1 & 5/3 \\ 3/7 & 3/5 & 3/5 & 1 \end{bmatrix}$. Then two PCMs are considered, i.e., R(x) and A(x), produced by a hypothetical decision maker (DM). It is assumed that the DM is very trustworthy and is able to express judgments very precisely, while still being limited by the necessity of expressing judgments on a scale (the example utilizes Saaty's scale). In the first scenario, the entries of $A(w)$ are rounded to Saaty's scale and made reciprocal (a principal condition for a PCM in the AHP), producing R(x). It should be noted that R(x) is perfectly consistent and A(x) is not. Tables 1 and 2 present selected values of the PP-related PCM-CMs (that is, $CI_{REV}$ and $CI_{LLSM}$) for R(x) and A(x), together with the PVs derived from R(x) and A(x), the Mean Absolute Errors (MAEs), formula (14), between $w^*(PP)$ and $w$ for each case, and the Spearman Rank Correlation Coefficients (SRCs) between $w^*(PP)$ and $w$ for each case.
Surprisingly, a very interesting phenomenon can be noted on the basis of the information provided in Tables 1 and 2. The nonreciprocal version of the analyzed PCM yields nonzero values of the selected PCM-CMs. In cases similar to this example, the value of Saaty's PCM-CM always becomes negative, which makes it uninterpretable and in consequence useless under such circumstances (as already mentioned). The other exemplary measure taken into consideration is greater than zero, which indicates that the particular PCM is inconsistent. On the basis of the same indicators in the case of the reciprocal version of the analyzed PCM, its perfect consistency is apparent because all selected PCM-CMs in this case are equal to zero. However, the estimation precision measures (MAE and SRC), i.e., characteristics of the particular PV estimation quality, indicate quite the opposite. Smaller values of MAEs are apparent, as are perfect correlations of ranks between the estimated and genuine PV, for the nonreciprocal version of the analyzed PCM. This conclusion concerns all analyzed PPs, and it holds even though the particular PCM is less consistent (on the basis of the selected exemplary PCM-CMs). It remains to be mentioned that the inconsistency problem (hence errors) does not exist for simplified pairwise comparisons [53].
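The example can be retraced with a short Python sketch. It rests on several assumptions that are not confirmed by the text: rounding picks the nearest Saaty value by absolute difference, A(x) is obtained by rounding every entry of the true PCM independently, R(x) keeps the rounded upper triangle and fills the lower triangle with reciprocals, the maximal eigenvalue is approximated by power iteration, and the MAE is the plain mean of absolute differences. The sketch uses LLSM only and does not claim to reproduce the values of Tables 1 and 2.

```python
import math

SAATY = [1 / k for k in range(9, 1, -1)] + [float(k) for k in range(1, 10)]


def to_scale(x):
    # Assumed rounding rule: nearest scale value by absolute difference.
    return min(SAATY, key=lambda s: abs(s - x))


def llsm(a):
    n = len(a)
    g = [math.prod(row) ** (1.0 / n) for row in a]
    total = sum(g)
    return [x / total for x in g]


def lambda_max(a, iters=1000):
    # Power iteration; adequate for small positive matrices.
    n = len(a)
    v = [1.0 / n] * n
    lam = float(n)
    for _ in range(iters):
        u = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(u)  # v sums to 1, so sum(A v) estimates lambda_max
        v = [x / lam for x in u]
    return lam


def ci_rev(a):
    n = len(a)
    return (lambda_max(a) - n) / (n - 1)


def ci_llsm(a):
    n = len(a)
    w = llsm(a)
    return 2.0 / ((n - 1) * (n - 2)) * sum(
        math.log(a[i][j] * w[j] / w[i]) ** 2
        for i in range(n) for j in range(i + 1, n))


def mae(w, est):
    return sum(abs(x - y) for x, y in zip(w, est)) / len(w)


def avg_ranks(v):
    # Ascending ranks; tied values share their average rank.
    s = sorted(v)
    return [s.index(x) + (s.count(x) + 1) / 2 for x in v]


def spearman(u, v):
    ru, rv = avg_ranks(u), avg_ranks(v)
    mu, mv = sum(ru) / len(ru), sum(rv) / len(rv)
    cov = sum((p - mu) * (q - mv) for p, q in zip(ru, rv))
    var_u = sum((p - mu) ** 2 for p in ru)
    var_v = sum((q - mv) ** 2 for q in rv)
    return cov / math.sqrt(var_u * var_v)


w = [7 / 20, 1 / 4, 1 / 4, 3 / 20]
n = len(w)
# A(x): every entry rounded on its own (nonreciprocal, by assumption).
A_x = [[to_scale(w[i] / w[j]) for j in range(n)] for i in range(n)]
# R(x): rounded upper triangle, reciprocals below the diagonal (by assumption).
R_x = [[A_x[i][j] if i <= j else 1 / A_x[j][i] for j in range(n)] for i in range(n)]

for name, m in (("R(x)", R_x), ("A(x)", A_x)):
    est = llsm(m)
    print(name, ci_rev(m), ci_llsm(m), mae(w, est), spearman(w, est))
```

Under these assumptions the sketch behaves as the discussion describes: R(x) is perfectly consistent (both indices are zero up to round-off) yet gives the larger MAE and a rank correlation below 1, whereas A(x) gives a negative Saaty index, a positive LLSM-based index, a smaller MAE, and a rank correlation of 1.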
Even such a trivial example as the one above makes it apparent that the relation between the performance of a consistency measure and the quality of the priority ratios estimation process is of great importance. That is why it was decided to examine the phenomenon further in order to improve the quality of the pairwise comparisons based prioritization process. Thus, the simulation framework for this purpose was adopted from [5,57] as the only way to examine the said phenomena through computer simulations. The simulation algorithm SA|K| comprises the following phases.

Phase 2. Randomly select an element $k_{xy}$ for x<y of K($w$) and replace it with $k_{xy} e$, where $e$ is a relatively significant error, randomly drawn (uniform distribution) from the interval $e \in [2; 4]$. Errors of that magnitude are basically considered "significant"; see, e.g., [50,58].
Phase 3. For each other element $k_{ij}$, i<j≤n, randomly select a value $e_{ij}$ for the relatively small error in accordance with the given probability distribution (applied in equal proportions as gamma, log-normal, truncated normal, and uniform distribution) and replace the element $k_{ij}$ with the element $k_{ij} e_{ij}$, where $e_{ij}$ is randomly drawn (uniform distribution) from the interval $e_{ij} \in [0.5; 1.5]$.
Phase 4. Round all values of $k_{ij} e_{ij}$ for i<j of K($w$) to the nearest value of a considered scale.
Phase 6. After all replacements are done, return the value of the examined index as well as the estimate of the vector $w$, denoted as $w^*(PP)$, obtained with the assigned prioritization procedure. Then return the mean absolute error (MAE) and mean relative error (MRE) (formula (15)) between $w$ and $w^*(PP)$. Store the values computed in this phase as one record.
Phase 7. Repeat Phases from Phase 2 to Phase 6 N times.
Phase 8. Repeat Phases from Phase 1 to Phase 7 N times.
Phase 9. Save all records as one database file.
The above algorithm allows examining the performance of the selected CM in relation to the quality of the priority ratios estimation process. Its framework resembles steps scrutinized in the example provided earlier in this paper and was thoroughly described in [5]. Thus, for brevity, it will not be analyzed in detail herein.
For formality, all parameters of the applied PDs (gamma, log-normal, truncated normal, and uniform) in the simulation algorithm SA|K| are set in such a way that the expected value $EV(e_{ij}) = 1$. The simulation begins from n=4. Simulations for n=3 are not interesting due to the direct interrelation of the considered PCM consistency measures; see, e.g., [58,59]. For the sake of objectivity, the simulation data gathered include the errors between $w$ and $w^*(PP)$, MAE quantiles of the following orders: 0.05, 0.1, 0.5, 0.9, and 0.95, and relations between all of them. The application of the rounding procedure was also assumed, which in this research operates according to Saaty's scale. Lastly, the scenario takes into account the compulsory assumption in conventional AHP applications, i.e., the PCM reciprocity condition. The outcome of the simulation program is presented for the most popular PP = LLSM and the most attractive PCM-CM = K(A). It must be emphasized that other PPs and PCM-CMs were also examined, but the results of their application are not presented in this research paper because they are beyond its scope. The results are based on repetition counts of 20 and 500 for the two loops (Phases 7 and 8), i.e., 10,000 cases.
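The simulation phases can be sketched in Python as follows. This is a simplified sketch under stated assumptions, not the Mathematica program used in the research: Phase 1 (taken here to draw a random normalized priority vector) and Phase 5 (taken here to enforce reciprocity) are not listed in the text and are assumed; the small errors are drawn from the uniform distribution only; rounding uses the nearest Saaty value by absolute difference; the MRE is assumed to be the mean of |w_i - w*_i| / w_i; and the assignment of 20 and 500 to the outer and inner loops is arbitrary. All names are illustrative.

```python
import math
import random
from itertools import combinations

SAATY = [1 / k for k in range(9, 1, -1)] + [float(k) for k in range(1, 10)]


def llsm(a):
    n = len(a)
    g = [math.prod(row) ** (1.0 / n) for row in a]
    total = sum(g)
    return [x / total for x in g]


def koczkodaj(a):
    # Largest triad inconsistency over the upper triangle (i < k < j).
    worst = 0.0
    for i, k, j in combinations(range(len(a)), 3):
        ratio = a[i][k] * a[k][j] / a[i][j]
        worst = max(worst, min(abs(1 - ratio), abs(1 - 1 / ratio)))
    return worst


def simulate(n=4, outer=20, inner=500, seed=0):
    rng = random.Random(seed)
    pairs = list(combinations(range(n), 2))
    records = []
    for _ in range(outer):                                  # Phase 8
        raw = [rng.uniform(0.05, 1.0) for _ in range(n)]    # Phase 1 (assumed)
        w = [x / sum(raw) for x in raw]
        for _ in range(inner):                              # Phase 7
            a = [[1.0] * n for _ in range(n)]
            big = rng.choice(pairs)                         # Phase 2: one large error
            for i, j in pairs:
                if (i, j) == big:
                    e = rng.uniform(2.0, 4.0)
                else:
                    e = rng.uniform(0.5, 1.5)               # Phase 3: small errors
                target = w[i] / w[j] * e
                a[i][j] = min(SAATY, key=lambda s: abs(s - target))  # Phase 4
                a[j][i] = 1.0 / a[i][j]                     # Phase 5 (assumed)
            est = llsm(a)                                   # Phase 6
            mae = sum(abs(x - y) for x, y in zip(w, est)) / n
            mre = sum(abs(x - y) / x for x, y in zip(w, est)) / n
            records.append((koczkodaj(a), mae, mre))
    return records                                          # Phase 9 would save these


records = simulate(n=4, outer=2, inner=5, seed=1)
print(len(records))
```

The returned list of (index value, MAE, MRE) records can then be sorted by the index value and split into groups in order to inspect how the error quantiles behave as the index grows.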

Results and Discussion
One could assume that MAE and MRE quantiles of any order should monotonically increase concurrently with the growth of the selected PCM-CM, e.g., the $VRCM$ index. The same relation should occur for the mean $VRCM_n$ and the average MAE and MRE for $VRCM_n$. Results of the proposed simulation framework, or of any other similar simulation scenario, which contradict such a relationship would unequivocally lead to the conclusion that the examined PCM-CM is not a good indicator of the priority ratios estimation process quality and may mislead in further actions towards acceptance or rejection of the derived priority ratios vector.
Figures 1 and 2 illustrate the research problem by presenting the performance of Koczkodaj's K(A) (Plots A-H). In each case hereafter, the horizontal axis represents values of the examined PCM-CM, and the vertical axis denotes the particular estimation errors.
Noticeably, when the quality of priority ratios estimation in a pairwise comparisons based process is taken into consideration, the presented relations indicate that the performance of selected PCM-CM varies from the assumption presented earlier. The phenomenon was thus far only examined in [5]. It may seem disturbing because the relation between priority ratios estimation errors and analyzed PCM-CM indicates that the analyzed index may sometimes misinform DMs about their judgments' quality, affecting the vector of priorities which best converge with the true vector.
As seen similarly in the example provided earlier in this paper (Tables 1 and 2), taking the particular index as the measure of PCM consistency, especially when errors are small and inconsistency is not negligible, one can expect both, i.e., the improvement of priority ratios estimation quality (increase of the estimation accuracy) together with the increase of the particular CI (decrease of PCM consistency) and, inversely, the deterioration of priority ratios estimation quality (decrease of the estimation accuracy) together with the descent of the particular CI (improvement of PCM consistency).
This problem seems very troublesome especially when differences among derived priority ratios are insignificant. Then, it may occur that a ranking order of alternatives in the estimated priority vector can drastically differ from the true one because of estimation errors. From that perspective, a necessity of controlling that issue seems paramount. In order to evaluate that problem more thoroughly, detailed statistical characteristics for the examined K(A) are provided in Tables 3-6. It can be concluded that the performance of the presented PCM-CM varies (compare, e.g., [9]).
The motivation for this research was to develop a PCM-CM which could depict a more credible relation between the consistency of pairwise judgments and the priority ratios estimation quality, i.e., whose priority ratios estimation error, reflected by SRC, would be very close or equal to 1 for all their quantiles (the most desirable situation).
A solution to the problem was successfully developed and is presented as follows. On the basis of the triad inconsistency measure introduced in [57], the following PCM-CM was devised: TSL(A). The proposed PCM-CM is denoted as the Triads Squared Logarithm Corrected Mean, and an examination of its performance on the basis of the simulation algorithm SA|K| proposed earlier in this paper was carried out (Figures 3 and 4). As can be noticed, the proposed TSL(A) performs credibly from the perspective of the relation between the consistency of pairwise judgments and the priority ratios estimation quality. This is undeniably a positive finding, which opens a new chapter in the pairwise judgments based priority ratios estimation process embedded in many decision making methodologies such as AHP. It behooves us to mention that TSL(A) is suitable for both reciprocal and nonreciprocal PCMs, which may prospectively improve the pairwise judgments based priority ratios estimation quality when nonreciprocal PCMs are accepted.
Tables 7 and 8 provide detailed characteristics data for TSL(A) with application of LLSM (as the most popular alternative to REV), Saaty's scale, and geometric scale as the most popular preference scales. Results for other PPs are similar; thus they are not presented in order to conserve the length of this paper. However, it is stressed that other PPs and preference scales were also tested and examined during the research and their results are not depicted in this paper because they coincide with results herein presented.
It is noted that all statistical characteristics of the MAE and MRE distributions in relation to the various $VRCM_i$ for i = 1, ..., 15 of TSL(A) values, with few exceptions, grow monotonically. This examination confirms that the proposed PCM-CM is a suitable measure of the relation between pairwise comparisons consistency and the priority ratios estimation quality. The key advantage of the proposed TSL(A) is that it performs better than the other PCM-CM evaluated here, i.e., K(A). Its position is additionally strengthened by the fact that its performance is similar regardless of the applied PP and improves significantly for higher numbers of alternatives, without regard to which PP is selected.
It should be noted that all characteristics presented herein are of great importance in the priority ratios estimation process, because one has to consider the potential of rejecting a "good" PCM, and vice versa, i.e., the possibility of accepting a "bad" PCM, as in the classic statistical hypothesis testing theory. However, for the first time in the course of pairwise judgments based prioritization development history, the possibility of selecting the level of certainty and basing decisions on statistical facts has been demonstrated.
For instance, considering some hypothetical PCM for n=4, with its mean TSL(A)≈0.319702 for LLSM as the PP (Table 7), one can expect with 95% confidence that the MAE should not exceed the value of 0.1208420. At the same time, one can expect with 95% confidence that it will be higher than 0.0201740 (Table 7). Whether one decides to accept such a PCM or reject it obviously depends on the quality requirements of the priority ratios estimation and the attitude regarding these errors. Indeed, the outcome of the research finally creates the potential for true consistency control in an unprecedented way, i.e., directly related to the priority ratios estimation quality. For example, the following PV is considered: $w$ = [0.27, 0.26, 0.24, 0.23], denoting DM preferences for alternatives $A_1$, $A_2$, $A_3$, and $A_4$, respectively. Taking into consideration the earlier assumed level of TSL(A)≈0.319702, the order of alternatives ranks $A_1$=1, $A_2$=2, $A_3$=3, and $A_4$=4 can be very deceptive and is rather meaningless. In such a situation one can expect with 95% confidence that the MAE>0.0201740, which makes one aware that the true rank order of the examined preferences may be different, due to estimation errors related to DM inconsistency, e.g., $w^*$ = [(0.27-0.025), …]. For a similar but even more detailed calculation, the MRE can be applied. It is a more accurate measure of priority ratios deviation; however, its straightforward application for calculation of discrepancies within normalized priority vectors is problematic.
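The following small sketch illustrates why an MAE of this magnitude matters when priorities are close to each other. The shifted vector is hypothetical (it is not the example from the text, which is incomplete here); it moves every priority by 0.021, slightly above the 0.0201740 bound quoted above, and the helper names are illustrative.

```python
def mae(w, est):
    """Mean absolute difference between two priority vectors."""
    return sum(abs(x - y) for x, y in zip(w, est)) / len(w)


def ranking(v):
    """Indices of alternatives ordered from the highest to the lowest priority."""
    return sorted(range(len(v)), key=lambda i: -v[i])


pv = [0.27, 0.26, 0.24, 0.23]
# Hypothetical vector that differs from pv by 0.021 in every component.
shifted = [0.249, 0.281, 0.219, 0.251]

print(mae(pv, shifted))
print(ranking(pv), ranking(shifted))
```

Although the two vectors differ only by about 0.02 on average, the order of alternatives changes from A1, A2, A3, A4 to A2, A4, A1, A3, which is the kind of rank reversal the paragraph above warns about.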
In order to enable other researchers to make similar analysis concerning different numbers of alternatives, the exemplary characteristics of TSL(A) performance are provided for n>4 in the Appendix of this article, computed with application of commonly known PP, i.e., LLSM and the most common Saaty's scale (Table 9) and geometric scale (Table 10) as prospective preferences scales selected by decision makers.

Conclusions
In this research, the performance of the selected PCM-CM from the perspective of its relation between pairwise judgments consistency and the quality of the priority ratios estimation process was examined with application of the most