Consistency issues in the best worst method: Measurements and thresholds

The Best-Worst Method (BWM) uses ratios of the relative importance of criteria in pairs, based on the assessments of decision-makers. When a decision-maker provides the pairwise comparisons in BWM, checking whether the inconsistency is acceptable, to ensure the rationality of the assessments, is an important step. Although both the original and the extended versions of BWM have proposed several consistency measurements, there are some deficiencies, including: (i) the lack of a mechanism to provide immediate feedback to the decision-maker regarding the consistency of the pairwise comparisons being provided, (ii) the inability to take ordinal consistency into account, and (iii) the lack of consistency thresholds to determine the reliability of the results. To deal with these problems, this study starts by proposing a cardinal consistency measurement to provide immediate feedback, called the input-based consistency measurement, after which an ordinal consistency measurement is proposed to check the coherence of the order of the results (weights) against the order of the pairwise comparisons provided by the decision-maker. Finally, a method is proposed to balance the cardinal consistency ratio under ordinal-consistent and ordinal-inconsistent conditions, to determine the thresholds for the proposed and the original consistency ratios. © 2019 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
The Best Worst Method (BWM), which is a Multi-Criteria Decision Making (MCDM) method that was recently developed by Rezaei [37], uses ratios of the relative importance of criteria in pairwise comparisons provided by a decision-maker (DM), based on two evaluation vectors: the Best criterion against the Other criteria, and the Other criteria against the Worst criterion. The weights of the criteria are obtained by solving a nonlinear [37] or a linear model [38]. Compared to one of the most popular pairwise comparison-based MCDM methods, Analytic Hierarchy Process (AHP), BWM requires fewer comparison data, while being able to generate more consistent comparisons, allowing it to produce more reliable results according to previous analyses [37]. Thanks to its simplicity and reliability, BWM has been widely applied to address a host of different problems [29,39,49]. For more detailed information, readers are referred to a recent survey on the BWM [32].
This manuscript was processed by associate editor Dias.
BWM and other pairwise comparisons methods, like AHP and ANP (Analytical Network Process), are based on a DM's evaluations of the relative priorities of the decision-making elements as captured in a complete pairwise comparison matrix [41] , incomplete pairwise comparison matrix [19] or vectors [37] . One of the advantages of using pairwise comparisons is that they allow us to estimate the inconsistency of a DM's preferences. Usually, the consistency level of the judgements is related to the rationality of the DM and his/her ability to discriminate between criteria/alternatives [21] . The DM's judgments have to meet the cardinal transitivity condition to be perfectly consistent; otherwise, the DM is not fully consistent, which may imply some irrationality in the relative weight estimates.
To check how inconsistent (deviating from the condition of full consistency) a full set of pairwise comparisons may be, Saaty [40], in his seminal work on the AHP, proposed a consistency measurement (Saaty index), but since then, many other consistency indices have been proposed [10]. Basically, the existing consistency measurements can be divided into two groups: input-based measurements and output-based measurements [28]. The measurements in the former group are based on the input, i.e. preferences assigned to pairwise comparisons, e.g. the Koczkodaj index [24], while the output-based consistency measurements are based on the weights or rankings. In this group, there are, for instance, Saaty's index [40] and the geometric consistency index proposed by Crawford and Williams [13]. The consistency measurements mentioned above were initially designed for complete pairwise comparison matrices and we cannot use them to measure the consistency degree of incomplete pairwise comparison matrices where some judgments are missing [19]. To adapt the consistency indices to incomplete pairwise comparison matrices, one of the most popular approaches is to complete the pairwise comparison matrices [16,47] and then measure their consistency in the traditional manner [19,28]. Instead of completing the matrix, a graph-theoretic approach can be used to generate all possible preferences by enumerating all spanning trees, after which the variance of these preferences can be used as a measure of inconsistency [6,31,44]. Replacing triads with cycles [28] is another way to estimate the inconsistency.

ARTICLE IN PRESS
JID: OME [m5G; January 14, 2020; 1:21]
One might see BWM as a special case of incomplete pairwise comparison matrix. Although the method only uses a specific subset of 2n-3 comparisons gathered in two representative vectors, these preferences can be represented equivalently by an incomplete pairwise comparison matrix. One could argue that we could then complete the two vectors to create a full matrix and measure the inconsistency by using the approaches mentioned above. However, not only will that make the measurement more difficult (unrealistic), it will also destroy the simplification (non-redundancy) philosophy embedded in BWM. Therefore, to check the consistency by using this specific method, Rezaei [37] proposed a consistency measurement (sometimes referred to as inconsistency measurement) in the original version of BWM. Later, the extended BWM methods also provided corresponding consistency measurements similar to the original consistency measurement. For example, Mou et al. [35] extended BWM to include intuitionistic fuzzy multiplicative preference relations, and provided a new definition for the consistency algorithm to check consistency, while Guo and Zhao [18] proposed a consistency ratio (also referred to as inconsistency ratio) for fuzzy BWM, and Aboutorab et al. [1] explained a corresponding consistency ratio for the Z-numbers BWM.
However, the existing studies on BWM lack a metric/tool to provide the DM/analyst with immediate feedback regarding the consistency of the pairwise comparisons. The consistency ratios obtained by the existing consistency measurements of BWM are based on the outputs instead of directly on the inputs. With the existing consistency measurements, a DM can only obtain the consistency ratio and check the consistency after the entire optimization process is completed. However, it has been shown that confronting the DM with the inconsistencies in his/her assessments after he/she has already gone through the entire elicitation process is ineffective [34]. In addition, the consistency ratios obtained by the original BWM, the graph-theoretic approach [6,31,44] and the methods of replacing triads with cycles [28] are overall indicators that show the consistency of the pairwise comparison system as a whole, so they cannot help the DM locate their most inconsistent judgments. A proper consistency measurement should indeed assist the DM in identifying the most inconsistent comparisons [14] and in achieving sufficiently consistent preferences [17,36]. Although some input-based consistency measurements for general incomplete pairwise comparison matrices, including the Koczkodaj index [24] and the Salo and Hämäläinen index [43], can be applied to BWM, some of their properties are not as desirable as expected, as discussed in Section 3.
Moreover, the existing studies on consistency measurement in the BWM thus far fail to take ordinal consistency into consideration. Consistency in pairwise comparisons can be divided into two categories: cardinal consistency and ordinal consistency [45]. The existing consistency ratios of BWM only measure cardinal consistency. However, even if the judgements have a high level of cardinal consistency, they can still be contradictory, according to the research of Kwiesielewicz and Van Uden [30]. The contradiction is caused by the violation of ordinal consistency, i.e. there is a discrepancy in the criteria importance rankings obtained from the two pairwise comparison vectors in BWM. If the preferences are ordinal-consistent, the final ranking will not change with the cardinal consistency ratio, only the intensity could vary; but if they are ordinal-inconsistent, a change in the cardinal consistency ratio could affect the final ranking [45]. Thus, in order to ensure a DM provides a stable judgement, it is important to check his/her ordinal consistency status, and indicate to what extent the ordinal consistency has been violated. There are several ordinal consistency measurements for complete pairwise comparison matrices, like the ordinal coefficient proposed by Jensen and Hicks [22] and the dissonance measurement proposed by Siraj et al. [45,46]. However, they cannot be applied to incomplete pairwise comparison matrices or the two vectors used in BWM.
Furthermore, there is no threshold for the consistency ratio of BWM in the existing literature. Although BWM has been widely used and the consistency measurements help a DM check the reliability of his/her preferences, the absence of a threshold associated with the existing consistency measurements makes it hard to provide a meaningful interpretation. Without a consistency threshold, the DM/analyst is left with the major problem of having to decide when his/her judgments should be revised and when they should be accepted. The number of criteria and the scale of evaluation also have to be considered, which makes the situation even more complicated. The 10% rule of thumb of AHP has long been criticised [5,7,33], and even Saaty later suggested additional threshold values of 5% and 8% for 3 and 4 criteria, respectively [42]. Although some other methods have been proposed to determine consistency thresholds [2,4,33], most of them apply to complete pairwise comparison matrices and cannot be used directly for incomplete pairwise comparison matrices. Thus, designing a threshold determination algorithm for BWM can fill this gap.
As such, the contribution of this study is threefold: (i) Developing a mechanism designed to provide a DM with immediate feedback regarding his/her consistency status and making the elicitation process more effective. To this end, we propose an input-based consistency measurement, which is simple to use and has several desirable properties; (ii) Developing an ordinal consistency ratio that shows a DM's violation level involving ordinal consistency and complements the cardinal consistency measurement. With this ratio, a DM can revise his/her judgments to meet the ordinal consistency condition, which is a minimum requirement for a logical and rational DM; (iii) The most significant contribution of this study is to establish thresholds for the consistency ratios (the proposed consistency ratio and the original consistency ratio) used in BWM.
The remainder of the paper is structured as follows: In Section 2 , the original BWM and its consistency measurement are introduced. An input-based consistency ratio is proposed as an alternative to replace the original output-based consistency ratio in Section 3 . An ordinal consistency measurement is formulated in Section 4 . The threshold tables are presented in Section 5 , followed by the conclusion in Section 6 .

The best worst method and consistency measurement
In this part, the basic steps of the original BWM are briefly introduced, and the original output-based consistency measurement is reviewed.

The basic steps of BWM
As a pairwise comparison method, BWM uses ratios of the relative importance of criteria in pairs estimated by a DM, from the two evaluation vectors, $A_{BO}$ and $A_{OW}$. The weights of the criteria can be obtained by solving the linear or nonlinear program [38]. The basic steps of the original BWM can be summarized as below:
Step 1 Have the set of evaluation criteria $\{C_1, C_2, \cdots, C_n\}$ determined by the DM.
Step 2 Have the best (e.g. the most influential or important) and the worst (e.g. the least influential or important) criteria determined by the DM.
Step 3 Determine the preferences of the best over all the other criteria using a number from $\{1, 2, \ldots, 9\}$. The obtained Best-to-Others vector is: $A_{BO} = (a_{B1}, a_{B2}, \cdots, a_{Bn})$, where $a_{Bj}$ represents the preference of the best criterion $C_B$ over criterion $C_j$, $j = 1, 2, \cdots, n$.
Step 4 Determine the preferences of all the criteria over the worst criterion using a number from $\{1, 2, \ldots, 9\}$. The obtained Others-to-Worst vector is: $A_{OW} = (a_{1W}, a_{2W}, \cdots, a_{nW})$, where $a_{jW}$ represents the preference of criterion $C_j$ over the worst criterion $C_W$, $j = 1, 2, \cdots, n$.
Step 5 Determine the weights $(w_1^*, w_2^*, \cdots, w_n^*)$ by solving the following model:
$$\min \max_j \left\{ \left| \frac{w_B}{w_j} - a_{Bj} \right|, \left| \frac{w_j}{w_W} - a_{jW} \right| \right\} \quad \text{s.t.} \quad \sum_j w_j = 1, \; w_j \ge 0 \text{ for all } j. \tag{1}$$
Model (1) can be transformed into the following model:
$$\min \xi \quad \text{s.t.} \quad \left| \frac{w_B}{w_j} - a_{Bj} \right| \le \xi, \; \left| \frac{w_j}{w_W} - a_{jW} \right| \le \xi \text{ for all } j, \quad \sum_j w_j = 1, \; w_j \ge 0 \text{ for all } j. \tag{2}$$
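To make the objective of the min-max model in Step 5 concrete, the following minimal Python sketch evaluates the largest deviation between the weight ratios and the stated preferences for a candidate weight vector. The function name `max_deviation`, the index arguments and the three-criterion data are hypothetical and chosen only for illustration; the sketch evaluates the objective and does not solve the optimization model.

```python
def max_deviation(weights, a_bo, a_ow, best, worst):
    """Largest absolute deviation between weight ratios and the stated
    pairwise comparisons (illustrative helper, not a solver)."""
    w_best, w_worst = weights[best], weights[worst]
    deviations = []
    for w_j, a_bj, a_jw in zip(weights, a_bo, a_ow):
        deviations.append(abs(w_best / w_j - a_bj))   # |w_B / w_j - a_Bj|
        deviations.append(abs(w_j / w_worst - a_jw))  # |w_j / w_W - a_jW|
    return max(deviations)


# Hypothetical, fully consistent input: criterion 0 is best, criterion 2 is worst.
a_bo = [1, 2, 4]                  # Best-to-Others vector
a_ow = [4, 2, 1]                  # Others-to-Worst vector
weights = [4 / 7, 2 / 7, 1 / 7]   # candidate weights that sum to 1

deviation = max_deviation(weights, a_bo, a_ow, best=0, worst=2)
perturbed = max_deviation([0.5, 0.3, 0.2], a_bo, a_ow, best=0, worst=2)
print(deviation, perturbed)
```

For the consistent input the candidate weights reproduce every comparison, so the deviation vanishes; any other weight vector yields a strictly positive value.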

The original consistency measurement
In the remainder of this paper, when we talk about a pairwise comparison system, we will refer to the set of judgments contained in vectors $A_{BO}$ and $A_{OW}$. Given this notion, we are able to provide the definition of cardinal consistency for the set of preferences contained in a pairwise comparison system: a pairwise comparison system is cardinal-consistent if
$$a_{Bj} \times a_{jW} = a_{BW} \quad \text{for all } j, \tag{3}$$
where $a_{BW}$ is the preference of the best criterion over the worst criterion.
However, it is common practice to allow a pairwise comparison system to deviate, to some extent, from the condition of cardinal consistency. Thus, a consistency ratio is necessary to indicate how inconsistent a DM is. The consistency measurement proposed in the original BWM is based on $\xi^*$, which is the optimal objective value (the output) of the optimization model (2), so we call it an output-based consistency measurement (we will use this term instead of "the original consistency measurement" in the remainder of the paper). The ratio used to indicate the consistency level is called the Output-based Consistency Ratio, denoted $CR^O$ (we will use output-based consistency ratio or $CR^O$ to refer to the original consistency ratio from now on), and is defined as follows [37]:
$$CR^O = \frac{\xi^*}{\xi_{max}}, \tag{4}$$
where $\xi^*$ is the optimal objective value of model (2) and $\xi_{max}$ is the maximum possible $\xi$, which can be derived from [37]:
$$\xi^2 - (1 + 2 a_{BW})\,\xi + (a_{BW}^2 - a_{BW}) = 0. \tag{5}$$
The closer $CR^O$ is to 0, the more consistent the judgments are. In particular, $CR^O = 0$ means that the comparisons are cardinally consistent.
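As a hedged illustration of how the output-based ratio can be computed once $\xi^*$ is known, the sketch below solves the quadratic equation for $\xi_{max}$ in closed form. Two points are assumptions of this sketch rather than statements of the source: the smaller root is taken as $\xi_{max}$, and the ratio is set to 0 when $a_{BW} = 1$ (where $\xi_{max} = 0$). The function names are hypothetical.

```python
import math


def xi_max(a_bw):
    """Smaller root of xi^2 - (1 + 2*a_BW)*xi + (a_BW^2 - a_BW) = 0.
    Choosing the smaller root is an assumption of this sketch."""
    b = 1 + 2 * a_bw
    c = a_bw * a_bw - a_bw
    return (b - math.sqrt(b * b - 4 * c)) / 2


def output_based_ratio(xi_star, a_bw):
    """CR^O = xi* / xi_max; returns 0 when xi_max is 0 (assumed convention)."""
    bound = xi_max(a_bw)
    return 0.0 if bound == 0 else xi_star / bound


# Tabulate xi_max for every possible best-over-worst preference on a 1-9 scale.
for a_bw in range(1, 10):
    print(a_bw, round(xi_max(a_bw), 2))
```

The discriminant simplifies to $8 a_{BW} + 1$, so it is always positive on the 1-9 scale and the square root is well defined.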

The proposed consistency measurement
The consistency ratio proposed in the original BWM can only be obtained after the entire elicitation process has finished, which means it cannot provide a DM with immediate feedback involving his/her consistency. To overcome this problem and to provide a DM with a clear and immediate idea of his/her consistency level, we propose an input-based consistency measurement for BWM that is easy to compute and has clear and simple algebraic meaning and interpretation. Furthermore, we will see that it has several desirable properties (in comparison to the existing indices) and a high correlation with the output-based consistency measurement.
In accordance with the original index, the new inconsistency index proposed in the following section only attains value 1 when, given $a_{BW}$, there exists a $C_j$ such that $a_{Bj} = a_{jW} = a_{BW}$. This is possible because the index considers the maximum violation of local inconsistencies and the value 1 can actually be attained. None of the indices studied by Kułakowski and Talaga [28] has this property. Besides this similarity, we will also show the resemblance between the old and the new index using some numerical analyses.

The input-based consistency ratio
In contrast to the Output-based Consistency Ratio ($CR^O$), the ratio we propose in this paper can immediately indicate a DM's consistency level by using the input he/she provides, i.e. his/her preferences, instead of going through the entire optimization process, which is why it is called an Input-based Consistency Ratio ($CR^I$):
Definition 3 (Input-based Consistency Ratio). The Input-based Consistency Ratio $CR^I$ is formulated as follows:
$$CR^I = \max_j CR^I_j, \tag{6}$$
where
$$CR^I_j = \begin{cases} \dfrac{|a_{Bj} \times a_{jW} - a_{BW}|}{a_{BW} \times a_{BW} - a_{BW}}, & a_{BW} > 1, \\ 0, & a_{BW} = 1. \end{cases} \tag{7}$$
$CR^I$ is the global input-based consistency ratio for all criteria, and $CR^I_j$ represents the local consistency level associated with criterion $C_j$.
Compared to the output-based consistency measurement, the input-based consistency measurement has several advantages:
1. It can provide immediate feedback. The input-based consistency measurement is based on the input (preferences), which means it is not necessary to complete the entire elicitation process. The output-based consistency measurement, on the other hand, is based on the output (weights), which makes it a cumbersome way to determine the consistency level. By using the simple calculation of the input-based consistency measurement, it is easy to provide a DM with immediate feedback.
2. It is easy to interpret: it is the maximum normalized discrepancy between the value of $a_{BW}$ and its estimated value calculated as the indirect comparison $a_{Bj} \times a_{jW}$.
3. It can provide a DM with a clear guideline on the revision of the inconsistent judgement(s). The $CR^O$ indicates the global consistency level, but it cannot show the DM which judgement should be revised. The local $CR^I_j$, however, displays the consistency levels associated with individual criteria; after identifying the maximum local $CR^I_j$, the most inconsistent judgement can be located, after which a DM can revise his/her judgements accordingly, instead of modifying them without a guideline.
4. It is model-independent. The $CR^I$ can be applied independently to measure the consistency level in various forms of BWM models, e.g. a non-linear or linear model, or a multiplicative model [11]. For example, the linear BWM model [38] does not have an effective consistency measurement, while the consistency ratio of the non-linear BWM model [37] has a different interpretation than that of the multiplicative BWM model [11]. The input-based consistency ratio, however, is the same in all three models, because it does not depend on the optimization models.

Example 1.
To illustrate the proposed consistency measurement, we adopt the car evaluation example from the original BWM [38], in which the best criterion is price and the worst criterion is style. The pairwise comparison vectors $A_{BO}$ and $A_{OW}$ are presented in the second and third rows of Table 1, respectively. By using the input-based consistency measurement in Eq. (7), the local ratios $CR^I_j$ are obtained and reported in the last row of Table 1.
From Table 1, by taking the maximum in Eq. (6), we obtain the global $CR^I$, 0.14. One of the advantages of the input-based consistency measurement is that we can immediately locate the most inconsistent pairwise comparison from this table, which in this case is the preference regarding the criterion comfort. If the $CR^I$ is too high, the DM's preferences have to be modified.
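The calculation in Example 1 can be sketched in a few lines of Python. The two vectors below are assumed values, chosen to be consistent with the best and worst criteria, the rankings and the global ratio reported in the text; they are not copied from Table 1. The function names are hypothetical.

```python
def local_ratios(a_bo, a_ow, a_bw):
    """Local input-based ratios |a_Bj * a_jW - a_BW| / (a_BW^2 - a_BW)."""
    if a_bw == 1:
        return [0.0 for _ in a_bo]
    denom = a_bw * a_bw - a_bw
    return [abs(a_bj * a_jw - a_bw) / denom for a_bj, a_jw in zip(a_bo, a_ow)]


def input_based_ratio(a_bo, a_ow, a_bw):
    """Global ratio: the maximum of the local ratios."""
    return max(local_ratios(a_bo, a_ow, a_bw))


# Assumed illustrative data (price is best, style is worst).
criteria = ["quality", "price", "comfort", "safety", "style"]
a_bo = [2, 1, 4, 3, 8]   # price compared with each criterion
a_ow = [4, 8, 4, 2, 1]   # each criterion compared with style
a_bw = 8                 # price compared with style

local = local_ratios(a_bo, a_ow, a_bw)
global_ratio = input_based_ratio(a_bo, a_ow, a_bw)
most_inconsistent = criteria[local.index(global_ratio)]

print(dict(zip(criteria, [round(x, 4) for x in local])))
print(round(global_ratio, 2), most_inconsistent)
```

Because only the inputs are needed, the same function can be re-evaluated after every revised judgment, which is what makes the feedback immediate.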

Properties of the input-based consistency measurement
As indicated by Brunelli [8], it is important that formal properties of inconsistency indices be investigated to check their technical soundness and rule out possible unreasonable behaviours. The next proposition will show that $CR^I$ satisfies a number of reasonable properties.
Proposition 1. The proposed consistency measurement, $CR^I = \max_j CR^I_j$, satisfies the following properties:
3. When $a_{BW} = 1$, $CR^I = 0$; when $a_{BW} > 1$, $a_{BW} \times a_{BW} - a_{BW} > 0$ and $|a_{Bj} a_{jW} - a_{BW}| \le a_{BW} a_{BW} - a_{BW}$. Indeed, when $a_{Bj} a_{jW} \ge a_{BW}$, we have $a_{Bj} a_{jW} \le a_{BW} a_{BW}$, hence $a_{Bj} a_{jW} - a_{BW} \le a_{BW} a_{BW} - a_{BW}$, so the inequality holds; when $a_{Bj} a_{jW} < a_{BW}$, $a_{BW}$ must be at least 2, so the right-hand side satisfies $a_{BW} a_{BW} - a_{BW} \ge a_{BW}$, and therefore $a_{BW} - a_{Bj} a_{jW} \le a_{BW} a_{BW} - a_{BW}$; the inequality holds as well.
4. For each $j \ne B, W$, we want to study the reaction of $CR^I(S)$ to changes in a single comparison in the range $[1, a_{BW}]$. In this case $1 \le a_{jW}, a_{Bj} \le a_{BW}$, and we can consider $a_{BW}$ a constant. Let us consider the effect of a variation of $a_{Bj}$ on $CR^I$ by taking the partial derivative of the local ratio:
$$\frac{\partial CR^I_j}{\partial a_{Bj}} = \frac{a_{jW}\,\operatorname{sgn}(a_{Bj} a_{jW} - a_{BW})}{a_{BW} a_{BW} - a_{BW}}, \quad a_{Bj} a_{jW} \ne a_{BW}.$$
We can see that $CR^I_j$ decreases while $a_{Bj} a_{jW} < a_{BW}$ and increases while $a_{Bj} a_{jW} > a_{BW}$, with minimum in the consistent case ($a_{Bj} a_{jW} = a_{BW}$). The same conclusion follows if we consider $a_{jW}$ instead of $a_{Bj}$.
5. Straightforward. $CR^I$ is a continuous function for all $a_{BW} > 1$.
6. If we assume that the criterion which is eliminated, say $C_i$, is neither the best nor the worst, then $a_{BW}$ remains unchanged and we can define a new set $S_{-i}$ which disregards $C_i$:
$$CR^I(S_{-i}) = \max_{j \ne i} CR^I_j \le \max_j CR^I_j = CR^I(S).$$

Note that these properties are adaptations of well-known properties already proposed and justified in the framework of pairwise comparison matrices. In particular, Properties 1, 2, 4 and 5 stem from those proposed by Brunelli and Fedrizzi [9], Property 3 from the normalization proposed by Koczkodaj et al. [26], and Property 6 from the contraction property proposed by Koczkodaj and Urban [25].
It is worth mentioning that Property 6 would not be satisfied by an approach based on the average of the local inconsistencies like Salo and Hämäläinen index [43] .
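The remark on Property 6 can be illustrated numerically. The sketch below compares the maximum of the local ratios with their plain arithmetic mean when a perfectly consistent criterion is removed. The mean is used here only as a simple stand-in for an averaging approach and is not the Salo and Hämäläinen index itself; the four-criterion data and function names are hypothetical.

```python
def local_ratios(a_bo, a_ow, a_bw):
    """Local input-based ratios for a_BW > 1."""
    denom = a_bw * a_bw - a_bw
    return [abs(b * w - a_bw) / denom for b, w in zip(a_bo, a_ow)]


def drop(values, index):
    """Return a copy of the list without the element at the given index."""
    return values[:index] + values[index + 1:]


# Hypothetical system: index 0 is best, index 3 is worst, a_BW = 8.
a_bo = [1, 2, 3, 8]
a_ow = [8, 4, 2, 1]
a_bw = 8

full = local_ratios(a_bo, a_ow, a_bw)
# Remove criterion 1, which is perfectly consistent (2 * 4 = 8).
reduced = local_ratios(drop(a_bo, 1), drop(a_ow, 1), a_bw)

max_full, max_reduced = max(full), max(reduced)
mean_full = sum(full) / len(full)
mean_reduced = sum(reduced) / len(reduced)

print("max :", max_full, "->", max_reduced)
print("mean:", mean_full, "->", mean_reduced)
```

In this instance the maximum is unchanged after the removal, whereas the mean increases, because the same total inconsistency is averaged over fewer criteria.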

Relationship between the input-based and output-based consistency ratio
In the input-based consistency measurement, when the number of criteria is larger than 2, for two pairwise comparisons $a_{Bj}, a_{jW} \in \{1, 2, \ldots, 9\}$, the relationship between them and their corresponding $CR^I$ values is shown in Fig. 1(a). Likewise, we can calculate the relationship between $a_{Bj}, a_{jW} \in \{1, 2, \ldots, 9\}$ and their $CR^O$ values for the output-based consistency measurement in BWM, which is shown in Fig. 1(b).
It is clear that these two relationship figures have similar shapes, which suggests that the two ratios are highly correlated.
To determine the agreement between these two indices, we analyse them from a statistical perspective by numerical simulations. Firstly, we randomly generated a set of 20,000 pairs of pairwise comparison vectors ($A_{BO}$ and $A_{OW}$) in a 9-criterion problem with a 1-9 scale to represent the preferences provided by DMs in BWM. Then we computed the input-based and output-based consistency ratios ($CR^I$, $CR^O$) for each pair of vectors in this set of 20,000 random pairs. Each pair ($CR^I$, $CR^O$) is represented by a point in the scatter plot in Fig. 2.
As $a_{Bj}, a_{jW} \in \{1, 2, \ldots, 9\}$ take values from a discrete scale, the possible $CR^O$ and $CR^I$ values are limited. Thus, although we have obtained 20,000 $CR^O$ and $CR^I$ values, they are distributed only over these limited possibilities, which is why there are far fewer than 20,000 dots in this scatter plot.
We compute Pearson's correlation coefficient between the $CR^O$ and $CR^I$ values to check the linear correlation between them. Pearson's correlation coefficient in this case is 0.9942, which means the $CR^O$ and $CR^I$ values have a very high linear correlation. We also consider the Spearman index to measure the extent to which the $CR^O$ and $CR^I$ values are co-monotone. The Spearman index is 0.9963, which means these two variables are highly monotonically related.
When we calculate all the Pearson's and Spearman's correlation coefficients with respect to 3-9 criteria under maximal scale from 3 to 9, the minimum Pearson's and the minimum Spearman's correlation coefficients are 0.979 and 0.958, respectively. As such, based on these high correlation coefficients, the input-based consistency measurement and the output-based consistency measurement have a very good agreement, so they could be used interchangeably. Nevertheless, due to its advantages discussed in Section 3.1 , there are valid reasons to prefer the input-based consistency measurement to the output-based consistency measurement.

Ordinal consistency measurement
In this section, an ordinal consistency ratio is proposed to determine the extent to which a DM violates the ordinal consistency. Some properties for this ratio are presented and the relationship between ordinal consistency and cardinal consistency is analysed.

Ordinal consistency
Kwiesielewicz and Van Uden [30] have shown that, even if a pairwise comparison matrix passes the consistency test, it can still be contradictory. Therefore, in addition to calculating the cardinal consistency, it is also important to check whether the rankings of the criteria obtained from the two pairwise comparison vectors $A_{BO}$ and $A_{OW}$ are the same in BWM, which we call the ordinal consistency condition. The meaning of ordinal consistency in BWM is slightly different from that in early studies, which is mainly based on circular triads [20,23,27]. We define the ordinal consistency in BWM as below:
Definition 4 (Ordinal consistency). In the BWM, a pairwise comparison system is said to be ordinal-consistent if the order relations of the two paired comparison vectors ($A_{BO}$ and $A_{OW}$) are the same. That is, the following conditions should be satisfied:
$$(a_{Bi} - a_{Bj}) \times (a_{jW} - a_{iW}) > 0 \quad \text{or} \quad (a_{Bi} = a_{Bj} \text{ and } a_{jW} = a_{iW}), \quad \text{for all } i \text{ and } j. \tag{8}$$
The ordinal consistency is the usual weak transitivity condition, which should be the minimum requirement for a logical and rational DM [48]. Intuitively, one might consider ordinal consistency to be easily satisfied, but that is not true, especially when the number of criteria is large. To see how it develops, we randomly generated 100,000 paired vectors for each number of criteria from 3 to 9 to simulate the preferences for BWM. After categorizing them, we can see that the percentage of ordinal-consistent pairs drops dramatically as the number of criteria increases, as shown in Fig. 3. In reality, the situation is better than with randomly generated vectors, but after checking the data used in the original BWM, we found that only 24.4% of them are ordinal-consistent.
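The ordinal consistency condition and the simulation described above can be sketched as follows. The sampling scheme in `random_system` is an assumption of this sketch (the best criterion is placed first, the worst last, $a_{BW}$ is drawn uniformly from 2 to the scale maximum, and the remaining preferences are drawn uniformly from 1 to $a_{BW}$); the source does not specify how its random vectors were generated, and a smaller sample size is used here to keep the example fast.

```python
import random


def is_ordinal_consistent(a_bo, a_ow):
    """For every pair (i, j), either the product of the two differences is
    positive or both differences are zero."""
    n = len(a_bo)
    for i in range(n):
        for j in range(i + 1, n):   # the condition is symmetric in i and j
            d_best = a_bo[i] - a_bo[j]
            d_worst = a_ow[j] - a_ow[i]
            if d_best * d_worst > 0:
                continue
            if d_best == 0 and d_worst == 0:
                continue
            return False
    return True


def random_system(n, scale, rng):
    """Assumed sampling scheme: index 0 is best, index n-1 is worst."""
    a_bw = rng.randint(2, scale)
    middle_bo = [rng.randint(1, a_bw) for _ in range(n - 2)]
    middle_ow = [rng.randint(1, a_bw) for _ in range(n - 2)]
    return [1] + middle_bo + [a_bw], [a_bw] + middle_ow + [1]


def consistent_share(n, scale=9, samples=2000, seed=0):
    """Fraction of randomly generated systems that are ordinal-consistent."""
    rng = random.Random(seed)
    hits = sum(is_ordinal_consistent(*random_system(n, scale, rng))
               for _ in range(samples))
    return hits / samples


shares = {n: consistent_share(n) for n in range(3, 10)}
for n, share in shares.items():
    print(n, share)
```

Under this sampling scheme the share of ordinal-consistent systems falls sharply between 3 and 9 criteria, in line with the qualitative trend described in the text; the exact percentages depend on the assumed generator.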

Ordinal consistency ratio
Since the ordinal consistency has a vital impact on the ranking of the criteria, it is necessary to check whether the preferences violate the ordinal consistency, and, if so, to what extent. To do so, we need to define an index, which we call Ordinal Consistency Ratio (hereafter simply OR) in this study.

Definition 5 (Ordinal Consistency Ratio). The Ordinal Consistency Ratio OR of a pairwise comparison system is defined as:
$$OR = \max_j OR_j, \tag{9}$$
with
$$OR_j = \frac{1}{n} \sum_{i=1}^{n} F\big( (a_{Bi} - a_{Bj}) \times (a_{jW} - a_{iW}) \big), \quad \text{for all } i \text{ and } j, \tag{10}$$
where $F(x)$ is a step function defined as:
$$F(x) = \begin{cases} 1, & x < 0, \\ 0.5, & x = 0 \text{ and } (a_{Bi} - a_{Bj} \ne 0 \text{ or } a_{jW} - a_{iW} \ne 0), \\ 0, & \text{otherwise}. \end{cases} \tag{11}$$
The rationale of the $OR_j$ formulation is that if criterion $C_j$ outweighs criterion $C_i$, then ordinal consistency requires $a_{Bi} > a_{Bj}$ and $a_{jW} > a_{iW}$, i.e. $(a_{Bi} - a_{Bj}) \times (a_{jW} - a_{iW}) > 0$. If only one of $(a_{Bi} - a_{Bj})$ and $(a_{jW} - a_{iW})$ is equal to 0, we say that it violates the weak ordinal relation [12,15], but if both are equal to 0, the pair is ordinal-consistent.
$OR_j$ is called the local ordinal consistency ratio, indicating the degree of consistency with respect to the $j$th criterion. With this ordinal consistency ratio ($OR_j \in [0, 1]$), we can find out which criterion violates the relative order (and to what extent); the higher the $OR_j$, the more contradictory the preferences are regarding criterion $C_j$.
OR is called global ordinal consistency ratio, which reflects the ordinal consistency of the pairwise comparison system provided by the DM.

Example 2.
We use the car evaluation preferences example from the original BWM again (shown in Example 1 in Section 3.1) to explain the ordinal consistency measurement. From the preference vector $A_{BO}$, we can easily get the ranking of the criteria: price ≻ quality ≻ safety ≻ comfort ≻ style. The ranking from the $A_{OW}$ vector is: price ≻ quality ∼ comfort ≻ safety ≻ style ("≻" means superior to, "∼" means indifferent to). The orders of the criteria are different in these two vectors, thus the preferences of this DM violate the ordinal consistency. By using the ordinal consistency measurement from Eqs. (9)-(11), we can obtain the ordinal consistency ratios regarding each criterion in Table 2, which represent the ordinal violation level of each criterion. The global ordinal consistency ratio can be calculated from Eq. (9), and is 0.3 in this case.
Combining the cardinal and ordinal consistency ratios, a DM can check his/her rationality during the preference elicitation process. This immediate feedback helps the DM confront his/her inconsistencies as soon as they arise, making this process more effective [34].
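A minimal sketch of the calculation in Example 2 is given below. It rests on three assumptions that should be checked against the full definition: the step function assigns 1 to a strict violation and 0.5 to a weak violation (exactly one difference equal to zero), the local ratio averages the step values over the $n$ criteria, and the two vectors are illustrative values chosen to match the rankings quoted in the text rather than values copied from the tables. Function names are hypothetical.

```python
def step(d_best, d_worst):
    """Assumed step function: 1 for a strict violation, 0.5 for a weak
    violation (exactly one difference is zero), 0 otherwise."""
    product = d_best * d_worst
    if product < 0:
        return 1.0
    if product == 0 and (d_best != 0 or d_worst != 0):
        return 0.5
    return 0.0


def local_ordinal_ratios(a_bo, a_ow):
    """Local ratio for criterion j: mean step value over all criteria i."""
    n = len(a_bo)
    return [
        sum(step(a_bo[i] - a_bo[j], a_ow[j] - a_ow[i]) for i in range(n)) / n
        for j in range(n)
    ]


# Assumed illustrative data (price is best, style is worst).
criteria = ["quality", "price", "comfort", "safety", "style"]
a_bo = [2, 1, 4, 3, 8]
a_ow = [4, 8, 4, 2, 1]

local = local_ordinal_ratios(a_bo, a_ow)
global_ratio = max(local)

print(dict(zip(criteria, [round(x, 2) for x in local])))
print(round(global_ratio, 2))
```

With these assumptions the largest local ratio belongs to comfort, whose position differs between the two rankings, and the global value agrees with the 0.3 reported in the example.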

Properties of the ordinal consistency ratio
The index OR (Eq. (9)) satisfies three basic properties. To enunciate the properties, we need to acknowledge that each vector $A_{BO}$ and $A_{OW}$ induces an order relation on the set of criteria. That is to say, for example, $a_{Bi} > a_{Bj} \Rightarrow i \prec j$ and $a_{iW} = a_{jW} \Rightarrow i \sim j$.
Since these properties are similar to those in Proposition 1, the associated proof is omitted for the sake of brevity.

The relationship between ordinal consistency and cardinal consistency
Analysing the data used in the original BWM [37,38], we can obtain the inclusion relation between cardinal and ordinal (in)consistency of the preferences obtained from different DMs, which is graphically presented in Fig. 4. For example, the set of cardinal-consistent pairwise comparison systems is a subset of the ordinal-consistent ones, and the set of ordinal-inconsistent systems is a subset of the cardinal-inconsistent ones.
The inclusion relation between cardinal consistency and ordinal consistency shown in Fig. 4 is formalized in Proposition 2 and Corollary 1.

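Proposition 2. If a pairwise comparison system is cardinal-consistent, it must be ordinal-consistent.
Proof. Taking the cardinal consistency condition ($a_{Bi} \times a_{iW} = a_{BW}$, $a_{Bj} \times a_{jW} = a_{BW}$, where $a_{Bi}, a_{iW}, a_{Bj}, a_{jW}, a_{BW} \ge 1$) and the ordinal consistency condition ($(a_{Bi} - a_{Bj}) \times (a_{jW} - a_{iW}) > 0$ or ($a_{Bi} = a_{Bj}$ and $a_{jW} = a_{iW}$)), we shall show that, given a pairwise comparison system, cardinal consistency implies either (i) $(a_{Bi} - a_{Bj}) \times (a_{jW} - a_{iW}) > 0$ or (ii) $a_{Bi} = a_{Bj}$ and $a_{jW} = a_{iW}$.
(1) If $a_{Bi} = a_{Bj}$, then $a_{jW} = a_{BW} / a_{Bj} = a_{BW} / a_{Bi} = a_{iW}$, and vice versa, so the comparison is ordinal-consistent;
(2) If $a_{Bi} \ne a_{Bj}$ or $a_{jW} \ne a_{iW}$, then $(a_{Bi} - a_{Bj}) \times (a_{jW} - a_{iW}) = a_{Bi} \times a_{jW} - a_{Bj} \times a_{jW} - a_{Bi} \times a_{iW} + a_{Bj} \times a_{iW}$. From the notion of cardinal consistency, we know that $a_{Bj} \times a_{jW} = a_{Bi} \times a_{iW} = a_{BW}$, $a_{jW} = a_{BW} / a_{Bj}$ and $a_{iW} = a_{BW} / a_{Bi}$, so that
$$(a_{Bi} - a_{Bj}) \times (a_{jW} - a_{iW}) = a_{BW} \left( \frac{a_{Bi}}{a_{Bj}} + \frac{a_{Bj}}{a_{Bi}} - 2 \right) = a_{BW}\, \frac{(a_{Bi} - a_{Bj})^2}{a_{Bi}\, a_{Bj}} > 0.$$
Therefore, the comparison is also ordinal-consistent.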

Corollary 1.
If a pairwise comparison system is ordinal-inconsistent, it must be cardinal-inconsistent .
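Proposition 2 and Corollary 1 can be checked exhaustively for small systems. The sketch below enumerates every system on a 1-9 scale with two criteria between the best and the worst, keeps the cardinal-consistent ones, and counts how many of them fail the ordinal consistency condition. It is a numerical sanity check under an assumed layout (best first, worst last), not a substitute for the proof; function names are hypothetical.

```python
from itertools import product


def is_cardinal_consistent(a_bo, a_ow, a_bw):
    """a_Bj * a_jW = a_BW for every criterion j."""
    return all(b * w == a_bw for b, w in zip(a_bo, a_ow))


def is_ordinal_consistent(a_bo, a_ow):
    """Product of differences positive, or both differences zero, for all pairs."""
    n = len(a_bo)
    for i in range(n):
        for j in range(i + 1, n):
            d_best = a_bo[i] - a_bo[j]
            d_worst = a_ow[j] - a_ow[i]
            if d_best * d_worst > 0 or (d_best == 0 and d_worst == 0):
                continue
            return False
    return True


def check_proposition(n_middle=2, scale=9):
    """Count cardinal-consistent systems and ordinal-inconsistent ones among them."""
    checked = 0
    counterexamples = 0
    values = range(1, scale + 1)
    for a_bw in values:
        for middle in product(values, repeat=2 * n_middle):
            a_bo = [1, *middle[:n_middle], a_bw]   # best criterion first
            a_ow = [a_bw, *middle[n_middle:], 1]   # worst criterion last
            if is_cardinal_consistent(a_bo, a_ow, a_bw):
                checked += 1
                if not is_ordinal_consistent(a_bo, a_ow):
                    counterexamples += 1
    return checked, counterexamples


checked, counterexamples = check_proposition()
print(checked, counterexamples)
```

No counterexample is found in this enumeration, which is what the proposition predicts; by contraposition, every ordinal-inconsistent system in the enumeration is cardinal-inconsistent.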

Thresholds for BWM
Even though we can easily identify the inconsistent judgement by using the consistency measurements proposed in this study, requiring the DM to achieve perfect cardinal and ordinal consistency is unrealistic. However, the question of what degree of inconsistency can be accepted has thus far been neglected in the study of BWM. As such, to bridge this gap, a threshold has to be defined. In the following section, based on the concept of ordinal and cardinal consistency measurement, a method to derive consistency thresholds is proposed.

A methodology for determining the thresholds
Inspired by Amenta et al. [3,4] , we develop a method for determining the thresholds for BWM, which is based on the cardinal consistency measurement and the definition of ordinal consistency. The thresholds for BWM are established, not only for the input-based consistency measurement, but also for the outputbased consistency measurement. However, we use the input-based consistency ratio ( C R I ) to illustrate this approach.
The basic idea is that, based on the concept of ordinal consistency, if a decision-maker is ordinal-consistent, the ranking of the final weights obtained from the two preference vectors ( A BO and A OW ) will not change with C R I , only the intensities may vary. In this sense, we can suggest that the preferences provided by the DM are reliable.
We use the Monte-Carlo method to simulate the probability distribution of the $CR^I$ values. In this study, we analyse the entire problem space covering weighting problems with the number of criteria ranging from 3 to 9, and with the largest evaluation grade that can be assigned ranging from 3 to 9; we call these 3-scale to 9-scale (see footnote 1). Consequently, there are 7 × 7 = 49 combinations to be analysed. For each combination, we randomly generated 10,000 pairs of ordinal-consistent vectors, each pair acting as the two vectors $A_{BO}$ and $A_{OW}$. We categorized this group as the acceptable group, and calculated the $CR^I$ of every pair in it. Likewise, we randomly generated 10,000 pairs of ordinal-inconsistent vectors and calculated their $CR^I$ values; this group is categorized as the unacceptable group.
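The simulation of the two groups can be sketched as follows. This is an illustrative sketch under several assumptions, not the algorithm of the Appendix: the input-based ratio is taken as $CR^I = \max_j |a_{Bj} \times a_{jW} - a_{BW}| / (a_{BW} \times a_{BW} - a_{BW})$ (the definition given earlier in the paper, not repeated in this section), $a_{BW}$ is fixed at the largest grade of the scale, the best criterion is stored first and the worst last, and pairs are drawn by simple rejection sampling. All function names are hypothetical.

```python
import random


def input_based_cr(a_bo, a_ow):
    """Assumed CR^I: max_j |a_Bj * a_jW - a_BW| / (a_BW^2 - a_BW)."""
    a_bw = a_bo[-1]  # assumption: the worst criterion is stored last
    if a_bw == 1:
        return 0.0
    worst_gap = max(abs(b * w - a_bw) for b, w in zip(a_bo, a_ow))
    return worst_gap / (a_bw * a_bw - a_bw)


def is_ordinal_consistent(a_bo, a_ow):
    """Check the ordinal consistency condition for every pair of criteria."""
    n = len(a_bo)
    for i in range(n):
        for j in range(i + 1, n):
            d = (a_bo[i] - a_bo[j]) * (a_ow[j] - a_ow[i])
            if not (d > 0 or (a_bo[i] == a_bo[j] and a_ow[i] == a_ow[j])):
                return False
    return True


def random_pair(n, scale, rng):
    """Draw A_BO and A_OW with best first, worst last and a_BW = scale."""
    mid_bo = [rng.randint(1, scale) for _ in range(n - 2)]
    mid_ow = [rng.randint(1, scale) for _ in range(n - 2)]
    return [1] + mid_bo + [scale], [scale] + mid_ow + [1]


def simulate_groups(n, scale, size, seed=0):
    """Rejection-sample `size` CR^I values for each of the two groups."""
    rng = random.Random(seed)
    acceptable, unacceptable = [], []
    while len(acceptable) < size or len(unacceptable) < size:
        a_bo, a_ow = random_pair(n, scale, rng)
        group = acceptable if is_ordinal_consistent(a_bo, a_ow) else unacceptable
        if len(group) < size:
            group.append(input_based_cr(a_bo, a_ow))
    return acceptable, unacceptable


# Small run for illustration; the study uses 10,000 pairs per group.
acceptable, unacceptable = simulate_groups(n=5, scale=9, size=200)
print(max(acceptable), min(unacceptable))
```

Rejection sampling is chosen here only for brevity; it becomes slow for many criteria, where ordinal-consistent pairs are rare, and a constructive sampler would be preferable.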
Theoretically, we can obtain all the possible $CR^I$ values of the acceptable group in each situation and take the maximum as a boundary (boundary 1): $CR^I$ values above this boundary are not acceptable, because they can only be ordinal-inconsistent. Although, in practice, it is very difficult to traverse all the possibilities, we take the maximum $CR^I$ from the 10,000 pairs of vectors as boundary 1, because the likelihood of obtaining a higher value is very low. For example, the maximum consistency value of 9-criterion and 9-scale ordinal-consistent pairwise comparison vectors is 0.7639, which means that any judgments whose $CR^I$ is larger than this value in a 9-criterion and 9-scale problem should be rejected.
However, that does not automatically mean that the $CR^I$ values within that boundary are necessarily acceptable, because they could still be ordinal-inconsistent, and ordinal inconsistency is what we set out to reject. Following the same idea, the minimum $CR^I$ of the unacceptable group can be used as a second boundary (boundary 2): all $CR^I$ values below this boundary are acceptable. For example, the minimum consistency value of 9-criterion and 9-scale ordinal-inconsistent paired vectors is 0.0694; if the $CR^I$ obtained is smaller than this boundary, it should be accepted.
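The two boundaries split the range of $CR^I$ into three zones. A minimal sketch, using the 9-criterion and 9-scale values quoted above and hypothetical function names, could look as follows.

```python
def boundaries(acceptable, unacceptable):
    """Boundary 1 is the largest CR^I in the acceptable group,
    boundary 2 the smallest CR^I in the unacceptable group."""
    return max(acceptable), min(unacceptable)


def zone(cr, boundary_1, boundary_2):
    """Classify a CR^I value relative to the two boundaries."""
    if cr > boundary_1:
        return "reject"
    if cr < boundary_2:
        return "accept"
    return "undecided"  # between the boundaries a threshold is still needed


# Values quoted in the text for 9 criteria and a 9-scale.
BOUNDARY_1, BOUNDARY_2 = 0.7639, 0.0694

print(zone(0.80, BOUNDARY_1, BOUNDARY_2))  # reject
print(zone(0.05, BOUNDARY_1, BOUNDARY_2))  # accept
print(zone(0.30, BOUNDARY_1, BOUNDARY_2))  # undecided
```

The "undecided" label is only a placeholder for the region in which the threshold has to be located.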
Values of $CR^I$ greater than boundary 1 are assumed to be totally unacceptable, while values below boundary 2 are assumed to be totally acceptable. Between the two boundaries, we expect there to be a threshold such that the proportion of ordinal-inconsistent judgments accepted below it is as small as possible, while the proportion of ordinal-consistent judgments rejected above it is also as small as possible. In statistical terms, our goal is to minimize the sum of Type I error (false positive) and Type II error (false negative). This idea can be visualized more clearly in the kernel smoothing distribution for the 9-criteria and 9-scale combination, as shown in Fig. 5. Following the idea explained above, the empirical cumulative distribution function can be used to achieve our purpose:
$$\hat{F}(\alpha) = \frac{1}{N} \sum_{i=1}^{N} I\{ CR^I_i \leq \alpha \} \qquad (12)$$
where $I\{\cdot\}$ is the indicator function:
$$I\{ CR^I_i \leq \alpha \} = \begin{cases} 1, & CR^I_i \leq \alpha \\ 0, & \text{otherwise} \end{cases}$$
and $N$ is the number of pairs of pairwise comparison vectors, $CR^I_i$ is the $i$th ($i \in \{1, \cdots, N\}$) input-based consistency ratio obtained from these $N$ pairs of preferences, and $\alpha \in [0, 1]$ is the possible threshold.
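The empirical cumulative distribution function used above is simply the fraction of the $N$ simulated ratios that do not exceed $\alpha$. A minimal sketch, with a hypothetical function name and made-up sample values:

```python
def ecdf(cr_values, alpha):
    """Fraction of the consistency ratios that are less than or equal to alpha."""
    return sum(1 for cr in cr_values if cr <= alpha) / len(cr_values)


# Made-up CR^I sample, for illustration only.
sample = [0.10, 0.20, 0.30, 0.40]
print(ecdf(sample, 0.25))  # 0.5
```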
We now distinguish the distribution function based on two groups: (1) for the acceptable group, the cumulative distribution of $CR^I$ in the ordinal-consistent situation is denoted as $\hat{F}_A(\alpha)$; (2) for the unacceptable group, the cumulative distribution of $CR^I$ in the ordinal-inconsistent situation is denoted as $\hat{F}_U(\alpha)$. The rejected part of the ordinal-consistent group is $1 - \hat{F}_A(\alpha)$, which can be seen in the blue area B in Fig. 5, and the accepted part of the ordinal-inconsistent group is $\hat{F}_U(\alpha)$, which is the red area R. We can calculate the relative rejected proportion of the $CR^I$ values in the acceptable group ($P^A_{rejected}$) and the accepted proportion of the $CR^I$ values in the unacceptable group ($P^U_{accepted}$) using the following formulas:
The relationship between these two proportions is shown in Fig. 6, which shows how the possibility of acceptance (red line with squares) and rejection (blue line with circles) are distributed in the two groups according to the selected threshold from 0 to 1.
The goal is to obtain a threshold which makes the red and blue areas in Fig. 5 as small as possible, or makes the relative proportions of the two groups in Fig. 6 as close as possible. If there exists a $CR^I$ obtained from the two groups for which $P^A_{rejected} = P^U_{accepted}$, the two lines in Fig. 6 will intersect at that point, which means that the proportion of rejection in the acceptable group and the proportion of acceptance in the unacceptable group are the same. However, as the obtained $CR^I$ values are discrete, there may be no $CR^I$ exactly at the intersection point, which means that we need to find the intersecting coordinate of the two lines and use the corresponding $CR^I$ as the threshold. The simulation algorithm for obtaining the threshold is illustrated in the Appendix.
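The search for the intersection can be sketched as follows. This is not the algorithm of the Appendix, which is not reproduced here. It assumes that the two proportions are $P^A_{rejected} = 1 - \hat{F}_A(\alpha)$ and $P^U_{accepted} = \hat{F}_U(\alpha)$, evaluates both on the pooled simulated values, and interpolates linearly where their difference changes sign. The function names and sample values are hypothetical.

```python
def ecdf(cr_values, alpha):
    """Fraction of the consistency ratios that are less than or equal to alpha."""
    return sum(1 for cr in cr_values if cr <= alpha) / len(cr_values)


def find_threshold(acceptable, unacceptable):
    """Locate the alpha at which the rejected share of the acceptable group
    equals the accepted share of the unacceptable group (assumed definitions)."""
    grid = sorted(set(acceptable) | set(unacceptable))
    prev_alpha = prev_gap = None
    for alpha in grid:
        # gap = P_rejected(acceptable) - P_accepted(unacceptable); non-increasing in alpha
        gap = (1 - ecdf(acceptable, alpha)) - ecdf(unacceptable, alpha)
        if gap == 0:
            return alpha  # the two lines meet exactly at an observed value
        if gap < 0:
            if prev_alpha is None:
                return alpha
            # Linear interpolation between the grid points around the sign change.
            return prev_alpha + (alpha - prev_alpha) * prev_gap / (prev_gap - gap)
        prev_alpha, prev_gap = alpha, gap
    return grid[-1]


# Made-up CR^I samples, for illustration only.
print(find_threshold([0.0, 0.1, 0.2, 0.3], [0.2, 0.3, 0.4, 0.5]))  # 0.2
```

Linear interpolation is one simple way to handle the discreteness of the simulated values; other interpolation schemes would give slightly different thresholds.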

Approximated thresholds for the input-based consistency ratio
Based on the algorithm presented above, we can finally establish the thresholds for BWM. Table 3 reports the consistency thresholds, based on the input-based consistency measurement, for combinations with 3 to 9 criteria and highest evaluation grades from 3 to 9. The thresholds in the combinations with 3-criteria and the combinations with 3-scale are relatively special. The thresholds in 3-scale problems remain unchanged even when the number of criteria changes, because, no matter how many criteria there are, the maximum $CR^I$ in the acceptable group and the minimum $CR^I$ in the unacceptable group are both equal to 0.1667. In most other cases, the thresholds tend to increase with the number of criteria and with the scale of the preferences, as shown in Fig. 7 (see footnote 2).

Table 3
Thresholds for different combinations using input-based consistency measurement.

Approximated thresholds for the output-based consistency ratio
By using the same algorithm in the Appendix, we can also determine the thresholds for the $CR^O$ in different combinations, as shown in Table 4 (see footnote 3). Compared to the thresholds obtained from the input-based consistency measurement, the thresholds of the output-based consistency measurement are slightly higher.
Finally, by using the approximated consistency thresholds obtained above, we can check whether or not the consistency of the DM is acceptable. For instance, the overall $CR^I$ in the illustrative example in Section 3.1 is 0.14, which is less than the threshold of 0.2958 (5-criteria and 8-scale combination) shown in Table 3, so it is acceptable. If we use $CR^O$, which is 0.223, we can see that it is also below the threshold of 0.4029 shown in Table 4.

Footnote 2. The combinations with 2-scale for the $CR^I$ are not shown in Table 3 and Fig. 7, but it is worth mentioning that the threshold should be 0 in this case, because, when the preferences are ordinal-consistent, $CR^I = 0$. Therefore, the DM should revise his or her preferences when $CR^I > 0$.
Footnote 3. The threshold for the $CR^O$ in the combinations with 2-scale is 0, because when the preferences are ordinal-consistent, $CR^O = 0$.

Table 4
Thresholds for different combinations using output-based consistency measurement.

Thanks to these thresholds, $CR^I$ and $CR^O$ now have a meaningful interpretation, because we can now determine whether they are acceptable or not. The thresholds for $CR^I$ can help a DM check his/her pairwise comparisons before solving the optimization program.
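The check against the thresholds can be sketched as a simple lookup. The sketch below contains only the two threshold values quoted in the text for the 5-criteria and 8-scale combination; the dictionary layout and function name are hypothetical, and treating a ratio exactly equal to the threshold as acceptable is an assumption.

```python
# Thresholds quoted in the text (5 criteria, 8-scale); the other entries of
# Tables 3 and 4 are not reproduced here.
THRESHOLDS = {
    ("input", 5, 8): 0.2958,   # CR^I, Table 3
    ("output", 5, 8): 0.4029,  # CR^O, Table 4
}


def is_acceptable(kind, n_criteria, scale, cr):
    """Return True if the consistency ratio does not exceed its threshold."""
    return cr <= THRESHOLDS[(kind, n_criteria, scale)]


print(is_acceptable("input", 5, 8, 0.14))    # True
print(is_acceptable("output", 5, 8, 0.223))  # True
```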

Conclusion
In this paper, we addressed the consistency issue in BWM. First, we argued that the output-based consistency measurement in BWM cannot provide immediate feedback to a DM, and only informs the DM about any inconsistencies in his/her assessments after the entire elicitation process has finished, which has been proven to be ineffective. In addition, existing consistency indices designed for incomplete pairwise comparison matrices are not as desirable as expected. To remedy that state of affairs, we proposed an input-based consistency ratio, which has a number of desirable properties and a high correlation to the original ratio, to indicate the DM's consistency status during the preference elicitation process. This input-based consistency ratio is simple and makes it easy for a DM to identify his/her most inconsistent judgments. Then, to complement the cardinal consistency measurement, we proposed an ordinal consistency measurement to expose possible contradictions even in cases where the cardinal consistency of a DM's pairwise comparisons is considered to be good enough. This ratio not only shows how much a DM violates ordinal consistency, but also provides a convenient way to identify and correct the conflicts involved. Finally, with the help of Monte-Carlo simulations, we determined the thresholds for the input-based and output-based consistency ratios in different scales with different numbers of criteria. The idea is to balance ordinal consistency and inconsistency, so that the portion of accepted cardinal consistency ratios that violate ordinal consistency is as small as possible, and the portion of rejected cardinal consistency ratios that satisfy ordinal consistency is also as small as possible. With these thresholds, a DM can decide whether or not to revise his/her earlier assessments. Moreover, because the input-based consistency measurement can indicate the consistency level regarding each criterion, it can be used in the preference revision process.
The method of determining the thresholds only considers whether the judgments are ordinal-consistent or not and does not take the violation level into account. This will be examined in future studies. Following the approach adopted in this paper, the method can also be applied to fuzzy consistency measurements to determine their corresponding thresholds.