International Journal of Approximate Reasoning

One natural way to express preferences over items is to represent them in the form of pairwise comparisons, from which a model is learned in order to predict further preferences. In this setting, if an item a is preferred to the item b, then it is natural to consider that the preference still holds after multiplying both vectors by a positive scalar (e.g., 2a ≽ 2b). Such invariance to scaling is satisfied in maximum margin learning approaches for pairs of test vectors, but not for the preference input pairs, i.e., scaling the inputs in a different way could result in a different preference relation being learned. In addition to the scaling of preference inputs, maximum margin methods are also sensitive to the way the features are normalised (scaled), which is an essential pre-processing phase for these methods. In this paper, we define and analyse more cautious preference relations that are invariant to the scaling of features, or preference inputs, or both simultaneously; this leads to computational methods for testing dominance with respect to the induced relations, and for generating optimal solutions (i.e., best items) among a set of alternatives. In our experiments, we compare the relations and their associated optimality sets based on their decisiveness, computation time and cardinality of the optimal set.


Introduction
There is a growing trend towards personalisation of services in many real-world application domains, such as e-commerce, marketing, and entertainment. This involves capturing user preferences over alternative choices, e.g., products, movies and hotels. One may view this as an enhanced variation of supervised learning, known as preference learning, where instead of tagging an instance with a single label, preference relations are expressed over instances [1,2]. One natural way to express preferences over items is to represent them in the form of pairwise comparisons, stating that one alternative a is preferred over another one b, where an alternative is associated with a feature vector, i.e., a vector of values for a number of features.
An established approach for modelling preferences makes use of the concept of a utility function that is learned from preference input pairs. Then, for a pair of test vectors (α, β), this function assigns an abstract degree of utility to each test vector, implying which test vector is preferred to which [3]. Support Vector Machine (SVM) approaches [4][5][6] have inspired the development of several methods for learning the utility function, such as OrderSVM [7], SVOR [8] and SVMRank [9].
In a method such as SVMRank, when the utility function has been learned, rescaling a pair of test vectors makes no difference to the result, i.e., α is preferred to β if and only if rα is preferred to rβ for any strictly positive scale factor r.
The same does not hold for the input pairs: different ways of scaling preference input pairs may lead to a very different preference relation being learned.

We examine the relationships between these relations, and analyse two natural ways of defining the best solutions among an input set of alternatives, with respect to each of these relations. As the relations are based on the assumption that the input pairs are consistent, we briefly discuss three possible approaches to generate a consistent preference input set.
Our experimental testing involves derivatives of two real databases; the experiments compare the different relations based on (a) the number of test pairs in which one dominates the other; (b) the number of optimal solutions found according to the defined optimality operators; and (c) the computation time.
Summary of contributions It is clearly important that the information in decision support systems is reliable and trustworthy. Although SVM-based (maximum margin) approaches to learning preferences are attractive and well-founded, they can also be very sensitive to the form of the preference inputs. For instance, the choice of feature domains can be somewhat arbitrary, yet it can significantly affect the result. The paper considers different forms of robustness for preference learning; in particular, we develop three novel robust preference learning techniques: an approach that is invariant to the rescaling of the features' domains; one that is invariant to the rescaling of the input vectors; and a method that expresses both kinds of invariance. For each of these, we develop characterisations that lead to computational methods. We also develop a computational approach for testing when the maximum margin preference relation is invariant to feature domain rescaling. We analyse relationships between the different forms of preference relations, and, based on the computational characterisations, we implemented and tested them, comparing the relative numbers of optimal solutions according to two natural kinds of optimality. We demonstrate that the methods are all different, that they can be computationally feasible, and that they do not necessarily lead to large sets of optimal solutions.

The rest of the paper is organised as follows. The next section introduces the terminology used throughout the paper and explains two preliminary preference relations, namely the consistency-based relation and the maximum margin relation. Section 3 considers the effect of rescaling preference input pairs, and characterises a preference relation that is invariant to such rescaling. Similarly, the two other relations, where features are rescaled and where both features and preference inputs are rescaled simultaneously, are characterised in Sections 4 and 5 respectively. We discuss three possible approaches to deal with inconsistencies in Section 6. The characterisations of the relations lead to the computational methods in Section 7 for testing dominance with respect to the induced relations. In Section 8, we consider two kinds of optimality operator to choose a subset of alternatives as optimal solutions with regard to each preference approach. We report the experimental results in Section 9; Section 10 concludes, with a discussion of potential extensions. The appendix includes all the proofs of the formal results that are not included in the main body of the paper.
This work includes and extends work in two conference papers [45,46].

Preliminaries
In this section, we describe some notation and two preference relations that provide a basis for the following sections. Since there are inevitably many symbols and results to keep track of, Table 1 includes a glossary of symbols.
We assume that some user has told us that she prefers feature vector a_i ∈ IR^n over b_i ∈ IR^n, for each i ∈ I = {1, . . . , m}. Each tuple a_i or b_i in IR^n represents an alternative that is characterised by n features, with a_i(k) being the score for alternative a_i regarding the kth feature.¹ By assuming a linear weighting model, each pair (a_i, b_i) expresses a linear restriction a_i · w > b_i · w on an unknown weight vector w ∈ IR^n (the dot product a_i · w is equal to ∑_{j=1}^{n} a_i(j)w(j)). This linear weighting assumption is less restrictive than it sounds; for instance, we could form additional features representing, e.g., pairwise products of the basic features, enabling a richer representation of the utility function.
We define Λ, the preference inputs, to be {λ_i : i ∈ I}, where for each i, λ_i = a_i − b_i. Then, a feasible w satisfies λ · w > 0 for all λ ∈ Λ (because a_i · w > b_i · w). Let us denote the feasible set by Λ^> (= {w ∈ IR^n : ∀λ ∈ Λ, w · λ > 0}), and associate the hyperplane H_w = {x ∈ IR^n : x · w = 0} with a feasible w ∈ Λ^>. Clearly, any feasible hyperplane contains the origin, and all λ ∈ Λ are in the associated positive open half-space of the hyperplane. We will also almost always be assuming that the preference inputs are consistent, so that Λ^> is non-empty. Later, in Section 6, we discuss how to cope with inconsistency in preference inputs.
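To make these definitions concrete, the following is a minimal sketch (assuming numpy and scipy; the function names and the example inputs, reconstructed from Example 2's arithmetic, are ours rather than from the paper) that builds Λ from preference pairs and checks consistency by testing non-emptiness of Λ^≥ with a linear programme.

```python
# Minimal sketch: building Lambda from preference pairs and checking consistency.
# Assumes numpy/scipy; names and the example inputs are illustrative.
import numpy as np
from scipy.optimize import linprog

def build_inputs(pairs):
    """pairs: list of (a_i, b_i) with a_i preferred to b_i; returns the lambda_i as rows."""
    return np.array([np.asarray(a, float) - np.asarray(b, float) for a, b in pairs])

def is_consistent(Lam):
    """Lambda is consistent iff Lambda^>= is non-empty (equivalently, Lambda^> is
    non-empty, since any w with w.lam > 0 for all lam scales up to w.lam >= 1)."""
    m, n = Lam.shape
    # Encode w . lambda_i >= 1 as -lambda_i . w <= -1 for linprog's A_ub x <= b_ub form.
    res = linprog(c=np.zeros(n), A_ub=-Lam, b_ub=-np.ones(m),
                  bounds=[(None, None)] * n, method="highs")
    return res.status == 0  # 0 = a feasible point was found; 2 = infeasible

Lam = np.array([[2.0, 1.0], [1.0, 2.0], [1.0, 1.0]])  # the running example's inputs
print(is_consistent(Lam))  # True
```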
We sum up some of these key notions in the following definition.

Table 1
Glossary of symbols (recoverable entries).
  Symbol: Meaning
  n: number of features.
  m: number of preference input pairs.
  S − {x}: the set S with the element x excluded.
  Λ^* + {u}: for the vector u ∈ IR^n, defined as {w + u : w ∈ Λ^*}.
  ‖w‖: Euclidean norm of w.
  ≽_{I,F}: the relation that is invariant to the rescaling of features and inputs simultaneously.
  SIF(Λ): defined as {ω_{(Λ_t)⊙τ} ⊙ τ : t ∈ (0, 1]^m, τ ∈ IR^n_+}; e.g., the part of the shaded regions that is strictly within the first quadrant (so not including the axes) in Fig. 1(b).

[Fig. 1. (a) The dark region is co(Λ), and the dotted line (x + y = 0) is a feasible hyperplane. If γ is in (i) the dark region co(Λ); (ii) the first quadrant; (iii) all the shaded region; or (iv) the positive half-space of x + y = 0, then γ will dominate 0 under the relation (i) ≽_C; (ii) ≽_F and ≽_{I,F} (these two are equal in this example); (iii) ≽_I; and (iv) ≽_mm, respectively. (b) ω_Λ equals (0.5, 0.5); Λ^≥ is the union of the two shaded regions; SIF(Λ) is the part of Λ^≥ that is strictly within the first quadrant (so not including the axes); SF(Λ) is the part of the line segment x + y = 1 strictly within the first quadrant; and SI(Λ) is the darkly shaded region, which is the intersection of Λ^≥ with co(Λ), the dark region in the left-hand figure.]

The feasible set, Λ^>, is shown in Fig. 1(b) as the convex open space above and to the right of the dotted lines, which is also shown as the union of the shaded regions (excluding its boundary) in Fig. 1(a). In Fig. 1(a), the dotted line (x + y = 0) is a feasible hyperplane since it can be associated with a feasible point, such as (0.5, 0.5).

Consistency-based relation
One natural preference relation, ≽_C, which has been explored, for example, in [47], is given as follows: the test vector α is consistency-based preferred to β (α ≽_C β) if and only if w · α ≥ w · β for all feasible w ∈ Λ^>. This means that dominance of α over β is consistent with the fact that, for all i ∈ I, a_i has dominated b_i. Proposition 1 below states two other alternative ways to determine if α ≽_C β (just consider γ = α − β). We recall the definition of Λ^> and define, for any finite Λ ⊆ IR^n, the following three sets:
• Λ^> = {w ∈ IR^n : ∀λ ∈ Λ, w · λ > 0};
• Λ^≥ = {w ∈ IR^n : ∀λ ∈ Λ, w · λ ≥ 1} (Λ^≥ is the union of the two shaded regions in Fig. 1(b));
• Λ^* = {w ∈ IR^n : ∀λ ∈ Λ, w · λ ≥ 0} (the closed convex space surrounded by dotted lines in Fig. 1(b)); and
• co(Λ), the convex cone generated by Λ, which is the smallest convex cone containing Λ (this is the darkly shaded region in Fig. 1(a)); i.e., the set of all vectors in IR^n that can be written as ∑_{λ∈Λ} r_λ λ, where the r_λ are arbitrary non-negative reals. Elements of co(Λ) are said to be positive linear combinations of elements of Λ.

Proposition 1. Consider any finite Λ ⊆ IR^n that is consistent (i.e., Λ^> ≠ ∅) and consider any γ ∈ IR^n. Then, the following conditions are equivalent: (i) γ · w ≥ 0 for all w ∈ Λ^>; (ii) γ · w ≥ 0 for all w ∈ Λ^≥; (iii) γ ∈ co(Λ). Thus, any of these is equivalent to γ ≽_C 0.
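Proposition 1 makes the consistency-based dominance test a single linear programme: α ≽_C β fails exactly when some u ∈ Λ^≥ has u · (β − α) > 0, and over Λ^≥ the supremum of u · (β − α) is either ≤ 0 or +∞ (since cu ∈ Λ^≥ for every c ≥ 1). A sketch under the same assumptions as the previous snippet:

```python
# Sketch: testing alpha >=_C beta with one LP (cf. Proposition 1).
import numpy as np
from scipy.optimize import linprog

def dominates_C(Lam, alpha, beta, tol=1e-9):
    m, n = Lam.shape
    gamma = np.asarray(beta, float) - np.asarray(alpha, float)
    # max u.gamma over Lambda^>=  <=>  min -u.gamma subject to -Lam u <= -1.
    res = linprog(c=-gamma, A_ub=-Lam, b_ub=-np.ones(m),
                  bounds=[(None, None)] * n, method="highs")
    if res.status == 3:      # unbounded: some u makes u.(beta - alpha) > 0
        return False
    return res.status == 0 and -res.fun <= tol   # max <= 0 means alpha >=_C beta

Lam = np.array([[2.0, 1.0], [1.0, 2.0], [1.0, 1.0]])
print(dominates_C(Lam, [1.0, 1.0], [0.0, 0.0]))  # True: (1,1) is in co(Lambda)
print(dominates_C(Lam, [3.0, 1.0], [0.0, 0.0]))  # False: (3,1) lies outside co(Lambda)
```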

Maximum margin preference relation
The maximum margin preference is based on the principal idea in conventional SVM for the hard margin case (see Section 2.3 of [5] and Section 2 of [6]).² This involves picking a unique element w of the feasible set, to generate the preference relation ≥_w given by α ≥_w β ⇐⇒ w · α ≥ w · β (leading to a stronger ordering than ≽_C).
As mentioned above, w is said to be feasible if w · λ > 0 for all λ in the set of preference inputs Λ. We can also consider degrees of feasibility or satisfaction: one might consider w · λ as a measure of the degree to which w satisfies λ. However, for our purposes, vector w is equivalent to any scalar multiple of w, such as 2w, so we want the degree of satisfaction not to be affected by scalar multiplication of w. For this reason, we define the degree DegSat(w; λ) to which w satisfies λ to be (w/‖w‖) · λ. The margin marg_Λ(w) is then defined as the minimal degree of satisfaction over all elements of Λ, i.e., marg_Λ(w) = min_{λ∈Λ} DegSat(w; λ) = min_{λ∈Λ} (w · λ)/‖w‖. Note that marg_Λ(w) > 0 if and only if w is feasible, i.e., w ∈ Λ^>, and that for any real r > 0, marg_Λ(rw) = marg_Λ(w).
It is natural to choose the w that maximises the margin, since it is, in a certain sense, maximally consistent with the preference inputs, i.e., it maximises the degree of satisfaction. The margin marg_Λ(w) is equal to the perpendicular distance between the hyperplane H_w and the element of Λ closest to H_w. In simple terms, maximising the margin means choosing a feasible hyperplane that is as far as possible from Λ. The hyperplane that produces the maximum margin is the hyperplane H_w where w uniquely has the minimum (Euclidean) norm in Λ^≥, as stated in Theorem 2. We denote the unique element of Λ^≥ with the minimum norm by ω_Λ. In Fig. 1(b), (0.5, 0.5) has uniquely minimal norm in Λ^≥, so ω_Λ = (0.5, 0.5), and thus the associated hyperplane for that point, x + y = 0 in Fig. 1(a), has the maximum margin. We use ‖w‖ as the notation for the Euclidean norm in this paper. Theorem 2. Let Λ ⊆ IR^n be a finite consistent set of preference inputs, so that Λ^> is non-empty. Then the following all hold.
(i) Λ^≥ is non-empty; (ii) there exists a unique element ω_Λ in Λ^≥ with minimum norm; (iii) w maximises marg_Λ within Λ^> if and only if w is a strictly positive scalar multiple of ω_Λ, i.e., there exists r ∈ IR with r > 0 such that w = rω_Λ.
More general versions of this result that allow additional linear restrictions on the feasible set Λ^> are given in [48,49]. (² It also corresponds to a hard margin version of Ranking SVM [9]; in particular, when the slack variables are omitted or set to zero; this also corresponds, roughly speaking, to letting the penalising constant C in the objective function (Equation 12 of [9]) tend to infinity.) Theorem 2 allows the following definition of the max-margin preference relation ≽_mm, and also implies that α ≽_mm β if and only if w · α ≥ w · β for any w maximising the margin, i.e., with maximum degree of satisfaction of the preference inputs.

Definition 2 (≽_mm). For finite consistent set of preference inputs Λ ⊆ IR^n we define the relation ≽_mm by, for α, β ∈ IR^n: α is max-margin-preferred to β with respect to Λ (i.e., α ≽_mm β) if and only if α · ω_Λ ≥ β · ω_Λ, where ω_Λ has uniquely minimal norm in Λ^≥.
The relation ≽_mm is a total pre-order, since it is transitive and for any α, β ∈ IR^n we have α ≽_mm β or β ≽_mm α (or both).
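Theorem 2 suggests a direct way to compute ω_Λ, and hence to decide ≽_mm: minimise ‖w‖² subject to w · λ_i ≥ 1 for all i. The sketch below solves this small quadratic programme with SLSQP; any QP solver would do, and the names are ours.

```python
# Sketch: omega_Lambda as the minimum-norm element of Lambda^>= (Theorem 2).
import numpy as np
from scipy.optimize import minimize

def omega(Lam):
    m, n = Lam.shape
    w0 = np.linalg.lstsq(Lam, np.ones(m), rcond=None)[0]   # a rough starting point
    res = minimize(lambda w: 0.5 * w.dot(w), w0, jac=lambda w: w,
                   constraints=[{"type": "ineq",
                                 "fun": lambda w: Lam.dot(w) - 1.0,  # w.lambda_i >= 1
                                 "jac": lambda w: Lam}],
                   method="SLSQP")
    return res.x

def mm_prefers(Lam, alpha, beta):
    """alpha >=_mm beta iff alpha . omega_Lambda >= beta . omega_Lambda (Definition 2)."""
    w = omega(Lam)
    return float(np.dot(alpha, w)) >= float(np.dot(beta, w))

Lam = np.array([[2.0, 1.0], [1.0, 2.0], [1.0, 1.0]])
print(np.round(omega(Lam), 4))  # approx [0.5 0.5], matching Fig. 1(b)
```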

Overall view of rescaling methods
Here we give an overall view of the rescaling approaches developed in the next three sections, Sections 3, 4 and 5. Recall that we are ideally trying to pick an element w of the feasible set Λ^>, since for any w ∈ Λ^>, the associated relation ≥_w, given by α ≥_w β ⇐⇒ α · w ≥ β · w, is then consistent with the preference inputs Λ. Note that multiplying w by a positive constant r > 0 does not change the relation, i.e., the relation ≥_{rw} is equal to the relation ≥_w. This implies that we do not lose anything if we focus on the subset Λ^≥ of the feasible set Λ^> (since they generate the same set of relations ≥_w).
As discussed in the previous subsection, the max-margin preference relation (see Definition 2) chooses the element ω_Λ with minimum norm in Λ^≥, leading to the associated relation ≽_mm equalling ≥_{ω_Λ}. The effect of different kinds of rescaling leads us to consider different elements of Λ^≥, each with an associated subset S of Λ^≥; we call S the set of scenarios. We then consider preferences that hold for each element of S; thus, the associated preference relation ≽_S is given by α ≽_S β if and only if for all w ∈ S, α · w ≥ β · w. Equivalently, the relation ≽_S is equal to ⋂_{w∈S} ≥_w. In the next few paragraphs we discuss different choices for the set of scenarios S.
Max-margin case Regarding the maximum margin preference relation from Section 2.2, ≽_mm involves just a single scenario, ω_Λ, the element of Λ^≥ with minimum norm, which uniquely maximises the margin. Recall, α is max-margin-preferred to β if and only if α ≥_{ω_Λ} β.
Consistency-based For the consistency-based relation ≽_C in Section 2.1 we have the set of scenarios S as the whole of Λ^≥. Thus, ≽_C is the same as ≽_{Λ^≥}.
Rescaling input preferences Rescaling a preference input, λ ∈ Λ, means replacing λ by t_λ λ, where t_λ is a strictly positive real, as discussed in Section 3. So, when we rescale the preference inputs, we obtain a new version Λ_t of Λ, which has a corresponding element ω_{Λ_t} maximising the margin in the transformed problem. We let S = SI(Λ), the set of all such ω_{Λ_t}, leading to the relation ≽_I that is invariant to the rescaling of inputs. We have that α ≽_I β if and only if α is max-margin-preferred to β over all rescalings of preference inputs. SI(Λ) is the darkly shaded region in Fig. 1(b).

Rescaling of features A rescaling of the features' domains (as considered in Section 4) amounts to a scaling of each coordinate, and thus is associated with a vector τ in IR^n with strictly positive values; e.g., doubling the value of the first feature and leaving the rest unchanged leads to the rescaling vector τ being (2, 1, 1, . . . , 1). This transformation affects both the preference inputs Λ and arbitrary feature vectors, such as α and β. We can then consider the max-margin relation in the transformed space. The features-scaling-invariant preference relation ≽_F is given by α ≽_F β if and only if α is max-margin-preferred to β over all rescalings τ of features. The set S of scenarios in this case is equal to the set of what we call the rescale-optimal elements of Λ^≥, which are those elements that have minimum rescaled norm for some rescaling τ.
The set of rescale-optimal elements is SF(Λ), which equals the part of the line segment x + y = 1 strictly within the first quadrant in Fig. 1(b).

Rescaling of both inputs and features
In Section 5 we consider both rescaling of the preference inputs and of the features' domains: α ≽_{I,F} β if and only if, for all rescalings of the features and the preference inputs, α is max-margin-preferred to β. This is if and only if α ≥_w β for all w in the associated set of scenarios SIF(Λ), where the latter set consists of all ways of transforming ω_Λ by inputs and features rescaling. In Fig. 1(b), SIF(Λ) is the part of the shaded regions that is strictly within the first quadrant.
For each of these sets S of scenarios, we have that α ⋡_S β (α does not dominate β) if and only if there exists some w ∈ S such that (β − α) · w > 0.
As shown in Proposition 1 in Section 2.1, for the case of the consistency-based relation ≽_C, the simple structure of S = Λ^≥ leads to a simple formulation for testing α ≽_S β that can be solved using linear programming.
However, for the sets of scenarios SI(Λ), SF(Λ) and SIF(Λ), computation of the associated dominance relations ≽_I, ≽_F and ≽_{I,F} is not straightforward, because of the more complex definitions. Most of the technical work in Sections 3, 4 and 5 is concerned with giving characterisations of the associated sets of scenarios (as constraints involving additional variables) that enable computation of the dominance relations.

Rescaling of preference inputs
As discussed in the introduction, a plausible robustness requirement is that a preference relation should not depend on how the preference inputs are scaled. If the user tells us that they prefer α to β, we might expect that this would mean that they would also prefer 0.5α over 0.5β, since we are assuming a linear model. However, if we add this preference, corresponding to 0.5(α − β), to the preference input set, we may well obtain a different max-margin preference relation; similarly if we replace the original preference α − β with this rescaled version 0.5(α − β). In this section we define and give a characterisation of a preference relation ≽_I that is invariant to rescaling of the preference inputs Λ.

Defining inputs-rescaling-invariant relation
Consider the effect of rescaling the preference inputs by t ∈ IR^m_+ (where IR^m_+ is the set of vectors in IR^m with all components strictly positive), with each preference input being multiplied by a strictly positive scalar, so that the rescaled preference input set is defined as Λ_t = {t(i)λ_i : i ∈ I}. We then have (Λ_t)^≥ = Λ_t^≥ = {w ∈ IR^n : ∀i ∈ I, w · (t(i)λ_i) ≥ 1}. We will write t(i) as t_i for brevity. Note that (Λ_t)^> = Λ^> for any t ∈ IR^m_+, since w · t_i λ_i > 0 ⇐⇒ w · λ_i > 0; so if Λ is consistent then so is Λ_t for every t ∈ IR^m_+.
Let us say that α is max-margin-preferred to β under rescaling t if α ≽_mm^{Λ_t} β. Now, it can easily happen that α is preferred to β under one rescaling, but not under another.

Example 2.
Consider the rescaling t = (3/5, 1/5, 1) in Example 1. Then, Λ_t equals {(6/5, 3/5), (1/5, 2/5), (1, 1)}. In Fig. 2(a), Λ_t^≥ is the whole shaded region, and it can be seen that ω_{Λ_t} = (1, 2), which means the hyperplane with the maximum margin for Λ_t is x + 2y = 0 (instead of x + y = 0). Hence the max-margin hyperplane, and thus the induced preference between a pair of test vectors, can change under rescaling of the inputs.

However, it seems natural to assume that if the user prefers a_i over b_i then she will also prefer t_i a_i over t_i b_i for any t_i ∈ IR_+. Also, for test vectors α and β, if α ≽_mm β then, for any positive real r, we have rα ≽_mm rβ; since the resulting preferences are invariant to such rescaling, it seems reasonable that the same would hold for the input preferences.
We therefore consider a more robust relation, which is invariant to the scaling of the preference inputs, with α being inputs-scaling-invariant preferred to β only if it is max-margin-preferred for all rescalings t ∈ IR^m_+ of the preference inputs.

Definition 3 (≽_I). For finite consistent set of preference inputs Λ ⊆ IR^n we define the relation ≽_I by, for α, β ∈ IR^n: α ≽_I β if and only if α is max-margin-preferred to β over all rescalings of preference inputs, i.e., if for all t ∈ IR^m_+, α ≽_mm^{Λ_t} β.
So far, we have assumed that each component t_i of t can be any strictly positive scalar. However, in Proposition 3 below, we will show that if each t_i is restricted to be in (0, 1], the result for the relation ≽_I will not change. This is not surprising, since, e.g., doubling each component of t will not change the relation ≽_mm^{Λ_t}. This simplification will be helpful in the computation of the ≽_I relation.
Proposition 3. Consider any finite consistent set of preference inputs Λ ⊆ IR^n and any α, β ∈ IR^n. Then, α ≽_I β if and only if for all t ∈ (0, 1]^m, α ≽_mm^{Λ_t} β.
We define SI(Λ) to be the set consisting of ω_{Λ_t} for all scalings t ∈ (0, 1]^m. Definition 4 (SI(Λ)). For finite consistent set of preference inputs Λ ⊆ IR^n, let SI(Λ) = {ω_{Λ_t} : t ∈ (0, 1]^m}. Thus, u ∈ SI(Λ) if and only if there exists t ∈ (0, 1]^m such that u has minimal norm in Λ_t^≥. Proposition 3, along with Definition 2, immediately implies the following result, expressing the preference relation ≽_I in terms of the set SI(Λ). Proposition 4. For finite consistent set of preference inputs Λ ⊆ IR^n and any α, β ∈ IR^n, α ≽_I β if and only if for all u ∈ SI(Λ), α · u ≥ β · u.
For example, it can be shown that SI(Λ) in Fig. 1(b) is the darkly shaded region, which is the intersection of the shaded region Λ^≥ in Fig. 1(b) with co(Λ), the dark region in the left-hand figure (see Theorem 7 below). Then, it can be seen that (SI(Λ))^* is all the shaded region in Fig. 1(a). This implies that γ ≽_I 0 if and only if γ is in any of the shaded region in Fig. 1(a). Also, α ≽_I β ⇐⇒ α − β ≽_I 0.

Proposition 5.
Consider a finite consistent set of preference inputs Λ ⊆ IR^n and any u ∈ IR^n. Then, u ∈ SI(Λ) if and only if u ∈ Λ^≥ and u has minimum norm in Λ^* + {u}. Thus, in particular, SI(Λ) ⊆ Λ^≥.
We know that u = (1, 2) ∈ SI(Λ) because it has minimum norm in Λ_t^≥ for t = (3/5, 1/5, 1). We can easily see that u ∈ Λ^≥ and has minimum norm in Λ^* + {u}. Now, let v be any point between the two black circles in Fig. 2(a). Then, v does not have minimal norm in Λ^* + {v}; in fact, (1, 2) minimises the norm in Λ^* + {v}. We will see that v ∉ SI(Λ). We will prove (in Proposition 6) that co(Λ) is precisely the set of elements u ∈ IR^n such that u has minimum norm in Λ^* + {u}. Together with Proposition 5, this will imply Theorem 7 below, which characterises SI(Λ).

Proposition 6. Consider any finite consistent set of preference inputs
Λ ⊆ IR^n and any u ∈ IR^n. Then, u has minimum norm in Λ^* + {u} if and only if u ∈ co(Λ).
Propositions 5 and 6 immediately imply the following theorem.

Theorem 7. Consider any finite consistent set of preference inputs Λ ⊆ IR^n and any u ∈ IR^n. Then u ∈ SI(Λ) if and only if u ∈ Λ^≥ and u ∈ co(Λ); that is, SI(Λ) = Λ^≥ ∩ co(Λ).

Proof. u is in SI(Λ) if and only if, by Proposition 5, u ∈ Λ^≥ and u has minimum norm in Λ^* + {u}, which, by Proposition 6, holds if and only if u ∈ Λ^≥ and u ∈ co(Λ).
Theorem 7 shows that SI(Λ) in Fig. 1(b) is the darkly shaded region, which is the intersection of the shaded region Λ^≥ in Fig. 1(b) with co(Λ), the dark region in Fig. 1(a).
The following result leads immediately to an algorithm to determine, for arbitrary α, β ∈ IR^n, whether α ≽_I β, using a linear programming solver.

Corollary 8. For finite consistent set of preference inputs
Λ ⊆ IR^n, let λ_i be the ith element of Λ, where i ∈ I = {1, . . . , |Λ|}. Consider any u ∈ IR^n. Then, u is in SI(Λ) if and only if for all i ∈ I, u · λ_i ≥ 1, and there exist non-negative reals r_i for each i ∈ I such that u = ∑_{i∈I} r_i λ_i.
Proof. The result follows easily from Theorem 7 and the definitions of co(Λ) and Λ^≥. Proposition 4 implies that, for α, β ∈ IR^n, α ⋡_I β if and only if there exists u ∈ SI(Λ) such that α · u < β · u. Using Corollary 8 we therefore have the following result, which leads immediately to a computational procedure for the preference relation ≽_I. Proposition 9. Let Λ ⊆ IR^n be a finite consistent set of preference inputs, and let α, β ∈ IR^n. Then α ⋡_I β if and only if there exists u ∈ IR^n such that the following three conditions all hold: (i) u · (β − α) > 0; (ii) for all i ∈ I, u · λ_i ≥ 1; and (iii) there exist non-negative reals r_i for each i ∈ I such that u = ∑_{i∈I} r_i λ_i.
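Proposition 9 is directly implementable as a linear feasibility problem in the variables (u, r). The strict inequality u · (β − α) > 0 can safely be replaced by u · (β − α) ≥ 1: any witness u can be scaled up by a constant c ≥ 1 without leaving the constraint set, since all three conditions are preserved under such scaling. A sketch, continuing the earlier snippets:

```python
# Sketch: deciding alpha >=_I beta via the LP of Proposition 9.
import numpy as np
from scipy.optimize import linprog

def dominates_I(Lam, alpha, beta):
    m, n = Lam.shape
    gamma = np.asarray(beta, float) - np.asarray(alpha, float)
    # Variables x = (u, r), r >= 0, with the coupling u - sum_i r_i lambda_i = 0.
    A_eq = np.hstack([np.eye(n), -Lam.T])
    b_eq = np.zeros(n)
    A_ub = np.vstack([np.hstack([-Lam, np.zeros((m, m))]),              # u.lambda_i >= 1
                      np.hstack([-gamma[None, :], np.zeros((1, m))])])  # u.gamma >= 1
    b_ub = np.concatenate([-np.ones(m), [-1.0]])
    res = linprog(c=np.zeros(n + m), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(None, None)] * n + [(0, None)] * m, method="highs")
    return res.status != 0   # infeasible => no witness u exists => alpha >=_I beta

Lam = np.array([[2.0, 1.0], [1.0, 2.0], [1.0, 1.0]])
print(dominates_I(Lam, [1.0, 1.0], [0.0, 0.0]))  # True
```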

Rescaling of features
As discussed in the introduction, an important, and potentially problematic, pre-processing step in SVM methods is rescaling of the domain of each feature. In this section we define a preference relation ≽_F (based on preference inputs Λ) that is invariant to the relative scalings of the feature domains.
Normalization of features is a necessary phase in any SVM-based method. This task often involves translations and rescalings on the domain of each feature. It is evident that the maximum margin relation is unaffected by translation of the feature space; i.e., for all δ ∈ IR^n, α + δ ≽_mm β + δ iff (α + δ) · ω_Λ ≥ (β + δ) · ω_Λ, which holds if and only if α ≽_mm β. Therefore, in this section we only consider the effect of rescaling of feature spaces.
The effect of rescaling of features on a conventional binary SVM classifier is also discussed in a separate study by the authors [50]. In that context, a data point is called strongly positive (respectively negative) if it is positively (resp. negatively) classified for all choices of feature scaling. Otherwise, the instance is considered neutral because it is differently classified for different scalings of features.
Let IR^n_+ be the set of strictly positive vectors in IR^n, i.e., vectors with every component strictly positive. Let a features rescaling τ ∈ IR^n_+ be a vector of strictly positive reals, with the jth component τ(j) being the scale factor for the jth feature. The effect of the rescaling on a vector λ ∈ IR^n is given by pointwise multiplication, λ ⊙ τ, defined by, for all j = 1, . . . , n, (λ ⊙ τ)(j) = λ(j)τ(j). The operation ⊙ is commutative, associative and distributes over addition of vectors. An important property is that for any u, v, w ∈ IR^n, (u ⊙ v) · w = v · (u ⊙ w), since both are equal to ∑_{j=1}^{n} u(j)v(j)w(j).
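The key property (u ⊙ v) · w = v · (u ⊙ w) is easy to confirm numerically; a quick check (⊙ is numpy's elementwise *):

```python
# Numeric check of (u ⊙ v) · w = v · (u ⊙ w); both sides equal sum_j u(j)v(j)w(j).
import numpy as np
rng = np.random.default_rng(0)
u, v, w = rng.normal(size=4), rng.normal(size=4), rng.normal(size=4)
assert np.isclose((u * v).dot(w), v.dot(u * w))
```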
Like rescaling of inputs, we see that α might be preferred to β under one rescaling of features, but not under another.
However, the choice of how the features are scaled relative to each other can involve somewhat arbitrary choices. It is therefore natural to consider a more cautious preference relation, the features-scaling-invariant preference relation, given by α being preferred to β for all rescalings τ ∈ IR^n_+.

Definition 5 (≽_F). For finite consistent set of preference inputs Λ ⊆ IR^n we define the relation ≽_F by, for α, β ∈ IR^n: α ≽_F β if and only if α is max-margin-preferred to β over all rescalings of features, i.e., if for all τ ∈ IR^n_+, we have α ⊙ τ ≽_mm^{Λ⊙τ} β ⊙ τ, where Λ ⊙ τ = {λ ⊙ τ : λ ∈ Λ}.
We define the set of vectors SF(Λ) as follows. Definition 6 (SF(Λ)). For finite consistent set of preference inputs Λ ⊆ IR^n, let SF(Λ) = {ω_{Λ⊙τ} ⊙ τ : τ ∈ IR^n_+}.
We then have the following simple relationship between SF(Λ) and the preference relation ≽_F.

Proposition 10.
For finite consistent set of preference inputs Λ ⊆ IR^n, and any α, β ∈ IR^n, we have α ≽_F β ⇐⇒ for all w ∈ SF(Λ), α · w ≥ β · w.

Rescale optimality
We define an important notion, rescale optimality, for understanding the set SF(Λ), and hence the features-scaling-invariant preference relation ≽_F. We will see below, in Proposition 11, that SF(Λ) is equal to the set of rescale-optimal elements of Λ^≥. Because some of the formal concepts and results do not require exactly the form of Λ^≥, we express them in terms of a more general subset G of IR^n. Definition 7 (Rescale-optimal). For G ⊆ IR^n, and u ∈ G, let us say that u is rescale-optimal in G if there exists (strictly positive) τ ∈ IR^n_+ such that, for all w ∈ G − {u}, ‖u ⊙ τ‖ < ‖w ⊙ τ‖; i.e., u is the unique minimiser of the rescaled norm ‖· ⊙ τ‖ in G, for some rescaling τ. It can be seen intuitively that elements of the (open) line segment between (1, 0) and (0, 1) in Fig. 1(b) are rescale-optimal in Λ^≥; for instance, a suitable rescaling τ yields a rescale-optimal point between (1/2, 1/2) and (0, 1). We will show in Proposition 11 below that SF(Λ) is equal to the set of rescale-optimal elements in Λ^≥. If SF(Λ) equals the open line segment between (1, 0) and (0, 1), it can be seen that (SF(Λ))^* is the first quadrant in Fig. 1(a). This implies that in Fig. 1(a), γ ≽_F 0 if and only if γ is in the first quadrant.
Proposition 11. Consider any finite consistent set of preference inputs Λ ⊆ IR^n. Then, SF(Λ) is equal to the set of all rescale-optimal elements of Λ^≥. Proposition 11 implies, in particular, that ω_Λ is rescale-optimal in Λ^≥.

Pointwise undominated
Let G be some subset of IR^n (where we will be applying the results to the case of G = Λ^≥). For u ∈ G, if there exists v ∈ G such that, for all j, v(j) is between u(j) and 0 then it is easy to see that u cannot be a rescale-optimal element of G. This is the idea behind being pointwise undominated, which is reminiscent of being Pareto undominated, and is a necessary condition for being rescale-optimal. The notion of pointwise dominance leads to a characterisation of when there is a unique rescale-optimal element, see Theorem 13 in Section 4.3, which corresponds to the case in which rescaling of features makes no difference.
For u, v ∈ IR^n, we say that v pointwise dominates u if for every j ∈ {1, . . . , n}, |v(j)| ≤ |u(j)|, and for some k ∈ {1, . . . , n}, |v(k)| < |u(k)|. For u ∈ G ⊆ IR^n, we say that u is pointwise undominated in G if there exists no v ∈ G that pointwise dominates u.
In Fig. 1(b), all elements of the part of the closed line segment x + y = 1 within the first quadrant (i.e., including the points on the axes) are pointwise undominated in Λ^≥. The definition easily implies that being rescale-optimal implies being pointwise undominated (but not the converse).

Proposition 12. Let G ⊆ IR n . If u is rescale-optimal in G then u is pointwise undominated in G. Thus, if u is pointwise dominated in G then u is not rescale-optimal in G.
Proof. Suppose that u is not pointwise undominated in G, so that there exists v ∈ G that pointwise dominates u. Then, for every j ∈ {1, . . . , n}, |v(j)| ≤ |u(j)|, and for some k ∈ {1, . . . , n}, |v(k)| < |u(k)|, which implies that for every τ ∈ IR^n_+, ‖v ⊙ τ‖ < ‖u ⊙ τ‖, and hence, u is not rescale-optimal in G.
Proposition 12 states that being pointwise undominated is a necessary condition for being rescale-optimal. However, looking at our running example, we will see that it is not a sufficient condition. The intersection points of x + y = 1 with the axes (i.e., (1, 0) and (0, 1)) are pointwise undominated but not rescale-optimal in Λ^≥. To see this, suppose, for example, that (1, 0) were rescale-optimal in Λ^≥; i.e., there exists τ = (τ_1, τ_2) ∈ IR²_+ such that for all w ∈ Λ^≥ − {(1, 0)}, ‖(1, 0) ⊙ τ‖ < ‖w ⊙ τ‖. Considering the points w = (1 − ε, ε) ∈ Λ^≥ and writing r = τ_1/τ_2, we obtain: there exists r ∈ IR_+ such that for all ε ∈ (0, 1], r² < (1 − ε)²r² + ε², and thus, r² < ε²/(1 − (1 − ε)²) = ε/(2 − ε). Now, for any r ∈ IR_+ there exists sufficiently small ε > 0 such that ε/(2 − ε) < r², proving by contradiction that (1, 0) is not rescale-optimal in Λ^≥. We can use a similar argument to show that (0, 1) is not rescale-optimal in Λ^≥ as well. We will investigate this further in Section 4.4, leading to a computational procedure for rescale-optimality. First, in Section 4.3, we characterise the situations when rescaling of features makes no difference, in which case ≽_F is the same as ≽_mm.

Determining invariance to rescaling of features
Example 4 below illustrates that allowing rescaling of features can sometimes make no difference to the maximum margin relation.

[Example 4 and Fig. 3: here, Λ^≥ has a single extremal point, which pointwise dominates all other elements of Λ^≥; thus, by Proposition 12, those other elements are not rescale-optimal. Consequently, the only element of Λ^≥ that is rescale-optimal is this extremal point, u = (1/2, 1/2).]

Note that if there exists a unique rescale-optimal element in Λ^≥, then this element must be ω_Λ, since the latter is rescale-optimal by Proposition 11. This immediately implies that ≽_F is then equal to ≽_mm. Therefore this is the situation in which rescaling of the features has no effect on the preference relation.
Theorem 13 below states that u is the only rescale-optimal element in convex closed G if and only if u pointwise dominates every other element of G.
Theorem 13. Let G be a convex and closed subset of IR^n, and let u be an element of G. Then the following conditions are equivalent:
(i) u is uniquely rescale-optimal in G, i.e., u is the unique element of G that is rescale-optimal;
(ii) u pointwise dominates every other element of G;
(iii) for all w ∈ G and all j ∈ {1, . . . , n}, |u(j)| ≤ |w(j)|.
Consider Λ as it is in Example 4. Then, the three conditions hold for u = (1/2, 1/2). The equivalence between (i) and (ii) is proved using Lemmas 42 and 43, and the equivalence between (ii) and (iii) follows using Lemma 15.

Corollary 14.
Let Λ ⊆ IR^n be a finite consistent set of preference inputs. Choose an arbitrary element y ∈ Λ^≥. Using y we will generate an element y^* ∈ IR^n. For each j ∈ {1, . . . , n}: if y(j) > 0, let y^*(j) be min{w(j) : w ∈ Λ^≥} if this minimum exists and is strictly positive, and 0 otherwise; if y(j) < 0, let y^*(j) be max{w(j) : w ∈ Λ^≥} if this maximum exists and is strictly negative, and 0 otherwise; and if y(j) = 0, let y^*(j) = 0. If y^* ∈ Λ^≥ then y^* is uniquely rescale-optimal in Λ^≥. Also, there exists a uniquely rescale-optimal element in Λ^≥ if and only if y^* ∈ Λ^≥.
To prove the corollary we use the following lemma.
Lemma 15. Let G be a convex subset of IR n , and let j be any element of {1, . . . ,n}. Then either (i) there exists w ∈ G such that w( j) = 0; or (ii) for all w ∈ G, w( j) > 0; or (iii) for all w ∈ G, w( j) < 0.
Conversely, suppose that there exists a uniquely rescale-optimal element u in Λ^≥; we will prove that y^* ∈ Λ^≥. Consider arbitrary j ∈ {1, . . . , n}. The fact that Λ^≥ is a polyhedron implies that there exists some w ∈ Λ^≥ with w(j) = y^*(j). If y^*(j) > 0 then we know by Lemma 15 that u(j) ≥ y^*(j). But Theorem 13 implies that u(j) ≤ w(j) = y^*(j), and thus y^*(j) = u(j). Similarly, if y^*(j) < 0 then y^*(j) ≥ u(j) ≥ w(j) = y^*(j) and so, y^*(j) = u(j). If y^*(j) = 0 then w(j) = 0, so u(j) = 0, also using Theorem 13. We have shown that y^* = u, so y^* ∈ Λ^≥. Corollary 14 leads immediately to an algorithm for determining if Λ^≥ has a uniquely rescale-optimal element, and finding it, if it exists. This is the situation in which rescaling of the features makes no difference. The algorithm involves at most n + 1 runs of a linear programming solver, and thus determining and finding a uniquely rescale-optimal element u can be performed in polynomial time. If it succeeds in finding such a u then the induced preferences can be efficiently tested, since ≽_F then coincides with ≽_mm.
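The following sketch spells this out with at most n + 1 linear programmes. The per-coordinate construction of y^* follows the reconstruction of Corollary 14 given above (for each coordinate, take the attainable value of w(j) closest to zero, with the case decided as in Lemma 15), so it should be read as illustrative rather than as the paper's exact pseudocode:

```python
# Sketch: deciding whether Lambda^>= has a uniquely rescale-optimal element
# (the case where feature rescaling makes no difference), via at most n+1 LPs.
import numpy as np
from scipy.optimize import linprog

def unique_rescale_optimal(Lam, tol=1e-9):
    m, n = Lam.shape
    res = linprog(np.zeros(n), A_ub=-Lam, b_ub=-np.ones(m),
                  bounds=[(None, None)] * n, method="highs")  # LP 1: find y in Lambda^>=
    if res.status != 0:
        return None                                           # inconsistent inputs
    y, y_star = res.x, np.zeros(n)
    for j in range(n):                                        # one LP per coordinate
        if y[j] == 0:
            continue                                          # y*(j) = 0
        c = np.zeros(n)
        c[j] = 1.0 if y[j] > 0 else -1.0   # minimise w(j), or maximise it if y(j) < 0
        r = linprog(c, A_ub=-Lam, b_ub=-np.ones(m),
                    bounds=[(None, None)] * n, method="highs")
        extreme = r.x[j] if r.status == 0 else 0.0   # unbounded: zero is crossed
        y_star[j] = extreme if extreme * y[j] > 0 else 0.0
    # y* is uniquely rescale-optimal iff it lies in Lambda^>= (Corollary 14).
    return y_star if np.all(Lam.dot(y_star) >= 1 - tol) else None

Lam = np.array([[2.0, 1.0], [1.0, 2.0], [1.0, 1.0]])
print(unique_rescale_optimal(Lam))  # None: Fig. 1(b) has a whole segment of such points
```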

Characterising rescale-optimality
As we have shown, being pointwise undominated is a necessary but not a sufficient condition for being rescale-optimal. In this section we define a stronger version of pointwise undominated called zm-pointwise undominated, where 'zm' stands for zeros-modified (the essential difference being in the treatment of j such that u( j) = 0). We show that this is a necessary condition as well, and is in fact also a sufficient condition for being rescale-optimal (for polyhedra). According to the following definition, while the points (1, 0) and (0, 1) in Fig. 1(b) are pointwise undominated, they are not zm-pointwise undominated.
Given u ∈ IR^n, let N_u denote {j ∈ {1, . . . , n} : u(j) ≠ 0}, the set of coordinates on which u is non-zero. For u, v ∈ IR^n, we say that v zm-pointwise dominates u if for all j ∈ N_u, |v(j)| ≤ |u(j)|, and for some j ∈ N_u, |v(j)| < |u(j)|; and u is zm-pointwise undominated in G if there exists no v ∈ G that zm-pointwise dominates u. Clearly, if every component of u is non-zero, then N_u = {1, . . . , n}, and so for any vector v ∈ IR^n we have that v zm-pointwise dominates u if and only if v pointwise dominates u. Proposition 16 below gives a characterisation of (i) rescale-optimal, and (ii) zm-pointwise undominated. Together, these immediately imply part (iii), that being zm-pointwise undominated is a necessary condition for being rescale-optimal.
Proposition 16. Let u be an element of convex G ⊆ IR n . Then: (iii) If u is rescale-optimal in G then u is zm-pointwise undominated in G.
We say that u, v ∈ IR n agree on signs if, for each component j, u( j) and v( j) have equal sign.
Definition 10 (Agreeing on signs). For u, v ∈ IR^n, u and v agree on signs if for all j = 1, . . . , n, u(j) = 0 ⇐⇒ v(j) = 0, and u(j) > 0 ⇐⇒ v(j) > 0 (and thus also u(j) < 0 ⇐⇒ v(j) < 0). For example, (1, 0) and (1, 1) do not agree on signs, but for ε > 0, (1, ε) and (1, 1) agree on signs. Clearly, if u and v agree on signs then u · v > 0, unless they are both the zero vector. The following is the key theorem of this section to characterise rescale-optimality, making use of Proposition 16(i). This characterisation is the basis of the computational procedure for the features-scaling-invariant preference relation ≽_F developed in Section 4.6.
Theorem 17. Consider any u in convex G ⊆ IR^n. If u = 0 then it is the unique rescale-optimal element of G. Otherwise, u is rescale-optimal in G if and only if there exists μ ∈ IR^n agreeing on signs with u such that μ · u = 1 and, for all w ∈ G, μ · w ≥ 1.

Proof.
It is clear that if u = 0 then for all τ ∈ IR^n_+ and for all w ∈ G − {u}, ‖u ⊙ τ‖ = 0 < ‖w ⊙ τ‖, which means that u is the unique rescale-optimal element of G. Now, suppose u ≠ 0.

Equivalence of rescale-optimal with zm-pointwise undominated
It turns out that being zm-pointwise undominated is equivalent to being rescale-optimal, for a polyhedron, and therefore, in particular, for Λ^≥; see Theorem 18 below. The rather complex proof of this result can be found in the appendix. Theorem 18. Let G be a polyhedron and let u ∈ G. Then u is rescale-optimal in G if and only if u is zm-pointwise undominated in G.

Expressing rescale-optimality in terms of positive linear combinations
Here we extend the characterisation of rescale-optimality given in Theorem 17, leading to a computational method for testing rescale-optimality, and thus to a method for testing whether α ≽_F β, for α, β ∈ IR^n, i.e., preference with respect to the features-scaling-invariant preference relation. Theorem 17 implies that non-zero u is rescale-optimal in convex set G if and only if there exists a vector μ that agrees on signs with u with μ · w ≥ μ · u for all w ∈ G. Theorem 21 below shows that μ is a positive linear combination of certain vectors when G is a polyhedron, which thus includes the case when G = Λ^≥.
This leads to a characterisation in Theorem 23 of SF(Λ), giving a computational procedure for the preference relation ≽_F, summed up in Proposition 24.
We can write any polyhedron as G_I = {w ∈ IR^n : ∀i ∈ I, w · λ_i ≥ a_i}, for finite I, and with each λ_i ∈ IR^n and a_i ∈ IR. We also consider, for u ∈ G_I, the set J_u = {i ∈ I : u · λ_i = a_i} of indices of the constraints that are tight at u, and the corresponding polyhedron G_{J_u} = {w ∈ IR^n : ∀i ∈ J_u, w · λ_i ≥ a_i}. For example, consider a_1 = a_2 = a_3 = 1, u = (1, 0), v = (1/2, 1/2) and y = (1, 1), with the vectors λ_i for i ∈ I = {1, 2, 3} being as in Example 1 and Fig. 1. Then, J_u = {2, 3}, J_v = {3} and J_y = ∅. The following lemma, used in the proof of Theorem 21, follows very easily from the definition.

Lemma 20. Consider a polyhedron G_I and non-zero u ∈ G_I. Then u is rescale-optimal in G_I if and only if u is rescale-optimal in G_{J_u}.
Theorem 21. Let G be a polyhedron, which we write as G_I = {w ∈ IR^n : ∀i ∈ I, w · λ_i ≥ a_i}, for finite I, and with each λ_i ∈ IR^n and a_i ∈ IR. Consider any non-zero vector u in G_I. Then, u is rescale-optimal in G_I if and only if there exists μ ∈ IR^n that agrees on signs with u such that μ · u = 1 and μ ∈ co({λ_i : i ∈ J_u}).
Note that this theorem implies that if non-zero u is rescale-optimal in G_I then J_u is non-empty, since 0 is the only positive linear combination of the empty set, and μ ≠ 0.
Proof. First consider μ ∈ IR^n such that μ · u = 1. Then it can be seen that G_{J_u} ⊆ {w : w · μ ≥ 1} if and only if μ ∈ co({λ_i : i ∈ J_u}). By Lemma 20, u is rescale-optimal in G_I if and only if u is rescale-optimal in G_{J_u}, which, by Theorem 17, is if and only if there exists μ ∈ IR^n agreeing on signs with u such that μ · u = 1 and G_{J_u} ⊆ {w : w · μ ≥ 1}, i.e., μ ∈ co({λ_i : i ∈ J_u}), by the earlier argument.
We have the following corollary (using the same notation), which shows that testing if u is rescale-optimal in G_I can be performed in polynomial time: by first checking that u ∈ G_I (i.e., for all i ∈ I, u · λ_i ≥ a_i), and then testing if a set of inequalities has a solution, using a linear programming solver.

Corollary 22.
Let u be a non-zero element of IR^n. Then, u is rescale-optimal in G_I if and only if u ∈ G_I and there exist non-negative reals r_i for each i ∈ J_u, and a vector τ ∈ IR^n with τ(j) ≥ 1 for all j ∈ {1, . . . , n}, such that τ(j)u(j) = ∑_{i∈J_u} r_i λ_i(j).
Proof. First suppose that u is rescale-optimal in G_I. Then u ∈ G_I, and, by Theorem 21, there exists μ ∈ IR^n that agrees on signs with u such that μ · u = 1 and there exist non-negative r_i ∈ IR such that μ = ∑_{i∈J_u} r_i λ_i. For all j ∈ {1, . . . , n} such that u(j) ≠ 0, define t_j = μ(j)/u(j), which is greater than zero because μ and u agree on signs, and let t be the minimum of these values. Define τ by τ(j) = t_j/t if u(j) ≠ 0, and τ(j) = 1 otherwise. Then τ(j) ≥ 1 for all j, and τ(j)u(j) = μ(j)/t = ∑_{i∈J_u} (r_i/t) λ_i(j), as required (replacing each r_i by r_i/t). Conversely, suppose that u ∈ G_I and there exist non-negative reals r_i for each i ∈ J_u and a vector τ ∈ IR^n with, for all j ∈ {1, . . . , n}, τ(j) ≥ 1 and τ(j)u(j) = ∑_{i∈J_u} r_i λ_i(j). Define μ ∈ IR^n to be (τ ⊙ u)/((τ ⊙ u) · u). Then μ · u = 1, μ agrees on signs with u, and μ is a positive linear combination of {λ_i : i ∈ J_u}. Theorem 21 can then be applied to give the result.
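Corollary 22 translates into an LP feasibility test in the variables (τ, r) with u fixed; a sketch, where the helper names and the worked inputs (the running example with all a_i = 1) are ours:

```python
# Sketch: the polynomial-time rescale-optimality test of Corollary 22.
import numpy as np
from scipy.optimize import linprog

def is_rescale_optimal(Lam, a, u, tol=1e-9):
    """G_I = {w : Lam w >= a}; tests whether non-zero u is rescale-optimal in G_I."""
    u = np.asarray(u, float)
    if not np.all(Lam.dot(u) >= a - tol):
        return False                                   # u must lie in G_I
    J = np.where(np.abs(Lam.dot(u) - a) <= tol)[0]     # the tight constraints J_u
    if len(J) == 0:
        return False                                   # J_u must be non-empty
    n, k = len(u), len(J)
    # Feasibility in x = (tau, r): diag(u) tau - Lam[J]^T r = 0, tau >= 1, r >= 0.
    res = linprog(np.zeros(n + k),
                  A_eq=np.hstack([np.diag(u), -Lam[J].T]), b_eq=np.zeros(n),
                  bounds=[(1, None)] * n + [(0, None)] * k, method="highs")
    return res.status == 0

Lam = np.array([[2.0, 1.0], [1.0, 2.0], [1.0, 1.0]])
a = np.ones(3)
print(is_rescale_optimal(Lam, a, [0.5, 0.5]))  # True: omega_Lambda is rescale-optimal
print(is_rescale_optimal(Lam, a, [1.0, 0.0]))  # False: pointwise undominated, yet not r-o
```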
Theorem 21 implies the following, which leads to a computational method for checking dominance with respect to ≽_F.

Theorem 23.
Consider finite consistent set of preference inputs Λ ⊆ IR^n, and any u ∈ IR^n. Define Λ_u = {λ ∈ Λ : λ · u = 1}. Then, u is in SF(Λ) if and only if u ∈ Λ^≥ and there exists μ ∈ IR^n such that μ agrees on signs with u and μ ∈ co(Λ_u). Also, u is in SF(Λ) if and only if u ∈ Λ^≥ and there exist μ ∈ IR^n and some subset Λ′ of Λ_u such that |Λ′| ≤ n + 1, μ ∈ co(Λ′), and μ agrees on signs with u.
Proof. Proposition 11 implies that SF(Λ) equals the set of all rescale-optimal elements of Λ^≥. Hence, Theorem 21 implies that u ∈ SF(Λ) if and only if u ∈ Λ^≥ and there exists μ ∈ IR^n that agrees on signs with u such that μ ∈ co(Λ_u) and μ · u = 1.
First, we will show that the condition μ · u = 1 can be omitted. Suppose that u ∈ Λ^≥ and there exists μ ∈ IR^n that agrees on signs with u such that μ ∈ co(Λ_u). Now, u is a non-zero vector, since u ∈ Λ^≥. Since μ agrees on signs with u, we have μ · u > 0. Define μ′ = μ/(μ · u). Then μ′ ∈ co(Λ_u), μ′ · u = 1, and μ′ and u agree on signs. We can then apply Theorem 21 to give u ∈ SF(Λ). The converse follows immediately from the same theorem.
The last part follows from Carathéodory's Theorem (see e.g., 3.1.2 in [51]), which states that for any w ∈ IR^n and any S ⊆ IR^n, if w ∈ co(S) then there exists S′ ⊆ S with |S′| ≤ n + 1 such that w ∈ co(S′).
Proposition 24. Let Λ ⊆ IR^n be a finite consistent set of preference inputs, and let α, β ∈ IR^n. Then α ⋡_F β if and only if there exist u, μ ∈ IR^n such that: (i) u · (β − α) > 0; (ii) for all i ∈ I, u · λ_i ≥ 1; (iii) μ agrees on signs with u; and (iv) μ ∈ co(Λ′) for some subset Λ′ of Λ_u with |Λ′| ≤ n + 1. Condition (iv) holds if and only if there exist non-negative reals r_i for each i ∈ I such that ∑_{i∈I} 1[r_i ≠ 0] ≤ n + 1 (i.e., |{i ∈ I : r_i ≠ 0}| ≤ n + 1), and μ = ∑_{i∈I} r_i λ_i, and for all i ∈ I, either u · λ_i = 1 or r_i = 0.

Simultaneous rescaling of features and inputs
Having defined the preference relations I and F , based, respectively, on rescaling of preference inputs and features, it is also natural to consider both kinds of rescaling simultaneously. In this section, we define and characterise a preference relation based on allowing both the rescaling of features and of preference inputs.
Definition 11 (SIF(Λ) and ≽_{I,F}). For finite consistent set of preference inputs Λ ⊆ IR^n, we define the set SIF(Λ) by: w ∈ SIF(Λ) if there exists t ∈ (0, 1]^m such that w ∈ SF(Λ_t); i.e., SIF(Λ) = {ω_{(Λ_t)⊙τ} ⊙ τ : t ∈ (0, 1]^m, τ ∈ IR^n_+}. We define the relation ≽_{I,F} by: α ≽_{I,F} β if and only if for all w ∈ SIF(Λ), α · w ≥ β · w. This definition implies that α ≽_{I,F} β if and only if, for all rescalings of the features and the preference inputs, α is max-margin-preferred to β. We have the following characterisation, which leads to a computational method for checking if α ≽_{I,F} β. Theorem 25. Let Λ ⊆ IR^n be a finite consistent set of preference inputs. Then, u ∈ SIF(Λ) if and only if u ∈ Λ^≥ and there exists μ ∈ IR^n that agrees on signs with u such that μ ∈ co(Λ).
For the converse, assume that u ∈ Λ^≥ and there exists μ ∈ IR^n, agreeing on signs with u, such that μ ∈ co(Λ). Let us define t ∈ IR^m_+ by t(i) = 1/(λ_i · u) for all i ∈ {1, . . . , m}. Because u ∈ Λ^≥ we have λ_i · u ≥ 1, and thus, t(i) ∈ (0, 1]. Then, for all i we have t_i λ_i · u = 1, which implies that u ∈ Λ_t^≥ and also that (Λ_t)_u = Λ_t. Since co(Λ_t) = co(Λ) (rescaling the generators by strictly positive scalars does not change the generated cone), we have μ ∈ co((Λ_t)_u), with μ agreeing on signs with u; so, by Theorem 23, u ∈ SF(Λ_t). We therefore have that u ∈ SIF(Λ).

Generating a consistent preference input set
There are a number of ways of extending the approach to deal with inconsistent input information, i.e., when Λ^≥ is empty, where Λ is the (finite) set of preference inputs. One desirable property of such a method is that it should not depend on an arbitrary ordering of the input set Λ. Here, we describe three possible approaches for restoring consistency, which all satisfy this property.
The first approach is to iteratively eliminate the elements of Λ that are least consistent with the others. Define the function C : Λ → IR such that for every i ∈ I, C(λ_i) = ∑_{j∈I−{i}} λ_i · λ_j. This function expresses a kind of degree of consistency of the element λ_i with the other elements of Λ: the smaller the value of C(λ_i), the less consistent λ_i is with the other elements of Λ. Then the simple procedure Algorithm 1 can be followed to generate a consistent subset of Λ.

The second method forms a consistent subset of Λ based on the sum μ = ∑_{λ∈Λ} λ of the preference input vectors: see Proposition 27 below. Unless μ is the zero vector, Λ_μ = {λ ∈ Λ : λ · μ > 0} is non-empty and consistent. We can therefore define ω_{Λ_μ} to be the solution of the maximum margin approach for Λ_μ. Then, we return Λ_{ω_μ} = {λ ∈ Λ : λ · ω_{Λ_μ} > 0}, which is again consistent, and we have Λ_μ ⊆ Λ_{ω_μ} ⊆ Λ.
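The sketch below illustrates these first two approaches. Algorithm 1's listing is not reproduced here, so the elimination loop is one plausible reading of the description (while Λ is inconsistent, drop the element with the smallest C(λ_i)); it reuses the is_consistent and omega helpers from the earlier sketches.

```python
# Sketch of the first two consistency-restoration approaches; illustrative only.
import numpy as np

def restore_by_elimination(Lam, is_consistent):
    """One reading of Algorithm 1: repeatedly drop the least-consistent element."""
    rows = [np.asarray(l, float) for l in Lam]
    while rows and not is_consistent(np.array(rows)):
        scores = [sum(li.dot(lj) for lj in rows if lj is not li) for li in rows]
        rows.pop(int(np.argmin(scores)))          # smallest C(lambda_i) goes first
    return np.array(rows)

def restore_by_sum(Lam, omega):
    """Second approach: Lambda_mu = {lam : lam.mu > 0}, then grow it using omega."""
    mu = Lam.sum(axis=0)                          # assumed non-zero here
    Lam_mu = Lam[Lam.dot(mu) > 0]                 # non-empty and consistent
    w = omega(Lam_mu)                             # max-margin solution for Lambda_mu
    return Lam[Lam.dot(w) > 0]                    # Lambda_mu ⊆ result ⊆ Lambda
```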
A third approach involves adding m extra real variables (i.e., m dummy features), one for each λ_i (with i ∈ I = {1, . . . , m}), and extending each λ_i over the extra m variables by giving it the value ε in the corresponding column, and zeros in the other m − 1 columns. Here, ε is a strictly positive (typically small) number that relates inversely to the penalty for softening the constraints.
More formally, we say that u ∈ IR^{n+m} extends v ∈ IR^n if for each j = 1, . . . , n, u(j) = v(j). For each i ∈ I we define δ_i as follows: δ_i extends λ_i, and δ_i(n + i) = ε, and δ_i(n + j) = 0 for j ∈ I − {i}. Let Λ′, the extended preference input set, equal {δ_i : i ∈ I}. Consider any w ∈ IR^n, and any u ∈ IR^{n+m} that extends w. Then, for each i ∈ I, u · δ_i = w · λ_i + εu(n + i). If w · λ_i ≥ 1 then we can satisfy the constraint u · δ_i ≥ 1 by setting u(n + i) = 0. Otherwise, we can satisfy the constraint by letting u(n + i) = (1/ε)(1 − w · λ_i). (In fact, since we are interested in minimising the norm, or a rescaled version of the norm, we only need to consider this particular way of extending w to IR^{n+m}.) This implies that any w ∈ IR^n can be extended to an element of (Λ′)^≥; so, in particular, the extended input set is always consistent. However, if w is not close to satisfying λ_i, i.e., if w · λ_i is a large negative number, then the value of u(n + i), and hence the norm of u, will be large. This shows that vectors w ∈ IR^n that come close to satisfying the input constraints will be favoured.
The definitions and mathematical machinery for the various preference relations defined above can then proceed as in the previous sections but now working within IR n+m . When testing dominance the test vectors α and β are extended with the same value (e.g., 0) for the extra m components.
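Constructing the extended input set, and padding test vectors accordingly, is straightforward; a sketch with illustrative names:

```python
# Sketch: the third approach's extended inputs delta_i = (lambda_i, 0, ..., eps, ..., 0).
import numpy as np

def extend_inputs(Lam, eps):
    m, _ = Lam.shape
    return np.hstack([Lam, eps * np.eye(m)])   # always consistent, whatever Lambda is

def extend_test_vector(x, m, fill=0.0):
    """alpha and beta get the same value (e.g., 0) in the extra m components."""
    return np.concatenate([np.asarray(x, float), np.full(m, fill)])
```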

Properties of relations and computation of inferences
In previous sections, we defined a number of preference relations. In Section 7.1 we give some properties, in particular, regarding the relationships between the preference relations. In Section 7.2 we express the computational characterisations, derived in earlier sections, in terms of constraints, which enable simple implementation.

Properties of the different preference relations
We have considered the following preference relations: the consistency-based relation ≽_C (Section 2.1), the relation ≽_I based on rescaling preference inputs for the maximum margin preference relation (Section 3), the relation ≽_F based on rescaling of features (Section 4), and the relation ≽_{I,F} based on rescaling both inputs and features (Section 5).
For each of the relations ≽_C, ≽_I, ≽_F and ≽_{I,F}, the corresponding set of scenarios is defined to be SC(Λ), SI(Λ), SF(Λ) and SIF(Λ), respectively, where SC(Λ) is defined to be Λ^≥. For u ∈ IR^n recall that the total pre-order ≥_u is given by α ≥_u β ⇐⇒ u · α ≥ u · β. Let ≽ be any of the relations ≽_C, ≽_I, ≽_F and ≽_{I,F}, and let S be the corresponding set of scenarios, so that ≽ = ≽_S using the notation of Section 2.3. We then have that ≽ is the intersection of the relations ≥_u over all u ∈ S: see Section 2.1, and Proposition 4, Proposition 10 and Definition 11.
We also consider the intersection of ≽_I and ≽_F, which we call ≽_{I∧F}, so that, for α, β ∈ IR^n, α ≽_{I∧F} β if and only if α ≽_I β and α ≽_F β, which is if and only if α ≥_u β for all u ∈ SI(Λ) ∪ SF(Λ). Hence, if S = SI(Λ) ∪ SF(Λ) then ≽_S equals ≽_{I∧F}. These relations, as well as ≽_mm, are all reflexive and transitive, and thus pre-orders (with ≽_mm being a total pre-order). This is because each relation is equal to an intersection of pre-orders. For similar reasons, the relations are preserved under some simple transformations.
We say that a binary relation ≽ on IR^n is preserved by translation and uniform positive scaling if, for α, β, γ ∈ IR^n and r ∈ IR_+, if α ≽ β then α + γ ≽ β + γ and rα ≽ rβ.

Proposition 28.
For finite consistent set of preference inputs Λ ⊆ IR^n, we have the following relationships between the sets of scenarios: SI(Λ) ∪ SF(Λ) ⊆ SIF(Λ) ⊆ SC(Λ) (= Λ^≥), and ω_Λ ∈ SI(Λ) ∩ SF(Λ). Let ≽ be any of the relations ≽_mm, ≽_C, ≽_I, ≽_F and ≽_{I,F}. Then, ≽ is a pre-order preserved by translation and uniform positive scaling, and λ ≻ 0 for all λ ∈ Λ (where ≻ is the strict part of ≽). In addition, these relations are nested in the following ways (see Fig. 4): ≽_C ⊆ ≽_{I,F} ⊆ ≽_{I∧F} = ≽_I ∩ ≽_F, and ≽_I, ≽_F ⊆ ≽_mm.

[Fig. 4. The Venn diagram that depicts relationships between the preference relations defined in this paper.]

Summary of computational characterisations
For finite consistent set of preference inputs Λ ⊆ IR^n and arbitrary α, β ∈ IR^n, we would like to be able to determine which of the following hold: α ≽_C β, α ≽_I β, α ≽_F β and α ≽_{I,F} β. As usual, we label Λ as {λ_i : i ∈ I}. We use the results of previous sections to express, in terms of constraints, the condition that α does not dominate β, with respect to each of the four relations.

≽_C: α ⋡_C β if and only if there exists u ∈ SC(Λ) = Λ^≥ such that u · β > u · α, i.e., if and only if there exists u ∈ IR^n such that: • u · (β − α) > 0; and • ∀i ∈ I, u · λ_i ≥ 1.

≽_I: α ⋡_I β if and only if there exists u ∈ SI(Λ) such that u · β > u · α. Recall that, by Proposition 9, this holds if and only if there exists u ∈ IR^n, and non-negative reals r_i for each i ∈ I, such that: • u · (β − α) > 0; • ∀i ∈ I, u · λ_i ≥ 1; and • u = ∑_{i∈I} r_i λ_i.
Note that if t were not restricted to (0, 1]^m in the definition of SI(Λ), then the second constraint (i.e., u · λ_i ≥ 1) would be replaced by u · λ_i > 0, which is computationally more expensive due to the strict inequality. However, as we proved in Proposition 3, the result in both cases is the same.

≽_F: α ⋡_F β if and only if there exists u ∈ SF(Λ) such that u · β > u · α. As we saw in Proposition 24, this holds if and only if there exist u ∈ IR^n and μ ∈ IR^n, and non-negative reals r_i for each i ∈ I, such that: • u · (β − α) > 0; • ∀i ∈ I, u · λ_i ≥ 1; • μ = ∑_{i∈I} r_i λ_i, and for all i ∈ I, either u · λ_i = 1 or r_i = 0; • ∑_{i∈I} 1[r_i ≠ 0] ≤ n + 1; and • ∀j ∈ {1, . . . , n}, u(j) = 0 ⇐⇒ μ(j) = 0, and u(j) > 0 ⇐⇒ μ(j) > 0.
In CPLEX, a disjunctive constraint such as [w · λ_i = 1 or r_i = 0] can be expressed as (w · λ_i == 1) + (r_i == 0) ≥ 1 (each logical proposition is treated as an integer: 0 for false and 1 for true).

≽_{I,F}: α ⋡_{I,F} β if and only if there exists u ∈ SIF(Λ) such that u · β > u · α. Recall from Proposition 26 that this holds if and only if there exist u ∈ IR^n and μ ∈ IR^n, and non-negative reals r_i for each i ∈ I, such that: • u · (β − α) > 0; • ∀i ∈ I, u · λ_i ≥ 1; • μ = ∑_{i∈I} r_i λ_i; and • ∀j ∈ {1, . . . , n}, u(j) = 0 ⇐⇒ μ(j) = 0, and u(j) > 0 ⇐⇒ μ(j) > 0.
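As an illustration of how these sign-agreement conditions can be handled outside CPLEX, the sketch below encodes the ≽_{I,F} non-dominance test as a mixed integer programme using big-M constants, with a small ε > 0 approximating the strict inequalities (so the test is approximate near boundaries). It assumes the PuLP library with its CBC solver; all names are ours, and the strict condition u · (β − α) > 0 is again replaced by ≥ 1, which is harmless by the scaling argument used for ≽_I.

```python
# Sketch: MIP encoding of "alpha does not dominate beta" for the inputs+features case.
import numpy as np
import pulp

def not_dominates_IF(Lam, alpha, beta, M=1e3, eps=1e-4):
    m, n = Lam.shape
    gamma = np.asarray(beta, float) - np.asarray(alpha, float)
    prob = pulp.LpProblem("witness", pulp.LpMinimize)
    u = [pulp.LpVariable(f"u{j}") for j in range(n)]
    mu = [pulp.LpVariable(f"mu{j}") for j in range(n)]
    r = [pulp.LpVariable(f"r{i}", lowBound=0) for i in range(m)]
    pos = [pulp.LpVariable(f"p{j}", cat="Binary") for j in range(n)]
    neg = [pulp.LpVariable(f"q{j}", cat="Binary") for j in range(n)]
    prob += pulp.lpSum(r)                             # any objective; feasibility only
    prob += pulp.lpSum(gamma[j] * u[j] for j in range(n)) >= 1   # u.(beta-alpha) > 0
    for i in range(m):
        prob += pulp.lpSum(Lam[i, j] * u[j] for j in range(n)) >= 1
    for j in range(n):
        prob += mu[j] == pulp.lpSum(Lam[i, j] * r[i] for i in range(m))  # mu in co(Lambda)
        prob += pos[j] + neg[j] <= 1
        for x in (u[j], mu[j]):        # u and mu agree on signs, coordinate by coordinate:
            prob += x <= M * pos[j] - eps * neg[j]    # pos => x >= eps; neg => x <= -eps;
            prob += x >= eps * pos[j] - M * neg[j]    # neither => x == 0
    return prob.solve(pulp.PULP_CBC_CMD(msg=False)) == 1   # 1 = feasible witness found

# alpha >=_{I,F} beta iff not_dominates_IF(Lam, alpha, beta) is False.
```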

Optimality operators
In many decision-making situations, there is no clear ordering on decisions (alternatives). There can often be a set of different scenarios with a different ordering on alternatives in each scenario. For example, for different scalings of preference inputs, we may have different orderings over a set of alternatives. In such a setup there are a number of natural ways of defining the set of optimal solutions (best alternatives or top recommended solutions).
We consider here two kinds of optimality operators in the sense of [52]; namely the set of undominated solutions, which is a natural generalisation of the Pareto-optimal set; and the set of possibly optimal solutions. The set of possibly optimal alternatives has been considered in a number of different situations, including for voting rules [53], for soft constraint optimisation [54], and for multi-objective optimisation [55,52].
Let ≽_S be any of the relations ≽_C, ≽_I, ≽_F and ≽_{I,F}, where S is the corresponding set of scenarios for each relation (see Section 7.1), which are respectively SC(Λ) (= Λ^≥), SI(Λ), SF(Λ) and SIF(Λ). We then have α ≽_S β if and only if, for all u ∈ S, u · α ≥ u · β. We define ≻_S to be the strict part of ≽_S, so that α ≻_S β if and only if α ≽_S β and β ⋡_S α.
An alternative α is defined to be an element of IR n . For a given finite set of alternatives A, the two optimality operators are defined as follows:

UND_S(A) (= UND_{≽_S}(A)) is the set of undominated elements with respect to the relation ≽_S, i.e., α ∈ UND_S(A) if and only if there is no β ∈ A such that β ≻_S α. PO_S(A) is the set of elements that are optimal in some scenario. Thus, α ∈ PO_S(A) if and only if there exists u ∈ S such that for all β ∈ A, α · u ≥ β · u. Elements of PO_S(A) are said to be possibly optimal (in A, given S).


Proposition 29. For any finite non-empty set A ⊆ IR^n of alternatives, and any finite consistent set of preference inputs Λ ⊆ IR^n, PO_{SF(Λ)}(A) ∩ PO_{SI(Λ)}(A) is non-empty, and PO_{SF(Λ)}(A) ∪ PO_{SI(Λ)}(A) = PO_{SI(Λ)∪SF(Λ)}(A) ⊆ PO_{SIF(Λ)}(A) ⊆ PO_{SC(Λ)}(A).
For each of the sets S of scenarios SC(Λ), SI(Λ), SF(Λ) and SIF(Λ), we have a characterisation of the condition [u ∈ S] in terms of constraints: see Proposition 1(ii), Corollary 8, Theorem 23 and Theorem 25, respectively. (Each corresponds to the set of constraints for α ⋡_S β shown for the associated relation in Section 7.2, omitting the first constraint u · (β − α) > 0.) We then define C_S(A, α) to be this set of constraints plus the constraints: for all β ∈ A, α · u ≥ β · u. Hence, u is a solution of C_S(A, α) if and only if u ∈ S and for all β ∈ A, α · u ≥ β · u. Therefore α ∈ PO_S(A) if and only if C_S(A, α) has a solution.
Typically, but not always (as we found in our experiments), PO_S(A) is a smaller set than UND_S(A) (since possibly optimal alternatives are very often, but not always, undominated).
Propositions 2 and 4 in [52] imply that the computation of UND_S(A) and PO_S(A) can be done with a very simple incremental algorithm. We adapt this incremental approach and exploit it for each of the four sets of scenarios.
Algorithm 2 shows how UND_S(A) can be found incrementally. It corresponds to a natural way of computing Pareto-optimal solutions. The algorithm performs two stages for each α ∈ A. In the first stage, we examine whether α is undominated among the undominated elements found so far. We proceed to the second stage if α is undominated, and there remove those previously found elements that are dominated by α (so they are no longer undominated). The correctness of Algorithm 2 is formally stated in Proposition 30. Proposition 30. For finite consistent set of preference inputs Λ ⊆ IR^n, given any subset S of Λ^≥, and any finite set A (⊆ IR^n) of alternatives, Algorithm 2 returns UND_S(A).
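A compact rendering of Algorithm 2, assuming a predicate strictly_dominates(x, y) deciding x ≻_S y (built from the dominance tests of Section 7.2); correctness relies on the transitivity of ≻_S:

```python
# Sketch of Algorithm 2: incremental computation of UND_S(A).
def compute_und(A, strictly_dominates):
    und = []                                   # undominated elements found so far
    for alpha in A:
        if not any(strictly_dominates(psi, alpha) for psi in und):      # stage 1
            und = [psi for psi in und
                   if not strictly_dominates(alpha, psi)]               # stage 2
            und.append(alpha)
    return und
```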
[Algorithm 3: incremental computation of PO_S(A).]

The set of possibly optimal elements PO_S(A) is built up in an incremental way in Algorithm 3. In this algorithm, F_S(A, α) is a function that returns a solution of C_S(A, α) if a solution is found, and NULL otherwise. Here, Ψ is a set of pairs, where the first component of a pair is a potentially possibly optimal element, and the second is the scenario in which the first component has been found to be optimal. Regarding this notation, Ψ↓ is the set of first components in Ψ; i.e., Ψ↓ = {ψ : (ψ, u) ∈ Ψ}. In Line 6, once it is found that α is a possibly optimal element within Ψ↓, it is included in Ψ along with its associated solution (scenario). Then, in the function Refine-Previous-POs, we remove any (ψ, v) ∈ Ψ which is no longer possibly optimal because of the addition of α. In Line 18, the existing possibly optimal element ψ is removed from Ψ because it is not as good as the incoming possibly optimal element α in its own associated scenario v. However, this does not mean that ψ cannot be possibly optimal: there might be another scenario u in which ψ is better than all elements of Ψ↓ including α. If that is the case, we include ψ again in Ψ but with this new scenario u instead of v. Proposition 31 formally states the correctness of Algorithm 3.
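The following is one plausible rendering of Algorithm 3 as described above. F(A′, α) stands for F_S(A′, α): it returns a scenario u ∈ S in which α is at least as good as every element of A′, or None; alternatives and scenarios are assumed to be numpy arrays.

```python
# Sketch of Algorithm 3: incremental computation of PO_S(A); illustrative only.
def compute_po(A, F):
    Psi = []                                       # pairs (element, witnessing scenario)
    for alpha in A:
        u = F([p for p, _ in Psi] + [alpha], alpha)
        if u is None:
            continue                               # alpha not optimal in any scenario
        Psi.append((alpha, u))
        refined = []                               # Refine-Previous-POs
        for (psi, v) in Psi:
            if all(psi.dot(v) >= other.dot(v) for other, _ in Psi):
                refined.append((psi, v))           # psi still optimal in its scenario v
            else:
                w = F([p for p, _ in Psi], psi)    # look for a new witnessing scenario
                if w is not None:
                    refined.append((psi, w))
        Psi = refined
    return [p for p, _ in Psi]                     # the possibly optimal elements
```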

Proposition 31. For finite consistent set of preference inputs
Λ ⊆ IR^n, given any subset S of Λ^≥, and any finite set A (⊆ IR^n) of alternatives, Algorithm 3 returns PO_S(A).

Experimental testing
In this section we experimentally test the methods and algorithms developed in earlier sections; the results show the feasibility of the methods, and illustrate relative computational efficiency, as well as the differences between the various relations and optimality classes. It is shown that our preference relations do not necessarily lead to a large number of solutions for the decision maker to consider.
The experiments make use of two databases, namely the Ridesharing Database and the Car Preference Database. The ridesharing database is a subset of a year's worth of real ridesharing records, provided by the commercial ridesharing system Carma (see http://gocarma.com/). Each ridesharing alternative has 7 features, representing different aspects of a possible choice of match for a given user. More information about the data can be found in [45].
The second database is the result of a survey expressing the preferences of different users over specific cars [56]. For each car, 7 features are considered (e.g., engine size).
We base our experiments on 13 benchmarks derived from the ridesharing database and 10 benchmarks derived from the car preference database. Each benchmark corresponds to the inferred preferences of a different user. The preference of alternative a_i (i.e., a ridesharing alternative or a car) over b_i leads to a_i − b_i (= λ_i) being included in Λ.
A pre-processing phase deletes some elements of Λ in order to make it consistent (i.e., to ensure Λ^≥ ≠ ∅); to do so, we adopt the first and the second approaches discussed in Section 6 for the first and the second database, respectively. To conduct the experiments, CPLEX 12.6.3 is used as the solver, on a computer with an Intel Xeon E312xx 2.20 GHz processor and 8 GB of RAM.
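The consistency condition itself reduces to a linear feasibility check, since Λ is consistent exactly when Λ^≥ is non-empty. A minimal sketch (ours, using scipy rather than CPLEX), which a pre-processing loop could call after each candidate deletion:

```python
import numpy as np
from scipy.optimize import linprog

def is_consistent(Lam):
    """Is Lambda^>= = {u : u . lam >= 1 for all lam in Lambda} non-empty?"""
    L = np.asarray(Lam, dtype=float)
    m, n = L.shape
    # Feasibility only: zero objective, and u . lam >= 1 becomes -lam . u <= -1.
    res = linprog(c=np.zeros(n), A_ub=-L, b_ub=-np.ones(m),
                  bounds=[(None, None)] * n, method="highs")
    return bool(res.success)
```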

Decisive pairs
Here, we would like to examine how decisive each relation is, i.e., which relation is weaker and by how much. We randomly generate 1000 pairs (α, β), based on a uniform distribution for each feature. A pair (α, β) is called decisive for a preference relation if one of the two alternatives strictly dominates the other; for example, (α, β) is decisive for ≽_I if and only if α ≻_I β or β ≻_I α, i.e., if and only if either (α ≽_I β and not β ≽_I α) or (β ≽_I α and not α ≽_I β). We also consider the relation ≽_{I∧F}, which is the intersection of ≽_I and ≽_F (see Section 7.1; note that this relation differs from the relation ≽_{I,F}). To determine whether a pair is decisive we need to run the solver, based on the computation methods proposed in Section 7, twice: once to test whether α ≽_S β, and a second time for β ≽_S α. Table 2 shows the percentage of decisive pairs for ≽_F, ≽_I, ≽_{I∧F}, ≽_{I,F} and ≽_C, as well as the running time per pair.
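In code, the per-pair test amounts to the two dominance queries just described; a sketch (ours; dominates(a, b) stands for the solver-based test of a ≽_S b):

```python
import numpy as np

def decisive_fraction(pairs, dominates):
    """Fraction of pairs decided by the relation: (alpha, beta) is decisive
    iff exactly one of the two dominance directions holds strictly, which
    takes two solver calls per pair."""
    decided = 0
    for alpha, beta in pairs:
        ab, ba = dominates(alpha, beta), dominates(beta, alpha)
        if ab != ba:          # exactly one direction holds: strict dominance
            decided += 1
    return decided / len(pairs)

# e.g. 1000 random pairs with 7 uniform features, as in the experiments
# (the seed is ours, purely for reproducibility of the illustration)
rng = np.random.default_rng(0)
pairs = [(rng.uniform(size=7), rng.uniform(size=7)) for _ in range(1000)]
```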
The results illustrate some of the relationships expressed in Proposition 28: ≽_C ⊆ ≽_{I,F} ⊆ ≽_{I∧F} (which equals ≽_I ∩ ≽_F).
They also demonstrate that the subset relations can easily be strict, with ≽_{I,F} not being the same as either ≽_{I∧F} or the consistency-based relation ≽_C. Typically, the relation ≽_F, relating to rescaling of the features, is much the strongest relation, i.e., the most decisive, followed by ≽_I, which is only slightly more decisive than ≽_{I∧F}; the latter is in turn a good deal stronger than ≽_{I,F}, with the consistency-based relation ≽_C being much the weakest (least decisive).
The fact that the relation ≽_I, based on rescaling of the preference inputs, is only slightly more decisive than ≽_{I∧F} suggests that ≽_I can be close (in some sense) to being a sub-relation of ≽_F, since if ≽_I ⊆ ≽_F then ≽_{I∧F} = ≽_I. However, in four of the thirteen ridesharing benchmarks, and in nine of the ten car preference benchmarks (see Table 2), the number of decisive pairs for ≽_I is not equal to the number for ≽_{I∧F}. This implies that in these particular benchmarks we have ≽_F ⊉ ≽_I (and hence SF(Λ) ⊈ SI(Λ)). There are even two benchmarks (see the figures in bold) in which ≽_F is less decisive than ≽_I.
In terms of running time, ≽_I is around 130 and 100 times faster than ≽_F on average for the ridesharing database and the car preference database, respectively. The computations for ≽_C and ≽_{I,F} are of the same order of magnitude as those for ≽_I, with the former being somewhat faster for the ridesharing database, and those for ≽_{I,F} being somewhat slower. It is interesting that the non-linear constraint for ≽_{I,F} (see Section 7.2) makes much less of a difference to computation time than the non-linear constraints for computing ≽_F. The computation times do not appear to depend strongly on the number m of preference inputs, with the partial exception of the ≽_F relation.

Optimal elements
The next phase of experiments is devoted to finding optimal solutions with respect to the two kinds of optimality operator discussed in Section 8. To do so, a set A of 100 alternatives is randomly generated, based on a uniform distribution for each feature. Then, for each relation, the number of possibly optimal and undominated elements of A is counted; see Table 3. The numbers in the I ∩ F columns relate to the intersection of the I and F optimality sets; for example, the left-hand I ∩ F column gives the cardinalities of the sets PO_{SI(Λ)}(A) ∩ PO_{SF(Λ)}(A).
The results in Table 3 illustrate some of the relationships expressed in Proposition 29. They also show that others of the subset relations can easily be strict. In most of the benchmarks, the figure in the I ∩ F column for the PO_S(A) case is equal to the corresponding value in the F column, which implies that PO_{SF(Λ)}(A) is then a subset of PO_{SI(Λ)}(A); similarly for the UND_S(A) results. However, the bold numbers show that the F and I ∩ F columns are not identical, and thus illustrate that, e.g., PO_{SF(Λ)}(A) is not necessarily a subset of PO_{SI(Λ)}(A).
One can sometimes obtain a still smaller set than that related to SF(Λ) by taking the intersection of the optimality sets for SI(Λ) and SF(Λ). For the possibly optimal case, this set PO_{SI(Λ)}(A) ∩ PO_{SF(Λ)}(A) is guaranteed to be non-empty by Proposition 29 (because it contains the non-empty set PO_{{ω_Λ}}(A)).
In the ridesharing database, it can be seen that for the most conservative relation, ≽_C, the optimality operators return a substantial proportion of the alternatives as optimal solutions (roughly half in the UND_S(A) case).
The results for SIF(Λ) (invariant to both preference-input and feature rescaling), the most robust of the three rescaling approaches, lead to only slightly more optimal solutions than for SI(Λ). Also, for the ridesharing benchmarks, the PO_S(A) sets tend to be substantially smaller than the corresponding UND_S(A) sets. However, for the car preference database the number of undominated elements is fairly similar to the number of possibly optimal elements, and we sometimes even have |PO_S(A)| > |UND_S(A)| (see the encircled numbers). Table 4 shows the time for finding possibly optimal and undominated solutions; the former is faster than the latter by a factor ranging from 1.5 to 4.8 on average, partly because |PO_S(A)| is usually smaller than |UND_S(A)|, particularly for the ridesharing database. Because the computation of ≽_F was very much slower than that of the other relations, the times in the F columns are still the greatest, despite the number of optimal solutions being smaller. Overall, the computational cost of the relation ≽_F may make it less useful, even though it is more decisive and thus leads to smaller sets of optimal solutions. Instead one might, for instance, favour PO_{SI(Λ)}, PO_{SIF(Λ)} and UND_{SI(Λ)}, since they generate reasonably sized optimality sets much faster. Recall that each solution returned in a possibly optimal set is an optimal solution of a rescaled version of the original problem; it thus seems natural for it to be available for consideration by the decision maker.

Summary and discussion
The maximum margin method for preference learning learns a utility function from a set of input preferences, in order to predict further preferences. However, in many situations it can be argued that the scaling of the preference inputs should not affect the induced preference relation. We have defined a relation ≽_I that is a more robust version of the maximum margin preference inference ≽_mm, and which is invariant to the scaling of preference inputs. It is also reasonable to consider invariance to the way that features are scaled because, in maximum margin inference, features should be scaled before applying the method: the objective function of the maximum margin method is sensitive to the scale of the feature domains. Thus, we have also defined the relation ≽_F, which is invariant to the scaling of features. With these two types of rescaling being complementary, it is also natural to consider both types simultaneously, leading to a further preference relation ≽_{I,F}. We derived characterisations of the relations ≽_I, ≽_F and ≽_{I,F}, which lead to computational procedures. We also characterised the situation in which the maximum margin relation is insensitive to the scaling of features, i.e., when ≽_F equals ≽_mm. We then discussed three basic approaches to restoring consistency of the input data. Two optimality operators, UND_S(A) and PO_S(A), have been considered to define how a set of optimal solutions can be extracted from the available alternatives, and we proposed two algorithms to compute UND_S(A) and PO_S(A) in an incremental manner. Our experiments, which used 23 benchmarks derived from two sets of real preference data, compared the different relations in terms of decisiveness and of the sets of optimal solutions with respect to UND_S(A) and PO_S(A), and showed that the computational methods are practically feasible for a moderate number of instances/features. The relation associated with scaling only the features was the most decisive, but by far the slowest for computing the associated optimality classes. Overall, one might consider ≽_I as a relation that keeps quite a good balance between decisiveness and computation time.
In the future, it would be interesting to explore extensions of our approaches including (i) integration of the approach with a conversational recommender system, and with a multi-criteria decision-making system; (ii) developing computational methods for certain kinds of kernel; (iii) considering soft margin optimisation, i.e., more sophisticated approaches for dealing with an inconsistent dataset; (iv) taking into account more general kinds of input preference statement; and (v) exploring connections with imprecise probability, based on linear constraints on probabilities.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A. Proofs

Lemma 32. Consider any Λ ⊆ IR^n. If Λ^> is non-empty then Λ^* is the topological closure of Λ^>.
Proof. Let us write the topological closure operator as Cl(·), so that Cl(S) is the topological closure of S, which equals S plus all the limit points of S. Basic properties of Cl(·) include: (a) S ⊆ T implies Cl(S) ⊆ Cl(T ), and (b) Cl(S) = S if S is a topologically closed set.
The following lemma is a well-known result for convex cones.

Lemma 33. Consider any finite Λ ⊆ IR^n and any u ∈ IR^n. Then, Λ^* ⊆ {u}^* if and only if u ∈ co(Λ).

To illustrate, for the set Λ of Fig. 1 and u = (3, 3), the set {u}^* = {(x, y) : 3x + 3y ≥ 0} clearly contains Λ^≥, the union of the shaded regions in Fig. 1(b).
(i) ⇐ (ii): Suppose that for all w ∈ Λ^≥, w · γ ≥ 0, and consider any u ∈ Λ^>. Let a_u = min_{λ∈Λ} u · λ, which is clearly greater than zero, and let u′ = u/a_u. For any λ ∈ Λ, u · λ ≥ a_u, which implies that u′ · λ ≥ 1, and thus u′ ∈ Λ^≥. Because u′ · γ ≥ 0 and a_u > 0, we also have u · γ ≥ 0. The proof of Theorem 2 uses the following lemma.

Lemma 34. For w ∈ Λ^>, define a_w to be min_{λ∈Λ} w · λ (which is always strictly positive, by definition of Λ^>), and define w̄ to be w/a_w. Then, the following hold for any w ∈ Λ^>: (i) w̄ ∈ Λ^≥; (ii) if w ∈ Λ^≥ then ‖w‖ > ‖w̄‖ unless w = w̄; (iii) for any real r > 0, marg_Λ(rw) = marg_Λ(w); (iv) marg_Λ(w) = 1/‖w̄‖.

Proof. Assume w ∈ Λ^>. Then a_{w̄} = min_{λ∈Λ} (1/a_w) w · λ = a_w/a_w = 1. Thus, w̄ ∈ Λ^≥, showing (i). Also, ‖w‖/‖w̄‖ = a_w, by definition of w̄. If w ∈ Λ^≥ then a_w ≥ 1, so ‖w‖ > ‖w̄‖ unless a_w = 1, i.e., w = w̄, proving (ii). The definitions immediately imply that marg_Λ(w) = a_w/‖w‖, and that for any real r > 0, marg_Λ(rw) = marg_Λ(w), showing (iii). So, in particular, since a_{w̄} = 1, marg_Λ(w) = marg_Λ(w̄) = 1/‖w̄‖, which proves (iv).
Theorem 2. Let Λ ⊆ IR^n be a finite consistent set of preference inputs, so that Λ^> is non-empty. Then the following all hold. (i) Λ^≥ is non-empty; (ii) there exists a unique element ω_Λ in Λ^≥ with minimum norm; (iii) w maximises marg_Λ within Λ^> if and only if w is a strictly positive scalar multiple of ω_Λ, i.e., there exists r ∈ IR with r > 0 such that w = rω_Λ.
The following argument shows that, for u ∈ Λ^≥, the rescaled set of inputs given by t_i = 1/(u · λ_i) satisfies Λ_t^≥ = Λ^* + {u}.

Proof. u ∈ Λ^≥ means that for all i ∈ I, u · λ_i ≥ 1, which implies that 0 < 1/(u · λ_i) ≤ 1. For all i ∈ I, let t_i = 1/(u · λ_i), so that t ∈ (0, 1]^m. By definition, w ∈ Λ_t^≥ if and only if for all i ∈ I, w · (t_i λ_i) ≥ 1. Now, w · (t_i λ_i) ≥ 1 holds if and only if w · λ_i ≥ u · λ_i, which is if and only if (w − u) · λ_i ≥ 0. Thus, Λ_t^≥ = {w ∈ IR^n : ∀i ∈ I, (w − u) · λ_i ≥ 0}, which equals Λ^* + {u}.
Lemma 39. Consider any finite Λ ⊆ IR^n and any t ∈ (0, 1]^m. Then, Λ_t^≥ ⊆ Λ^≥.

Proof. Consider any u ∈ Λ_t^≥. Then for all i ∈ I, u · λ_i ≥ 1/t_i. Since each t_i is in (0, 1], we have for all i ∈ I, u · λ_i ≥ 1, and thus u ∈ Λ^≥.

Proposition 5. Consider a finite consistent set of preference inputs Λ ⊆ IR^n and any u ∈ IR^n. Then, u ∈ SI(Λ) if and only if u ∈ Λ^≥ and u has minimum norm in Λ^* + {u}. Thus, in particular, SI(Λ) ⊆ Λ^≥.
The following lemma is used in the proof of Proposition 6; it states a basic property of a minimal norm element in a convex set.

Lemma 40. Consider any u ∈ G, where G ⊆ IR^n is a convex set. Then, u has minimum norm in G if and only if for all v ∈ G, u · (v − u) ≥ 0.
Proof. ⇒: Firstly, for the case when v = u, the result is easily obtained because u · (v − u) = 0. Now, consider any v ∈ G − {u}. We define v_δ = δv + (1 − δ)u for each δ ∈ (0, 1]. It is clear that v_δ ∈ G because v and u are both in the convex set G, and since u has minimum norm in G, for all δ ∈ (0, 1] we have ‖v_δ‖ ≥ ‖u‖. Now, assume that u · (v − u) < 0. We show that this assumption leads to ‖v_δ‖ < ‖u‖ for some δ ∈ (0, 1], which proves this direction by contradiction. Writing v_δ = u + δ(v − u), we have ‖v_δ‖² − ‖u‖² = 2δ u · (v − u) + δ²‖v − u‖² = δ(2u · (v − u) + δ‖v − u‖²), which is strictly negative for all sufficiently small δ ∈ (0, 1], giving the required contradiction.
⇐: Conversely, suppose that for all v ∈ G, u · (v − u) ≥ 0. Then for any v ∈ G, ‖v‖² = ‖u + (v − u)‖² = ‖u‖² + 2u · (v − u) + ‖v − u‖² ≥ ‖u‖², so u has minimum norm in G.

Results in Section 4.1
The following lemma is used to prove the equivalence in Proposition 11.

Lemma 41. Consider any v ∈ IR^n and any τ ∈ IR^n_+. Then, v ∈ Λ^≥ if and only if vτ^{−1} ∈ (Λτ)^≥. Also, w = v minimises ‖wτ^{−1}‖ over w ∈ Λ^≥ if and only if v = τω_{Λτ}.

Proof. The first part follows easily from the definitions. Regarding the second part, by definition of ω_{Λτ}, we have that v = τω_{Λτ} if and only if vτ^{−1} has minimum norm in (Λτ)^≥, which is if and only if w′ = vτ^{−1} minimises ‖w′‖ over w′ ∈ {w′ : ∀λ ∈ Λ, w′ · (λτ) ≥ 1}. Substituting w′ = wτ^{−1}, this holds if and only if w = v minimises ‖wτ^{−1}‖ over w ∈ Λ^≥.

Proposition 11. Consider any finite consistent set of preference inputs Λ ⊆ IR^n. Then, SF(Λ) is equal to the set of all rescale-optimal elements of Λ^≥. Thus, for α, β ∈ IR^n, α ≽_F β if and only if w · (α − β) ≥ 0 for every rescale-optimal element w of Λ^≥.
Proof. Consider any u ∈ IR^n. Then, u is rescale-optimal in Λ^≥ if and only if there exists τ ∈ IR^n_+ such that u = w minimises ‖wτ‖ over w ∈ Λ^≥, which, by Lemma 41, is if and only if there exists τ ∈ IR^n_+ such that u = τ^{−1}ω_{Λτ^{−1}}, which is, from the definition of SF(Λ), if and only if u ∈ SF(Λ).

Results in Section 4.3
The proof of Theorem 13 uses a triple of lemmas. The first is the following standard result: for any convex and closed G ⊆ IR^n and any τ ∈ IR^n_+, there is a unique w ∈ G minimising ‖wτ‖.

Proof. It is a standard result (for a proof see e.g., Proposition 4 of [49]) that there is a unique element with minimum norm in a convex closed set. Consider any τ ∈ IR^n_+. Now, Gτ = {wτ : w ∈ G} is convex and closed, so there exists a unique element of Gτ with minimum norm, and hence there is a unique w ∈ G with minimum value of ‖wτ‖.

Lemma 15.
Let G be a convex subset of IR n , and let j be any element of {1, . . . ,n}. Then either (i) there exists w ∈ G such that w( j) = 0; or (ii) for all w ∈ G, w( j) > 0; or (iii) for all w ∈ G, w( j) < 0.
Proof. To prove a contradiction, suppose that none of (i), (ii) and (iii) holds for j. Then for all w ∈ G, w(j) ≠ 0, and there exist u, v ∈ G such that u(j) > 0 and v(j) < 0. Let δ = −v(j)/(u(j) − v(j)), which lies in (0, 1), and let v_δ = δu + (1 − δ)v, which is in G since G is convex. Then v_δ(j) = δu(j) + (1 − δ)v(j) = 0, which shows that (i) holds for j, contradicting the earlier assumption.
Theorem 13. Let G be a convex and closed subset of IR n , and let u be an element of G. Then the following conditions are equivalent.

Results in Section 4.4
The definitions easily imply the following lemma, which relates pointwise dominance and zm-pointwise dominance.

Lemma 44. Consider any u, v ∈ IR n . If v pointwise dominates u then v zm-pointwise dominates u.
Now suppose that u ∈ G ⊆ IR n . If u is zm-pointwise undominated in G then u is pointwise undominated in G. In addition, the converse holds if none of the components of u is zero.
Proof. Suppose that v pointwise dominates u. Then u ≠ v and, for all j ∈ {1, . . . , n}, either 0 ≤ v(j) ≤ u(j) or 0 ≥ v(j) ≥ u(j), from which the defining conditions of zm-pointwise dominance follow directly.

Proof. First, let us suppose that u is not zm-pointwise undominated in G. We will show that there exists v ∈ G such that neither condition (i) nor condition (ii) holds for v. Since u is not zm-pointwise undominated in G, there exists v ∈ G that zm-pointwise dominates u. By definition, there exists j ∈ {1, . . . , n} such that v(j) ≠ u(j) = 0, and thus condition (i) does not hold for v; also, for all k ∈ {1, . . . , n} with u(k) ≠ 0, either 0 ≤ v(k) ≤ u(k) or 0 ≥ v(k) ≥ u(k), which means that condition (ii) in this lemma does not hold for v.
Lemma 46 is used in the proof of Proposition 16.
Parts (ii) and (iii) of Proposition 16 state the following:
(ii) u is zm-pointwise undominated in G if and only if for all v ∈ G, there exists τ ∈ IR^n_+ such that (τ τ u) · (v − u) ≥ 0;
(iii) if u is rescale-optimal in G then u is zm-pointwise undominated in G.
Results in Section 4.5
The following lemmas are used in the proof of Theorem 18.
Lemma 47 states that for any v ∈ G_{J_u}, there exists δ ∈ (0, 1) such that v_δ = δv + (1 − δ)u ∈ G_I. To illustrate, consider u = (1, 0) and v = (−1, 2), which is in G_{J_u}. We can see in Fig. 1(b) that the line segment from u to (0, 1) is in G_I, but the part beyond, from (0, 1) to v, is not; this means that choosing δ = 1/2 works for this case. For the proof, we show that for all sufficiently small δ, v_δ ∈ G_I, i.e., that for all i ∈ I, v_δ · λ_i ≥ a_i. Since v_δ ∈ G_{J_u}, this holds for all i ∈ J_u. Now, consider any i ∈ I − J_u. By definition of J_u we have u · λ_i > a_i; since v_δ · λ_i tends to u · λ_i as δ tends to 0, there exists δ_i > 0 such that v_δ · λ_i ≥ a_i whenever δ ≤ δ_i, which completes the argument.

Lemma 48 states that u is zm-pointwise undominated in G_I if and only if u is zm-pointwise undominated in G_{J_u}.

Proof. ⇒: Suppose that u is zm-pointwise undominated in G_I, and consider any v ∈ G_{J_u}. By Lemma 47, there exists δ ∈ (0, 1) such that v_δ ∈ G_I. Proposition 16(ii) implies that there exists τ ∈ IR^n_+ such that for all w ∈ G_I, (τ τ u) · (w − u) ≥ 0. In particular, (τ τ u) · (v_δ − u) ≥ 0, i.e., (τ τ u) · (δ(v − u)) ≥ 0, which implies that (τ τ u) · (v − u) ≥ 0. Note that τ does not depend on the choice of v. Thus, there exists τ ∈ IR^n_+ such that for all v ∈ G_{J_u}, (τ τ u) · (v − u) ≥ 0. Applying Proposition 16(ii) again gives that u is zm-pointwise undominated in G_{J_u}.
⇐: This is immediate because G_I ⊆ G_{J_u}.
G_{J_u} + {−u} means translating G_{J_u} so as to move u to the origin.

Lemma 49. For u, v ∈ IR^n, if u and v agree on signs and u ≠ 0, then u · v > 0.

Theorem 18. Let u be an element of a polyhedron G ⊆ IR^n. Then, u is rescale-optimal in G if and only if u is zm-pointwise undominated in G.
Proof. G is a polyhedron, so, by definition, it can be written as {w ∈ IR^n : ∀i ∈ I, w · λ_i ≥ a_i}. Let J_u = {i ∈ I : λ_i · u = a_i}, and let G_{J_u} = {w ∈ IR^n : ∀i ∈ J_u, w · λ_i ≥ a_i}. Proposition 16(iii) implies that if u is rescale-optimal in G then u is zm-pointwise undominated. We next prove the converse.
Assume that u is zm-pointwise undominated in G. Let C = G_{J_u} + {−u}. By Lemma 19, C = {λ_i : i ∈ J_u}^*, which is a polyhedral cone (i.e., a polyhedron that is a cone), and thus, by the Minkowski–Weyl theorem (see e.g., Theorem 4.18 of [57]), is a finitely generated convex cone, so we can write C = co(W) for some finite set W = {w_1, . . . , w_l}.
Let C′ = co(S) be the convex cone generated by S = W ∪ S_Z, where S_Z = {e_j, −e_j : j ∈ Z}, e_j ∈ IR^n is the unit vector in the jth dimension, and Z = {j ∈ {1, . . . , n} : u(j) = 0}. Also, let T = E^+ ∪ E^− ∪ R, where E^+ = {−e_j : u(j) > 0}, E^− = {e_j : u(j) < 0}, and R = {−w_i : i ∈ M}, where M = {i ∈ L : −w_i ∉ C′} and L = {1, . . . , l}. Let H be the convex hull of T. We will show that the assumption that u is zm-pointwise undominated implies that C′ and H are disjoint. If there exists h ∈ C′ ∩ H, then h can be written as w + v_0, where w ∈ C and v_0 ∈ co(S_Z). Also, since h ∈ H, it can be written as v^+ + v^− + y, where v^+ ∈ co(E^+), v^− ∈ co(E^−) and y ∈ co(R). (More specifically, for some q_1, q_2, q_3 ∈ [0, 1] with q_1 + q_2 + q_3 = 1, we have v^+ = q_1 v̂^+ for some v̂^+ in the convex hull of E^+, v^− = q_2 v̂^− for some v̂^− in the convex hull of E^−, and y = q_3 z for some z in the convex hull of R.) Since −y ∈ C, w − y ∈ C.
Thus, if u(j) > 0 then v(j) ≤ u(j); and if u(j) < 0 then v(j) ≥ u(j). Since u is zm-pointwise undominated in G, u is zm-pointwise undominated in G_{J_u}, by Lemma 48. Lemma 45 then implies that for all j ∈ {1, . . . , n}, if u(j) ≠ 0 then v(j) = u(j), and thus v^+(j) = v^−(j) = 0; and so v^+ = v^− = 0 (since also, if u(j) = 0 then v^+(j) = v^−(j) = 0, by definition of v^+ and v^−, and of E^+ and E^−). This implies that w + v_0 = y and y ∈ H. Also, since 0 is neither in the convex hull of E^+ nor in that of E^−, we have q_1 = q_2 = 0, and thus q_3 = 1; so y is in the convex hull of R. By definition of convex hull, we can write y as Σ_{i∈M} t_i(−w_i), with each t_i ≥ 0 and, for some k ∈ M, t_k > 0. Then −t_k w_k = w + Σ_{i∈M, i≠k} t_i w_i + v_0. The right-hand side is in co(S), which equals C′; this implies that −w_k ∈ C′, which contradicts k ∈ M. Thus, C′ and H are disjoint.
Both C′ and H are convex and closed, and H is compact. A strict separating hyperplane theorem (see e.g., Theorem 2.1.5 of [58]) implies that there exist a vector μ ∈ IR^n and c ∈ IR such that for all g ∈ C′, μ · g > c, and for all h ∈ H, μ · h < c. Since 0 ∈ C′, we have μ · 0 > c, so c < 0. Now, if g and −g are both in C′ then μ · g = 0. (Otherwise μ · g < 0 or μ · (−g) < 0; without loss of generality assume μ · g < 0; then there exists r > 0 such that μ · (rg) = r(μ · g) < c, which contradicts rg ∈ C′.) This implies that if u(j) = 0 (so that j ∈ Z and e_j, −e_j ∈ C′) then μ · e_j = 0, and thus μ(j) = 0. Also, if i ∈ L − M, then w_i, −w_i ∈ C′, so μ · w_i = 0. For any i ∈ M, we have that −w_i ∈ H, so μ · (−w_i) < c < 0, and so μ · w_i > 0. Thus for any w_i ∈ W, μ · w_i ≥ 0, and therefore for any w ∈ C, μ · w ≥ 0, since such a w is a positive linear combination of the elements of W.
Moreover, μ agrees on signs with u: we have just shown that μ(j) = 0 when u(j) = 0; if u(j) > 0 then −e_j ∈ E^+ ⊆ H, so μ · (−e_j) < c < 0, giving μ(j) > 0; and similarly, if u(j) < 0 then e_j ∈ E^− ⊆ H, giving μ(j) < 0. By Lemma 49, μ · u > 0, so we may rescale μ by a strictly positive factor so that μ · u = 1. For any v ∈ G, we have v ∈ G_{J_u}, and so v − u is in C; we have shown that μ · (v − u) ≥ 0, so μ · v ≥ μ · u = 1. Theorem 17 then implies that u is rescale-optimal in G.

Results in Section 4.6
Lemma 20. Consider a polyhedron G_I and a non-zero u ∈ G_I. Then u is rescale-optimal in G_I if and only if u is rescale-optimal in G_{J_u}.

This follows from Theorem 18 and Lemma 48, since G_I and G_{J_u} are polyhedra. However, we give a more direct proof here.
Proof. Firstly, since G_I ⊆ G_{J_u}, if u is rescale-optimal in G_{J_u} then u is rescale-optimal in G_I (since the same scaling function τ can be used). We will go on to prove the converse; so, let us assume that u is rescale-optimal in G_I. Theorem 17 implies that there exists μ ∈ IR^n, agreeing on signs with u, such that μ · u = 1 and for all w ∈ G_I, μ · w ≥ 1. Consider an arbitrary v ∈ G_{J_u}; we will show that μ · v ≥ 1. By Lemma 47, there exists δ ∈ (0, 1) such that v_δ = δv + (1 − δ)u ∈ G_I, so μ · v_δ ≥ 1 = μ · u. Hence μ · (v_δ − u) ≥ 0, i.e., δ μ · (v − u) ≥ 0, and thus μ · v ≥ μ · u = 1.
We have now shown that for all v ∈ G_{J_u}, μ · v ≥ 1; we also have that μ and u agree on signs and μ · u = 1. Using Theorem 17, this implies that u is rescale-optimal in G_{J_u}, as required.

Results in Section 7
Proposition 28. For a finite consistent set of preference inputs Λ ⊆ IR^n, let ≽ be any of the relations ≽_mm, ≽_C, ≽_I, ≽_F and ≽_{I,F}. Then, ≽ is a pre-order preserved by translation and uniform positive scaling, and λ ≻ 0 for all λ ∈ Λ (where ≻ is the strict part of ≽). In addition, these relations are nested in the following ways: ≽_C ⊆ ≽_{I,F} ⊆ ≽_{I∧F} = ≽_I ∩ ≽_F, and ≽_I ∪ ≽_F ⊆ ≽_mm.
Proof. It follows immediately that for any u ∈ IR^n, the relation ≥_u on IR^n is a total pre-order that is preserved by translation and uniform positive scaling. Also, if u ∈ Λ^≥ then λ >_u 0 for any λ ∈ Λ. Suppose that S ⊆ IR^n and that ≽_S = ∩_{u∈S} ≥_u. It follows easily that ≽_S is a pre-order (i.e., is reflexive and transitive) that is preserved by translation and uniform positive scaling (since all these properties are maintained by intersection).
Furthermore, if S ⊆ Λ^≥ then λ ≻_S 0 for any λ ∈ Λ. Using this notation, we have ≽_C = ≽_{Λ^≥}; ≽_I = ≽_{SI(Λ)}; ≽_F = ≽_{SF(Λ)}; ≽_{I∧F} = ≽_{SI(Λ)∪SF(Λ)}; ≽_{I,F} = ≽_{SIF(Λ)}; and ≽_mm = ≽_{{ω_Λ}}. Therefore, each of these relations is a pre-order preserved by translation and uniform positive scaling, and if ≻ is the strict part of any of these relations then λ ≻ 0 for all λ ∈ Λ.

Results in Section 8
Proposition 30. For a finite consistent set of preference inputs Λ ⊆ IR^n, given any subset S of Λ^≥, and any finite set A (⊆ IR^n) of alternatives, Algorithm 2 returns UND_S(A).

Proof. Let U be the set returned by the algorithm. First, consider any α ∈ UND_S(A): no element of A strictly dominates α, so α passes the first-stage test when it is considered, and it can never subsequently be removed at line 16; hence UND_S(A) ⊆ U.
Conversely, suppose that α ∈ A − UND_S(A), so there exists some β ∈ A such that β ≻_S α. In fact, since A is finite and ≻_S is transitive, there exists γ ∈ UND_S(A) such that γ ≻_S α. By the first part, γ ∈ U. Let us write A as {α_1, . . . , α_h}, where the order reflects the order in which elements of A are chosen in the first for loop of the algorithm, and let U_i be the set U at the beginning of the α_i iteration. For some distinct i and j, α_i = α and α_j = γ, and we have α_j ≻_S α_i. Since α_j ∈ UND_S(A), for all k > j, α_j ∈ U_k. If j < i then α_j ∈ U_i, and thus α_i = α will not be included in the current U, i.e., α ∉ U_{i+1}, and so α ∉ U. If i < j then α_i will be removed from U_j at line 16, and so again α ∉ U. Thus, A − UND_S(A) ⊆ A − U, and hence U ⊆ UND_S(A), so U = UND_S(A), proving the correctness of the algorithm.

Proposition 31. For a finite consistent set of preference inputs Λ ⊆ IR^n, given any subset S of Λ^≥, and any finite set A (⊆ IR^n) of alternatives, Algorithm 3 returns PO_S(A).
Proof. Let Ψ* be the final set Ψ, and let B be (Ψ*)↓, i.e., the set returned by the algorithm. If α ∈ A − B, then at some stage in the algorithm, F_S(Ψ↓, α) = NULL. But F_S(Ψ↓, α) = NULL implies that α ∉ PO_S(Ψ↓), which then implies that α ∉ PO_S(A), since Ψ↓ ⊆ A. We have shown that A − B ⊆ A − PO_S(A), and thus PO_S(A) ⊆ B.
Conversely, suppose that α ∈ B, so that for some scenario u ∈ S, (α, u) is in the final set Ψ*. It can be observed that, at the end of every loop of the main algorithm, if (α, u) ∈ Ψ then for all β ∈ Ψ↓, α · u ≥ β · u. This is because when (α, u) is added to Ψ (in either line 6 or line 19) we have u = F_S(Ψ↓, α), and this condition is re-checked (see line 17) whenever a new element is added. In particular, we therefore have that for any β ∈ B, α ≥_u β (i.e., α · u ≥ β · u).
Let γ be any element of A maximising γ · u, so that for all β ∈ A, γ · u ≥ β · u. Thus, γ ∈ PO_S(A), so, by the first part, γ ∈ B. The fact that (α, u) is in Ψ* implies that γ · u ≤ α · u, and thus γ · u = α · u. This implies that α ∈ PO_S(A), showing that B ⊆ PO_S(A), and hence B = PO_S(A), proving the correctness of the algorithm.