A Canon of Probabilistic Rationality

We prove that a random choice rule satisfies Luce's Choice Axiom if and only if its support is a choice correspondence that satisfies the Weak Axiom of Revealed Preference (so that it consists of alternatives that are optimal according to some preference) and random choice occurs according to a tie-breaking among such alternatives that satisfies Renyi's Conditioning Axiom. Our result shows that the Choice Axiom is, in a precise formal sense, a probabilistic version of the Weak Axiom. It thus supports Luce's view of his own axiom as a "canon of probabilistic rationality."


Introduction
In 1977, twenty years after proposing it, Duncan Luce commented as follows on his celebrated Choice Axiom:¹ "Perhaps the greatest strength of the choice axiom, and one reason it continues to be used, is as a canon of probabilistic rationality. It is a natural probabilistic formulation of K. J. Arrow's famed principle of the independence of irrelevant alternatives, and as such it is a possible underpinning for rational, probabilistic theories of social behavior." This claim already appears in his 1957 and 1959 works that popularized the axiom and the resulting stochastic choice model.² The conceptual proximity of Arrow's principle, typically identified with the set-theoretic version of the Weak Axiom of Revealed Preference (WARP),³ and Luce's Choice Axiom is indeed often invoked. As is well known, the former plays a key role in deterministic choice theory, the latter in stochastic choice theory.

Yet, the formal relation between these two independence of irrelevant alternatives (IIA) assumptions has so far remained elusive.⁴ For instance, in analyzing several different IIA axioms, Ray (1973) […]

1. its support, the set of alternatives that can be chosen, is a rational choice correspondence à la Arrow (1948, 1959), so it consists of alternatives that are optimal according to some preference;
2. tie-breaking among the optimal alternatives is consistent in the sense of conditional probability à la Renyi (1955, 1956).

¹ Luce (1977, p. 229), emphasis added.
² See Luce (1957, p. 6) and Luce (1959, p. 9).
³ Arrow himself put forth this version of Samuelson's WARP in his 1948 and 1959 works.
⁴ See the discussion of Peters and Wakker (1991, p. 1789) and Wakker (2010, p. 373).
In this way, our analysis formally supports Luce's "canonical rationality" claim for his Choice Axiom via a lexicographic composition of deterministic rationality (WARP) and stochastic consistency (Renyi's Conditioning Axiom).

A choice correspondence Γ is rational if it satisfies WARP:
$$B \subseteq A \text{ and } \Gamma(A) \cap B \neq \emptyset \implies \Gamma(B) = \Gamma(A) \cap B.$$
This is the set-theoretic form of WARP considered by Arrow (1948, 1959). Its IIA nature is best seen when Γ is a function:
$$B \subseteq A \text{ and } \Gamma(A) \in B \implies \Gamma(B) = \Gamma(A).$$
In words, adding suboptimal alternatives is irrelevant for choice behavior.
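To make WARP concrete, here is a minimal Python sketch for a finite set of alternatives. The helper names (`menus`, `satisfies_warp`) and the toy utility `u` are hypothetical illustrations, not part of the paper. The check loops over all pairs B ⊆ A and tests Arrow's condition (B ⊆ A and Γ(A) ∩ B ≠ ∅ imply Γ(B) = Γ(A) ∩ B). An arg max correspondence passes. A correspondence that picks c from {a, c} but picks {a, b} from {a, b, c} fails.

```python
from itertools import combinations


def menus(X):
    """All non-empty subsets of the finite set X, as frozensets."""
    xs = sorted(X)
    return [frozenset(c) for k in range(1, len(xs) + 1) for c in combinations(xs, k)]


def satisfies_warp(gamma, X):
    """Arrow's WARP: B ⊆ A and Γ(A) ∩ B ≠ ∅ imply Γ(B) = Γ(A) ∩ B."""
    for A in menus(X):
        for B in menus(A):
            common = gamma[A] & B
            if common and gamma[B] != common:
                return False
    return True


X = {"a", "b", "c"}
u = {"a": 2, "b": 2, "c": 1}  # hypothetical utility with a tie between a and b
argmax_u = {A: frozenset(x for x in A if u[x] == max(u[y] for y in A)) for A in menus(X)}

# Choosing c from {a, c} while choosing {a, b} from {a, b, c} cannot be rationalized.
not_rational = dict(argmax_u)
not_rational[frozenset({"a", "c"})] = frozenset({"c"})

print(satisfies_warp(argmax_u, X))      # True
print(satisfies_warp(not_rational, X))  # False
```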
We denote by ∆ (X) the set of all finitely supported probability measures on X and, for each A ⊆ X, by ∆ (A) the subset of ∆ (X) consisting of the measures assigning mass 1 to A.
Given any alternative a ∈ A, we interpret $p_A(\{a\})$, also denoted by $p(a, A)$, as the probability that an agent chooses a when the set of available alternatives is A.
More generally, if B is a subset of A, we denote by $p_A(B)$ or $p(B, A)$ the probability that the selected element lies in B.⁵ This probability can be viewed as the frequency with which an element in B is chosen. In particular, the set of alternatives that can be chosen from A is the support of $p_A$, given by
$$\operatorname{supp} p_A = \{a \in A : p(a, A) > 0\}.$$
The condition $p_A(A) = 1$ guarantees that it is a non-empty subset of A, so that the support correspondence $\operatorname{supp} p : \mathcal{A} \to \mathcal{A}$, $A \mapsto \operatorname{supp} p_A$, is a choice correspondence. Finally, the standard way of comparing the probabilities of choices in two different sets B and C is through the odds in favor of B over C, that is,
$$\frac{p(B, A)}{p(C, A)} = \frac{\#\text{ of times an element in } B \text{ is chosen}}{\#\text{ of times an element in } C \text{ is chosen}}$$
for all B, C ⊆ A. As usual, given any b and c in X, we set $p(b, c) = p(b, \{b, c\})$ and $r(b, c) = p(b, c)/p(c, b)$.
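The following Python sketch uses exact fractions to illustrate the support of a single choice distribution $p_A$ and the odds in favor of one subset over another. The distribution and the function names are hypothetical. Odds are left undefined (a `ZeroDivisionError`) when the denominator event has probability zero.

```python
from fractions import Fraction


def prob(p_A, B):
    """p(B, A): probability that the element chosen from A lies in B."""
    return sum((p_A.get(x, Fraction(0)) for x in B), Fraction(0))


def support(p_A):
    """supp p_A = {a in A : p(a, A) > 0}."""
    return {x for x, q in p_A.items() if q > 0}


def odds(p_A, B, C):
    """Odds in favor of B over C within A; raises ZeroDivisionError if p(C, A) = 0."""
    return prob(p_A, B) / prob(p_A, C)


# Hypothetical choice distribution on the menu A = {a, b, c}: c is never chosen.
p_A = {"a": Fraction(2, 3), "b": Fraction(1, 3), "c": Fraction(0)}

print(support(p_A))                  # {'a', 'b'} (set order may vary)
print(odds(p_A, {"a"}, {"b"}))       # 2
print(odds(p_A, {"a", "c"}, {"b"}))  # 2: adding a never-chosen element leaves the odds unchanged
```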

Luce's model
The classical assumptions of Luce (1959) on p are: […]

Choice Axiom: Given any $A \in \mathcal{A}$, $p(a, A) = p(a, B)\, p(B, A)$ for all $B \subseteq A$ and all $a \in B$.

The latter axiom says that the probability of choosing an alternative a from the choice set A is the probability of first selecting B from A and then choosing a from B (provided a belongs to B). As observed by Luce, formally this assumption corresponds to the fact that $\{p_A : A \in \mathcal{A}\}$ is a conditional probability system in the sense of Renyi (1955, 1956).⁶ Remarkably, Luce's Choice Axiom is also equivalent to Odds Independence […]. This axiom says that the odds for a against b are independent of the other available alternatives.⁸

Continuity: Given any $x, y \in X$, if $\{x_n\}_{n \in \mathbb{N}}$ converges to x, then
$$p(x_n, y) > 0 \text{ for all } n \in \mathbb{N} \implies p(x, y) > 0.$$
This axiom has a natural interpretation: if, eventually, $x_n$ may always be chosen (rejected) over y, and $x_n$ converges to x, then x can be chosen (rejected) over y. Continuity is automatically satisfied under Full Support, as well as when X is countable and endowed with the discrete metric.

⁵ Formally, $x \mapsto p(x, A)$ for all $x \in X$ is the discrete density of $p_A$, but with an abuse of notation $p_A(\cdot)$ is identified with $p(\cdot, A)$; we also write $p_A(a)$ instead of $p_A(\{a\})$.
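As an illustration, the sketch below checks the Choice Axiom and Odds Independence by brute force on a three-element set, using exact rational arithmetic. For Odds Independence it assumes the full-support form p(a, A)/p(b, A) = p(a, b)/p(b, a). The example rules are hypothetical. A Luce rule with positive weights passes both checks. The rule `non_luce`, which is uniform except on the grand set, fails both.

```python
from fractions import Fraction
from itertools import combinations


def menus(X):
    """All non-empty subsets of the finite set X, as frozensets."""
    xs = sorted(X)
    return [frozenset(c) for k in range(1, len(xs) + 1) for c in combinations(xs, k)]


def luce_rule(v):
    """Luce's rule with positive weights v: p(a, A) = v(a) / sum of v(b) over b in A."""
    def p(a, A):
        if a not in A:
            return Fraction(0)
        return Fraction(v[a]) / sum(Fraction(v[b]) for b in A)
    return p


def p_set(p, B, A):
    """p(B, A) = sum of p(b, A) over b in B."""
    return sum((p(b, A) for b in B), Fraction(0))


def choice_axiom(p, X):
    """Choice Axiom: p(a, A) = p(a, B) * p(B, A) for all a in B ⊆ A ⊆ X."""
    return all(p(a, A) == p(a, B) * p_set(p, B, A)
               for A in menus(X) for B in menus(A) for a in B)


def odds_independence(p, X):
    """Full-support form: the odds p(a, A) / p(b, A) equal the binary odds of a against b."""
    return all(p(a, A) / p(b, A) == p(a, frozenset({a, b})) / p(b, frozenset({a, b}))
               for A in menus(X) for a in A for b in A if a != b)


def non_luce(a, A):
    """Hypothetical full-support rule: uniform everywhere except on {a, b, c}."""
    if a not in A:
        return Fraction(0)
    if A == frozenset({"a", "b", "c"}):
        return {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}[a]
    return Fraction(1, len(A))


X = {"a", "b", "c"}
p = luce_rule({"a": 1, "b": 2, "c": 3})
print(choice_axiom(p, X), odds_independence(p, X))                # True True
print(choice_axiom(non_luce, X), odds_independence(non_luce, X))  # False False
```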

Main result
The next result generalizes Luce's Theorem 1 by getting rid of the Full Support assumption.

Theorem 2
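The following conditions are equivalent for a random choice rule $p : \mathcal{A} \to \Delta(X)$:
(i) p satisfies the Choice Axiom;
(ii) there exist a function $\alpha : X \to \mathbb{R}$ and a rational choice correspondence $\Gamma : \mathcal{A} \to \mathcal{A}$ such that
$$p(a, A) = \begin{cases} \dfrac{e^{\alpha(a)}}{\sum_{b \in \Gamma(A)} e^{\alpha(b)}} & \text{if } a \in \Gamma(A), \\ 0 & \text{otherwise}, \end{cases} \qquad \text{(CA)}$$
for all $A \in \mathcal{A}$ and all $a \in A$.
In this case, Γ is unique and given by $\Gamma(A) = \operatorname{supp} p_A$ for all $A \in \mathcal{A}$.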
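Here is a minimal numerical sketch of the rule (CA) in Theorem 2. It assumes that Γ is the arg max of a hypothetical utility `u` and uses hypothetical biases `alpha`. Under these assumptions p(a, A) is proportional to e^{α(a)} on Γ(A) and zero elsewhere. The sketch checks the Choice Axiom up to floating-point tolerance. It also shows that a large α(c) is irrelevant when c is not optimal: α only breaks ties within Γ(A).

```python
import math
from itertools import combinations


def menus(X):
    """All non-empty subsets of the finite set X, as frozensets."""
    xs = sorted(X)
    return [frozenset(c) for k in range(1, len(xs) + 1) for c in combinations(xs, k)]


def ca_rule(alpha, u):
    """Rule (CA) with Γ(A) = arg max of u over A: weights exp(alpha) on Γ(A), zero elsewhere."""
    def gamma(A):
        best = max(u[x] for x in A)
        return frozenset(x for x in A if u[x] == best)

    def p(a, A):
        G = gamma(A)
        if a not in G:
            return 0.0
        return math.exp(alpha[a]) / sum(math.exp(alpha[b]) for b in G)

    return p, gamma


def choice_axiom(p, X, tol=1e-12):
    """Choice Axiom p(a, A) = p(a, B) p(B, A), up to floating-point tolerance."""
    return all(math.isclose(p(a, A), p(a, B) * sum(p(b, A) for b in B), abs_tol=tol)
               for A in menus(X) for B in menus(A) for a in B)


X = {"a", "b", "c", "d"}
u = {"a": 1, "b": 1, "c": 0, "d": 1}                        # hypothetical: a, b, d tie at the top
alpha = {"a": 0.0, "b": math.log(2), "c": 5.0, "d": -1.0}  # hypothetical tie-breaking biases
p, gamma = ca_rule(alpha, u)

print(choice_axiom(p, X))                       # True
print(p("c", frozenset({"a", "c"})))            # 0.0: c is suboptimal despite its large alpha
print(round(p("b", frozenset({"a", "b"})), 6))  # 0.666667
```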
Since Γ is a rational choice correspondence, the relation ≻ defined by
$$a \succ b \iff \Gamma(\{a, b\}) = \{a\}$$
is a strict preference (see Kreps, 1988), and the corresponding weak preference ≿ is given by $a \succsim b \iff a \in \Gamma(\{a, b\})$. When X is countable, ≿ is automatically represented by a utility function u, and so $\Gamma(A) = \arg\max_{a \in A} u(a)$ for all $A \in \mathcal{A}$. In general, some additional conditions are needed, as we show next. […] of strict preferences such that, for all a ∈ A ∈ 𝒜, […] In particular, a Random Preference Model is Lucean if p(·, A) has the Luce form (LM).
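A piece of terminology: the lexicographic composition of two binary relations ≻ and ≻′ is the binary relation ≻ • ≻′ defined by
$$a \mathrel{(\succ \bullet \succ')} b \iff a \succ b \text{ or } (a \sim b \text{ and } a \succ' b),$$
where a ∼ b means that neither a ≻ b nor b ≻ a. For instance, $>_1 \bullet >_2$ is the usual lexicographic preference on the Cartesian plane.⁹ We can now state the announced characterization.

⁹ Here $>_i$ is defined by $(a_1, a_2) >_i (b_1, b_2) \iff a_i > b_i$.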

Proposition 4 The following conditions are equivalent for a random choice rule $p : \mathcal{A} \to \Delta(X)$: (i) p satisfies the Choice Axiom; (ii) $\operatorname{supp} p : \mathcal{A} \to \mathcal{A}$ is a rational choice correspondence and […] (COND); (iii) there exist a strict preference ≻ on X and a Lucean Random Preference Model […]

This result presents two "deconstructions" of the Choice Axiom, both of which shed light on the second, tie-breaking stage in (CA).
Specifically, to interpret (ii), observe that WARP says that, if a can be chosen from A (i.e., $p_A(a) > 0$) and belongs to B ⊆ A, then it can also be chosen from B.
But this axiom is silent about the relation between the frequencies of choice in the two sets A and B. Formula (COND) requires them to be related by the Conditioning Axiom of Renyi (1955, 1956), a classical probabilistic consistency condition. In particular, (COND) per se is weaker than Luce's Choice Axiom, which imposes […]

To interpret (iii), note that the first-stage preference ≻ determines the support of p, while the second-stage Random Preference Model $\{\succ_\omega\}_{\omega \in (\Omega, \mathcal{F}, \Pr)}$ is the formal description of the Lucean tie-breaking among optimizers that we previously discussed.
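The lexicographic composition ≻ • ≻′ behind Proposition 4(iii) is easy to express when relations are Boolean functions. This Python sketch uses hypothetical function names. It follows the definition: a beats b if a ≻ b, or if a and b are ≻-indifferent (neither beats the other) and a ≻′ b. With >_1 and >_2 comparing coordinates, it reproduces the lexicographic order on the plane.

```python
def lex(succ, succ2):
    """Lexicographic composition: a ≻•≻′ b iff a ≻ b, or (a ∼ b and a ≻′ b),
    where a ∼ b means that neither a ≻ b nor b ≻ a."""
    def composed(a, b):
        if succ(a, b):
            return True
        indifferent = not succ(b, a)  # succ(a, b) is already known to be False here
        return indifferent and succ2(a, b)
    return composed


def first(x, y):
    """>_1: compare first coordinates."""
    return x[0] > y[0]


def second(x, y):
    """>_2: compare second coordinates."""
    return x[1] > y[1]


lexicographic = lex(first, second)
print(lexicographic((1, 5), (1, 3)))  # True: tie on the first coordinate, broken by the second
print(lexicographic((1, 5), (2, 0)))  # False: the first coordinate decides
```

In Proposition 4(iii), the first relation is the deterministic preference and the second is a realized ranking ≻_ω, so randomness only matters among ≻-indifferent alternatives.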
Finally, (iii) also says that, when X is countable, random choice rules that satisfy the Choice Axiom are random utility models (RUM), something not obvious from the definition.¹⁰ This opens the way to the study of general compositions of strict preferences and random utility models. Such a study, the object of current research, goes beyond the scope of this note.

¹⁰ See Section A.1 below for an independent RUM representation.

Remarks
[…] where u is a utility function that rationalizes Γ, $\{\epsilon_x\}_{x \in X}$ is a collection of independent errors with type I extreme value distribution, alternative-specific mean α(x), and common variance $\pi^2/6$, and λ > 0 is the noise level. In fact, […] Our analysis thus shows that, when noise vanishes, optimal choice is governed by u, and tie-breaking among optimal alternatives is stochastically driven by alternative-specific biases captured by α.
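The vanishing-noise claim can be checked numerically. The sketch assumes the standard specification U_x = u(x) + λ ε_x, with the Gumbel errors described above. Under that assumption the usual logit formula gives p_λ(a, A) proportional to exp(u(a)/λ + α(a)), because Euler's constant shifts every alternative's mean by the same amount. The utilities and biases are hypothetical. As λ shrinks, the logit probabilities approach the rule that gives zero to non-maximizers of u and splits the rest in proportion to e^{α}.

```python
import math


def logit(u, alpha, lam, A):
    """Logit probabilities on menu A: proportional to exp(u(a) / lam + alpha(a))."""
    scores = {a: u[a] / lam + alpha[a] for a in A}
    top = max(scores.values())  # subtract the maximum score to avoid overflow
    weights = {a: math.exp(s - top) for a, s in scores.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def ca_limit(u, alpha, A):
    """Rule (CA) with Γ(A) = arg max of u: weights exp(alpha) on the maximizers, zero elsewhere."""
    best = max(u[a] for a in A)
    G = [a for a in A if u[a] == best]
    total = sum(math.exp(alpha[a]) for a in G)
    return {a: (math.exp(alpha[a]) / total if a in G else 0.0) for a in A}


u = {"a": 1.0, "b": 1.0, "c": 0.5}             # hypothetical utilities: a and b are optimal
alpha = {"a": 0.0, "b": math.log(3), "c": 2.0}  # hypothetical alternative-specific biases
A = ["a", "b", "c"]

for lam in (1.0, 0.1, 0.01):
    print(lam, {a: round(q, 4) for a, q in logit(u, alpha, lam, A).items()})
print(ca_limit(u, alpha, A))  # a ≈ 0.25, b ≈ 0.75, c = 0.0
```

With λ = 1 the suboptimal c is actually the most likely choice because of its large bias. As λ decreases, its probability vanishes and α only splits mass between a and b.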

A similar interpretation arises when adopting the perspective of Matejka and McKay (2015) on the Multinomial Logit Model as the outcome of an optimal information acquisition problem. In this case, u is the true (initially unknown) payoff of alternatives, α captures a prior belief on payoffs held before engaging in experimentation, and λ is the cost of one unit of information.
Here our analysis shows that, when the cost of information vanishes, optimal alternatives are selected without error, and prior beliefs only govern the tie-breaking among such alternatives.

Related literature
The study of the relations between axiomatic decision theory and stochastic choice has recently been an active field of research. Horan (2020) and Ok and Tserenjigmid (2020) are the most recent works that we are aware of. The former also provides an insightful review of the state of the art. The latter expands on the main conceptual topic of this note: the relation between deterministic and probabilistic "rationality." Horan (2020) axiomatically unifies Luce (1956, 1959) in a random choice model of imperfect discrimination of the form […], where Γ is a utility correspondence based on α. Specifically, in Horan, Γ describes the degree of imperfection in the discrimination of the α-values of alternatives; by contrast, in this note α and Γ are independent, with the former breaking ties among the optimizers identified by the latter.
Horan also compares and provides alternative axiomatizations of several "Gen-

A Proofs and related analysis

A.1 Independent RUM representations
At the end of Section 3, we observed that Proposition 4(iii) implies that, when X is countable, random choice rules that satisfy the Choice Axiom are random utility models. Here we expand on this topic by providing an explicit independent random utility representation for the random choice rule (CA) of Theorem 2, which holds whenever Γ is the "arg max" of a utility function u : X → ℝ with discrete range.
Note that, while this requires u (X) to be countable, no assumption is made on the cardinality of X.
[…] for all A ∈ 𝒜 and all a ∈ A.
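Proof Let $\{V_x\}_{x \in X}$ be a collection of independent random variables such that
$$\Pr(V_a > V_b \text{ for all } b \in A \setminus \{a\}) = \frac{e^{\alpha(a)}}{\sum_{b \in A} e^{\alpha(b)}}$$
for all A ∈ 𝒜 and all a ∈ A, and assume that $-1 < V_x(\omega) < 1$ for all x ∈ X and all ω ∈ Ω.¹² Since u(X) is discrete, for each x ∈ X there exists a constant $r_x > 0$, which depends only on u(x), such that
$$u(x) > u(y) \implies u(x) - r_x > u(y) + r_y.$$
Define $U_x = u(x) + r_x V_x$ and note that $\{U_x\}_{x \in X}$ is a collection of independent random variables too. Now arbitrarily choose A ∈ 𝒜 and set $B = \arg\max_{z \in A} u(z)$ and $C = A \setminus B$.

Two cases have to be considered. If a ∈ B, then u(a) = u(b), and hence $r_a = r_b$, for all b ∈ B, so that $U_a(\omega) > U_b(\omega)$ if and only if $V_a(\omega) > V_b(\omega)$. But, for all c ∈ C = A ∖ B and all ω ∈ Ω,
$$U_a(\omega) = u(a) + r_a V_a(\omega) > u(a) - r_a > u(c) + r_c > u(c) + r_c V_c(\omega) = U_c(\omega).$$
Thus, $U_a(\omega) > U_c(\omega)$ for all c ∈ C and all ω ∈ Ω, so
$$\Pr(U_a > U_b \text{ for all } b \in A \setminus \{a\}) = \Pr(V_a > V_b \text{ for all } b \in B \setminus \{a\}) = \frac{e^{\alpha(a)}}{\sum_{b \in B} e^{\alpha(b)}}.$$
If instead c ∈ C, then, taking a ∈ B as above, $U_a(\omega) > U_c(\omega)$ for all ω ∈ Ω, so
$$\Pr(U_c > U_b \text{ for all } b \in A \setminus \{c\}) = 0,$$
as wanted.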
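The proof can also be simulated. The Python sketch below is one possible instance of the construction, under assumptions of ours rather than the paper's. It takes each V_x to be the strictly increasing transform g ↦ g/(1 + |g|) of an independent Gumbel draw with location α(x). The V_x therefore lie in (−1, 1), and by the Gumbel-max property they satisfy Luce's rule with weights e^{α}. The sketch uses a single constant r equal to one third of the smallest gap between utility levels, which works because the hypothetical u has finite range. The empirical frequencies should approximate the rule (CA) with Γ = arg max u.

```python
import math
import random


def make_sampler(u, alpha):
    """Sketch of U_x = u(x) + r * V_x, with independent V_x in (-1, 1) following
    Luce's rule with weights exp(alpha)."""
    levels = sorted(set(u.values()))
    gaps = [hi - lo for lo, hi in zip(levels, levels[1:])]
    r = min(gaps) / 3 if gaps else 1.0  # then u(x) > u(y) implies u(x) - r > u(y) + r

    def draw(A, rng):
        U = {}
        for x in A:
            e = rng.expovariate(1.0)
            while e == 0.0:             # avoid log(0)
                e = rng.expovariate(1.0)
            g = alpha[x] - math.log(e)  # Gumbel draw with location alpha(x)
            v = g / (1.0 + abs(g))      # strictly increasing map of the reals onto (-1, 1)
            U[x] = u[x] + r * v
        return max(A, key=U.get)        # alternative with the highest realized utility

    return draw


def frequencies(u, alpha, A, n=20000, seed=0):
    """Monte Carlo estimate of p(a, A) for the independent RUM above."""
    rng = random.Random(seed)
    draw = make_sampler(u, alpha)
    counts = {x: 0 for x in A}
    for _ in range(n):
        counts[draw(A, rng)] += 1
    return {x: c / n for x, c in counts.items()}


u = {"a": 1, "b": 1, "c": 0}                    # hypothetical utility with finite range
alpha = {"a": 0.0, "b": math.log(3), "c": 3.0}  # hypothetical biases
print(frequencies(u, alpha, ["a", "b", "c"]))   # close to a: 0.25, b: 0.75, c: 0.0
```

The suboptimal c is never selected, whatever its bias, because its realized utility always stays below the band in which a and b live.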

A.2 Proofs
A preference on X can be given in either strict form, ≻, or weak form, .
• In the first case, ≻ is required to be asymmetric and negatively transitive, and ≿ is defined by
$$a \succsim b \iff b \not\succ a. \qquad (1)$$
• In the second case, ≿ is required to be complete and transitive, and ≻ is defined by
$$a \succ b \iff b \not\succsim a. \qquad (2)$$
These approaches are well known to be interchangeable,¹³ and for this reason we call both ≻ and ≿ a weak order, with the understanding that they are related by the equivalent (1) or (2).

¹³ See Kreps (1988, p. 11).
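For a finite X, the interchangeability of the strict and weak forms can be checked mechanically. The following sketch uses hypothetical names and a hypothetical utility. It tests asymmetry and negative transitivity of ≻ and completeness and transitivity of the ≿ obtained via (1), a ≿ b iff not b ≻ a. It then checks that applying (2), a ≻ b iff not b ≿ a, returns the original ≻.

```python
from itertools import product


def is_strict_weak_order(X, succ):
    """≻ is asymmetric and negatively transitive (a ≻ c implies a ≻ b or b ≻ c)."""
    asymmetric = all(not (succ(a, b) and succ(b, a)) for a, b in product(X, repeat=2))
    neg_transitive = all(not succ(a, c) or succ(a, b) or succ(b, c)
                         for a, b, c in product(X, repeat=3))
    return asymmetric and neg_transitive


def is_weak_order(X, weak):
    """≿ is complete and transitive."""
    complete = all(weak(a, b) or weak(b, a) for a, b in product(X, repeat=2))
    transitive = all(not (weak(a, b) and weak(b, c)) or weak(a, c)
                     for a, b, c in product(X, repeat=3))
    return complete and transitive


def weak_from_strict(succ):
    """(1): a ≿ b iff not b ≻ a."""
    return lambda a, b: not succ(b, a)


def strict_from_weak(weak):
    """(2): a ≻ b iff not b ≿ a."""
    return lambda a, b: not weak(b, a)


X = ["a", "b", "c"]
u = {"a": 2, "b": 2, "c": 1}  # hypothetical utility; a and b are indifferent


def succ(a, b):
    """Strict preference induced by u."""
    return u[a] > u[b]


weak = weak_from_strict(succ)
round_trip = strict_from_weak(weak)

print(is_strict_weak_order(X, succ), is_weak_order(X, weak))                 # True True
print(all(round_trip(a, b) == succ(a, b) for a, b in product(X, repeat=2)))  # True
```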
Lemma 6 Let p : A → ∆ (X) be a random choice rule. The following conditions are equivalent: (ii) p satisfies the Choice Axiom; (iv) p satisfies Odds Independence;

Moreover, in this case, p satisfies Positivity if and only if it satisfies Full Support.
Proof (i) implies (ii). Choose as C the singleton {a} appearing in the statement of the axiom. […] (iii) implies (iv). Let A ∈ 𝒜 and arbitrarily choose a, b ∈ A such that p(a, A)/p(b, A) = 0/0. By (iii), three cases have to be considered: […]

[…]
(i) p is a random choice rule that satisfies the Choice Axiom;
(ii) p is a random choice rule such that $\sigma_p$ is a rational choice correspondence, and […] for all B ⊆ H ∈ 𝒜 and all $a \in \sigma_p(H) \cap B$;
(iii) there exist a function $v : X \to (0, \infty)$ and a rational choice correspondence $\Gamma : \mathcal{A} \to \mathcal{A}$ such that
$$p(a, A) = \begin{cases} \dfrac{v(a)}{\sum_{b \in \Gamma(A)} v(b)} & \text{if } a \in \Gamma(A), \\ 0 & \text{otherwise}, \end{cases} \qquad (4)$$
for all A ∈ 𝒜 and all a ∈ A.
In this case, Γ is unique and coincides with $\sigma_p$.
Proof (iii) implies (i). Let p be given by (4), with Γ a rational choice correspondence and v : X → (0, ∞). It is easy to check that p is a well-defined random choice rule, that the support correspondence supp p coincides with Γ, and that
$$p(Y, A) = \frac{\sum_{y \in Y \cap \Gamma(A)} v(y)}{\sum_{b \in \Gamma(A)} v(b)}$$
for all Y ⊆ X and all A ∈ 𝒜.
Let A, B ∈ 𝒜 be such that B ⊆ A and a ∈ B. We have two cases:
• If a ∈ Γ(B), then a ∈ Γ(A) and p(a, B) […]
• Else a ∉ Γ(B), and since a ∈ B, it must be the case that a ∉ Γ(A), so […]
These cases prove that p satisfies the Choice Axiom.
(ii) implies (iii). Let p : 𝒜 → Δ(X) be a random choice rule such that $\sigma_p$ is a rational choice correspondence and that satisfies (3). Since $\sigma_p$ is a rational choice correspondence, the relation $a \succsim b \iff a \in \sigma_p(\{a, b\})$ is a weak order on X, and its symmetric part ∼ is an equivalence relation such that […] Moreover, by Theorem 3 of Arrow (1959), it follows that
$$\sigma_p(A) = \{a \in A : a \succsim b \text{ for all } b \in A\} \quad \text{for all } A \in \mathcal{A}; \qquad (5)$$
in particular, all elements of $\sigma_p(A)$ are equivalent with respect to ∼, and
$$\sigma_p(S) = S \qquad (6)$$
for all S ∈ 𝒜 consisting of equivalent elements.
Let $\{X_i : i \in I\}$ be the family of all equivalence classes of ∼ in X. Choose $a_i \in X_i$ for all i ∈ I. For each x ∈ X, there exists one and only one i ∈ I such that $x \in X_i$ […] Since $x \sim a_i$, then $r(x, a_i) \in (0, \infty)$, and so $v : X \to (0, \infty)$ is well defined. Consider any x ∼ y in X and any S ∈ 𝒜 consisting of equivalent elements and containing x and y. Notice that, by (6), $\sigma_p(S) = S$; hence $x \in \sigma_p(S) \cap \{x, y\}$, and then, by (3) with […] We are ready to conclude our proof, that is, to show that (4) holds with $\Gamma = \sigma_p$.
Let a ∈ X and A ∈ 𝒜. If $a \notin \sigma_p(A)$, then p(a, A) = 0 because $\sigma_p(A)$ is the support of $p_A$. Else $a \in \sigma_p(A)$, and, by (5), all the elements in $\sigma_p(A)$ are equivalent with respect to ∼ and therefore equivalent to some $a_i$ with i ∈ I. It follows that $\sigma_p(A) \cup \{a_i\} \in \mathcal{A}$ and $\sigma_p(A) \cup \{a_i\} \subseteq X_i$. By (6), we […] applying (8) to the pairs $(x, y) = (a, a_i)$ and […], as wanted. As for the uniqueness part, we already observed that (iii) implies $\Gamma = \sigma_p$.
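The construction in the proof of (ii) implies (iii) suggests how to recover (Γ, v) from p on a finite X. The sketch below is our reading of it, not the paper's code. Γ is taken to be the support correspondence. Each ∼-class gets a representative a_i, here the first alternative in sorted order. v(x) is taken to be the binary odds p(x, a_i)/p(a_i, x) of x against its representative. Starting from hypothetical data of the form (4), the rebuilt rule coincides with the original one. v itself is pinned down only up to a positive factor on each class.

```python
from fractions import Fraction
from itertools import combinations


def menus(X):
    """All non-empty subsets of the finite set X, as frozensets."""
    xs = sorted(X)
    return [frozenset(c) for k in range(1, len(xs) + 1) for c in combinations(xs, k)]


def rule_4(v, gamma):
    """Form (4): p(a, A) = v(a) / sum of v over Γ(A) if a is in Γ(A), else 0."""
    def p(a, A):
        G = gamma(A)
        if a not in G:
            return Fraction(0)
        return Fraction(v[a]) / sum(Fraction(v[b]) for b in G)
    return p


def recover(p, X):
    """Rebuild (Γ, v) from p: Γ(A) = supp p_A; v(x) = odds of x against the representative
    of its class (x ∼ y iff both are chosen with positive probability from {x, y})."""
    reps, v = [], {}
    for x in sorted(X):
        for rep in reps:
            pair = frozenset({x, rep})
            if p(x, pair) > 0 and p(rep, pair) > 0:  # x ∼ rep
                v[x] = p(x, pair) / p(rep, pair)
                break
        else:  # no equivalent representative yet: x starts a new class
            reps.append(x)
            v[x] = Fraction(1)

    def gamma(A):
        return frozenset(a for a in A if p(a, A) > 0)

    return gamma, v


# Hypothetical data: Γ is the arg max of u and v_true are the tie-breaking weights.
X = {"a", "b", "c", "d"}
u = {"a": 1, "b": 1, "c": 0, "d": 0}
v_true = {"a": 1, "b": 3, "c": 2, "d": 5}


def argmax_u(A):
    best = max(u[x] for x in A)
    return frozenset(x for x in A if u[x] == best)


p = rule_4(v_true, argmax_u)
gamma, v = recover(p, X)
q = rule_4(v, gamma)
print(v)  # {'a': Fraction(1, 1), 'b': Fraction(3, 1), 'c': Fraction(1, 1), 'd': Fraction(5, 2)}
print(all(q(a, A) == p(a, A) for A in menus(X) for a in A))  # True
```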
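Theorem 2 immediately follows.

The set W of all weak orders on X is endowed with the σ-algebra 𝒲 generated by the sets of the form $\{\succ \in W : a \succ b\}$, with a, b ∈ X. Given ≻ and ≻′ in W, the lexicographic composition ≻ • ≻′ of ≻ and ≻′ is routinely seen to be a weak order too (see, e.g., Fishburn, 1974). Hence, for each ≻ ∈ W, the map $f_\succ : W \to W$, $\succ' \mapsto \succ \bullet \succ'$, is well defined, and it is measurable: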
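Proof Arbitrarily choose a, b ∈ X, and study
$$f_\succ^{-1}(\{\succ'' \in W : a \succ'' b\}) = \{\succ' \in W : a \mathrel{(\succ \bullet \succ')} b\}:$$
• if b ≻ a, then a ≻ • ≻′ b for no ≻′ in W, that is, the set above is ∅;
• else if a ≻ b, then a ≻ • ≻′ b for all ≻′ in W, that is, the set above is W;
• else, it must be the case that a ∼ b, and a ≻ • ≻′ b if and only if a ≻′ b, that is, the set above is $\{\succ' \in W : a \succ' b\}$.
Therefore $f_\succ$ is measurable, since the counterimage of a class of generators of 𝒲 is contained in 𝒲.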
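A Random Preference Model (RPM) is a measurable function $P : (\Omega, \mathcal{F}, \Pr) \to (W, \mathcal{W})$. It is common practice to write $\succ_\omega$ instead of $P(\omega)$. The Random Selector (RS) p based on the RPM P is given by
$$p(a, A) = \Pr(\{\omega \in \Omega : a \succ_\omega b \text{ for all } b \in A \setminus \{a\}\})$$
for all a ∈ A ∈ 𝒜. The latter is well defined because
$$\{\omega \in \Omega : a \succ_\omega b \text{ for all } b \in A \setminus \{a\}\} = \bigcap_{b \in A \setminus \{a\}} P^{-1}(\{\succ \in W : a \succ b\}) \in \mathcal{F},$$
since P is measurable. Note, however, that depending on P, the RS p might not define a random choice rule. For instance, if P is constantly equal to the trivial weak order according to which all alternatives are indifferent, then p(a, A) = 0 for all a ∈ A ∈ 𝒜 such that |A| ≥ 2.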
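The random selector is straightforward to compute when Ω is finite. In this sketch (hypothetical names) a Random Preference Model is a list of (probability, strict relation) pairs. A mixture of two linear orders yields a genuine random choice rule. The trivial weak order, under which all alternatives are indifferent, gives zero probability to every alternative in any menu with two or more elements.

```python
from fractions import Fraction


def from_ranking(order):
    """Strict linear order given as a sequence, best first."""
    rank = {x: i for i, x in enumerate(order)}
    return lambda a, b: rank[a] < rank[b]


def random_selector(rpm, a, A):
    """p(a, A) = Pr{ω : a ≻_ω b for every other b in A}, for a finite RPM given as
    a list of (probability, strict relation) pairs."""
    return sum((pr for pr, succ in rpm if all(succ(a, b) for b in A if b != a)),
               Fraction(0))


half = Fraction(1, 2)
rpm = [(half, from_ranking("abc")), (half, from_ranking("bac"))]
trivial = [(Fraction(1), lambda a, b: False)]  # every pair of alternatives is indifferent

A = {"a", "b", "c"}
print([random_selector(rpm, x, A) for x in "abc"])      # [Fraction(1, 2), Fraction(1, 2), Fraction(0, 1)]
print([random_selector(trivial, x, A) for x in "abc"])  # all zero: not a random choice rule
```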
The proof of Proposition 4 hinges on the study of the composition of the functions f ≻ and P .
First, such a composition defines a random preference model because, being a composition of measurable functions, it is measurable.
Second, the random selector based on the random preference model f ≻ • P is a lexicographic version of P , that first selects the maximizers of ≻, then breaks the ties according to P .
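In order to state these results formally, we denote by $\Gamma = \Gamma_\succ$ the rational choice correspondence induced by ≻.¹⁴

Lemma 10 Let ≻ be a weak order, $P = \{\succ_\omega\}_{\omega \in \Omega}$ be an RPM, and p be the RS based on P. Then $f_\succ \circ P = \{\succ \bullet \succ_\omega\}_{\omega \in \Omega}$ is an RPM and the RS based on it is given by
$$p_\succ(a, A) = \begin{cases} p(a, \Gamma(A)) & \text{if } a \in \Gamma(A), \\ 0 & \text{otherwise}, \end{cases} \qquad (9)$$
for all a ∈ A ∈ 𝒜.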
¹⁴ $\Gamma_\succ(A) = \{a \in A : a \succsim b \text{ for all } b \in A\}$; also recall that a ≿ b if and only if a ⊀ b.
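Lemma 10 can be verified exactly on a small example. The sketch below assumes a finite Lucean Random Preference Model given by the Plackett-Luce distribution over linear orders. This is a standard model whose top-choice probabilities in every menu follow Luce's rule. The weights `w` and the utility `u` inducing ≻ are hypothetical. For every menu, the sketch compares the random selector of the composed model f_≻ ∘ P with the right-hand side of (9): p(a, Γ(A)) on Γ(A) and zero elsewhere.

```python
from fractions import Fraction
from itertools import combinations, permutations


def menus(X):
    """All non-empty subsets of the finite set X, as frozensets."""
    xs = sorted(X)
    return [frozenset(c) for k in range(1, len(xs) + 1) for c in combinations(xs, k)]


def plackett_luce(w):
    """Finite Lucean RPM: every linear order (best first) with its Plackett-Luce probability."""
    rpm = []
    for order in permutations(sorted(w)):
        prob = Fraction(1)
        for k, x in enumerate(order):
            prob *= Fraction(w[x]) / sum(Fraction(w[y]) for y in order[k:])
        rpm.append((prob, {x: i for i, x in enumerate(order)}))
    return rpm


def selector(rpm, beats, a, A):
    """Random selector: probability of the orders under which a beats every other b in A."""
    return sum((pr for pr, rank in rpm if all(beats(rank, a, b) for b in A if b != a)),
               Fraction(0))


def plain(rank, a, b):
    """The realized ranking ≻_ω itself."""
    return rank[a] < rank[b]


def lex_with(u):
    """≻ • ≻_ω, where ≻ is induced by u: higher utility wins, ties follow ω."""
    return lambda rank, a, b: u[a] > u[b] or (u[a] == u[b] and rank[a] < rank[b])


def check_lemma10(w, u):
    """Compare the RS of the composed RPM with formula (9) on every menu."""
    rpm, X = plackett_luce(w), set(w)
    for A in menus(X):
        best = max(u[x] for x in A)
        G = frozenset(x for x in A if u[x] == best)  # Γ(A)
        for a in A:
            lhs = selector(rpm, lex_with(u), a, A)
            rhs = selector(rpm, plain, a, G) if a in G else Fraction(0)
            if lhs != rhs:
                return False
    return True


w = {"a": 1, "b": 2, "c": 3, "d": 4}  # hypothetical Luce weights
u = {"a": 1, "b": 0, "c": 1, "d": 1}  # hypothetical utility inducing ≻
print(check_lemma10(w, u))  # True
```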
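Proof We already observed that $f_\succ \circ P = \{\succ \bullet \succ_\omega\}_{\omega \in \Omega}$ is an RPM. By definition of the random selector based on an RPM,
$$p_\succ(a, A) = \Pr(\{\omega \in \Omega : a \mathrel{(\succ \bullet \succ_\omega)} b \text{ for all } b \in A \setminus \{a\}\}).$$
We have to verify that this formula coincides with (9) for all a ∈ A ∈ 𝒜.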
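For each A ∈ 𝒜 and each a ∈ Γ(A), set
$$J = J_A(a) = \{\omega \in \Omega : a \succ_\omega c \text{ for all } c \in \Gamma(A) \setminus \{a\}\}, \qquad K = K_A(a) = \{\omega \in \Omega : a \mathrel{(\succ \bullet \succ_\omega)} b \text{ for all } b \in A \setminus \{a\}\}.$$
If ω ∈ J, then $a \succ_\omega c$ for all c ∈ Γ(A) ∖ {a}; take any b ∈ A ∖ {a}:
• if b ∉ Γ(A), then a ≻ b, and hence a ≻ • ≻_ω b;
• else b ∈ Γ(A), so a ∼ b and a ≻_ω b, and again a ≻ • ≻_ω b.
Then a ≻ • ≻_ω b for all b ∈ A ∖ {a}; thus ω ∈ K.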
Conversely, if ω ∈ K, then a ≻ • ≻_ω b for all b ∈ A ∖ {a}. Thus, for all b ∈ Γ(A) ∖ {a}, since a ∼ b, it must be the case that a ≻_ω b. Therefore a ≻_ω b for all b ∈ Γ(A) ∖ {a}, that is, ω ∈ J.
Summing up, for all A ∈ 𝒜 and a ∈ Γ(A),
$$p(a, \Gamma(A)) = \Pr(J_A(a)) = \Pr(K_A(a)) = p_\succ(a, A),$$
and the first line of (9) is true.
Let A ∈ 𝒜 and a ∉ Γ(A); then there exists $\bar{b} \in A \setminus \{a\}$ such that $a \prec \bar{b}$, and for no ω does $a \mathrel{(\succ \bullet \succ_\omega)} \bar{b}$ hold, that is,