Random utility models with ordered types and domains

Random utility models in which heterogeneity of preferences is modeled by means of an ordered collection of utilities, or types, provide a powerful framework for understanding a variety of economic behaviors. This paper studies the micro-foundations of ordered random utility models with the objective of meeting empirical requirements. This is done by working with arbitrary collections of ordered menus of alternatives, and by making no parametric assumptions about the type distribution. The model is characterized by a simple monotonicity axiom. Goodness-of-fit measures are proposed, with proof provided of the strong consistency of extremum estimators defined upon them. A statistical test for the model is also provided. © 2023 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license.


Introduction
Settings in which a collection of ordered utilities describes a specific behavioral trait, with higher utilities selecting higher alternatives, are common in economics. For example, consider the case of decisions under risk; utilities can be ordered by risk aversion, with higher utilities leading to safer lotteries. In such settings, heterogeneity is often modeled as a probability distribution over the set of utilities, i.e., as an ordered random utility model. Recently, Apesteguia et al. (2017) introduced ordered models in the standard theoretical setting of stochastic choice. They derive a set of properties that characterize the single-crossing random utility model (SCRUM), in which all alternatives and utilities can be ordered (à la single-crossing), and a mixture over the utilities rationalizes the stochastic choices in all possible menus of alternatives.
Bringing theoretical models to empirical applications is sometimes challenging. The practitioner often faces a small number of menus of alternatives, instead of the standard theoretical assumption of having choice data for every menu. This immediately highlights the need for properties that are more suitable for restricted domains of menus. Furthermore, in the present setting with ordered models, some families of utilities may not universally produce ordered choices, but they may do so in the restricted domain of observed menus. In this paper, we provide a theoretical study of ordered random utility models for arbitrary domains of menus.
We consider the situation where an analyst has a dataset describing the observed choice frequencies over an arbitrary collection of menus of alternatives. The dataset represents either the aggregate choices of a heterogeneous population or the repeated choices of an individual subject to intrapersonal variation. The analyst has in mind a specific collection of ordered utilities $\mathcal{T}$, maybe due to its analytical convenience or empirical prominence in the literature, such as CRRA expected utilities in the treatment of risk. The analyst then wonders whether a distribution over such an ordered collection of utilities accounts for the data, that is, whether there is a random utility model over $\mathcal{T}$, which we refer to as T-RUM, that can rationalize the data.
The first contribution of this paper is to provide a characterization of the choice frequencies that can be generated by a T-RUM. We use the following property, which we call T-Monotonicity. Suppose that the types that lead to alternatives $B_1$ in menu $A_1$ are a subset of the types resulting in the choice of alternatives $B_2$ in menu $A_2$. In this case, T-Monotonicity states that the cumulative choice frequency of alternatives in $B_1$ within menu $A_1$ must be no larger than the cumulative choice frequency of alternatives in $B_2$ within menu $A_2$. Interestingly, we show that, when the menus are ordered, T-Monotonicity is both necessary and sufficient.
The proof of the characterization theorem is fully constructive, and hence, when the property is satisfied by the data, we can determine the underlying type distribution that explains all choices. Moreover, we show that the model is uniquely identified on a subset of the set of types that we characterize. To ensure the applicability of our findings, we present a simple linear algorithm for the analysis of T-Monotonicity. We then discuss a generalization of the ordered domain assumption; we note that our analysis goes through even if the alternatives within a menu are not ordered, provided that every alternative in every menu is chosen by an interval of types.
The first part of the paper concludes with an extension of the model, where we allow for the possibility of observing choices of dominated alternatives, that is, alternatives that are never maximal and are hence predicted to have zero choice probabilities in the T-RUM. Our approach is to minimally extend our main model, by incorporating the possibility of mistakes in decision-making.
Given our interest in connecting theory with empirics, we then present some results of econometric interest dealing with finite data. First, observed choice frequencies, due to sampling issues, may violate T-Monotonicity and, consequently, we introduce choice-based goodness-of-fit measures. These are based on perturbing the underlying distribution of types. Our approach considers the smallest perturbation necessary to account for all observed choice frequencies, and implicitly defines a class of extremum estimators. Most importantly, subsequent analysis shows that any estimator within this class is strongly consistent. That is, as the number of observations per choice problem increases, the estimator converges to the true distribution of probabilities over the types. Second, we show how the model can be statistically tested. We exploit the i.i.d. nature of T-RUM, which enables us to interpret the model as a collection of independent multinomial distributions with parameters linked through the distribution of types. We then propose an aggregated Pearson statistic to statistically test the model.
In the Online Appendix we provide a detailed and exhaustive guide for implementing the model, and illustrate each step in our guide using an existing experimental dataset that involves decision problems over lotteries.

Related literature
We start by elaborating on the main differences between this paper and Apesteguia et al. (2017). As mentioned in the introduction, there are two fundamental differences between the two settings: (1) in this paper we use an arbitrary domain of menus, which allows us to work with any dataset and with any collection of utilities that produce ordered choices, and (2) instead of deriving the collection of utilities from observed choices, we consider a fixed collection of utilities. The first point helps us bring the model to data, at the cost described in the second point. An immediate consequence of these differences is that the characterizing properties of SCRUM cannot be used in the present setting. This is the case because these properties, formulated over the universal domain, apply to pairs of menus related by set inclusion. With arbitrary domains, data may comprise any collection of menus of alternatives, such as disjoint menus, and the properties would be vacuously satisfied. Conversely, our property of T-Monotonicity incorporates insights from the properties of Regularity and Centrality in the treatment of SCRUMs. However, T-Monotonicity is only applicable insofar as we are using a specific collection of utilities $\mathcal{T}$, and hence it is not applicable to the setting of SCRUMs.
In a recent theoretical contribution, Filiz-Ozbay and Masatlioglu (2023) study a random model using an ordered collection of choice functions rather than utilities and thus, importantly, provide the theoretical foundations for what can be considered a model of stochastic, boundedly rational, ordered choice. The main difference between their paper and ours is that we work on the practical implementation of random utility models.
A handful of recent empirical papers focus on exploiting the single-crossing condition. In a risk environment, Barseghyan et al. (2021) combine a family of expected utilities ordered à la SCRUM with models of limited attention. They show that with sufficiently rich variation in the menus of lotteries, point-identification of the parameters for risk and attention can be achieved, and provide a simple method to compute a likelihood-based estimator. Our paper contributes by providing foundational properties to the ordered models incorporated in this application, and by showing that a large class of consistent estimators, including maximum-likelihood, are readily available to the practitioner even when data are scarce. Our theoretical framework sets the basis for the consideration of other behavioral variables, such as attention, when data are scarce. For example, one could start by assuming a given ordered family of utilities and a given attention model and then reformulate the property of T-Monotonicity to account for attention considerations. Chiappori et al. (2019) also impose the single-crossing condition on individual risk preferences in a parimutuel horse-racing setting. The authors first derive the equilibrium of the model and give necessary conditions on the data implied at equilibrium and, when the data are rich, they ultimately identify the model. Our analysis shares with this paper the idea of using the critical type that determines the jump from one choice to the next one in the order. We show that this logic can be formalized as a property, T-Monotonicity, that is not only necessary but also sufficient for ordered-choice rationalizability. Our results then pave the way for characterizing stochastic data that emerge from equilibrium conditions.
A series of applied papers have implemented parametric versions of the random utility model, over an ordered collection of utilities, to estimate a specific behavioral trait, most frequently, risk aversion. Barsky et al. (1997) is one of the first examples of the use of this methodology, where the ordered structure of a menu involving lotteries is exploited to obtain population estimates of risk aversion and perform covariate analysis. Cohen and Einav (2007) use data on auto insurance contracts, showing that any given probability of accident leads to an ordered menu of deductibles and premiums, thereby facilitating the estimation of risk aversion. Andersson et al. (2020) use menus involving two states with fixed probabilities to show that choice variability is determined by cognitive ability rather than risk aversion. Our paper contributes to this applied literature by providing foundations for a general version of the model.
The econometrics literature on the non-parametric identification of ordered discrete choice models is also of relevance here (see Cunha et al., 2007, and references cited therein, and Greene and Hensher, 2010, for a survey). The papers in this literature focus on identification relying on the existence of a relationship between the probability of choice of any alternative and the mass of types for which the alternative is optimal, which, given the structure, takes the form of an interval. However, there are no axiomatic exercises of any kind in this literature. Hence, the novelty of our paper is to bring the ordered choice logic from the applied and econometrics literature to a revealed preference setting that imposes minimal requirements on the data structure, to provide a novel and easily testable property, T-Monotonicity, and to show that it fully axiomatizes the ordered random type model.

Ordered random utility models and ordered domains
Let $X$ be the set of all alternatives. We fix an ordered collection of utilities $\{U_t\}_{t \in \mathcal{T}}$, where $\mathcal{T} = \{1, 2, \dots, T\}$. Given the ordinal nature of all our results, we could equivalently work with the corresponding collection of ordinal preferences. A random utility model over $\mathcal{T}$, or T-RUM, is defined by a probability distribution $\psi$ over $\mathcal{T}$, which describes the probability mass with which each type is realized. In each menu of alternatives, one of the utility functions is independently realized according to $\psi$ and maximized, thus determining the choice. Menus are finite subsets of alternatives. We work with an arbitrary collection of menus, $\{A_j\}_{j \in \mathcal{J}}$, where $\mathcal{J} = \{1, 2, \dots, J\}$. Given the T-RUM with distribution $\psi$, the probability of choosing alternative $x$ in menu $j$ is $\psi(\mathcal{T}(x, j))$, where $\mathcal{T}(x, j)$ denotes the set of types for which alternative $x$ is the utility maximizer in menu $j$.
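As an illustration of the definition above, the following minimal sketch computes the sets of types choosing each alternative and the implied T-RUM choice probabilities. The function names, the dictionary representation of utilities, and the numerical values are hypothetical and chosen only for this example; the sketch also assumes that no type is indifferent between two alternatives of a menu.

```python
from fractions import Fraction


def maximizer_types(utilities, menu):
    """Return {x: T(x, j)}, the set of types (numbered 1..T) that choose x in the menu."""
    types_by_alt = {x: set() for x in menu}
    for t, u in enumerate(utilities, start=1):
        best = max(menu, key=lambda x: u[x])  # assumes no utility ties within a menu
        types_by_alt[best].add(t)
    return types_by_alt


def choice_probabilities(utilities, psi, menu):
    """Probability of each x in the menu: psi(T(x, j)), the mass of the types choosing x."""
    return {
        x: sum((psi[t - 1] for t in types), Fraction(0))
        for x, types in maximizer_types(utilities, menu).items()
    }


# Hypothetical data: three ordered types over alternatives a, b, c.
utilities = [
    {"a": 3, "b": 2, "c": 1},  # type 1
    {"a": 1, "b": 3, "c": 2},  # type 2
    {"a": 1, "b": 2, "c": 3},  # type 3
]
psi = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

full_menu = choice_probabilities(utilities, psi, ["a", "b", "c"])
small_menu = choice_probabilities(utilities, psi, ["a", "c"])
```

In this example, removing b from the menu moves the mass of type 2 to c, so that a and c are each chosen with probability 1/2.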
Ordered collections of utilities induce an order over some pairs of alternatives. We say that alternative $x_h$ is higher than alternative $x_l$ whenever there exists $t^* \in \mathcal{T} \setminus \{T\}$ such that $U_t(x_l) > U_t(x_h) \Leftrightarrow t \le t^*$. In this case we write $x_l \prec x_h$, and, as usual, $x \preceq y$ whenever $x \prec y$ or $x = y$. In words, $x_h$ is higher than $x_l$ if $x_h$ is the preferred alternative of high types (with at least type $T$ expressing this preference) and $x_l$ is the preferred alternative of low types (with at least type 1 expressing this opposite preference). For instance, types can be ordered by risk aversion or altruism, and hence the notion of a higher alternative corresponds either to the notion of a safer lottery or to that of a more altruistic distribution. We now introduce the only relevant assumption in the paper, namely that every menu in the domain is ordered, in the sense that its maximal alternatives can be ordered by $\preceq$.

Domain of Ordered Menus.
For every $j \in \mathcal{J}$, $\preceq$ is complete over $\{x : \mathcal{T}(x, j) \neq \emptyset\}$.

Domains composed of ordered menus appear naturally when studying a particular behavioral trait. The following three economic applications illustrate this point:
(1) Expected Utility. Let $\{EU_t\}_{t \in \mathcal{T}}$ be a collection of expected utilities ordered by increasing risk aversion, i.e., by increasing concavity of their respective monetary utilities. Thus, in this case, the induced relation $\prec$ represents the notion of a safer lottery. Classical results on second order stochastic dominance, such as Hammond (1974), guarantee that standard domains of menus of lotteries used in the study of risk aversion are ordered.
(2) Quasi-Linear Utility. Consider pairs of the form $(q, w)$, with $q$ in an ordered set $Q$ and $w$ representing money. For example, $q$ may describe the quality of a product, and $w$ the income after salary. Consider a collection of quasi-linear utilities, $\{QL_t\}_{t \in \mathcal{T}}$, with $QL_t(q, w) = v_t(q) + w$, such that the family $\{v_t\}_{t \in \mathcal{T}}$ satisfies the well-known increasing differences condition (see, e.g., Topkis (1978) or Milgrom and Shannon (1994)). In this context, $\prec$ represents the notion of alternatives with higher quality, and it is immediate to see that every domain of menus of such objects is ordered.
(3) Cobb-Douglas Utility. Consider a collection of Cobb-Douglas utilities on two-dimensional bundles, i.e., $\{CD_t\}_{t \in \mathcal{T}}$. The induced relation trivially corresponds to the idea that $(x_1, x_2) \prec (y_1, y_2)$ if and only if $x_1 > y_1$ and $x_2 < y_2$. It is again trivial to see that, in this setting, every domain of menus of bundles is ordered.
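The "higher than" relation and the ordered-menu requirement can be checked mechanically. The following is a minimal sketch, assuming utilities are stored as one dictionary per type (types numbered 1..T in list order) and that no type is indifferent between two alternatives of a menu; the function names and both collections of utilities are hypothetical.

```python
def is_lower(utilities, x_l, x_h):
    """True if x_l is lower than x_h: for some t* in {1, ..., T-1},
    U_t(x_l) > U_t(x_h) holds exactly for the types t <= t*."""
    prefers_low = [u[x_l] > u[x_h] for u in utilities]  # one flag per type 1..T
    t_star = sum(prefers_low)
    T = len(utilities)
    single_crossing = prefers_low == [True] * t_star + [False] * (T - t_star)
    return 1 <= t_star <= T - 1 and single_crossing


def is_ordered_menu(utilities, menu):
    """True if every pair of maximal alternatives of the menu is comparable."""
    maximal = {max(menu, key=lambda x: u[x]) for u in utilities}
    return all(
        x == y or is_lower(utilities, x, y) or is_lower(utilities, y, x)
        for x in maximal
        for y in maximal
    )


# Hypothetical collection in which a, b, c are ordered.
ordered = [
    {"a": 3, "b": 2, "c": 1},  # type 1
    {"a": 1, "b": 3, "c": 2},  # type 2
    {"a": 1, "b": 2, "c": 3},  # type 3
]
# Hypothetical collection in which x and y cross twice (types 1 and 3 prefer x).
crossing = [
    {"x": 3, "y": 2, "z": 1},  # type 1
    {"x": 2, "y": 3, "z": 1},  # type 2
    {"x": 2, "y": 1, "z": 3},  # type 3
]

ordered_ok = is_ordered_menu(ordered, ["a", "b", "c"])
crossing_ok = is_ordered_menu(crossing, ["x", "y"])
```

Here `ordered_ok` is true, while `crossing_ok` is false because neither x nor y is lower than the other.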

A characterization of T-RUMs
Suppose that the analyst has access to a stochastic choice function $p$ over an ordered domain of menus $\mathcal{J}$. Formally, $p$ is a map from $X \times \mathcal{J}$ to $[0, 1]$ such that, for every $j \in \mathcal{J}$, $p(x, j) > 0$ implies that $x \in A_j$, and $\sum_{x \in A_j} p(x, j) = 1$. Consider the following property.
T-Monotonicity: For every pair of menus $j, j' \in \mathcal{J}$ and every $B \subseteq A_j$, $B' \subseteq A_{j'}$, $\bigcup_{x \in B} \mathcal{T}(x, j) \subseteq \bigcup_{x \in B'} \mathcal{T}(x, j')$ implies $\sum_{x \in B} p(x, j) \le \sum_{x \in B'} p(x, j')$.

T-Monotonicity captures the intuition that more support must lead to a larger choice probability. Whenever the set of types leading to alternatives $B$ in menu $j$ is contained in the set of types leading to alternatives $B'$ in menu $j'$, the cumulated probability of alternatives in $B$ must be no larger than that of the alternatives in $B'$. T-Monotonicity incorporates the well-known axiom of Regularity in the treatment of classical RUMs, which applies to menus related by set inclusion. In our setting, notice that if a menu is modified by the incorporation of new alternatives, the choice probability of any existing alternative cannot increase because the set of types for which it is maximal cannot expand.
An alternative way of interpreting T-Monotonicity uses the largest type for which an alternative is maximal in a menu: if such largest type for alternative $x$ in menu $j$ is below the largest type for alternative $y$ in menu $j'$, the cumulated choice probability of alternatives in $j$ below $x$ must be no larger than that of the alternatives in $j'$ below $y$. Under this version, it is immediate to see how our property also incorporates the logic of ordered RUMs described by the Centrality axiom of SCRUMs. Basically, the above reasoning implies that when considering three alternatives such that $x \prec y \prec z$, the elimination of alternative $z$ from the menu does not modify the critical type of alternative $x$, and hence does not change its choice probability either.

Theorem 1. In a domain of ordered menus, $p$ satisfies T-Monotonicity if, and only if, $p$ is a T-RUM.
Theorem 1 responds to the situation where an analyst has a dataset $p$, and asks whether a distribution over a given ordered collection of utilities $\mathcal{T}$ accounts for the data. Theorem 1 gives an exact answer: dataset $p$ is rationalized by $\mathcal{T}$ à la RUM if and only if the data satisfy T-Monotonicity.
The following discussion is instrumental in understanding the strategy of the proof, and hence the result. First, we show that the ordered structure of menus guarantees that choices are ordered, that is, for any menu $j$ and alternative $x \in A_j$, the set of types $\mathcal{T}(x, j)$ is always an interval and the set of types $\bigcup_{y \in A_j, y \preceq x} \mathcal{T}(y, j)$ is of the form $\{1, 2, \dots, \max \mathcal{T}(x, j)\}$. That is, in every menu, choice is ordered by the set of types, with lower types selecting lower alternatives. This suggests, further, that the most relevant types are those that are the largest maximal types for the different alternatives and menus. We denote the set of such types by $\mathcal{T}_I$.
The proof then constructs a correspondence $F$ on $\mathcal{T}_I$ that will provide the basis for the construction of the CDF over the relevant set of types that rationalizes choice. Given the ordered choices, $F$ assigns to each type $t \in \mathcal{T}_I$ with $t = \max \mathcal{T}(x, j)$ the sum of the choice probabilities of $x$ and of all alternatives lower than $x$ in menu $A_j$. The proof then uses T-Monotonicity to show that $F$ satisfies the conditions to be a CDF, that is, it is a single-valued increasing map, with $F(T) = 1$. Since types outside $\mathcal{T}_I$ are inconsequential for choice, we can then construct a monotone extension, $G$, of $F$ over the entire collection of types. By a monotone extension, we mean that $F(t) = G(t)$ for every $t \in \mathcal{T}_I$ and that whenever $t_1 < t_2$, $G(t_1) \le G(t_2)$. Then, the probability distribution $\psi$ derived from $G$ is shown to rationalize the data.
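The constructive argument just described can be sketched in code. The following is a minimal sketch, not the authors' implementation: it assumes an ordered domain of menus, utilities without ties, exact (rational) choice probabilities, and it picks one particular monotone extension, namely the one that carries the last identified value of $F$ forward. All function names and numerical values are hypothetical.

```python
from fractions import Fraction


def largest_types(utilities, menu):
    """Map each maximal alternative x of the menu to max T(x, j); types are 1..T."""
    top = {}
    for t, u in enumerate(utilities, start=1):
        top[max(menu, key=lambda x: u[x])] = t  # larger types overwrite smaller ones
    return top


def build_F(utilities, menus, p):
    """Return {t: F(t)} on T_I, or None when F fails to be a CDF on T_I."""
    F = {}
    for menu, p_j in zip(menus, p):
        top = largest_types(utilities, menu)
        if sum(p_j.get(x, 0) for x in top) != 1:
            return None  # a never-maximal alternative has positive probability
        cumulative = 0
        for x, t in sorted(top.items(), key=lambda item: item[1]):
            cumulative += p_j.get(x, 0)
            if F.setdefault(t, cumulative) != cumulative:
                return None  # F would not be single-valued at t
    values = [F[t] for t in sorted(F)]
    if any(a > b for a, b in zip(values, values[1:])):
        return None  # F would not be increasing
    return F


def type_distribution(F, T):
    """Masses psi(t) = G(t) - G(t-1), with G carrying the last value of F forward."""
    psi, previous = [], 0
    for t in range(1, T + 1):
        G_t = F.get(t, previous)
        psi.append(G_t - previous)
        previous = G_t
    return psi


# Hypothetical data: three ordered types and two menus.
utilities = [
    {"a": 3, "b": 2, "c": 1},  # type 1
    {"a": 1, "b": 3, "c": 2},  # type 2
    {"a": 1, "b": 2, "c": 3},  # type 3
]
menus = [["a", "b", "c"], ["a", "c"]]
p = [
    {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)},
    {"a": Fraction(1, 2), "c": Fraction(1, 2)},
]
F = build_F(utilities, menus, p)
psi = type_distribution(F, T=3)

# Raising p(a) in the second menu makes F two-valued at type 1.
p_bad = [p[0], {"a": Fraction(2, 3), "c": Fraction(1, 3)}]
F_bad = build_F(utilities, menus, p_bad)
```

With these data the recovered distribution is (1/2, 1/3, 1/6), while `F_bad` is `None`, signalling a violation.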

Identification
The proof of Theorem 1 is based upon the construction of the model on the set of types $\mathcal{T}_I$, which are in fact the ones at which the CDF of the T-RUM is fully identified. While the above discussion shows that types in this set are identified, we now discuss how, for every $t \notin \mathcal{T}_I$, the CDF at $t$ cannot generally be fully identified. For example, consider $t_1 < t < t_2$, where $t_1$ and $t_2$ are two consecutive types in $\mathcal{T}_I$. If the value of the CDF at $t_2$ is strictly greater than its value at $t_1$, the value of the CDF at $t$ can be any value between these two. This is because, given $G(t_1) < G(t_2)$, the value $G(t) \in [G(t_1), G(t_2)]$ is irrelevant for choice, since every $t \in (t_1, t_2]$ has the same maximal alternative in every menu in the domain. Therefore, the CDF at type $t$ can only be fully identified in the extreme case where $G(t_1) = G(t_2)$, which implies $G(t) = G(t_1) = G(t_2)$ and $\psi(t) = 0$. Obviously, data on more menus may expand the set of identifiable types. For example, in an experimental setting, the analyst may use these results to select the domain that allows the identification of the most important components of the CDF.
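The identification argument can be illustrated by computing, for each type, the interval of CDF values compatible with the data. The sketch below assumes that the values of $F$ on the identified types are already available as a dictionary; the function name and the numbers are hypothetical.

```python
from fractions import Fraction


def cdf_bounds(F, T):
    """For each type t in 1..T, return (lower, upper) bounds on the CDF at t,
    given the identified values F(s) for the types s in T_I."""
    bounds = {}
    for t in range(1, T + 1):
        lower = max((F[s] for s in F if s <= t), default=0)
        upper = min((F[s] for s in F if s >= t), default=1)
        bounds[t] = (lower, upper)
    return bounds


# Hypothetical example: five types, of which only types 2 and 5 belong to T_I.
F = {2: Fraction(1, 4), 5: Fraction(1)}
bounds = cdf_bounds(F, T=5)
point_identified = [t for t, (lo, hi) in bounds.items() if lo == hi]
```

In this example, the CDF at types 3 and 4 can take any value in [1/4, 1], so only types 2 and 5 are point-identified.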

Complexity
We now discuss the computational complexity of testing the rationalizability of data. To do this, we use the alternative statement of T-Monotonicity discussed above, which formally reads as: $\max \mathcal{T}(x, j) \le \max \mathcal{T}(x', j') \Rightarrow \sum_{z \in A_j, z \preceq x} p(z, j) \le \sum_{z \in A_{j'}, z \preceq x'} p(z, j')$. We argue that testing for this property is a simple task. Let $k_J = \sum_{j \in \mathcal{J}} |A_j|$, which corresponds to the total number of possible pairs of alternatives and menus, $(x, A_j)$ with $x \in A_j$. A brute force algorithm entails checking all such pairs, requiring a total number of $k_J (k_J - 1)$ checks, which is already polynomial in the input $k_J$. This can be significantly improved, moreover, by using the following recursive argument. Let all pairs of alternatives and menus be ordered in some way. Suppose that the property holds for the first $n$ pairs. When considering pair $n + 1$, one should relate this pair to, at most, two previously considered pairs. For the case where there are previously considered pairs with the same largest maximal type, one can select any of these to check for the equality of their (cumulative) choice probability with that of pair $n + 1$. Otherwise, one needs to select the pairs with largest maximal types closest, above and below, to the current one, and check that the (cumulative) choice probability of pair $n + 1$ lies between the choice probabilities of these two. Hence, this algorithm involves at most $2 k_J$ comparisons, and is therefore very low in complexity.
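A simple variant of this check can be sketched as follows. It is not the incremental procedure described above: it first sorts the pairs by largest maximal type, so that only adjacent pairs need to be compared, at the cost of the sorting step. The input format (one tuple per pair $(x, A_j)$ holding the largest maximal type and the cumulative choice probability) and the function name are assumptions made for this illustration, and exact rational probabilities are used to avoid rounding issues.

```python
from fractions import Fraction


def satisfies_t_monotonicity(pairs):
    """pairs: iterable of (largest_type, cumulative_probability), one per (x, A_j).
    Checks that equal types carry equal cumulative probabilities and that the
    cumulative probability is non-decreasing in the largest maximal type."""
    ordered = sorted(pairs)  # by type, then by cumulative probability
    for (t1, c1), (t2, c2) in zip(ordered, ordered[1:]):
        if t1 == t2 and c1 != c2:
            return False  # same critical type but different cumulative probability
        if c1 > c2:
            return False  # a larger critical type with a smaller cumulative probability
    return True


# Hypothetical pairs from two menus over three types.
consistent = [
    (1, Fraction(1, 2)), (2, Fraction(5, 6)), (3, Fraction(1)),  # menu 1
    (1, Fraction(1, 2)), (3, Fraction(1)),                       # menu 2
]
two_valued = [(1, Fraction(1, 2)), (1, Fraction(2, 3)), (3, Fraction(1))]
decreasing = [(1, Fraction(1, 2)), (2, Fraction(1, 3)), (3, Fraction(1))]

results = [satisfies_t_monotonicity(d) for d in (consistent, two_valued, decreasing)]
```

Once the pairs are sorted, the loop performs at most two comparisons per adjacent pair.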

Interval domain
Domains of ordered menus have the attractive feature of being built on the basis of the intuitive "higher than" relation, which is arguably the building block for the empirical study of any given behavioral trait. This section discusses a generalization of this domain assumption sufficient for our purposes.
In an ordered menu $j$, if $t_1 < t_3$ are types leading to the choice of $x$, then $t_2 \in (t_1, t_3)$ must also result in the same choice, given that, otherwise, the alternative chosen by $t_2$ would be incomparable with $x$ according to $\preceq$. That is, the set of types $\mathcal{T}(x, j)$ is an interval set, with formal proof of this fact given in Claim 1 of Theorem 1. This leads us to consider the following domain.

Domain of Interval Menus.
For every $j \in \mathcal{J}$ and $x \in A_j$, $\mathcal{T}(x, j)$ is an interval.

This is indeed a larger domain, as the following example illustrates. Consider $\{x, y, z\}$, and the following types: It is immediate to see that $x \prec z$ and $y \prec z$, while $x$ and $y$ are not comparable, since between these two alternatives $x$ is maximal for types 1 and 3 and $y$ is maximal for type 2. Hence, menus $\{x, y\}$ and $\{x, y, z\}$ are not ordered and fall outside our initial domain assumption. However, $\{x, y, z\}$ satisfies the interval structure, since type 1 chooses $x$, type 2 chooses $y$ and type 3 chooses $z$.
The following corollary shows that the main result also applies in this domain. We omit its proof, which follows immediately from the fact that Claim 1 in the proof of Theorem 1 now holds by assumption.
Corollary 1. In a domain of interval menus, $p$ satisfies T-Monotonicity if, and only if, $p$ is a T-RUM.

Tremble
Given a menu $j$, denote by $D_j$ the set of alternatives that are not maximal for any of the utilities, i.e., $D_j = \{x \in A_j : \mathcal{T}(x, j) = \emptyset\}$. We know that T-RUMs assign zero probability to the choice of alternatives in $D_j$, and their choice therefore constitutes a mistake. In this section, we extend T-RUMs in order to allow for the possibility of such mistakes.
The T-RUM with tremble (T-RUMT) is defined by means of a possibly menu-dependent tremble function $\lambda : \mathcal{J} \to [0, 1]$, such that $\lambda_j = 0$ whenever $D_j = \emptyset$, and a probability distribution $\psi$ over the set of types. Then, for any menu $j$, the total mass of choices from $D_j$ is given by $\lambda_j$, while the choice probability of $x \in A_j \setminus D_j$ is equal to $\psi(\mathcal{T}(x, j))(1 - \lambda_j)$. That is, mistakes occur with probability $\lambda_j$, and otherwise behavior is governed by the T-RUM with distribution $\psi$. Notice that this tremble version of the model is agnostic about the size and distribution of mistakes in $D_j$ and allows different trembles in different menus. The only assumption in the trembling mechanism is that when choices are not mistakes, they follow a T-RUM.
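T-RUMT is a simple model whose characterization follows immediately from the analysis in Theorem 1. Given the stochastic choice function $p$, denote by $\hat{p}$ the conditional stochastic choice function: $\hat{p}(x, j) = p(x, j) / \sum_{y \in A_j \setminus D_j} p(y, j)$ if $x \in A_j \setminus D_j$, and $\hat{p}(x, j) = 0$ otherwise.
We then have:
Proposition 1. In a domain of ordered menus, $p$ is a T-RUMT if, and only if, the conditional stochastic choice function $\hat{p}$ satisfies T-Monotonicity.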
Remark 1. The implementation of the tremble in T-RUMT is flexible. For example, one could discard the observed choices of the non-maximal alternatives, and estimate a T-RUM over the remaining data. Alternatively, one could adopt a particular structure for the tremble. The simplest way would be to use a $\lambda$ that is menu-independent and such that choices over non-maximal alternatives are uniformly random.
The main assumption of the trembling model is that the conditional choice probabilities over the maximal alternatives follow a T-RUM. There are other plausible trembling mechanisms that would also lead to this property. For example, one could consider the possibility of mistakes occurring uniformly over the entire menu $j$, not just over $D_j$, and behaving à la T-RUM otherwise.
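To illustrate the decomposition behind the tremble model, the sketch below splits the choice probabilities of one menu into the tremble mass and the conditional choice probabilities over the maximal alternatives. The function name, the argument `maximal` (the set of alternatives that are maximal for some type) and the numbers are hypothetical; the case in which all choices are mistakes is simply rejected here.

```python
from fractions import Fraction


def split_tremble(p_menu, maximal):
    """Return (tremble mass, conditional choice probabilities) for one menu.
    p_menu: {alternative: probability}; maximal: alternatives with T(x, j) nonempty."""
    tremble = sum(prob for x, prob in p_menu.items() if x not in maximal)
    if tremble == 1:
        raise ValueError("all choices are mistakes; conditional probabilities undefined")
    conditional = {
        x: (prob / (1 - tremble) if x in maximal else 0)
        for x, prob in p_menu.items()
    }
    return tremble, conditional


# Hypothetical menu in which d is never maximal but is chosen 20% of the time.
p_menu = {"a": Fraction(2, 5), "b": Fraction(2, 5), "d": Fraction(1, 5)}
tremble, conditional = split_tremble(p_menu, maximal={"a", "b"})
```

The conditional probabilities (here 1/2 for a and 1/2 for b) are the object to which the T-RUM analysis is then applied.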

Estimation and statistical testing
We now return to our base model described in Section 3, and present several results on the estimation and statistical testing of T-RUMs when the data are finite. Formally, the data form a map $z : X \times \mathcal{J} \to \mathbb{Z}_+$, describing the number of observed instances in which each alternative is chosen in each menu, with $z(x, j) > 0$ implying that $x \in A_j$. For every $j \in \mathcal{J}$, we denote by $z(\cdot, j)$ the vector describing the observed choices in menu $j$, and by $Z_j = \sum_{x \in A_j} z(x, j) > 0$ the total number of observations for this menu. The observed choice frequencies in menu $j$ are therefore $\hat{z}_j(\cdot) = z(\cdot, j) / Z_j$.

Estimation
Suppose that the data are generated by a T-RUM but that, due to sampling issues, choice frequencies violate T-Monotonicity. In this section, we provide a class of estimators that are based on a notion of rationalizability and show that they are strongly consistent. For this, we assume that $\mathcal{T} = \mathcal{T}_I$, that is, for ease of exposition, we work directly with the set of types that are fully identified.
Consider the following generalization of the rationalizability notion embedded in T-RUMs. Let $\Psi$ denote the set of all probability distributions over the given set of types. Let $d : \Psi \times \Psi \to \mathbb{R}_+$ be a continuous function measuring the divergence between two probability distributions such that $d(\psi, \psi) = 0$ and $|\psi - \psi'| < |\psi - \psi''|$ implies $d(\psi, \psi') < d(\psi, \psi'')$. Now, let $f : \mathbb{R}^J_+ \to \mathbb{R}_+$ be a continuous function aggregating all deviations across menus, such that $f(0, \dots, 0) = 0$ and $f$ is strictly increasing. We then say that the data $z$ is $\epsilon$-rationalizable if there exist distributions $(\psi, \{\psi_j\}_{j \in \mathcal{J}})$ such that: (i) for every $j$, distribution $\psi_j$ rationalizes the choice frequencies in menu $j$, $\hat{z}_j$, and (ii) $f(d(\psi, \psi_1), \dots, d(\psi, \psi_J)) \le \epsilon$. That is, there exists a fundamental probability distribution, $\psi$, over the set of types, but at the moment of choice, the menu-dependent distribution $\psi_j$ determines choices in menu $j$. However, the aggregate measure of deviations between $\psi$ and $\{\psi_j\}_{j \in \mathcal{J}}$ is smaller than or equal to $\epsilon$. We can now present the following straightforward result:
Proposition 2. For every $z$ there is a minimum $\epsilon$ such that $z$ is $\epsilon$-rationalizable. Moreover, $z$ is 0-rationalizable if, and only if, the stochastic choice function defined by its choice frequencies is a T-RUM.
With $\epsilon$ large enough, any finite choice data generated by a T-RUM can be $\epsilon$-rationalized by allowing sufficiently large menu-dependent perturbations. The continuity of maps $d$ and $f$ guarantees a minimal rationalizability value. The second part of the result describes how $\epsilon$-rationalizability constitutes a generalization of rationalizability by a T-RUM, as the latter requires no perturbation whatsoever, i.e., $\epsilon = 0$. Hence, when finite data violate T-Monotonicity, a natural goodness-of-fit measure for the model is given by the smallest magnitude of $\epsilon$ that yields $\epsilon$-rationalizability.11 Moreover, the distribution $\hat{\psi}$ that yields the minimal value of $\epsilon$ represents an intuitive estimator, which we now show to be strongly consistent.12
Theorem 2. $\hat{\psi}$ is strongly consistent.
The proof of the strong consistency of the estimators built on the basis of $\epsilon$-rationalizability is related to known results on extremum estimators for a multinomial model (see, e.g., van der Vaart, 2000). The proof takes care of the fact that our model involves collections of multinomials, one per menu, the parameters of which are connected by the underlying distribution of types. Also, given that we only use the monotonicity and continuity of functions $d$ and $f$, the argument also applies to non-additive estimators.
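As a rough illustration of one member of this class, the sketch below computes a least-squares estimator by brute force: it searches a finite grid on the simplex of type distributions for the one whose predicted choice probabilities are closest to the observed frequencies. The grid search, the function names and the data are assumptions made for this example only; in practice a numerical optimizer would replace the grid, and the estimate is only accurate up to the grid resolution.

```python
from fractions import Fraction
from itertools import product


def simplex_grid(T, steps):
    """All distributions over T types whose masses are multiples of 1/steps."""
    for combo in product(range(steps + 1), repeat=T - 1):
        if sum(combo) <= steps:
            last = steps - sum(combo)
            yield [Fraction(k, steps) for k in combo] + [Fraction(last, steps)]


def least_squares_estimate(counts, type_sets, T, steps=10):
    """counts[j][x]: times x was chosen in menu j; type_sets[j][x]: types choosing x.
    Returns the grid distribution minimizing the sum of squared deviations between
    observed frequencies and predicted probabilities psi(T(x, j))."""
    def loss(psi):
        total = 0
        for z_j, sets_j in zip(counts, type_sets):
            Z_j = sum(z_j.values())
            for x, n in z_j.items():
                predicted = sum(psi[t - 1] for t in sets_j.get(x, ()))
                total += (Fraction(n, Z_j) - predicted) ** 2
        return total
    return min(simplex_grid(T, steps), key=loss)


# Hypothetical data: three types, menus {a, b, c} and {a, c}.
type_sets = [{"a": {1}, "b": {2}, "c": {3}}, {"a": {1}, "c": {2, 3}}]
exact_counts = [{"a": 50, "b": 30, "c": 20}, {"a": 50, "c": 50}]
noisy_counts = [{"a": 50, "b": 30, "c": 20}, {"a": 60, "c": 40}]

exact_estimate = least_squares_estimate(exact_counts, type_sets, T=3)
noisy_estimate = least_squares_estimate(noisy_counts, type_sets, T=3)
```

When the frequencies are exactly those of a T-RUM on the grid, as in `exact_counts`, the estimator recovers the generating distribution; with `noisy_counts` it returns the closest grid point in the least-squares sense.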

Statistical testing
We now discuss a method for statistically testing the model. Given the multinomial structure of the choices in each of the menus, we can intuitively construct the following statistic, which aggregates the standard Pearson statistic across menus in $\mathcal{J}$: $C(z) = \sum_{j \in \mathcal{J}} \sum_{x \in A_j} \frac{(z(x, j) - Z_j \psi(\mathcal{T}(x, j)))^2}{Z_j \psi(\mathcal{T}(x, j))}$.
We can then show the following result.
Theorem 3. $C(z)$ converges to a chi-square distribution with $\sum_{j \in \mathcal{J}} |A_j| - J$ degrees of freedom.
The proof of the theorem immediately follows from the independence of the multinomial distributions across menus in a T-RUM.
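The aggregated statistic can be computed directly from the counts. The sketch below is a minimal illustration under the assumption that every alternative of every menu is chosen by a set of types with strictly positive mass, so that no expected count is zero; the function name and the data are hypothetical. Comparing the statistic with a chi-square critical value would require a statistical library and is left out.

```python
from fractions import Fraction


def aggregated_pearson(counts, type_sets, psi):
    """Return (statistic, degrees of freedom), summing the Pearson statistic over menus.
    counts[j][x]: observed choices of x in menu j; type_sets[j][x]: types choosing x."""
    statistic, dof = 0, 0
    for z_j, sets_j in zip(counts, type_sets):
        Z_j = sum(z_j.values())
        for x, types in sets_j.items():
            expected = Z_j * sum(psi[t - 1] for t in types)  # assumed strictly positive
            statistic += (z_j.get(x, 0) - expected) ** 2 / expected
        dof += len(sets_j) - 1  # |A_j| - 1 for each menu
    return statistic, dof


# Hypothetical data: three types, menus {a, b, c} and {a, c}, 100 observations each.
psi = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
type_sets = [{"a": {1}, "b": {2}, "c": {3}}, {"a": {1}, "c": {2, 3}}]
counts = [{"a": 55, "b": 25, "c": 20}, {"a": 45, "c": 55}]

statistic, dof = aggregated_pearson(counts, type_sets, psi)
```

For these counts the statistic equals 7/3 with (3 - 1) + (2 - 1) = 3 degrees of freedom.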

Final remarks
We close by briefly commenting on some differences between T-RUMs and additive RUMs (ARUMs), which are also very popular in applied work. First, note that, in a T-RUM, there is a given ordered collection of utilities over which the individual is assumed to have a preference distribution. In an ARUM, the analyst assumes that the individual has a particular utility function that is subject to additive, cardinal, shocks. This implies that, in an ARUM, the individual has a distribution with strictly positive mass over every single possible utility function over the set of alternatives, where the distribution is shaped by some assumption on the shocks (e.g., logistic, normal). Furthermore, T-RUM requires only an ordinal understanding of the utility functions at stake, while ARUM requires a cardinal interpretation of the utility functions, since shocks enter additively and choice probabilities are determined by cardinal utility differences. Finally, T-RUMs are monotone, in the sense that shifts in the distribution over types generate intuitive shifts in the choice distributions, thus facilitating the interpretation of the relevant behavioral parameters. However, as shown in Apesteguia and Ballester (2018), the typical implementation of ARUM in risk settings, combining expected utility and i.i.d. additive errors, may suffer from severe non-monotonicity.

11 See Apesteguia and Ballester (2021) for a discussion of goodness-of-fit measures for stochastic choice models.
12 Several natural examples of estimators belong to this family. In standard maximum-likelihood or least squares or minimum Chi-square estimations, for instance, $d$ is a map that operates on each subset of types $\mathcal{T}(x, j)$, considering the logarithmic ratio, or the square of the difference between the two distributions, or the normalized square of the difference. Then, in these three standard cases, $f$ additively aggregates all of the above-mentioned distances across menus.

Declaration of competing interest
We have no conflict of interest.
We now consider the sub-collection of types $\mathcal{T}_I \subseteq \mathcal{T}$ and the correspondence $F : \mathcal{T}_I \rightrightarrows [0, 1]$ defined by: $\mathcal{T}_I = \{t \in \mathcal{T} : \text{there exists } (x, j) \text{ such that } \max \mathcal{T}(x, j) = t\}$, and $k \in F(t)$ whenever there is $(x, j)$ such that $t = \max \mathcal{T}(x, j)$ and $k = \sum_{y \in A_j, y \preceq x} p(y, j)$.
Claim 3. $F$ is a single-valued increasing map.
Proof of Claim 3. To see this, consider two types $t, t' \in \mathcal{T}_I$ such that $t \le t'$. By definition of $\mathcal{T}_I$, there exist pairs $(x, j)$ and $(x', j')$ such that $t = \max \mathcal{T}(x, j)$ and $t' = \max \mathcal{T}(x', j')$. By Claim 2, and the fact that $t \le t'$, we know that $\bigcup_{z \in A_j : z \preceq x} \mathcal{T}(z, j) = \{1, 2, \dots, t\} \subseteq \{1, 2, \dots, t'\} = \bigcup_{z \in A_{j'} : z \preceq x'} \mathcal{T}(z, j')$. T-Monotonicity guarantees that $\sum_{z \in A_j, z \preceq x} p(z, j) \le \sum_{z \in A_{j'}, z \preceq x'} p(z, j')$, with equality when $t = t'$, which proves the claim.
... $p(y, j)$. Given that $p$ is a stochastic choice function, $\sum_{y \in A_j \setminus \{x\}} p(y, j) \le 1$, and, given the last inequality, it must in fact be equal to 1. Consequently, $p(x, j) = 0$, proving the claim.
Proof of Claim 5. To see the first part, consider any menu $j$ and let $x$ be the alternative such that $T \in \mathcal{T}(x, j)$. It can only be the case that $\max \mathcal{T}(x, j) = T$, and hence, $T \in \mathcal{T}_I$. To see the second part, consider any menu $j$, and any alternative $y \in A_j$ such that $\mathcal{T}(y, j) \neq \emptyset$. Since menu $j$ is ordered, the joint consideration of types $\max \mathcal{T}(y, j)$ and $\max \mathcal{T}(x, j) = T$ guarantees that $y \preceq x$. Hence, the use of T-Extremeness and the definition of stochastic choice function guarantee that $1 = \sum_{y \in A_j} p(y, j) \ge \sum_{y \in A_j, y \preceq x} p(y, j) = F(T) \ge \sum_{y \in A_j : \mathcal{T}(y, j) \neq \emptyset} p(y, j) = 1$, which proves the claim.
Having constructed a T-RUM that rationalizes all choice probabilities, we have proved the sufficiency of the property and hence the theorem.
Proof of Proposition 1. Necessity is immediate. To see sufficiency, notice that the conditional stochastic choice function $\hat{p}$ must be a T-RUM. The techniques in the proof of Theorem 1 can be used to construct the corresponding distribution $\psi$, and the definition of $\lambda_j = \sum_{x \in D_j} p(x, j)$ completes the T-RUMT. The claim then follows immediately.
Proof of Proposition 2. We start by proving the first part. For a single menu $j \in \mathcal{J}$, notice that the set of distributions that rationalizes data in $j$ is nonempty. As a result, the existence of a minimum value of $\epsilon$ such that $z$ is $\epsilon$-rationalizable follows directly from the compactness of $\Psi \times \Psi^J$ and the continuity of $d$ and $f$. Moreover, if there exists a distribution $\psi$ rationalizing all choices across menus, 0-rationalizability holds due to the fact that $f(d(\psi, \psi), \dots, d(\psi, \psi)) = f(0, \dots, 0) = 0$. Finally, if such a distribution does not exist, the data cannot be 0-rationalizable because of the strict monotonicity properties of both $d$ and $f$. This concludes the proof.
Proof of Theorem 2. Given any distribution over types $\psi$ and data frequencies $\hat{z}$, consider the value $g(\psi, \hat{z}) = \min f(d(\psi, \psi_1), \dots, d(\psi, \psi_J))$, subject to $\psi_j$ rationalizing the choice probabilities $\hat{z}_j$. Note that, using the same logic as in Proposition 2, $g$ is well-defined. Consider a sequence of data functions $\{z^n\}_{n=1}^{\infty}$ with $\lim_{n \to \infty} Z^n_j = \infty$ for every $j \in \mathcal{J}$. Given the definition of the estimator and the properties of $f$ and $d$, it follows that the estimator for $z^n$ is $\psi^n = \arg\min_{\psi \in \Psi} g(\psi, \hat{z}^n)$. Now suppose that the sequence of data functions is generated by a T-RUM with probability distribution $\psi^* \in \Psi$. Consider menu $j$. For every alternative $x \in A_j$ such that $\psi^*(\mathcal{T}(x, j)) = 0$, either because $\mathcal{T}(x, j) = \emptyset$ or because no mass is associated to the types for which $x$ is maximal, we know that $\hat{z}^n(x, j) = 0$ always holds. For every alternative $x \in A_j$ such that $\psi^*(\mathcal{T}(x, j)) > 0$, standard arguments guarantee that the multinomial i.i.d. choices in menu $j$ generate frequencies $\hat{z}^n(x, j)$ that converge, almost surely, to $\psi^*(\mathcal{T}(x, j))$. Thus, the finiteness of each menu and of $\mathcal{J}$ guarantees that $\hat{z}^n$ converges, almost surely, to the choice probabilities generated by $\psi^*$. It then follows immediately that $\psi^n$ converges, almost surely, to $\psi^*$. This concludes the proof.
Proof of Theorem 3. As discussed in the proof of Theorem 2, the i.i.d. nature of the T-RUM guarantees that we can understand choices in any given menu $j$ as a multinomial distribution where entry $x$ has probability $\psi(\mathcal{T}(x, j))$. It is well known that, as the number of observations grows, the statistic $\sum_{x \in A_j} \frac{(z(x, j) - Z_j \psi(\mathcal{T}(x, j)))^2}{Z_j \psi(\mathcal{T}(x, j))}$ converges to a chi-square distribution with $|A_j| - 1$ degrees of freedom. The i.i.d. nature of the model also guarantees that, even though their parameters are linked by $\psi$ and the domain structure is ordered, the multinomial distri-