Preference for Knowledge

We examine the subjective value of gaining knowledge in a version of Savage’s model for decisions under uncertainty in which the received outcome provides information about which event has obtained. Decision makers commonly value such knowledge either because they want to use it in future decisions or because they are intrinsically interested in it. We find that in our model, the sure-thing principle and several other axioms of Savage are inconsistent with this value for knowledge about events. We provide a representation theorem for a subjective value of knowledge consisting of the sum of expected utility and a function of the information partition generated by the outcomes of an act. Bayesian updating of likelihood judgments and stationarity of preferences imply that decision makers rank acts by the sum of expected utility and the Shannon entropy. Our results also provide a novel critique of the necessity of Savage’s axioms for rational decisions under uncertainty.


1 Introduction
Decisions often involve choices between alternatives from which knowledge can be gained. Consider the number of test subjects recruited by an experimenter, or the too few hours of sleep left after finishing an addictive novel: both reveal a preference for acquiring knowledge. Sometimes this knowledge is acquired as a means to achieve other goals; at other times it is completely inconsequential. The experimenter hopes to gain useful knowledge, but Agatha Christie's Murder on the Orient Express is read for pleasure only. In this paper, we construct a theory of decision under uncertainty that accounts for preferences for knowledge in general.
We embed our analysis into Savage's (1954) framework and axiomatization of rational behavior under uncertainty. In this model, decision makers have preferences over acts. An act yields an outcome for every possible state of the world. We assume that after an act is resolved, the decision maker infers from the obtained outcome and the chosen act what event obtained. The decision maker can therefore only distinguish two events via an act that yields distinct outcomes on the two events.
We argue that if a decision maker is interested in gaining knowledge about which event obtains, then several of Savage's axioms lose their normative appeal. This is easiest to see for the monotonicity axiom that requires that the preferences over achieving outcomes on an event are the same irrespective of what other outcomes are achieved on the other events. If the decision maker cares about knowledge, replacing an outcome by a worse outcome on some event may allow the decision maker to identify if this event obtains and may thus be desirable. For example, in order to find out whether an event obtains, you may reject a sure payoff of 100 Euros in favor of a gamble that pays you 100 Euros only if an event obtains and nothing otherwise.
We provide an axiomatization of a decision model that accounts for a preference for acquiring knowledge about events. For this, we weaken the axioms of Savage (1954). Savage's sure-thing principle imposes separability of preferences on conditional acts given an event from the preferences on conditional acts on the complement. We only assume this separability of preferences whenever the event can be distinguished from its complement.
Similar adjustments are made to the monotonicity axiom and the likelihood outcome independence axiom. We add one axiom that guarantees that the preference for knowledge is separable from the preference for achieving good outcomes. Our representation theorem characterizes what we call a subjective knowledge utility in which the utility of an act is additively separable into the expected utility of the outcomes and the value of the information partition revealed by the act. The value of the information partition is in turn additively separable in the revealed events. We then analyze the value of knowledge in more detail both conceptually and axiomatically.
Conceptually, our axiomatic analysis gives rise to a utility representation of the value of knowledge from which we derive further concepts. First, we define a classification of decision makers by their knowledge preferences into knowledge-seeking, knowledge-neutral, and knowledge-averse and show that these can be linked to subadditivity and superadditivity properties of our utility representation of knowledge preferences. Second, we define a measure of the value of knowledge about events and a knowledge equivalent which roughly play the role of risk attitude measures and certainty equivalents in decisions under uncertainty. Third, we derive a curiosity relation with which decision makers can be ranked by their value of knowledge.
Axiomatically, we sharpen our decision model further, by adding axioms that guarantee that the decision maker does not care about obtaining specific pieces of knowledge but rather cares about the (quantified) amount of information gained about the state of the world. Probabilistic knowledge utility consists of the sum of expected utility and a very general information measure. It is characterized by assuming that the decision maker's value of knowledge only depends on the likelihoods of the involved events. This representation turns out not to be stationary in the sense that the value of information may change as additional knowledge is gained.
If the information measure is the Shannon (1948) entropy, we obtain entropic knowledge utility. It is characterized by a stationarity condition which ensures that after Bayesian updating of the probabilities on an event, the tradeoff between achieving good outcomes and being able to distinguish two events with a fixed relative probability remains the same. Essentially, this allows the decision maker to use the same utility functions for evaluating conditional acts after Bayesian updating of the probabilities.
In the context of probabilistic knowledge utility we also define what we call the elasticity of curiosity that describes the change in intensity of knowledge preferences as the decision maker becomes more and more informed.
From this definition, we derive constant elasticity of curiosity utility which is a convenient two parameter functional form that incorporates two important aspects of knowledge preferences: the intensity of knowledge preferences and the elasticity of curiosity.
Stationary information cost utility accounts for the fact that decision makers may sometimes only be interested in a specific statistical variable but consider any information received as costly to process. For example, a decision maker who is interested in the mean of a random variable would prefer to know the sample mean of a data set instead of being informed about every single data point. Stationary information cost utility attaches a prior-independent value to each posterior probability measure over a random variable. The cost of processing the information is given by the Shannon entropy. We characterize this functional form by weakening the stationarity axioms used to characterize entropic knowledge utility. An example of a stationary information cost utility is the mutual information between the known events and the variable the decision maker is interested in.
In the light of our results, we briefly revisit Savage's endeavour to provide a foundation of statistics. Savage (1954) considered statistics as a practice of solving partition problems in which a decision maker tries to distinguish if an event obtains or not. In our framework, acts not only provide more or less good outcomes but also solve partition problems. This allows us to consider the choice of a hypothesis test or the choice of the number of subjects invited to an experiment as choices over acts, enabling us to study them within our theory. As an application we therefore briefly sketch how estimation problems, hypothesis tests, or optimal data collection fall into the domain of the decision problems to which our model applies.
The paper proceeds as follows. In section 2, we briefly review the related literature. The notation is introduced in section 3. In section 4, we introduce our model of knowledge in a Savage framework, propose our adjustments to Savage's axioms, and characterize subjective knowledge utility based on these axioms. Several concepts to analyze the value of knowledge of decision makers are introduced in section 5. We then axiomatically characterize various forms of knowledge preferences, specifically, probabilistic knowledge utility in subsection 6.1, entropic knowledge utility in subsection 6.2, constant elasticity of curiosity utility in subsection 6.3, and stationary information cost utility in subsection 6.4. Section 7 translates aspects of statistical estimation into our framework. Finally, section 8 discusses alternative ways of modeling knowledge and the necessity of Savage's axioms for rationality before section 9 concludes.
2 Literature
Our paper is closely related to the literature on the intrinsic value of information, which has been studied theoretically (Grant, Kajii, & Polak, 1998; Golman & Loewenstein, 2018) and empirically (Bennett, Bode, Brydevall, Warren, & Murawski, 2016; Falk & Zimmermann, 2016; Kops & Pasichnichenko, 2020; Masatlioglu, Orhun, & Raymond, 2017). Kadane, Schervish, and Seidenfeld (2008) discuss whether it can be rational for a decision maker to pay for not receiving information and how a theory of decision can incorporate this. Our adjustment to Savage's model allows for a negative value of information and therefore also for such preferences. Wakker (1988a) shows that violations of the independence axiom imply that a decision maker displays information-averse behavior for some choices. Safra and Sulganik (1995) show that for every information structure there exists a nonexpected utility maximizer that prefers less information to more. In a working paper, Alaoui (2012) studies under which circumstances behavioral decision makers attach value to useless information. Aczél (2008a, 2008b) study the functional form of our entropic knowledge utility representation in the context of utility for gambling.
In a recent working paper, Liang (2019) characterizes intrinsic information preferences in an Anscombe and Aumann (1963) information using a multi-stage model in which choices over initial consumption and choices over menus of later consumption are observed. Caplin and Leahy (2001) model anticipatory feelings over the resolution of lotteries, such as suspense. In contrast to their model, in which probabilities are fully objective and are separated into two stages, the uncertainty in our paper is fully subjective on arbitrary event spaces.
There exists a large literature on the instrumental value of information. Blackwell (1953) shows the equivalence of ranking experiments by their instrumental value to expected utility maximizing decision makers and ranking experiments by statistical sufficiency. Çelen (2012) extends this to maxmin preferences. Torgersen (1991) provides a comprehensive review of the literature on comparisons of experiments. Hilton (1981) summarizes some of the findings on the instrumental value of information and its relation to risk aversion, flexibility, wealth, and prior uncertainty. Snow (2010) examines the instrumental value of information under ambiguity. Bassan, Gossner, Scarsini, and Zamir (2003), Jakobsen (2016), Rosenberg (2006, 2010), Shmaya (2010, 2013), De Meyer, Lehrer, and Rosenberg (2010), Rosenberg, Salomon, and Vieille (2013) consider the value of information in various game-theoretic settings.
The conception of knowledge about the state of the world that we employ is the usual one employed in decision and game theory, using partitions of the state space. Aumann (1999) links this to a syntactic representation of knowledge. Gilboa and Lehrer (1991) characterize the value of information partitions that is expected utility rationalizable using methods from cooperative games. Their paper provides an answer to the question under which conditions the value of knowledge we characterize may arise from expected utility maximization of a subsequent decision problem. Azrieli and Lehrer (2008) extend their analysis to stochastic information structures. Ghirardato (2002) discusses how a dynamic consistency axiom that resembles the sure-thing principle expresses a nonnegative instrumental value of information. In a recent working paper, Galanis (2019) further extends the analysis of this relation between dynamic consistency and information value to ambiguity averse decision makers.
If the value of knowledge in our paper is seen as purely instrumental for solving future decision problems, then there is a natural relation to the value of flexibility (Kreps, 1979). Flexibility allows decision makers to react to future information while knowledge allows decision makers to use flexibility. Nehring (1999) axiomatizes preference for flexibility in a Savage framework. Studying the duality between these axiomatizations is an interesting avenue for further research.
3 Notation
Let $S$ be a set of states of nature. $\mathcal{E}$ denotes a σ-algebra of events on $S$. For any event $E$, let $\bar{E} = S - E$ be the complement of $E$ in $S$. $\mathcal{X}$ is a set of outcomes. An act is a function $a : S \to \mathcal{X}$ that is measurable, i.e., for all subsets of outcomes $X \subseteq \mathcal{X}$, the preimage is an event, $a^{-1}(X) \in \mathcal{E}$. In this paper we only consider simple acts, i.e., acts that are finite-valued, $|a(S)| < \infty$.
The set of (simple) acts is denoted by $\mathcal{A}$. If $E \in \mathcal{E}$ is an event and $f$ an act, then $f_E : E \to \mathcal{X}$ denotes the restriction of $f$ to $E$, which we will call a conditional act. The set of conditional acts on an event $E$ is denoted $\mathcal{A}_E$. The set of outcomes resulting from a conditional act $f_E$ and an event $F \subseteq E$ is defined as the image $f_E(F) = \{\alpha \in \mathcal{X} : f_E(s) = \alpha \text{ for some } s \in F\}$. For convenience, we introduce several ways of denoting acts. If $f$ and $g$ are acts and $E$ is an event, then $f_E g$ denotes the act that agrees with $f$ on the event $E$ and with $g$ on the complement $\bar{E}$. Constant acts are simply denoted by their outcome, $\alpha \in \mathcal{X}$.
A decision maker ranks acts via a preference relation $\succsim$. $f \succsim g$ means the act $f$ is at least as good as the act $g$. $\sim$ is the symmetric part of the relation, and $f \sim g$ means that the decision maker is indifferent between the two acts. $\succ$ denotes the asymmetric part of the relation and indicates strict preference. A function $U : \mathcal{A} \to \mathbb{R}$ represents $\succsim$ if $U(f) \geq U(g)$ if and only if $f \succsim g$. An event E is nonnull if for some outcomes γ E β β. We denote the set of null events by $\mathcal{N} \subseteq \mathcal{E}$. An atom is a nonnull event $E$ that cannot be partitioned into two nonnull events. We assume throughout the paper that there are no atoms. This guarantees that the event space is sufficiently rich.
We make the following additional richness assumption that greatly simplifies our analysis. For every outcome $\alpha$, there exists a countable number of indifferent outcomes $\alpha', \alpha'', \ldots$. We can interpret the outcome $\alpha'$ as the same as the outcome $\alpha$ but with a label attached to it that allows the decision maker to distinguish between $\alpha$ and $\alpha'$.
For some results, we also assume that the set of outcomes is rich enough such that conditional acts are outcome-solvable, i.e., for all events $E$, all conditional acts $f_E$, and some $\gamma \in f_E(E)$, there exists an outcome $\delta \neq \gamma$ such that $\delta_E \gamma \sim f_E \gamma$.
4 Axiomatization of a Preference for Knowledge
Savage (1954) did not specify what information the decision maker acquires about the state space as an act is resolved. To start, we clarify that Savage's axioms maintain their normative appeal in case the exact state of the world is always revealed when the act is resolved. However, there are many decisions in which different information is acquired depending on which act is chosen.
To account for such decisions, we must clarify what information is acquired as an act is resolved.
We assume that the decision maker finds out which event obtains by combining the memory of which act she played with the information about which outcome she experienced. Since this is a justified true belief, we say that the decision maker knows that this event obtains. Thus, if the decision maker receives outcome $\alpha$ from act $f$, then the decision maker knows that the event $f^{-1}(\alpha)$ obtains. The knowledge that will be acquired from this act can therefore be described by the information partition generated by the outcomes of the act. An example is provided in Table 1. $a$ is a constant act that does not reveal any knowledge about events. $b$ yields distinct outcomes on events $E$ and $\bar{E}$, and the decision maker can conclude from the outcome which of these two events obtains. We justify this choice of modeling knowledge and discuss alternative possibilities in section 8.
We now discuss Savage's axioms in turn and propose alternative axioms where necessary. The normative position from which we discuss the axioms can be summarized as follows: "A decision maker may gain joy from acquiring knowledge, irrespective of whether this knowledge will be useful in future decisions. The decision maker may be willing to forego better outcomes in favor of gaining this knowledge. A theory of rational choice under uncertainty must allow for this behavior." We think this statement is uncontroversial to most decision makers. If you ever bought and read a book purely for entertainment and were excited about the events revealed in the book, you will most likely need to accept this statement. We say that a decision maker who exhibits the described behavior has a preference for knowledge.
Our theory of a preference for knowledge about events naturally encompasses a theory of preference for information about events or states of the world. This is because knowledge of an event may provide information about other, correlated events. This will be made more precise in subsection 6.4.
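The inference rule described in this section, namely that receiving outcome α from act f tells the decision maker that the event f^{-1}(α) obtains, can be illustrated with a minimal sketch. It assumes a finite state space, which is purely for illustration since the paper assumes there are no atoms; the state names, outcome labels, and function names are hypothetical.

```python
from collections import defaultdict

def information_partition(act):
    """Group states by the outcome the act assigns to them.

    `act` maps each state to an outcome. The result maps each outcome
    alpha to the event f^{-1}(alpha), stored as a frozenset of states.
    """
    cells = defaultdict(set)
    for state, outcome in act.items():
        cells[outcome].add(state)
    return {outcome: frozenset(states) for outcome, states in cells.items()}

def known_event(act, outcome):
    """Event the decision maker knows to obtain after receiving `outcome`."""
    return information_partition(act)[outcome]

# Hypothetical four-state example (an assumption, not taken from the paper).
states = ["s1", "s2", "s3", "s4"]
E = {"s1", "s2"}
constant_act = {s: "alpha" for s in states}  # reveals nothing beyond S
binary_act = {s: ("alpha" if s in E else "beta") for s in states}  # reveals E

print(information_partition(constant_act))
print(information_partition(binary_act))
```

The constant act generates the trivial partition, while the binary act separates E from its complement, matching the distinction between uninformative and informative acts drawn in the text.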
We now show that Savage's axioms of expected utility are incompatible with a preference for knowledge, and provide alternative axioms that characterize a novel representation of preferences.
P 1 (Weak Order). $\succsim$ is complete and transitive.
We will maintain the Weak Order assumption. The Blackwell order is an example of an informativeness ranking that does not fulfill completeness as it ranks experiments by whether they reveal unambiguously more information.
Relaxing the assumption of completeness may therefore be an interesting avenue for future research.
P 2 (Sure-Thing Principle). For all $f, g \in \mathcal{A}$, and all $\alpha, \beta \in \mathcal{X}$,
$$f_E \alpha \succsim g_E \alpha \iff f_E \beta \succsim g_E \beta.$$
The sure-thing principle guarantees that we can ignore identical conditional subacts when comparing two acts. A simple example shows that the sure-thing principle is in direct conflict with a preference for knowledge. Consider the acts in Table 1. The sure-thing principle requires that $a \succsim b$ if and only if $c \succsim d$.
However, b and c provide knowledge about E while a and d do not. A decision maker who would like to know whether event E obtains may therefore prefer b to a and c to d as long as the outcomes α and β are sufficiently close in value.
The issue is that a and b as well as c and d not only differ by a change in the outcomes but also by a change in the information partition. The sure-thing principle demands that the change in the information partition is irrelevant for preference. Notice however that in case the outcomes yielded by f E and g E on E all differ from α and β, then P2 is not inconsistent with a preference for knowledge because E is always part of the information partition. In this case, the sure-thing principle maintains its normative appeal.
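The conflict can be checked numerically in a small sketch. Everything specific here is an assumption made for illustration: a two-state space, the additive form "expected utility plus a value per revealed event" described in the introduction, an entropic event value, the utility numbers, and the particular assignment of α and β in the informative acts b and c (the text only fixes that a and d are constant and that b and c separate E from its complement).

```python
import math

# Hypothetical example; all numbers are illustrative assumptions.
prob = {"s1": 0.5, "s2": 0.5}     # subjective probability of each state
u = {"alpha": 1.0, "beta": 1.1}   # outcome utilities, close in value

def h(p):
    """Assumed value of learning an event of probability p: -p * log(p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p)

def knowledge_utility(act):
    """Expected utility plus the summed value of the events the act reveals."""
    mass = {}
    for state, outcome in act.items():
        mass[outcome] = mass.get(outcome, 0.0) + prob[state]
    return sum(p * u[x] + h(p) for x, p in mass.items())

# E = {s1}. a and d are constant; b and c yield distinct outcomes on E and
# its complement (which outcome goes where is our assumption).
a = {"s1": "alpha", "s2": "alpha"}
b = {"s1": "alpha", "s2": "beta"}
c = {"s1": "beta", "s2": "alpha"}
d = {"s1": "beta", "s2": "beta"}

a_weakly_preferred_to_b = knowledge_utility(a) >= knowledge_utility(b)
c_weakly_preferred_to_d = knowledge_utility(c) >= knowledge_utility(d)

# The sure-thing principle would require these two comparisons to agree.
print(a_weakly_preferred_to_b, c_weakly_preferred_to_d)
```

With these numbers the informative acts b and c are both strictly preferred to the constant acts a and d, so the two comparisons disagree and the sure-thing principle fails, as argued in the text.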
We now formalize this idea using disjoint conditional acts.
Two conditional acts are therefore disjoint if they have no outcomes in common. This definition helps us to specify which events the decision maker can identify after an act is resolved. If $f_E$ and $f_{\bar{E}}$ are disjoint, then after the act $f$, the decision maker will know with certainty whether event $E$ has happened or not. We adjust the sure-thing principle in the following manner:
Axiom 2 (Information-Neutral Sure-Thing Principle). Let $E \in \mathcal{E}$ be an event. Suppose $f_E$ and $g_E$ are disjoint from $h_{\bar{E}}$ and $k_{\bar{E}}$. Then,
$$f_E h_{\bar{E}} \succsim g_E h_{\bar{E}} \iff f_E k_{\bar{E}} \succsim g_E k_{\bar{E}}. \tag{2}$$
The information-neutral sure-thing principle differs from the sure-thing principle in two respects. First, full separability of the preferences across $E$ and $\bar{E}$ is limited to acts in which the decision maker learns with certainty whether event $E$ obtains. This is achieved by requiring disjointness of the conditional acts on $E$ and the conditional acts on its complement. Second, instead of only changing a single outcome, we allow for changes of conditional acts, i.e., the change in Equation (2) from $h_{\bar{E}}$ to $k_{\bar{E}}$ may not only change a particular outcome but may change several outcomes. This means that the knowledge about events contained in $h_{\bar{E}}$ and $k_{\bar{E}}$ may also differ. The independence condition therefore not only imposes separability of the preferences on outcomes but also on knowledge about events. For example, $h_{\bar{E}}$ may achieve a single good outcome while $k_{\bar{E}}$ achieves several poor outcomes but informs the decision maker about several subevents of $\bar{E}$.
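The disjointness condition is easy to state operationally. The following minimal sketch assumes a finite state space and represents acts as dictionaries from states to outcomes; the function names and the example acts are hypothetical and only meant to illustrate when an event is identified with certainty.

```python
def image(act, event):
    """Set of outcomes the act yields on `event`."""
    return {act[s] for s in event}

def are_disjoint(f, E, g, F):
    """True if the conditional acts f on E and g on F share no outcome."""
    return image(f, E).isdisjoint(image(g, F))

def reveals(f, E):
    """True if f's outcomes on E never occur on the complement of E."""
    complement = set(f) - set(E)
    return are_disjoint(f, E, f, complement)

# Hypothetical four-state example.
E = {"s1", "s2"}
f = {"s1": "alpha", "s2": "beta", "s3": "gamma", "s4": "gamma"}
g = {"s1": "alpha", "s2": "beta", "s3": "alpha", "s4": "gamma"}

print(reveals(f, E), reveals(g, E))
```

Here f always reveals whether E obtains, whereas g does not: receiving "alpha" from g is compatible with both E and its complement, so the separability imposed by Axiom 2 would not be required for g.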
Savage's monotonicity axiom conflicts in a similar manner as the sure-thing principle with a preference for knowledge.

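P 3 (Monotonicity). For all $f \in \mathcal{A}$, $E \in \mathcal{E} - \mathcal{N}$, and all $\beta, \gamma \in \mathcal{X}$,
$$\beta_E f \succsim \gamma_E f \iff \beta \succsim \gamma.$$
Monotonicity states that outcomes are ranked the same way on every event.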
It implies in our previous example in Table 1 that $a$ is preferred to $d$ if and only if $b$ is preferred to $d$. However, note that $b$ reveals whether event $E$ obtains but $a$ and $d$ do not. The decision maker may therefore judge $d$ to be better than $a$ because the outcome $\beta$ is preferable to the outcome $\alpha$ and at the same time judge $b$ to be better than $d$ because $b$ reveals event $E$. Note again that there is no issue with imposing the monotonicity axiom with respect to event $E$ in case the outcomes on $E$ are disjoint from the outcomes on $\bar{E}$.
We therefore make the following adjustment to the monotonicity axiom:
Axiom 3 (Information-Neutral Monotonicity). Suppose $h_{\bar{E}}$ is disjoint from $\gamma$ and $\beta$. Then,
$$\beta_E h_{\bar{E}} \succsim \gamma_E h_{\bar{E}} \iff \beta \succsim \gamma.$$
Information-neutral monotonicity implies that outcomes are identically ranked on all events as long as a change in outcomes does not change the information partition. By requiring disjointness, we guarantee that when combining $\beta_E$ and $\gamma_E$ with $h_{\bar{E}}$, the information about the event $E$ is identical across the two acts. For example, $\beta_E \gamma_{\bar{E}} \succ \gamma$ and $\gamma \succ \beta$ is a legitimate preference in our model. It means that the outcome $\gamma$ is preferable to $\beta$ but obtaining information about the event $E$ may be more valuable than receiving the better outcome $\gamma$ instead of $\beta$ on the event $E$.
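A short worked inequality shows when a decision maker prefers the act that yields β on E and γ otherwise to the constant act γ, even though γ is the better outcome. It assumes the additive form described in the introduction, in which the utility of an act is its expected utility plus a value $h$ attached to each revealed event; the explicit formula is our reading of that description, and $\bar{E}$ denotes the complement of $E$.

```latex
\begin{align*}
U(\beta_E \gamma) &= \mu(E)\,u(\beta) + \mu(\bar{E})\,u(\gamma) + h(E) + h(\bar{E}), \\
U(\gamma) &= u(\gamma) + h(S), \\
\beta_E \gamma \succ \gamma \;&\Longleftrightarrow\;
  h(E) + h(\bar{E}) - h(S) > \mu(E)\,\bigl[u(\gamma) - u(\beta)\bigr].
\end{align*}
```

Under these assumptions the preference is compatible with $u(\gamma) > u(\beta)$ whenever the value of learning whether $E$ obtains exceeds the expected loss from receiving β instead of γ on $E$.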
P 4 (Likelihood Outcome Independence). For all $E, F \in \mathcal{E}$ and all $\beta, \beta', \gamma, \gamma' \in \mathcal{X}$ such that $\gamma \succ \beta$ and $\gamma' \succ \beta'$,
$$\gamma_E \beta \succsim \gamma_F \beta \iff \gamma'_E \beta' \succsim \gamma'_F \beta'.$$
The likelihood outcome independence axiom ensures that the decision maker has preferences consistent with the existence of a likelihood relation over events. For an expected utility maximizing decision maker who prefers $\gamma_E \beta$ to $\gamma_F \beta$, we can conclude that the decision maker judges $E$ to be more likely than $F$ if $\gamma \succ \beta$. No matter whether $\gamma'$ is a lot or just slightly better than $\beta'$, the decision maker should prefer to realize $\gamma'$ on the more likely event. Thus, the decision maker should also prefer $\gamma'_E \beta'$ to $\gamma'_F \beta'$.
For a decision maker who cares about knowledge, $\gamma_E \beta \succ \gamma_F \beta$ need not mean that $E$ is strictly more likely than $F$. Instead, the decision maker may simply prefer to know whether $E$ obtains to knowing whether $F$ obtains. Indeed, if $F = S$, then $\gamma_F \beta = \gamma$ is completely uninformative and $\gamma_E \beta \succ \gamma_F \beta$ simply means that the decision maker is willing to accept the loss of realizing $\beta$ instead of $\gamma$ on event $\bar{E}$ in order to find out whether event $E$ or event $\bar{E}$ obtains. Likelihood outcome independence then requires the decision maker to not only accept this loss but also any arbitrarily large loss of realizing $\beta'$ instead of $\gamma'$ on $\bar{E}$. In this manner, likelihood outcome independence disallows tradeoffs between the value of knowledge and outcomes.
Yet again there are also cases in which likelihood outcome independence is consistent with a preference for knowledge. From $\gamma_E \beta \succsim \beta_E \gamma = \gamma_{\bar{E}} \beta$, we can indeed conclude that $E$ is at least as likely as $\bar{E}$; in both acts the decision maker learns whether the event $E$ obtains. The former act must then be at least as good as the latter act in virtue of event $E$ being at least as likely as its complement. We can generalize this idea to arbitrary disjoint events $E$ and $F$ by concluding that $E$ is at least as likely as $F$ if the decision maker weakly prefers to receive the good outcome $\gamma$ on $E$ and the bad outcome $\beta$ on $F$ rather than the reverse, as long as both acts yield the same information. We therefore change the likelihood outcome independence axiom to be consistent with this way of identifying likelihoods.
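Axiom 4 (Information-Neutral Likelihood Outcome Independence). For all disjoint events $E, F \in \mathcal{E}$ and all distinct outcomes $\alpha, \beta, \beta', \gamma, \gamma' \in \mathcal{X}$, if $\gamma \succ \beta$ and $\gamma' \succ \beta'$, then
$$\gamma_E \beta_F \alpha \succsim \beta_E \gamma_F \alpha \iff \gamma'_E \beta'_F \alpha \succsim \beta'_E \gamma'_F \alpha.$$
In fact, it is straightforward to show that the above axiom implies likelihood outcome independence given the sure-thing principle. Under the weaker information-neutral sure-thing principle, however, the two axioms are distinct.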
For a preference fulfilling information-neutral likelihood outcome independence, there exists a well-defined likelihood relation $\succsim^*$ over events.
Definition 2 (Likelihood Relation). Let $E, F \in \mathcal{E}$ be events and $\alpha, \beta, \gamma \in \mathcal{X}$ be distinct outcomes such that $\gamma \succ \beta$. $E$ is revealed by $\succsim$ to be more likely than $F$,
P 5 (Nontriviality). There are $\beta, \gamma \in \mathcal{X}$ such that $\gamma \succ \beta$.
Nontriviality is consistent with a preference for knowledge. However, it imposes that the decision maker also has some nontrivial consequentialist preference and thus cannot only have a preference for knowledge.
Finally, Savage introduces a continuity assumption that guarantees that one can find sufficiently small events to measure the utility of every act in terms of an arbitrarily chosen outcome.
P 6 (Event Continuity). If $f, g \in \mathcal{A}$ are such that $f \succ g$ and $\alpha \in \mathcal{X}$, then there is a finite partition $\mathcal{H}$ of $S$ such that for every $H \in \mathcal{H}$,
$$\alpha_H f \succ g \quad \text{and} \quad f \succ \alpha_H g.$$
While we technically would be able to work with this axiom, we find that it is not quite in the spirit of our paper. Specifically, the change from $f$ to $\alpha_H f$ may involve different changes to the information partition than the change from $g$ to $\alpha_H g$. We therefore employ a continuity axiom that ensures that preferences are continuous in monotone changes in events on which outcomes are achieved.
We state our continuity properties in terms of convergent sequences of acts in which both the information and the outcomes converge. Stating continuity properties using convergent sequences is quite standard in the context of decisions within a topological space such as consumer choice. To avoid introducing a topological structure on outcomes or acts, we use set-theoretic limits of events.
For a sequence $f^k$, $k = 1, 2, \ldots$, the notation $f^k \to f$ means that the acts in the sequence $f^k$ become arbitrarily similar to $f$ in the following sense. Every outcome $\alpha$ is either acquired on more and more states or on fewer and fewer states as the sequence progresses. Moreover, the set of states in which the outcome obtains is in the limit the same as in $f$. We make this notion precise in Appendix A.
Continuity of the preference relation is then defined as follows:
Axiom 6 (Continuity). If $f^k \to f$ and $g^k \to g$ and for all $k$, $f^k \succsim g^k$, then $f \succsim g$.
As sequences of acts converge to an act, the preference is not permitted to "jump": similar acts must be similarly ranked.
This concludes our adjustments to Savage's original axioms. In addition, we assume that the intrinsic value of information is separable from the outcomes obtained. In principle, a decision maker may find information especially valuable in case a particular outcome is obtained. For example, knowing about how to drive a car will generally be more useful if one acquires a car. There are many cases in statistics, research, and media consumption in which it is plausible to assume that the benefits from knowledge are separable from the outcomes obtained. Nevertheless, we fully acknowledge that the following axiom is not a pure axiom of rationality.
Axiom 7 (Learning Independence). Suppose $\alpha \sim \alpha'$, $\delta \sim \delta'$ are outcomes disjoint from $f$ and $g$, and $E \cap F = \emptyset$. Then,
This axiom guarantees that the value of learning whether event $E$ or $F$ obtains is unrelated to whether the outcomes $\alpha$ or $\delta$ arise on these events.
Our main representation theorem based on the previously introduced axioms characterizes the sum of expected utility and the value of learning whether particular events obtain or not. We call this value a subjective knowledge utility because the value of knowing if an event obtains is subjectively determined. A decision maker may prefer learning about event E rather than event F simply because she finds E more interesting than F.
Definition 3 (Subjective Knowledge Utility). $\succsim$ on the set $\mathcal{A}$ defined on events $\mathcal{E}$ has a subjective knowledge utility representation if there exist functions $U : \mathcal{A} \to \mathbb{R}$, $u : \mathcal{X} \to \mathbb{R}$, a monotonely continuous function $h : \mathcal{E} \to \mathbb{R}$ and a probability measure $\mu : \mathcal{E} \to \mathbb{R}$ such that $U(f) \geq U(g)$ if and only if $f \succsim g$ and
$$U(f) = \sum_{\alpha \in f(S)} \mu\bigl(f^{-1}(\alpha)\bigr)\, u(\alpha) + \sum_{\alpha \in f(S)} h\bigl(f^{-1}(\alpha)\bigr).$$
The first component of the representation is the expected utility of the outcomes and will be called the instrumental component or the instrumental preference of the decision maker. The second component contains the utility from gaining knowledge about what event obtains. After choosing act a, the
1. $\succsim$ has a subjective knowledge utility representation with a unique probability measure.
By adjusting Savage's axioms we have therefore obtained a representation that accounts for the value of knowing which event obtains. These preferences encompass a wealth of possible attitudes to information as we will see in section 6. They can express a varying intensity of preference for knowledge about some events but a preference against knowledge for other events. In the following section we therefore analyze the subjective value of knowledge a decision maker attaches to being able to distinguish between events in more detail.
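As a rough illustration of the representation, the sketch below evaluates acts as expected utility plus a value attached to every event the act reveals. It is a sketch under stated assumptions rather than the paper's construction: the state space is finite, the additive formula is our reading of the description in the introduction, and the two choices of h (a knowledge-neutral one and an entropic one, the latter motivated by the abstract) as well as all names and numbers are hypothetical.

```python
import math
from collections import defaultdict

def subjective_knowledge_utility(act, mu, u, h):
    """Sum over outcomes of mu(event) * u(outcome) + h(event).

    act: dict state -> outcome; mu: dict state -> probability;
    u: dict outcome -> utility; h: function from a frozenset of states
    (the event revealed by an outcome) to a float.
    """
    cells = defaultdict(set)
    for state, outcome in act.items():
        cells[outcome].add(state)
    total = 0.0
    for outcome, event in cells.items():
        p = sum(mu[s] for s in event)
        total += p * u[outcome] + h(frozenset(event))
    return total

# Hypothetical primitives. "low'" is a relabelled copy of "low" with the
# same utility, in the spirit of the richness assumption on outcomes.
mu = {"s1": 0.25, "s2": 0.25, "s3": 0.5}
u = {"low": 0.0, "low'": 0.0, "high": 1.0}

def h_neutral(event):
    """No value of knowledge: the representation reduces to expected utility."""
    return 0.0

def h_entropic(event):
    """Assumed entropic event value -p * log(p), with p the event's probability."""
    p = sum(mu[s] for s in event)
    return -p * math.log(p) if 0.0 < p < 1.0 else 0.0

coarse = {"s1": "low", "s2": "low", "s3": "high"}
fine = {"s1": "low", "s2": "low'", "s3": "high"}  # same payoffs, finer partition

for h in (h_neutral, h_entropic):
    print(h.__name__,
          subjective_knowledge_utility(coarse, mu, u, h),
          subjective_knowledge_utility(fine, mu, u, h))
```

With the neutral h both acts receive the same utility because they differ only in labels, whereas with the entropic h the finer act is ranked strictly higher because it reveals more about the state.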

5 Value of Knowledge
Regarding risk attitudes, we often distinguish preferences that are risk-seeking, risk-neutral, and risk-averse. A similar classification can also be made for knowledge preferences, which we will call the knowledge attitude of a decision maker. A decision maker may exhibit knowledge-averse or knowledge-seeking behavior. An individual is knowledge seeking about distinguishing event $E$ from event $F$ if for outcomes $\alpha \sim \alpha'$, $\beta$ we have that $\alpha_E \alpha'_F \beta \succsim \alpha_{E \cup F} \beta$. Similarly, the decision maker is knowledge averse about these events if the preference is reversed.
If for all nonnull, disjoint events E, F the decision maker is knowledge seeking (averse), we say simply that the decision maker is knowledge seeking (averse). An expected utility maximizer is both knowledge seeking and knowledge averse and thus knowledge-neutral. Since a set function h defined on a Subadditivity and superadditivity of h are therefore the defining property of knowledge attitudes. We will see below that this property extends also to comparisons of decision makers.
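The link between knowledge attitudes and subadditivity can be traced in a short derivation. It assumes the additive form described in the introduction (expected utility plus a value $h$ for each revealed event), which is our reading of the representation; $E$ and $F$ are disjoint, $G$ denotes the complement of $E \cup F$, the act $\alpha_E \alpha'_F \beta$ yields $\alpha$ on $E$, $\alpha'$ on $F$ and $\beta$ on $G$, and $\alpha \sim \alpha'$ is taken to imply $u(\alpha) = u(\alpha')$.

```latex
\begin{align*}
U(\alpha_E \alpha'_F \beta)
  &= \mu(E)\,u(\alpha) + \mu(F)\,u(\alpha') + \mu(G)\,u(\beta) + h(E) + h(F) + h(G), \\
U(\alpha_{E \cup F}\, \beta)
  &= \mu(E \cup F)\,u(\alpha) + \mu(G)\,u(\beta) + h(E \cup F) + h(G), \\
U(\alpha_E \alpha'_F \beta) - U(\alpha_{E \cup F}\, \beta)
  &= h(E) + h(F) - h(E \cup F).
\end{align*}
```

Under these assumptions the decision maker is knowledge seeking about $E$ and $F$ exactly when $h(E \cup F) \leq h(E) + h(F)$ and knowledge averse exactly when the inequality is reversed, which is subadditivity and superadditivity of $h$ on disjoint events.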
For a decision maker with a subjective knowledge utility representation U, we define the value of knowledge as follows.
Definition 4 (Value of Knowledge). For a decision maker with a subjective knowledge utility representation, the value of distinguishing a finite set of where γ (1) , . . . , γ (n) are indifferent, mutually distinct outcomes.
This definition has the desirable property that it fulfills the identity and thus the value of gaining knowledge about several events can be decomposed into the expected value of knowledge of being able to distinguish between binary events.
In decisions under risk, certainty equivalents are a useful indicator of risk preferences. For knowledge preferences, we define the knowledge equivalent (KE) as follows:
Definition 5 (Knowledge Equivalent). At outcome $\beta$, the knowledge equivalent (KE) of whether $E$ or $F$ obtains is the outcome $\gamma$ that fulfills: for some $\alpha$ and $\beta \sim \beta'$. We denote the knowledge equivalent by $KE(E, F, \beta) = \gamma$.
The following result is straightforward.
Proposition 2. Suppose conditional acts are outcome-solvable and the decision maker's preferences have a subjective knowledge representation. Then the decision maker has a higher value of knowledge for distinguishing E from F than distinguishing G from H if and only if its knowledge equivalent is preferable, i.e.,
Comparing the knowledge preferences of different decision makers is nontrivial in a subjective model. Decision makers may differ in their utilities of outcomes, u, probabilities, µ, and knowledge preferences, h. Consider as an example the decision between $\beta_E \beta'_F \delta$ and $\gamma_{E \cup F} \delta$, which a decision maker 1 may perceive as the decision between a better outcome, $\gamma \succ_1 \beta \sim_1 \beta'$, and knowledge about E in case E ∪ F comes about. A decision maker 2 who disagrees on u and h may see this as a decision between a lottery over outcomes $\beta \succ_2 \beta'$ and a safe payoff γ. Yet another decision maker 3 may think that E is null and that learning about E ∪ F is sufficient to know that F obtains. Comparisons between decision makers therefore need to carefully account for the various aspects decision makers can differ on.
To compare knowledge attitudes, we therefore assume that all decision makers agree on the likelihood of events, $\succsim^*$. We say that two decision makers 1 and 2 have identical instrumental preferences if for all acts f and all acts g
Definition 6 (Curiosity Relation). For two decision makers with preferences $\succsim_1$ and $\succsim_2$, yielding identical likelihood relations $\succsim^*_1 = \succsim^*_2$, and identical instrumental preferences, we say that $\succsim_1$ is more curious about (disjoint events) E and F than $\succsim_2$ if, and for all $\alpha, \beta \sim \beta', \gamma \in X$. We say that $\succsim_1$ is more curious than $\succsim_2$ if the former is more curious than the latter about all disjoint events E and F.
The following proposition establishes that a DM is more curious than another if she has a higher knowledge equivalent.
Proposition 3. Suppose two decision makers with a subjective knowledge representation agree on * and have the same instrumental preferences. Then the following statements are equivalent: 2. 1 is more curious than 2 .

D(E)
The function D(E) can be seen as the difference between the two decision makers in the (normalized) value of knowing whether event E obtains. The equivalence of the second and third statements is natural given that subadditivity of h implies knowledge-seeking behavior: "higher subadditivity" means higher curiosity.
This is similar to analogous results for risk preferences; the subadditivity of h is with respect to knowledge preferences what the concavity of u is with respect to risk preferences.
Equipped with an understanding of how h represents knowledge preferences, we characterize specific functional forms of h in the following section.

6 Characterizations of the Value of Knowledge
While the appeal of the subjective knowledge utility representation lies in its generality, for practical applications it is often desirable to have more restrictive decision models. In this section we axiomatically characterize several functional forms that are more tractable in practical applications.

Probabilistic Knowledge Utility
A decision maker might only care about the probability of the known events and ignore all other aspects of these events. This is an appealing condition if the event space is very homogeneous and there are no events that are intrinsically more interesting than others. In practical applications it may also be a useful approximation in case the decision maker has already previously selected what information to gain and the primary tradeoff is how much information is gained.
Definition 7 (Probabilistic Knowledge Utility). $\succsim$ on the set A of simple acts on events E has a probabilistic knowledge utility representation if there exist
We call this utility representation a probabilistic knowledge utility because the knowledge about events is only valued in terms of the subjective probability attached to the events. Thus, acts are treated as signals about the state space and no special value is attached to knowing particular events. We formalize the notion of a decision maker only caring about the likelihood of events for their value of knowledge as follows.
Axiom 8 (Principle of Indifference of Information). If $E \sim^* F$, then,
This principle of indifference of information states that if E and F are equally likely, then the decision maker is indifferent between being informed about whether E or G obtains and being informed about whether F or G obtains.
Together with our previous axioms, it characterizes a probabilistic knowledge utility. 5
Theorem 2. The following statements are equivalent:
1. $\succsim$ has a probabilistic knowledge utility representation with a unique probability measure.
2. $\succsim$ fulfills weak order, the information-neutral sure-thing principle, information-neutral monotonicity, information-neutral likelihood outcome independence, continuity, learning independence, and the principle of indifference of information.

Entropic Knowledge Utility
A probabilistic knowledge utility representation is dynamically consistent in the sense that for every event we can find a probabilistic knowledge utility representation that represents the preferences over conditional acts on this event. However, this utility representation does not need to be the one obtained from applying Bayesian updating to the probabilities and then using the same representation. It is in this sense dynamically consistent but not stationary: suppose a decision maker chooses between acts that yield distinct outcomes on events E, F, G, which partition the state space. If the decision maker learns that event E does not obtain, updates beliefs according to Bayes' rule, and then uses equation (16) to evaluate the conditional acts given F ∪ G, preference reversals may occur. In contrast, the following representation based on the Shannon entropy does not exhibit such preference reversals.
Definition 8 (Entropic Knowledge Utility). $\succsim$ on the set A of simple acts on events E has an entropic knowledge utility representation if there exist functions U : A → R, u : X → R, a real number v, and a probability measure µ :
The parameter v determines how strong the knowledge preference is and whether the decision maker is knowledge-seeking or knowledge-averse. The entropic knowledge utility is unique in the respect that if U is the entropic knowledge utility representation, then U can also be used to compare conditional acts in $A_E$ after updating probabilities. It therefore deserves its own characterization, which we prepare with the following definitions.
Definition 9 (Bayesian Updating). A relation $\succsim^*_E$ on the subevents intersecting E is obtained from Bayesian updating of $\succsim^*$ if it is a quantitative probability
From the probabilities obtained by Bayesian updating we can derive comparisons between conditional likelihoods. Let C = E × E be the set of conditional events, with elements denoted by E|F. We can define a likelihood relation on C as follows. We impose that $E|F \succsim^\dagger G|H$ if and only if µ
The following stationarity condition sharpens the principle of indifference of information and guarantees that using the same utility function on conditional acts after Bayesian updating of beliefs does not yield preference reversals.
Let $f_F$ be an arbitrary conditional act on the event F and $\psi(f_F)$ an act fulfilling (19) with H = S. Thus, ψ maps conditional acts into acts such that the conditional probabilities of the outcomes are unchanged. If $\succsim$ fulfills stationarity I, then the function $U \circ \psi$ can be used to evaluate conditional acts. This means that the relative value of knowledge remains the same after updating on an event.
The previous definition of stationarity employed the somewhat nonstandard conditional likelihood relation $\succsim^\dagger$. We therefore also provide an alternative definition that does not rely on $\succsim^\dagger$ but is equivalent given our axioms of a subjective knowledge representation.
Definition 11 (Stationarity II). For all events E, F, G, H such that E ∼ F and G ∼ H and for all (not necessarily distinct) outcomes α, β, γ, δ ∈ X − { },
The next theorem shows that the two stationarity definitions are equivalent and that the entropic knowledge utility is the only probabilistic knowledge utility that fulfills these conditions.
6 This definition has a natural analogue using equiprobable partitions of events that does not directly refer to the probability measure µ. Simply define E|F † G|H if for partitions with Under monotone continuity this is identical to the definition in terms of probabilities.
Theorem 3. Suppose $\succsim$ has a subjective knowledge utility representation and fulfills outcome solvability. Then the following statements are equivalent.

3. $\succsim$ has an entropic knowledge utility representation.
This result is comparable to the finding that the only stationary time discounting rule is exponential discounting (Koopmans, 1960). Under exponential discounting of a time-independent utility over commodities, we can use the same utility function on substreams of commodities without preference reversals. Similarly, in an entropic knowledge utility representation, the value of distinguishing an event of (conditional) probability p from an event of probability 1 − p is the same irrespective of how much knowledge the decision maker has gained already. This raises the more general question of how the value of knowledge in probabilistic knowledge utility representations changes as additional knowledge is gained, which we address next.
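As a rough numerical illustration of the entropic form, the sketch below ranks acts by expected utility plus v times the Shannon entropy of the partition that the act's outcomes generate. The exact functional form and sign convention are assumptions made for illustration (the abstract describes the representation only as the sum of expected utility and the Shannon entropy), and the names `entropic_knowledge_utility`, `mu`, `u`, `f`, and `g` are hypothetical. The final lines check the grouping property of the entropy that underlies stationarity: refining F ∪ G into F and G adds the probability of F ∪ G times the entropy of the conditional probabilities.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (natural log) of a list of probabilities; 0*log(0) counts as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropic_knowledge_utility(act, mu, u, v):
    """Expected utility plus v times the entropy of the outcome partition (assumed form).

    act: dict mapping state -> outcome
    mu:  dict mapping state -> subjective probability
    u:   dict mapping outcome -> utility
    v:   weight on knowledge (v > 0 is treated here as knowledge seeking)
    """
    # The event on which an outcome obtains is the set of states mapped to it.
    event_prob = {}
    for state, outcome in act.items():
        event_prob[outcome] = event_prob.get(outcome, 0.0) + mu[state]
    expected_utility = sum(p * u[o] for o, p in event_prob.items())
    return expected_utility + v * shannon_entropy(list(event_prob.values()))


# Three states (events E, F, G) and three indifferent, mutually distinct outcomes.
mu = {"E": 0.5, "F": 0.25, "G": 0.25}
u = {"a": 1.0, "b": 1.0, "c": 1.0}
v = 2.0

f = {"E": "a", "F": "b", "G": "c"}  # reveals which of E, F, G obtains
g = {"E": "a", "F": "b", "G": "b"}  # reveals only whether E obtains

U_f = entropic_knowledge_utility(f, mu, u, v)
U_g = entropic_knowledge_utility(g, mu, u, v)

# Grouping property: the extra value of splitting F ∪ G into F and G equals
# v * mu(F ∪ G) * entropy of the conditional probabilities (0.5, 0.5).
extra_value = v * 0.5 * shannon_entropy([0.5, 0.5])
print(U_f - U_g, extra_value)
```

Because the outcomes are indifferent, the two acts have the same expected utility, so the difference `U_f - U_g` is purely the value of the finer partition.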

Constant Elasticity of Curiosity Utility
For decision makers with probabilistic knowledge preferences, the value of knowledge can be expressed as a function of probabilities of events.
Using the function $\hat{H}$, we can express how the value of knowledge changes as events become more or less likely. Since the value of distinguishing E and F is only meaningful in case E ∪ F is already known, this at the same time expresses how the value of knowledge changes as more knowledge is gained. We thus define the elasticity of curiosity as the percentage change of the value of knowledge for a percentage change in likelihood.
Definition 12 (Elasticity of Curiosity). The coefficient of the elasticity of curiosity at probability p for distinguishing events of probabilities q and 1 − q is
$$el(p, q) = \frac{\partial \hat{H}(pq, p(1 - q))}{\partial p} \cdot \frac{p}{\hat{H}(pq, p(1 - q))}.$$
The coefficient expresses the percentage by which the value of knowledge changes for a one percent increase in the probability p of acquiring the information. The relative probabilities of the events to be distinguished, q and 1 − q, are held constant in this comparison. It is straightforward to verify that entropic knowledge utility preferences have a zero elasticity of curiosity.
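The zero-elasticity claim can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the value of distinguishing two events of probabilities a and b to be the normalized gain (h(a) + h(b) − h(a + b)) / (a + b), and it takes h(x) = −v x log x for the entropic case. Both choices are assumptions about the form of $\hat{H}$ and h, which are not spelled out in this section. The power form `h_power` is a purely hypothetical comparison case with a nonzero constant elasticity; it is not claimed to be the Rényi-based CEC representation.

```python
import math


def h_entropic(x, v=1.0):
    """Assumed entropic set-function value of an event with probability x."""
    return -v * x * math.log(x) if x > 0 else 0.0


def h_power(x, e=0.5):
    """Hypothetical power form used only to illustrate a nonzero elasticity."""
    return -(x ** (1 + e))


def normalized_value(a, b, h):
    """Assumed normalization: gain from splitting an event of probability a + b."""
    return (h(a) + h(b) - h(a + b)) / (a + b)


def elasticity(p, q, h, eps=1e-6):
    """Central-difference estimate of el(p, q) from Definition 12."""
    def value(t):
        return normalized_value(t * q, t * (1 - q), h)

    derivative = (value(p + eps) - value(p - eps)) / (2 * eps)
    return derivative * p / value(p)


el_entropic = elasticity(0.8, 0.3, h_entropic)
el_power = elasticity(0.8, 0.3, h_power)
print(el_entropic, el_power)
```

Under these assumptions the entropic estimate is zero up to numerical error, while the power form yields a constant elasticity equal to its exponent e, whatever values of p and q are chosen.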
For many decision makers, the elasticity of curiosity may be nonzero. A decision maker who is knowledge seeking and has a high elasticity of curiosity will generally only choose to give up instrumental value for a coarsely grained partition of the state space. A decision maker with an entropic knowledge utility representation may also be willing to forgo instrumental value to achieve a more detailed partition of the state space. A decision maker may even find a book "addictive" in the sense that reading the second half of the book (which allows her to distinguish between the equally likely events E and F) has a higher knowledge equivalent than reading the first half (which allows her to distinguish between the equally likely events E ∪ F and its complement). This is the case if the elasticity of curiosity is negative.
A functional form that is mathematically tractable and can account for both the intensity of knowledge preferences and different elasticities of curiosity is therefore desirable. The following two-parameter family of preferences achieves this:
Definition 13 (Constant Elasticity of Curiosity (CEC) Utility). $\succsim$ on the set A of simple acts on events E has a constant elasticity of curiosity representation if there exist functions U : A → R, u : X → R, real numbers r and v, and a probability measure µ : E → R such that U(f) ≥ U(g) if and only if $f \succsim g$ and
The above representation consists of the expected utility and the Rényi (1961) entropy. The parameter v determines whether the decision maker is knowledge seeking or knowledge averse and how strong this preference is compared with the instrumental preferences. The parameter r determines the elasticity of curiosity.
The decision maker is sure that she will learn that either E or F obtains since the probabilities q and 1 − q sum to one. In situation B, the events E and F only make up 98% probability but their relative likelihood is the same as in A. Despite learning to distinguish between E and F being 2% less likely, the willingness to pay only decreases by 1%. Comparing situations C and D, we observe the same pattern; a 2% decrease in the likelihood of distinguishing E from F leads to a 1% decrease in the willingness to pay. Note that the relative likelihood of E and F may differ in situations C and D from situations A and B.
We now justify the name CEC preferences for the functional form given above.
Proposition 4 (Characterization of CEC Preferences). Suppose a decision maker's preferences have a probabilistic knowledge utility representation. Then the following statements are equivalent:
1. The elasticity of curiosity el(p, q) is constant in p.
2. The preferences have an entropic knowledge utility representation or a constant elasticity of curiosity representation.
In some sense, one can see CEC preferences as corresponding to CRRA
We make this idea precise using random variables. A variable $V : S \to S_V$ is a measurable map from the state space S into the state space of the variable. The state space of the variable, $S_V$, may for example specify a parameter value of an econometric model the decision maker is uncertain about. We define $\mu^V_E = \mu_E \circ V^{-1}$ as the probability measure over the variable given event E ∈ S.
A natural way to evaluate the knowledge gained from an act is to simply attach a value to each possible posterior probability measure over the random variable generated by the knowledge about events and calculate its expected value. 7
Definition 14 (Stationary Information Utility). $\succsim$ has a stationary information representation over variable V if for all events E ∈ E, the conditional relation $\succsim_E$ can be represented by
In words, the decision maker perceives a tradeoff between expected utility and the expected valuation of the information gained about the variable V.
Analogous to the Bernoulli utility function u for the evaluation of outcomes, the function v evaluates the value of the information $\mu^V$ gained by learning that outcome α obtains and compares it with the value of the information $\mu^V_E$ the decision maker had initially. 8 What is special about stationary information utility is that the evaluation of the posterior does not depend on the prior. An example of a stationary information representation is the mutual information in which each conditional measure is valued according to
Similarly to the entropy, in mutual information the value of knowing about an event of the variable only depends on the likelihood of the event of the variable. However, in some cases this is not a plausible condition. For example, we may be more interested in the first than the 100th digit of a uniform random variable. If the state space of the variable is more structured, other interesting functional forms of v are possible such as the posterior variance of a real-valued parameter. If $S_V = \mathbb{R}$, i.e., if the parameter is real-valued, then using (the negative of) the variance, $v(\mu) = -\mathrm{Var}_{\mu}(V)$, is a plausible way of measuring how informed the decision maker is about V.
7 To clarify that the evaluation of the posterior probability measures over the variable V does not depend on the prior, we state this and the following representation as representations for all conditional acts. Since for a subjective knowledge representation the preferences over acts determine the preferences over conditional acts, this is without loss of generality.
8 The representation is unique up to joint linear transformations of u and v and separate additive transformations of u and v. The functional form is chosen such that $U_E(\gamma_E) = u(\gamma)$.
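The digit example can be made concrete with a small calculation. The sketch below is an illustration under assumptions rather than the paper's representation: it takes V uniform on the integers 0 to 99, treats learning the tens digit and learning the ones digit as two signals, and computes the expected value of v at the posterior minus v at the prior for two hypothetical choices of v, the negative entropy and the negative variance. The function names are made up for this example.

```python
import math
from collections import defaultdict


def neg_variance(dist):
    """Negative variance of a distribution given as {value: probability}."""
    mean = sum(p * x for x, p in dist.items())
    return -sum(p * (x - mean) ** 2 for x, p in dist.items())


def neg_entropy(dist):
    """Negative Shannon entropy (natural log) of a distribution."""
    return sum(p * math.log(p) for p in dist.values() if p > 0)


def expected_information_value(prior, signal, v):
    """Expected v(posterior) minus v(prior) when the decision maker observes signal(V)."""
    cells = defaultdict(dict)
    for x, p in prior.items():
        cells[signal(x)][x] = p
    total = 0.0
    for cell in cells.values():
        mass = sum(cell.values())
        posterior = {x: p / mass for x, p in cell.items()}
        total += mass * v(posterior)
    return total - v(prior)


prior = {x: 1 / 100 for x in range(100)}


def tens_digit(x):
    return x // 10


def ones_digit(x):
    return x % 10


gain_tens_ent = expected_information_value(prior, tens_digit, neg_entropy)
gain_ones_ent = expected_information_value(prior, ones_digit, neg_entropy)
gain_tens_var = expected_information_value(prior, tens_digit, neg_variance)
gain_ones_var = expected_information_value(prior, ones_digit, neg_variance)
print(gain_tens_ent, gain_ones_ent, gain_tens_var, gain_ones_var)
```

With the entropy-based valuation both digits are worth the same, log 10, because each splits the state space into ten equally likely cells. With the variance-based valuation the tens digit is worth about 825 and the ones digit about 8.25, which matches the intuition that the leading digit is the more interesting one.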
A stationary information representation is only sensible under the assumption that it is costless for the decision maker to process the information. If a decision maker is for example interested in the mean of a random variable, the decision maker may rather choose to take an action that estimates the mean from a data set instead of choosing an act that informs the decision maker about all data in the data set. We can account for this by introducing a cost of "too much information".
Definition 15 (Stationary Information Cost Utility). $\succsim$ has a stationary information cost representation over variable V if for all events E ∈ E, the conditional relation $\succsim_E$ can be represented by
The cost of too much information is expressed via the Shannon entropy of the information partition. If r is negative, then the decision maker perceives a tradeoff between gaining as much information as possible about the variable V and having an information partition that is as simple as possible.
In the remainder of this section, we characterize the stationary information cost utility. The stationary information utility then follows trivially from the additional condition that the decision maker only cares about knowledge of events in V as stated by the following condition.
Omitting the term −v(µ V E ) from the representation yields identical conditional preferences but does not fulfill this condition.
Definition 16 (Indifference to Irrelevant Information). Suppose E and F fulfill $\mu^V_E = \mu^V_F$ and are nonnull. Then,
Thus, if two events yield the same conditional probability measure (and thus the same information) about the variable, then being able to distinguish between the two events is irrelevant to the decision maker.
We adjust the stationarity conditions to account for the fact that not only the likelihoods of events are important but also how well these correlate with events in E V .

Definition 17 (Stationarity I*). Suppose for all
The corresponding change to the second stationarity axiom is the following condition.
Definition 18 (Stationarity II*). Suppose for all events E, F, G, H if E ∼ F and G ∼ H as well as $\mu_E = \mu_G$ and $\mu_F = \mu_H$. Then for all (not necessarily distinct)
We now state a similar result to the previous theorem that yielded an entropic knowledge representation but allow for a decision maker's special interest in V. We can understand this stationarity condition using the following simple example. Suppose a decision maker is willing to pay a certain amount of money for information about a parameter. Suppose now somebody offers to flip a coin and sell the information only in case the coin comes up "heads", while neither payment nor information is exchanged on "tails". If the decision maker's willingness to pay changes, this would violate stationarity.
The additive separability of the evaluation of prior and posterior distribution turns out to be intrinsically linked with the stationarity of the value of knowledge.
Theorem 4. Suppose $\succsim$ has a subjective knowledge utility representation and fulfills outcome solvability. Then the following statements are equivalent.

3. $\succsim$ has a stationary information cost utility representation.

In stationary information representations and stationary information cost representations the actions become more or less valuable via the knowledge gained about the variable V. This gives the actions taken by the decision maker the character of estimators. If V is an unknown parameter of a distribution or an economic model, then the data E may be used to inform the decision maker about this parameter. This leads us very naturally to the problem of statistical estimation in general. In the following section, we therefore briefly sketch how certain practices of statistics reveal certain aspects of knowledge preferences.

7 Partition Problems
Savage (1954) considered statistics as the practice of solving partition problems. A partition problem is the problem of determining whether an event E or an event F obtains. Our representation theorem for subjective knowledge utility provides a subjective value of solving partition problems. This provides a novel perspective on partition problems in general because statistical estimators themselves can be considered as acts in our model. This in turn allows us to impose our axioms on the preferences over estimators and analyze these preferences using the tools developed in the previous sections. To see this, we define in the language of our model estimation problems, data collection problems, and hypothesis testing problems in the following paragraphs.
Let $\mathcal{E}_V$ be a σ-algebra on a parameter space $S_V$. From the perspective of the decision maker, the parameter is a variable V of the more general state space S of states of the world. For simplicity, we assume that $S = \prod_{i=1}^{n} S_i$ consists of n realizations of a distribution based on the parameter space. 9 We endow S with a product σ-algebra $\mathcal{E}$. The set of outcomes $X = \mathbb{R}_+ \times R$ consists of elements (c, r) containing a monetary cost $c \in \mathbb{R}_+$ and estimation results r ∈ R, where R is the set of possible estimation results. 10 The set of estimators is simply the set of acts in this model. An estimation problem is a choice from the set of estimators given a subjective knowledge utility.
9 In fact, we could assume an arbitrary, rich event space and let the variable and thus the parameter space the decision maker has in mind be fully subjective and only be revealed via the knowledge preferences.
10 Given our way of modeling knowledge with state-independent evaluation of outcomes, the estimation results are simply messages from a signal and are only meaningful in conjunction with the chosen act.
The concept of an estimation problem is rather abstract. While solutions to specific estimation problems are beyond the scope of our paper, we show how certain assumptions employed in statistics can be readily translated into our framework.
Curiosity about parameter: The decision maker attaches a positive value to information about the variable V but a nonpositive value to information about data that does not reveal additional information about V. A stationary information cost utility is an example that exhibits such preferences.
Hypothesis testing: A hypothesis testing problem is an estimation problem in which the decision maker is interested in distinguishing whether a hypothesis about the parameters is true or not. Thus, we have a null hypothesis $E_V \in \mathcal{E}_V$ and its alternative $\overline{E}_V$ for which the decision maker attaches a positive value of knowledge, $H(V^{-1}(E_V), V^{-1}(\overline{E}_V)) > 0$, and the value of any further partitioning of $E_V$ is nonpositive. In classical hypothesis testing, the value of $h(V^{-1}(E_V))$ for some $E_V \in \mathcal{E}_V$ only depends on the likelihoods of observations given the parameters, $l(E|E_V)$, but not on prior probabilities.
Properties of maximum likelihood estimation: Contrary to Bayesian estimators, maximum likelihood estimators do not depend on the prior probabilities $\mu^V$. In our framework, this means that in most cases, a maximum likelihood estimator can only be a solution to an estimation problem if h has a negative elasticity of curiosity. The less likely an event in $\mathcal{E}_V$ is, the greater must be the value of knowledge attached to that event. Moreover, the elasticity must be −1 because the value of distinguishing two equally and highly likely events must be the same as that of distinguishing two equally but less likely events. This in turn means that the value of knowing the exact parameter is infinite if $\mathcal{E}_V$ is atomless. Maximum likelihood estimation may therefore violate our continuity assumption in case the decision maker attaches zero prior probability to some parameter values. However, it may be approximated by an elasticity of curiosity that approaches −1.

It is important to realize that the standard way of modeling decisions under uncertainty that we employed comes with some underlying maintained assumptions. The most crucial implicit assumption is that any outcome α can be realized on any event. The following property therefore directly follows from the way decisions under uncertainty are modeled:
P 0 (Outcome Permutability). If σ : f(S) → f(S) is a permutation of the outcomes resulting from an act f ∈ A, then permuting the outcomes of f also yields an act $\sigma \circ f \in A$.
Thus, starting from any act we can permute all outcomes and obtain another act. This assumption is implicit in almost all of axiomatic decision theory in one form or another. However, this imposes severe restrictions on the way the knowledge gained from an act (and thus its value) can be modeled. For example, defining an outcome as "knowing event E obtains" is inconsistent with outcome permutability: under outcome permutability, this outcome could also result from the complement of E, implying that the decision maker knows (and therefore has the correct belief) that E obtains when in fact it does not obtain! Including beliefs in the description of outcomes faces similar issues. In this case, the value of "believing event E obtains" should be state-dependent, depending on whether E actually obtains or not. 11 It seems therefore that if one accepts outcome permutability and wants to avoid state-dependent utility, then knowledge and beliefs about events cannot be (part of) outcomes. 12 It is for this reason that we assumed that knowledge about which event obtains is derived from the chosen act and the observed outcomes.
Another possible way of modeling knowledge would be to explicitly introduce an information partition in the description of an act. However, consider an act that yields outcome α on event E and β on the complement of E but has a trivial information partition. In this act, the decision maker could refine the information partition to $\{E, \overline{E}\}$ by realizing that according to her chosen act outcome α occurs if and only if event E obtains. Thus, either the outcome must be unobservable after it is realized or the decision maker must "forget" what act she has chosen. Our way of modeling knowledge avoids such issues: indeed, the knowledge gained by the decision maker is exactly what can be concluded from observing an outcome and remembering which act has been chosen.
11 Savage responded to the critique that the evaluation of outcomes should be state dependent with the famous bon mot "I should not mind being hung so long as it be done without damage to my health and reputation". If the reason for state dependence is the inclusion of beliefs into outcomes, the bon mot would go "I should not mind being hung as long as I believe that I am not being hung".
12 For difficulties with including beliefs into outcomes to explain anticipatory feelings, see the extensive discussion in Eliaz and Spiegler (2006).
If the outcomes in Savage's axiomatization cannot contain descriptions of knowledge or beliefs without running into conceptual difficulties, then expected utility as axiomatized by Savage does not fulfill our normative position stated in section 4. This is because Savage's axiomatization of expected utility cannot properly account for an intrinsic preference for knowledge. Moreover, in case of an instrumental preference for knowledge (to improve future decisions), Savage's axiomatization makes excessive requirements on the decision maker to integrate the future decision problems into one grand decision problem. Our model allows decision makers to integrate a preference for knowledge into their preferences without specifying the exact way in which this knowledge will be used.
Our results extend previous critiques of the necessity of Savage's axioms for rational decisions (Gilboa, Postlewaite, & Schmeidler, 2009). However, the motivation for Gilboa et al. (2012) to reject the sure-thing principle refers to the inability of the decision maker to fix a precise prior. A tough-minded expected utility maximizer may state that this inability is simply a mistake. Our critique instead refers to the tastes of the decision maker for knowledge. De gustibus non est disputandum: clearly a theory of rational choice should allow a decision maker to give up a good night's sleep in return for finding out who the murderer on the Orient Express is. It is up to further research how expected utility can be salvaged as a normatively compelling criterion when decision makers have a preference for knowledge.

Concluding Remarks
While the instrumental value of information has been covered extensively in the literature, it is perhaps surprising that the standard model of rational choice does not account for the value of knowledge simpliciter. This is especially striking as an intrinsic preference for knowledge is present in many contexts of everyday life, be it media, news, and entertainment consumption, scientific exploration, education, or social relationships. With this paper, we hope to provide a starting point for a systematic analysis of rational decisions in the presence of a subjective value of knowledge.

A Monotone Sequences of Acts and Events
In this section of the appendix, we make our definitions regarding monotone continuity precise.
Definition 19 (Monotonely Continuous Function). A function h : E → R is monotonely continuous if for any convergent monotone sequence of events
A monotone sequence of acts therefore makes a particular outcome either increasingly likely or increasingly unlikely by adding or removing states in which the event obtains.
A sequence of acts converges monotonically to an act f if it is a monotone sequence and the set-theoretic limit of the events on which each outcome is obtained is the event on which f obtains that outcome.

B Proof of Theorem 1
The proof proceeds as follows. First, we show that any set of conditional acts forms a linear continuum. This is highly nontrivial because the value of knowledge prevents us from using standard results. Since in a linear continuum the order topology is connected, this allows us to use additive representation theorems to obtain additive separability of utility across conditional acts. Next, for arbitrary information partitions, we derive an additive utility representation for conditional acts that in turn depends on the additive utility representations of its conditional acts. This allows us to use the uniqueness of additive representations to further refine the representation into the desired subjective knowledge utility.
Proof. We prove this by outcome solvability. If $f_E \succ_E g_E$, then for some outcomes γ, β, $\gamma_E \succ_E \beta_E$ and thus E is not null.
Definition 22 (Supremum). f is a supremum of $S'$ under relation $\succsim$, denoted $f \in \sup_{\succsim}(S')$, if for all $g \in S'$, $f \succsim g$, and for all h such that $h \succsim g$ for all $g \in S'$, $h \succsim f$.

Definition 23 (Linear Continuum). A set S ordered by $\succsim$ is a linear continuum if for all f, h ∈ S and all $S' \subseteq S$:
Definition 24 (Interval of Acts). An interval I[f, h] of acts between f and h is defined as the set of all acts that are part of some monotone sequence from f to h, i.e.,
Lemma 2. If $f \succ h$ and $I[f, h] \neq \emptyset$, then there exists $g \in I[f, h]$ such that $f \neq g \neq h$.
Proof. By Lemma 1 and $f \succ h$, it cannot be the case that f and h are identical on all nonnull events and therefore there must be an outcome α ∈ X such that $f^{-1}(\alpha) - h^{-1}(\alpha)$ is nonnull. Since E contains no atoms, we can find a nonnull proper subset, denoted A, of $f^{-1}(\alpha) - h^{-1}(\alpha)$. Then $\alpha_A h$ is the desired act on the interval.
Proof. This follows from the assumption that all acts are simple. Since the acts are simple, there are finitely many events on which the two acts differ. If two acts differ only on a single event, it is straightforward to show that a nonempty interval exists. We can therefore construct the finite sequence of intervals by changing a single event on each interval.
Lemma 4 (Villegas (1964), Theorem 5). In a qualitative probability σ-algebra the following propositions are equivalent: 1. There are no atoms; 2. Every event can be partitioned into two equally probable events.
We call this element $g$ the half-distance element for $I[f, h]$.
Lemma 7. If $f \succ h$ and $I[f, h] \neq \emptyset$, then there exists $g \in I[f, h]$ such that $f \succ g \succ h$.
Proof. Let $\hat g_0 \equiv f$ and $\check g_0 \equiv h$. Define $I_{1/2}[f, g]$ as the set of half-distance elements between $f$ and $g$. Define $g_i$ as an arbitrary element of $I_{1/2}[\hat g_{i-1}, \check g_{i-1}]$. If $f \succ g_i \succ h$, then we have found the desired element $g$. If $g_i \succsim f$, then define $\hat g_i \equiv g_i$ and $\check g_i \equiv \check g_{i-1}$. If $h \succsim g_i$, then define $\hat g_i \equiv \hat g_{i-1}$ and $\check g_i \equiv g_i$. We obtain the monotone sequences $\hat g_i$ and $\check g_i$. If the sequences do not terminate at some element $g$, then they converge to acts $\hat g^*$ and $\check g^*$, respectively, which may only differ on null events. By Lemma 1 we have that $\hat g^* \sim \check g^*$. It follows from monotone continuity that $\hat g^* \succsim f$ and $h \succsim \check g^*$, which contradicts $f \succ h$. It follows that the sequence terminates at some element $g$ such that $f \succ g \succ h$.
Lemma 8 (Least Upper Bound Property). For every nonempty subset I of an interval I[ f , h] that is bounded by g with respect to , there exists a supremum g * .
Proof. By Lemma 5, there exists an interval $I[\hat g_0, \check g_0]$ such that $\hat g_0 \succsim x$ for all $x \in I$ and $y \succsim \check g_0$ for some $y \in I$. Define $I_{1/2}[\hat g_0, \check g_0]$ as the set of half-distance elements in this interval. Define $g_i$ as an arbitrary element of $I_{1/2}[\hat g_{i-1}, \check g_{i-1}]$. If $g_i \succsim x$ for all $x \in I$, then define $\hat g_i \equiv g_i$ and $\check g_i \equiv \check g_{i-1}$. If there exists $y \in I$ such that $y \succ g_i$, then define $\hat g_i \equiv \hat g_{i-1}$ and $\check g_i \equiv g_i$. We obtain the monotone sequences $\hat g_i$ and $\check g_i$. The sequences converge to acts $\hat g^*$ and $\check g^*$, respectively, which may only differ on null events and are thus indifferent by Lemma 1.
Clearly, $g^* \equiv \hat g^*$ is a supremum of $I$, since if $g^* \succ x \in I[f, h]$, then there exist some $y \in I$ and $\check g_i$ such that $y \succsim \check g_i \succ x$.

Lemma 10 (Connectedness). The order topology on the set of conditional acts on $E$ is connected.

Proof. Suppose we can partition the set of conditional acts into two nonempty open sets, $A$ and $B$. Take an arbitrary act from each set. There exists a finite sequence of nonempty intervals between these two acts, and each interval is a linear continuum. The union of these intervals forms a linear continuum $U$ as well. But then by continuity $A \cap U$ and $B \cap U$ partition the linear continuum into two nonempty open sets, which contradicts its connectedness.
Using Connectedness, we can now obtain an additive representation $U(f_E g) = u_E(f_E) + u_{E^c}(g_{E^c})$. From here, we use the remaining axioms to obtain a representation of the form
$$U(f) = \sum_{\alpha \in O} \left[ \mu(f^{-1}(\alpha))\, U(\alpha) + h(f^{-1}(\alpha)) \right].$$
We call any partition $P_E$ of $E$ that has at least three elements and is a subset of $\mathcal{E}$ an admissible partition of $E$.
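The representation above can be illustrated numerically. The following is a minimal sketch, assuming a finite state space (which has atoms, so it only illustrates the formula and does not satisfy the no-atoms assumption) and an entropy-style choice of $h$; the names `preimages`, `knowledge_utility`, and `entropy_term`, and all numbers, are hypothetical and not taken from the paper.

```python
from math import log

def preimages(act):
    """Group states by outcome: the information partition {f^{-1}(a)}."""
    cells = {}
    for state, outcome in act.items():
        cells.setdefault(outcome, set()).add(state)
    return {outcome: frozenset(cell) for outcome, cell in cells.items()}

def knowledge_utility(act, mu, u, h):
    """Sum over outcomes a of mu(f^{-1}(a)) * u(a) + h(f^{-1}(a))."""
    total = 0.0
    for outcome, event in preimages(act).items():
        probability = sum(mu[s] for s in event)
        total += probability * u[outcome] + h(event)
    return total

# Assumed toy primitives (illustration only).
mu = {"s1": 0.25, "s2": 0.25, "s3": 0.5}
u = {"low": 0.0, "high": 1.0}

def entropy_term(event):
    """Assumed form of h: -p log p, where p is the probability of the event."""
    p = sum(mu[s] for s in event)
    return -p * log(p) if p > 0 else 0.0

constant_act = {"s1": "high", "s2": "high", "s3": "high"}   # reveals nothing
revealing_act = {"s1": "high", "s2": "high", "s3": "low"}   # reveals {s1,s2} vs {s3}

u_constant = knowledge_utility(constant_act, mu, u, entropy_term)
u_revealing = knowledge_utility(revealing_act, mu, u, entropy_term)
```

With these assumed numbers the revealing act has lower expected utility than the constant act but a higher total value, because the information term outweighs the loss in expected utility.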
Lemma 11 (Additive Representation). For every event $E$ and admissible partition $P_E$, there exists a representation $U_{P_E}$:

Proof. Choose any admissible partition $P_E$. Note that for every element $E'$ of $P_E$ the set of acts $\{f_{E'} : E' \to \mathbb{R}\}$ is connected under the $\succsim_{E'}$-order topology. We now assume that $\alpha', \beta'$, etc. can only occur on $E' \in P_E$, that $\alpha'', \beta''$, etc. can only occur on $E'' \neq E'$, and so on. Since outcomes $\alpha' \sim \alpha''$ can be substituted for one another by our monotonicity assumption, and since we have countably many indifferent outcomes for each outcome, this is without loss of generality.
By the information-neutral sure-thing principle, we have jointly independent preferences over the product space $\prod_{E' \in P_E} A_{E'}$. Since preferences over these are continuous in the order topology, it follows by Wakker (1988b) that an additive representation exists.
Lemma 12 (Outcome Additive Representation). For every event $E$, there exists a representation $U_E : A_E \to \mathbb{R}$ of $\succsim_E$ of the form:

Proof. Let $P'_E$ be a refinement of $P_E$. Then $U_{P_E}$ and $U_{P'_E}$ must be affine transformations of one another by the uniqueness of additive representations over $\prod_{E' \in P_E} A_{E'}$.
We may assume without loss of generality (by simply applying this affine transformation to one of the representations), that they are identity transformations of another on the domain ∏ P E A E . Thus, for all f , such that the decision maker is informed about P E , we have that Choosing U P E (γ E ) and U P E (γ E ) therefore uniquely fixes the scale and utility representation of all refinements of P E . Moreover, since all other partitions P E share a common refinement with P E , this indeed fixes the scale of all admissible partitions of E.
We now argue that for arbitrary partitions P E and P E , if there are at least two common partition elements E , E via the representation U {E ,E ,E−(E ∪E )} ( f E ). By continuity, for sufficiently small E , E , we can refine P E without changing the set of indifference curves much. Call this refinement P E . We can therefore ensure that any two partitions are consistent with another since we can choose E and E such that the partitions P E , P E , P E , P E are all consistent with another on an arbitrarily large utility interval.
We can therefore define the following utility representation on all acts with at least three distinct outcomes: To extend this representation to all acts, we consider disjoint monotone sequences E k → ∅ and F k → ∅. By continuity, the utility of the outcome α must be the limit of U S (γ E k δ F k α). The utility of α F β must be the limit of Lemma 13 (Monotone Additive Representation). For an arbitrary partition P E , there exists a function v P E such that: Proof. Note that under the normalizations employed in the previous lemma, . The desired result then follows from the existence of an additive representation U P E and that each u P E must be an increasing function of U E .

Lemma 14 (Affinity). For all E
Proof. Note that since we have normalized u P E and u P E such that they are identical on identical sub-acts, we know that v P E does not depend on the choice of partition but only on E. By the uniqueness of additive representations, we then have that v P E must be affine in its first argument.
Proof. Consider the utility of the following act.
It follows that is a utility representation. We now want to show that without loss of generality U E (α E ) = U(α) and that A E|S is a probability. Note that we can rescale for all events E, each representation U E such that U E (γ E ) = 1 > 0 = U E (β E ) for two arbitrary outcomes γ β. We now have for some suitably chosen acts f , g disjoint from β, γ: Thus, A E∪F|S = A E|S + A F|S . It is straightforward to show that A ∅|S = 0 and without loss of generality A S|S = 1. It follows that A E|S = µ(E|S) is the unique probability representation of * .
We now show that under our normalization, U E (α E ) = U F (α F ). For some acts f , g, we have: Thus, Since the LHS and RHS must be of the same sign and all events E ∪ F can be partitioned into two equally likely events E and F, it follows for all events we We thus obtain the representation: Defining h(E) = B E|S yields the desired representation. Lastly, we show that h has special uniqueness properties.
Lemma 16. The function $h$ in the subjective knowledge utility representation is unique up to the addition of a finite measure, i.e., if $h' = h + m$ where $m$ is a finite measure on the event algebra, then $h'$ represents the same knowledge preferences as $h$.
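The intuition behind this uniqueness property is that the outcome preimages of an act partition the state space, so an additive term $m$ contributes the same constant $m(S)$ to every act. The sketch below checks this on a small finite example; the state space, the weights defining `m`, the utilities, and the entropy-style `h` are all assumptions made for illustration.

```python
from itertools import product
from math import log

STATES = ("s1", "s2", "s3")
OUTCOMES = ("a", "b", "c")
MU = {"s1": 0.2, "s2": 0.3, "s3": 0.5}          # assumed probabilities
U_OUT = {"a": 0.0, "b": 0.4, "c": 1.0}          # assumed outcome utilities
M_WEIGHTS = {"s1": 0.3, "s2": 0.1, "s3": 0.7}   # assumed finite measure m

def cells(act):
    """Outcome preimages f^{-1}(a) of an act given as a tuple of outcomes."""
    grouped = {}
    for state, outcome in zip(STATES, act):
        grouped.setdefault(outcome, set()).add(state)
    return {o: frozenset(c) for o, c in grouped.items()}

def h(event):
    """Assumed knowledge term: -p log p of the event's probability."""
    p = sum(MU[s] for s in event)
    return -p * log(p)

def m(event):
    """Additive measure: the sum of state weights."""
    return sum(M_WEIGHTS[s] for s in event)

def utility(act, knowledge_term):
    return sum(
        sum(MU[s] for s in event) * U_OUT[o] + knowledge_term(event)
        for o, event in cells(act).items()
    )

all_acts = list(product(OUTCOMES, repeat=len(STATES)))
shifts = [utility(f, lambda e: h(e) + m(e)) - utility(f, h) for f in all_acts]
total_mass = m(frozenset(STATES))
```

Every entry of `shifts` equals `total_mass` up to rounding, so the ranking of acts is unchanged when `h` is replaced by `h + m`.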
2 ⇒ 1 follows straightforwardly from transitivity and identical preferences over outcomes.
1 ⇒ 3 follows from: where the equivalence between the first and second line follows from the definition of the knowledge equivalent, and the remaining steps follow from the assumption that u 1 = u 2 .

D Proof of Proposition 4
Proof. It is straightforward to show that an entropic knowledge utility representation has an elasticity of curiosity of zero and that a constant elasticity of curiosity representation has a constant elasticity of curiosity. We prove the reverse implication. If $\mathrm{el}(p, q) = c$, then, after reordering terms and defining $k(p) = p \cdot h'(p) - (1 + c)\,h(p)$, we obtain
$$k(p) = k(pq) + k(p(1 - q)).$$
Substituting $pq = x$ and $p(1 - q) = y$, we have Cauchy's functional equation
$$k(x + y) = k(x) + k(y),$$
with the solution $k(x) = a x$ for some constant $a$. Hence
$$x\,h'(x) - (1 + c)\,h(x) = a x,$$
which is a linear differential equation. The integrating factor of this differential equation is $x^{-(1+c)}$. The solution is then
$$h(x) = a\,x \ln x + C x \ \text{ if } c = 0, \qquad h(x) = C x^{1+c} - \frac{a}{c}\,x \ \text{ if } c \neq 0,$$
for some constant $C$. Since we may remove terms that are constant or linear in probability without changing the functional form of the representation, we obtain the desired representations.
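The last step of this argument can be checked numerically. The sketch below assumes that the differential equation in the proof reads $x\,h'(x) - (1+c)\,h(x) = a x$, which is what $k(x) = a x$ and the definition of $k$ imply; the constants `a`, `c`, `C` and the candidate solutions are written out here as assumptions, not quoted from the paper. It verifies with central finite differences that an entropy-type function solves the equation for $c = 0$ and a power-type function solves it for $c \neq 0$.

```python
from math import log

def residual(h, x, a, c, eps=1e-6):
    """Left side minus right side of x*h'(x) - (1+c)*h(x) = a*x."""
    derivative = (h(x + eps) - h(x - eps)) / (2 * eps)  # central difference
    return x * derivative - (1 + c) * h(x) - a * x

def entropy_type(a, C):
    """Candidate solution for c = 0: h(x) = a*x*ln(x) + C*x."""
    return lambda x: a * x * log(x) + C * x

def power_type(a, c, C):
    """Candidate solution for c != 0: h(x) = C*x**(1+c) - (a/c)*x."""
    return lambda x: C * x ** (1 + c) - (a / c) * x

grid = [0.1, 0.25, 0.5, 0.75, 0.9]

# c = 0 with a = -1 gives the Shannon term -x ln x plus a linear term.
entropy_residuals = [residual(entropy_type(-1.0, 0.5), x, a=-1.0, c=0.0) for x in grid]

# An arbitrary nonzero elasticity, chosen only for illustration.
power_residuals = [residual(power_type(2.0, 0.5, 1.5), x, a=2.0, c=0.5) for x in grid]
```

Both residual lists are zero up to discretization error, consistent with the entropic case arising at zero elasticity.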
E Proof of Theorem 2

E.1 Sufficiency Proof
Proof. Since we do not assume outcome solvability, we cannot use Theorem 1 to prove this result. However, the principle of indifference of information allows us to proceed in a more standard manner than in the proof of Theorem 1. We divide the proof into the following steps. First, we show that under our axioms $\succsim$ induces a qualitative probability $\succsim^*$. Since our event space has no atoms, we obtain a quantitative probability. Next, we show that the probability distribution over outcomes determines the preference. Using our separability and monotonicity conditions, we obtain a utility representation that is additively separable in outcomes. Lastly, we use learning independence to separate the utility function into an expected utility and the value of information.
Definition 25 (Qualitative Probability). $\succsim^*$ is a qualitative probability if it is a complete and transitive relation on $\mathcal{E}$ and for any events $E, F, G \in \mathcal{E}$ such that $(E \cup F) \cap G = \emptyset$,
$$E \succsim^* F \iff E \cup G \succsim^* F \cup G.$$

Conjecture 1 (Existence of a Qualitative Probability). $\succsim^*$ is a qualitative probability.
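As a sanity check on the definition, the following sketch verifies completeness, transitivity, and the additivity condition for a likelihood relation induced by a numerical probability on a small finite algebra. It assumes the additivity condition is the standard one, $E \succsim^* F \iff E \cup G \succsim^* F \cup G$ for $G$ disjoint from $E \cup F$; the state space and weights are hypothetical, and the finite algebra has atoms, so this only illustrates the definition.

```python
from fractions import Fraction
from itertools import combinations

STATES = ("s1", "s2", "s3", "s4")
WEIGHTS = {  # assumed probabilities, exact rationals to avoid rounding
    "s1": Fraction(1, 10),
    "s2": Fraction(2, 10),
    "s3": Fraction(3, 10),
    "s4": Fraction(4, 10),
}

# All events of the power-set algebra on STATES.
EVENTS = [
    frozenset(c) for r in range(len(STATES) + 1) for c in combinations(STATES, r)
]

def prob(event):
    return sum((WEIGHTS[s] for s in event), Fraction(0))

def weakly_more_likely(e, f):
    """The induced relation: E is at least as likely as F."""
    return prob(e) >= prob(f)

def is_complete():
    return all(
        weakly_more_likely(e, f) or weakly_more_likely(f, e)
        for e in EVENTS for f in EVENTS
    )

def is_transitive():
    return all(
        weakly_more_likely(e, g)
        for e in EVENTS for f in EVENTS for g in EVENTS
        if weakly_more_likely(e, f) and weakly_more_likely(f, g)
    )

def is_additive():
    return all(
        weakly_more_likely(e, f) == weakly_more_likely(e | g, f | g)
        for e in EVENTS for f in EVENTS for g in EVENTS
        if not ((e | f) & g)  # G must be disjoint from E and F
    )
```

The converse direction, from a qualitative probability without atoms to a numerical one, is what the proof obtains from Villegas (1964).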
We first prove several lemmas about the properties of $\succsim^*$.
Proof. For any two events $E, F \in \mathcal{E}$, completeness of $\succsim$ together with (7) guarantees that either $E \succsim^* F$, or $F \succsim^* E$, or both.
Definition 26 (Event Solvability). $\succsim$ fulfills Event Solvability if whenever $E \succsim^* F$ and $E$ and $F$ are disjoint, there exists an event $E' \subseteq E$ such that $E' \sim^* F$.
In other words, if $\gamma_E \beta_F \succsim_{E \cup F} \beta_E \gamma_F$ for $\gamma \succ \beta$, then there exists a subevent $E'$ of the event $E$ such that $\gamma_{E'} \beta_F \alpha \sim \beta_{E'} \gamma_F \alpha$.
Lemma 18. If there are no atoms and continuity holds, then Event Solvability holds.
Proof. The proof is straightforward. We iteratively split the larger event and add or remove subevents to obtain a converging sequence of events. This sequence can be split into sequences of events that are too large and those that are too small. Continuity then guarantees that the associated acts characterizing the likelihood relation converge.
Since $\mathcal{E}$ has no atoms, the proof is trivial if either $E \sim^* F$ or $F \sim^* \emptyset$. Hence, assume $E \succ^* F \succ^* \emptyset$.
Because E has no atoms, for every event A * ∅, ∃A ⊂ A such that ∅ ≺ * A ≺ * A. Furthermore, we can find a partition P of E {E 1 , E 2 , ..., E n } such that for all j, F − E j * E j and that for all j, ∅ ≺ * E j ≺ * F because F ∅. The former works because of Theorem 4 of Villegas (1964). Suppose the latter does not work, then there is an atom E in E such that E * F * ∅, a contradiction.
By construction, we can find i < n such that L 1 ≡ 1≤j≤i E j * F and that Similarly, we can again find a non-null partition P (2) of E i+1 , {E (2) 1 , E (2) 2 , ..., E (2) n 2 }, such that for every j (1) ∅ ≺ * E We repeat by finding such a partition P k+1 for E (k) i (k) +1 and obtain two se- The former implies E ∈ E (since E is a σ-algebra) whereas the latter implies E ∼ * F by continuity.
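The iterative splitting argument can be pictured as a bisection. The sketch below is only an analogy under strong assumptions: events are modeled as initial subintervals $[0, t)$ of $E = [0, e)$ with Lebesgue measure as the likelihood, and the procedure uses nothing but the qualitative comparison "at least as likely as". The function names are hypothetical.

```python
def at_least_as_likely(length_a, length_b):
    """Qualitative comparison of two interval events by their length."""
    return length_a >= length_b

def solve_subevent(e_length, f_length, tol=1e-12):
    """Find t such that [0, t) inside E is (approximately) as likely as F.

    Maintains a 'too small' bound lo and a 'too large' bound hi, mirroring
    the two sequences of events in the proof. Requires E to be at least as
    likely as F.
    """
    if not at_least_as_likely(e_length, f_length):
        raise ValueError("E must be at least as likely as F")
    lo, hi = 0.0, e_length
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if at_least_as_likely(mid, f_length):
            hi = mid   # [0, mid) is still at least as likely as F
        else:
            lo = mid   # [0, mid) is strictly less likely than F
    return hi

t = solve_subevent(0.7, 0.3)
```

The two bounds play the role of the sequences of events that are too large and too small; continuity is what guarantees that their common limit is an event indifferent to $F$.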
which is guaranteed by information-neutral monotonicity and $\gamma \succ \beta$.
The result follows since the definition of $E \succsim^* F$ is the same as the definition

Lemma 22 (Disjoint Measurability). If $G \sim^* H$ and $E$, $F$, $G$, and $H$ are mutually disjoint, then $E \succsim^* F \iff E \cup G \succsim^* F \cup H$.

Proof.
The first, second, fourth, and fifth equivalences follow from the information-neutral sure-thing principle, while the third holds by the principle of indifference of information.
We prove this in several steps.
Lemma 24 (Disjoint Transitivity). If $E, F, G$ are mutually disjoint events, then $E \succsim^* F$ and $F \succsim^* G$ imply $E \succsim^* G$.
1. Suppose $F \subseteq E$ and $F \succsim^* G$; then $E \succsim^* G$.
2. Suppose $G \subseteq F$ and $E \succsim^* F$; then $E \succsim^* G$.
Proof.
2. By Event Solvability and $E \succsim^* F$, we can solve for $F$ on $E$, which gives $E \supseteq E' \sim^* F$. Suppose for contradiction that $G \succ^* E$; that is, $\gamma_G \beta_E \alpha \succ \gamma_E \beta_G \alpha$. This is identical to the following expression: Since $E' \sim^* F$ and they are disjoint, we can switch $\gamma$ and $\beta$ on $E' \cup F$. This gives us $\emptyset = (G - F) \succ^* E - E' \succsim^* \emptyset$, a contradiction.
Proof. If E ∩ F G, then we have the result by Subset Transitivity 1. Otherwise Lemma 27 (Limited Transitivity 2). If E ∩ G = ∅, then E * F and F * G imply E * G.
Proof. Solve G − F into F − G, naming the set F . Also, E * F is equivalent to E − F * F − E. By Disjoint Measurability, we can add F to the LHS and G − F to the RHS, yielding (E − F) ∪ F G. By Subset Transitivity, E G.
Thus, proving Limited Transitivity 4 is equivalent to proving Transitivity.
Lemma 31. $\succsim^*$ has a quantitative probability representation.
Proof. This follows from Villegas (1964) since we have a qualitative probability without atoms.
From this result, the lemma is almost obvious, as we can change parts of events on which outcomes occur as long as the probabilities are maintained. We iteratively apply this result to construct the indifference in (96). For this, choose $H_i = F_1 \cap E_i$ and $G_i$ as an equiprobable event contained in $E_1 - \bigcup_{l=2}^{i-1} H_l$, which exists by Event Solvability. By (98) we obtain (99). Note that our choice of $E_1$ and $F_1$ was arbitrary; we can therefore repeat the above argument for all events. Since $F_i \cap F_j = \emptyset$ for all $i \neq j$, all previously changed events will not be changed by applying (99) to another event. Applying (99) to all events yields the desired indifference (96).
Lemma 33 (Epimorphism to Mixture Space). Let $\Delta X$ be the space of finite-support probability measures over outcomes. Let $\mu$ be the quantitative probability representation of $\succsim^*$. There exists a function $\varphi : A \to \Delta X$ and a unique relation $\succsim^+$ such that for all $\alpha \in X$ and all $a, b \in A$:
$$\mu(a^{-1}(\alpha)) = (\varphi(a))(\alpha)$$

Proof. By Event Swapping, any two acts that induce the same probability measure on the outcomes are indifferent. If $\varphi$ maps every act to this probability measure, then (100) trivially holds.
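The map $\varphi$ is the pushforward of $\mu$ under an act. A minimal sketch on a finite state space follows; the state space, probabilities, and acts are assumptions for illustration, and the finite setting ignores the no-atoms requirement.

```python
from fractions import Fraction

def phi(act, mu):
    """Map an act (state -> outcome) to its induced outcome distribution.

    Satisfies phi(act)[alpha] == mu(act^{-1}(alpha)) for every outcome alpha
    that the act can yield.
    """
    distribution = {}
    for state, outcome in act.items():
        distribution[outcome] = distribution.get(outcome, Fraction(0)) + mu[state]
    return distribution

# Assumed probabilities on three states.
mu = {"s1": Fraction(1, 4), "s2": Fraction(1, 4), "s3": Fraction(1, 2)}

# Two different acts that swap outcomes across equiprobable events.
act_a = {"s1": "x", "s2": "x", "s3": "y"}
act_b = {"s1": "y", "s2": "y", "s3": "x"}

same_image = phi(act_a, mu) == phi(act_b, mu)
```

Here `act_a` and `act_b` differ state by state but induce the same measure on outcomes, which is the situation in which Event Swapping delivers indifference.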
Proof. Completeness and transitivity follow directly from completeness and transitivity of $\succsim$.
We employ the monotone subsequence theorem adapted to simple probability measures.
Proof. Let $\alpha$ be given. We first prove that every sequence $\{\mu_k\}_k$ has a monotonic subsequence. We call the $m$-th term of the sequence, $\mu_m$, a peak if $\mu_m(\alpha) \geq \mu_n(\alpha)$ for all $n \geq m$.
Since every subsequence of a convergent sequence converges to the same limit, we have the desired result.
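The peak construction can be illustrated on a finite list of probabilities $\mu_k(\alpha)$. This is a sketch only: the proof concerns infinite sequences, where either infinitely many peaks give a non-increasing subsequence or, past the last peak, an increasing subsequence can be built. The function name and the sample values are hypothetical.

```python
def peak_indices(values):
    """Indices m with values[m] >= values[n] for all n >= m."""
    peaks = []
    running_max = float("-inf")  # maximum of the entries to the right
    for i in range(len(values) - 1, -1, -1):
        if values[i] >= running_max:
            peaks.append(i)
            running_max = values[i]
    return peaks[::-1]

# Assumed probabilities of a fixed outcome along a sequence of measures.
probabilities = [0.2, 0.9, 0.4, 0.7, 0.5, 0.5]
peaks = peak_indices(probabilities)
peak_values = [probabilities[i] for i in peaks]
```

The values at the peak indices are non-increasing by construction, which is the monotone subsequence used in the first case of the argument.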
Lemma 36 (Mixture Space Continuity). $\succsim^+$ is continuous (its weak upper and lower contour sets are closed).

Proof. Consider sequences $\mu_k \to \mu$ and $\nu_k \to \nu$ of simple probability measures on $X$. By Lemma 35, for any outcome $\alpha \in \mathrm{supp}(\mu)$ the sequence $\{\mu_k\}_k$ has a convergent subsequence $\{\tilde\mu_k\} \to \mu$ such that $\tilde\mu_k(\alpha) \geq \tilde\mu_l(\alpha)$ for all $l \geq k$ or $\tilde\mu_k(\alpha) \leq \tilde\mu_l(\alpha)$ for all $l \geq k$. We can therefore convert $\mu_k$ into a sequence that monotonically changes the probabilities of the outcomes in the support of $\mu$.
Next, we convert the sequence of measures into a sequence of acts. Let $f$ be an arbitrary element of $\varphi^{-1}(\mu)$. We then choose a sequence of $f_k \in \varphi^{-1}(\mu_k)$ such that it converges monotonically to $f$. Proceeding the same way for $\nu_k \to \nu$, we obtain a monotonically convergent sequence $g_k \to g$. We then have that if $\mu_k \succsim^+ \nu_k$ for all $k$, then also $f_k \succsim g_k$. It follows by continuity that $f \succsim g$ and therefore $\mu \succsim^+ \nu$.
Proof. By the information-neutral sure-thing principle, for a state space partition $\{E, F, G\}$ we have that preferences are separable on conditional acts $f_E$, $f_F$, and $f_G$ for disjoint outcome domains. Since these conditional acts straightforwardly map into restrictions of $\mu$ to the subsets of outcomes, $\mu|_{\mathrm{supp}(f_E)}$, $\mu|_{\mathrm{supp}(f_F)}$, and $\mu|_{\mathrm{supp}(f_G)}$, by a standard argument, it follows that there exists an additive representation $U$ with components $u$, $v$, and $w$. Choosing any other partition of outcomes on $f_E$ and $f_F$ yields another representation of the same form that must be a monotone transformation of $U$, since only the probabilities of outcomes matter, not the event on which they arise. It is straightforward to show that, by the uniqueness of additive representations, the two representations must be affine transformations of one another. It follows that $u$ (and $v$ and $w$ in a similar manner) is additively separable in the outcomes.
We obtain a representation: By information-neutral monotonicity, V must be increasing in U(α). The desired representation follows.