Ambiguity aversion, modern Bayesianism and small worlds

The central question of this paper is whether a rational agent under uncertainty can exhibit ambiguity aversion (AA). The answer to this question depends on the way the agent forms her probabilistic beliefs: classical Bayesianism (CB) versus modern Bayesianism (MB). We revisit Schmeidler's coin-based example and show that a rational MB agent operating in the context of a "small world" cannot exhibit AA. Hence we argue that the motivation of AA based on Schmeidler's coin-based and Ellsberg's classic urn-based examples is weak, since both correspond to cases of "small worlds". We also argue that MB not only avoids AA but also proves to be normatively superior to CB, because an MB agent (i) avoids logical inconsistencies concerning the relation between her subjective probability and objective chance, (ii) resolves the problem of "old evidence" and (iii) achieves psychological detachment from actual evidence, hence avoiding the problem of "cognitive dissonance". As far as AA is concerned, we claim that it may be thought of as a (potential) property of large worlds, because in such worlds MB is likely to be infeasible.


Introduction
Since 1961, when Ellsberg introduced the concept of "ambiguity aversion" (AA) in the form of his well-known thought experiments (see Ellsberg, 1961), economists, psychologists and decision theorists have been trying to answer the following two questions: (i) is AA empirically documented by means of formal experimentation, and (ii) does AA violate the standard Bayesian conditions of rationality, or is it consistent with rationality when these conditions are properly modified?
With respect to the first question, Machina & Siniscalchi (2014) offer an extensive survey of the empirical literature (including studies from the insurance and medical literature) that spans a period of 50 years. The results of these experiments lend support to AA, although some results that point towards "ambiguity neutral" or even "ambiguity seeking" behavior do exist (see, for example, Viscusi & Chesson, 1999).
As far as the second question is concerned, the main argument in favor of AA being consistent with rational probabilistic beliefs is the following: AA implies that probabilistic beliefs are not sophisticated, in the sense that they cannot be represented by a unique prior probability measure. Is a probabilistically non-sophisticated agent necessarily irrational? The advocates of the argument that AA is consistent with rational probabilistic beliefs say no. Instead, they argue that the agent might possess so little (if any) information about the events she is interested in that she is justifiably (and thus rationally) unable to form a unique prior distribution. As Gilboa & Schmeidler (1989) put it, "...the subject has too little information to form a prior. Hence (s)he considers a set of priors as possible." (1989, p. 142)1. Schmeidler (1989) is one of the earliest and best-known attempts to provide an axiomatization of AA. Schmeidler argues that at the heart of AA lies the following asymmetry: Consider an event space F which contains the "events of interest" for agent X. Assume that F_X ⊂ F contains those events whose probabilities are known to X, while F′_X (= F \ F_X) contains the remaining events, about which X has no information. AA means that X favors betting on events in F_X over events in F′_X. However, under the maintained assumption that X's subjective probability of any event D ∈ F is measured by X's willingness to bet on D, AA implies that X's degrees of belief defined on F violate the additivity axiom of probability theory; thus X is not probabilistically sophisticated.
What is the main reason for the violation of the additivity property in X's degrees of belief, denoted by P_X? The answer is that X has specific information, I_S, which is relevant for some specific events in F and irrelevant for all the rest. Is I_S allowed to be used in the determination of the prior beliefs, or should these beliefs be based exclusively on the so-called "background" information, I_B? How is I_B defined? A universally accepted sharp definition of I_B is hard to find. I_B is usually meant to include the "corpus of background knowledge" (Easwaran, 2008) that describes the general characteristics of the chance set-up at hand (e.g. there are n coins involved, each one is two-sided, successive tosses are independent, etc.). More importantly, I_B should include the full set of hypotheses H that describe the probabilistic properties of the phenomenon of interest and the full set of evidential sentences E that are relevant for H, together with the entailment relationships between elements of H and elements of E. On the other hand, I_B should not include any actual outcomes of the experiment or information about the objective chances (e.g. that the coin is fair). This information is specific information (or relevant evidence) and should be thought of as part of I_S.
The question of whether to allow both I_B and I_S, or only I_B, to affect the generation of the agent's prior belief function is crucial for deciding whether AA is (or can be made) consistent with rationality. Hence the crucial question is: should the prior belief, P_prior, depend on both I_B and I_S (Option I) or only on I_B (Option II)? There are two approaches in the literature: one claims that the distinction between I_B and I_S is not only useful but compelling, while the other sees no point in distinguishing between the two. In the context of Option I, the prior probability (credence) is shaped by all the available information that the agent possesses at t, including both I_B and I_S. In such a case, the evidential significance of I_S is "built into" P_prior. Notationally, we may emphasize this by replacing P_prior with P_t^{I_B,I_S}.
Following Meacham (2007), we refer to this view as "classical Bayesianism" (CB). On the other hand, Option II, hereafter referred to as "modern Bayesianism" (MB), amounts to the prior being determined without any influence from the specific information I_S. In the case where the agent does not possess any specific information at the time she wants to assign a probability to an event of interest, this condition is satisfied trivially. On the other hand, if she possesses specific information, the condition is satisfied if X employs a mode of counterfactual probabilistic reasoning. This type of reasoning can be described as a two-step procedure: (i) The agent, at some point in time t, wants to assign a probability to an event of interest. She has at her disposal both background information I_B and specific information I_S. (ii) She travels back in time, to the start of her epistemic life, say t_0, and forms her prior belief in light only of the background information. Thus, specific information is not allowed to "contaminate" the formation of her prior belief. Put differently, the prior probability function should be determined, once and for all, at the unique point in time t_0, at which only I_B is available. At t_0, I_S is contingent (hypothetical) rather than actual. Notationally, P_prior may be replaced by P_0^{I_B}.
The central question of this paper is whether an agent with rational beliefs (i.e. an agent who satisfies Savage's (1954) axioms) can exhibit AA. The answer to this question depends on the way the agent forms her probabilistic beliefs. On the one hand, as already discussed, AA can be consistent with rationality for a CB agent. On the other hand, the main point of this paper is that an MB agent, operating in the context of "small worlds" (more on this below), cannot exhibit AA. This raises the question of whether MB is a normatively appealing hypothesis. To this end, we put forward three arguments in favor of MB, analyzed in detail in Section 3. In summary, we argue that an MB agent (as opposed to a CB one) (i) protects herself from falling into logical contradictions concerning the relation between her subjective probability and objective chance, (ii) fares better in cases of "old evidence" and (iii) achieves psychological detachment from actual evidence.
What about MB's descriptive status? Is MB reasonably realistic as a model of actual behavior? Or is it the case that MB puts exceedingly high standards on the agent's probabilistic reasoning abilities? To answer this question, we must first define the context within which the agent operates. To this end, we distinguish between "small" and "large" worlds, a distinction already made by Savage (1954). For the purposes of the present paper, a "world" is defined to be the domain (language) ℒ of the agent's probabilistic beliefs. This domain includes all the hypotheses of interest H_i and all the relevant evidential statements E_j, together with the corresponding (logical) entailment relations between H_i and E_j. A world is small if (a) the number of hypotheses in ℒ is small and (b) ℒ does not evolve over time. These two assumptions imply that the agent is able to conceive in advance all possible hypotheses concerning the matter of interest. In this paper we argue that MB is a reasonable hypothesis in small worlds.
On the contrary, MB is not likely to work in cases in which (a) ℒ is exceedingly rich and (b) ℒ changes over time due to the formation of new concepts (hypotheses) by the agent. With respect to the second case, imagine a situation in which the agent's current domain, ℒ_t, is richer than her domain ℒ_0 at the beginning of her investigations, t_0 (that is, before any specific information I_S is available). This may happen if, between the time periods t_0 and t, a new hypothesis H_new was conceived by the agent. In such a case, H_new is part of ℒ_t but not of ℒ_0. MB requires the agent to "go back in time" in order to find herself in the epistemic state that corresponds to t_0. Unfortunately, at this state the agent is unable to make any probabilistic assignment to H_new, simply because H_new was not included in ℒ_0. In such a case, MB requires the agent to contemplate her probabilistic assignments not in the "old" domain ℒ_0 (which did not include H_new) but in the "new", expanded domain ℒ_t. However, this shift between domains means that the agent is forced to change all the probabilistic assignments made in the context of ℒ_0. This is the point at which MB loses its normative and descriptive edge. As Bammer & Smithson (2012, p. 95) remark, "...changing the language (domain) in which items of evidence and hypotheses are expressed will typically change the confirmation relations between them." Do MB's losses translate to CB's gains? The answer is negative. Both MB and CB are equally vulnerable to the presence of large and evolving worlds because, in such worlds, both MB and CB require frequent changes in the prior probabilities. Such changes invalidate many of the most interesting Bayesian theorems, including the celebrated "washing out of priors"2 result.
It is worth emphasizing that standard Bayesianism, in either its MB or its CB form, does not allow the formation of new hypotheses after the prior probability function has been formed. Instead, it assumes that the agent is able to think in advance of all the possibilities in the hypothesis space; in short, the agent is assumed to be "logically omniscient". Logical omniscience is considered by many critics to be the Achilles heel of Bayesianism: it implies an all-knowing agent who is able to track down all the logical implications in ℒ. Obviously, the extent to which such an assumption is unrealistic depends on the degree of richness of the underlying domain ℒ. For simple domains, it is reasonable to assume that the agent can comprehend the few logical entailments that these domains contain, which renders standard Bayesianism realistic in small worlds.
The main points of the paper are the following: (i) an MB agent operating in small worlds does not exhibit AA; (ii) AA may be thought of as a (potential) property of large worlds, because in such worlds MB is likely to be infeasible; and (iii) the motivation of AA in the form of examples such as Ellsberg's classic urn-based or Schmeidler's coin-based ones is weak, since these examples are textbook cases of small worlds (or elementary domains), in which the normatively preferable MB strategy is attainable.
The paper is organized as follows: Section 2 describes Schmeidler's two-coin thought experiment, which is supposed to produce AA; it also defines formally two of the central concepts of this paper, namely MB and CB. Section 3 is an analytic discussion of the arguments for preferring MB over CB. Section 4 presents the main objection to MB, namely logical omniscience; it also elaborates on the connection between MB and small worlds and on the efficiency of counterfactual probabilistic reasoning in such worlds. Section 5 demonstrates formally the absence of AA under MB reasoning within Schmeidler's two-coin example. Section 6 concludes the paper.

Schmeidler's two-coin example and modern versus classical Bayesianism
Schmeidler (1989) uses the following coin example, which aims at conveying the same message as Ellsberg's two-urn paradox (1961). Assume that the agent X considers two coins, A and B. X knows with certainty that the probability of "heads" for A (H_A) is 0.5. On the other hand, she has no such information about B. The two coins are about to be tossed and X has the option to bet either on A-related events or on B-related events. Specifically, she may choose bet A, in which she earns $1 if the event D_{H_A} ("coin A lands heads") occurs and loses $1 if the event D_{T_A} ("coin A lands tails") materializes. Alternatively, she is offered the option to enter bet B, which gives her $1 if the event D_{H_B} materializes and produces a loss of $1 if the event D_{T_B} occurs.

AA means that X prefers bet A to bet B, and likewise prefers betting on D_{T_A} to betting on D_{T_B}. Under the maintained assumption that subjective probabilities are measured by willingness to bet, these preferences translate into

P(D_{H_B}) < P(D_{H_A}) = 0.5 and P(D_{T_B}) < P(D_{T_A}) = 0.5.

It is easy to show that such a P is non-additive. Indeed, D_{H_B} and D_{T_B} are disjoint and D_{H_B} ∪ D_{T_B} = Ω, where Ω is the relevant sample space, namely Ω = {H_A H_B, H_A T_B, T_A H_B, T_A T_B}. Assuming that P is additive, P(D_{H_B}) + P(D_{T_B}) = P(Ω) = 1, whereas the inequalities above imply P(D_{H_B}) + P(D_{T_B}) < 1, a contradiction. What causes the violation of the additivity property in P? It is not merely the presence of asymmetric information, but rather the fact that the agent allowed I_S to affect her probabilistic beliefs directly, instead of utilizing (as she should) I_S indirectly, by conditionalization. But in order to be able to conditionalize on I_S, a pre-existing vehicle is required, namely a probability function formed prior to the acquisition of I_S. Put differently, the agent should form her probabilistic beliefs by employing the MB counterfactual strategy outlined in the introduction.
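The non-additivity argument can be checked numerically. The following is a minimal sketch of ours, not from the paper; the value 0.4 assigned to the B-related events is a hypothetical assumption, since AA only requires it to lie strictly below 0.5.

```python
# Willingness-to-bet "probabilities" of an ambiguity-averse agent in
# Schmeidler's two-coin example. The 0.4 values are hypothetical; ambiguity
# aversion only requires them to be strictly below 0.5.
P = {
    "D_HA": 0.5,  # coin A lands heads (chance known to be 0.5)
    "D_TA": 0.5,  # coin A lands tails
    "D_HB": 0.4,  # coin B lands heads (no information, agent bets reluctantly)
    "D_TB": 0.4,  # coin B lands tails
}

# D_HB and D_TB are disjoint and exhaust the sample space, so an additive
# measure would force their probabilities to sum to P(Omega) = 1.
total_B = P["D_HB"] + P["D_TB"]
print(total_B)  # 0.8: strictly less than 1, so additivity is violated
```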
Let us now define the concepts of MB and CB more formally. To this end, assume an agent who started her investigations at time t, having both I_B and I_S at her disposal. The agent is interested in updating her beliefs about the proposition/event A in the light of new evidence E obtained at t + 1. A CB agent carries out this task as follows:

CB: P_{t+1}(A) = P_t^{I_B,I_S}(A | E).    (1)

On the other hand, the MB strategy advises the agent to pursue the following two-step procedure: First, "travel back" through your epistemic history and identify the point, t_0, at which I_S was not yet available (although it was conceivable). Second, once this point is found, evaluate the prior that you would have formed at this point (the counterfactual or hypothetical prior) and use it from this point onwards as your actual prior:

MB: P_{t+1}(A) = P_0^{I_B}(A | I_S, E).    (2)

The following quotation describes the role that the hypothetical prior is supposed to play: "A rational subject's credences are fixed by her hypothetical priors and her total evidence. A subject's credences are represented by a dynamic probability function, a function that changes with her evidence. A subject's hypothetical priors are represented by a static probability function, a function that encodes her disposition to respond to evidence. (Hypothetical priors are called 'priors' because they can be thought of as a rational subject's original credences in possibilities, prior to the receipt of any evidence, and 'hypothetical' because it is unlikely that one ever was in such a state.)" (2008, emphasis added). It is worth emphasizing that, despite the term "modern", MB is not a form of Bayesianism that was put forward only recently. Indeed, several authors from the 1970s entertained the idea of "hypothetical priors", that is, "priors without evidence"3.
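In a small world the two routes can be compared directly. The following toy computation is ours, not the paper's (the three bias hypotheses and the uniform hypothetical prior are illustrative assumptions): when the "current" function used in route (1) is itself obtained from the hypothetical prior by conditionalizing on I_S, routes (1) and (2) deliver the same updated beliefs.

```python
# Toy model: coin B's unknown bias is one of three hypotheses, with a
# uniform hypothetical prior P0 formed on background information alone.
hypotheses = {0.3: 1 / 3, 0.5: 1 / 3, 0.7: 1 / 3}

def conditionalize(prior, data):
    """Bayes update of a distribution over biases on coin outcomes ('H'/'T')."""
    post = {}
    for bias, p in prior.items():
        like = 1.0
        for d in data:
            like *= bias if d == "H" else 1 - bias
        post[bias] = p * like
    z = sum(post.values())
    return {b: v / z for b, v in post.items()}

def prob_heads(dist):
    """Predictive probability of heads under a distribution over biases."""
    return sum(bias * p for bias, p in dist.items())

I_S, E = ["H"], ["H"]  # specific information and new evidence (both "heads")

# MB route (2): one-shot conditionalization of the hypothetical prior on I_S and E.
mb = conditionalize(hypotheses, I_S + E)

# Stepwise route: first fold I_S into the current function, then update on E.
# When the current function is itself P0(. | I_S), the two routes agree.
cb = conditionalize(conditionalize(hypotheses, I_S), E)

assert abs(prob_heads(mb) - prob_heads(cb)) < 1e-12
```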
What should the basic probability concept be in the development of rational decision theory: the prior or the current probability? To answer this question, we must think of what personal probabilities should reflect if they are to be rational: the person's permanent dispositions for forming beliefs (on the basis of the available evidence), or merely her momentary inclinations at some given point in time?
Rudolf Carnap argues strongly in favor of the first option. Indeed, his monumental work on inductive logic (Carnap, 1950) is based on the concept of a hypothetical or counterfactual initial credence function that can be ascribed to the agent X before the collection of any evidence. More specifically, Carnap (1962) considers a sequence of data E_1, E_2, ..., E_n obtained by the agent X up to the present time T_n (in Carnap's notation). He also defines K_n to be the "total observational knowledge" of X at T_n, that is, the conjunction E_1 ∧ E_2 ∧ ... ∧ E_n. Then Carnap contemplates X's epistemic state at time T_0, in which X possesses no observational knowledge at all: "Now consider the sequence of X's credence functions. In the case of a human being we would hesitate to ascribe to him a credence function at a very early time point, before his abilities of reason and deliberate action are sufficiently developed. But again we disregard this difficulty by thinking either of an idealized human baby or of a robot. We ascribe to him a credence function Cr_1 for the time point T_1; Cr_1 represents X's personal probabilities based upon the datum E_1 as his only experience. Going even one step further, let us ascribe to him an initial credence function Cr_0 for the time point T_0, before he obtains his first datum E_1. Any later function Cr_n for a time point T_n is uniquely determined by Cr_0 and K_n: For any H, Cr_n(H) = Cr′_0(H | K_n), where Cr′_0 is the conditional function based on Cr_0" (1962, p. 310, emphasis added). Is Cr_0 actual or counterfactual? Carnap explicitly allows the initial credence function to be hypothetical. He first raises the question: "How can we understand the function Cr_0?" The answer to this question depends on whether the agent X is a robot or a human being. In the first case, Cr_0 is the robot's actual credence function at T_0. With respect to the second (more relevant) case, Carnap's answer is as follows: "In the case of a human being X, suppose that we find at the time T_n his credence function Cr_n. Then we can, under suitable conditions, reconstruct a sequence E_1, E_2, ..., E_n, the proposition K_n, and a function Cr_0 such that (a) E_1, E_2, ..., E_n are possible observation data, (b) K_n is defined by (3), (c) Cr_0 satisfies all requirements of rationality for initial credence functions, (d) the application of (4) to the assumed function Cr_0 and K_n would lead to the ascertained function Cr_n. We do not assert that X actually experienced the data E_1, E_2, ..., E_n ..."

3 Teller (1975, same book) argues that in order to be able to conditionalize on I_S (the relevant evidence), an initial probability function, which does not depend on I_S, is required for the process of updating beliefs to get started: "However, for a Bayesian theory fully to characterize the degrees of belief which arise by conditionalization, the theory must specify the belief function from which to start. This initial function is called the prior probability function" (1975, pp. 168-169). Earman (1992) also emphasizes the need for a starting point in order for the conditionalization process to be operational: "...an agent begins as a tabula rasa, chooses her priors, and forever after changes her probabilities only by conditionalization." (1992, pp. 139-140, original emphasis).
Carnap's argument rests on the distinction between the two competing concepts that drive a person's (say X's) degrees of belief, mentioned above, namely: (i) X's momentary inclination at time T and (ii) X's permanent disposition to believe. Carnap argues that what we are interested in, in developing rational decision theory, is the second concept, that is, a "trait" of X's "underlying permanent intellectual character". This disposition is best reflected in X's initial credence function, that is, X's degrees of belief prior to the acquisition of any evidence. Any change in X's beliefs, due to the emergence of new evidence E, should be based on Cr_0(• | E). Such conditionalization ensures that X's current credences will also be driven by X's permanent disposition for forming beliefs on the basis of the received evidence. To this end, Carnap argues as follows: "When we wish to judge the morality of a person, we do not simply look at some of his acts, we study rather his character, the system of his moral values, which is part of his utility function. Single acts without knowledge of motives give little basis for a judgement. Similarly, if we wish to judge the rationality of a person's beliefs, we should not simply look at his present beliefs. Beliefs without knowledge of the evidence out of which they arose tell us little. We must rather study the way in which the person forms his beliefs on the basis of evidence. In other words, we should study his credibility function, not simply his present credence function." (1962, p. 312, emphasis added).
The distinction between actual and hypothetical priors in the context of Schmeidler's two-coin example may be described as follows: Assume that at time t, at which the desire for a specific investigation or bet emerged (for example, the time at which the agent decided to bet on D_{H_A}), specific information I_S already existed (for example, coin A has already been flipped five times, with the results being H_A, H_A, T_A, H_A, T_A). What is the agent advised to do in this case? Should she form her priors by taking I_S into account? Or should she "travel back in time" and ask herself "what would my priors have been at time t_0, when I_S was not available?" MB responds negatively to the first question and positively to the second.
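The contrast can be illustrated with a toy sketch of ours (not the paper's formalism): suppose the five flips above are the only specific information. A CB agent may let them shape her "prior" directly, for instance as an observed frequency, whereas an MB agent keeps the t_0 prior fixed and lets the flips enter only later, by conditionalization.

```python
# I_S: five observed flips of coin A, as in the example above.
flips = ["H", "H", "T", "H", "T"]

# CB-style prior formation: I_S shapes the prior directly.
# (Using the raw empirical frequency is one crude way to "build in" I_S.)
cb_prior_heads = flips.count("H") / len(flips)
print(cb_prior_heads)  # 0.6

# MB-style prior formation: travel back to t_0, where only I_B ("one
# two-sided coin, independent tosses") is available; symmetry suggests 0.5.
mb_prior_heads = 0.5
# I_S is then used only through conditionalization on the hypothetical prior.
```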
Based on the above MB recommendation, the following pressing question presents itself: Why should the agent take the burdensome and imagination-stretching counterfactual route (2) in forming her current beliefs, while the actual (and intuitively appealing) course (1) is readily available? Put differently, why should we require the agent to ponder her subjective probabilities under an epistemic state which is not her actual state of knowledge but rather a counterfactual one? There are at least three reasons that compel an agent to follow the counterfactual path; these reasons are analyzed in detail in the next section.
Arguments for MB

MB and the chance-credence relationship: avoiding inconsistencies

The Principal Principle (PP), originally suggested by David Lewis (1980), aims at capturing the following intuitive idea: if the agent X comes to know (with certainty) that the objective probability of the event (proposition) A, Ch(A), is p, and does not possess any inadmissible evidence for A4, then X's subjective probability of A must be set equal to p. The following argument discusses a special case under which MB is consistent with PP, whereas CB is not.
How can PP be described formally? One option is to use the current credence function P_t^{I_B,I_S}, which gives:

P_t^{I_B,I_S}(A | <Ch(A) = p>, E) = p,    (5)

where E is admissible evidence obtained at t + 1.
The first version of PP, given by (5), may be interpreted in the light of CB as follows: The agent X, being at time t, has at her disposal I_B and I_S. Being a CB agent, she uses both these items of information to generate her "unconditional" probability P_t^{I_B,I_S}(A) at t. If she acquires a further item of information E, then her new updated probability of A would be P_{t+1}(A) = P_t^{I_B,I_S}(A | E). It must be noted that Lewis never gave a precise definition of "admissibility". The standard interpretation of this concept is "that evidence is admissible if it is not relevant to the outcome of the chance event in question. As a rule of thumb, he (Lewis) takes information about the past to be admissible, and information about the future to not be admissible." (Meacham, 2007, p. 18).
However, if in addition to E the agent learned that the chance of A is p, then the latter piece of information would dominate the formation of her beliefs, in the sense that <Ch(A) = p> screens off any admissible evidence E for A, thus yielding (5). Hence, her new (updated) probability of A is given by

P_{t+1}(A) = P_t^{I_B,I_S}(A | <Ch(A) = p>, E) = p.    (6)

A second option for the agent is to operate as an MB agent, thus treating I_S not as actual but rather as counterfactual. As a result, she employs the initial prior probability function P_0^{I_B} instead of the current one, thus yielding the following version of PP:

P_0^{I_B}(A | <Ch(A) = p>, E, I_S) = p.    (7)

What is the restriction that (7) imposes on X's current credences? Put differently, assume X is currently at time t + 1, at which she has come to know I_S, <Ch(A) = p> and E. MB's condition (2) implies the following relation:

P_{t+1}(A) = P_0^{I_B}(A | <Ch(A) = p>, E, I_S) = p.    (8)
Comparing (6) with (8), we might form the impression that it makes no difference to the agent's updated probability of A whether she acts as a CB or an MB agent. However, Meacham (2007) argues that this is not the case: (5), as opposed to (7), leads to contradictions. To understand why, we must first emphasize the difference in the role that I_S plays in (5) versus (7). On the one hand, under the MB formulation (7) of PP, I_S is conditioning information and therefore we must examine whether it is admissible or not. Now, let us consider the special case in which A is included in I_S at time t. In such a case, I_S becomes automatically inadmissible, and therefore PP becomes non-applicable and the agent's probability of A should simply be set equal to one.
On the other hand, under the CB formulation (5), I_S is not conditioning information; it is rather a direct determinant of the agent's current probability function P_t^{I_B,I_S}. As such, admissibility is not relevant. Therefore, the agent cannot condition on A, since A has already helped determine her probability function. Hence, (5) dictates that the agent's subjective probability of A should be equal to p < 1. But, on the other hand, this probability should be equal to one, since A is known to the agent. Hence, we have run into the following contradiction: P_t^{I_B,I_S}(A) = p < 1 and P_t^{I_B,I_S}(A) = 1. Obviously, the extent to which the prior probability is preferable to the current probability, for both formal and conceptual reasons, determines the extent to which MB is preferable to CB.

MB and the problem of old evidence
The "problem of old evidence" concerns the role of evidence E that was already known (hence, old) at the time of the formation of a scientific theory T, in the confirmation (or not) of T. The problem was first presented by Glymour (1980), who described it as follows: "Scientists commonly argue for their theories from evidence known long before the theories were introduced. Copernicus argued for his theory using observations made over the course of millennia. ... Newton argued for universal gravitation using Kepler's second and third laws, established before the Principia was published. The argument that Einstein gave in 1915 for his gravitational field equations was that they explained the anomalous advance of the perihelion of Mercury, established more than half a century earlier. ... Old evidence can in fact confirm new theory, but according to Bayesian kinematics it cannot. For let us suppose that evidence e (E in our notation) is known before theory T is introduced at time t. Because e is known at t, P_t(e) = 1. Further, because P_t(e) = 1, the likelihood of e given T, P_t(e|T), is also 1. We then have:

P_t(T|e) = P_t(T) P_t(e|T) / P_t(e) = P_t(T).

The conditional probability of T on e is therefore the same as the prior probability of T: e cannot constitute evidence for T. ... None of the Bayesian mechanisms apply, and if we are strictly limited to them, we have the absurdity that old evidence cannot confirm a new theory." (1980, pp. 85-86). The previous quotation highlights a conflict between the Bayesian theory of confirmation and standard scientific practice when it comes to the use of E: under standard scientific practice, E can in fact confirm a new theory, but according to Bayesian theory it cannot.
Bayesian philosophers have responded to the problem of old evidence in (mainly) two different (and incompatible) ways. The first response, put forward by Garber (1983), Niiniluoto (1983) and Jeffrey (1983), goes as follows: what makes the scientist increase her confidence in T is not E per se, since this information was already known to her at the time she invented her theory. Rather, it is the realization of the logical entailment relationship

T ⊢ E    (9)

that makes the scientist increase her confidence in T. For example, "Einstein discovered only after writing down the equations of General Relativity that they entailed the anomalous perihelion advance of Mercury." (Howson, 1991, p. 553). Hence, the increase in the agent's degree of belief in T comes from conditionalization on the entailment relationship (9) itself, rather than conditionalization on just E. The proposition (9) was uncertain at t_0, in the sense that the agent had not yet established/proved the validity of the postulated entailment relationship, i.e. P_0(T ⊢ E) < 1, in which case the realization of (9) at t results in an increase in the probability of T.
Note that this line of defense is based on the assumption that the agent conditionalizes on the entailment (9), which in turn implies that at t_0 the agent had already assigned a prior probability to this proposition, i.e. P_0(T ⊢ E) = p, with p < 1. This means that at t_0 the agent was not aware of the truth of (9) (a truth she discovered only later), which in turn implies that at t_0 the agent had assigned a probability less than one to a logical truth. This, however, means that P_0 is not coherent. Hence, this line of defense against the problem of old evidence leads to another (potentially more serious) problem, namely the incoherence of the agent's probability function.
The second argument for solving the problem of old evidence is more in the spirit of MB, since it is based on the following counterfactual "what-if" strategy. Garber (1983) portrays this strategy as follows: "One obvious response might begin with the observation that if one had not known the evidence in question, then its discovery would have increased one's degrees of belief in the hypothesis in question (H). That is, in the circumstances in which E really does confirm H, if it had been the case that P(E) < 1, then it would also have been the case that P(H|E) > P(H)." (1983, p. 103, emphasis added). The solution outlined by Garber suggests that the agent should not evaluate her probabilities relative to I = I_B ∪ I_S but only relative to I_B. This is exactly the proposal offered by Howson (1991) in his attempt to solve the problem of old evidence. Indeed, he explicitly states that the probabilities "should always be relativized to K − {E}" (1991, p. 548, with K and E in Howson's notation corresponding to I_B ∪ I_S and I_S, respectively)5. But, as analyzed in the introduction, this is exactly the point at which MB and CB differ. Hence, choosing MB over CB may be motivated by the fact that the former strategy fares better than the latter in cases of "old evidence".
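The contrast between the two routes can be made concrete. The numbers below are illustrative assumptions of ours (P_0(T) = 0.2, P_0(e) = 0.5, and T entails e): Bayes' theorem on the counterfactual prior recovers confirmation, while conditioning the current function on already-known evidence is inert.

```python
# Sketch of the old-evidence problem: hypothesis T, evidence e.
# MB route: evaluate confirmation with the hypothetical prior P0, where P0(e) < 1.
P0_T = 0.2            # hypothetical prior in T, background information only
P0_e_given_T = 1.0    # T entails e
P0_e = 0.5            # before observation, e was far from certain

# Bayes' theorem on the hypothetical prior: e does confirm T.
P0_T_given_e = P0_e_given_T * P0_T / P0_e
assert P0_T_given_e > P0_T          # 0.4 > 0.2: confirmation recovered

# CB route: at time t the agent already knows e, so P_t(e) = 1 and
# P_t(e | T) = 1, making conditionalization on e change nothing.
Pt_T = P0_T_given_e                 # whatever the current credence happens to be
Pt_T_given_e = 1.0 * Pt_T / 1.0
assert Pt_T_given_e == Pt_T         # old evidence cannot confirm under CB
```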

MB and cognitive dissonance
Let us now turn our attention to the third reason for preferring the counterfactual MB strategy over the actual CB one. This reason concerns the possibility that the agent evaluates the entailment relationships between theoretical and evidential statements in a more unbiased way when the evidence I_S is hypothetical than when it is actual. The psychological finding that hypothetical and actual evidence are treated differently relates to the theory of cognitive dissonance, used by Albert Hirschman (1965) to describe attitude changes toward modernization in the course of development. More generally, cognitive dissonance stems from the fact that persons who have made decisions tend to discard information that would suggest such decisions are wrong. As Akerlof & Dickens (2012) put it, "Using Bayesian decision rules, agents' estimates of the state of the world is only influenced by the information available to them and their preferences over states of the world, but these estimates are independent of their preferences for beliefs per se. The cognitive dissonance model not only predicts systematic differences in interpretation of given information but also systematic differences in receptivity to new information according to preferences." Under our setup, the probability that the agent assigns to the hypothesis H_A: "I shall live for at least another five years", conditional on the actual event B: "I am diagnosed with aggressive lung cancer", is likely to be bigger than the probability that she would have assigned to H_A if the event B were not actual but just a member of a set of many hypothetical events, including the event non-B. In this example, the actual knowledge of the unfortunate event B at time t (now) is likely to create psychological pressure to distort the partial entailment relationship between H_A and B that the agent was willing to accept at t_0, thus yielding

P_{t,B}(H_A) > P_0(H_A | B).

5 Chihara (1987) found this solution unsatisfactory: "First note that K − {E} will probably contain many propositions that imply E. ... Thus, to avoid absurdity, it will not be enough to simply delete E from K: One must also delete all propositions that imply E." In response, Suzuki (2005) tackled this criticism by devising a probabilistic version of AGM (Alchourron et al., 1985) theory. He proceeds in two steps. The first step amounts to deleting the evidential statement E from K by adopting a system of classical sentential logic and deleting E, together with some sentences from K, so that one cannot deduce E in this system. The second step is to adopt what he refers to as "ternary-Bayesian confirmation theory" to construct a new probability measure P*_{K−{E}} (from P) on K − {E}, under which the problem of old evidence does not arise.
This argument suggests that the subjective evaluation of probabilities should be made, as Longino (1979, pp.35) puts it, "in rational, cool, moments".As already mentioned, such a "cool" and "impartial" agent is an MB agent.In other words, an MB agent will never succumb to cognitive dissonance.

Objections to MB: logical omniscience
The arguments presented above suggest that MB is the optimal strategy that must be followed by an agent who is in the process of forming her subjective probability function at t (now). For reasons of clarity, this strategy is summarized below. The agent is currently at time t, knowing both I_B and I_S. The agent is advised to "pretend" that she does not know I_S and perform the following thought experiment: go back in time to period t_0 in which only I_B was available. At t_0, the background information I_B contains the full set of hypotheses H that the agent actually entertained at t_0, the full set of evidential sentences E that the agent conceived as possible at t_0 (every conceivable possible course of events after t_0), as well as all the entailment relationships between elements of H and elements of E (e.g. H_i ⊢ E_j). More specifically, each member of E describes the observations that have not been actually made at t_0 but are considered by the agent at t_0 as possible to make at the future period t. At t_0 the agent's subjective probability function P_0 was defined over the field of propositions ℒ (which includes H and E). Assume that at t, the particular element E ∈ E is realized, which implies that the agent at t has this specific information, i.e. I_S ≡ E. According to the MB counterfactual strategy, the agent's new probability function, P_t, at t should be her old probability function P_0 conditional on E. As a result, her unconditional new probability for the uncertain event A (that will occur at t+1) should be set equal to the conditional probability P_0(A | E) rather than being determined "from scratch" as P_{t,E}(A).
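The strategy just summarized can be sketched numerically. The numbers below are purely hypothetical placeholders of our own; the point is only the mechanics: P_t is obtained by conditionalizing the t_0 prior on the realized evidence E, not by assessing a fresh P_{t,E}(A).

```python
# A minimal sketch of the MB updating rule, with made-up numbers.
# P0 is the agent's prior at t0 over joint outcomes (evidence, event A),
# covering every course of events she conceives as possible at t0.
P0 = {
    ("E", "A"): 0.30, ("E", "not A"): 0.20,          # worlds where E is realized
    ("not E", "A"): 0.10, ("not E", "not A"): 0.40,  # worlds where it is not
}

def mb_update(prior, evidence):
    """P_t(A) = P0(A | evidence): Bayesian conditionalization of the t0 prior."""
    p_e = sum(p for (e, a), p in prior.items() if e == evidence)
    p_ae = sum(p for (e, a), p in prior.items() if e == evidence and a == "A")
    return p_ae / p_e

# At t the evidence E is realized, so the MB agent sets
p_t_A = mb_update(P0, "E")   # = 0.30 / 0.50 = 0.6
```

The CB alternative would instead set some P_{t,E}(A) directly, letting E act on a par with background information; nothing in the t_0 prior then constrains that assignment.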
Is the above counterfactual strategy without any problems? The answer is negative. The main objections raised against this strategy revolve around the concept of logical omniscience (or its lack thereof). Specifically, observe that in the description of the counterfactual strategy presented above, we defined H and E as the sets of theoretical and evidential statements, respectively, conceived by the agent at t_0. However, Bayesianism assumes something significantly more than that. It assumes that the agent knows all the possible hypotheses H*, all the possible evidential statements E* and all the possible entailment relationships between H* and E*; in other words, the agent is assumed to be logically omniscient and her prior probability function, P*_0, is defined over the field ℒ* of all propositions, which is (infinitely) richer than ℒ. Why does Bayesianism require such an extreme condition? The answer is to ensure coherence of P*_0. Specifically, if we wish (as Bayesianists do) to ensure that P*_0 is a proper probability measure, then we must establish that to each logical consequence p ⊢ q the agent assigns probability equal to one, that is:

P*_0(q | p) = 1.

This is because if p logically entails q, it does so in every possible state of the world. Hence, it appears that coherence requires that the agent is capable of tracking all logical consequences at t_0. In other words, the agent is not allowed to be unaware of even a single logical truth that exists within the underlying domain ℒ*. As will be shown below, only under this extreme form of logical omniscience is the aforementioned MB counterfactual strategy expected to work under all epistemic states that the agent may entertain at t.
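The coherence requirement that each logical consequence p ⊢ q receives conditional probability one can be illustrated with a toy possible-worlds model (our own construction, not from the paper): when probabilities are defined over complete states of the world, every entailment automatically receives conditional probability one, precisely because every p-world is a q-world.

```python
from itertools import product

# Toy possible-worlds model over two atomic sentences.
atoms = 2
worlds = list(product([True, False], repeat=atoms))   # 4 complete states

# Any probability measure over worlds will do; take the uniform one.
P = {w: 1 / len(worlds) for w in worlds}

def prob(prop):
    """prop is a predicate on worlds; its probability is the mass of its worlds."""
    return sum(P[w] for w in worlds if prop(w))

def cond(q, p):
    """Conditional probability P(q | p)."""
    return prob(lambda w: p(w) and q(w)) / prob(p)

conj = lambda w: w[0] and w[1]    # the proposition "p1 and p2"
p1 = lambda w: w[0]               # the proposition "p1"

# "p1 and p2" entails "p1" in every world, so P(p1 | p1 and p2) = 1
# under ANY measure over worlds: the coherence condition in miniature.
entailment_prob = cond(p1, conj)   # = 1.0
```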

Local vs. global Bayesianism
Logical omniscience of the form described above is a condition that is usually met within the variant of Bayesianism known as global Bayesianism (GB). Garber (1983) defined GB as follows: "One popular conception of the Bayesian enterprise is what I shall call global Bayesianism. On this conception, what the Bayesian is trying to do is build a global learning machine, a scientific robot that will digest all of the information we feed it and churn out appropriate degrees of belief. On this model, the choice of a language (domain) over which to define one's probability function is as important as the constraints that one imposes on that function and its evolution. On this model, the appropriate language to building into the scientific robot is the ideal language of science, a maximally fine grained language ℒ, capable of expressing all possible hypotheses, all possible evidence, capable of doing logic, mathematics, etc. In short, ℒ must be capable, in principle, of saying anything we might ever find a need to say in science." (1983, p. 110, emphasis added).
In what respect does the MB strategy suffer in the case of an agent whose prior probability function is P_0 rather than P*_0? This question is equivalent to the following one: "Why is GB not realistic?". The problem is the following: assume that at t the agent comes across a piece of evidence E* that she had not conceived of as possible at t_0. Assume that this piece of evidence is the agent's specific information (I_S) obtained at t. According to MB, the agent should go back to t_0 and conditionalize on E* in order to generate her new probability function at t. However, an unpleasant surprise awaits the agent: at t_0 there is no probabilistic assignment on E*, simply because E* was not in the domain of P_0 at t_0. To make things even worse, assume that E* was obtained at t because a new hypothesis, H*, was put forward at t which motivated the observations (or experiments) that led to E*. Obviously, H* was not in the domain of P_0 either. For example, consider as E* and H* the "deflection of light by the sun" and "Einstein's general relativity theory", respectively. If a physicist of the late nineteenth century had been asked to assign his personal probability to the event that light is deflected by the sun, he would have been caught by surprise. His domain ℒ at t_0 was simply not rich enough to accommodate either E* or H* (let alone the entailment H* ⊢ E*). In such a case, counterfactual conditionalization seems impossible and the whole MB strategy runs into trouble.
The preceding analysis suggests that the MB counterfactual strategy is unrealistic to the same extent as GB (as a description of the epistemic state of real people) is. If the MB strategy is unrealistic within the GB framework, then is it possible that there is another framework, local Bayesianism (LB), within which the MB strategy is likely to work? Before we answer this question, let us first describe, following Garber (1983), how LB might be defined: "Typically when scientists or decision makers apply Bayesian methods to the clarification of inferential problems, they do so in a much more restricted scope than global Bayesianism suggests, dealing only with the sentences and degrees of belief that they are actually concerned with, those that pertain to the problem at hand. This suggests a different way of thinking about the Bayesian learning model, what one might call local Bayesianism. On this model, the Bayesian does not see himself as trying to build a global learning machine, or a scientific robot. Rather, the goal is to build a hand-held calculator, as it were, a tool to help the scientist or decision maker with particular inferential problems. On this view, the Bayesian framework provides a general formal structure in which one can set up a wide variety of different inferential problems. In order to apply it in some particular situation, we enter in only what we need to deal with in the context of the problem at hand, i.e., the particular sentences with which we are concerned, and the beliefs (prior probabilities) we have with respect to those sentences." (1983, p. 111, emphasis added).
Formally, assume that the investigator is interested in the set of specific hypotheses (theoretical propositions) H_i, i = 1, 2, ..., n that cover the full set of possibilities for the problem at hand (e.g. the coin is fair, the coin is biased towards H, the coin is biased towards T, etc). Related to the aforementioned theoretical propositions is the set of evidential propositions E_j, j = 1, 2, ..., m which are relevant for the H_i (e.g. the outcome of the next toss is H, the outcomes of the next two tosses are H, T, etc). No other proposition enters the agent's problem-relative domain ℒ that is relevant for the specific problem at hand. More specifically, Garber (1983, p. 111) defines ℒ to be "just the truth-functional closure" of the H_i, the E_j and propositions of logical entailment between H_i and E_j, that is, propositions of the form "H_i ⊢ E_j". As a result, the agent's prior probability function is defined over the "local", more modest domain ℒ rather than the "global" ideal domain ℒ*. This in turn implies that it is much easier for the agent to track all logical consequences "H_i ⊢ E_j" (for some i and j) within ℒ than within ℒ*, thus making local Bayesianism more realistic than global Bayesianism (Bayesianism with a human face, as Jeffrey (1983) puts it).
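A quick count (ours, not Garber's) shows why restricting attention to the problem-relative domain ℒ matters: with n atomic sentences there are 2^n truth assignments and hence 2^(2^n) logically distinct propositions in the truth-functional closure, so the global ideal language ℒ* is astronomically larger than any local ℒ.

```python
def closure_size(n_atoms: int) -> int:
    """Number of logically distinct propositions expressible over n atomic
    sentences: one proposition per set of the 2**n truth assignments."""
    return 2 ** (2 ** n_atoms)

# A local domain with 5 atomic sentences (say, three hypotheses Hi and two
# evidential sentences Ej) is large but finite and surveyable in principle:
local = closure_size(5)        # 2**32
# Fifteen atomic sentences already put the closure beyond any agent:
global_ish = closure_size(15)  # 2**32768, a number with roughly 9,865 digits
```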

Small vs. large worlds
The description presented above makes clear that LB refers to well-defined, small-scale cases (characterized by a small number of hypotheses, evidential propositions and logical relationships) which are usually referred to as "small worlds". In such a world, it is quite plausible to assume that the agent can consider all the possibilities (hypotheses, evidential propositions and their entailment relations) that are relevant for the problem at hand at t_0. In such a case, no surprises await the agent at t, in the sense that her epistemic state at t is identical to that at t_0. Binmore (2009) defines a small world as one "within which all potential surprises have been predicted and evaluated in advance of their occurrence." (2009, p. 8).
What are the implications of small worlds for the effectiveness of the MB counterfactual strategy? Binmore (2009) comments on this question as follows: "Only in a small world, in which you can always look before you leap, it is possible to consider everything that might be relevant to the decisions you take." (2009, p. 139, emphasis added). Indeed, the "look before you leap" proverb is attributed to Savage, who used it as antithetical to "cross that bridge when you come to it", which referred to the so-called "large worlds". It is worth mentioning that Savage himself made quite clear that his own conception of subjective probability, together with its axiomatization, is relevant only for small worlds. This is because Savage's framework is essentially static in the sense that it does not allow for so-called "concept formation", that is, the formation of a new hypothesis or a new idea sometime in the future. Commenting on Savage's approach, Suppes (1966) observes: "The important thing I wish to emphasize is that the theory provides no place for the decision-maker to acquire a new concept on the basis of new information received. The theory is static in the sense that it is assumed that the decision-maker has a fixed conceptual apparatus available to him throughout time." (1966, p. 21). Savage himself acknowledged the fact that the static nature of his theory makes it inapplicable in the case of large evolving worlds by referring to such an extension as "ridiculous" and "preposterous".

Small worlds, counterfactual strategy and ambiguity aversion
The aforementioned discussion may be summarized as follows: MB and CB refer to the agent's strategy or course of action in forming her probability function at t, while GB and LB refer to the agent's epistemic status at t_0 as reflected in the contents of her background information I_B or her domain (ℒ or ℒ*). Whether MB or CB is appropriate depends on which epistemic status, GB or LB, characterizes the agent. Obviously there are four combinations for the agent to consider in her attempts to solve the problem of forming her subjective probability at t: (i) MB-GB: ideal but unrealistic (infeasible) solution.
(ii) MB-LB: the appropriate solution in small worlds.
(iii) CB-GB: a feasible solution in large worlds but with the kind of problems analyzed in section 3.
(iv) CB-LB: a feasible solution in small worlds, but normatively inferior to MB-LB for the reasons analyzed in section 3.
How does the above classification bear upon AA? The preceding discussion has made clear that the fact that the agent allowed I_S to affect her probabilistic beliefs directly is only a necessary condition for AA. A further condition is that the problem at hand falls into the category of large worlds. Let us clarify this argument, which forms our basic thesis in this paper. Assume that the agent at time t has information, I_S, about the objective probabilities of the events in F_k, but she lacks similar information about the events in F_k′. If the world is small, the agent can go back in time to t_0, ignore I_S and assign probabilities on F_k ∪ F_k′ in an information-symmetrical way. Once her prior probability is thus determined (counterfactually at t_0), the agent is allowed to bring I_S back into the picture by conditionalizing on it. In such a case, the only way for the agent to produce AA is to deviate at t from her probabilistic commitments at t_0, that is, by exhibiting dynamic inconsistency. All the cases motivating AA that have appeared in the literature (including the original Ellsberg paradox as well as Schmeidler's two-coin example) are "textbook cases" of small worlds. Hence, although we do not claim that AA cannot arise in large worlds (in such a context the counterfactual MB strategy is unrealistic), we do argue that AA has been poorly motivated!

Schmeidler's two-coin example revisited
Let us now revisit Schmeidler's two-coin example discussed in the introduction in light of the analysis presented in section 4. First, we describe the agent's epistemic background at t_0. For simplicity, we assume that concerning coin A the agent knows with certainty that only one of the following three hypotheses is true: H_1^A: "coin A is fair", H_2^A: "coin A is biased towards H", H_3^A: "coin A is biased towards T". To make things even simpler, assume that in the case of H_2^A the objective probability of H is equal to 0.6, whereas in the case of H_3^A the objective probability of T is 0.6. Similar assumptions are made for coin B, namely the agent knows that one of the following three hypotheses is true: H_1^B, H_2^B, H_3^B (defined analogously). The agent has to decide about her prior probabilities of the above hypotheses. Since there is no direct information supporting one hypothesis over the rest, the agent is likely to subscribe to the principle of indifference (or the maximum entropy principle), according to which P_0(H_i^A) = P_0(H_i^B) = 1/3, for i = 1, 2, 3. Alternatively, the agent might consult her past experience on similar cases (part of her background information), which suggests that it is more common to encounter cases of fair coins than not. In any case, the important thing to notice is that there is no specific information at t_0; hence there is no informational asymmetry between the set of hypotheses concerning coin A and that of coin B.
Let us now calculate the probabilities that the agent assigns to the events/propositions D_H^A: "the next toss of coin A lands H", and similarly D_T^A, D_H^B and D_T^B. Using the law of total probability, the agent gets:

P_0(D_H^A) = Σ_i P_0(D_H^A | H_i^A) P_0(H_i^A) = (1/3)(0.5) + (1/3)(0.6) + (1/3)(0.4) = 0.5.

In a similar fashion, we obtain P_0(D_T^A) = P_0(D_H^B) = P_0(D_T^B) = 0.5. Let us assume that at time t > t_0 the agent acquires an important piece of specific information for the problem at hand. In particular, she is given a set of data, E_∞^A, consisting of the results of a very long series of tosses of coin A for which (almost) half of them are H. The following two questions are of interest: (i) how does the agent update her probabilities about A in light of the new information E_∞^A and (ii) does she end up with a new system of beliefs that violates coherence? Bayesian conditionalization implies:

P_t(D_H^A) = P_0(D_H^A | E_∞^A) = Σ_i P_0(D_H^A | H_i^A, E_∞^A) P_0(H_i^A | E_∞^A).

The Principal Principle implies that (knowledge of the objective chance screens off any admissible evidence):

P_0(D_H^A | H_i^A, E_∞^A) = P_0(D_H^A | H_i^A), i = 1, 2, 3.

Furthermore, given that the evidence E_∞^A is assumed to be arbitrarily large:

P_0(H_1^A | E_∞^A) ≈ 1 and P_0(H_2^A | E_∞^A) ≈ P_0(H_3^A | E_∞^A) ≈ 0.

The above relationships together imply that P_t(D_H^A) = 0.5. Similarly, P_t(D_T^A) = 0.5.
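The coin-A derivation can be checked numerically. The code below is our own illustration (the hypothesis labels and the toss count are ours): a uniform prior over the three chance hypotheses, the law of total probability for P_0(D_H^A), and Bayes' theorem on a long, half-heads sequence, which drives the posterior onto the fair-coin hypothesis and returns P_t(D_H^A) = 0.5.

```python
from math import log, exp

# Objective chance of heads under each hypothesis about coin A.
chances = {"H1_fair": 0.5, "H2_biasH": 0.6, "H3_biasT": 0.4}
prior = {h: 1 / 3 for h in chances}          # principle of indifference at t0

# Law of total probability: P0(D_H^A) = sum_i P0(D_H^A | H_i^A) P0(H_i^A)
p0_heads = sum(th * prior[h] for h, th in chances.items())   # = 0.5

def posterior(n_tosses: int) -> dict:
    """P0(H_i^A | E) by Bayes' theorem, for E = n tosses with n//2 heads.
    Likelihoods are kept in log space to avoid underflow for large n;
    the binomial coefficient cancels in the normalization."""
    k = n_tosses // 2
    loglik = {h: k * log(th) + (n_tosses - k) * log(1 - th)
              for h, th in chances.items()}
    m = max(loglik.values())
    weights = {h: prior[h] * exp(l - m) for h, l in loglik.items()}
    z = sum(weights.values())
    return {h: w / z for h, w in weights.items()}

post = posterior(10_000)       # mass concentrates on the fair hypothesis
# Principal Principle + total probability under the posterior:
p_t_heads = sum(th * post[h] for h, th in chances.items())   # -> 0.5
```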
Let us now turn our attention to the events D_H^B and D_T^B (related to coin B) and examine how the acquisition of E_∞^A affects their probabilities.
As before, PP implies

P_0(D_H^B | H_i^B, E_∞^A) = P_0(D_H^B | H_i^B), i = 1, 2, 3.

Now the crucial question is the following: does the evidence on coin A affect the prior probabilities attached to coin-B related events? Assuming independence between the sets H^A and H^B, the answer is negative. Hence,

P_0(H_i^B | E_∞^A) = P_0(H_i^B) = 1/3, i = 1, 2, 3.

Finally, the above relationships imply

P_t(D_H^B) = P_t(D_T^B) = 0.5.

The important thing to notice is that the agent's new set of beliefs, P_t, obtained at t (after the specific information E_∞^A has arrived) is coherent; that is, it is a proper probability function that does not violate the additivity property. Why did the preceding analysis not produce AA? Put differently, why did the acquisition of E_∞^A not produce a system of beliefs having the property

P_t(D_H^B) = P_t(D_T^B) < 0.5,

which would imply an ambiguity averse, probabilistically non-sophisticated agent? The answer is that the analysis presented above implies that the agent follows the MB counterfactual strategy in forming her probability function P_t at t. Indeed, the key point is to focus on how the agent formed her new probability function P_t once she became aware of E_∞^A. Specifically, the agent did not use E_∞^A as a direct determinant (on a par with background information) of P_t. Instead, she "traveled back in time" to point t_0, identified P_0 and stuck to those commitments at t.6 In this setting, AA would emerge only if the agent abandoned her commitments made at t_0 and assigned different probabilities at t, thus exhibiting dynamic inconsistency.
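The coin-B side of the argument is even simpler to verify (again a numerical sketch of our own): under independence the evidence E_∞^A leaves the B-hypotheses untouched, so the updated beliefs about coin B remain additive, which is precisely why no ambiguity aversion can appear.

```python
# Hypotheses about coin B, with the objective chance of heads under each.
chances_B = {"H1_fair": 0.5, "H2_biasH": 0.6, "H3_biasT": 0.4}
prior_B = {h: 1 / 3 for h in chances_B}

# Independence of the hypothesis sets: P0(H_i^B | E_inf^A) = P0(H_i^B),
# so conditioning on the coin-A evidence changes nothing on the B side.
post_B = dict(prior_B)

p_t_heads_B = sum(th * post_B[h] for h, th in chances_B.items())        # 0.5
p_t_tails_B = sum((1 - th) * post_B[h] for h, th in chances_B.items())  # 0.5

# Coherence check: P_t(D_H^B) + P_t(D_T^B) = 1 (additivity holds), whereas an
# ambiguity-averse assignment would be sub-additive, with each term below 0.5.
total = p_t_heads_B + p_t_tails_B   # = 1.0
```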
The preceding analysis implies that the only way for AA to emerge (without causing dynamic inconsistency) is to eliminate P_0. Eliminating P_0 means that at t the agent realizes that she faces a new reality that she had not (and could not have) anticipated at t_0, which obliges her to re-evaluate her full set of probabilistic beliefs (to form a new prior). Such a setting can arise within a large world framework. Teller (1975) gives an example of when such a strategy is required: "Examples are the wildcatter's problem of where to drill an oil well and the manufacturer's problem of how much to produce. The personalists advise the wildcatter and the manufacturer to estimate their initial degrees of belief subjectively for any one such problem, but if a similar problem arises years later under considerably different circumstances they are advised to make new evaluations rather than to try to conditionalize their old estimates on the vast body of intervening observations of uncertain relevance to the problem." (1975, p. 170, emphasis added). However, Schmeidler's and Ellsberg's examples analyzed above are quite different from the wildcatter's problem. Consider the following question with respect to Schmeidler's two-coin example: is there any reason for the agent to refine her probabilistic judgements at t, because at t she came across a piece of surprising information, that is, information that she could not have conceived back at t_0? The answer is negative, because Schmeidler's two-coin example takes place in a small world, where the concept of "surprising information" is not relevant. As a result, the MB strategy is perfectly feasible; in fact it follows quite naturally!

6 Note that the agent, having formed P_0 at t_0, was in an epistemic state of not knowing with certainty that E_∞^A had occurred.

Concluding remarks
We conclude by summarizing our main arguments. There are two ways to use specific information, I_S, in constructing the agent's subjective distribution at t, P_t. The first is to use it as conditioning information, consistent with MB, while the second is to use it on a par with any other background information I_B, consistent with CB. There are good reasons, analyzed in detail in section 3, for preferring the first interpretation (MB) over the second (CB). Does the choice of MB over CB have implications for the main issue under scrutiny in the present paper, namely the emergence of AA? The answer is yes. If we choose MB there is no room for AA, since being ambiguity averse in such a framework is equivalent to being dynamically inconsistent. On the other hand, AA can arise under CB, because when I_S directly determines the agent's subjective distribution it leads to informational asymmetries.
Can we always choose MB over CB? The answer is negative. MB is applicable only in small worlds. In large worlds, MB may not be operational, especially when the agent's epistemic state at the time that I_S arrives is richer than the one prior to the arrival of I_S, which is the case in large worlds. Hence, AA may be thought of as a (potential) property of large worlds. To this end, the motivation of AA in the form of examples such as Ellsberg's classic urn-based or Schmeidler's coin-based ones is clearly poor, since these examples are textbook cases of small worlds.

Christoph Kuzmics, University of Graz, Graz, Austria
Michael Greinecker, University of Graz, Graz, Austria

There are decision problems where all uncertainty comes with objective probabilities, such as gambling with fair coins and dice. Most real-world decision problems come without natural probabilities; in these, decision-makers face fundamental uncertainty or, as it is often called, ambiguity. Decision-makers know what their options are and what the possible consequences might be, but not more. The radical subjectivist Bayesian position is that rational decision-makers should be able to quantify all forms of uncertainty, whether they come with natural probabilities or not, in terms of a probability distribution over the states of the world. These probabilities are subjective beliefs; they can even be unique to their owner. L. J. Savage gave the definitive account of the radical subjectivist Bayesian position. Decision-makers choose acts, which are defined as functions from states of the world to consequences. If their preferences satisfy some normatively appealing axioms, they must behave as if they have a numerical index over consequences (``utilities'') and a (uniquely specified) probability assignment on the states of nature such that their preferences can be represented by the expectation of the index with respect to the probability distribution; they are subjective expected utility maximizers.
In the 1960s, Daniel Ellsberg came up with some thought experiments in which behavior that violates the axioms of Savage seems quite natural, and in which people, once the thought experiments were made into actual experiments, indeed often violate the Savage axioms. Importantly, the people whose behavior violated the Savage axioms often stick to their choices even when it is explained to them how their choices violate supposed rules of rationality. The Ellsberg experiments gave rise to two literature strands: in one, the actual behavior of people in similar experiments is studied; see Trautmann and Van De Kuilen (2015) for a survey. In the other strand, researchers tried to develop more permissive theories of rationality that allow for the observed behavior in Ellsberg's experiments; see Gilboa and Marinacci (2013) or Machina and Siniscalchi (2014) for surveys. The paper under review is mostly concerned with the latter strand, though the authors also discuss the plausibility of such theories as descriptive theories of human behavior.
According to a famous argument by Schmeidler (1989), a pioneer of normative theories of decision making that can accommodate the choices from the Ellsberg experiment, the problem with subjective expected utility maximization is that probabilities sweep important differences for a decision-maker under the carpet. Consider a decision-maker who has to bet on whether a coin comes up heads or tails. The decision-maker assigns the same probability to both events. But this could be because they do not know what any potential bias might be, or because they have tested the coin extensively and decided it is fair. The assignment of probabilities is the same in both cases, and this, argues Schmeidler, shows that probabilities do not contain enough information and a rational individual might treat these situations as different. They might well prefer decisions where they know the probabilities and be averse to fundamental uncertainty; they can be rationally ambiguity averse.
The authors of the paper under review put the focus on where beliefs, liberally interpreted and possibly including objects other than probability measures, come from. They call such belief-based decision-making Bayesian.1 They draw a distinction between what they term classical Bayesianism and modern Bayesianism. Classical Bayesians put all previous information into their beliefs, while modern Bayesians only start with fairly universal background information in their priors and incorporate all additional information by updating their previous beliefs. We think this is an important point that is not sufficiently in focus in most contemporary work on decision theory: beliefs have to come from somewhere, and good theories should relate current beliefs to past beliefs and the information learned in between. The modern Bayesian approach is clearly superior to classical Bayesianism on conceptual grounds. However, it requires very detailed and comprehensive models that include all possible sources of information. In small isolated settings (``small worlds'', a term already used by Savage) such as the Ellsberg experiments, this might well be possible. However, in the ``large worlds'' we live in, modern Bayesianism becomes overly demanding. In the language of modern decision theories, one needs to include all possible sources of information one might ever learn in the state space, which will then end up too big to be useful. The authors draw the consequence that ambiguity aversion should be something one may expect to be present in complex real-world decision problems but less so in the original Ellsberg experiments and similar isolated settings.
We think the paper raises important questions that decision theory ought to focus on, such as the origins of prior beliefs and how they should be interpreted. The conclusion that the maxims of modern Bayesianism are hard to follow in large worlds does not readily imply that commonly used models of ambiguity aversion are good compromises for large worlds. Formal results to that effect would be great contributions in our opinion, and we hope this paper will eventually give rise to such works. To do so, such works should cast the arguments in a more explicit formal setting. For example, it would be desirable to have an explicit representation of the information in I_B and I_S.
Further, more detailed comments:

1. The view that the Ellsberg experiments are not the best motivation for ambiguity aversion is also held by others in the literature. For instance, according to Gilboa (2009, p. 136), David Schmeidler is fond of saying that "[r]eal life is not about balls and urns.'' Indeed, Schmeidler's initial research in this area was motivated by what he perceived to be shortcomings of probability distributions as general representations of uncertainty. Similarly, the paper under review points to why ambiguity is best studied for decisions concerning real, large-world events rather than simple, artificially created events in the lab. Some such work has begun by, for example, Baillon, Huang, Selim, and Wakker (2018).

2. The present paper partially explains why the two strands of literature, the experimental one and the normative one, developed largely independently. The latter literature does not necessarily aim at understanding how actual people make choices in all settings, acknowledging that people do not always make optimal choices even according to their own view of the world, but instead aims at understanding what could be considered rational decision making. The Ellsberg experiments were, thus, just food for thought that provoked a new research direction. The present paper helps to understand why.

3. The survey by Machina and Siniscalchi (2014) focuses on pure theory and is not a good reference for experimental evidence. For experimental evidence, we recommend looking at Trautmann and Van De Kuilen (2015).

4. Savage shows that under his axioms there can be no ambiguity aversion. The first line of the second paragraph on page 4 should, therefore, be stated differently.

5. There is extensive literature on the relationship between dynamic consistency (usually in combination with a version of consequentialism) and ambiguity aversion. The authors mention that ambiguity aversion leads to dynamic inconsistencies but do not quote from the relevant literature. A readable introduction to the topic is given by the debate between Al-Najjar and Weinstein (2009) and Siniscalchi (2009).

6. They build up to their conclusion by using the small-world two-coin example (Schmeidler's) and show that a rational modern Bayesian agent is dynamically consistent and hence ambiguity neutral. The agent is consistent because the new (specific) information does not change the background information, and she does not violate the additivity property either.
Behaviourally, AA has been reported in small-world contexts like the Ellsberg two-urn paradox. However, as Trautmann and van de Kuilen (2015) reviewed, AA may depend on the elicitation method and is not found for unlikely events. This may bear on the discussion of whether event A is included in the specific information (and hence relevant for the outcome, so that the Principal Principle cannot be applied) or not.
As the authors nicely show, modern Bayesianism (employing a counterfactual strategy and using hypothetical priors) is only feasible in small worlds, as large worlds run into coherence problems (one can hold conflicting beliefs; see, e.g., Caie 2013 or Tarnowski 2020, "Is having contradictory beliefs possible? Discussion and critique of arguments for the psychological principle of noncontradiction").
In a coherent belief system a modern Bayesian agent cannot be ambiguity averse. But since coherence is not given in large worlds, nor necessarily in all small worlds (e.g., a coin task in which the participant is suspicious about the task's purpose), behaviourally one may still find AA. The results could potentially explain why AA (behaviorally) shows low reliability (e.g., test-retest) and is more a state than a trait (see also Carnap on "traits"), or, said differently, why participants are better described as agents using classical Bayesianism rather than modern Bayesianism.
The article should stimulate further research into Simon's "scissor blades" and help identify the boundaries of human rationality.
A few minor issues:
1. The authors introduce all symbols and abbreviations but D (data, see page 5).
2. Page 8, second paragraph has three questions, though it is clear what is meant by "MB responds negatively to the first question and positively to the second".
3. Page 9, "difference that the role that I_s plays": could it be written as "different role that I_s plays"?
4. Page 9, "the agent cannot condition on A since it has already determined her probability function": I would replace "it" with "she".
5. Page 10, "corresponding to I_B and I_B, respectively": should it be I_B and I_S?
6. Page 13, "MB and CB refer to the agent's strategy or course of action in forming her probability function at GB and LB refer to the agent's epistemic status ...": I am wondering whether the sentence should have an "and" before or instead of "at" in "GB and LB refer to ...".

Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? Not applicable

Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Decision-making under uncertainty, cognitive biases, human rationality

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

"as expressing counterfactual possibilities of what she (the agent) imagines she would believe if she didn't have some of the information (in our case, I_S) she does now." (2008, p. 147). In a similar fashion, Meacham (2008) explains the epistemic role that P_0^{I_B} ...
