Ambiguity Aversion, Modern Bayesianism and Small Worlds

The central question of this paper is whether a rational agent under uncertainty can exhibit ambiguity aversion (AA). The answer to this question depends on the way the agent forms her probabilistic beliefs: classical Bayesianism (CB) versus modern Bayesianism (MB). We revisit Schmeidler's coin-based example and show that a rational MB agent operating in the context of a "small world" cannot exhibit AA. Hence we argue that the motivation of AA based on Schmeidler's coin-based and Ellsberg's classic urn-based examples is poor, since they correspond to cases of "small worlds". We also argue that MB not only avoids AA but also proves to be normatively superior to CB, because an MB agent (i) avoids logical inconsistencies concerning the relation between her subjective probability and objective chance, (ii) resolves the problem of "old evidence" and (iii) allows psychological detachment from actual evidence, hence avoiding the problem of "cognitive dissonance". As far as AA is concerned, we claim that it may be thought of as a (potential) property of large worlds, because in such worlds MB is likely to be infeasible.

As far as the second question is concerned, the main argument in favor of the theory that AA is consistent with rational probabilistic beliefs may be stated as follows: AA implies that probabilistic beliefs are not sophisticated, in the sense that they cannot be represented by a unique prior probability measure. Is a probabilistically non-sophisticated agent necessarily irrational? The advocates of the AA theory say no. Instead, they argue that the agent might possess so little (if any) information about the events she is interested in that she is justifiably (and thus rationally) unable to form a unique prior distribution. As Gilboa and Schmeidler (1989) put it, "...the subject has too little information to form a prior. Hence (s)he considers a set of priors as possible." (1989, p. 142). Schmeidler (1989) is one of the earliest and best-known attempts to provide an axiomatization of AA. Schmeidler argues that at the heart of AA lies the following asymmetry of information: Consider an event space F which contains the "events of interest" for the agent S. Assume that F_k ⊆ F contains those events whose probabilities are known to S, whereas F'_k ⊆ F (with F_k ∩ F'_k = ∅ and F_k ∪ F'_k = F) contains the rest of the events, for which S has no information about their probabilities. AA means that S favors betting on events in F_k over events in F'_k. However, under the maintained assumption that S's subjective probability of any event D ∈ F is measured by S's willingness to bet on D, AA implies that S's degrees of belief defined on F violate the additivity axiom of probability theory, thus falling short of being a proper probability function.
What is the main reason for the absence of the additivity property in S's degrees of belief, denoted by P_S? The answer is the presence of information about the chances of events in F_k and the lack of similar information for the events in F'_k. Put differently, the argument that leads to the non-additivity of P_S is based on the premise that the agent possesses specific (rather than only background) information, which is relevant for some specific events in F and irrelevant for all the rest. However, if we assume that such specific information, I_S, is available, then the following question arises: does P_S represent the prior beliefs of the agent? Put differently, is specific information allowed to be used in the determination of the prior beliefs, or should these beliefs be based exclusively on the so-called "background" information, I_B?
How is I_B defined? A universally accepted sharp definition of I_B is hard to find. I_B is usually meant to include the "corpus of background knowledge" (Easwaran 200?) that describes the general characteristics of the chance set-up at hand (e.g. there are n coins involved, each one is two-sided, successive tosses are independent, etc.). More importantly, I_B should include the full set of hypotheses H that describe the probabilistic properties of the phenomenon of interest, the full set of evidential sentences E that are relevant for H, together with the entailment relationships between elements of H and elements of E (more on this below). On the other hand, I_B should not include any actual outcomes of the experiment or information about the objective chances (e.g. coin j is fair). This information is specific information, or relevant evidence, and should be thought of as part of I_S. The question of whether to allow both I_B and I_S, or only I_B, to affect the generation of the agent's prior belief function is crucial for deciding whether AA is (or can be made) consistent with rationality. Hence the crucial question is the following: should P_0 depend only on I_B (Option I) or on both I_B and I_S (Option II)? There are two approaches in the literature: one that claims that the distinction between I_B and I_S is not only useful but compelling, and another which sees no point in distinguishing between the two. Sometimes this debate is expressed in an equivalent form: the advocates of Option II argue that the agent's current probabilities (credences), P_t, are eligible to play the role of the "prior" for the evidence that will emerge at t+1. As Howson and Urbach (1991) put it, "the priors in any calculation can also be posterior probabilities calculated from the historical data" (1991, p. 374).
As such, current credences are shaped by all the available information that the agent possesses at t, including both I_B and I_S. In this case, the evidential significance of I_S is "built into" P_t, in the sense that, for every (future) A ∈ F, P_t(A | I_S) = P_t(A). Notationally, we may emphasize this by replacing P_t with P_t^{I_B,I_S}. Following Meecham (2007), we shall refer to this view as "Classical Bayesianism" (CB).
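The sense in which I_S is "built into" P_t can be illustrated with a small computation. The following sketch constructs a toy small world; the two hypotheses and all numerical values are purely illustrative assumptions, not taken from the paper. It verifies that once I_S has been conditioned into the current credence P_t, conditioning on I_S again is vacuous, i.e. P_t(A | I_S) = P_t(A):

```python
from fractions import Fraction
from itertools import product

# A toy "small world": worlds are (hypothesis, first-toss outcome) pairs.
# Hypotheses and numbers are illustrative assumptions only.
P_heads = {"fair": Fraction(1, 2), "biased": Fraction(3, 4)}  # P(heads | H)

# Prior P_0 over worlds, built from background information I_B alone
# (even credence over the two hypotheses).
P0 = {}
for h, o in product(P_heads, ["H", "T"]):
    P0[(h, o)] = Fraction(1, 2) * (P_heads[h] if o == "H" else 1 - P_heads[h])

def cond(P, event):
    """Conditionalize a world-distribution on an event (a set of worlds)."""
    total = sum(p for w, p in P.items() if w in event)
    return {w: (p / total if w in event else Fraction(0)) for w, p in P.items()}

I_S = {w for w in P0 if w[1] == "H"}      # specific evidence: the toss was heads
P_t = cond(P0, I_S)                        # CB's current credence: I_S built in

A = {w for w in P0 if w[0] == "biased"}    # an arbitrary event of interest
P_t_A = sum(P_t[w] for w in A)
P_t_A_given_IS = sum(cond(P_t, I_S)[w] for w in A)
assert P_t_A == P_t_A_given_IS             # P_t(A | I_S) = P_t(A)
```

Since P_t already assigns probability one to I_S, conditioning on I_S a second time leaves every credence unchanged, which is exactly the "built-in" property in the text.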
On the contrary, the advocates of the opposite view, hereafter referred to as "Modern Bayesianism" (MB), insist on the prior, P_0, being determined at a time at which no specific evidence I_S was available. Put differently, the prior probability function should be determined, once and for all, at a unique point in time, say t_0, at which only I_B was available. Hence, the prior and current probability functions differ for every t ≠ t_0. Indeed, at t_0, I_S should be treated as hypothetical evidence, on a par with any other evidential sentence that the agent might have envisaged at t_0. In other words, the formation of the agent's prior must be independent of the time at which the agent is actually located at the beginning of her investigations. Instead, it must be generated at a time at which all specific evidence for the events of interest is contingent rather than actual. Notationally, P_0 may be replaced by P_0^{I_B}. Does the distinction between Classical and Modern Bayesianism bear any implications for the topic of our current investigation, namely whether or not AA may be made consistent with rationality? The answer to this question is emphatically affirmative. The main point of the present paper is to show that for an MB agent operating in the context of a "small world" (more on this below), AA cannot arise. Put differently, the assumption that the agent is capable of "counterfactual probabilistic reasoning" (the main assumption of MB) entails the absence of ambiguity aversion. Hence, the important question seems to be not whether the agent exhibits AA, but rather whether the agent can (descriptive aspect) or should be able to (normative aspect) contemplate her degrees of belief under an epistemic state which in fact she is not in.
Is MB a normatively appealing hypothesis? To this end, we put forward three arguments in favor of MB, analyzed in detail in the following section. In a nutshell, an MB agent (as opposed to a CB one) (i) reveals her permanent dispositions rather than momentary inclinations in forming beliefs, (ii) protects herself from falling into logical contradictions concerning the relation between her subjective probability and objective chance, and (iii) resolves the problem of "old evidence".
What about MB's descriptive status? Is MB reasonably realistic as a model of actual behaviour? Or is it the case that MB puts exceedingly high standards on the agent's probabilistic reasoning abilities? To answer this question, we must first define the context within which the agent operates. To this end, we distinguish between "small" and "large" worlds, a distinction already made by Savage (1954). For the purposes of the present paper, a "world" is defined to be the domain (language) L of the agent's probabilistic beliefs. This domain includes all the hypotheses of interest H_i, all the relevant evidential statements E_j, together with the corresponding (logical) entailment relations between H_i and E_j. A world is small if (a) the number of hypotheses in L is small and (b) L does not evolve over time. These two assumptions imply that the agent is able to conceive in advance all possible hypotheses concerning the matter of interest.
In this paper we argue that MB is a reasonable hypothesis in small worlds. On the contrary, MB is not likely to work in cases in which (a) L is exceedingly rich and (b) L changes over time due to the formation of new concepts (hypotheses) by the agent. With respect to the second case, imagine a situation in which the agent's current language, L_t, is richer than her language L_0 at the beginning of her investigations (that is, before any specific information I_S was available). This may happen if, between the time periods 0 and t, a new hypothesis H_new was conceived by the agent. In such a case, H_new is part of L_t but not of L_0. MB requires the agent to "go back in time" in order to find herself in the epistemic state that corresponds to t = 0. Unfortunately, at this state, the agent is unable to make any probabilistic assignments to H_new, simply because H_new was not included in L_0. This is bad news for MB. Specifically, MB requires the agent to contemplate her probabilistic assignments (in the case that I_S were not known) not in the "old" language L_0 (which did not include H_new) but in the "new", expanded language L_t. The shift from L_0 to L_t, however, bears an unpleasant consequence: the agent is forced to change all her probabilistic assignments made in the context of L_0. As Hajek (200?, pp ?? - A Philosopher's Guide to Probability) remarks, "...changing the language in which items of evidence and hypotheses are expressed will typically change the confirmation relations between them." This is the point at which MB loses its normative and descriptive edge. Do MB's losses translate into CB's gains? The answer is negative. Both MB and CB are equally hurt by the presence of large and evolving worlds, since in both contexts frequent changes in the prior probabilities P_0^{I_B} and P_t^{I_B,I_S}, respectively, are called for.
Such changes invalidate, in both contexts, many of the most interesting Bayesian theorems, such as the celebrated "washing out the priors" result. The important point that we wish to stress is that once MB becomes unrealistic (or outright infeasible), AA (together with other potential types of probabilistically incoherent behaviour) becomes plausible.
It is worth emphasizing that standard Bayesianism, in either MB or CB form, does not allow for the formation of new hypotheses after the prior probability function has been formed. Instead, it assumes that the agent is able to think in advance of all the possibilities in the hypothesis space; in short, the agent is assumed to be "logically omniscient". Logical omniscience is considered by many critics to be the Achilles heel of Bayesianism: it implies an all-knowing agent who is able to track down all the logical implications in L. Obviously, the extent to which such an assumption is unrealistic depends on the degree of richness of the underlying language L. For simple languages, it is reasonable to assume that the agent can comprehend the few logical entailments that these languages contain. And this explains why we entertain the view that MB is a realistic strategy in small worlds.
The main points of the paper are the following: (i) An MB agent operating in small worlds does not exhibit AA. (ii) AA may be thought of as a (potential) property of large worlds, because in such worlds MB is likely to be infeasible. (iii) The motivation of AA in the form of examples such as Ellsberg's classic urn-based or Schmeidler's coin-based ones is clearly poor, since these examples are textbook cases of small worlds (or elementary languages), in which a (normatively preferable) MB strategy is attainable.
The paper is organized as follows: the next section describes Schmeidler's two-coin thought experiment, which is supposed to produce AA. It also defines formally two of the central concepts of the present paper, namely MB and CB. Section III is an analytic discussion of the arguments for MB mentioned above. It also elaborates on the connection between MB and small worlds, arguing, along the lines presented above, for the efficiency of counterfactual probabilistic reasoning in the case that the number of probabilistic hypotheses under consideration is small. Section IV demonstrates formally the absence of AA under MB reasoning within Schmeidler's two-coin example. Section V concludes.
2 Schmeidler's Two-Coin Example, and Modern versus Classical Bayesianism

Schmeidler (1989) illustrates the points mentioned above by means of the following coin example, which aims at conveying the same message as Ellsberg's (1961) two-urn paradox: Assume that the agent S considers two coins, A and B. S knows with certainty that the probability of "heads" for A (H_A) is 0.5. On the other hand, she has no such information about B. The two coins are about to be tossed and S has the option to bet either on A-related events or on B-related events. Specifically, she may choose bet A, in which she earns 1$ if the event D_H^A ("coin A lands heads") occurs and loses 1$ if it does not, and similarly for the events D_T^A, D_H^B and D_T^B. S assigns P(D_H^A) = P(D_T^A) = 0.5, although she is not informed about the probabilities P(D_H^B) and P(D_T^B). However, although she does not know the exact values of P(D_H^B) and P(D_T^B), she "feels" that (given she is less willing to bet on D_H^B than D_H^A and on D_T^B than D_T^A) both are smaller than 0.5, so that P(D_H^B) + P(D_T^B) < 1 = P(D_H^B ∪ D_T^B), in violation of additivity. What causes the violation of the additivity property in P? As already mentioned, it is the presence of asymmetric information, namely specific information (I_S) about the chance properties of coin A and the lack of similar information for coin B, that is responsible for the non-standard properties of P. To be more specific, it is not merely the presence of I_S that caused the non-additivity of P, but rather the fact that the agent allowed I_S to affect her probabilistic beliefs directly, instead of utilizing I_S indirectly by conditionalization. But in order to be able to conditionalize on I_S, a pre-existing vehicle is required, namely a probability function prior to the acquisition of I_S. How can such a probability function be obtained? Simply by the MB counterfactual strategy outlined above, namely by assuming that the agent did not know I_S and contemplated what her probabilistic beliefs would have been in such a counterfactual epistemic state.
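The failure of additivity in the two-coin example can be made concrete with a short sketch. The willingness-to-bet value 0.4 assigned to the B-events below is an illustrative assumption; Schmeidler's argument only requires that it lie strictly below 0.5:

```python
from fractions import Fraction

# Willingness-to-bet "probabilities" elicited from an ambiguity-averse agent.
# The 2/5 figure for coin B is an illustrative assumption.
belief = {
    "heads_A": Fraction(1, 2), "tails_A": Fraction(1, 2),
    "heads_B": Fraction(2, 5), "tails_B": Fraction(2, 5),
    "heads_B or tails_B": Fraction(1),  # coin B is certain to land on one side
}

# Additivity fails for coin B: P(H_B) + P(T_B) < P(H_B ∪ T_B).
assert belief["heads_B"] + belief["tails_B"] < belief["heads_B or tails_B"]

# An MB agent instead forms a hypothetical prior before any specific
# information about either coin is in; with no reason to favor one side,
# symmetry fixes P_0(H_B) = P_0(T_B) = 1/2 and additivity is restored.
prior_B = {"heads_B": Fraction(1, 2), "tails_B": Fraction(1, 2)}
assert prior_B["heads_B"] + prior_B["tails_B"] == 1
```

The point of the contrast is exactly the one made in the text: the ambiguity-averse assignments are generated by letting I_S act directly on the degrees of belief, while the symmetric hypothetical prior is what an MB agent would have formed at t_0, before I_S existed.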
After a rather long discussion involving the concepts of MB and CB, it is now time to define these concepts more formally. To this end, assume an agent who started her investigations at time t, having both I_B and I_S at her disposal. The agent is interested in updating her beliefs about the proposition/event A in the light of new evidence E obtained at t (or t+, to be more precise). A CB agent carries out this task as follows: Classical Bayesianism (CB): P_{t+1}(A) = P_t^{I_B,I_S}(A | E). On the other hand, the MB strategy advises the agent to pursue the following two-step procedure: First, "travel back" through your epistemic history and identify the point, t_0, at which I_S was not yet available (although it was conceivable). Second, once this point is found, evaluate the prior that you would have formed at this point (the counterfactual or hypothetical prior) and use it from this point onwards as your actual prior.
Modern Bayesianism (MB): P_{t+1}(A) = P_0^{I_B}(A | I_S ∧ E). Easwaran (200?) describes the role that the hypothetical prior is supposed to play: "A rational subject's credences are fixed by her hypothetical priors and her total evidence. A subject's credences are represented by a dynamic probability function, a function that changes with her evidence. A subject's hypothetical priors are represented by a static probability function, a function that encodes her disposition to respond to evidence. (Hypothetical priors are called 'priors' because they can be thought of as a rational subject's original credences in possibilities, prior to the receipt of any evidence, and 'hypothetical' because it is unlikely that one ever was in such a state.)" (2003? sleeping beauty paper, emphasis added). Along the same lines, Titelbaum and Kopec (200?) refer to the concept of the hypothetical prior as follows: "We can think of an agent's hypothetical prior as representing her "evidential standards": antecedent to the influence of any contingent evidence, the hypothetical prior encodes how an agent would respond to any package of evidence she might encounter, and which bodies of evidence she would take to support which hypotheses." (200?, p. 5, emphasis added).
The distinction between actual and hypothetical priors in the context of Schmeidler's two-coin example may be described as follows: Assume that at the time t at which the desire for a specific investigation or bet emerged (for example, the time at which the agent decided to bet on D_H^A), specific information I_S already existed (for example, coin A has already been flipped five times, with the results being H_A, H_A, T_A, H_A, T_A). What is the agent advised to do in this case? Should she form her priors by taking I_S into account? Or should she "travel back in time" and ask herself "what would my priors have been at time t_0, when I_S was not available?" MB responds negatively to the first question and positively to the second.
Based on the above MB recommendation, the following pressing question presents itself: Why should the agent take the burdensome and imagination-stretching counterfactual route (3) in forming her current beliefs, while the alternative actual (and intuitively appealing) course (2) is readily available? Put differently, why should we require the agent to ponder her subjective probabilities under an epistemic state which is not her actual state of knowledge but rather a counterfactual one? There are at least three reasons that compel an agent to follow the counterfactual path; these are analyzed in detail in the next section.

Arguments for Modern Bayesianism (MB)
Before we present our arguments in favor of MB over CB, it is worth emphasizing that, despite the term "modern", MB is not a form of Bayesianism that was put forward only recently. Instead, several authors from the 1970s entertained the idea of "hypothetical priors", that is, "priors without evidence". For example, Hesse (1975) suggests that the agent's prior probability should include only "previous background evidence which does not enter the current calculations" (1975, p. 53, emphasis added). She then goes on to explain the nature of the prior probability: "It is very important to stress that all this (the prior probability) refers only to possibilities, none of which are yet realized at t_0. It refers to what the decision-maker should be prepared to choose in hypothetical circumstances before more evidence is collected. It describes the static situation at t_0 in terms of a certain probability distribution satisfying the probability axioms." (1975, p. 64, emphasis added).
Teller (1975), in the same volume, argues that in order to be able to conditionalize on I_S (the relevant evidence), an initial probability function, which does not depend on I_S, is required for the process of updating beliefs to get started: "However, for a Bayesian theory fully to characterize the degrees of belief which arise by conditionalization, the theory must specify the belief function from which to start. This initial function is called the prior probability function" (1975, pp. 168-169).
Earman (1992) also emphasizes the need for a starting point in order for the conditionalization process to be operational: "...an agent begins as a tabula rasa, chooses her priors, and forever after changes her probabilities only by conditionalization." (1992, pp. 139-140, original emphasis).
After this brief (and by no means exhaustive) survey, we can now proceed to analyze the arguments in favor of Modern Bayesianism.

Permanent Dispositions versus Momentary Inclinations as Determinants of Rational Degrees of Belief
What should the basic probability concept be in the development of rational decision theory: prior or current probability? To answer this question we must think of what personal probabilities should reflect if they are to be rational: the person's permanent dispositions for forming beliefs (on the basis of the available evidence), or merely her momentary inclinations at some given point in time?
Rudolf Carnap argues strongly in favor of the first option. Indeed, his monumental work on inductive logic (Carnap 1950) is based on the concept of a hypothetical or counterfactual initial credence function that can be ascribed to the agent X before the collection of any evidence. More specifically, Carnap (1962) considers a sequence of data E_1, E_2, ..., E_n obtained by the agent X up to the present time T_n (in Carnap's notation). He also defines K_n, the "total observational knowledge" of X at T_n, to be the conjunction E_1 ∧ E_2 ∧ ... ∧ E_n. Then Carnap contemplates X's epistemic state at time T_0, at which X possesses no observational knowledge at all: "Now consider the sequence of X's credence functions. In the case of a human being we would hesitate to ascribe to him a credence function at a very early time point, before his abilities of reason and deliberate action are sufficiently developed. But again we disregard this difficulty by thinking either of an idealized human baby or of a robot. We ascribe to him a credence function Cr_1 for the time point T_1; Cr_1 represents X's personal probabilities based upon the datum E_1 as his only experience.
Going even one step further, let us ascribe to him an initial credence function Cr_0 for the time point T_0, before he obtains his first datum E_1. Any later function Cr_n for a time point T_n is uniquely determined by Cr_0 and K_n: for any H, Cr_n(H) = Cr'_0(H | K_n), where Cr'_0 is the conditional function based on Cr_0" (1962, p. 310, emphasis added). Is Cr_0 actual or counterfactual? Carnap explicitly allows the initial credence function to be hypothetical. He first raises the question: "How can we understand the function Cr_0?" The answer to this question depends on whether the agent X is a robot or a human being. In the first case, Cr_0 is the robot's actual credence function at T_0. With respect to the second (more relevant) case, Carnap's answer is as follows: "In the case of a human being X, suppose that we find at the time T_n his credence function Cr_n. Then we can, under suitable conditions, reconstruct a sequence E_1, E_2, ..., E_n, the proposition K_n, and a function Cr_0 such that (a) E_1, E_2, ..., E_n are possible observation data, (b) K_n is defined by (4), (c) Cr_0 satisfies all requirements of rationality for initial credence functions, (d) the application of (5) to the assumed function Cr_0 and K_n would lead to the ascertained function Cr_n. We do not assert that X actually experienced the data E_1, E_2, ..., E_n, and that he actually had the initial credence function Cr_0, but merely that, under idealized conditions, his function Cr_n could have evolved from Cr_0 by the effect of the data E_1, E_2, ..., E_n." (1962, p. 310, emphasis added). Why does Carnap identify the initial credence function Cr_0, instead of the current credence function Cr_n, as the basic concept upon which (any) rational decision theory must be built?
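Carnap's claim that Cr_n is uniquely determined by Cr_0 and K_n can be checked numerically in a toy small world. The following sketch uses an illustrative initial credence function (not one of Carnap's actual c-functions) over two tosses of a coin of unknown bias, and verifies that updating datum by datum agrees with conditionalizing Cr_0 directly on the total observational knowledge K_2:

```python
from fractions import Fraction

# Worlds: outcomes of two tosses of a coin of unknown bias.
# The two bias hypotheses and their weights are illustrative assumptions.
biases = {"fair": Fraction(1, 2), "biased": Fraction(3, 4)}
worlds = [(h, o1, o2) for h in biases for o1 in "HT" for o2 in "HT"]

def p_outcome(h, o):
    return biases[h] if o == "H" else 1 - biases[h]

# Initial credence function Cr_0: even over hypotheses, i.i.d. tosses.
Cr0 = {w: Fraction(1, 2) * p_outcome(w[0], w[1]) * p_outcome(w[0], w[2])
       for w in worlds}

def cond(P, event):
    """Conditionalize a world-distribution on an event (a predicate)."""
    total = sum(p for w, p in P.items() if event(w))
    return {w: (p / total if event(w) else Fraction(0)) for w, p in P.items()}

E1 = lambda w: w[1] == "H"          # first datum: first toss lands heads
E2 = lambda w: w[2] == "H"          # second datum: second toss lands heads
K2 = lambda w: E1(w) and E2(w)      # total observational knowledge at T_2

# Stepwise updating: Cr_1 from Cr_0 and E_1, then Cr_2 from Cr_1 and E_2.
Cr1 = cond(Cr0, E1)
Cr2_stepwise = cond(Cr1, E2)

# Carnap's claim: Cr_2 is determined by Cr_0 and K_2 alone.
Cr2_direct = cond(Cr0, K2)
assert Cr2_stepwise == Cr2_direct
```

The agreement of the two routes is just the composition property of conditionalization, and it is what licenses Carnap's reconstruction of a past Cr_0 from a present Cr_n and the accumulated data.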
The answer to this question is based on the distinction between the two competing concepts that drive a person's (say X's) degrees of belief mentioned above, namely: (i) X's momentary inclination at time T and (ii) X's permanent disposition to believe. Carnap argues that what we are interested in, in developing rational decision theory, is the second concept, that is, a "trait" of X's "underlying permanent intellectual character". This disposition is best reflected in X's initial credence function, that is, X's degrees of belief prior to the acquisition of any evidence. Any change in X's beliefs, due to the emergence of new evidence, E, should be based on Cr_0(· | E). Such a conditionalization ensures that X's current credences will also be driven by X's permanent disposition for forming beliefs on the basis of the received evidence. To this end, Carnap argues as follows: "When we wish to judge the morality of a person, we do not simply look at some of his acts; we study rather his character, the system of his moral values, which is part of his utility function. Single acts without knowledge of motives give little basis for a judgement. Similarly, if we wish to judge the rationality of a person's beliefs, we should not simply look at his present beliefs. Beliefs without knowledge of the evidence out of which they arose tell us little. We must rather study the way in which the person forms his beliefs on the basis of evidence. In other words, we should study his credibility function, not simply his present credence function." (1962, p. 312, emphasis added).

Modern Bayesianism and the Chance-Credence Relationship: Avoiding Inconsistencies
One of the arguments for MB concerns a fundamental relation between objective probability (chance) and subjective probability (credence) known as the Principal Principle (PP). PP, originally suggested by David Lewis (1970?), aims at capturing the following intuitive idea: if the agent S knows (with certainty) that the objective probability of the event (proposition) A, Ch_t(A), is x, and does not possess any inadmissible evidence for A (more on that below), then S's subjective probability of A must be set equal to x. The question which naturally arises is how to describe this relationship formally. One option is to use the current credence function P_t^{I_B,I_S}. In this case, PP takes the form: P_t^{I_B,I_S}(A) = x whenever <Ch_t(A) = x> is included in I_S (6). It must be noted that Lewis never gave a precise definition of "admissibility". The standard interpretation of this concept is "that evidence is admissible if it is not relevant to the outcome of the chance event in question. As a rule of thumb, he (Lewis) takes information about the past to be admissible, and information about the future to not be admissible." (Meecham 2007, p. 18). The first version of PP, given by (6), may be interpreted in the light of CB as follows: The agent S, being at time t, has at her disposal I_B and I_S. Being a CB, the agent uses both these items of information to generate her "unconditional" probability P_t^{I_B,I_S}. However, if in addition to E the agent learned that the chance of A is x, then the latter piece of information would dominate the formation of her beliefs, in the sense that <Ch_t(A) = x> screens off any admissible evidence, E, for A, thus yielding (6). Hence, her new (updated) probability of A is given by P_t^{I_B,I_S}(A | <Ch_t(A) = x> ∧ E) = x (7). A second option for the agent is to operate as an MB, thus treating I_S not as actual but rather as counterfactual.
As a result, she employs the initial prior probability function P_0^{I_B} instead of the current one, thus yielding the following version of PP: P_0^{I_B}(A | <Ch_t(A) = x> ∧ E) = x, for any admissible E (8). What is the restriction that (8) imposes on S's current credences? Put differently, assume S is currently at period t, at which she has come to know I_S (which includes <Ch_t(A) = x>) together with the admissible evidence E. Conditionalizing her hypothetical prior on this total evidence yields P_0^{I_B}(A | I_S ∧ E) = x (9). By comparing (7) with (9), we form the impression that it makes no difference to the agent's updated probability of A whether she acts as a CB or an MB. However, Meecham (2007) argues that this is not the case. Specifically, he argues that (6), as opposed to (8), leads to contradictions. To understand why, we must first emphasize the difference in the role that I_S plays in (6) versus (8): In the former case, I_S is not conditioning information; it is rather a direct determinant of the agent's current probability function P_t^{I_B,I_S}. As such, it is not covered by the admissibility clause (only conditioning information may, or may not, be admissible). Now, let us consider the special case in which A is included in I_S. Since admissibility is not relevant, we cannot claim that A is inadmissible. Hence, (6) dictates that the agent's subjective probability of A is x < 1. But on the other hand, this probability should be equal to one, since A is known to the agent. Hence, we have run into the following contradiction: P_t^{I_B,I_S}(A) = x < 1 and P_t^{I_B,I_S}(A) = 1. On the other hand, the MB formulation (8) of PP does not suffer from this problem. Indeed, since I_S is conditioning information, the admissibility clause applies. Hence, given that we have assumed that A is included in I_S, the latter becomes automatically inadmissible. In such a case, PP loses its force, which in turn implies that there is nothing to cause the agent's probability of A to be set equal to x < 1. The preceding analysis has shown that if the agent wishes to secure herself against the possibility of running into contradictions, she must subscribe to MB rather than CB.
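The consistency of the MB reading of PP can be illustrated in a toy chance model. The hypothetical prior below mixes two chance hypotheses evenly; the hypotheses and their weights are illustrative assumptions, not part of Lewis's or the paper's formal apparatus:

```python
from fractions import Fraction

# A toy chance model: two chance hypotheses about an event A, with the
# hypothetical prior P_0 built from I_B alone (illustrative numbers).
chances = {"Ch=1/4": Fraction(1, 4), "Ch=3/4": Fraction(3, 4)}
P0 = {(c, a): Fraction(1, 2) * (chances[c] if a else 1 - chances[c])
      for c in chances for a in (True, False)}

def cond(P, event):
    """Conditionalize a world-distribution on an event (a predicate)."""
    total = sum(p for w, p in P.items() if event(w))
    return {w: (p / total if event(w) else Fraction(0)) for w, p in P.items()}

prob_A = lambda P: sum(p for w, p in P.items() if w[1])

# The MB version of PP holds as an identity in this model:
# P_0(A | <Ch_t(A) = x>) = x.
assert prob_A(cond(P0, lambda w: w[0] == "Ch=1/4")) == Fraction(1, 4)

# If I_S also contains A itself, that information is inadmissible for A;
# conditioning on it simply drives the credence to 1, with no clash with PP.
assert prob_A(cond(P0, lambda w: w[0] == "Ch=1/4" and w[1])) == 1
```

Because the chance statement enters only as conditioning information, the admissibility clause can do its work: once A itself is part of the evidence, PP falls silent and the credence 1 creates no contradiction, exactly as argued in the text.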
In a more abstract sense, the purpose of PP is to impose constraints on rational belief in addition to coherence. These constraints aim at bringing subjective beliefs in line with objective features of the world, namely objective probabilities (in the case that they are known). Are there any other theories of rational belief that impose constraints on subjective probabilities over and above coherence? The answer is affirmative. Carnap's system of inductive logic, analyzed in the previous subsection, is one such case. Of course, Carnap's constraints are quite different from those implied by PP. Specifically, these constraints are introduced in the form of the so-called axioms of symmetry, regularity and convergence. Regardless of their differences, however, both types of constraints should be applied to the prior rather than the current probability function, because only for the prior, as Carnap puts it, "can we find a sufficient number of requirements of rationality" (1962, p. 312). Obviously, the extent to which the prior probability is preferable to the current probability, for both formal and conceptual reasons, determines the extent to which MB is preferable to CB.
We now turn our attention to the third argument that we propose in favor of MB, namely how to treat the problem of "old evidence".

Modern Bayesianism and the Problem of Old Evidence
The "problem of old evidence" concerns the role of evidence, e, that was already known at the time of the formation of a scientific theory, T (hence, it is old), in the confirmation or corroboration of T. This problem highlights a conflict between the (standard) Bayesian theory of confirmation (according to which e does not increase our confidence in T) and standard scientific practice, which assigns to e an important confirmatory role. The problem was first presented by Glymour (1980), who described it as follows: "Scientists commonly argue for their theories from evidence known long before the theories were introduced. Copernicus argued for his theory using observations made over the course of millennia.... Newton argued for universal gravitation using Kepler's second and third laws, established before the Principia was published. The argument that Einstein gave in 1915 for his gravitational field equations was that they explained the anomalous advance of the perihelion of Mercury, established more than half a century earlier.... Old evidence can in fact confirm new theory, but according to Bayesian kinematics it cannot. For let us suppose that evidence e is known before theory T is introduced at time t. Because e is known at t, Prob_t(e) = 1. Further, because Prob_t(e) = 1, the likelihood of e given T, Prob_t(e, T), is also 1. We then have: Prob_t(T, e) = Prob_t(T) Prob_t(e, T) / Prob_t(e) = Prob_t(T). The conditional probability of T on e is therefore the same as the prior probability of T: e cannot constitute evidence for T.... None of the Bayesian mechanisms apply, and if we are strictly limited to them, we have the absurdity that old evidence cannot confirm a new theory." (1980, pp. 85-86).
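Glymour's derivation, and the MB-style escape from it, can be reproduced with a few lines of arithmetic. All the numbers below are illustrative assumptions; the only structural facts used are Prob_t(e) = 1 for old evidence and P_0(e) < 1 for the hypothetical prior:

```python
from fractions import Fraction

# Illustrative numbers only: a theory T and evidence e already known at t.
P_T = Fraction(1, 10)          # prior credence in the theory (assumed)
P_e = Fraction(1)              # e is old evidence, so Prob_t(e) = 1
P_e_given_T = Fraction(1)      # any proposition with probability 1 also
                               # has probability 1 conditional on T

# Bayes' theorem: Prob_t(T | e) = Prob_t(T) * Prob_t(e | T) / Prob_t(e)
P_T_given_e = P_T * P_e_given_T / P_e
assert P_T_given_e == P_T      # old evidence cannot raise credence in T

# The MB-style diagnosis: use the hypothetical prior from t_0, when e was
# not yet known. With P_0(e) < 1 and T making e likely, confirmation returns.
P0_T = Fraction(1, 10)
P0_e_given_T = Fraction(9, 10)     # T renders e highly probable (assumed)
P0_e_given_notT = Fraction(1, 10)  # e unlikely otherwise (assumed)
P0_e = P0_T * P0_e_given_T + (1 - P0_T) * P0_e_given_notT
P0_T_given_e = P0_T * P0_e_given_T / P0_e
assert P0_T_given_e > P0_T     # e confirms T relative to the hypothetical prior
```

The first half is exactly Glymour's point; the second half anticipates the counterfactual "what-if" response discussed below, on which confirmation is assessed against a prior formed before e was known.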
Bayesian philosophers have responded to the problem of old evidence in (mainly) two different (and incompatible) ways. The first response, put forward by Garber (1983), Niiniluoto (1983) and Jeffrey (1983), goes as follows: What makes the scientist increase her confidence in T is not e per se, since this information was already known to her at the time she invented her theory. Rather, it is the realization of the logical entailment relationship that makes the scientist increase her confidence in T. For example, "Einstein discovered only after writing down the equations of General Relativity that they entailed the anomalous perihelion advance of Mercury." (Howson, 1991, pp. 553). Hence, the increase in the agent's degree of belief in T comes from conditionalization on the entailment relationship (10) itself rather than conditionalization on just e. The proposition (10) at t_0 was uncertain, in the sense that the agent had not established/proved the validity of the postulated entailment relationship, i.e. P_0(T ⊢ e) < 1, in which case the realization of (10) at t results in an increase in the probability of T. Note that this line of defence is based on the assumption that the agent conditionalizes on the entailment (10), which in turn implies that at t_0 the agent had already assigned a prior probability to this proposition, i.e. P_0(T ⊢ e) = p, with p < 1. This means that at t_0 the agent was not aware of the truth of (10) (a truth she discovered only later), which in turn implies that at t_0 the agent had assigned a probability less than one to a logical truth. This, however, means that P_0 is not coherent. Hence, this line of defence against the problem of old evidence leads to another problem (potentially more serious), namely the incoherence of the agent's probability function stemming from logical non-omniscience (more on this below).
As will be discussed below, Garber's (1983) solution to the problem of old evidence is based on the idea of weakening the axioms that P_0 has to obey in order to allow for logically non-omniscient rational agents (see also Hacking 1967 for similar ideas outside the "old evidence" framework). The second argument for solving the problem of old evidence is more in the spirit of MB, since it is based on the following counterfactual "what-if" strategy. Garber (1983) portrays this strategy as follows: "One obvious response might begin with the observation that if one had not known the evidence in question, then its discovery would have increased one's degrees of belief in the hypothesis in question. That is, in the circumstances in which e really does confirm h, if it had been the case that P(e) < 1, then it would also have been the case that P(h | e) > P(h)." (1983, pp. 103, emphasis added).
In our setup, the solution outlined above suggests that the agent should not evaluate her probabilities relative to I = I_B ∪ I_S but only relative to I_B. This is exactly the proposal offered by Howson (1991) in his attempt to solve the problem of old evidence. Indeed, he explicitly states that the probabilities "should always be relativized to K − {e}" (1991, pp. 548, with K and e in Howson's notation corresponding to I_B and I_S, respectively). But, as analyzed in the Introduction, this is exactly the point at which MB and CB differ. Hence, choosing MB over CB might be motivated by the fact that the first strategy fares better than the second in cases of "old evidence".

Modern Bayesianism and Psychological Detachment from Actual Evidence
Let us now turn our attention to the fourth reason for preferring the counterfactual MB strategy over the actual CB one. This reason concerns the possibility that the agent evaluates the entailment relationships between theoretical and evidential statements in a more unbiased way when the evidence I_S is hypothetical than when it is actual. For example, the probability that I assign to the hypothesis H_A: "I shall live for at least another five years" conditional on the actual event B: "I am diagnosed with aggressive lung cancer" is likely to be higher than the probability that I would have assigned to H_A if the event B were not actual but just a member of a set of many hypothetical events including the event non-B. In this example, the actual knowledge of the unfortunate event B at time t (now) is likely to create psychological pressure to distort the partial entailment relationship between H_A and B that I was willing to accept at t_0, thus yielding a biased probability assessment. The argument presented above suggests that the subjective evaluation of probabilities should be made, as Longino (1979, pp. 35) put it, "in rational, cool, moments". As already mentioned, such a "cool" and "impartial" agent is referred to as MB in the present paper. As will be shown in the next section, it is the opposite case, namely the one in which the specific information is not treated as conditioning information but rather as a direct determinant (on a par with background information) of the prior probability function (referred to as CB in the present paper), that generates the AA paradox.

Objections: Logical Non-Omniscience, Local vs. Global Bayesianism and Small vs. Large Worlds
The arguments presented above suggest that MB is the optimal strategy to be followed by an agent who is in the process of forming her subjective probability function at t (now). For reasons of clarity, this strategy is summarized below. The agent is currently at time t, knowing both I_B and I_S. The agent is advised to "pretend" that she does not know I_S and perform the following thought experiment: Go back in time to period t_0, in which only I_B was available. At t_0, the background information I_B contains the full set of hypotheses H that the agent actually entertained at t_0, the full set of evidential sentences E that the agent conceived as possible at t_0 (every conceivable course of events after t_0), as well as all the entailment relationships between elements of H and elements of E (e.g. H_i ⊢ E_j). More specifically, each member of E describes the observations that have not actually been made at t_0 but are considered by the agent at t_0 as possible to make at the future period t. At t_0 the agent's subjective probability function P_0 was defined over the field of propositions L (which includes H and E). Assume that at t the particular element E ∈ E is realized, which implies that the agent at t has this specific information, i.e. I_S = E. According to the MB counterfactual strategy, the agent's new probability function, P_t, at t should be her old probability function P_0 conditional on E. As a result, her unconditional new probability for the uncertain event A (that will occur at t+1) should be set equal to the conditional probability P_0(A | E), rather than being determined "from scratch" as P_{t,E}(A). Is the above counterfactual strategy without any problems? The answer is negative. The main objections raised against this strategy revolve around the concept of logical omniscience (or its lack thereof).
Specifically, observe that in the description of the counterfactual strategy presented above, we defined H and E as the sets of theoretical and evidential statements, respectively, conceived by the agent at t_0. However, Bayesianism assumes something significantly more than that. It assumes that the agent knows all the possible hypotheses, H*, all the possible evidential statements, E*, and all the possible entailment relationships between H* and E*; in other words, the agent is assumed to be logically omniscient and her prior probability function, P_0*, is defined over the field L* of all propositions, which is (infinitely) richer than L. Why does Bayesianism require such an extreme condition? The answer is: to ensure coherence of P_0*. Specifically, if we wish (as Bayesians do) to ensure that P_0* is a proper probability measure, then we must establish that to each logical consequence p ⊢ q the agent assigns probability equal to one, that is, for all (p ⊢ q), P_0*(p ⊢ q) = 1. This is because if p logically entails q, it does so in every possible state of the world. Hence, it appears that coherence requires that the agent be capable of tracking all logical consequences at t_0. In other words, the agent is not allowed to be unaware of even a single logical truth that exists within the underlying language L*. As will be shown below, only under this extreme form of logical omniscience is the aforementioned MB counterfactual strategy expected to work under all epistemic states that the agent may entertain at t.
Logical omniscience of the form described above is a condition that is usually met within the variant of Bayesianism known as Global Bayesianism (GB). Garber (1983) defined GB as follows: "One popular conception of the Bayesian enterprise is what I shall call global Bayesianism. On this conception, what the Bayesian is trying to do is build a global learning machine, a scientific robot that will digest all of the information we feed it and churn out appropriate degrees of belief. On this model, the choice of a language over which to define one's probability function is as important as the constraints that one imposes on that function and its evolution. On this model, the appropriate language to build into the scientific robot is the ideal language of science, a maximally fine-grained language L*, capable of expressing all possible hypotheses, all possible evidence, capable of doing logic, mathematics, etc. In short, L* must be capable, in principle, of saying anything we might ever find a need to say in science." (1983, pp. 110, emphasis added).
In what respect does the MB strategy suffer in the case of an agent whose prior probability function is P_0 rather than P_0*? This question is equivalent to the following one: "Why is GB not realistic?". The problem is the following: Assume that at t the agent comes across a piece of evidence E* that she had not conceived of as possible at t_0. Assume that this piece of evidence is the agent's specific information (I_S) obtained at t. According to MB, the agent should go back to t_0 and conditionalize on E* in order to generate her new probability function at t. However, an unpleasant surprise awaits the agent: at t_0 there is no probabilistic assignment on E*, simply because E* was not in the domain of P_0 at t_0. To make things even worse, assume that E* was obtained at t because a new hypothesis, H*, was put forward at t, which motivated the observations (or experiments) that led to E*. Obviously, H* was not in the domain of P_0 either. For example, consider as E* and H* the "deflection of light by the sun" and "Einstein's general relativity theory", respectively. If a physicist of the late nineteenth century had been asked to assign his personal probability to the event that light is deflected by the sun, he would have been caught by surprise. His language L at t_0 was simply not rich enough to accommodate either E* or H* (let alone the entailment H* ⊢ E*). In such a case, counterfactual conditionalization seems impossible and the whole MB strategy runs into trouble.
The preceding analysis suggests that the MB counterfactual strategy is unrealistic to the same extent as GB (as a description of the epistemic state of real people) is. This generates the following question: If the MB strategy is unrealistic within the GB framework, then is it possible that there is another framework, call it Local Bayesianism (LB), within which the MB strategy is likely to work? Before we answer this question, let us first describe, following Garber (1983), how LB might be defined: "Typically when scientists or decision makers apply Bayesian methods to the clarification of inferential problems, they do so in a much more restricted scope than global Bayesianism suggests, dealing only with the sentences and degrees of belief that they are actually concerned with, those that pertain to the problem at hand. This suggests a different way of thinking about the Bayesian learning model, what one might call local Bayesianism. On this model, the Bayesian does not see himself as trying to build a global learning machine, or a scientific robot. Rather, the goal is to build a hand-held calculator, as it were, a tool to help the scientist or decision maker with particular inferential problems. On this view, the Bayesian framework provides a general formal structure in which one can set up a wide variety of different inferential problems. In order to apply it in some particular situation, we enter in only what we need to deal with in the context of the problem at hand, i.e., the particular sentences with which we are concerned, and the beliefs (prior probabilities) we have with respect to those sentences." (1983, pp. 111, emphasis added).
More formally, assume that the investigator is interested in the set of specific hypotheses (theoretical propositions) H_i, i = 1, 2, ..., n, that cover the full set of possibilities for the problem at hand (e.g. the coin is fair, the coin is biased towards H, the coin is biased towards T, etc.). Related to the aforementioned theoretical propositions is the set of evidential propositions E_j, j = 1, 2, ..., m, which are relevant for the H_i (e.g. the outcome of the next toss is H, the outcomes of the next two tosses are H, T, etc.). No other proposition enters the agent's "problem-relative language" L that is relevant for the specific problem at hand. More specifically, Garber (1983, pp. 111) defines L to be "just the truth-functional closure" of the H_i, the E_j and the propositions of logical entailment between H_i and E_j, that is, propositions of the form "H_i ⊢ E_j". As a result, the agent's prior probability function is defined over the "local", more modest language L rather than the "global" ideal language L*. This in turn implies that it is much easier for the agent to track all logical consequences "H_i ⊢ E_j" (for some i and j) within L than within L*, thus making local Bayesianism more realistic than global Bayesianism (Bayesianism with a human face, as Jeffrey (1983) put it).
The description presented above makes clear that LB refers to well-defined, small-scale cases (characterized by a small number of hypotheses, evidential propositions and logical relations), which are usually referred to as "small worlds". In such a world, it is quite plausible to assume that the agent can consider all the possibilities (hypotheses, evidential propositions and their entailment relations) that are relevant for the problem in hand at t_0. In such a case, no surprises await the agent at t, in the sense that her epistemic state at t is identical to that at t_0. Binmore (2009) defines a small world as one "within which all potential surprises have been predicted and evaluated in advance of their occurrence." (2009, pp. 8).
What are the implications of small worlds for the effectiveness of the MB counterfactual strategy? Binmore (2009) comments on this question as follows: "Only in a small world, in which you can always look before you leap, is it possible to consider everything that might be relevant to the decisions you take." (2009, pp. 139, emphasis added). Indeed, the "look before you leap" proverb is attributed to Savage, who used it as antithetical to "cross that bridge when you come to it", which referred to the so-called "large worlds". It is worth mentioning that Savage himself made quite clear that his own conception of subjective probability, together with its axiomatization, is relevant only for small worlds. This is because Savage's framework is essentially static, in the sense that it does not allow for so-called "concept formation", that is, the formation of a new hypothesis or a new idea sometime in the future. Commenting on Savage's approach, Suppes (1966) observes: "The important thing I wish to emphasize is that the theory provides no place for the decision-maker to acquire a new concept on the basis of new information received. The theory is static in the sense that it is assumed the decision-maker has a fixed conceptual apparatus available to him throughout time." (1966, pp. 21). Savage himself acknowledged the fact that the static nature of his theory makes it inapplicable in the case of large, evolving worlds by referring to such an extension as "ridiculous" and "preposterous". Other personalists have suggested a similar course of action. As Teller (1975) observes, "...personalists recommend that one simply ignore states, actions, and consequences, the possibility of which have not occurred to the agent or which seem, prima facie, to be irrelevant to the problem." (1975, pp. 170, emphasis added).
The aforementioned discussion may be summarized as follows: MB and CB refer to the agent's strategy or course of action in forming her probability function at t. GB and LB refer to the agent's epistemic status at t_0 as reflected in the contents of her background information I_B or her language (L or L*). Whether MB or CB is appropriate depends on which epistemic status, GB or LB, characterises the agent. Obviously, there are four combinations for the agent to consider in her attempts to solve the problem of forming her subjective probability at t: (i) MB-GB: Ideal but Unrealistic (Infeasible) Solution.
(ii) MB-LB: The Appropriate Solution in Small Worlds.
(iii) CB-GB: A Feasible Solution in Large Worlds but with the kind of problems analyzed in the previous sub-sections.
(iv) CB-LB: Inappropriate or Inefficient Solution. How does the above classification bear upon AA? The preceding discussion has made clear that the presence of information asymmetry between F_k and F′_k is only a necessary condition for AA. A further condition is that the problem in hand falls into the category of large worlds. Put differently, asymmetric information in small worlds is not capable of producing AA. Let us clarify this argument, which forms our basic thesis in this paper. Assume that the agent experiences an information asymmetry at t of the type discussed above, that is, at t she has information, I_S, about the objective probabilities of the events in F_k but she lacks similar information about the events in F′_k. If the world is small, the agent can go back in time to t_0, ignore I_S and assign probabilities on F_k ∪ F′_k in an informationally symmetric way. To this end, the simplicity of the problem makes the implementation of the "principle of indifference" or its successor, the "maximum entropy principle", perfectly possible. Once her prior probability is thus determined (counterfactually at t_0), the agent is allowed to bring I_S back into the picture by conditionalizing on it. In such a case, the only way for the agent to produce AA is to deviate at t from her probabilistic commitments at t_0, that is, by exhibiting dynamic inconsistency. Interestingly, all the cases motivating AA that have appeared in the literature (including the original Ellsberg paradox as well as Schmeidler's two-coin example) are "textbook cases" of small worlds. Hence, although we do not claim that AA cannot arise in large worlds (since in such a context the counterfactual MB strategy may not work), we do argue that AA has been poorly motivated.

Small Worlds, Counterfactual Strategy and Ambiguity Aversion
Let us now analyze Schmeidler's two-coin example discussed in the introduction in the light of the analysis presented in the previous section. To do this, we must first describe the agent's epistemic background at t_0. For simplicity, we assume that, concerning coin A, the agent knows with certainty that exactly one of the following three hypotheses is true:

H_1^A = {Coin A is fair}
H_2^A = {Coin A favors H}
H_3^A = {Coin A favors T}.
To make things even simpler, assume that in the case of H_2^A the objective probability of H is equal to 0.6, whereas in the case of H_3^A the objective probability of T is 0.6. Put differently, the agent knows at t_0 that the objective probability distribution for coin A is either P_1^A or P_2^A or P_3^A. Similar assumptions are made for coin B, namely the agent knows that one of the following three hypotheses is true:
H_1^B = {Coin B is fair}
H_2^B = {Coin B favors H}
H_3^B = {Coin B favors T},
which give rise to the objective probability distributions P_1^B, P_2^B and P_3^B, respectively (with P_2^B(H) = P_3^B(T) = 0.6). Now, the agent has to decide about her prior probabilities of the hypotheses mentioned above. Since there is no direct information supporting one hypothesis over the rest, the agent is likely to subscribe to the principle of indifference (or the maximum entropy principle), according to which P_0(H_i^A) = P_0(H_i^B) = 1/3, i = 1, 2, 3. Alternatively, the agent might consult her past experience on similar cases (part of her background information), which suggests that fair coins are encountered more often than biased ones. In any case, the important thing to notice is that there is no specific information at t_0 that causes an informational asymmetry between the set of hypotheses concerning coin A and that concerning coin B. Let us now calculate the probabilities that the agent assigns to the events/propositions D_H^A, D_T^A and D_H^B, D_T^B defined in the introduction. Assuming a minimal degree of probabilistic sophistication on the part of the agent, we have
P_0(D_H^A) = 0.5 × (1/3) + 0.6 × (1/3) + 0.4 × (1/3) = 0.5.
In a similar fashion, we obtain P_0(D_T^A) = 0.5, P_0(D_H^B) = 0.5 and P_0(D_T^B) = 0.5. So far so good. Now assume that at time t, t > t_0, the agent acquires an important piece of specific information for the problem in hand. In particular, she is given a set of data, E_A^1, consisting of the results of a very long series of tosses of coin A, (almost) half of which are H.
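The total-probability computation above can be sketched as follows (a minimal illustration; the labels for the three hypotheses are ours):

```python
# Prior over the three hypotheses about coin A (principle of indifference)
prior = {"fair": 1/3, "favors_H": 1/3, "favors_T": 1/3}
# Objective chance of heads under each hypothesis, as specified in the text
chance_H = {"fair": 0.5, "favors_H": 0.6, "favors_T": 0.4}

# Law of total probability: P_0(D_H^A) = sum_i P_0(H_i^A) * P(H | H_i^A)
p_DH = sum(prior[h] * chance_H[h] for h in prior)
print(round(p_DH, 10))  # 0.5

# By symmetry the same holds for tails, and for coin B:
p_DT = sum(prior[h] * (1 - chance_H[h]) for h in prior)
print(round(p_DT, 10))  # 0.5
```

Because the hypotheses for coin B carry exactly the same prior and the same chances, the four events D_H^A, D_T^A, D_H^B, D_T^B all receive probability 0.5 at t_0.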
The Principal Principle implies that P_t(D_H^A | H_1^A) = P_t(D_T^A | H_1^A) = 0.5 (knowledge of the objective chance screens off any admissible evidence). Furthermore, given that the evidence E_A^1 is assumed to be arbitrarily large, P_0(H_2^A | E_A^1) = P_0(H_3^A | E_A^1) = 0. The above relationships together with (11) imply that P_t(D_H^A) = P_t(D_T^A) = 0.5. Let us now turn our attention to the events D_H^B and D_T^B (related to coin B) and examine how the acquisition of E_A^1 affects their probabilities.
Now the crucial question is the following: Does the evidence on coin A affect the prior probabilities attached to coin-B-related events? Assuming independence between the sets H^A = {H_1^A, H_2^A, H_3^A} and H^B = {H_1^B, H_2^B, H_3^B}, the answer is negative. Hence, P_t(H_i^B) = P_0(H_i^B) = 1/3, i = 1, 2, 3. Finally, the above relationships imply P_t(D_H^B) = P_t(D_T^B) = 0.5. The important thing to notice is that the agent's new set of beliefs, P_t, obtained at t (after the specific information E_A^1 has arrived) is coherent, that is, it is a proper probability function. More specifically, no violations of the additivity property, such as those discussed in the introduction, are observed.
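The conditionalization step can be sketched numerically. The sketch below assumes a binomial likelihood for the tosses and an illustrative sample of 10,000 tosses with exactly half heads; the sample size is our assumption, since the text only requires a "very long series":

```python
import math

chance_H = {"fair": 0.5, "favors_H": 0.6, "favors_T": 0.4}
prior = {h: 1/3 for h in chance_H}  # indifference prior, same for both coins

def log_like(h, n, k):
    """Binomial log-likelihood of k heads in n tosses under hypothesis h
    (the binomial coefficient is omitted, since it cancels on normalization)."""
    p = chance_H[h]
    return k * math.log(p) + (n - k) * math.log(1 - p)

# Evidence E_A^1: a very long series of tosses of coin A, half of them heads
n, k = 10_000, 5_000
logw = {h: math.log(prior[h]) + log_like(h, n, k) for h in chance_H}
m = max(logw.values())
w = {h: math.exp(v - m) for h, v in logw.items()}  # stable normalization
post_A = {h: v / sum(w.values()) for h, v in w.items()}
# Virtually all posterior mass now sits on "fair": P_t(H_1^A) ~ 1

# New probabilities for heads on the next toss of each coin:
p_t_DH_A = sum(post_A[h] * chance_H[h] for h in chance_H)  # ~ 0.5
p_t_DH_B = sum(prior[h] * chance_H[h] for h in chance_H)   # = 0.5 (coin B untouched)
# Both equal 0.5 and additivity holds for each coin: no ambiguity aversion.
```

The point of the sketch is that ordinary conditionalization of the t_0 prior treats the two coins symmetrically: the evidence pins down coin A's chance at 0.5 while leaving coin B's prior intact, so the resulting beliefs remain a proper, additive probability function.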
Why did the preceding analysis not produce AA? Put differently, why did the acquisition of E_A^1 not make the agent end up with a system of beliefs having the property P_t(D_H^B) < P_t(D_H^A) and P_t(D_T^B) < P_t(D_T^A), which would imply an ambiguity-averse, probabilistically non-sophisticated agent? The answer is that the analysis presented above implies that the agent follows the MB counterfactual strategy in forming her probability function P_t at time t. Indeed, the key point is to focus on how the agent formed her new probability function P_t once she became aware of E_A^1. Specifically, the agent did not use E_A^1 as a direct determinant (on a par with background information) of P_t. Instead, she "went back in time" to point t_0, examined the probabilistic commitments, P_0, that she had made at that point (in an epistemic state of not knowing with certainty that E_A^1 had occurred) and stuck to those commitments at t. In this setting, the only way for the agent to exhibit AA is to abandon her commitments made at t_0 and assign different probabilities at t, thus exhibiting dynamic inconsistency.
The preceding analysis implies that the only way for AA to emerge without causing dynamic inconsistency is to eliminate P_0 from the scene, so that there will be no basis for comparing the probabilistic commitments at t_0 with those adopted at t. Eliminating P_0 means that at t the agent realizes that she faces a new reality that she had not (and could not have) anticipated at t_0, which obliges her to re-evaluate her full set of probabilistic beliefs (to form a new prior). Teller (1975) gives an example of when such a strategy is required: "Examples are the wildcatter's problem of where to drill an oil well and the manufacturer's problem of how much to produce. The personalists advise the wildcatter and the manufacturer to estimate their initial degrees of belief subjectively for any one such problem, but if a similar problem arises years later under considerably different circumstances they are advised to make new evaluations rather than to try to conditionalize their old estimates on the vast body of intervening observations of uncertain relevance to the problem." (1975, pp. 170, emphasis added). However, the two-coin case analyzed above is quite different from the wildcatter's problem. Consider the following question: Is there any reason for the agent to refine her probabilistic judgements at t because at t she came across a piece of surprising information, that is, information that she could not have conceived of back at t_0? The answer is negative. The information that the objective probability distribution of coin A is 50-50 is not surprising information at all. In fact, the agent had anticipated such a case, as indicated by the inclusion of H_1^A in the agent's epistemic language L at t_0. As a result, the MB strategy is perfectly feasible; in fact, it follows quite naturally.

Conclusions
1) There are two ways to view specific information, I_S, as a determinant of the agent's subjective distribution, P_t, at time t: (i) As conditioning information. This presupposes the existence of a subjective distribution, P_0, prior to receiving I_S, which plays the role of the vehicle through which I_S affects P_t, via P_t(·) = P_0(· | I_S). (ii) As background information, on a par with any other background information I_B. In this case, there is no P_0 prior to t, and P_t plays the role of the prior distribution for determining the posterior distributions for times after t that arise through the acquisition of future information I_f, via P_{t+1}(·) = P_t(· | I_f).
2) Are there any reasons, theoretical or practical, that favor one interpretation over the other? The answer is yes. There are good reasons, analyzed in detail in Section II, for preferring the first interpretation (MB) over the second (CB).
3) Does the choice of MB versus CB have implications for the main issue under scrutiny in the present paper, namely AA? The answer is yes. If we choose MB, there is no room for AA. Being ambiguity averse in such a framework is equivalent to being dynamically inconsistent. On the other hand, under CB, AA can arise, because only when the prior distribution is formed at the same time at which I_S arrives can informational asymmetries of the type discussed in the introduction emerge.
4) Can we always choose MB over CB? The answer is negative. MB is applicable only in small worlds. In large worlds, MB may not be operational, especially when the agent's epistemic state at the time that I_S arrives is considerably richer than the one at the time prior to the arrival of I_S.
5) Hence, AA may be thought of as a (potential) property of large worlds. To this end, the motivation of AA in the form of examples such as Ellsberg's classic urn-based or Schmeidler's coin-based ones is clearly poor, since these examples are textbook cases of small worlds.