How to select observers

A number of problems in physics, mathematics, and philosophy involve observers in given situations which lead to debates about whether observer-specific information should affect the probability for some outcome or hypothesis. Our purpose is not to advocate for such observer selection effects but rather to show that any such effects depend greatly on the assumptions made. We focus on the debate about the existence of a “doomsday effect” (whether observer index information should cause one to favor possibilities with fewer observers), which has been argued to have implications for models of cosmology. Our central goal is to reconcile the apparent inconsistencies in the literature by introducing a formalism to lay bare assumptions made and address a key issue that has not been clearly articulated in such problems: whether the observer is selected by picking from or being in a set of worlds. In the former there generally are observer selection effects, and in the latter there generally are not. This leads us to differentiate what we call inclusive from exclusive selection and how they relate to the concept of a multiverse. Then we relax the assumption that all observers are equally typical and consider the problem of Boltzmann brains, showing that typicality can play a role in solving the problem. We then stress the need for scale-invariant questions, which causes us to analyze J. Richard Gott’s approach to the problem. This all allows us to analyze the doomsday and universal doomsday arguments. We find that there is no doomsday effect, absent a set of assumptions we find somewhat unreasonable. Then we use our formalism to resolve a debate in the philosophy community called the “Sleeping Beauty problem.” Finally, we conclude with a heuristic summary, free from equations, and point to possible future directions of this line of research.


I. INTRODUCTION
Physicists usually shun observer-specific information, and for good reason. Our theories are based on invariances, such as those with respect to space and time, and should not depend on who is testing them. Emmy Noether showed that conservation laws are rooted in symmetries [1]. Yet we accept boundary conditions and symmetry breaking because of the constraints of the real world. And sometimes just being an observer can bias our viewpoint. It took millennia for humans to realize that we were not the center of the Universe and that we are atypical collections of matter in being confined to the surface of a habitable planet. Some of the apparent coincidences which seem necessary for life to have evolved may be due to generalizing this notion of us being atypical [2,3]. But our purpose here is to focus on one particular type of observer effect: that probabilities we assign to the selection of an entity may differ if the entity is an observer because the observer has the capacity to self-select. We will see that changing assumptions can completely change these effects, so, at a minimum, anyone invoking them, or decrying them, should carefully lay out all assumptions made.
Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.
The quintessential example is the "doomsday argument" [4], about which there is much debate [5-11]. Suppose you assign some prior probability $p$ for case S, that the "world" of which you are a part (and we will define "world" in various ways) will persist only for a short time, with a relatively small number of "people" ever living in that world. The other possibility, L, is that it will persist longer, with more total "people," to which you assign probability $1 - p$. But you realize that in your guess for $p$, you have neglected to take into account any possible observer selection effects (OSEs). The doomsday argument says that you should adjust $p$ upward because the probability is small that you would just happen to live very, very early in the life of a world, and thus you are more likely to live in a short-lived world for which you would be more typical. Is that right? It depends on your assumptions.
Throughout most of the paper, we will be talking about probabilistic situations where there is a set $P$ of "people" (entities capable of being observers, though not always the primary observer in the situation) from which one is selected, and we want to know the probability that the "person" belongs to a subset of $P$ associated with some property, e.g., "born before the year 2100." A key question is whether the "person" self-selects directly from set $P$ (which is generally embedded in enclosing sets such as worlds), which we call a "Be selection" (a Be for short), or whether they are selected in some other way, which we call a "Pick selection" (a Pick for short). In most of our scenarios, the latter entails more than one selection because in order to pick an element of set $P$ one must generally first pick an element of one of the sets that encloses $P$ (e.g., to pick a nut from a set of jars, one must first pick one of the jars). The posterior probabilities for Be and Pick selection differ: OSEs tend to arise in the latter but not the former.
Philosopher Nick Bostrom has written much about the doomsday argument [6,7,9]. He, too, discusses two possible ways an observer could be selected, often using problems of prisoners, which make good toy models because they entail observers confined to specific enclosing sets (cells in cellblocks in prisons). We will assume through most of the paper what he calls the self-sampling assumption (SSA), which just means that you assume you are equally likely to be any member of the set of possible observers you define in your problem, i.e., it is an assumption of typicality. He also considers something called the self-indication assumption (SIA), which says you should weight the probability of your existence by the number of people in the world in which you exist [5,8,12]. This is essentially a kludge factor, which is why it has rightly been found to be problematic [9,11-13]. In fact, the SIA gives the wrong answer whenever there is a selection from an enclosing set, such as in the warden problem we discuss in Sec. III, or when we take theories to be mutually exclusive, as in Sec. VI. Nevertheless, we will see that the weighting factor associated with the SIA appears naturally with the SSA if we assume observers are Be selected rather than Pick selected.
So there are conflicting and problematic results and apparent misunderstandings in the literature, and much of this is due to there being no universal notation. Our goal in writing this paper is to resolve these issues. Central to doing so is our nested-set notation, which we hope will allow authors to make clear their assumptions on how observers are selected, so readers can judge for themselves whether the assumptions made, and the results they lead to, are reasonable.
The paper is structured as follows. In the next two sections, we consider the selection of observers within "worlds" (prisoners in cellblocks), first via a Be selection and then via a Pick selection, showing how OSEs arise in the latter. In the following two sections, we discuss what happens if we embed the worlds in an enclosing set $E$, and there is just one Be selection on $P$ (an inclusive selection), or an additional Pick on set $E$ (an exclusive selection), again with OSEs in the latter. If we take set $E$ to comprise "everything," then we term the inclusive case the inclusiverse and the exclusive case an exclusiverse. The key difference between them is that in the former we assume that all hypothesized things exist, and in the latter we do not. This leads to a general principle: It is effects of the latter which lead to OSEs. Later we discuss whether it is possible to distinguish these two cases and relate them to the term "multiverse," but our purpose is to lay out how to calculate probabilities given certain assumptions, not to posit the nature of reality. Next we discuss spaces of theories, typicality, and the issue of "freak" observers in cosmology called Boltzmann brains and how our analysis can frame that problem. Then we consider an analysis by J. Richard Gott [14], which lets us phrase the doomsday argument in a scale-invariant way. We are then ready to fully address the doomsday argument and what has been called "universal doomsday." We show that while many sets of assumptions lead to no doomsday effect, it is possible to come up with a set of assumptions, however implausible, which leads to one. Then we address a related problem in philosophy called the "Sleeping Beauty problem." Finally, we summarize our results and point to future directions.
FIG. 1. Why the warden problem (with a Pick selection) leads to an OSE and the prisoner problem (with a Be selection) does not: There are two cellblocks, S and L. Prisoners all simply ask themselves, "Which cellblock am I in?" and then observe their cell number to answer. There are more prisoners in the L cellblock to ask the question, which cancels the rank factor that a smaller fraction of prisoners are in the first two cells in L than in S, so those in cell 2 are equally likely to be in either cellblock. The warden first must Pick a cellblock at random and then select a cell at random within that cellblock. If the selected prisoner is in cell 2, then it is more likely that the warden picked the S than the L cellblock because the number of prisoners per cellblock did not affect the odds that she picked that cellblock, and so the rank factor is not canceled as it was in the Be case.
In an effort to make the paper readable to the wider world, the summary is comprehensive of our results without equations. We have also put details of our nested-set notation and a table that summarizes our results into the Appendix. And in the body of the paper, we spell out many intermediate steps in our equations since some interested in the results here may include those less familiar with working out such steps.

II. TO BE: PRISONER PROBLEM
Imagine you are a prisoner and have the following information: The prison you are in has two types of cellblocks, small (S) and large (L), which contain $\bar{n}_S$ and $\bar{n}_L$ cells per cellblock, respectively. You want to estimate the probability that you are in an S cellblock.
Before we dive into a lot of notation, let us consider a simple numerical example, where there is one cellblock of each type, with $\bar{n}_S = 2$ and $\bar{n}_L = 6$ (see the left side of Fig. 1). You do not know your cell number at the outset, so you could be in either the S or L cellblock. Now you look at your door and learn your cell number. If it is greater than 2, then you know you are in the L cellblock. Let us assume that it is cell number 2, so you could be in either cellblock. What is the probability that you are in the S cellblock? Well, there are exactly two cells with cell number 2, one in each cellblock. And you have no reason to favor one over the other, so you should assign a probability of 1/2 for being in the S cellblock. Note that this is equal to the probability of picking the S cellblock at random. In other words, the posterior probability for being in cellblock S, given the cell-number datum that you could be in either cellblock, is the same as the prior probability of randomly picking cellblock S: there is no observer selection effect.
Now let us formalize the problem for a general number of prisoners and cellblocks. You assign labels $N_S$ and $N_L$ to the number of cellblocks of each type, but all you know is that there is at least one cellblock (since you are in one), i.e., $N \equiv N_S + N_L \geq 1$. You also know that the prison is full and that each prisoner was assigned a random cell in the prison, with exactly one prisoner per cell. Let the ratio of cells in L and S cellblocks be
$$\rho \equiv \frac{\bar{n}_L}{\bar{n}_S}, \tag{1}$$
which is by assumption greater than 1. The bar just indicates we have normalized to the number of cellblocks. The total number of prisoners in all cellblocks of type $J = S$ or $L$ is $n_J$, which is equal to the number of cells per cellblock of that type times the number of cellblocks of that type:
$$n_J = \bar{n}_J N_J. \tag{2}$$
Let us call the set of prisoners $P$ (for "person," the set that will usually hold our observers) and the set of cellblocks $W$ (for "world," since this problem is an analog to one of observers in worlds). $W_S$ and $W_L$ are the subsets of $W$ containing all S and L cellblocks, respectively. Since there are only two types of cellblocks, the set $W$ is the union of them: $W = W_S \cup W_L$. You assign some prior probability for what the fraction of small cellblocks $P(W_S) = N_S/N$ might be [we assume that the probability of picking any given cellblock is simply $1/N$, and these $P(W_S)$ and $P(W_L)$ are fixed inputs; we will explore varying ratios of them in Sec. IV]. Note that $P$ is nested within $W$, i.e., every element of $P$ (a prisoner) is associated with a particular element of $W$ (a cellblock). The compound set $PW_S$ contains the set of S cellblocks, and the set of prisoners in $P$ who are in S cellblocks (see the Appendix for details on notation).
We will assume the SSA [7]:
SSA: One should reason as if one is a random sample from the set of all observers in one's reference class.
This is simply assuming typicality: the probability of you being in a subset of a larger set is simply equal to the fraction of observers of the reference class (which we call set $P$) who are in that subset. For example, the probability to Be in subset $P_x$ of set $P$ is just $P(P_x) = n_x/n$, where $n_x$ and $n$ are the numbers of elements of $P_x$ and $P$. You learn one datum, your cell number. Divide the datum into two categories: $d$ if your cell number is $\leq \bar{n}_S$, and $\neg d$ if it is $> \bar{n}_S$. The corresponding subsets of $P$ are $P_d$ and $P_{\neg d}$ ($P = P_d \cup P_{\neg d}$). If your datum is $\neg d$, then you know for sure that you are in an L cellblock (because your cell number is greater than $\bar{n}_S$). The case of interest is when the datum is $d$, where you could still be in either type of cellblock. The question we want to answer in the prisoner problem is:
What is the posterior probability that a prisoner is in an S cellblock, given that they match datum $d$?
For convenience we define the number of people matching datum $d$ to be $m \equiv n_d$ and the number of people matching datum $d$ within a cellblock type $J$ to be $m_J \equiv n_{d,J}$, where $J = L$ or $S$. All observers with cell numbers $\leq \bar{n}_S$ match datum $d$, so the number of people per cellblock matching datum $d$ is $\bar{m} = \bar{n}_S$, and this holds for both S and L cellblocks, so
$$\bar{m} = \bar{m}_S = \bar{m}_L = \bar{n}_S. \tag{3}$$
We want to calculate the probability of you being in a cellblock type S (i.e., in subset $PW_S$ of $PW$) given the datum, $d$, that you are in a low cell number (i.e., in subset $P_d W$ of $PW$), which we write as the conditional probability $P(PW_S \mid P_d W)$. We will calculate this using Bayes's law, so we need the likelihood of matching the datum given that we are in a cellblock type S,
$$P(P_d W \mid PW_S) = \frac{m_S}{n_S} = \frac{\bar{m}_S}{\bar{n}_S}, \tag{4}$$
and the probability [15] to Be in cellblock type S,
$$P(PW_S) = \frac{n_S}{n} = \frac{\bar{n}_S N_S}{\bar{n} N} = \frac{\bar{n}_S}{\bar{n}} P(W_S), \tag{5}$$
where $\bar{n} \equiv n/N$ is the average number of prisoners per cellblock and $P(W_S)$ is the prior probability to Pick a cellblock of type S (which, assuming random typical selection, is equal to our prior value for the fraction of worlds, $N_S/N$).
We need to pause here because Eq. (5), despite its simplicity, is the key to most of our results. We have simply taken the SSA at face value. Since the prisoner has an equal chance of being in any cell, the probability to Be in the subset of prisoners in S cellblocks is simply the fraction of prisoners in such cellblocks, $n_S/n$, which as we show in Eq. (5) is equal to the prior $P(W_S)$ weighted by the average number of prisoners $\bar{n}_S$ per cellblock of this type. We should at this point note the competing assumption, the self-indication assumption [7]:
SIA: Given the fact that you exist, you should (other things equal) favor hypotheses according to which many observers exist over hypotheses on which few observers exist.
This does give the weighting factor seen in Eq. (5), but it is a kludge factor because it gives that factor regardless of how the observer is selected, which, as we shall see, is inappropriate whenever the first selection is from a set that encloses the observer. (Some may take the SIA to mean that this weighting factor should be applied where appropriate, not in any situation where you are an observer. If so, then a way to think of our formalism is that it shows when that weighting factor is appropriate.) In contrast, we derived the weighting factor in Eq. (5) simply using typicality (the SSA) and the recognition that we are selecting the observer directly from set $P$. The effect from how the observer is selected is made transparent by our nested-set notation. There are a number of places in the literature which simply refer to "$P(S)$" and set it equal to the prior probability for picking a world type S, when to be a prisoner requires $P(PW_S)$ with its weighting factor $\bar{n}_S/\bar{n}$. Failing to include this factor leads to erroneous support for a doomsday effect.
Here is another way to understand this weighting factor. If you use the information that you are an observer in a random cell before also applying datum $d$, then you are more likely to be in an L cellblock than your prior for the fraction of L cellblocks would suggest. For example, if $P(W_S) = P(W_L) = 1/2$, then there are $\rho$ times as many observers in L cellblocks as in S cellblocks, and so the probability of being in a cellblock type L (before knowing $d$) is $\rho$ times that of being in a cellblock type S. This factor of $\bar{n}_S$ in Eq. (5) will exactly cancel a factor of $1/\bar{n}_S$ in the likelihood Eq. (4). [As we shall see in the next section, this factor is absent if there is a Pick on the world set $W$. We should also note that by our formulation of the problem we are assuming that the prisoner could be in either type of cellblock. We will later consider the cases where there are mutually exclusive "universes" (Sec. V) and hypotheses (Sec. VI A).] So the posterior probability of you being in a cellblock type S given datum $d$ is given by Bayes's law,
$$P(PW_S \mid P_d W) = \frac{P(P_d W \mid PW_S)\, P(PW_S)}{\sum_J P(P_d W \mid PW_J)\, P(PW_J)} = \frac{\bar{m}_S P(W_S)}{\sum_J \bar{m}_J P(W_J)} = P(W_S), \tag{6}$$
where $J = S$ or $L$, $\sum_J \bar{m}_J P(W_J) = \bar{m}$, and $\bar{m} = \bar{m}_S = \bar{m}_L$. The right-hand side is the prior probability for picking a cellblock of type S, i.e., the probability before we have any observer information at all. As we noted before, the prior here to pick a world type S, $P(W_S)$, is a fixed value $N_S/N$, not updated by the datum. What is updated is our posterior probability to be in such a world. [Note that we can also write this more compactly using the shorthand notation described in the Appendix, see Eq. (A16).] We can express the fact that there is no net observer selection effect by comparing the ratio of probabilities after ($R_P$) and before ($R_W$) observer information:
$$R_{P/W} \equiv \frac{R_P}{R_W} = \frac{P(PW_L \mid P_d W)/P(PW_S \mid P_d W)}{P(W_L)/P(W_S)} = 1. \tag{7}$$
In the prisoner problem, using observer information, which includes the effect of you being in a small cellblock, as well as the likelihood of you being in a low-numbered cell, you obtain the prior probability to Pick a cellblock type S.
In short, in the prisoner problem, when your datum is $d$, there is no net observer selection effect ($R_{P/W} = 1$).
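The counting argument of this section can be checked by direct enumeration. The following is a minimal sketch rather than part of the original analysis; the function name `be_posterior_small` and its argument names are illustrative assumptions. It applies the SSA literally: every prisoner is equally likely to be "you," so the posterior is just the fraction of datum-matching prisoners who sit in S cellblocks.

```python
from fractions import Fraction


def be_posterior_small(n_bar_s, n_bar_l, num_s, num_l):
    """Posterior to Be in an S cellblock given datum d (cell number <= n_bar_s).

    n_bar_s, n_bar_l: cells per S and per L cellblock (n_bar_l > n_bar_s).
    num_s, num_l:     number of S and L cellblocks (N_S and N_L).
    """
    # Every prisoner in an S cellblock matches d.
    matching_in_s = num_s * n_bar_s
    # In each L cellblock only the first n_bar_s cells match d.
    matching_in_l = num_l * n_bar_s
    # SSA: you are a uniformly random member of the matching prisoners.
    return Fraction(matching_in_s, matching_in_s + matching_in_l)


# Numerical example from the text: one cellblock of each type, 2 and 6 cells.
print(be_posterior_small(2, 6, 1, 1))  # 1/2
```

The result depends only on the cellblock counts, so it reproduces the prior $N_S/N$ however large the L cellblocks are, which is the statement that $R_{P/W} = 1$.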

III. TO PICK: URN AND WARDEN PROBLEMS
Now let $W$ be a set of urns, and $P$ a set of ping-pong balls in them. Each urn contains either a large ($\bar{n}_L$) or small ($\bar{n}_S$) number of consecutively numbered balls, defining subsets $W_L$ and $W_S$. You pick an urn at random and a ball at random from the urn. Before picking the ball, in fact before you actually picked an urn, you had a prior probability that the urn you picked is of type S, $P(W_S)$. After seeing the ball, what is the posterior probability that the urn is type S? That is:
What is the posterior probability that you pick an S urn and then a random ball in it, given that the ball you pick matches datum $d$?
Again, let us first use a numerical example to build intuition. Suppose there are two urns, one S and one L, with $\bar{n}_S = 2$ and $\bar{n}_L = 6$. You pick a random urn and then pick a random ball from it (we shall see that this is the same as the warden problem on the right side of Fig. 1). If the ball number is greater than 2, then the urn you picked was the L urn. So let us assume the same datum as before, that it is ball number 2, which corresponds to datum $d$. Now, before you knew the ball number, there was an equal chance that you picked the S or L urn. But once you have datum $d$, your posterior probability of having picked the S urn has greatly increased because all the balls in the S urn match $d$, whereas that is true only of 1/3 of the balls in the L urn. In fact, while your prior for picking the urns was equal, your posterior probability of picking the S urn is 3 times that of picking the L urn (3/4 vs. 1/4). Though the setup seems the same as in Sec. II, the fact that there was an initial selection of the urn makes all the difference.
Let us now go into the details. Obviously, if the ball's number is $> \bar{n}_S$, then you will know that it is an L urn and that posterior probability is 0. So let us assume that the datum $d$ you get is that the ball's number is $\leq \bar{n}_S$. It is tempting to say that the situation is identical to the prisoner example and that we learn nothing about the urn. After all, both kinds of urns have the same number of balls with number at most $\bar{n}_S$. But the situation is different because in order to pick the ball from the urn, we first had to pick the urn. To denote that selection, we put a Pick sign "$|$" between sets (see the Appendix for more on our set notation). So to Pick any ball from any urn is $P\,|\,W$, and to Pick a ball matching datum $d$ from an S urn is $P_d\,|\,W_S$. We seek $P(P\,|\,W_S \;\big|\; P_d\,|\,W)$, the probability of picking a ball from an S urn given that we picked a ball matching datum $d$.
The probability of matching datum $d$ given the urn is type S is exactly the same as Eq. (4) because if it is given that you picked an S urn, then the Pick has no effect on the likelihood: it is "neutered" (see the Appendix), and we put a slash through the Pick sign to indicate this [Eq. (8)]. With $\bar{m}_S = \bar{n}_S$ (grouping all the balls matching datum $d$ together), the likelihood is $\bar{m}_S/\bar{n}_S = 1$. However, the probability of picking a ball from an urn of type S is not the same as Eq. (5) because there is no weighting for the number of balls. The probability of picking an S urn and then picking a ball from it is the same as the prior probability for picking an S urn,
$$P(P\,|\,W_S) = P(W_S). \tag{9}$$
Because of this, there is no factor of $\bar{n}_S$ in the numerator to balance the $1/\bar{n}_S$ rank factor in the likelihood, so Bayes's law does not just return the prior as it did in the Be case in Eq. (6):
$$P(P\,|\,W_S \;\big|\; P_d\,|\,W) = \frac{P(W_S)}{P(W_S) + P(W_L)/\rho}. \tag{10}$$
[For shorthand notation, see Eq. (A17).] For $P(W_L)/\rho$ small, this goes to 1.
The posterior probability for L given $d$ is
$$P(P\,|\,W_L \;\big|\; P_d\,|\,W) = \frac{P(W_L)/\rho}{P(W_S) + P(W_L)/\rho}, \tag{11}$$
which, for equal priors, goes to $1/\rho$ for $P(W_L)/\rho$ small. As in Sec. II, the prior here is a fixed input $N_L/N$ that is unchanged by the datum. Our posterior is the probability of the urn that we picked to be type L. To see how data can update a multivalued prior with Pick selection, see Secs. V and X.
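The Pick posterior can be evaluated exactly for the numerical example. This is a minimal sketch under the assumptions stated in the text (likelihood 1 of matching $d$ in an S urn and $1/\rho$ in an L urn, with the urn prior unweighted by ball count); the function name `pick_posterior_small` is an illustrative choice, not notation from the paper.

```python
from fractions import Fraction


def pick_posterior_small(prior_small, rho):
    """Posterior that the picked urn is type S, given the ball matches d.

    prior_small: prior P(W_S) to Pick an S urn (a Fraction).
    rho:         ratio of balls per L urn to balls per S urn (> 1).
    """
    prior_large = 1 - prior_small
    likelihood_small = Fraction(1)      # every ball in an S urn matches d
    likelihood_large = Fraction(1) / rho  # only 1/rho of the balls in an L urn match d
    evidence = likelihood_small * prior_small + likelihood_large * prior_large
    return likelihood_small * prior_small / evidence


# Two urns with 2 and 6 balls (rho = 3) and equal priors.
print(pick_posterior_small(Fraction(1, 2), 3))  # 3/4
```

For equal priors the ratio of the L posterior to the S posterior is exactly $1/\rho$, in contrast with the Be case, where it stays at 1.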
The ratios for $P$ and $W$ become
$$R_{P|} \equiv \frac{P(P\,|\,W_L \;\big|\; P_d\,|\,W)}{P(P\,|\,W_S \;\big|\; P_d\,|\,W)} = \frac{P(W_L)}{\rho\, P(W_S)}, \qquad R_W = \frac{P(W_L)}{P(W_S)}, \qquad R_{P|/W} \equiv \frac{R_{P|}}{R_W} = \frac{1}{\rho}. \tag{12}$$
There is thus a very strong selection effect when one has to first Pick the urn ($R_{P|/W} = 1/\rho$).
Of course balls are not people, so it is tempting to think that it is the nature of the elements of set $P$ that causes the difference with the prisoner problem. To counter that, consider what we call the warden problem, where $P$ is again a set of prisoners in cellblocks $W$. But this time, instead of the prisoner just being the observer within a cellblock, a warden selects a prisoner by first picking a random cellblock and then picking a random prisoner within the cellblock, all without noting which type of cellblock she has picked. So the question in the warden problem is:
What is the posterior probability that a warden picks an S cellblock and then a random prisoner in it, given that the prisoner they pick matches datum $d$?
Then all follows exactly as in the urn problem, and the posterior probability we seek is $P(P\,|\,W_S \;\big|\; P_d\,|\,W)$. The warden has a prior probability $P(W_S)$ for having picked a cellblock type S, the likelihood that she gets datum $d$ given that she picked a cellblock type S is one [i.e., $P(P_d\,|\,W \;\big|\; P\,|\,W_S) = 1$], and, by Bayes's law, her posterior probability given datum $d$ is given by Eq. (10), with a large selection effect, $R_{P|/W} = 1/\rho$.
The reason the warden problem differs from the prisoner problem is that the warden has to first Pick a cellblock, whereas the prisoner is there without needing to be picked by anyone else. See Fig. 1. (It may help your intuition to imagine $\bar{n}_L$ huge, say, 2000, so $\rho = 1000$. The prisoner problem is unchanged since if you satisfy $d$, then you are still in cell 1 or 2 of your cellblock, but in the warden problem she is certain to pick cell 1 or 2 if she picks the S cellblock, whereas there is only one chance in 1000 that she will do so in the L cellblock.) We note that if we try to use the SIA in this problem, we will get the wrong answer. If you are a prisoner and a warden picks your cell at random after having picked your cellblock at random, and you learn you match datum $d$, then you should conclude that you are likely in an S cellblock. But the SIA would have you weight your prior probability to be in a given cellblock by the number of cells, as in Eq. (5), falsely leading you to conclude that there is no OSE, whereas typicality (the SSA) gives you the correct unweighted prior of Eq. (9).
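The contrast between the two selections can also be seen by simulation. The sketch below is illustrative only: the function `simulate_be_and_pick`, its defaults, and the fixed seed are assumptions made for this example. The Be branch draws a uniformly random prisoner from all cells; the Pick branch draws a cellblock first and then a cell within it, as the warden does.

```python
import random


def simulate_be_and_pick(n_bar_s=2, n_bar_l=6, trials=200_000, seed=0):
    """Estimate P(S | d) under Be selection and under Pick selection.

    One S and one L cellblock are assumed; datum d is "cell number <= n_bar_s".
    Returns (be_estimate, pick_estimate).
    """
    rng = random.Random(seed)
    sizes = {"S": n_bar_s, "L": n_bar_l}
    # All (cellblock, cell number) pairs; a Be selection is uniform over these.
    cells = [(block, number)
             for block, size in sizes.items()
             for number in range(1, size + 1)]

    be_matches = be_small = 0
    pick_matches = pick_small = 0
    for _ in range(trials):
        # Be: you are a random prisoner, so every cell is equally likely.
        block, number = rng.choice(cells)
        if number <= n_bar_s:
            be_matches += 1
            be_small += block == "S"

        # Pick: the warden picks a cellblock, then a cell inside it.
        block = rng.choice("SL")
        number = rng.randint(1, sizes[block])
        if number <= n_bar_s:
            pick_matches += 1
            pick_small += block == "S"

    return be_small / be_matches, pick_small / pick_matches


be_estimate, pick_estimate = simulate_be_and_pick()
print(f"Be:   P(S | d) ~ {be_estimate:.3f}")
print(f"Pick: P(S | d) ~ {pick_estimate:.3f}")
```

With these defaults the Be estimate should land near 1/2 and the Pick estimate near 3/4, matching the two numerical examples in the text up to sampling noise.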
Just to highlight further, it is the Pick on the nesting set $W$ that causes a change in the posterior probability. Consider the warden cafeteria problem, where all the prisoners are in a cafeteria, and the warden Picks a prisoner at random. If that prisoner is from a cell number $\leq \bar{n}_S$, then what is the probability that they came from an S cellblock? Now the selection is directly from set $P$, or equivalently, from inside of the nested set $PW$, so that the posterior probability is $P(PW_S \mid P_d W)$, just as in the Be case: there is no observer selection effect in the warden cafeteria problem. A Pick directly from the observer set is the same as a Be on that set (see the Appendix). What causes a change in the posterior probability is a Pick on a set in which $P$ is nested, such as $W$.

IV. INCLUSIVE SELECTION
However many nested sets we have, there are two possibilities: Either there is just a selection on the innermost set (a Be, unless there is a way to directly Pick from it as in the warden cafeteria problem), which we call inclusive selection, or there is also at least one selection on one of the enclosing sets [a Pick in all of our examples because we do not consider any sets enclosed by (to the left of) $P$], which we call exclusive selection. The selection in the prisoner problem is inclusive and in the warden problem it is exclusive.
Suppose we have a larger enclosing set, $E$, in which $P$ and $W$ are nested. For the prisoner and warden problems, this could be the set of all prisons, each of which has its own small-to-large cellblock ratio. We can even take $E$ to encompass everything that we deem possible, such as a set of universes in all possible configurations. Then we define two possibilities for the reality:
The inclusiverse: All things we deem possible are realized.
An exclusiverse: Only some of the things we deem possible are realized.
The key question is whether all things to which we assign a nonzero probability actually occur (inclusive selection), or there are some mutually exclusive possibilities (exclusive selection). Perhaps a quantum example is useful. If one assumes that quantum theory is unitary and all pieces of the wave function with nonzero amplitude are realized, so that Schrödinger's cat is both alive and dead (as in the many-worlds case), then that is inclusive selection. If one assumes that the wave function collapses to a specific eigenvalue, so that Schrödinger's cat is alive or dead, not both, then that is an exclusive selection. In the rest of this section we study inclusive selection, though not its implications for reality.
Let us consider inclusive selection for the prisoner problem but with a much more modest set, where $E$ is the set of all prisons we consider and the only selection is the self-selection of the prisoner. If we think that there are exactly two types of prisons, say, with all S cellblocks or all L cellblocks, then the key to inclusiveness is that we calculate probabilities under the assumption that both types of prisons exist: no Pick on $E$ is needed. We explicitly show the sum over subsets of $E$, labeled by $e$, so when we do the same calculation for the exclusive case, the difference will be apparent. For simplicity we will assume that the number of prisoners for any $J = S$ or $L$ cellblock is the same across all prisons, so $\bar{n}_{J,e} = \bar{n}_J$, and similarly we assume the number of prisoners per cellblock matching datum $d$ is the same, $\bar{m}_{J,e} = \bar{m}_J$. The subsets $E_e$ differ only in their fractions of S and L worlds. The likelihood for the inclusive case comes out the same as in the Be case, Eq. (4), and the posterior probability is again just the prior probability of picking a world of type S, so we again get $R^{E}_{P/W} = 1$ as in Eq. (7). There is no net observer selection effect for the prisoner problem in the inclusive case ($R^{E}_{P/W} = 1$). Generalizing, if we are considering a problem where observers are selected only by being, and there is no other selection (all allowed possibilities are realized, as in the inclusiverse), then there is no OSE.

V. EXCLUSIVE SELECTION
Let us analyze the prisoner problem with exclusive selection. The key difference from the inclusive case is that we must Pick a subset $E_e$: Although we posit that there are multiple possibilities $E_e$, only one of them is actually realized. As we said in the previous section, if $E$ is the set of everything possible, and we take reality to correspond to a smaller subset, then we live in an exclusiverse. But we will focus on a more mundane set: For the prisoner problem, those subsets of $E$ are prisons.
The defining characteristic of these subsets $E_e$ is the fraction of worlds of type S they contain, which we define as $y$. So the probability of picking an S world, $y = P(W_S E_e \mid W E_e)$, and a world of type L, $1 - y = P(W_L E_e \mid W E_e)$, is the same for all elements of a given $E_e$. That is, $E_e$ is completely specified by its $y$; in fact we will simply label these subsets by $y$. Again we assume for simplicity that the number of prisoners per type of world is independent of $e$: $\bar{n}_{J,e} = \bar{n}_J$ and $\bar{m}_{J,e} = \bar{m}_J$. But note that the average number of prisoners per cellblock in a given prison, $\bar{n}_{,e}$, varies from prison to prison:
$$\bar{n}_{,e} = \bar{n}_S P(W_{S,e}) + \bar{n}_L P(W_{L,e}).$$
The likelihood in the exclusive case is the same as in the inclusive case, Eq. (13), because the Pick of subset $E_e$ on the first term in the sum is neutered. Let us use Bayes's law again to obtain the posterior probability of you being in a cellblock type S or L given datum $d$ in the exclusive case, which has the same form as the inclusive case, Eq. (15), except with Picks on $E$, and which we obtain from Eqs. (18)-(20), ending with Eq. (22), where we use $\bar{m}_S = \bar{m}_L$ of Eq. (3) and rewrite the denominators to collect the $y$ dependence. We are again interested in the ratio of L to S posterior probabilities, Eq. (23), which we want to normalize to the corresponding ratio of prior probabilities, Eq. (24). We can see immediately that if there is only one value $Y$ for which the prior is nonzero, then there is no observer selection effect. That is because that is really the inclusive case: while there is a Pick on $E$, it is neutered, and all of the values (i.e., the one value) are realized. So for the exclusive case, there needs to be more than one allowed value of $y$.
So let us explore different assumptions for the function $P(\,|\,y)$, which, to remind you, is our prior probability for elements of $E$ with S-world fraction $y$. For simplicity, let us define the probability density $p(\,|\,y)$, where now $y$ is not a set of discrete values but all real numbers in $[0,1]$. We can then write the sums in Eqs. (23) and (24) as integrals.

A. Near a single point
Let us first explore the case where we take $y$ to have a nonzero probability near a single point $Y$, in particular that $p(\,|\,y)$ is constant over the range $Y - \sigma$ to $Y + \sigma$, where of course $\sigma$ is no larger than $Y$ or $1 - Y$ so that the points are on the range 0 to 1. [The prior is written with step functions, each equal to 0 for $x < 0$ and 1 otherwise.] Plugging this into Eq. (27), for the prior ratio of probabilities of picking L worlds to S worlds, we get $R^{|E}_W = (1 - Y)/Y$ [Eq. (29)], just as we obtained for a single point. (This is true because the integrands in the numerator and denominator of $R^{|E}_W$ are linear in $y$.) $R^{|E}_P$ is more complicated because of the denominator of the integrands. In the limit of $\sigma \to 0$, $R^{|E}_P$ approaches $R^{|E}_W$, and thus their ratio $R^{|E}_{P/W}$ approaches 1. Thus if $p(\,|\,y)$ is nonzero within $\pm\sigma$ of a single point $Y$, then there is a small observer selection effect of order $\sigma^2$. In the limit that $\rho \to \infty$ (where one must be careful when $Y$ is near 1), the correction remains of order $\sigma^2$. So the closer we restrict our prior to be near a single point $Y$, the less $R^{|E}_{P/W}$ differs from 1, and this behavior is independent of $\rho$.
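The $\sigma^2$ scaling can be checked numerically. The sketch below is not taken from the paper: it assumes that, for exclusive selection, the L-to-S posterior ratio is $\int p(\,|\,y)(1-y)/\bar{n}(y)\,dy$ divided by $\int p(\,|\,y)\,y/\bar{n}(y)\,dy$ with $\bar{n}(y) \propto y + \rho(1-y)$, which is our reading of a Pick on $y$ followed by a Be within the chosen prison. The names `n_bar` and `window_ratio` are hypothetical.

```python
def n_bar(y, rho):
    """Average prisoners per cellblock, in units of n_bar_S, for S-fraction y."""
    return y + rho * (1.0 - y)


def window_ratio(rho, center, sigma, steps=20_000):
    """R_{P/W} for a prior that is flat on [center - sigma, center + sigma].

    Uses the midpoint rule; the constant prior density cancels in every ratio.
    """
    h = 2.0 * sigma / steps
    post_l = post_s = prior_l = prior_s = 0.0
    for i in range(steps):
        y = center - sigma + (i + 0.5) * h
        post_l += (1.0 - y) / n_bar(y, rho)  # Be in L after picking y (assumed form)
        post_s += y / n_bar(y, rho)          # Be in S after picking y (assumed form)
        prior_l += 1.0 - y                   # Pick an L world
        prior_s += y                         # Pick an S world
    return (post_l / post_s) / (prior_l / prior_s)


# Doubling sigma should roughly quadruple the deviation from 1.
for sigma in (0.01, 0.02, 0.04):
    print(sigma, 1.0 - window_ratio(10.0, 0.5, sigma))
```

If the assumed form is right, each doubling of $\sigma$ multiplies the printed deviation by about four, consistent with an effect of order $\sigma^2$.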

B. Flat prior
The simplest prior assumption is that every value of $y$ is equally likely. From Eq. (27) this gives equal probability of picking S and L worlds, $R^{|E}_W = 1$, which we also could have obtained from Eq. (29) for $Y = \sigma = 1/2$. We then evaluate the posterior ratio of being in L and S worlds, $R^{|E}_P$, and its ratio to $R^{|E}_W$ in the limit of $\rho \to \infty$ (this approximation is good only for $\rho$ of roughly 100 or more). So for a flat prior, we get an observer selection effect which goes roughly as $1/\ln\rho$, in between the original prisoner problem, $R_{P/W} = 1 = \rho^0$, and the warden problem, $R_{P/W} = 1/\rho = \rho^{-1}$. If the point of choosing a flat prior is to minimize the effect of assumptions on the outcome, then it might make more sense to use inclusive selection instead of a flat-prior exclusive selection, that is, to say that all values of $y$ are realized rather than that one of them is realized with equal probability for each. Assuming the latter leads to a small observer selection effect while the former does not.
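To see how the flat-prior result sits between the prisoner and warden cases, here is a small sketch. The closed form is not taken from the paper: it was obtained by integrating the assumed weights $(1 - y)/D$ and $y/D$, with $D = y + \rho(1 - y)$, over $[0, 1]$. The name `flat_prior_ose` is hypothetical.

```python
import math


def flat_prior_ose(rho: float) -> float:
    """R_P / R_W for a flat prior on y in [0, 1], valid for rho > 1.

    R_W = 1 by symmetry, so this is just the posterior L-to-S ratio.
    The common factor 1 / (rho - 1)^2 of both integrals cancels.
    """
    log_rho = math.log(rho)
    integral_L = rho - 1.0 - log_rho          # from the (1 - y) / D integral
    integral_S = rho * log_rho - rho + 1.0    # from the y / D integral
    return integral_L / integral_S


if __name__ == "__main__":
    for rho in (10.0, 1e2, 1e4, 1e8):
        print(f"rho={rho:g}  flat={flat_prior_ose(rho):.4f}  "
              f"1/ln(rho)={1.0 / math.log(rho):.4f}  warden={1.0 / rho:.2e}")
```

For every $\rho > 1$ the value lies strictly between the warden value $1/\rho$ and the prisoner value 1, and it tracks $1/\ln\rho$ only slowly as $\rho$ grows.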

C. Two separated points
To get a sense of how much the prisoner problem in the exclusive case can approach the warden problem, it suffices to consider a prior with nonzero probabilities at two points, $Y \pm \sigma$, where $0 < Y < 1$ and $0 < \sigma \le \min(1/2, Y, 1 - Y)$, so that both points lie in the range [0,1]. [$\delta(x) = 1$ for $x = 0$ and is 0 otherwise.] Since the integrands in $R^{|E}_W$ are linear, the $\sigma$ terms cancel, and we again get $R^{|E}_W = (1 - Y)/Y$. If we assume $Y = 1/2$ and define $k \equiv 2\sigma$, then $R^{|E}_W = 1$ and Eq. (37) reduces to a function of $k$ and $\rho$ alone. Note that $0 < k \le 1$. For $k$ near 0, $R^{|E}_P$ approaches 1: two points very close together is very much like the inclusive case. For $Y = 1/2$ and $k = 1$, i.e., when the two points are $y = 0$ and $y = 1$, we get $R^{|E}_{P/W} = 1/\rho$ [Eq. (39)]. In other words, the prisoner problem in the exclusive case where the prior is that the prison is either all L cellblocks ($y = 0$) or all S cellblocks ($y = 1$) has the same observer selection effect as the warden problem in Eq. (12). By insisting on an either-or Pick on the enclosing set $E$, we have, in essence, turned a Be for the prisoner into a Pick on which top-level subset she is in. So we can go anywhere from no OSE, as in the prisoner case, to a warden-level $1/\rho$ OSE simply by adjusting our prior assumptions. In Fig. 2, we plot $R^{|E}_{P/W}$ as a function of $Y$ for different values of $k$, which we define more generally in Eq. (40). For $Y$ near 0 or 1, or $k$ near 0, $R^{|E}_{P/W} \to 1$ and the exclusive case is like the inclusive one. The observer selection effect is maximized in the case of Eq. (39).
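The interpolation between the prisoner and warden limits can be reproduced with a few lines. This is a sketch under stated assumptions, not the paper's Eq. (37): it assumes equal prior weight on the two points and the same occupancy weighting $1/[y + \rho(1 - y)]$ in the posterior. The function names, and the hand-simplified expression for $Y = 1/2$, are this sketch's own.

```python
def two_point_ose(Y: float, sigma: float, rho: float) -> float:
    """R_P / R_W for equal prior weight at y = Y - sigma and y = Y + sigma."""
    post_L = post_S = 0.0
    for y in (Y - sigma, Y + sigma):
        occupancy = y + rho * (1.0 - y)   # mean prisoners per cellblock / n_S
        post_L += (1.0 - y) / occupancy
        post_S += y / occupancy
    prior_ratio = (1.0 - Y) / Y           # sigma cancels in the linear prior sums
    return (post_L / post_S) / prior_ratio


def symmetric_closed_form(k: float, rho: float) -> float:
    """The same quantity for Y = 1/2 and sigma = k / 2, simplified by hand."""
    return (1.0 + k * k + rho * (1.0 - k * k)) / (1.0 - k * k + rho * (1.0 + k * k))


if __name__ == "__main__":
    rho = 10.0
    for k in (0.25, 0.5, 0.75, 1.0):
        print(k, two_point_ose(0.5, k / 2.0, rho), symmetric_closed_form(k, rho))
```

With $k = 1$ the two points are $y = 0$ and $y = 1$ and the result equals the warden value $1/\rho$, while for $k$ close to 0 it returns to the prisoner value 1.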

A. Exclusive theory selection
Instead of taking $E$ to be the top-level set, consider a set of theories. This set of theories might include very different hypotheses about reality, or they might simply specify different enclosed subsets, such as

L: "All cellblocks are type L"
S: "All cellblocks are type S" (41)

FIG. 2. How to interpolate between the prisoner (no OSE) and warden ($1/\rho$ OSE) cases: For exclusive selection over an ensemble $\{E_y\}$ ($y$ is the fraction of worlds of type S in that ensemble element) which consists of two separated points $y = Y \pm \sigma$, we plot a measure of the OSE, $R^{|E}_{P/W}$ (the ratio of the ratios of posteriors to priors for L and S worlds for the exclusive Pick over ensemble $E$) versus $Y$ for $\rho = 10$ (the ratio of the number of people per world of type L to that of type S). The OSE depends on how far apart the points are, which is characterized by $k \in (0, 1]$ defined in Eq. (40). Contours top to bottom are for $k$ = 0, 0.25, 0.5, 0.75, and 1. There is no OSE for $k \to 0$ (akin to the prisoner case). The maximal OSE corresponds to the minimal value of $R^{|E}_{P/W}$.

These two theories could have been encoded in $E$: They are $E_{y=0}$ and $E_{y=1}$, respectively. But we tend to approach theories differently from ensembles, notably in that one usually assumes that only one theory is true, so that we have to Pick a theory before proceeding further. This is exclusive theory selection, and the probabilities are the same as in Sec. V. For example, if our priors for the two theories in Eq. (41) are equal, so that their ratio is 1, then the posterior ratio is $1/\rho$, just as in Eq. (39). [This is assuming typicality (the SSA). Again, the SIA gives the wrong answer because it does not take into account selections on enclosing sets, here the Pick selection on mutually exclusive theories.] It is possible to have inclusive selection of a theory, where one assumes multiple theories are realized. For example, one could posit that prisons vary from country to country, so both theories in Eq. (41) would be realized somewhere. There is then no Pick on the set of theories, and one recovers the probabilities in the inclusive section, where there were no observer selection effects (the corresponding ratio is 1). One can even have a seemingly fundamental theory be part of an inclusive selection. For example, the landscape in string theory allows different regions of the larger universe to manifest different low energy theories with their own fundamental constants. If one posits that one can be an observer in any region of the landscape that has observers, then that is inclusive theory selection.
As stated, the main point of this paper is to show that the conclusions one draws depend on the assumptions made. If we assume exclusive selection, such as the theories in Eq. (41) being mutually exclusive, then we will conclude that there are observer selection effects, but if we assume an inclusive case, such as half the prisons having only S cellblocks and half having only L cellblocks, we will conclude that there are no such effects.

B. Probing a multiverse?
Suppose we consider both possibilities about the selection from set $P$ through set $E$: that it is inclusive, as discussed in Sec. IV, or exclusive, as discussed in Sec. V, and treat these as competing hypotheses, "in" or "ex." If we treat these hypotheses as mutually exclusive, with a Pick on the set of hypotheses, then the overall selection is exclusive. But let us focus on the rest of the selection, from $P$ to $E$, which is inclusive or exclusive. We can then in principle use our data to alter our posterior probabilities for each hypothesis. Suppose we define $E$ to be everything, so that the inclusive (exclusive) case corresponds to the inclusiverse (an exclusiverse). How do these terms relate to the term "multiverse"? If taken literally, then multiverse simply means that there are more realities than the one we perceive, either via something like parallel universes or just the universe being so large that realities similar to ours occur in some other part of it. That does not actually imply that all possible universes are realized. A set of a few parallel universes, which we will call a partial multiverse, is an exclusiverse, since not everything possible is realized. If all possibilities are realized, then to avoid ambiguity we will call it the complete multiverse. So:

The inclusiverse is the same as the complete multiverse: All things we deem possible are realized.
An exclusiverse is the same as a universe or a partial multiverse: Some things we deem possible are not realized.

The question of this subsection is
Can we determine whether we live in the inclusiverse or an exclusiverse simply by using a datum such as the date?
To get a handle on this, let us consider the prisoner problem again, where our selection in sets $PW$ is a Be. Let $PW$ again be embedded in a larger set $E$, which itself is considered in the context of one of two hypotheses,

in: "Inclusive selection on E"
ex: "Exclusive selection on E" (43)

We need new notation to combine these hypotheses in a single probability, with a "controlled-Pick" on $E$, so that there is a Pick on $E$ for hypothesis ex, but not for hypothesis in. For this we put a left arrow pointing from the hypothesis to the Pick on $E$. Using this notation, what we want to calculate is the posterior probability for hypotheses $h$ = in or ex given datum $d$. If we define our prior probabilities for $h$ = in and ex to be $\alpha$ and $\beta$, respectively, then our posteriors $\alpha'$ and $\beta'$ are simply

$$\alpha' = \frac{\alpha}{\alpha + \beta\, P_{d|\mathrm{ex}}/P_{d|\mathrm{in}}}, \qquad \beta' = \frac{\beta}{\beta + \alpha\, P_{d|\mathrm{in}}/P_{d|\mathrm{ex}}}. \quad (47)$$

Note that we also need priors for the probabilities of the elements of $E$. For simplicity, let us assume that the only ensembles with nonzero probability are $y = 0$ (all L-type cellblocks) or $y = 1$ (all S-type cellblocks), which we saw in Eq. (39) gives maximal OSE for the ex case. There is of course no OSE in the in case. For the inclusive case, let us assume equal probabilities for $y = 0$ and 1 [Eq. (48)], but for the exclusive case let us allow them to vary, with probabilities $p$ and $q$, where $p + q = 1$. We can then plug the resulting likelihoods into Eq. (47) to obtain the posterior probabilities. It is clear that they depend on $p$ (with $q = 1 - p$).
For $p = 1/2$, so that the $y = 0$ and $y = 1$ weights in the ex case match those of the in case in Eq. (48), we obtain posterior probabilities,

$$\alpha' = \frac{\alpha}{\alpha + \beta\,(\rho + 1)^2/4\rho}, \qquad \beta' = \frac{\beta}{\beta + \alpha\, 4\rho/(\rho + 1)^2}. \quad (51)$$

Since $\alpha$ and $\beta$ are $\ge 0$ and $\rho > 1$ [so that $(\rho + 1)^2 > 4\rho$], the denominator for $\alpha'$ ($\beta'$) is larger (smaller) than 1, and datum $d$ seems to decrease (increase) our credence in inclusive (exclusive) selection on $E$, except in the trivial case where $\alpha$ or $\beta$ is zero. This would seem to argue that if $E$ is a set of universes (not just prisons), we could use observer data to alter our probability that we live in the inclusiverse. But there is a second prior in this problem, that of $p$ (with $q = 1 - p$). We chose $p = 1/2$ to make the probabilities for $y = 0$ and $y = 1$ the same as those in the inclusive case. An equally reasonable hypothesis would be to set $p$ equal to the value that gives the same likelihood of datum $d$ for each hypothesis, so that $P_{d|\mathrm{in}} = P_{d|\mathrm{ex}} = 2/(\rho + 1)$. With a little algebra, one finds the value of $p$ for which this holds. For this value of $p$, the denominators in Eq. (47) are 1 (since $\alpha + \beta = 1$), and so we gain no information about hypotheses in and ex from datum $d$.
What happened? When we thought, due to Eq. (51), that we had obtained information about hypotheses in and ex from datum $d$, what we really learned about was the probability of getting datum $d$ based on two factors: whether the selection from $E$ was inclusive or exclusive, and the priors we had for the elements of $E$ in each case. To the extent that $d$ tells us anything about these cases, it is about a combination of these factors. We cannot disentangle these factors here. In general, one cannot claim that data tell us about whether we are in the inclusiverse (the complete multiverse) or not unless one can show that all other factors which separate the inclusiverse from exclusiverse hypotheses are fixed.
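The dependence of the update on the second prior can be made concrete. The sketch below is illustrative only. It assumes units in which the likelihood of $d$ is 1 in an all-S prison and $1/\rho$ in an all-L prison, which reproduces the pooled inclusive value $2/(\rho + 1)$ quoted above. Which of $p$ and $q$ labels the all-L prison is also an assumption; here the argument `p_all_L` is the exclusive-case prior for $y = 0$.

```python
def posterior_inclusive(alpha: float, rho: float, p_all_L: float) -> float:
    """Posterior credence in the inclusive hypothesis after seeing datum d.

    alpha   : prior credence in "in" (the prior for "ex" is 1 - alpha)
    p_all_L : exclusive-case prior that the prison is all L cellblocks (y = 0)
    """
    like_in = 2.0 / (rho + 1.0)                  # half y = 0, half y = 1, pooled
    like_ex = p_all_L / rho + (1.0 - p_all_L)    # Pick one ensemble, then Be
    beta = 1.0 - alpha
    return alpha * like_in / (alpha * like_in + beta * like_ex)


if __name__ == "__main__":
    alpha, rho = 0.3, 10.0
    print("p = 1/2          :", posterior_inclusive(alpha, rho, 0.5))
    print("matched p        :", posterior_inclusive(alpha, rho, rho / (rho + 1.0)))
```

With equal weights the credence in the inclusive hypothesis drops, while with the matched weight (here $\rho/(\rho + 1)$ on the all-L prison) it is unchanged, so the apparent update reflects the choice of $p$ as much as the in-or-ex question.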

C. Presumptuous philosopher
In the Introduction, we noted that some authors argued against the doomsday argument by assuming the SIA: that we should weight the probability of some situation by the number of observers in it. As we have discussed, this is essentially a kludge, adding the factor that we found in Be choices without the clear-cut mathematical rationale we presented (based on properly applying the SSA, i.e., typicality). This is perhaps why it has been referred to as "controversial" [11,12].
Nick Bostrom argues against the SIA with the following problem [7,9]. A philosopher is told that theories L and S have equal probabilities prior to taking into account any observer information. This is like the problem of exclusive theory selection we considered in Sec. VI A, except that there is no datum $d$ favoring S over L. The philosopher states that there is no need to test which is right (and since this is exclusive selection, only one is right) because, by the SIA, L is $\rho$ times more likely than S, since there are $\rho$ times as many observers in that case.
Bostrom is right that the philosopher is being presumptuous here, and this is a good argument against the SIA: if one is to Pick between theories S and L, there should be no effect from there being more observers in the latter case, because we are picking a theory. This is simply an example of what we have found regarding the SIA, namely that it gives the wrong answer when there is a selection from an enclosing set, here the set of theories. But there is no reason to have invoked the SIA in the first place.
In short, the presumptuous philosopher has no bearing on our results because it argues against the SIA, which we did not use.
We note, however, that if the philosopher correctly uses the SSA and is asked about an inclusive problem, namely whether she is more likely to be in a domain of the inclusiverse governed by theory L or S, she would be correct to answer that she is more likely to be in the former due to the SSA weighting by number of observers. In that case she is not presumptuous at all [16].

VII. TYPICALITY
All of the probabilities we have discussed thus far assume that the selection, Be or Pick, is typical, that, for example, if the fraction of observers in some subset $P_a$ of $P$ is $n_a/n$, then the probability of selecting a person in that subset is also $n_a/n$. Suppose we relax that assumption and allow atypical selection, where the probability of selecting a person from subset $P_a$ differs from $n_a/n$, so that some values of $a$ are intrinsically more likely to be selected than others [17]. For example, observers at CERN are not typical of Earth's population: they are more likely to be scientists than the population overall. Srednicki and Hartle [18] describe an atypical selection in their Eq. (6.1), where $q_1$ is a posterior result, $T$ is a given theory, $D_0$ is data, $\xi$ is a "xerographic distribution," which is a set of copies $A$ of $q_1$ at different locations meeting data $D_0$, and $\xi_A$ is the probability weight of xerographic occurrence $A$, which is not necessarily what we would obtain from a typical selection. We need to translate this all into our notation.

A. Atypical notation
Let us define $\xi_0$ to be a typical Be, a typical selection on the set $P$ (embedded in set $W$). We are interested in subsets $P_a$ of $P$ for some property $a$ of the people in $P$. Now let us define an atypical Be using $\xi$ to mark the atypical selection point, which may not simply be a ratio of numbers of elements of set $P$. However, for a given atypical selection $\xi$ on $P$, we will show that we can always find a new set $\tilde{P}$, with number of people per world $\tilde{n}$, on which a typical selection $\xi_0$ gives the same answer. Here the tilde quantities are related to their counterparts by some scaling factors $\kappa_a$ and $\kappa_{a|d}$ [Eq. (58)]. We claim that the atypical Be on $P$, $\xi$, is equal to the typical Be on $\tilde{P}$, $\xi_0$ [Eq. (59)], if we define $\kappa_a$ as the ratio of atypical to typical selection [Eq. (60)], where constants $c$ and $c_d$ are independent of $a$. We have the freedom to vary $c$ and $c_d$ because the overall numbers of people in $\tilde{P}$ do not matter, just the ratios we are interested in. However, they do affect the values for $\tilde{n}$ and $\tilde{m}$, which follow from the fact that probabilities for even atypically selected people sum to 1. Note that we can choose to set $c$ and $c_d$ equal to 1 and have $\tilde{n} = n$ and $\tilde{m} = m$, but we need not do this. Now we can show Eq. (59) does in fact hold, and we can write our atypical selection on $P$ as a typical selection on $\tilde{P}$ with number of elements defined by Eq. (58) with $\kappa$ defined in Eq. (60).

B. Posterior probability
We can now write Srednicki and Hartle's Eq. (54) in our notation. We want the posterior probability $P(PW_K | P_d W)$ but with an atypical Be, i.e., $P(\xi PW_K | \xi P_d W)$, which we write as a typical Be on set $\tilde{P}$ defined by Eqs. (58) and (60). This is the same expression as for a Be in Eq. (6) with the elements from set $\tilde{P}$. Note that if we condition on a subset $a$, the selection within that subset is typical (all atypicality comes from nontrivial weighting of the different subsets $P_a$), thus $P(\xi P_a W_K | \xi P_{ad} W) = P(P_a W_K | P_{ad} W)$.

C. Atypical example
Let us see how this atypical notation works in an example using prisoners of two types. Suppose half the cellblocks are filled with humans ($a = h$) and half filled with zombies ($a = z$). Humans are distributed as in the prisoner problem, $n_{hL} = \rho\, n_{hS}$ and $m_{hL} = m_{hS}$. Zombies have the same distribution in cells, $n_{zL} = \rho\, n_{zS}$, but let us assume that all zombies who can think well enough to formulate a question think they meet datum $d$, i.e., $m_{zL} = \rho\, m_{zS}$. If you think it is equally likely that you are a human or a zombie (because half the prisoners are humans and half zombies), and for simplicity you assume $P(W_S) = P(W_L) = 1/2$, then you calculate the typical Be posterior probabilities. Thus, unlike the prisoner problem, there is an observer selection effect $R_{P/W} = (1 + \rho)/2$, favoring that you are in $W_L$, because there are more zombies matching $d$ in $W_L$.
But suppose you think it is quite unlikely that you are a zombie, say, because zombies do not usually use Bayesian reasoning. For simplicity, you take $\kappa_{h|d} = 1$ and set $\kappa_{z|d}$ to be some very small number $\kappa$: only a fraction $\kappa$ of zombies thinks well enough to calculate the probabilities we have been discussing (the ratio of the chance you are a zombie to the chance you are a human is $\kappa$, not 1). Then you calculate the atypical Be [Eq. (66)]. There is still an observer selection effect, $R_{P/W} = (1 + \kappa\rho)/(1 + \kappa)$, favoring $W_L$, but note that when $\kappa \to 0$, $R_{P/W} \to 1$, because there is no OSE due to the human prisoners. If you assume you are not a zombie, then you take $\kappa = 0$ and all probabilities spring from $P_h$; in fact, if you are going to do that, you might as well drop the label $h$ and ignore the zombies.
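The two ratios quoted in this example follow from simple counting, as the following sketch shows. It assumes equal world priors and, as a simplification not stated explicitly in the text, the same number $m$ of humans and of zombies matching $d$ in an S cellblock. The function name is hypothetical.

```python
def be_posterior_ratio(rho: float, kappa: float, m: float = 1.0) -> float:
    """Posterior L-to-S ratio for a Be selection with equal world priors.

    Humans matching d:  m per S cellblock and m per L cellblock.
    Zombies matching d: m per S cellblock and rho * m per L cellblock,
    deweighted by kappa (kappa = 1 is the typical selection).
    """
    humans_L, humans_S = m, m
    zombies_L, zombies_S = rho * m, m
    return (humans_L + kappa * zombies_L) / (humans_S + kappa * zombies_S)


if __name__ == "__main__":
    rho = 10.0
    for kappa in (1.0, 0.1, 0.001, 0.0):
        print(kappa, be_posterior_ratio(rho, kappa))
```

Setting `kappa = 1` recovers $(1 + \rho)/2$, and setting `kappa = 0` recovers the prisoner-problem value of 1.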

D. Redefine the conditional
Another way of addressing an atypical selection which is due to different subsets $a$ meeting the conditional with different relative frequencies is to redefine the conditional so the weights are the same. For example, in the case above, we deweighted zombies by a factor $\kappa$ because only that fraction of zombies could formulate the question. So why not limit the sets $P$ and $P_d$ to the subset $P_Q$ of $P$ of people who have formulated the Bayesian question in the first place? As we discuss in the Appendix, adding such a conditional is not just another label but actually redefines the set $P$ as set $[P_Q]$. Then all we need to do is define set $\tilde{P} \equiv [P_Q]$, and typical selection on $\tilde{P}$ gives the probabilities for those atypical people who ask the question.

E. Boltzmann brains
Normal observers are necessarily far from equilibrium and experience an arrow of time of increasing entropy [19]. Fortunately, the observable Universe is in a relatively low entropy state [20,21]. How did it get that way? Ludwig Boltzmann argued that a low-entropy "world" could arise as a stupendously rare fluctuation within a higher-entropy world [13,22]. The prevailing theory of cosmology is more subtle: that our Universe began within a patch of smooth spacetime, which inflated for a time at an exponential rate [23] (for a review, see Ref. [24]). Though inflation has ended here, it has likely not stopped everywhere in the larger Universe. Further, our observable Universe has seemingly entered another era of exponential expansion and seems slated to approach de Sitter space (a spacetime with a positive cosmological constant and vanishing matter density) asymptotically.
If so, then the empty places greatly outnumber the places where normal observers can live. Further, de Sitter space is a thermal state (with a temperature which depends only on the cosmological constant: $T = \sqrt{\Lambda/12\pi^2}$) [25] and thus seems subject to worlds fluctuating into existence via stupendously rare fluctuations. And one may not need such a large fluctuation, the size of a galaxy or a planet, to create observers; one may need only "Boltzmann brains" [26][27][28], which are spontaneously formed configurations of matter that, for a brief period, are self-aware, including ones that think they are having the thoughts you are having now. Such events are still extremely improbable, occurring at a rate $\sim e^{-\Delta S}$, where $\Delta S$ is the reduction in entropy that the fluctuation represents. For a brain-sized object, the timescale to form them, $\tau_{BB}$, will be enormous, of order $e^{10^{70}}$. (Note that the units do not actually matter with numbers this large: switching from Planck times to Hubble times changes the googol-sized exponents by only about 140.) But this is small compared to the timescale for a Hubble volume to fluctuate into existence, $\tau_{HV}$, of order $e^{10^{122}}$. This is time enough to form googolplexes of Boltzmann brains, far more than the number of normal observers [13].
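The remark about units can be verified with logarithms. The sketch below uses rounded values for the Planck time and the Hubble time, which are approximations supplied here rather than numbers from the text. The two exponents are the ones quoted above.

```python
import math

PLANCK_TIME_S = 5.39e-44   # approximate
HUBBLE_TIME_S = 4.5e17     # approximate, about 14 billion years

# Changing the time unit multiplies a timescale e^X by a constant,
# which adds ln(constant) to the exponent X.
exponent_shift = math.log(HUBBLE_TIME_S / PLANCK_TIME_S)

LN_TAU_BB = 1e70    # ln of the Boltzmann-brain timescale quoted in the text
LN_TAU_HV = 1e122   # ln of the Hubble-volume timescale quoted in the text

# ln(tau_HV / tau_BB): log of the number of brain-formation times
# that fit in one Hubble-volume fluctuation time.
ln_brain_count = LN_TAU_HV - LN_TAU_BB

if __name__ == "__main__":
    print(f"exponent shift from changing units: {exponent_shift:.1f}")
    print(f"relative change in ln(tau_BB): {exponent_shift / LN_TAU_BB:.1e}")
    print(f"ln(tau_HV / tau_BB) = {ln_brain_count:.3e}")
```

The shift of about 140 is utterly negligible next to an exponent of $10^{70}$, and subtracting $10^{70}$ from $10^{122}$ does not even register in double precision.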
One might ask why this is a problem. We do not seem to be Boltzmann brains. In fact, we need to assume that we are normal observers in order to do science. And if one conditions on the assumption that we are normal observers, then the probability of us being a freak observer is zero, no matter how common they are [P(freak|normal) = 0]. The problem is that if freak observers outnumber us by a large-enough factor, say, a googolplex, there are many, many of them that think that they are experiencing any given moment that any normal observer does, and it is not safe to assume that you are a normal observer. So the problem is one of consistency: You need to assume that your observations reflect reality to do science, and thus it is a problem if the resulting science says that this assumption is very likely to be false. The problem is especially acute if there is an infinite volume of spacetime which could spawn Boltzmann brains, and only a finite volume containing normal observers. This possibility led Don Page to argue that the Universe must decay rapidly, via bubbles of vacuum decay [29], so as to avoid any infinite patches of spacetime, leading him to predict a lifetime of our Universe shorter than about 20 billion years [30]. Many papers have been written with less drastic proposed solutions, such as having the physical "constants" vary over time [31].
We want to know whether our analysis of typicality has any impact on the Boltzmann brain problem. Since freak observers may be fooled into thinking that they are normal only for a small fraction of their "life," we use observer moments instead of observers. Let us assume that there are two types of observer moments per comoving Hubble volume, normal ($n$) and freak ($f$), with $n_f = \rho\, n_n$ for some constant $\rho$ which now can be any nonnegative real number, and $n = n_n + n_f$ is the total number of observer moments per comoving Hubble volume. The probability to Be a normal observer moment is just the fraction of observer moments per comoving Hubble volume which are normal, $P(P_n) = n_n/n = 1/(1 + \rho)$, which is not close to 1 unless $\rho \to 0$. But what we really want is the fraction of observer moments in which the observer is self-aware and could ask a question like "Am I normal?" in the first place. The typical freak observer moment which superficially seems like a normal observer moment might not pass that test. Let us assume that freak observer moments are $\kappa$ times as likely as normal moments to do so. Then we are interested in the atypical selection $P(\xi P_n)$, which is a typical selection on set $\tilde{P}$, scaled from $P$ by $\kappa$ on the freak observer moments, $P(\xi P_n) = 1/(1 + \kappa\rho)$. This probability can go to 1 even if $\rho$ is large if $\kappa$ is sufficiently small. But if $\rho$ is huge, as the recurrence time of de Sitter space argues, then the probability of being in a normal observer moment is near 1 only if there is an argument that $\kappa$ is zero. Boddy et al. [32] make such a case. They argue that if the theory is unitary ("many worlds"), de Sitter space is in a stationary state. Fluctuations do occur, including ones which correspond to Boltzmann brains, but they do not actually correspond to self-aware freak observer moments because nothing happens in a stationary state: there is no decoherence corresponding to the splitting of worlds. If true, then this is akin to setting $\kappa = 0$, since being a self-aware freak observer moment is not only atypical, it does not happen. Obviously if $\kappa = 0$, then $P(\xi P_n) = 1$ independent of how big $\rho$ is.
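As a small illustration of how the weight $\kappa$ competes with the ratio $\rho$, the sketch below evaluates the probability to Be a normal observer moment as $1/(1 + \kappa\rho)$, the form implied by scaling the freak moments by $\kappa$. Because the $\rho$ of interest is far too large for floating point, a second helper takes $\ln(\kappa\rho)$ directly; both function names are this sketch's own.

```python
import math


def prob_normal(rho: float, kappa: float = 1.0) -> float:
    """Probability to Be a normal observer moment.

    rho   : freak-to-normal ratio of observer moments, n_f / n_n
    kappa : relative weight of a freak moment (1 = typical selection,
            0 = freak moments are never self-aware)
    """
    return 1.0 / (1.0 + kappa * rho)


def prob_normal_from_log(ln_kappa_rho: float) -> float:
    """Same probability given ln(kappa * rho), safe for astronomically large values."""
    if ln_kappa_rho >= 0.0:
        e = math.exp(-ln_kappa_rho)     # underflows harmlessly to 0.0
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(ln_kappa_rho))


if __name__ == "__main__":
    print(prob_normal(1e100))              # typical selection: essentially 0
    print(prob_normal(1e100, 0.0))         # kappa = 0: exactly 1
    print(prob_normal_from_log(1e122))     # ln(kappa * rho) hugely positive
    print(prob_normal_from_log(-1e122))    # ln(kappa * rho) hugely negative
```

The outcome is controlled entirely by the sign and size of $\ln(\kappa\rho)$, which is why an exponentially small $\kappa$ can offset an exponentially large $\rho$.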
How might this argument be affected by the fact that our Universe contains matter? Well, rarely, stable matter could play the role of an "environment" by interacting with a Boltzmann brain, causing decoherence. Such atypical Boltzmann brains might thus actually be self-aware. How rare is rare? An upper bound to the fraction $\kappa$ of such atypical matter-interacting fluctuations is the fraction of Hubble volumes which contain even a single matter particle. Let us define $S$ to be the entropy change of a Hubble-volume-sized fluctuation, so that the fluctuation time $\tau_{HV}$ for Hubble volumes is $\sim e^{S}$ and the fluctuation time for Boltzmann brains $\tau_{BB}$ is "about" $e^{\sqrt{S}}$ (more accurately, $\sim e^{S^{0.57}}$). Then the number of freak observers is huge: $n_f \sim \tau_{HV}/\tau_{BB} \sim e^{S}$. The number of normal observers per comoving Hubble volume is proportional to the volume of spacetime in which they can occur. A healthy upper bound on $n_n$ is $S$ (e.g., $10^{20}$ moments/lyr$^3$ s × $10^{31}$ lyr$^3$ × $10^{64}$ yrs × $10^{7}$ s/yr), so that $\rho = n_f/n_n$ is of order $e^{S}$, i.e., the number of freak observer moments is so vast that the number of normal observer moments is irrelevant. Then the probability of being normal vanishes: $P(P_n) \approx 0$ to a very good approximation, yielding a seemingly serious consistency problem. But only a fraction $\kappa$ of freak observers actually can be self-aware by the argument above, where $\kappa$ must be smaller than the fraction of Hubble volumes with any matter in them. de Sitter space expands exponentially fast, so soon there is fewer than one particle per Hubble volume. By the time of the first Boltzmann brains, the fraction of Hubble volumes with a single matter particle is exponentially smaller than $\rho$ is big, and $\kappa\rho$ does go to zero, so that the relevant probability that we are normal observers, $P(\xi P_n)$, goes to 1. In summary, by this argument Boltzmann brains are overwhelmingly plentiful, but those which are atypically self-aware are very rare and thus not a problem. That matter effects are negligible is unlikely to come as a surprise to those already convinced by the arguments of Ref. [32]. We do think it is interesting that there is a typicality factor so strong that it overwhelms even an exponentially large factor like the ratio of freak to normal observers ($\kappa\rho \ll 1$).

F. Scarce observers
Thus far we have assumed that observers in models are not rare. In fact, we have assumed that there is one observer per "cell." What if we relax this assumption and assume cells are filled only with probability $p_F$? Hartle, Hertog, and Srednicki show that there is a different kind of OSE called "first-person probabilities" [33]. Consider a set of models indexed by $K$. If $p_F$ is small enough, then it is possible for there to be no observers in some or all of them (we do not necessarily think that assuming "scarce observers" is a reasonable hypothesis; we are merely considering the consequences of that assumption). First-person probabilities weight models by the probability, $p_1$, that there is at least one observer in the model, since one cannot be an observer in a model if there are no observers in it. If there are $n_K$ observer locations (e.g., cells in a prison block or Hubble volumes in a Universe) which contain observers with probability $p_F$, then the probability that there are no observers in the model is $(1 - p_F)^{n_K}$, and the probability that there is at least one observer in the model is [33] $p_{1K} = 1 - (1 - p_F)^{n_K}$. Now the inclusive probability (i.e., multiple theories are realized, a theoryverse) is not affected by $p_{1K}$ because we are conditioning on there being one observer (the "P"), and the weighting by the number of observers in each model, $p_F n_K$, already takes that into account. So each model is weighted by $n_K$ relative to $n$, where $n$, the sum over models $J$ of $n_J$ times the prior probability of model $J$, is the average number of observer cells per model. Models with more observer cells are favored because it is more likely for an observer to be in such a model, as expected from our previous results. In a cosmological model this corresponds to volume weighting [34], where models with greater volume for observers are favored. What about the exclusive probability, which is how one generally selects between competing models? The condition on "P" ensures that there is at least one observer in one of the models, but to ensure that a given model meets that criterion, we need to weight the models by $p_{1K}$ [33]. There are two interesting limits: where observers are common or rare. First, if $p_F n_K$ is large for some models and tiny in others, then the $p_{1K}$ are close to 1 for the former models, and they have observers. Define these models that certainly have observers as the subset "obs," with normalization factor $N$ equal to the sum of the prior probabilities of the models in obs. Note that models either "pass" (are in obs) or "fail" (are not in obs). If all models we consider pass (obs is the full set), then $N = 1$, and we obtain the usual expression for a Pick probability. If, on the other hand, all the $p_F n_K$ are small, so there are no models that certainly have observers (obs = ∅), then $p_{1K} \approx p_F n_K$ and the Pick probability becomes the same as the inclusive probability! Even though we are Picking between mutually exclusive models $K$, there is nonetheless a volume weighting factor, not just a pass-fail selection, due to it being less likely that scarce observers are in a model with few places for them to be. So this "first-person" effect of Hartle, Hertog, and Srednicki is somewhat orthogonal to the observer effect we have been discussing: Ours assumes observers in every "cell," $p_F = 1$, and comes from the difference between inclusive and exclusive selection, while theirs assumes the limit where observers are scarce, $p_F \ll 1$, and is the same for inclusive and exclusive selection in that limit.
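The two limits of the first-person weight can be checked numerically. This sketch uses the at-least-one-observer probability $1 - (1 - p_F)^{n_K}$ described above and compares two models with equal priors. The function names, and the use of `log1p` and `expm1` for numerical stability, are choices made for this sketch.

```python
import math


def prob_at_least_one(p_fill: float, n_cells: float) -> float:
    """p_1 = 1 - (1 - p_F)^n, computed stably for tiny p_F or huge n."""
    return -math.expm1(n_cells * math.log1p(-p_fill))


def exclusive_weight_ratio(p_fill: float, n_L: float, n_S: float) -> float:
    """Ratio of first-person weights p_1L / p_1S for equal model priors."""
    return prob_at_least_one(p_fill, n_L) / prob_at_least_one(p_fill, n_S)


if __name__ == "__main__":
    n_L, n_S = 1e3, 1e2
    print("scarce observers:", exclusive_weight_ratio(1e-12, n_L, n_S))  # near n_L / n_S
    print("common observers:", exclusive_weight_ratio(0.5, n_L, n_S))    # near 1
```

When observers are scarce the ratio approaches $n_L/n_S$, which is the volume weighting of the inclusive case; when they are common it approaches 1, which is the plain pass-fail Pick.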
This "first-person" analysis can be used in the context of freak observers. Suppose we consider two models, S and L, which differ only in the volume of spacetime in which freak observers occur. We could assign probability $p_n$ for "you" to arise normally per unit volume of spacetime and $p_f$ for a "freak" observer that thinks they are you (i.e., after any typicality effects have been folded in). Let the volume of spacetime where normal observers can arise be $m_K$, and the volume where freaks could arise be $n_K$, which is usually much larger. We want the case where you exist within the model as a normal observer and where no freak versions of you exist, the latter having probability $(1 - p_f)^{n_K}$ (as we argued before, you want to rule out cases where you might be a freak observer for self-consistency). Let us refer to this as "1n, 0f." Then we form the ratio of exclusive probabilities for models L and S, in terms of the ratio $R$ of their priors, where we have assumed $m_S = m_L$ (i.e., that the models do not differ in the volume of spacetime available to normal observers). The last line follows for large $n$.
We can neglect $n_S$ for $n_S \ll n_L$. Then there are two interesting limits. If $p_f n_L$ is small, then freak observers are scarce, and the "first-person" ratio $R_P$ is only slightly smaller than the "third-person" one. This is a slight preference for S models over L ones, but for $p_f n_L \ll 1$ the preference is negligible. The other limit of interest is when both models have problems with freak observers because $p_f n_K$ is large. Then each theory is deweighted by the factor $(1 - p_f)^{n_K}$, which goes to 0, but the factor for L falls much faster, strongly favoring S over L. So under the criterion of "no freaks like me," if there are no models without significant probability for freak observers, then the ones which minimize the volume for them to spawn are strongly preferred. Of course, any model which has no freak observers would, by that criterion, be preferred over those.
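A short numerical sketch of the two limits follows. It assumes, in line with the "1n, 0f" criterion, that with $m_S = m_L$ the factor for your normal existence cancels between the models, leaving the prior ratio times $(1 - p_f)^{n_L - n_S}$. This form is a reconstruction for illustration, and the function name is hypothetical.

```python
import math


def first_person_ratio(prior_ratio: float, p_freak: float, n_L: float, n_S: float) -> float:
    """Ratio of exclusive "1n, 0f" probabilities for models L and S.

    Only the "no freak copies" factors (1 - p_f)^n_K differ between the
    models, so the ratio is prior_ratio * (1 - p_f)^(n_L - n_S).
    """
    log_suppression = (n_L - n_S) * math.log1p(-p_freak)
    return prior_ratio * math.exp(log_suppression)


if __name__ == "__main__":
    # Scarce freaks: p_f * n_L << 1, so the ratio is barely below the prior ratio.
    print(first_person_ratio(1.0, 1e-9, 1e3, 1.0))
    # Plentiful freaks: p_f * n_L >> 1, so model L is suppressed enormously.
    print(first_person_ratio(1.0, 1e-3, 1e6, 1e3))
```

In the scarce limit the suppression is approximately $1 - p_f n_L$, while in the plentiful limit it is exponentially small in $p_f (n_L - n_S)$.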

VIII. GOTT ANALYSIS
J. Richard Gott III wrote about what seems to be an entirely different kind of observer selection effect [14].He argued that simply by knowing how long some finite-lifetime entity has been observed, one can bound the probability of it lasting a long time.For example, if you live at time t after the start of a civilization, his argument says that simply assuming you are a random observer implies that the probability of the civilization lasting 40t is only 1/40 or 2.5%.
There are a number of problems with this argument, as we shall see. The first is that Gott's analysis did not make use of a prior [35], which Gott then addressed [36]. This point was echoed by Carleton Caves [37], who found that the prior probability for a world having lifetime $T$ needed to obtain Gott's result is the Jeffreys prior, which goes as $1/T$. However, as we shall see, this corresponds to a Pick selection. The prior needed to obtain the probability Gott finds to Be in a civilization lasting time $T$ is not the Jeffreys prior, but a prior that goes as $1/T^2$ [38,39]. Caves argued that the analysis was also flawed because it assumed that the observer had to live only during the time span of the "world," and that once one relaxes that assumption, the effect goes away. (This is really about what set of observer moments it is reasonable for one to consider that the moment at hand is randomly drawn from. For Gott's example of the Berlin wall, one could assert that his observation of the wall was drawn randomly from possible moments during the existence of the wall when he could ponder the question of the duration of its existence rather than a random moment from his lifetime that predates and postdates the wall. It is then a question of whether that assumption is reasonable. It is certainly problematic in many cases. For example, it is hard to argue that the observer moment in which you ponder the lifetime of an architectural construction is randomly drawn from all the moments during its existence if you were born before it was built: for a long-lived construction you are necessarily seeing only its earliest moments.) But it should not be a problem in the narrow case of interest to us, where we assign probabilities for the lifetime of the world in which we were born: we are necessarily alive only during the world in which we are born, and so random observer moments in our lifetime are necessarily within the time window of the world's existence.
We will first explain the Gott argument in his notation and then ours.Then we will show how to incorporate a prior, derive results for different priors, and determine which one gives Gott's results.Then we show that Gott's results do not actually represent an OSE, and we trace the source of the effect.Finally, we consider the exclusive case, where one lifetime is picked.

A. Gott's argument
Suppose we are a random intelligent observer of some "world" of lifetime T which has existed so far for time t. We do not know T and we want to know if knowing t tells us anything about T, other than T ≥ t. Gott gives a few examples [14], but they are of two types: things on which our existence does not depend, such as the time span for which the Berlin wall existed, and things on which it does depend, such as the civilization in which we were born. We will not consider the former further, except to note that the second critique of Caves may apply to those situations. Thus, since we assume we live during the world, we can without loss of generality express Gott's quantities in terms of t and T, where we take as a precondition that t is in the range [0, T]. This world could refer to our planet (in which case t ∼ 10^9 years), the era of Homo sapiens (t ∼ 10^5 years), our civilization (t ∼ 10^4 years), or civilization since Bayesian questions like this have been asked (t ∼ 40 years). One could even try to argue that it refers to the metastable electroweak vacuum (t ∼ 10^10 years). Now, going back to the original assumption, it is not at all clear that we qualify as a random observer in any of these "worlds," but nevertheless let us assume that we do. First, Gott argues that each value of t in the range [0, T] is equally likely. This is true if there is an equal number of observers at each time t in [0, T] and one selects them at random. (An equal number at each time is unreasonable in most cases; t and T are really better thought of as the current and final tally of observers in the world.) This can be loosely written, "P(t) = const/T." (83)
Further, this means that t/T is a random number between 0 and 1. Finally, if we sum up the probabilities for our expectation for the remaining time left for the world, T_fut ≡ T − t, then we obtain that it is overwhelmingly likely to be roughly of order t (neither much greater nor much smaller than t) [Eq. (85)]. Focusing on the upper end, the statement T > Kt is the same as t/T < 1/K, which for a uniformly distributed t/T has probability 1/K, so that more generally

$$P\big(T_{\rm fut} > (K-1)\,t\big) = P(T > Kt) = \frac{1}{K}, \tag{86}$$

where K > 1. Note that for K = 40, we get that the probability of T_fut = T − t being greater than 39t is 1/40, or 2.5%, in agreement with Eq. (85) (the upper and lower tails are equally probable). Further, note that these are scale-invariant probabilities: They depend on the ratio t/T, independent of whether the scale is decades or millennia.
Gott seemingly found a way to argue that our datum t not only tells us something about our world's eventual lifetime T but also implies that T is unlikely to be more than a few times t, no matter the scale.
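The 1/K tail in Gott's argument can be checked with a short Monte Carlo sketch. It assumes only what the argument itself assumes, namely that the fraction t/T is uniform on (0, 1); the function name `gott_tail_probability`, the number of trials, and the seed are illustrative choices, not part of the original analysis.

```python
import random

def gott_tail_probability(K, trials=200_000, seed=0):
    """Estimate P(T > K*t) when the fraction f = t/T is uniform on (0, 1).

    Only the ratio t/T enters, so no time scale has to be chosen:
    this is the scale invariance noted in the text.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        f = rng.random()        # f = t/T, uniform by Gott's assumption
        if f < 1.0 / K:         # T > K*t  is the same as  t/T < 1/K
            hits += 1
    return hits / trials

if __name__ == "__main__":
    for K in (2, 10, 40):
        print(K, gott_tail_probability(K), 1.0 / K)
```

The estimate tracks 1/K for every K because nothing but the uniform fraction is used; no prior on T appears anywhere, which is exactly the omission examined in the next subsection.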
Is this right?

B. Our argument
As usual, we have a set of observers P and a set of worlds W. As Gott does, we will for simplicity assume that the number of observers at each time is the same. We will use the compact notation outlined at the end of the Appendix, in which labels can be "null," e.g., P(T|t) ≡ P(PW_T | P_t W). Let us then define p(t) as the probability density to Be in a world at time t (for a moment lasting dt). The probability densities to Be in a world of lifetime T (one again needs a finite range [T, T + dT]) and to Pick a world of lifetime T are

$$p(T) \equiv P(PW_{[T,T+dT]})/dT \tag{89}$$

and p( | T), respectively. Note that the probability density to Be in a world is weighted as before by the total number of observers who will ever live in the world, which by assumption is proportional to T, so

$$p(T) \propto T\, p(\,|\,T). \tag{91}$$

What we are going to do is start with a prior probability density for our world having lifetime T, p( | T), and the likelihood density of being in our world at time t given that it will exist for time T, p(t|T), and we will use Bayes's theorem to calculate the posterior probability density of our world living time T given our datum t, p(T|t).
The likelihood density is, as Gott said, a constant, independent of t,

$$p(t|T) = \frac{1}{T}, \qquad t \in [0, T]. \tag{92}$$

Note that if we integrate this probability density over all values of t in [0, T], $P(0 \le t \le T\,|\,T) = \int_0^T p(t|T)\,dt$, we get 1. This is essentially the same expression as Eq. (83), which we used to express Gott's words, except that here we are explicitly writing a likelihood density conditioned on lifetime T.
The key problem with Gott's analysis is that he jumps right to a probability for t/T without a prior. Let us examine three possible priors and see which gives the results Gott found. We need the prior probability density for Picking a world of lifetime T, p( | T), because it should contain all factors other than our existence. This is parallel to what we did in the prisoner scenario, though there we needed only the probabilities P(W_S) and P(W_L), whereas here we need a function of T over its range. This brings up an important point: We need to define minimum and maximum plausible values of the lifetime T for the world we are in, T₋ and T₊, respectively. They allow us to properly normalize our expressions, but T± play a more subtle role, too, as we shall see. It must end up being the case that T₊ is greater than both t and T, and that T₋ is smaller than T, so if we really tried to define T± without any idea of the timescales involved, we might fail in that. And our expectations for the timescale might change with t. For example, today we might see T₊ = 10^6 years as reasonable, but if civilization somehow survives for a million years, then that T₊ will be too low. This is less of an issue for T₊ because we will be able to take it to infinity in our final expressions. But T₋ is trickier.
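Three reasonable choices for our prior p( | T) are constant, ∼1/T (Jeffreys), and ∼1/T^2. The normalized priors to Pick a world of lifetime T ∈ [T₋, T₊] are as follows:

$$
\begin{aligned}
\text{constant:}\quad & p(\,|\,T) = \frac{1}{T_+ - T_-},\\
\text{Jeffreys:}\quad & p(\,|\,T) = \frac{1}{T\,\ln(T_+/T_-)},\\
1/T^2:\quad & p(\,|\,T) = \frac{T_- T_+}{(T_+ - T_-)\,T^2},
\end{aligned}
\tag{93}
$$

which lead to corresponding probability densities to Be in such a world [again assuming the number of observers at each time is constant and Eq. (91)]:

$$
\begin{aligned}
\text{constant:}\quad & p(T) = \frac{2T}{T_+^2 - T_-^2},\\
\text{Jeffreys:}\quad & p(T) = \frac{1}{T_+ - T_-},\\
1/T^2:\quad & p(T) = \frac{1}{T\,\ln(T_+/T_-)}.
\end{aligned}
\tag{94}
$$

Next we plug the likelihood density p(t|T) in Eq. (92) and our Be priors p(T) in Eq. (94) into Bayes's theorem,

$$p(T|t) = \frac{p(t|T)\,p(T)}{p(t)}. \tag{95}$$

We can calculate p(t) by integrating p(t|T)p(T)dT over T. We need to be a little careful about the limits of integration because we have defined T ≥ t and T ≥ T₋, but at the moment it is ambiguous whether t is greater than T₋ or not. So let us define the lower limit on T to be the maximum of the two: t_m ≡ max(t, T₋). For the three different priors, we obtain three posterior probability densities for T ∈ [t_m, T₊]:

$$
\begin{aligned}
\text{constant:}\quad & p(T|t) = \frac{1}{T_+ - t_m},\\
\text{Jeffreys:}\quad & p(T|t) = \frac{1}{T\,\ln(T_+/t_m)},\\
1/T^2:\quad & p(T|t) = \frac{t_m T_+}{(T_+ - t_m)\,T^2} \;\to\; \frac{t_m}{T^2},
\end{aligned}
\tag{96}
$$

where the arrow denotes the limit T₊ → ∞. Note that these are the same expressions as the priors in Eq. (93) with T₋ replaced by t_m. In other words, the only effect of the datum here is the trivial replacement of the lower bound on T because it is necessarily at least equal to t. So if we quantify the OSE by taking the ratio of the posterior to the prior, R_T ≡ p(T|t)/p( | T), then we obtain for the three priors,

$$
\begin{aligned}
\text{constant:}\quad & R_T = \frac{T_+ - T_-}{T_+ - t_m} \;\to\; 1,\\
\text{Jeffreys:}\quad & R_T = \frac{\ln(T_+/T_-)}{\ln(T_+/t_m)} \;\to\; 1,\\
1/T^2:\quad & R_T = \frac{t_m\,(T_+ - T_-)}{T_-\,(T_+ - t_m)} \;\to\; \frac{t_m}{T_-},
\end{aligned}
$$

where again the right-hand side is for T₊ → ∞. In that limit, the first two priors yield R_T = 1 even if we include the replacement effect of T₋ → t_m. To evaluate the third prior, we need to discuss the value of t_m. There are three possible cases:
(i) t < T₋, so t_m = T₋, and our lower bound on T does not increase.
(ii) t = T₋, so t_m = t = T₋, and our lower bound on T does not increase.
(iii) t > T₋, so t_m = t, and our lower bound on T does increase.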
The first case means that prior to using our datum t we assumed that the minimum value of T was larger than t, asserting that there is zero probability for our world to end between now, t, and T₋. The third case means that prior to taking note of t, we thought that the lower bound on T was T₋, and so the datum updates our knowledge, raising that lower bound, yet somehow we are still confident in our prior assumed probability density despite being wrong about its endpoint. The second case strikes us as the most reasonable, because we should already know that T₋ ≥ t is possible only by assumption: we know that the world has lasted at least t and cannot know that T₋ > t, so we should assume T₋ = t. Nevertheless, let us consider all three cases.
For the first two cases, t ≤ T₋, all three priors lead to R_T = 1. For t > T₋ and the 1/T^2 prior, R_T = t/T₋, which is >1. This is an upward shift due to the fact that the posterior probability density is nonzero over a smaller range, [t, T₊], than the prior probability density, [T₋, T₊]. We will call this a "boundary condition OSE." It is not due to the number of elements in the set of observers, P, as in the OSEs we considered previously. Rather, it is simply due to raising the lower bound on T from T₋ to t.
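The behavior of R_T described above can be reproduced numerically. The sketch below is a hedged illustration, not the authors' calculation: it assumes the Pick priors are normalized on [T₋, T₊], that the Be prior is the Pick prior weighted by T, and that the likelihood is 1/T. The names `pick_prior`, `integrate_log`, `be_posterior`, and `ose_ratio`, as well as the sample values of t, T, T₋, and T₊, are hypothetical.

```python
import math

def pick_prior(kind, T, T_lo, T_hi):
    """Normalized Pick prior density on [T_lo, T_hi]."""
    if kind == "const":
        return 1.0 / (T_hi - T_lo)
    if kind == "jeffreys":                      # falls as 1/T
        return 1.0 / (T * math.log(T_hi / T_lo))
    if kind == "inv_sq":                        # falls as 1/T^2
        return T_lo * T_hi / ((T_hi - T_lo) * T * T)
    raise ValueError(kind)

def integrate_log(f, a, b, n=20000):
    """Midpoint rule on a logarithmic grid (suits power-law integrands)."""
    la, lb = math.log(a), math.log(b)
    h = (lb - la) / n
    total = 0.0
    for i in range(n):
        x = math.exp(la + (i + 0.5) * h)
        total += f(x) * x * h                   # dT = T d(ln T)
    return total

def be_posterior(kind, T, t, T_lo, T_hi):
    """Posterior density p(T|t) on [t_m, T_hi], normalized numerically."""
    t_m = max(t, T_lo)
    if not (t_m <= T <= T_hi):
        return 0.0

    def unnormalized(x):
        likelihood = 1.0 / x                              # p(t|T), uniform in t
        be_prior = x * pick_prior(kind, x, T_lo, T_hi)    # observer weighting by T
        return likelihood * be_prior

    return unnormalized(T) / integrate_log(unnormalized, t_m, T_hi)

def ose_ratio(kind, T, t, T_lo, T_hi):
    """R_T: posterior density to Be divided by prior density to Pick."""
    return be_posterior(kind, T, t, T_lo, T_hi) / pick_prior(kind, T, T_lo, T_hi)

if __name__ == "__main__":
    for kind in ("const", "jeffreys", "inv_sq"):
        print(kind, ose_ratio(kind, 10.0, 3.0, 1.0, 1e6))
```

The factor of T from the observer weighting cancels the 1/T of the likelihood, which is why the posterior has the same shape as the Pick prior. With a finite T₊ the Jeffreys ratio approaches 1 only logarithmically, so it is still visibly above 1 for the cutoff used here.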
So, given that there is at best only a boundary condition OSE here, can we reproduce Gott's result? We can. To compare to Gott's result, we have to integrate these functions of T from Kt to T₊ for fixed t (and assume Kt ∈ [T₋, T₊]). This yields probabilities for T in the range of Kt to T₊:

$$
\begin{aligned}
\text{constant:}\quad & P(T > Kt\,|\,t) = \frac{T_+ - Kt}{T_+ - t_m} \;\to\; 1,\\
\text{Jeffreys:}\quad & P(T > Kt\,|\,t) = \frac{\ln(T_+/Kt)}{\ln(T_+/t_m)} \;\to\; 1,\\
1/T^2:\quad & P(T > Kt\,|\,t) = \frac{t_m\,(T_+ - Kt)}{Kt\,(T_+ - t_m)} \;\to\; \frac{t_m}{Kt},
\end{aligned}
\tag{99}
$$

where we again take the limit that T₊ → ∞. We see that for the constant and Jeffreys priors, the probability of T > Kt goes to 1. This is not surprising; if we assume the maximum on T is much greater than Kt, then the probability that T > Kt approaches 1, unless our prior falls very fast. For the prior p( | T) ∼ 1/T^2 it does fall fast enough. If t ≥ T₋, then t_m = t so that

$$P(T > Kt\,|\,t) \to \frac{1}{K}, \tag{100}$$

and we have obtained Gott's expression in Eq. (86). (For t < T₋, this integrated probability is larger. We shall see what that means shortly.) So even though there is only a boundary-condition OSE, we have reproduced the result of Gott, seemingly disfavoring long-term worlds. How is that possible?
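As a hedged check of the integrated probabilities discussed above, the sketch below evaluates P(T > Kt | t) in closed form for the three priors at a large but finite cutoff. The closed forms assume posteriors of the same shape as the Pick priors with lower bound t_m = max(t, T₋); the function name `be_tail_probability` and the numerical values are illustrative only.

```python
import math

def be_tail_probability(kind, K, t, T_lo, T_hi):
    """P(T > K*t | t) in the Be case for a posterior supported on [t_m, T_hi]."""
    t_m = max(t, T_lo)
    a = max(K * t, t_m)          # integration starts no lower than t_m
    if kind == "const":          # posterior flat in T
        return (T_hi - a) / (T_hi - t_m)
    if kind == "jeffreys":       # posterior falls as 1/T
        return math.log(T_hi / a) / math.log(T_hi / t_m)
    if kind == "inv_sq":         # posterior falls as 1/T^2
        return t_m * (T_hi - a) / (a * (T_hi - t_m))
    raise ValueError(kind)

if __name__ == "__main__":
    K, T_hi = 40, 1e12
    for kind in ("const", "jeffreys", "inv_sq"):
        print(kind, be_tail_probability(kind, K, t=1.0, T_lo=1.0, T_hi=T_hi))
    # Case t < T_lo: the 1/T^2 tail is larger than 1/K.
    print("inv_sq, t < T_lo:", be_tail_probability("inv_sq", K, 0.5, 1.0, T_hi))
```

Only the 1/T^2 case settles at 1/K. The Jeffreys case creeps toward 1 logarithmically in the cutoff, so at any finite T₊ it sits noticeably below 1.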

C. Why does Gott seem to find an OSE?
To answer this, consider the situation before we know datum t and where we Pick a world at random. We know by assumption that with probability 1, T ∈ [T₋, T₊] (integrate p( | T) from T₋ to T₊ and we get 1). Suppose we ask what the probability is for this world to last K times its minimum, i.e., for T > KT₋. We simply integrate p( | T) from KT₋ to T₊. For the 1/T^2 prior this gives

$$\int_{KT_-}^{T_+} p(\,|\,T)\,dT = \frac{T_+ - KT_-}{K\,(T_+ - T_-)}. \tag{101}$$

For fixed KT₋ and T₊ → ∞ this gives 1/K. In other words, the effect that Gott found has nothing to do with the datum t but just the rapidly falling prior to which his result corresponds.
Still, it is useful to define a metric which manifestly shows that there is no OSE. For that, let us define the ratio of probability densities integrated over T. Dividing Eq. (99) by Eq. (101) we see that for the 1/T^2 prior,

$$\frac{\int_{Kt}^{T_+} p(T|t)\,dT}{\int_{KT_-}^{T_+} p(\,|\,T)\,dT} \;\to\; \frac{t_m}{t} \tag{102}$$

as T₊ → ∞. For t ≥ T₋, the cases where we obtained Gott's result, we see that this equals 1: the posterior probability is the same as we obtained using the prior lower bound, and there is no OSE of any kind. For the case t < T₋ this ratio, T₋/t, is larger than 1 [note that the right-hand side cannot exceed K, since we assumed Kt ≥ T₋]. What that means is that from our prior, we assumed that large-T worlds were disfavored, but on learning that t < T₋, our expectation is less negative due to not having reached the lower bound in the world's lifetime, T₋. So in the inclusive case, there is no 1/T OSE. For a fast-falling prior we can obtain Gott's 1/K result, but it is not an OSE either, just a manifestation of the fast-falling prior we assumed. The only OSE that remains in any of these cases arises if we assumed a fast-falling 1/T^2 prior, thinking that worlds with T > KT₋ were very unlikely, but then found out that t < T₋, making our posterior probability less dire than our prior.
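The point that the 1/K factor already sits in the prior can be made concrete with a small sketch for the 1/T^2 prior. It assumes the prior is normalized on [T₋, T₊] and that the posterior has the same 1/T^2 shape with lower bound t_m = max(t, T₋); the function names and the sample values are hypothetical.

```python
def prior_tail(K, T_lo, T_hi):
    """P(T > K*T_lo) from the 1/T^2 Pick prior alone, before any datum t."""
    return (T_hi - K * T_lo) / (K * (T_hi - T_lo))

def posterior_tail(K, t, T_lo, T_hi):
    """P(T > K*t | t) for the 1/T^2 prior in the Be case."""
    t_m = max(t, T_lo)
    a = max(K * t, t_m)
    return t_m * (T_hi - a) / (a * (T_hi - t_m))

def integrated_ratio(K, t, T_lo, T_hi):
    """Ratio of the integrated posterior to the integrated prior."""
    return posterior_tail(K, t, T_lo, T_hi) / prior_tail(K, T_lo, T_hi)

if __name__ == "__main__":
    K, T_lo, T_hi = 40, 1.0, 1e12
    print("prior tail:", prior_tail(K, T_lo, T_hi))
    for t in (0.5, 1.0, 2.0):
        print("t =", t, "ratio =", integrated_ratio(K, t, T_lo, T_hi))
```

For t at or above T₋ the ratio stays at 1, so the datum adds nothing beyond the prior; for t below T₋ it rises to T₋/t.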

D. Picking hypothesis T
Suppose that instead of Being in a set of worlds of various lifetimes T, we assert that there is precisely one world, with one future and one lifetime T*, and we have a set of hypotheses T for what T* is. This is an exclusive case, and we are interested in the corresponding posterior probability density. The key difference from our analysis above is that the prior that goes into Bayes's theorem is the Pick probability density p( | T) instead of the Be probability density p(T) [and the denominator changes correspondingly]. The likelihood is not affected, as in the warden case, because the Pick is neutered. The upshot is that the posterior probabilities go as ∼1/T times those in the Be case in Eq. (96), which means there is an OSE for this pick-a-hypothesis-T*. Specifically, the ratio of the posterior to the prior, R_|T, falls as 1/T. But as with R_T, R_|T is not an ideal metric of OSE, so we should consider the probabilities resulting from integrating over T from Kt to T₊. In the limit T₊ → ∞, this integrated probability goes to 1 for the constant prior and to t_m/(Kt) for the 1/T prior, and so we obtain the same 1/K expression as Gott, now for the 1/T prior and T₋ = t [the expression is the same as the Gott case, but his description of the problem seems like a Be and thus corresponds to Eq. (100)].
As we did in the Be case, we define an OSE metric as the ratio of integrated probability densities, which in the limit T₊ → ∞ yields, for the two priors we consider here, 1 for the constant prior and t_m/(Kt) for the 1/T prior. What this means is that there is a true OSE in the t ≥ T₋ Pick case for the 1/T prior, which manifests itself as a factor of 1/K in that ratio of the integrated posterior to prior probability densities. In other words, the posterior probability density falls with T faster than the prior probability density due to an OSE, which manifests itself in this ratio being smaller than one. If t < T₋, then this is mitigated by the T₋/t factor and is completely erased if Kt < T₋, yielding a ratio of 1.
For the constant prior case, there is an OSE in the ratio of probability densities (R_|T ∼ 1/T), but it is washed out when one integrates over T (the posterior probability density falls faster with T than the prior probability density, but both fall slowly enough that their integrated probabilities go to 1, hence their ratio is also 1).
So in the exclusive case there is a real OSE, but only if the prior falls fast enough and t is not much less than T₋.
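A hedged numerical sketch of the exclusive (Pick) case follows. It assumes the posterior is proportional to the 1/T likelihood times the Pick prior itself, with no observer weighting, on the range [t_m, T₊]; the function names and parameter values are illustrative.

```python
import math

def pick_posterior_tail(kind, K, t, T_lo, T_hi):
    """P(T > K*t | t) when the posterior is (1/T) times the Pick prior."""
    t_m = max(t, T_lo)
    a = max(K * t, t_m)              # tail is 1 if K*t lies below t_m
    if kind == "const":              # posterior falls as 1/T
        return math.log(T_hi / a) / math.log(T_hi / t_m)
    if kind == "jeffreys":           # posterior falls as 1/T^2
        return t_m * (T_hi - a) / (a * (T_hi - t_m))
    raise ValueError(kind)

def pick_prior_tail(kind, K, T_lo, T_hi):
    """P(T > K*T_lo) from the Pick prior alone."""
    a = K * T_lo
    if kind == "const":
        return (T_hi - a) / (T_hi - T_lo)
    if kind == "jeffreys":
        return math.log(T_hi / a) / math.log(T_hi / T_lo)
    raise ValueError(kind)

def pick_integrated_ratio(kind, K, t, T_lo, T_hi):
    """Integrated posterior over integrated prior in the Pick case."""
    return (pick_posterior_tail(kind, K, t, T_lo, T_hi)
            / pick_prior_tail(kind, K, T_lo, T_hi))

if __name__ == "__main__":
    K = 40
    for T_hi in (1e12, 1e100, 1e300):
        print(T_hi, pick_integrated_ratio("jeffreys", K, 1.0, 1.0, T_hi))
```

With the 1/T prior and T₋ = t the posterior tail is 1/K at any large cutoff, while the prior tail approaches 1 only logarithmically; the ratio therefore drifts down toward 1/K as T₊ grows, which is the surviving OSE described above.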

IX. DOOMSDAY ARGUMENT
We are now finally ready to discuss the doomsday argument. The question is: Do observer selection effects increase the probability that our world will be short-lived? First, this is a very strange thing to ask. It would entail laying out all the factors which we might use to assign a probability for the world ending soon and separating out the datum of what year it is. But all of the factors are intertwined. For the purpose of the argument below, we need to assume that we can put all factors (e.g., our estimate for the probability of nuclear war) other than that datum into some prior. This is somewhat unreasonable because such a calculation usually depends on temporal information (e.g., the survival probability per year was surely lower in the early days of nuclear weapons than at other times). In any case, we make this assumption for the arguments below.
As it is usually stated, the question is whether the probability that we live in a short-lived world (world type S) or a long-lived one (world type L) is changed given the information about the date (datum d). Clearly this is a Be selection: we are born in this world without the need for that world to be picked. So the zeroth-order analysis is that the case is like our very first example, the prisoner problem, where there was no OSE and thus no doomsday effect. The posterior probability of being in a short-lived world is just given by Eq. (6) and equals the prior probability of picking such a world, so that the ratio of posterior probabilities to their priors, R_P/W, is 1. But we need to be careful just what our assumptions are regarding any larger sets in which PW are embedded. For example, if we treat the world types as mutually exclusive hypotheses for short-lived (S) and long-lived (L) worlds, then there is a Pick at that level and there is an OSE akin to that in Eq. (10). Note that here we are saying that either hypothesis S or hypothesis L is realized but not both. This is reasonable only if one assumes that there is only one relevant planet (the Earth) because there are no relevant exoplanets (we are not asking about the inhabitants of inhabitable worlds, just of the Earth), nor copies of the Earth, nor multiple futures of this one Earth (in a partial or complete multiverse of some sort, such as in unitary quantum mechanics). Again, there is an OSE given these assumptions because we are saying that there are multiple hypotheses (S and L), but only one of them can be realized.
This also assumes that we are typical observers. This, too, can depend on assumptions or on how the problem is stated. For example, saying that you are equally likely to be any human throughout history fails to take into account the fact that only a tiny fraction of humans throughout history might have asked the doomsday question, at least as stated. For example, humans before 1763 could not have phrased a question in terms of Bayes's theorem [40], and the question "Will our civilization last until the year 2500?" will become moot in 500 years. Similarly, the question "Will our civilization last another 100 years?" changes character as the centuries we survive accrue, since a century becomes a smaller and smaller fraction of the civilization's lifetime. We need to phrase the question in such a way that it would be just as reasonable for a current and a future inhabitant of the civilization to ask it.
We argue that the question framed by Gott is actually best, because "Will our world last K times its present age?" is somewhat timescale invariant. There are still issues with assigning a starting point for the world and a prior probability density for a world of lifetime T, p( | T) (e.g., neglecting the problem of lumping all other factors into the prior in a time-independent way), but at least it is reasonable for future observers to ask that same question. So, to be specific, we should ask whether the current age of our world, t, should affect our estimate for the lifetime of the world, T. As we discussed in Sec. VIII, the selection in PW is a Be, and there is just a boundary condition OSE: the effect of replacing the lower bound on T, T₋, with t, for t > T₋. We further argued that it is not reasonable to have chosen T₋ either greater or smaller than t, and that for T₋ = t, the prior and posterior probability densities are equal, so there is no OSE at all. We then integrate these probability densities over T to obtain the probability of Being in a world with T > Kt given t. As we said in Sec. VIII, this goes to 1 unless the prior falls quickly; see Eq. (99). Even in the case of such a fast-falling prior, the 1/K effect is not an OSE but just an artifact of that prior. We quantified that by taking the ratio of integrated probabilities in Eq. (102), which shows that there is no OSE at all in the Be case.
Is there any somewhat reasonable set of assumptions which leads to a doomsday effect? Yes. If we assert, as we did in Sec. VIII D, that there is a unique lifetime for the world, T*, and we have hypotheses T for what that T* is, then there is a Pick on the nested set, and there is an OSE given by Eq. (105). But even then, if we choose a constant prior probability density p( | T), then the posterior probability that the world will last K times longer than it has so far goes to 1, as in Eq. (107). However, if we start with a 1/T prior, then the OSE is not washed out in Eq. (107), and the OSE survives in the ratio of integrated probabilities, Eq. (109). This is a doomsday effect. It says that given the assumptions above, even if we include our timescale in setting the minimum lifetime (T₋ = t), integrate our probability densities over T, and normalize to that integrated probability for the prior, there is an OSE in the Pick case for a falling prior: our datum t, by itself, should cause us to reduce our posterior probability that our world will last substantially longer than it has. So, in summary, there can be a doomsday effect, but to have one requires a set of assumptions like this:
(i) All factors other than the current age of the world, t, can be separated out into a prior, which is a simple function of the world's lifetime T.
(ii) You are typical of observers throughout the lifetime of the world, including in what question is being asked.
(iii) There is exactly one true value of the lifetime, T*, because you consider only one world with one fixed future, so you view the values of T to be mutually exclusive hypotheses for the value of T*, resulting in a Pick. It is not enough to assume an exclusiverse; it has to be a universe with only one manifestation of the world so that there is only one true lifetime T*.
(iv) The prior probability density falls as a function of T so that the integration over T does not wash out the OSE.
Absent a set of assumptions like these, there is no doomsday effect. All of these strike us as somewhat unreasonable, except the last. Thus, one can probably not argue that our "world," be it the era of Bayesian reasoning or of the stable electroweak vacuum, is doomed to end soon on the basis of datum t.

X. UNIVERSAL DOOMSDAY ARGUMENT
In addition to the doomsday argument, which concerns our world, some authors have discussed a "universal doomsday" argument [10,11], which says that not only does our datum imply that our world is doomed to die sooner than our priors for its lifetime would suggest, due to some OSE, but that all worlds are also doomed to die out sooner due to our datum. Some authors argue that "universal doomsday" can occur even when the doomsday effect is not present. This cannot be. If there is a doomsday effect due to a temporal datum, that lowered posterior probability can affect our posterior probability for the lifetimes of other worlds, but it should be clear that if there is no doomsday effect, if we gain no information from our datum about our own world, then our posteriors for other worlds must be unchanged as well.
What we are interested in is how the datum affects an ensemble of worlds, E, as we considered in the inclusive and exclusive cases of Secs. IV and V. In particular, we need the posterior probability densities for ensembles of type y given datum d: p(y|d) in the inclusive case, where there is no doomsday effect, and p( | y|d | ) in the exclusive case, where there can be one. We ask whether these differ from the prior probability density for y, p( | y). Universal doomsday is the claim that they do. If the probability distribution function for y changes, then so does our estimate for the average fraction y of worlds of type S. Our prior estimate is the average of y weighted by the prior p( | y) [Eq. (121)]. After taking our datum d into account, our posterior estimates for that average in the inclusive and exclusive cases are weighted by the posterior probability distribution functions p(y|d) and p( | y|d | ), respectively. For reasons that will become clear in a moment, let us define metrics for universal doomsday as the ratios of these posterior averages to the prior average, where the " | " is there in the exclusive case but not the inclusive case.
It turns out we have already come across these averages. The prior average fraction y in Eq. (121) is equal to the prior probability of worlds of type S. Note that if we assume that N_y/N = 1, i.e., that the ensembles differ only by the fraction y of worlds of type S, not by their number, then p(W_S | E) = p(W_S E), so that this is the prior probability of worlds of type S in both the exclusive and inclusive cases. What about the posterior averages? They turn out to be simply equal to the posterior probabilities for being in an S world, given datum d, in the inclusive and exclusive cases, respectively. Thus, we see that the metrics for universal doomsday are exactly the same as for doomsday, and in the inclusive case R^E_P/W = 1 for both doomsday and universal doomsday. So one cannot have one without the other. For the exclusive case, the posterior average differs from the prior average and the metric differs from 1, but the values of these metrics for universal doomsday and doomsday are the same. There is a fundamental reason for this: Any doomsday effect, from our data on being in a world selected from ensemble E, can be written as a universal doomsday change in our weighting of the ensemble, i.e., taking p( | y) → p(( | )y|d( | )). So universal doomsday and doomsday are two different ways of expressing the same effect or lack thereof.

XI. SLEEPING BEAUTY PROBLEM
Let us apply what we have learned to an observer thought experiment called the Sleeping Beauty problem [41], which has generated disagreement to the point that philosophers have separated into two camps called "halfers" [42-44] and "thirders" [41,45-47]: Suppose Sleeping Beauty is put to sleep on Sunday. She is woken on Monday, questioned, then put back to sleep, and all her memories of that day are deleted. A fair coin is flipped. If it lands tails, then she is also woken on Tuesday and again questioned, put back to sleep, and her memory deleted. If it lands heads, then she is not woken on Tuesday. In either case she awakes on Wednesday after the experiment concludes. Beauty is aware of all of the above. She is asked each time she is woken for the probability that the coin flip results in "heads."
So-called halfers argue that she should answer "1/2" (each time) because it is a fair coin and she learns nothing from being awakened, and the question is the same as "What is the probability you are in a heads world?" (i.e., a world where the coin landed heads). So-called thirders argue that she should say "1/3" because there is one observer moment associated with a heads flip, which we will call Mon-H, and there are two associated with tails, Mon-T and Tue-T, and the question is effectively the same as "What is the probability you are in a heads observer moment?" There are a number of other papers advocating one side or the other, but none of them specify whether the situation corresponds to inclusive or exclusive selection, which we will see is key. A number of authors assume the SIA, which as we have pointed out is an unfortunate kludge that leads to the presumptuous philosopher problem. All authors seem to argue that if Beauty learns that it is Monday, her estimate for "heads" should go up. As we will see, that is not always true. There are also arguments about what wagers she should be willing to accept and whether that reasoning should affect her probability estimate, which we address at the end of the section.
For our formalism, we need two sets. We need a set of worlds, W = {W_H, W_T}, in which the coin came up H or T. For a fair coin, the probability of picking each world is the same: P(W_H) = P(W_T) = 1/2. Nested inside W is the set, P, of Sleeping Beauty observer moments, P = P_Mon,H ∪ P_Mon,T ∪ P_Tue,T = {Mon-H, Mon-T, Tue-T}, where the first element belongs to P_H (which is nested in W_H) and the other two to P_T (nested in W_T). If Beauty does not know the day, then all three of these observer moments are indistinguishable to her.
First, let us look at Beauty's viewpoint within the inclusive case. The probability that she should assign for the coin coming up heads within the world associated with her observer moment is given by the Be probability for a heads observer moment, which is 1/3. That is, in the inclusive case "she" is in all three observer moments, only one of which is a heads observer moment.
If she learns the day is Monday, then the set of observer moments is [P_Mon] instead of P, and her probability for "heads" increases to 1/2 because "she" could be in only two Monday observer moments [Eq. (131)]. Thus, in the inclusive case, learning that it is Monday does increase her probability estimate that the coin came up heads, and both of these probabilities correspond to those of the thirder camp.
Next, let us look at Beauty's viewpoint with exclusive selection. If she does not know the day, then her probability estimate is the same as that of an outside observer, such as the coin flipper, where a single world (coin flip) result is Picked first, and it equals 1/2. In other words, if she assumes there is one world, it has a 1/2 chance of being an H world, and her being awake in an observer moment and not knowing the day brings her no new information. This is the halfer point of view. Now suppose she learns it is a Monday. One might think that this information should increase her credence in "heads." And in fact, if you were to Pick a single recording of a random day in the experiment (Mon-H in an H world, Mon-T or Tue-T in a T world), and the recording turned out to be from a Monday, you should increase your credence that the coin came up heads, P("Record Picked = Mon-H" | "Mon"), as the halfer camp claims. But that is not what Beauty does. Instead, if the coin comes up tails, then she experiences both Mon-T and Tue-T, so the fact that one of them is on a Monday adds no new information. In our formalism, the way to see this is that the set of observer moments is [P_Mon] instead of P, and her estimate for the probability of heads is just 1/2 [Eq. (134)]. So if Beauty assumes exclusive selection, learning that it is Monday does not increase her credence that she is in an H world because she is sure to experience a Monday whatever the coin flip. (The reader might note that if Beauty learns that it is a Tuesday, then she should assign zero probability to H, but that fact does not affect her probability for H in the case where she learns it is Monday because in a tails world "she" experiences both days.) This is good, because if she knows it is a Monday, then the amnesia drug is irrelevant, and the situation is the same as asking anyone what the odds are that a fair coin will come up heads. There had better be no difference between inclusive and exclusive selection: They both conclude that the probability of heads is 1/2, as they do in Eqs. (131) and (134).
Now, it is interesting to consider what happens if we run the experiment multiple times, once a week for w weeks. We will assume she does not know the day, so the amnesia drug does matter. If Beauty knows the week, then she can treat each of the w experiments like a copy of the original experiment, and she should come to the thirder (halfer) probability in the inclusive (exclusive) case. If she does not know the week, then the inclusive probability is unchanged, but something interesting happens in the exclusive case: We get the result Nick Bostrom calls a "hybrid model" [48].
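The single-experiment credences derived above can be tabulated by direct enumeration. This is a minimal sketch under one reading of the text: inclusive selection weights each world by the number of observer moments compatible with what Beauty knows, exclusive selection only asks whether the realized world contains such a moment, and the "recording" case picks one record at random from the realized world. The function names are hypothetical.

```python
from fractions import Fraction

P_WORLD = {"H": Fraction(1, 2), "T": Fraction(1, 2)}     # fair coin
MOMENTS = {"H": ["Mon-H"], "T": ["Mon-T", "Tue-T"]}      # moments nested in each world

def _compatible(world, day):
    """Observer moments in `world` consistent with knowing `day` (None = unknown)."""
    return [m for m in MOMENTS[world] if day is None or m.startswith(day)]

def inclusive_heads(day=None):
    """Be: "she" is in every compatible observer moment, so count them."""
    weights = {w: P_WORLD[w] * len(_compatible(w, day)) for w in MOMENTS}
    return weights["H"] / sum(weights.values())

def exclusive_heads(day=None):
    """Pick: one world is realized; only ask if it holds a compatible moment."""
    weights = {w: P_WORLD[w] * (1 if _compatible(w, day) else 0) for w in MOMENTS}
    return weights["H"] / sum(weights.values())

def picked_record_heads(day):
    """Outside observer: coin picks the world, then one record is drawn at random."""
    weights = {w: P_WORLD[w] * Fraction(len(_compatible(w, day)), len(MOMENTS[w]))
               for w in MOMENTS}
    return weights["H"] / sum(weights.values())

if __name__ == "__main__":
    print("inclusive, day unknown:", inclusive_heads())
    print("inclusive, Monday:     ", inclusive_heads("Mon"))
    print("exclusive, day unknown:", exclusive_heads())
    print("exclusive, Monday:     ", exclusive_heads("Mon"))
    print("picked record, Monday: ", picked_record_heads("Mon"))
```

Exact fractions are used so that the thirder and halfer values appear without rounding. The picked-record function is included only to contrast an outside observer's single draw with Beauty's own position in the exclusive case.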
In this exclusive situation, there is one fixed set of coin flips F = {F 1 , F 2 , . . ., F w } which actually occurs.The set of worlds can be broken into 2w subsets specifying exactly one flip, such as W F 1 , where the coin in week 1 came up heads for F 1 = H 1 and tails for F 1 = T 1 , and we do not specify what happened in the other weeks.We can also break W down into subsets with the flips in multiple weeks specified, including the 2 w subsets where they are all specified: W F 1 F 2 ...F w .There is a third way to partition the set W , by the total number of heads, h, in set F , W h .If w = 1, then we have P(P | W H 1 ) = 1/2 because she is in either W H 1 or W T 1 with equal probability.But, if w > 1, although she reasons she can experience exactly one sequence of coin flips, e.g., {H 1 , T 2 }, then she also reasons that in that world she should lump observer moment Mon 1 -H 1 with Mon 2 -T 2 and Tue 2 -T 2 , since she has no way to tell them apart.
So for sequences with half the flips heads, h = w/2, she will come up with a probability of 1/3 for the coin having been heads in a given observer moment.For a sequence with a total of h heads out of w flips, the probability of her being in a heads observer moment is h/[h + 2(w − h)].Thus she just needs to weight this probability by the probability that the sequence that occurs has h heads, P(W h ), which is 1  2 w ( w h ): For w = 1, this is 1/2, for w = 2 it is 5/12, which is midway between 1/2 and 1/3, and for w = 10, the probability of heads drops to about 0. Let us consider what happens if we ask Beauty to wager on whether the coin will come up heads or tails.Can she distinguish whether she is in a reality that corresponds to the inclusive or exclusive case?The answer is no, because they lead to the same result, though for seemingly different reasons.Suppose she is offered x:1 odds that the coin landed heads.We will consider the cases where she bets at every awakening, or only on Mondays.First, consider how Beauty would see the situation on Wednesday, after the experiment is over.Whether she is in the inclusive or exclusive case, she calculates that she has a 1/2 chance of being in a world where the coin came up heads and she won x on Monday and a 1/2 chance of being in a world where the coin came up tails and she lost 1 on both Monday and Tuesday, so she calculates her average winnings to be Thus, she will break even ( = 0) if she is given 2:1 odds.
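The hybrid-model weighting described above is easy to evaluate exactly. The sketch below sums the per-sequence heads fraction h/[h + 2(w − h)] over the binomial distribution of h for a fair coin; the function name `hybrid_heads` is an illustrative choice.

```python
from fractions import Fraction
from math import comb

def hybrid_heads(w):
    """Exclusive-case probability of a heads observer moment over w unknown weeks.

    Each sequence with h heads has h heads moments out of h + 2*(w - h)
    observer moments, and occurs with binomial probability C(w, h) / 2**w.
    """
    total = Fraction(0)
    for h in range(w + 1):
        p_h = Fraction(comb(w, h), 2 ** w)            # P(W_h) for a fair coin
        total += p_h * Fraction(h, h + 2 * (w - h))   # heads fraction in that sequence
    return total

if __name__ == "__main__":
    for w in (1, 2, 10, 100):
        value = hybrid_heads(w)
        print(w, value if w <= 2 else float(value))
```

As w grows the binomial distribution concentrates near h = w/2, where the heads fraction is 1/3, so the hybrid value moves from the halfer answer toward the thirder one.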
If the betting occurs only on Mondays, then, whether she is in the inclusive or exclusive case, she calculates that she has a 1/2 chance of being in a world where the coin came up heads and she won $x$ on Monday and a 1/2 chance of being in a world where the coin came up tails and she lost 1 on Monday. Thus Beauty after the experiment calculates her average winnings on Mondays to be

$$\langle \$ \rangle_{\rm Mon} = \tfrac{1}{2}x + \tfrac{1}{2}(-1) = \tfrac{1}{2}(x - 1), \tag{137}$$

and she will break even ($\langle \$ \rangle_{\rm Mon} = 0$) on Monday bets if she is given even money, 1:1 odds.

How can her winnings be the same for the inclusive or exclusive case when her credence for heads differs for them (if she does not know the day)? If she assumes she is in the exclusive case, then her reasoning during the experiment is exactly the same as afterward. She has a 1/2 chance of being in a world where the coin comes up heads and she wins $x$ on Monday and a 1/2 chance of being in a world where the coin comes up tails and she loses 1 on both Monday and Tuesday. Thus she calculates her winnings for betting each day [on Mondays] to be Eq. (136) [Eq. (137)]. The exclusive case and Wednesday results are the same because they both refer to head and tail worlds.
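If she assumes she is in an inclusive case, then "she" is in all three of the observer moments, {Mon-H, Mon-T, Tue-T}, and so if she bets in each, her winnings per observer moment are

$$\langle \$ \rangle^{\rm moment} = \tfrac{1}{3}x + \tfrac{1}{3}(-1) + \tfrac{1}{3}(-1) = \tfrac{1}{3}(x - 2). \tag{138}$$

If she bets only on the two Monday moments, then her winnings per observer moment are

$$\langle \$ \rangle^{\rm moment}_{\rm Mon} = \tfrac{1}{2}x + \tfrac{1}{2}(-1) = \tfrac{1}{2}(x - 1).$$

But to compare apples to apples, we need to know what she thinks the winnings per world will be, which just changes the normalization factor for Eq. (138) by the number of observer moments per world, which is 3/2: $\langle \$ \rangle = \tfrac{3}{2}\langle \$ \rangle^{\rm moment} = \tfrac{1}{2}(x - 2)$. For the Monday case, the number of observer moments and worlds is the same, so $\langle \$ \rangle_{\rm Mon} = \langle \$ \rangle^{\rm moment}_{\rm Mon} = \tfrac{1}{2}(x - 1)$, and we again get Eqs. (136)-(137).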
So an inclusive Beauty calculates the same winnings per world as an exclusive Beauty. Inclusive Beauty needs 2:1 odds to break even because she wins in only one observer moment of three. Exclusive Beauty needs 2:1 odds to break even because although she has a 1/2 probability of a heads world picked out by the coin flip, whenever she is in a tails world she loses twice. What this means is that there is no practical difference between the inclusive and exclusive cases in this thought experiment and no way to tell them apart.
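The bookkeeping in this betting argument can be checked with exact arithmetic. The following Python sketch is only an illustration of the reasoning above: the function names are ours, the payoffs are those stated in the text (win $x$ on heads, lose 1 per bet on tails), and the 3/2 and 1 conversion factors are the observer moments per world quoted above.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def winnings_per_world(x, monday_only=False):
    """Exclusive (or Wednesday) view: average over the heads and tails worlds."""
    heads_world = x                           # one winning bet, on Monday
    tails_world = -1 if monday_only else -2   # one losing bet, or two
    return HALF * heads_world + HALF * tails_world


def winnings_per_moment(x, monday_only=False):
    """Inclusive view: average over equally weighted observer moments."""
    # Payoffs in Mon-H, Mon-T and (unless betting on Mondays only) Tue-T.
    payoffs = [x, -1] if monday_only else [x, -1, -1]
    return Fraction(sum(payoffs), len(payoffs))


def moments_per_world(monday_only=False):
    """Number of betting observer moments divided by the two worlds."""
    return Fraction(2 if monday_only else 3, 2)


def break_even_odds(monday_only=False):
    """Solve winnings_per_world(x) = 0, using the fact that it is linear in x."""
    at_zero = winnings_per_world(0, monday_only)
    slope = winnings_per_world(1, monday_only) - at_zero
    return -at_zero / slope


if __name__ == "__main__":
    for monday_only in (False, True):
        for x in (1, 2, 5):
            per_world = winnings_per_world(x, monday_only)
            rescaled = moments_per_world(monday_only) * winnings_per_moment(x, monday_only)
            assert per_world == rescaled  # both views agree per world
        label = "Monday bets only" if monday_only else "bet at every awakening"
        print(label, "-> break-even odds", break_even_odds(monday_only), ": 1")
```

Under these assumptions the per-world winnings agree for every $x$ tried, which is the sense in which the two cases cannot be told apart by betting.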
The question, "What credence do you assign to heads?" has answer "1/3" if Beauty sees herself as being in all three observer moments and "1/2" if she sees herself as living in an H world or a T world. So, in the end, the only difference between inclusive Beauty (thirder position) and exclusive Beauty (halfer position) is that the former sees "herself" in all three observer moments with equal probability and the latter sees "herself" in one of two worlds with equal probability. For the halfer, the person in Mon-T and Tue-T is the same, a temporal continuation of one being, but not the same person as Mon-H because they are mutually exclusive timelines. For the thirder, all three observer moments correspond to the same person, an inclusive viewpoint. Neither of these is inherently right or wrong; it is a matter of how we define "self." We do not give an answer about which camp is "right" because they are each right given a reasonable set of assumptions. We can analyze the problem with either definition, but there is no physical difference between them, as shown by the identical betting odds for the halfer and thirder viewpoints.
Note that one can rephrase the single-run Sleeping Beauty problem as several equivalent problems, such as the sailor's child problem [49], but the answer is the same: For the inclusive case the probability is 1/3, and for the exclusive case it is 1/2, and there is no way to tell them apart with betting.
Finally, it is possible to construct a similar Gedankenexperiment where betting can distinguish between inclusive and exclusive cases. Motivated by Nick Bostrom's incubator problem [7], Scott Aaronson suggests the following scenario [16]: If a fair coin comes up heads, then Beauty H-One is cloned into existence; if tails, then Beauties T-One and T-Two are cloned into existence. If you find yourself to be one of these people, then what odds would you need to bet that the coin comes up heads? One needs to be extra careful when observers are created like this. In the exclusive case, if H, then you are H-One and you win $x$; if T, then you are either T-One or T-Two, and you lose 1, so $x = 1$: you are willing to take 1:1 odds. For the inclusive case, you need to specify your assumptions about personhood. H-One wins $x$, and T-One and T-Two each lose 1, but which of them are "you"? Here are three possibilities:
(i) You are exactly one of the three. You have 1/3 chance of winning $x$ and 2/3 chance of losing 1, so $x = 2$: you need 2:1 odds.
(ii) You are one person each world. If heads, then you are H-One; if tails, then you are one of T-One or T-Two. You have 1/2 chance of winning $x$ and 1/2 chance of losing 1, so $x = 1$: you need 1:1 odds.
(iii) You are all three. You have 1/3 chance of winning $x$ and 2/3 chance of losing 1, so $x = 2$: you need 2:1 odds.
So with the first and the third assumptions, the inclusive case differs from the exclusive one, whereas it does not for the second assumption. As we have stressed throughout this work, carefully specifying assumptions is crucial.
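The break-even odds quoted for the cloning scenario follow from a one-line condition, $p\,x - (1 - p) = 0$, where $p$ is the probability that "you" are the winner H-One. The short Python sketch below simply tabulates that condition; the dictionary keys and function name are illustrative, and the probabilities are the ones stated in the text for each assumption about personhood.

```python
from fractions import Fraction


def break_even_odds(p_win):
    """Return x such that p_win * x - (1 - p_win) * 1 = 0."""
    return (1 - p_win) / p_win


# Probability of winning x, as read off from each case in the text.
cases = {
    "exclusive": Fraction(1, 2),
    "inclusive (i): exactly one of the three": Fraction(1, 3),
    "inclusive (ii): one person per world": Fraction(1, 2),
    "inclusive (iii): all three": Fraction(1, 3),
}

odds = {name: break_even_odds(p) for name, p in cases.items()}

for name, x in odds.items():
    print(f"{name}: needs {x}:1 odds")
```

Only assumptions (i) and (iii) give odds that differ from the exclusive case, in line with the conclusion above.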

XII. HEURISTIC SUMMARY AND FUTURE DIRECTIONS
We fully recognize that some readers interested in the topic of observer selection effects are not used to as much math as we used. To that end, we provide a heuristic summary of our main results. We end by pointing to some directions in which this line of research may proceed.
Our central goal was to study the claim that there is a doomsday effect: that taking into account one's temporal location in a world leads one to conclude that the world will end sooner than one otherwise would have thought. Along the way, we built the tools needed to investigate that claim, laid out arguments about when the doomsday effect holds, and discussed related issues, such as the problems in cosmology due to Boltzmann brains.
Throughout the paper, we discussed probabilities of selecting "people" from some set P. Usually the people were the observers in the problem. The key element distinguishing whether there is an observer selection effect (OSE) or not is whether the selection is a Pick or a Be: whether one first picks a "world" that the person belongs to or whether no such picking is needed because the person just is in the world.
In Sec. II, we explored the latter via the prisoner problem. If you are a prisoner in a cell, then no one has to select that cell, cellblock, or prison for you to experience an observer moment there. You just are there. As a result, you are more likely to Be in a cellblock of type L, which has more prisoners than a cellblock of type S, and that effect exactly cancels the effect of learning rank information d, which would otherwise favor your being in a cellblock of type S (see the left half of Fig. 1).
Contrast that to Sec. III, where we considered the warden problem, where a warden has to pick a cellblock before selecting a prisoner. This is the way things usually work when not selecting observers: When the entity being selected is in an enclosing set, such as a prisoner in a cell within a cellblock, to select them one has to pick the outer set, such as the cellblock, first. The effect of this Pick is to nullify the counteracting effect, seen in the Be case, due to the number of prisoners. The result is that the rank information d does tell you that if you are picked by the warden, you are more likely to be in a cellblock of type S (see the right half of Fig. 1).
Actually, to be more precise, the issue is whether there is any selection beyond the one needed on the innermost (leftmost, in our notation) set and not whether that selection is a Be or Pick. If the selection on the leftmost set is the only one, then we call it inclusive selection. If there is a selection on one or more of the enclosing sets, then we call it exclusive selection. In most of the inclusive cases we considered, the selection of the innermost set was a Be. This is unsurprising, because in order to physically select elements of a set within some set of "worlds," one usually must pick the "world" (urn, cellblock, or civilization) first. [We did give a counterexample, the warden cafeteria problem, where the warden directly picks a prisoner in the cafeteria, circumventing the enclosing set W (the prisoners are still labeled by the "world" that they belong to, just not constrained to be selected via that world). And it is also possible to have a Be selection on a set other than the leftmost set by making P an enclosing set for some other set which the observer picks from, and then the situation will necessarily be exclusive.]

We then explored the concepts of inclusive and exclusive selection by extending our analysis of the prisoner problem to the largest physical enclosing set in the problem, which we call E. For our problem, this corresponds to a set of prisons containing various fractions of S and L cellblocks. In the inclusive case (Sec. IV), the only selection is on the leftmost set (a Be selection of set P). We then considered exclusive selection (Sec. V), where there is selection on E in addition to the Be selection on P. As in the prisoner problem, we found that there is no OSE in the inclusive case. In the exclusive case, there is an OSE, but its magnitude depends on our prior assumptions. One can find effects which range from nearly no OSE to an OSE as large as in the warden case (see Fig. 2). The larger the differential between the choices one picks from, the larger the OSE. We can generalize E to comprise "everything," a set of all possible universes. Inclusive selection then corresponds to the inclusiverse, which we also later called the complete multiverse, which simply means that we assume all possibilities are realized. Exclusive selection corresponds to an exclusiverse, where only some possibilities are realized.
Next, in Sec. VI, we added an enclosing set of theories. We tend to view theories and hypotheses as mutually exclusive: One must pick one and then analyze the resulting scenario. But that Pick introduces an OSE because now the selection is exclusive, so one should be careful not to promote coexisting possibilities to hypotheses, such as "I am in an S cellblock." Instead, one should say that there are multiple physical cellblocks, and we are in one of them with some probability for being in an S cellblock. If we really want to have coexisting hypotheses, then we would need to have inclusive selection on the set of theories, a "theoryverse" if you will. That is not as unreasonable as it seems. For example, the string landscape predicts multiple coexisting theories. Another avenue we took in this set-of-theories analysis was to ask if we can probe whether we live in the inclusiverse or an exclusiverse. It is not generally possible, because it is usually impossible to disentangle other effects. We also briefly discussed the presumptuous philosopher problem. It is not a problem for us because we do not make use of something called the self-indication assumption (SIA) and argue against its use. (We noted in several places that if we use the SIA, where a weighting factor for observers is put in by hand instead of arising naturally out of typicality and keeping track of how observers are selected, then we get the wrong answer when there is exclusive selection. The presumptuous philosopher problem is an example of this.)
Thus far, we had assumed that whatever selection was done was "typical," that is, corresponding to what one would get by random selection of a given subset of entities from a set. We relaxed that assumption and found that any atypical selection can be made typical by a simple redefinition of the relevant sets. This allowed us to address the question of Boltzmann brains, which are hypothetical freak observer moments which arise from very rare fluctuations. They are a problem in a stupendously large universe where it is possible for them to dominate normal observers, which are confined to a small subset of the spacetime. This is a consistency problem because we must assume that we are not freak observers for us to argue that we have a correct understanding of the world, so that understanding is inconsistent if it predicts that we are freak observers. We examined an argument by Boddy et al. [32] that there are no self-aware freak observers because at late times the Universe will be an empty exponentially expanding de Sitter space with no decoherence to split into "many worlds." We argued that there could be decoherence effects from diluted matter, but an upper bound on the typicality of that is so small that it counters the huge number of future freak observers such that, by this argument, there are essentially no self-aware freak observers. We also used the analysis of Hartle, Hertog, and Srednicki to demonstrate a "first-person probability" effect which is somewhat orthogonal to ours: that when models with observers are scarce, models with more places for them to be are favored, even with exclusive selection. Conversely, if all viable models allow potentially many freak observers, then those with fewer places for those freak observers to fluctuate into existence are favored.
We then considered the analysis of J. Richard Gott III in Sec. VIII, which seems to constitute a different kind of OSE. He argued that one can bound the probability of a world lasting time T using an observer's time t since the start of the world; this is strange because the selection seems to be inclusive: just the Be selection of the observer. One problem is that his original treatment did not include a prior, which is essential. We showed that one needs a fast-falling ($\sim 1/T^2$) prior to reproduce his results. Then there is an effect, but it is not an OSE, rather just an artifact of the fast-falling prior. However, if we consider a scenario with a Pick selection of a unique lifetime for the world, and the prior falls with T, then there is an OSE.
All of this prepared us to address, in Sec. IX, the doomsday question, "Do observer selection effects increase the probability that our world will be short-lived?" The answer is "probably not." One must first write the question in a scale-invariant way, by which we mean that it makes just as much sense to ask at any timescale during the world. A question that could work is "Will our world last K times its present age?," which naturally leads to using the formalism we developed in Sec. VIII for the Gott analysis. There are scenarios where it is reasonable for the selection there to be exclusive, and it is possible to conclude that there is a doomsday effect but only under a set of assumptions akin to those listed at the end of Sec. IX.
Several papers have argued for a universal doomsday effect, which says that our data imply that worlds on average are probably more short-lived than we would have estimated without our data. We showed that universal doomsday and doomsday are inextricably linked because if our expectation for the fraction of short-lived worlds changes as a result of our data, so does our expectation for the lifetime of our world and vice versa. So the assumptions needed for a universal doomsday effect are the same as those needed for a doomsday effect.
We then applied our formalism to a somewhat different scenario called the Sleeping Beauty problem. Beauty is woken once or twice during an experiment, depending on a coin flip, and her memory of each awakening is deleted. What probability should she assign to the coin having come up "heads"? This would seem to be trivial but has led to philosophers dividing into two camps, "halfers," who would assign probability 1/2, and "thirders," who would assign probability 1/3. It turns out that they are both right. The problem is that the question is insufficiently clearly posed and each answer is right, given a particular question. If Beauty views "herself" as occupying the three equally likely observer moments, the inclusive case, then she agrees with the thirders. If, on the other hand, she views "herself" as being in one of two possible timelines (in the one waking session of the "heads world" or the two waking sessions of the "tails world"), then she will agree with the halfers. These are both reasonable ways of interpreting who "she" is. They might also be interpreted as implying whether the world is a multiverse (in the inclusive case) or not (in the exclusive case), though this is an extrapolation; all she is really doing is assuming one or the other definition of self. In any case, the two cases are physically indistinguishable. For example, we showed that both cases lead to precisely the same betting outcomes, though Beauty arrives at the same correct odds of winning in each case for different reasons. We also discussed multiple trials, and the creation of observers, which may help extend the formalism of the paper to more general problems.
So we have explored multiple ways in which it matters how observers are selected. The key factor is whether the selection is inclusive or exclusive. There can be an OSE in the latter case but not the former, at least for the problems we considered. Inclusive selection means that all events considered actually occur, though you may not experience them, such as prisoners being in an S and an L cellblock. Exclusive selection means assigning nonzero probabilities to some events which do not occur, such as picking an S or L cellblock. So observer selection effects arise from assuming that there are some possibilities which are not realized. Among other things, to have a doomsday effect requires such an exclusive selection, which we wrote as, "There is exactly one true value of the lifetime, T*, because you consider only one world with one fixed future." It is thus crucial that one carefully lays out all of one's assumptions, because whether there is an OSE or not depends on them.
Finally, we lay out some possible future directions for this work.
A simple direction is to generalize our results by relaxing some of the assumptions we made, such as ρ being constant across the ensemble of possibilities or the subsets being nonoverlapping (see Appendix).
Almost all of our analysis was classical. It would be interesting to explore further the quantum context. One consequence is clear: If quantum theory corresponds to something like the many-worlds interpretation, then we are in a multiverse with inclusive selection of events. If there is "wave-function collapse," so that there is only one reality, then there is an exclusive selection. But a comprehensive evaluation of our discussion in the quantum context may turn up interesting results. For example, what of quantum observers, which comprise superpositions of observer states?
Another avenue of inquiry is how to analyze a theoryverse, such as the string landscape. Is it reasonable to assume the inclusive case? In other words, should we sum probabilities of "observers like us" from different parts of the string landscape which contain observers similar to us despite operating with different physical laws? If so, then it is not the probability of a given vacuum in the landscape that matters but that probability times its effective number of observer moments.
Finally, while we discussed atypical observers, and the problem of Boltzmann brains, there is perhaps more to learn from studying what one might call "freak observers," any observer who happens to experience freakish conditions. There are many metrics for "number of observers" in addressing the problem of Boltzmann brains, and it would be useful to see if our results shed any light on them. Also, in a multiverse there are otherwise normal observers who happen to experience statistical fluctuations of many standard deviations and who draw erroneous conclusions. How do we treat such observers, especially with the recognition that it is not impossible in a multiverse that we are one of them?

by some scaling factors; see Sec. VII on typicality). Let us define $P(A_a)$ to mean "the probability that a randomly selected element of set $A$ belongs to subset $A_a$." Note that $P(A) = 1$, since an element selected from $A$ belongs to $A$ by definition. So $P(A_a) = P(A_a \mid A)$ because the conditional $A$ just means that "an element was randomly selected from $A$," which is already part of the definition of $A_a$. With these assumptions,

$$P(A_a) = P(A_a \mid A) = \frac{n_a}{n}.$$

We can thus replace $(N_b/N)$ in Eqs. (A2) and (A3) with $P(B_b)$. For example, if $A$ is the set of cards in a deck, then $P(A_{\rm clubs}) = 1/4$ and $P(A_{\rm aces}) = 1/13$.

So long as we are selecting from one set only, there is no ambiguity. But if we are selecting from compound set $AB$ with set $A$ nested in set $B$, there are two possibilities: Either we first select an element $B_j$ of $B$, and then an element $A_{i,j}$ which corresponds to (is "in") element $B_j$, which we call to Pick, or we directly select the element $A_i$, despite its being nested in set $B$, which we define as to Be. One has to pick a nut from a jar: Select a jar $B_j$ and then select a nut from within the jar. But if the elements of $A$ are themselves observers, say, prisoners in specific cellblocks, then there is another way to select: You can be a prisoner in a cellblock without having to perform a cellblock selection; you are just there. (It is possible to Pick directly from set $A$ even if it is nested in $B$ if the correspondence between $A_{i,j}$ and $B_j$ is not really to be "in" it. For example, set $B$ could correspond to a label, S or L, we place on each nut, and toss them all together and randomly select one. No jar selection is needed to do that, yet the nesting is preserved by the labeling. We mention this briefly in Sec. III with the warden cafeteria problem.)

Be probabilities are simple, just the fraction of elements in the inner set meeting the criteria:

$$\begin{aligned}
P(AB) &= P(A) = \frac{n}{n} = 1, \\
P(AB_b) &= P(A_{,b}) = \frac{n_{,b}}{n} = \frac{\bar{n}_{,b}}{\bar{n}}\,P(B_b), \\
P(A_a B) &= P(A_a) = \frac{n_a}{n} = \frac{\bar{n}_a}{\bar{n}}, \\
P(A_a B_b) &= P(A_{a,b}) = \frac{n_{a,b}}{n} = \frac{\bar{n}_{a,b}}{\bar{n}}\,P(B_b).
\end{aligned} \tag{A8}$$

Pick probabilities are weighted by the selection that first must be made on set $B$. We use a superscripted vertical bar ${}^{|}$ to indicate a Pick from the set immediately to its right. It is akin to a conditional within the statement, e.g., "$A_a\,{}^{|}B_b$" means "we pick an element of type $b$ from set $B$ and then from the elements of $A$ corresponding to that element of $B$ we select an element of $A$ that is in subset $A_a$." This is the same as saying "we picked an element in $A_a$ from $A$ given that we picked an element of $B_b$ from $B$." If there are no subset labels indicated to the left of a Pick, then the situation is as if we are ignoring that set. So $P(A\,{}^{|}B_b) = P(B_b)$ because after we pick an element of type $b$ from $B$ with probability $P(B_b)$, it is certain that the element we pick from $A$ is from subset $A$ (which is just the whole set $A$). (We assume that there is some such element of $A$, i.e., $A_{a,b} \neq \emptyset$.) If there are subsets specified to the left of the Pick, such as in $P(A_a\,{}^{|}B_b)$, then we can write it as a product of conditional probabilities defined below,

$$P(A_a\,{}^{|}B_b \mid A\,{}^{|}B) = P(A_a\,{}^{\nmid}B_b \mid A\,{}^{\nmid}B_b)\,P(A\,{}^{|}B_b).$$

Note that we have put a slash through the Picks in the first term of the right-hand side. We will call such Picks neutered because we are conditioning on the fact that an element was chosen from subset $B_b$, and thus no action is needed before selecting the element from $A$. Thus, the probability with a neutered Pick is the same as for a Be, e.g.,

$$P(A_a\,{}^{\nmid}B_b \mid A\,{}^{\nmid}B_b) = P(A_a B_b \mid A B_b) = \frac{\bar{n}_{a,b}}{\bar{n}_{,b}}.$$

For example, the probability of picking a small jar and then picking a cashew, given that one picked a small jar, is the same as that of picking a cashew given that one picked a small jar. So the Pick probabilities are

$$\begin{aligned}
P(A\,{}^{|}B) &= 1, \\
P(A\,{}^{|}B_b) &= P(B_b), \\
P(A_a\,{}^{|}B) &= \sum_b P(A_a\,{}^{|}B_b) = \sum_b \frac{\bar{n}_{a,b}}{\bar{n}_{,b}}\,P(B_b), \\
P(A_a\,{}^{|}B_b) &= P(A_a\,{}^{\nmid}B_b \mid A\,{}^{\nmid}B_b)\,P(A\,{}^{|}B_b) = \frac{\bar{n}_{a,b}}{\bar{n}_{,b}}\,P(B_b).
\end{aligned} \tag{A10}$$

The astute reader may wonder why the selection on the leftmost set differs from the selection of the sets to its right. Actually, it does not, and we could put a "${}^{|}$" to the left of every leftmost set. But our notation assumes that there is a selection on the leftmost set. So really "${}^{|}$" means a selection done on a set other than the leftmost set. [Note that one can have a set to the left of an observer, and then one needs to insert a selection "${}^{|}$" to the left of the observers set, e.g., $C\,{}^{|}P$, where $C$ are cards and $P$ are observers, and although that observer is Be selected (i.e., just is), this is exclusive selection since there is a selection other than on the innermost set.]

Let us explore conditional probabilities, such as the ones we employed above, where there is one set of selections given another. Here are the nontrivial possibilities [keeping in mind that $P(A_a B \mid A B_b) = P(A_a B_b \mid A B_b)$, etc.]:
(i) $P(A_a B \mid A B_b)$: the probability that we select an element of type $a$ from $A$ nested in $B$ given that we select an element of $A$ that corresponds to an element of $B$ of type $b$.
(ii) $P(A B_b \mid A_a B)$: the probability that we select an element of $A$ that corresponds to an element of $B$ of type $b$ given that we select an element of type $a$ from $A$ nested in $B$.
(iii) $P(A_a\,{}^{|}B \mid A\,{}^{|}B_b)$: the probability that we select an element of $B$ and then select an element of type $a$ from $A$ which is associated with that element of $B$ given that we select an element of $B$ of type $b$ and then select an element of $A$ associated with that element of $B$.
(iv) $P(A\,{}^{|}B_b \mid A_a\,{}^{|}B)$: the probability that we select an element of $B$ of type $b$ and then select an element of $A$ associated with that element of $B$ given that we select an element of $B$ and then select an element of type $a$ from $A$ which is associated with that element of $B$.
For example, $P(A\,{}^{|}B_S \mid A_c\,{}^{|}B)$ is the probability to pick a small jar and then pick a nut from that jar given that we pick some jar and then pick a cashew from it. There are actually only three nontrivial possibilities because the first and the third are equal, since the selection in the third is neutered:

$$\begin{aligned}
P(A_a B \mid A B_b) &= P(A_a\,{}^{\nmid}B \mid A\,{}^{\nmid}B_b) = \frac{P(A_{a,b})}{P(A_{,b})} = \frac{\bar{n}_{a,b}}{\bar{n}_{,b}}, \\
P(A B_b \mid A_a B) &= \frac{P(A_{a,b})}{P(A_a)} = \frac{\bar{n}_{a,b}}{\bar{n}_a}\,P(B_b), \\
P(A\,{}^{|}B_b \mid A_a\,{}^{|}B) &= \frac{P(A_a\,{}^{|}B_b)}{P(A_a\,{}^{|}B)}.
\end{aligned}$$

In Eq. (A10) we showed that $P(A_a\,{}^{|}B_b)$ is not in general equal to $P(B_b)$, because the selection of an element of type $a$ adds a nontrivial weighting factor. That is because there is an implied conditional $A\,{}^{|}B$: We take it as a given that we pick some element of $B$ and then some element associated with that element from the whole set $A$, i.e., $P(A_a\,{}^{|}B_b)$ means $P(A_a\,{}^{|}B_b \mid A\,{}^{|}B)$. But sometimes we want to redefine the set $A$ we select from so that it is some subset of qualifying elements. For example, if our jars contain peanuts, cashews, and pebbles, but our selection process ensures that only nuts are picked, then we are really concerned with the subset $A_{\rm nut}$ of cashews and peanuts. To help clarify such situations, we write redefined sets with square brackets, $[A_{\rm re}]$. This new set then has subsets $[A_{\rm re}]_{a,b}$, and we can write the number of elements in these as $[n_{\rm re}]$ and $[n_{\rm re}]_{a,b}$, and so on. Now set $[A_{\rm re}]$ acts like $A$ did in Eq. (A10),

$$\begin{aligned}
P([A_{\rm re}]\,{}^{|}B) &= 1, \\
P([A_{\rm re}]\,{}^{|}B_b) &= P([A_{\rm re}]\,{}^{|}B_b \mid [A_{\rm re}]\,{}^{|}B) = P(B_b), \\
P([A_{\rm re}]_a\,{}^{|}B) &= P([A_{\rm re}]_a\,{}^{|}B \mid [A_{\rm re}]\,{}^{|}B) = \sum_b \frac{[n_{\rm re}]_{a,b}}{[n_{\rm re}]_{,b}}\,P(B_b), \\
P([A_{\rm re}]_a\,{}^{|}B_b) &= P([A_{\rm re}]_a\,{}^{|}B_b \mid [A_{\rm re}]\,{}^{|}B) = \frac{[n_{\rm re}]_{a,b}}{[n_{\rm re}]_{,b}}\,P(B_b),
\end{aligned} \tag{A12}$$

since one selects some element of $[A_{\rm re}]$ with certainty.

Now, one might object that there is a lot of redundant information in the above notation, namely the set labels $A$ and $B$. We think it is important to retain those labels if there is any confusion about which sets are considered, which subset labels correspond to which set, and which sets have a Pick on them, an issue if there are more than two nested sets. But if there are only two nested sets which are the same throughout some calculation, and the subscript labels are unique to a set, we can use a compact notation by omitting the set names.

TABLE I. Summary of our major results using compact notation, where the list of sets provides a key for the location of the Picks. Worlds J = S or L. For three or more sets we use a double-Pick mark to avoid ambiguity. For "probing a multiverse," h = in or ex (it is probably advisable not to use compact notation for four sets with controlled-Picks). The weighted averages are f(y) ≡
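To make the difference between Be and Pick probabilities concrete, the following Python sketch evaluates both for the jar-and-nut example. It is an illustration under stated assumptions rather than part of the formalism: the jar contents (three small jars and one large jar) are invented numbers, the function names `p_be` and `p_pick` are ours, and all selections are taken to be uniformly random, i.e., "typical."

```python
from fractions import Fraction

# Hypothetical contents: three small (S) jars and one large (L) jar.
jars = [("S", {"cashew": 2, "peanut": 8})] * 3 + [("L", {"cashew": 30, "peanut": 60})]


def p_be(nut=None, jar_type=None):
    """Be: select a nut directly, so count matching nuts among all nuts."""
    total = sum(sum(contents.values()) for _, contents in jars)
    match = sum(
        count
        for jtype, contents in jars
        if jar_type in (None, jtype)
        for kind, count in contents.items()
        if nut in (None, kind)
    )
    return Fraction(match, total)


def p_pick(nut=None, jar_type=None):
    """Pick: select a jar uniformly, then a nut uniformly from that jar."""
    prob = Fraction(0)
    for jtype, contents in jars:
        if jar_type not in (None, jtype):
            continue
        in_jar = sum(contents.values())
        match = sum(c for kind, c in contents.items() if nut in (None, kind))
        prob += Fraction(1, len(jars)) * Fraction(match, in_jar)
    return prob


if __name__ == "__main__":
    print("Be   P(cashew, small jar) =", p_be("cashew", "S"))
    print("Pick P(cashew, small jar) =", p_pick("cashew", "S"))
    # Conditioning on the jar type neuters the Pick: both ratios agree.
    print("Be   given small jar      =", p_be("cashew", "S") / p_be(jar_type="S"))
    print("Pick given small jar      =", p_pick("cashew", "S") / p_pick(jar_type="S"))
```

With these invented numbers the Be probability of a small jar is 1/4 (small jars hold a quarter of the nuts) while the Pick probability is 3/4 (three of the four jars are small), yet the probability of a cashew given a small jar is 1/5 either way, which is the sense in which a neutered Pick behaves like a Be.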
For $w = 10$, the probability of heads in a given observer moment is about 0.35. For larger and larger $w$, $P(W_h)$ is approximately a narrower and narrower Gaussian centered on $h = w/2$, and the probability for Beauty's heads observer moments gets closer and closer to 1/3. In other words, exclusive selection with a large number of indistinguishable trials becomes indistinguishable from inclusive selection.
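The approach to 1/3 can be checked directly by evaluating the binomially weighted sum of $h/[h + 2(w - h)]$ over the number of heads $h$. The Python sketch below does this with exact fractions; the function name is ours, and the only inputs are the fair-coin weights $\binom{w}{h}/2^w$ and the heads-moment fraction given in the text.

```python
from fractions import Fraction
from math import comb


def p_heads_moment(w):
    """Exclusive-case probability of a heads observer moment over w weeks."""
    total = Fraction(0)
    for h in range(w + 1):
        p_h = Fraction(comb(w, h), 2**w)           # P(W_h) for a fair coin
        heads_fraction = Fraction(h, h + 2 * (w - h))  # heads moments / all moments
        total += p_h * heads_fraction
    return total


if __name__ == "__main__":
    for w in (1, 2, 10, 100, 1000):
        print(f"w = {w:4d}: {float(p_heads_moment(w)):.4f}")
```

The values for $w = 1$ and $w = 2$ reproduce 1/2 and 5/12 exactly, and the printed values decrease toward 1/3 as $w$ grows.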