On the cost of misperception: general results and behavioural applications

In a choice model, we characterize the loss induced by misperceptions of payoff-relevant parameters across a distribution of decision problems. When the agent cannot avoid misperceptions but has some control over the distribution of errors, we show that strategies that minimize loss from misperception exhibit systematic biases, akin to some documented in the behavioural and psychological literatures. We include illusion of control, order effect, overprecision, and overweighting of small probabilities as illustrative examples.


Introduction
Within economic discourse, the idea that the human mind is imperfect and that some information is necessarily lost during any decision process dates back at least to Simon (1955). Under this premise, an agent faces, apart from the standard action choice, a problem of error management. In economic decision making, some mistakes in the perception of payoff-relevant parameters are costlier than others, and this asymmetry has an impact on the frequency of different types of perception errors. In this paper, we relate error management to known observed behavioural biases. In order to understand the direction of biases arising under second-best perception strategies, we first need to understand the associated costs of under- or overestimating payoff-relevant parameters.
Our agent first receives some statistical information on a state of nature which can be either high or low, and this information translates into an objective belief p that the state is high. In a choice stage, she recalls this information imperfectly and ends up with a subjective belief q that the state is high. She then chooses an action that is optimal under her subjective belief q within a fixed choice set. Due to belief distortion, her choice may differ from the optimal one. We ask how large the payoff loss is when the agent misperceives p as q.
We derive a simple and intuitive characterization of the loss from misperceiving p as q. This loss can be expressed as an integral formula that depends only on the second derivative v'' of the value function v. The value function v(p) specifies, for each probability p of the high state, the payoff of the optimizing agent who has correct perception.
This characterization makes the loss from misperception simple to compute in several applications in which the direct computation is cumbersome.
Equipped with this loss characterization, we study a class of error-management problems in which the agent chooses a distribution of perception errors that performs well across all decision problems that she encounters in her environment. We follow principles from the ecological rationality literature, in that our agent is unable to reoptimize the perception strategy in each encountered decision problem separately but, instead, must choose a perception heuristic that fits her environment. The model is as follows. Based on all available information, the agent forms an objective probability p of a high payoff state. She then memorizes a probability m in the perception stage, and recalls a perturbed value q = m + ε in the choice stage, where ε is a noise term with symmetric density. Then, she encounters a random decision problem drawn from her environment and chooses an action that she perceives as optimal under her belief q and the encountered utility function.
One feasible perception strategy is to memorize the true probability value. Under such a strategy, the recalled probability is in expectation equal to the true one. It turns out, however, that this unbiased perception strategy is generically suboptimal, and the agent benefits from memorizing m distinct from the true probability p.
We characterize the direction of the optimal perception bias based on monotonicity properties of the second derivative v'' of the value function. When v'' is decreasing in a neighborhood of the true observed probability, we show that the agent memorizes a probability higher than the true one and thus exhibits an upward perception bias.
When v'' is increasing, her memory exhibits a downward bias. In applications with Gaussian distributions of payoffs, the value function v is easy to compute, and the direction of monotonicity of v'' is easy to characterize. Our results then allow us to make clear-cut predictions about the direction of the bias in these environments.
We detail here one of our applications to illustrate why second-best perception strategies may be biased. Consider an agent who chooses between two binary lotteries. Each available lottery is defined by the rewards it pays in the high and low states of the world. The true probability of the high state is p, but the agent perceives this probability to be q = m + ε, and chooses the lottery that maximizes the perceived expected reward. The agent's environment contains a whole range of such decision problems, where each decision problem is represented by all four lottery rewards. We model this by assuming that all four rewards are drawn iid. from a standard normal distribution. The perception strategy m(p) is chosen before the agent observes the realization of these draws, hence independently of the lottery rewards.
A misperception of p as q causes a loss only when the optimal choices at p and q differ. This happens precisely when there exists a probability s between p and q at which the agent is indifferent between the two lotteries. The likelihood that such a tie arises at s depends on s in an intuitive way. Since the expected value of each lottery is a convex combination of the two standard normal draws, its variance is lower the closer s is to 1/2. Thus, the likelihood of a tie at probability s is a single-peaked function of s attaining its maximum at 1/2. This implies that misperceiving p as a value q closer to 1/2 is more likely to distort choice than a misperception of the same size in the opposite direction: undervaluing one's own ability to predict the binary state leads to a suboptimal choice more often than the symmetric opposite error. Since the agent can shift the distribution of her perception error by controlling the memorized probability value, it is optimal for her to memorize a value further away from 1/2 than the truth, and hence to exhibit an overprecision bias.
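The tie-likelihood argument can be checked numerically in the two-lottery case. The sketch below is an illustration, not part of the original analysis: it assumes n = 2 lotteries with i.i.d. standard normal rewards and uses the standard fact that the expected maximum of two i.i.d. N(0, s²) draws is s/√π. This gives v(p) = √h(p)/√π with h(p) = p² + (1 − p)², and differentiating twice yields v''(p) = 1/(√π h(p)^{3/2}), which peaks at p = 1/2. The function names are hypothetical.

```python
import math

SQRT_PI = math.sqrt(math.pi)


def h(p: float) -> float:
    """Variance of u(a, p) = p*u(a,1) + (1-p)*u(a,0) when both rewards are i.i.d. N(0, 1)."""
    return p * p + (1 - p) ** 2


def v(p: float) -> float:
    """Value function with two lotteries: E[max of two i.i.d. N(0, h(p))] = sqrt(h(p)) / sqrt(pi)."""
    return math.sqrt(h(p)) / SQRT_PI


def v_second(p: float) -> float:
    """Closed-form second derivative: v''(p) = 1 / (sqrt(pi) * h(p)**1.5)."""
    return 1.0 / (SQRT_PI * h(p) ** 1.5)


def v_second_fd(p: float, d: float = 1e-3) -> float:
    """Central finite-difference approximation of v''(p), used as a sanity check."""
    return (v(p + d) - 2 * v(p) + v(p - d)) / (d * d)


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(f"p = {p:.1f}   v''(p) = {v_second(p):.4f}   finite diff = {v_second_fd(p):.4f}")
```

Because v'' is largest at 1/2 and symmetric around it, it is increasing on (0, 1/2) and decreasing on (1/2, 1), which is the shape behind the overprecision bias described above.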
The above intuition indicates that the optimal perception strategy depends crucially on the probabilities at which the agent is likely to experience a tie between the available actions.
Given a draw of the lottery rewards, the value function is piecewise linear and has a kink (a point of non-differentiability) at the probability at which the tie occurs. Ex ante, across all the decision problems, v''(q) thus reflects the likelihood that the agent encounters a decision problem in which such a tie occurs at q; this is why v''(q) is the main object of our analysis.
Apart from the overprecision application, we illustrate our methodology in three additional examples: illusion of control, order effect, and probability weighting. In each application, we make natural assumptions about the distribution of utility functions encountered by the agent in her environment and derive the optimal bias that arises. We relate these biases to stylized facts from psychology and behavioural economics. In the illusion-of-control example, the agent overvalues her impact on her own well-being. In the order-effect application, she exaggerates the quality difference between two available objects observed in the first of two periods. In the probability-weighting example, we revisit the model of Steiner and Stewart (2016), who derive the overweighting of small probabilities as a second-best perception heuristic.

Related literature
Rate-distortion theory studies optimal communication via a noisy channel (Shannon, 1948, 1959). One of the primitives of this theory is an exogenous map that assigns a loss to each pair of input and output of the communication. A popular loss function is the squared error. In our paper, the loss function is derived from the agent's environment and is equal to the average welfare loss caused by the distortion.[2] Once we establish the relevant loss function, we let the agent engage in optimal error management: she avoids the costlier types of errors. Beyond information theory, error management has been studied in several scientific disciplines such as biology and psychology. See Johnson et al. (2013) for an interdisciplinary literature review and Alaoui and Penta (2016) for a recent axiomatization of the cost-benefit approach to error management within economics. We contribute to this literature a formal characterization of loss from misperception based on a statistical description of the agent's environment.

[2] An appropriate definition of the loss function is a nontrivial and important part of the data compression problem. See the discussion of the loss function for image compression in Wang et al. (2004).

Sims (1998, 2003) introduced an exogenous information-theoretic capacity constraint to economics. The subsequent literature on rational inattention studies the information acquisition and processing of an agent who, as in our model, is unsure about a payoff-relevant parameter but who, unlike our agent, knows her payoff function. In that setting (assuming the signal cost is nondecreasing in Blackwell informativeness), the agent acquires only an action recommendation and no information beyond what is needed for choice. In our model, the agent does not know her payoff function when she processes information, and thus forms beliefs beyond those needed for the mere action choice in any given problem.
The resulting optimal information structure can then be naturally interpreted as a perception of the payoff parameter.[3]

Our assumption that the perception strategy is optimized across many decision problems has appeared in the literature on the evolution of utility functions. This literature studies the performance of decision processes across a distribution of fitness rewards to the available actions. Depending on the constraints assumed, the optimal decision criterion is either expected utility maximization, as in Robson (2001), or one of its behavioural variants, as in Rayo and Becker (2007) and Netzer (2009). Two exceptions within this literature that, like us, focus on probability perception are Herold and Netzer (2010), who propose that biases in probability perception serve as a correction of another behavioural distortion, and Compte and Postlewaite (2012), who conclude that an agent with limited memory benefits from ignoring weakly informative signals.
We contribute to the branch of the bounded rationality literature that emphasizes limited memory. Mullainathan (2002) studies the behavioural implications of exogenous imperfect memory usage. Like us, Dow (1991), Hirshleifer and Welch (2002) and Wilson (2014) study the behavioural implications of an agent who optimizes her usage of limited memory, but they examine effects that are different from ours. In his survey, Lipman (1995) focuses on memory-related frictions.
This paper generalizes the model of Steiner and Stewart (2016), who study optimal probability perception in a choice between a binary lottery with random rewards and a fixed outside option. The contribution of this paper beyond Steiner and Stewart is threefold: (1) we significantly enlarge the class of settings by allowing arbitrary payoff distributions and arbitrary action sets; (2) we characterize the perception loss function from the more fundamental value function; and (3) we deliver new behavioural insights.

[3] Within the rational inattention literature, Woodford (2012a,b) studies the behavioural biases resulting from perception frictions; in these studies, he assumes the mean-square-error loss criterion.
Two of our derived biases, the illusion of control and overprecision, are classified among the so-called positive illusions (Taylor and Brown, 1988). Such positive belief biases are often rationalized by assuming that the agent derives felicity from holding favourable beliefs about herself; see, for example, Brunnermeier and Parker (2005), Caplin and Leahy (2001), and Köszegi (2006). In contrast, belief distortions in our model are purely instrumental (they guide choice) and arise in the absence of any felicity benefits.
The paper is organized as follows. Section 2 introduces the model, and our loss characterization is presented in section 3. We develop general results on the direction of biases in section 4, and applications in section 5. We show an example of a sophisticated Bayesian agent in section 6. Section 7 concludes.

A Model of Misperceptions
An agent faces a randomly drawn decision problem. She chooses an action a ∈ A = {1, . . . , n}, n ≥ 2, and receives payoff u(a, θ), where θ ∈ {0, 1} is a payoff state. The objective probability of the state θ = 1 given all of the agent's information is p. The agent's perception of the probability distribution over θ is imperfect. Correspondingly, we distinguish between an objective probability p and a subjective probability q of the state 1. The objective and subjective probabilities of the state 0 are 1 − p and 1 − q, respectively. The payoff functions u(a, ·) are extended from θ ∈ {0, 1} to all beliefs in [0, 1] through the expected utility formula u(a, p) = pu(a, 1) + (1 − p)u(a, 0).
We study a two-stage decision process in which a perception stage is followed by a choice stage. In the choice stage, the agent knows the realized payoff function u that defines the decision problem she faces, mistakenly perceives the true probability p as q, and chooses an action a*_{q,u} = arg max_{a∈A} u(a, q) that maximizes her perceived payoff. Finally, her realized payoff is u(a*_{q,u}, p).

The payoff function is uncertain in the preceding perception stage. Accordingly, the payoff function u = (u(a, θ))_{a,θ} and the probability p of state 1 are random variables with supports ℝ^{2n} and [0, 1], respectively, both admitting atomless densities. We assume u and p are independent to ensure that the agent does not draw inferences about θ from the observation of the payoffs u, and vice versa. The payoff of the optimizing agent who misperceives the objective probability p as q, averaged out across the draws of the payoff function u, is

$$V(p, q) = \mathbb{E}_u\left[u\left(a^*_{q,u},\, p\right)\right].$$

The value function v(p) = V(p, p) represents the expected payoff of the optimizing agent who perceives the objective probability p correctly. The expected loss due to misperceiving the true probability p as q is then

$$L(p, q) = v(p) - V(p, q).$$

Our aim is to characterize how the distribution of payoff functions u translates into the loss function L.
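The definitions of V and L translate directly into a simulation. The following Python sketch is a hypothetical illustration that assumes i.i.d. standard normal payoffs u(a, θ) (the model itself allows any atomless distribution); `estimate_loss` averages, over simulated decision problems, the payoff gap at the true p between the action chosen under p and the action chosen under q.

```python
import random


def expected_utility(ua, p):
    """u(a, p) = p*u(a,1) + (1-p)*u(a,0); ua is the pair (u(a,0), u(a,1))."""
    u0, u1 = ua
    return p * u1 + (1 - p) * u0


def best_action(u, q):
    """a*_{q,u}: the action that maximizes the perceived payoff u(a, q)."""
    return max(range(len(u)), key=lambda a: expected_utility(u[a], q))


def estimate_loss(p, q, n=2, draws=100_000, seed=0):
    """Monte Carlo estimate of L(p, q) = v(p) - V(p, q).

    Each draw is one decision problem with i.i.d. N(0, 1) payoffs u(a, theta)
    (an illustrative assumption). Both choices are evaluated at the true p.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        u = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
        correct = expected_utility(u[best_action(u, p)], p)    # choice under p
        distorted = expected_utility(u[best_action(u, q)], p)  # choice under q
        total += correct - distorted
    return total / draws


if __name__ == "__main__":
    print(estimate_loss(0.7, 0.5))
```

For n = 2 the estimate can be compared with the closed form L(p, q) = [√h(p) − (pq + (1 − p)(1 − q))/√h(q)]/√π, with h(s) = s² + (1 − s)², which follows from E[X·1{Y > 0}] = Cov(X, Y)/(√(2π)·sd(Y)) for jointly normal, zero-mean X and Y; at p = 0.7 and q = 0.5 it is about 0.0307.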

Loss Characterization
Our main technical contribution is a characterization of the loss function L in terms of the value function v.[5]

Lemma 1. Assume v is twice differentiable on [0, 1]. Then, the expected payoff loss due to misperceiving the probability p as q is

$$L(p, q) = \int_p^q v''(s)\,(s - p)\,ds.$$

As defined, the loss function L depends on the multidimensional distribution of underlying payoff functions u, and it depends on two parameters, p and q. Lemma 1 shows that L is entirely characterized through the second derivative v'' of the value function. It is remarkable that the underlying distribution of u acts on L only through the one-parameter function v''. The lemma is particularly useful in settings where the loss function L cannot be easily computed directly, but the value function v can. This is the case, for instance, when the joint payoff distribution is Gaussian, since then the value function admits an analytical characterization; see section 5.

[5] We have stated the loss characterization result in Lemma 1 under the assumption that v'' exists. When the distribution of the utility functions exhibits atoms, v has kinks and thus is not twice differentiable. Yet, with the use of the generalized functions of Schwartz (1957), Lemma 1 extends to those cases. Assume for illustration that with probability one the agent's payoff is 1/2 if she correctly announces the realized state and 0 otherwise. Then v(p) = max{p, 1 − p}/2, v''(p) is the Dirac delta function δ(p − 1/2), and L(p, q) = ∫_p^q v''(s)(s − p) ds = |1/2 − p| if 1/2 is between p and q, and the loss is 0 otherwise.
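Lemma 1 can be checked numerically in the two-lottery environment with i.i.d. standard normal rewards, where both sides have closed forms. This is a sketch under that assumption: v(p) = √h(p)/√π with h(p) = p² + (1 − p)², and V(p, q) = (pq + (1 − p)(1 − q))/√(π h(q)), which follows from the identity E[X·1{Y > 0}] = Cov(X, Y)/(√(2π)·sd(Y)) for jointly normal, zero-mean X and Y, applied to the payoff differences between the two lotteries. The code compares v(p) − V(p, q) with the integral of Lemma 1 evaluated by Simpson's rule; function names are illustrative.

```python
import math

SQRT_PI = math.sqrt(math.pi)


def h(s):
    """Variance of u(a, s) when u(a, 0), u(a, 1) are i.i.d. N(0, 1)."""
    return s * s + (1 - s) ** 2


def loss_closed_form(p, q):
    """L(p, q) = v(p) - V(p, q), with v(p) = sqrt(h(p)/pi) and V(p, q) = (pq + (1-p)(1-q)) / sqrt(pi h(q))."""
    return (math.sqrt(h(p)) - (p * q + (1 - p) * (1 - q)) / math.sqrt(h(q))) / SQRT_PI


def v_second(s):
    """v''(s) = 1 / (sqrt(pi) * h(s)**1.5) in the same environment."""
    return 1.0 / (SQRT_PI * h(s) ** 1.5)


def loss_lemma1(p, q, steps=1000):
    """Lemma 1: integral from p to q of v''(s) (s - p) ds, by Simpson's rule.

    `steps` must be even; a negative step (q < p) keeps the signed integral correct.
    """
    ds = (q - p) / steps
    total = 0.0
    for i in range(steps + 1):
        s = p + i * ds
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * v_second(s) * (s - p)
    return total * ds / 3


if __name__ == "__main__":
    for p, q in [(0.7, 0.5), (0.2, 0.9), (0.5, 0.1)]:
        print(p, q, loss_closed_form(p, q), loss_lemma1(p, q))
```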
In section 4, we allow the agent to control the distribution of perception errors, and show how monotonicity properties of v'' translate into the direction of second-best perception biases. Intuitively, higher values of v''(q) correspond to a higher marginal sensitivity of the loss L(p, q) to a marginal variation in q since, by Lemma 1, ∂L(p, q)/∂q = v''(q)(q − p).
We illustrate Lemma 1 in figure 1. The left graph represents u(a, p) for all actions a under a particular realization of the payoff function u. The agent with perception q chooses the action associated with the solid line, whereas the optimal choice at the objective probability p is the action associated with the dotted line, which entails the loss L_u(p, q) = u(a*_{p,u}, p) − u(a*_{q,u}, p) under the payoff function u. If we let v_u(p) = max_a u(a, p), then the loss L_u(p, q) appears as the difference at p between the function v_u and its linear approximation by its tangent at q. Averaging over all payoff functions u, the right graph depicts the expected loss as the error of the linear approximation of the expected value function.
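The tangent-line picture also suggests a short route to Lemma 1. The following is a sketch under the assumption that v is twice differentiable, not a reproduction of the paper's own proof. For each u, the line p ↦ u(a*_{q,u}, p) touches v_u at q and lies weakly below it; averaging over u, V(·, q) is a line through v(q) that lies weakly below v, hence the tangent to v at q. Taylor's theorem with integral remainder then gives

$$
\begin{aligned}
V(p, q) &= v(q) + v'(q)\,(p - q), \\
L(p, q) &= v(p) - v(q) - v'(q)\,(p - q) \\
        &= \int_q^p v''(s)\,(p - s)\,ds \;=\; \int_p^q v''(s)\,(s - p)\,ds .
\end{aligned}
$$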

The role of v
The second derivative of the value function plays a central role in our analysis. Here, we offer intuitive explanations of its role in our results.
Let us start with an interpretation of v''(p). For each realized utility function u, the value function v_u(p) is linear in the neighborhood of p unless v_u has a kink (a point of non-differentiability) near p. This implies that v''(p) is determined by the stochastic occurrence of these kinks. Assume u has a density ρ over ℝ^{2n}. We define the intensity of the kink of v_u at p as the difference between the right and left derivatives, v_u'(p+) − v_u'(p−), and let 𝒰(p, q) be the set of utility functions u that have a non-differentiability point between p and q. Then, for a small ε,

$$v''(p) \approx \frac{1}{\varepsilon}\int_{\mathcal{U}(p,\,p+\varepsilon)} \left[v_u'(p_u^+) - v_u'(p_u^-)\right]\rho(u)\,du,$$

where p_u is the non-differentiability point of v_u between p and p + ε. The above expression shows that v''(p) admits a natural interpretation as the product of the expected intensity of a kink at p and the "density" of the kinks at p.
We point out a geometric representation of the loss L_u(p, q) under the payoff function u.
This loss is positive when the optimal choices at p and q differ, that is, when the agent is indifferent between the first-best and the second-best action at some s between p and q.

Figure 1 illustrates a situation in which such indifference arises at two values s and s′ between p and q. In this case, the loss L_u(p, q) is the sum of two parts, L_s and L_{s′}, each identified with the tie arising at probability s or s′, respectively. Each of these terms equals the distance of the associated tie (s or s′) from the true p, multiplied by the intensity of the corresponding kink in v_u. The ex ante expected loss L(p, q) is thus the integral ∫_p^q v''(s)(s − p) ds, where the weight v''(s) captures the likelihood of u(·, s) having a tie, compounded by the intensity of the corresponding kink of v_u at s.

General characterization of v
We now offer a characterization of v''(p) for general payoff distributions that formalizes the intuitive discussion of v''(p) from the previous subsection. For each action a, we let u_a be the vector (u(a, 0), u(a, 1)) ∈ ℝ² and recall that for p ∈ [0, 1], u(a, p) = pu(a, 1) + (1 − p)u(a, 0).
We let u = (u_a)_{a∈A} ∈ ℝ^{2n} denote the vector of all utility values, and for any pair of actions a, b we let u_{−ab} = (u_{a′})_{a′≠a,b} ∈ ℝ^{2(n−2)}. Given a utility level ν ∈ ℝ, we let D_{ab}(ν) = {u_{−ab} : u(a′, p) < ν for all a′ ≠ a, b}, and introduce the vectors w_p = (p, p − 1) and e = (1, 1).
Finally, ‖·‖ denotes the Euclidean norm in ℝ². We assume that u admits a density ρ and let ρ_{ab} denote the marginal density of (u_a, u_b) ∈ ℝ⁴.
To make sure that v''(p) is finite, we impose a bound on the density of u. We say that the density ρ satisfies the tail condition if there exists a mapping ξ :

Proposition 1. If the vector u of the utility values admits a density ρ and ρ satisfies the tail condition, then v is twice differentiable and for every p ∈ [0, 1]:

Expression (2) formalizes the intuition from the previous subsection. It integrates over action pairs a, b and utility realizations such that a tie arises between a and b at p, u(a, p) = u(b, p) = ν, and all other actions a′ ≠ a, b are inferior at p, u(a′, p) < ν. Each

Second-Best Perceptions
The first-best perception strategy, which minimizes losses, sets q = p for all p. We assume that perception is constrained to be noisy and study second-best perception strategies.
To model the noise, a perception strategy associates to each objective probability p a distribution over subjective perceptions q. A strategy assigns a message m(p) to each objective probability p ∈ [0, 1], where we interpret the message as the physiological stimulus encoding the probability p. The message m triggers a perception q = m + ε, where ε is a random variable with atomless density g(ε) taking values in [−σ, σ] for some parameter σ ∈ (0, 1/2). We normalize the error term ε to have zero mean. Since the simple additive error specification is not well suited for modeling perception of probabilities near the boundaries of the probability interval, we ensure that every perception q is in the [0, 1] interval by restricting m to [σ, 1 − σ]. Thus, a perception strategy is a map m : [0, 1] → [σ, 1 − σ]. A particular example of a perception strategy is the unbiased perception m(p) = p for p ∈ [σ, 1 − σ]. This strategy does not exhibit a systematic error since E[q | p] = p. The unbiased strategy is a benchmark to which we compare the optimal perception strategy m*(p) that solves a naive perception problem for each p:[8]

$$m^*(p) \in \arg\min_{m \in [\sigma,\, 1-\sigma]} \mathbb{E}_\varepsilon\left[L(p,\, m + \varepsilon)\right]. \tag{3}$$

Note that (3) is equivalent to the maximization of the expected payoff E_ε V(p, q).

[7] Equation (2) integrates over the square of the kink intensity because the kink intensity contributes to v''(p) both directly and also via its impact on the density of the kinks in the neighborhood of p.
We summarize the two-stage decision process as follows. The agent observes the objective probability p and memorizes a probability m(p) before she learns the payoff function u. At the choice stage, the agent recalls probability q = m(p) + ε (but does not recall p and ε), and chooses an optimal action a*_{q,u} under her perception q and payoff u. The next result presents a simple first-order condition. It implies that the second-best perception strategy is unbiased in a generalized sense once the perception error q − p is weighted by v''(q). Compared to the naive unbiased perception strategy m(p) = p, which minimizes the common but ad hoc mean-square-error loss criterion, the optimality condition weights errors by their expected impact on the agent's performance, where all these considerations are summarized by the weight v''(q). In section 5, we provide four illustrative applications in which v'' has intuitive monotonicity properties that allow for qualitative predictions of the second-best biases.
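Proposition 2. Let the random variable q = m*(p) + ε be the stochastic perception when m*(p) solves the naive perception problem (3) and the true probability is p. If m*(p) lies in the interior of [σ, 1 − σ], then

$$\mathbb{E}_\varepsilon\left[v''(q)\,(q - p)\right] = 0.$$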
To gain some intuition about this result, consider a message m ∈ [p − σ, p + σ]. A marginal increase in m translates into a marginal increase of the perception q = m + ε.
This marginal change in q affects the agent's payoff only if it affects her choice, that is, when a tie arises between the first-best and the second-best action at q. The first-order condition thus sets to zero the expected perception error weighted by the likelihood of a tie at q = m + ε, adjusted for the intensity of the kink, that is, weighted by v''(q).

[8] A solution to the naive perception problem exists for each p since the objective is continuous in the message m and the message space is compact.
Note that in the very particular case where v is quadratic, and hence v'' is constant, the condition of Proposition 2 is satisfied by the unbiased perception strategy m(p) = p, because E[ε] = 0.
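The naive perception problem (3) and the first-order condition of Proposition 2 can be illustrated numerically. The sketch below assumes the two-lottery overprecision environment with i.i.d. standard normal rewards, where L has the closed form L(p, q) = [√h(p) − (pq + (1 − p)(1 − q))/√h(q)]/√π with h(s) = s² + (1 − s)², together with uniform noise on [−σ, σ], σ = 0.1, and a brute-force grid search. All of these are illustrative choices rather than the paper's. In this environment v'' is increasing on (0, 1/2) and decreasing on (1/2, 1), so the optimal message should move away from 1/2, and the weighted error E[v''(q)(q − p)] should be close to zero at the optimum.

```python
import math

SQRT_PI = math.sqrt(math.pi)


def h(s):
    """Variance of u(a, s) when u(a, 0), u(a, 1) are i.i.d. N(0, 1)."""
    return s * s + (1 - s) ** 2


def v_second(s):
    """v''(s) = 1 / (sqrt(pi) * h(s)**1.5) in the assumed two-lottery environment."""
    return 1.0 / (SQRT_PI * h(s) ** 1.5)


def loss(p, q):
    """Closed-form L(p, q) = v(p) - V(p, q) in the same environment."""
    return (math.sqrt(h(p)) - (p * q + (1 - p) * (1 - q)) / math.sqrt(h(q))) / SQRT_PI


def noise_nodes(sigma, k=200):
    """Midpoint-rule nodes for eps ~ Uniform[-sigma, sigma] (illustrative noise density g)."""
    step = 2 * sigma / k
    return [-sigma + (i + 0.5) * step for i in range(k)]


def m_star(p, sigma=0.1, grid=800):
    """Grid-search solution of the naive problem (3).

    Messages stay in the feasible set [sigma, 1 - sigma]. We only search within
    sigma of p: for m >= p + sigma every perception q = m + eps exceeds p, so
    raising m further only increases the loss (and symmetrically below p - sigma).
    """
    eps = noise_nodes(sigma)
    lo, hi = max(sigma, p - sigma), min(1 - sigma, p + sigma)
    candidates = [lo + j * (hi - lo) / grid for j in range(grid + 1)]
    return min(candidates, key=lambda m: sum(loss(p, m + e) for e in eps))


def weighted_error(p, m, sigma=0.1):
    """Left-hand side of Proposition 2: E[v''(q) (q - p)] with q = m + eps."""
    eps = noise_nodes(sigma)
    return sum(v_second(m + e) * (m + e - p) for e in eps) / len(eps)


if __name__ == "__main__":
    for p in (0.3, 0.5, 0.7):
        m = m_star(p)
        print(f"p = {p}: m* = {m:.4f}, E[v''(q)(q - p)] = {weighted_error(p, m):.2e}")
```

With these choices, `m_star(0.7)` lies slightly above 0.7 and `m_star(0.3)` slightly below 0.3, while `weighted_error(0.7, 0.7)` is negative, so raising the message above the true p reduces the expected loss.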
In the general case, we derive the direction of the second-best perception bias from the monotonicity properties of v''. The result formalizes the intuition that an agent who is unable to avoid errors altogether avoids the costlier ones.

If, for all
The agent considered in this section is naive in that she does not fully use the information retained up to the choice stage. A sophisticated agent could improve her performance by forming the Bayesian posterior E[p | q] based on her perception q, and choose an optimal action under this posterior belief. In section 6, we study such a sophisticated agent in an example. We show there that the optimal sophisticated perception continues to be closely related to the monotonicity of v'', and that the qualitative results about the direction of the bias derived for the naive agent extend to the sophisticated one.

Applications
In this section we use our model to provide microfoundations to behavioural stylized facts.
In each of our four applications, we specify simple distributional assumptions on the payoffs u, derive v'', and use its monotonicity properties to find the direction of the second-best perception biases. Most, though not all, of our applications use a Gaussian environment, since it facilitates the analytical derivation of the value function v.

Illusion of control
The term Illusion of control, introduced by the psychologist Langer (1975), refers to the overestimation of one's ability to impact one's own well-being. As an extreme example, casino visitors may overestimate the relevance of their choices over payoff-equivalent lotteries.
In this example, the agent is either of the low type, θ = 0, or of the high type, θ = 1. At the choice stage, she knows the payoffs u(a, θ) assigned to all available actions a ∈ A and both types θ ∈ {0, 1}. She is uncertain of her type, and instead of the objective probability p, she attaches the subjective probability q to being the high type.
We assume that the choice of the high type has a greater impact on her well-being than the choice of the low type in most (but not all) of the decision problems that the agent encounters in her environment. This may be the case if the high type stands for a higher talent, and actions of talented people have relatively larger consequences in typical (but not all) life situations.
The assumption may also hold if the payoff consequence of each action of the low type is influenced by luck to a larger extent than the payoff of the high type, so that the low type has a relatively low ability to affect her expected well-being. We capture the relatively higher control of the high type over her well-being by assuming that the payoff function is generated by the following process:

$$u(a, \theta) = \tau_\theta\,\tilde u_a + \eta_a, \qquad \theta \in \{0, 1\},$$

where ũ_a and η_a are independent normal random variables, i.i.d. across actions, and τ_1 > τ_0 > 0 are two fixed parameters. Then, the payoff differences u(a, 1) − u(a′, 1) tend to be larger for the high type 1 than the differences u(a, 0) − u(a′, 0) for the low type 0. Accordingly, we say that the high type has more control than the low one. See appendix B for an extension of this application to general distributions.
The following lemma relates control in each state to the direction of monotonicity of v''.
Lemma 2. If the high type has more control than the low type, then v'' is decreasing.
In this example, we are able to compute the analytical form of the value function v directly, which makes the derivation of v'' simple.
Proof of Lemma 2. Let ũ_a ∼ N(μ, ω²) and η_a ∼ N(ν, ω²), and let τ(p) = pτ_1 + (1 − p)τ_0. Then u(a, p) = τ(p)ũ_a + η_a is distributed as N(μ(p), σ(p)²) with μ(p) = τ(p)μ + ν and σ(p)² = (τ(p)² + 1)ω², and v(p) is the expected maximum over n i.i.d. draws from this distribution. The expectation of the maximum of n i.i.d. draws from N(μ̄, σ̄²) equals

$$\bar\mu + \bar\sigma \int_{-\infty}^{\infty} x\, n\, f(x)\, F(x)^{n-1}\,dx,$$

where f and F are the standard normal density and distribution function. Therefore, v(p) = μ(p) + cσ(p), where c is the latter integral. The linear term μ(p) does not affect v'', so we have that v''(p) = cσ''(p).
Using the equality σ(p)² = c_1 + c_2 p + c_3 p² for some constants c_1, c_2, and c_3, we obtain

$$v''(p) = c\,\frac{4c_1c_3 - c_2^2}{4\,\sigma(p)^3}.$$

Since the value function is the expectation over the upper envelopes of linear functions and thus convex, and since σ(p) is positive, it must be the case that c(4c_1c_3 − c_2²)/4 is positive. By the assumption that τ_1 > τ_0, the function τ(p), and hence also σ(p), is increasing. Thus, v''(p) decreases in p.
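The steps of this proof can be reproduced numerically. This sketch assumes the payoff process u(a, θ) = τ_θ ũ_a + η_a with ũ_a ∼ N(0, ω²) and η_a ∼ N(0, ω²), so that σ(p)² = ω²(τ(p)² + 1) with τ(p) = τ_0 + (τ_1 − τ_0)p; the parameter values τ_0 = 0.5, τ_1 = 1.5, ω = 1, and n = 3 are arbitrary illustrative choices, and the function names are hypothetical. It computes c = ∫ x n f(x)F(x)^{n−1} dx by Simpson's rule and evaluates v''(p) = c(4c_1c_3 − c_2²)/(4σ(p)³).

```python
import math


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def std_normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def c_n(n, lo=-10.0, hi=10.0, steps=4000):
    """c = integral of x * n f(x) F(x)^(n-1): expected max of n i.i.d. N(0, 1) (Simpson's rule)."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * dx
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * x * n * std_normal_pdf(x) * std_normal_cdf(x) ** (n - 1)
    return total * dx / 3


def make_v_second(tau0, tau1, omega, n):
    """Return p -> v''(p) = c (4 c1 c3 - c2^2) / (4 sigma(p)^3).

    Assumes sigma(p)^2 = omega^2 * (tau(p)^2 + 1) = c1 + c2 p + c3 p^2
    with tau(p) = tau0 + (tau1 - tau0) p.
    """
    d = tau1 - tau0
    c1 = omega ** 2 * (tau0 ** 2 + 1)
    c2 = omega ** 2 * 2 * tau0 * d
    c3 = omega ** 2 * d ** 2
    c = c_n(n)

    def v_second(p):
        sigma = math.sqrt(c1 + c2 * p + c3 * p * p)
        return c * (4 * c1 * c3 - c2 ** 2) / (4 * sigma ** 3)

    return v_second


if __name__ == "__main__":
    v2 = make_v_second(tau0=0.5, tau1=1.5, omega=1.0, n=3)
    print([round(v2(p / 10), 4) for p in range(11)])  # strictly decreasing in p
```

As a check on the quadrature, c for n = 3 should match the known value 3/(2√π) of the expected maximum of three standard normal draws.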
When v'' is decreasing, the first point of Proposition 3 applies, and the agent systematically overestimates her control over her own well-being.
Corollary 1. If the high type has more control than the low type, then the second-best perception is biased upwards: m * (p) > p for all p ∈ (0, 1 − σ).
A misperception of p as q causes a loss only for draws of the payoff function u under which a tie arises between p and q. In this application, the variance of u(a, s) is increasing in the belief s. Larger values of this variance correspond to more spread between payoffs, hence to lower likelihoods of a tie at probability s. Therefore, upward perception errors lead to less frequent losses than downward errors of the same magnitude.

Order effects
The psychologist Baron (2000) defines an order effect as order-dependent weighting of the observed pieces of evidence when the order of presentation is normatively uninformative. A primacy effect arises when the agent overweights her first impression, and a recency effect refers to overweighting the last impression. Page and Page (2010) document the order effect in a field study of talent judgement and argue for the relevance of the effect in hiring practices. Here we offer a stylized model in which the order effect arises as a second-best perception strategy for an agent with imperfect memory.
The agent chooses an object a ∈ {1, 2}. The quality of each object a is a sum x^a_1 + x^a_2 of two components, x^a_1 and x^a_2. The agent observes the first components of both objects in round 1 and the second components in round 2. For instance, the agent may be a firm that chooses one job candidate out of two applicants; it screens each applicant in two tests, each test reveals one of the applicant's two qualities, and the firm's goal is to choose the applicant with the higher sum of qualities.
We let Δ_t = x^2_t − x^1_t denote the quality difference in round t, and refer to Δ_1 as the first impression. The pair of second components (x^1_2, x^2_2) is drawn from a joint density φ(x^1_2, x^2_2), independently of the first components. We assume that the density ψ(Δ_2) of the difference Δ_2 = x^2_2 − x^1_2, expressed by the formula

$$\psi(\Delta_2) = \int_{-\infty}^{\infty} \varphi(x,\, x + \Delta_2)\,dx,$$

is single-peaked and symmetric around zero. This is the case, for instance, if x^1_2 and x^2_2 are drawn i.i.d. from a Gaussian distribution, but also for many other symmetric densities φ. The distribution of the first components plays no role in the analysis.
The agent chooses an object at the end of round 2, after she has observed all four components x^a_t. She flawlessly perceives the second-round components, but has a distorted recollection of the first impression Δ_1. Accordingly, we distinguish between the objective value Δ_1 and the perceived value Δ̂_1. The agent chooses object 2 if Δ̂_1 + Δ_2 > 0, and object 1 otherwise.
Let us consider the loss from misperceiving Δ_1 as Δ̂_1, as a function of Δ_2. If Δ_1 + Δ_2 and Δ̂_1 + Δ_2 have the same sign, the same object is chosen in both cases and the loss is zero. If Δ_1 + Δ_2 and Δ̂_1 + Δ_2 have opposite signs, the wrong object is chosen under Δ̂_1, and the corresponding loss is |Δ_1 + Δ_2|. The expected loss of perceiving Δ_1 as Δ̂_1 is thus independent of the particular realizations of the first components (x^1_1, x^2_1) such that Δ_1 = x^2_1 − x^1_1, and can be expressed as

$$L(\Delta_1, \hat\Delta_1) = \int_{-\infty}^{\infty} \psi(\Delta_2)\,\left|\Delta_1 + \Delta_2\right|\,\mathbf{1}\left\{\operatorname{sign}(\Delta_1 + \Delta_2) \neq \operatorname{sign}(\hat\Delta_1 + \Delta_2)\right\}\,d\Delta_2.$$

The next lemma characterizes this loss through an integral formula that is similar to the characterization in Lemma 1.
Lemma 3. The expected loss from misperceiving the first impression Δ_1 as Δ̂_1 is

$$L(\Delta_1, \hat\Delta_1) = \int_{\Delta_1}^{\hat\Delta_1} \psi(s)\,(s - \Delta_1)\,ds. \tag{6}$$

The proof in the appendix consists of mapping the setting of this example, with continuous payoff state Δ_1, to our main setting with binary payoff state θ. We achieve this by a simple rescaling of the problem. The substantive part of the proof consists of showing that the relevant weight v'' is, after rescaling, equal to the density ψ.
We provide a simple heuristic for the loss expression in (6). A misperception of $\Delta_1$ as $\hat\Delta_1$ leads to a loss if it reverses the choice, which happens if, for some $\tilde\Delta_1$ between $\Delta_1$ and $\hat\Delta_1$, the agent is indifferent between the two objects. Given $\tilde\Delta_1$, such an indifference occurs when $\Delta_2 = -\tilde\Delta_1$, hence with likelihood $\psi(-\tilde\Delta_1) = \psi(\tilde\Delta_1)$. The loss arising from a mistake is $|\Delta_1 + \Delta_2| = |\Delta_1 - \tilde\Delta_1|$. Hence, the expected loss from the misperception is given by the integral in (6).
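As a sanity check on (6), the following Python sketch compares the integral with a direct simulation of the choice rule. It assumes, for illustration only, that $x^1_2$ and $x^2_2$ are i.i.d. standard normal, so that ψ is the $N(0, 2)$ density; the helper names (`psi`, `simpson`, `loss_integral`, `loss_simulated`) are hypothetical, and the integral is evaluated with a plain composite Simpson rule.

```python
import math
import random

TAU = math.sqrt(2.0)  # assumed sd of Delta_2 when x^1_2, x^2_2 are i.i.d. N(0, 1)


def psi(d: float) -> float:
    """Density of Delta_2 ~ N(0, TAU^2): single-peaked and symmetric around zero."""
    return math.exp(-0.5 * (d / TAU) ** 2) / (TAU * math.sqrt(2.0 * math.pi))


def simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson rule for the integral of f from a to b (n even; b < a allowed)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def loss_integral(d1: float, d1_hat: float) -> float:
    """Right-hand side of (6): integral of (s - d1) * psi(s) from d1 to d1_hat."""
    return simpson(lambda s: (s - d1) * psi(s), d1, d1_hat)


def loss_simulated(d1: float, d1_hat: float, draws: int = 200_000, seed: int = 0) -> float:
    """Average of |d1 + d2| over draws of d2, counted only when the choice flips."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        d2 = rng.gauss(0.0, TAU)
        if (d1 + d2 > 0) != (d1_hat + d2 > 0):
            total += abs(d1 + d2)
    return total / draws


over = loss_integral(0.5, 1.5)    # overestimate a favourable first impression by 1
under = loss_integral(0.5, -0.5)  # underestimate it by the same amount
print(f"over={over:.4f}  under={under:.4f}  simulated over={loss_simulated(0.5, 1.5):.4f}")
```

Under these assumptions, underestimating a favourable first impression of 0.5 by one unit costs more than overestimating it by one unit, which is the asymmetry that drives Proposition 4.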
Equipped with the loss formula of Lemma 3, we now study second-best noisy perception. We assume that the subjective perception $\hat\Delta_1$ is formed essentially through the same process as in our main model: the agent observes the true first impression $\Delta_1$ in round 1, memorizes a value $m(\Delta_1)$, and recalls $\hat\Delta_1 = m(\Delta_1) + \varepsilon$, where ε is drawn from a density $g(\varepsilon)$ that is symmetric around 0 and has support on $[-\sigma, \sigma]$. Let $m^*(\Delta_1)$ be the optimal perception strategy that minimizes the expected loss, i.e.:

$$m^*(\Delta_1) \in \arg\min_{m \in \mathbb{R}} \mathbb{E}_\varepsilon\, \hat{L}(\Delta_1, m + \varepsilon).$$

Our next result shows that the second-best perception avoids the relatively costly types of error by systematically exaggerating the first impression.
Proposition 4. The second-best perception exaggerates the first impression: $m^*(\Delta_1) > \Delta_1$ if $\Delta_1 > 0$, and $m^*(\Delta_1) < \Delta_1$ if $\Delta_1 < 0$. The proof in the appendix consists of applying Proposition 3 to the rescaled problem.
The direction of the bias is intuitive. Ties between the two objects arise relatively often, across all $\Delta_2$, when the first impression $\Delta_1$ does not strongly favour either object. Thus, an underestimation of the absolute value of the first impression is more likely to lead to the suboptimal choice than an overestimation of the same size. The primacy effect arising in Proposition 4 is a consequence of our assumption that the agent recalls the early evidence imperfectly and the late evidence without error. Under the opposite assumption, a recency effect would arise.
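The second-best message can also be computed numerically. The sketch below does a grid search under two assumptions that are not taken from the text: ψ is the $N(0, 2)$ density, and the recall noise ε is uniform on $[-\sigma, \sigma]$ with σ = 1. For Gaussian ψ the integral in (6) has the closed form $2(\psi(\Delta_1) - \psi(\hat\Delta_1)) - \Delta_1\left(\Phi(\hat\Delta_1/\sqrt{2}) - \Phi(\Delta_1/\sqrt{2})\right)$, which the code uses; all function names are illustrative.

```python
import math

TAU = math.sqrt(2.0)  # assumed sd of Delta_2 (x^1_2, x^2_2 i.i.d. N(0, 1))


def Phi(z: float) -> float:
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def psi(d: float) -> float:
    """Density of Delta_2 ~ N(0, TAU^2)."""
    return math.exp(-0.5 * (d / TAU) ** 2) / (TAU * math.sqrt(2.0 * math.pi))


def loss(d1: float, d1_hat: float) -> float:
    """Closed form of (6) for Gaussian psi: integral of (s - d1) psi(s) from d1 to d1_hat."""
    return TAU ** 2 * (psi(d1) - psi(d1_hat)) - d1 * (Phi(d1_hat / TAU) - Phi(d1 / TAU))


def expected_loss(d1: float, m: float, sigma: float, k: int = 100) -> float:
    """E_eps loss(d1, m + eps) for eps uniform on [-sigma, sigma] (midpoint rule, k nodes)."""
    nodes = (-sigma + (i + 0.5) * 2.0 * sigma / k for i in range(k))
    return sum(loss(d1, m + eps) for eps in nodes) / k


def second_best_message(d1: float, sigma: float, step: float = 0.002, span: int = 250) -> float:
    """Grid search for m*(d1) over the points d1 + j*step, j = -span, ..., span."""
    grid = [d1 + j * step for j in range(-span, span + 1)]
    return min(grid, key=lambda m: expected_loss(d1, m, sigma))


for d1 in (0.5, -0.5):
    print(d1, round(second_best_message(d1, sigma=1.0), 3))
```

In this example the computed message for $\Delta_1 = 0.5$ lies above 0.5, and the message for $\Delta_1 = -0.5$ is its mirror image, in line with the exaggeration described in Proposition 4.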

Overprecision
Moore and Healy (2008) define overprecision as an overvaluation of the precision of one's own belief. Daniel et al. (1998) is an early study of the implications of overprecision for financial markets. In our example, the agent bets on a binary event. Each bet $a \in \{1, \ldots, n\}$ pays a reward $u(a, \theta)$ in state $\theta \in \{0, 1\}$. The agent observes the 2n rewards $(u(a, \theta))_{a,\theta}$ and chooses a bet $a^* \in \{1, \ldots, n\}$ that maximizes the perceived expected payoff $u(a, q) = q\,u(a, 1) + (1 - q)\,u(a, 0)$ under the subjective probability q of the state θ = 1. All rewards $u(a, \theta)$ are drawn independently from the standard normal distribution.
The further q is from 1/2, the more confident the agent is in her ability to predict the correct state. We show that the second-best perception systematically exaggerates this ability. To this end, we first establish a monotonicity property of $v''$.
The proof relies on an explicit computation of v and $v''$, which the Gaussian framework makes possible.
Proof of Lemma 4. The expected payoff $u(a, p)$ of each action a at probability p is an i.i.d. draw from $N\left(0, p^2 + (1-p)^2\right)$, and $v(p)$ is the expected maximum over these draws. As in the proof of Lemma 2, $v(p)$ is proportional to the standard deviation of each draw:

$$v(p) = c\sqrt{p^2 + (1-p)^2}$$

for some positive constant c that depends on n only. The second derivative is

$$v''(p) = \frac{c}{\left(p^2 + (1-p)^2\right)^{3/2}},$$

which has the properties stated in the lemma.
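A quick numerical check of the two formulas in this proof, as a sketch: for n = 2 bets the constant is $c = \mathbb{E}\max\{Z_1, Z_2\} = 1/\sqrt{\pi}$ for i.i.d. standard normal $Z_1, Z_2$ (a standard fact, used here as an assumption about the case n = 2). A simulated $v(p)$ should then match $c\sqrt{p^2 + (1-p)^2}$, and a finite difference of that expression should match the stated $v''$. The helper names are illustrative.

```python
import math
import random


def sigma(p: float) -> float:
    """Standard deviation of u(a, p) = p*u(a,1) + (1-p)*u(a,0) with i.i.d. N(0, 1) rewards."""
    return math.sqrt(p ** 2 + (1 - p) ** 2)


def v_simulated(p: float, n: int = 2, draws: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of v(p) = E max_a u(a, p) over n bets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        total += max(p * rng.gauss(0, 1) + (1 - p) * rng.gauss(0, 1) for _ in range(n))
    return total / draws


C2 = 1.0 / math.sqrt(math.pi)  # E max of two i.i.d. N(0, 1) draws: the constant c for n = 2


def v_second_derivative(p: float, c: float = C2) -> float:
    """Closed form v''(p) = c / (p^2 + (1-p)^2)^(3/2)."""
    return c / sigma(p) ** 3


def finite_difference(p: float, h: float = 1e-4, c: float = C2) -> float:
    """Central second difference of v(p) = c * sigma(p)."""
    return c * (sigma(p + h) - 2 * sigma(p) + sigma(p - h)) / h ** 2


print(round(v_simulated(0.3), 3), round(C2 * sigma(0.3), 3))
print(round(finite_difference(0.3), 5), round(v_second_derivative(0.3), 5))
```

The closed-form $v''$ is largest at p = 1/2 and falls off symmetrically towards 0 and 1, which is the shape the overprecision argument relies on.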
Proposition 3 together with Lemma 4 implies the direction of the bias.
Since the variance of the expected payoff $u(a, p)$ of each a increases with the distance of p from 1/2, ties between actions arise more often at beliefs close to 1/2 than at beliefs farther from 1/2. Therefore, an error in the perception of p in the direction of 1/2 (an undervaluation of one's ability to predict the state) is costlier than an opposite error of the same size. The second-best perception avoids the costlier errors by systematically overvaluing one's information about the state.
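The asymmetry can be made concrete with the overprecision value function, taking c = 1 as an arbitrary normalization. The sketch below computes the loss in the two ways that the proof of Lemma 1 equates, as the error of the tangent-line approximation $v(p) - v(q) - v'(q)(p - q)$ and as the integral $\int_p^q (s - p)\, v''(s)\, ds$, and then compares an error of 0.1 towards 1/2 with an error of the same size away from 1/2 at p = 0.3. Function names are illustrative.

```python
import math


def v(p: float) -> float:
    """Overprecision value function v(p) = c * sqrt(p^2 + (1-p)^2), taking c = 1."""
    return math.sqrt(p ** 2 + (1 - p) ** 2)


def v1(p: float) -> float:
    """First derivative v'(p) = (2p - 1) / sqrt(p^2 + (1-p)^2)."""
    return (2 * p - 1) / v(p)


def v2(p: float) -> float:
    """Second derivative v''(p) = 1 / (p^2 + (1-p)^2)^(3/2)."""
    return v(p) ** -3


def simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson rule for the integral of f from a to b (n even; b < a allowed)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def loss_linearization(p: float, q: float) -> float:
    """v(p) - V(p, q) with the tangent line V(p, q) = v(q) + v'(q)(p - q)."""
    return v(p) - v(q) - v1(q) * (p - q)


def loss_integral(p: float, q: float) -> float:
    """Integral form: the integral of (s - p) v''(s) from p to q."""
    return simpson(lambda s: (s - p) * v2(s), p, q)


p, eta = 0.3, 0.1
print("towards 1/2:", round(loss_integral(p, p + eta), 6))
print("away from 1/2:", round(loss_integral(p, p - eta), 6))
```

Because $v''$ is larger on the side of 1/2, the same distance in the integral is weighted more heavily there, so the error towards 1/2 is the costlier one.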

Overweighting of small probabilities
We revisit here Steiner and Stewart (2016), who provide a microfoundation for overweighting of small probabilities akin to the one in prospect theory (Kahneman and Tversky, 1979). The purpose of this revisit is twofold. First, we show how the problem of Steiner and Stewart can be solved by the general method of this paper. Second, we contrast their model with the setting of the previous subsection, in which underestimation of small probabilities arises, and clarify the forces driving the opposite biases in these two frameworks.
As in Steiner and Stewart (2016), we consider an agent who chooses $a \in \{1, 2\}$, where the payoff of action 1 is a lottery that pays $u(1, 1)$ with probability p and $u(1, 0)$ with probability $1 - p$. Action 2 is an outside option that pays a fixed payoff $u(2, \theta) = \tilde u \in \mathbb{R}$. The lottery rewards $u(1, 0)$ and $u(1, 1)$ are i.i.d. draws from the standard normal distribution.¹² Steiner and Stewart interpret the outside option value $\tilde u$ as a maximum over many alternatives, and thus focus on values of $\tilde u$ larger than typical draws of the lottery rewards.
Steiner and Stewart solved this problem by explicitly computing the loss L(p, q) and by optimizing over the messages directly. Here, we apply the methodology of this paper.
We compute v(p), characterize monotonicity properties of its second derivative, and use Proposition 3 to derive the direction of bias.
¹² See Steiner and Stewart (2016) for an extension to general distributions.

Our first step is to establish that, for values of the outside option $\tilde u$ above a certain threshold, the function $v''$ has the opposite monotonicity property to that in the previous application.
The proof, given in the appendix, relies on computing $v''$ from an explicit derivation of v.
Proposition 3 implies that overvaluation of small probabilities arises in this setting.
Why does the agent from the previous application undervalue small probabilities, whereas the agent of this application overvalues them? The difference is caused by the distinct patterns of ties. Figure 2 depicts $v''$ for both settings. In the overprecision setting, a tie between lotteries is likely when the lottery values have a low variance, which happens when p is close to 1/2. This contrasts with the prospect-theory setting, where a tie between the lottery and the high outside option $\tilde u$ is likely when the lottery value has a high variance, and hence $v''$ is U-shaped. Consequently, the perceptions exhibit opposite biases. The two results suggest that the bias in the perception of probabilities depends on whether the perceived probability affects the payoffs of all actions or idiosyncratically affects the payoff of only one action.
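To see the U-shape numerically, the sketch below takes the expression $v(p) = \tilde u\, F(\tilde u/\sigma_p) + \sigma_p f(\tilde u/\sigma_p)$ from the proof of Lemma 5, with $\sigma_p = \sqrt{p^2 + (1-p)^2}$, and approximates $v''$ by central finite differences rather than by a closed form. The outside-option values $\tilde u = 2$ (high) and $\tilde u = 0$ (for contrast) are arbitrary illustrative choices.

```python
import math


def F(z: float) -> float:
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def f(z: float) -> float:
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def v(p: float, u_out: float) -> float:
    """Expected maximum of the outside option u_out and a lottery value ~ N(0, sigma_p^2)."""
    s = math.sqrt(p ** 2 + (1 - p) ** 2)
    return u_out * F(u_out / s) + s * f(u_out / s)


def v2(p: float, u_out: float, h: float = 1e-4) -> float:
    """Central finite-difference approximation of v''(p)."""
    return (v(p + h, u_out) - 2 * v(p, u_out) + v(p - h, u_out)) / h ** 2


for u_out in (0.0, 2.0):
    print(u_out, [round(v2(p, u_out), 4) for p in (0.05, 0.25, 0.5, 0.75, 0.95)])
```

For $\tilde u = 2$ the approximated $v''$ is largest near p = 0 and p = 1 and smallest at 1/2, whereas for $\tilde u = 0$ it peaks at 1/2, the same shape as in the overprecision setting.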

A Sophisticated Agent: an Example
The agent from section 4 is naive in that she does not fully utilize the information available to her at the choice stage. This section studies the sophisticated perception strategies of an agent who loses some information during the decision process, but who is then fully capable of utilizing the information retained until the choice.
The agent assigns prior probability 1/2 to the state θ = 1. At the perception stage, she learns the belief p but retains only whether p lies below or above a threshold $p^*$. Let us denote the density of the belief p by ϕ and assume that it has support on [0, 1] and is symmetric around 1/2: ϕ(p) = ϕ(1 − p) for all p ∈ [0, 1]. We interpret the symmetry as a neutrality assumption abstracting from biases driven by an asymmetry of ϕ.
Denote the two Bayesian posteriors by $\underline{q}(p^*) = \mathbb{E}[p \mid p < p^*]$ and $\overline{q}(p^*) = \mathbb{E}[p \mid p \geq p^*]$.

¹³ The partition of the probability interval is reminiscent of the coarse reasoning model in Jehiel (2005). Other models that assume biased beliefs in strategic interactions include Eyster and Rabin (2005), with their application to the winner's curse, and Pavan (2014), who derives a compromise effect in a model with bounded recall. One distinction is that our agent optimizes over the information loss in a range of nonstrategic decision problems.

Proposition 5 (sophisticated perception). If

$$v''(p) > v''(1 - p) \quad \text{for all } p < 1/2, \tag{7}$$

and ϕ(p) is symmetric around 1/2, then the optimal threshold probability $p^{**}$ is less than 1/2 and the agent attains the high posterior $\overline{q}(p^{**})$ with a higher probability than the low posterior $\underline{q}(p^{**})$.
The symmetric result holds when $v''(p) < v''(1 - p)$ for all p < 1/2; in this case $p^{**} > 1/2$ and the agent attains the low posterior with a higher probability than the high posterior.
To obtain some intuition for the last result, observe that $v''(p)$ can be interpreted as a local value of information. To see this, consider an agent with belief p who receives a weakly informative signal, upon which she updates her belief to π in a neighborhood of p.
The Bayesian constraint $\mathbb{E}\pi = p$ and the second-order Taylor expansion of v imply that the value of the received information is $\mathbb{E}[v(\pi)] - v(p) \approx v''(p)\operatorname{Var}(\pi)/2$. By assumption (7), the local value of information is higher in the lower half of the probability interval than in the upper half. The optimal binary partition then discriminates between the probabilities in the lower half of the probability interval at the expense of not distinguishing among the probabilities in the upper half.
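The Taylor approximation can be checked with a two-point posterior $\pi = p \pm h$, each with probability 1/2, so that $\mathbb{E}\pi = p$ and $\operatorname{Var}(\pi) = h^2$. The value function used here, $v(p) = \sqrt{p^2 + (1-p)^2}$ from the overprecision setting with c = 1, is only an example: its $v''$ is symmetric, so the sketch illustrates the approximation rather than assumption (7).

```python
import math


def v(p: float) -> float:
    """Example value function: overprecision setting with c = 1."""
    return math.sqrt(p ** 2 + (1 - p) ** 2)


def v2(p: float) -> float:
    """Its second derivative, 1 / (p^2 + (1-p)^2)^(3/2)."""
    return (p ** 2 + (1 - p) ** 2) ** -1.5


def value_of_information(p: float, h: float) -> float:
    """E v(pi) - v(p) for pi = p + h or p - h with probability 1/2 each (so E pi = p)."""
    return 0.5 * (v(p + h) + v(p - h)) - v(p)


p, h = 0.3, 0.01
exact = value_of_information(p, h)
approx = v2(p) * h ** 2 / 2  # v''(p) Var(pi) / 2 with Var(pi) = h^2
print(exact, approx)
```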
Since the agent studied here is Bayesian, the expected terminal belief $\mathbb{E}q$ is necessarily equal to the prior belief 1/2. Bayesian rationality thus implies that the relatively frequent occurrence of the high terminal belief is compensated by adjustments of the values of the two terminal beliefs. Proposition 5 has implications for perception in the following sense.
Assume, for instance, that the distribution of payoffs is as specified in subsection 5.1. Let half of the population have high control over their own well-being, and the other half low control. Each agent receives incomplete information about her own type. Further assume that the agents are unable to perceive a continuum of probabilities, and can only distinguish a "high" and a "low" probability of being the high type. Since $v''$ is decreasing in this case, we predict that agents who perceive themselves as "highly" likely to be of the high type are more common than those who evaluate their chance of being the high type as "low".
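The martingale property behind these predictions can be illustrated with a hypothetical symmetric density ϕ(p) = 6p(1 − p) (a Beta(2, 2) density, chosen only for illustration). The sketch computes the probability of the low cell and the two posteriors for several thresholds; in each case the probability-weighted average of the posteriors equals the prior 1/2, and a threshold below 1/2 makes the high posterior the more frequent one.

```python
def simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson rule for the integral of f from a to b (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def beta22(p: float) -> float:
    """Illustrative density on [0, 1], symmetric around 1/2."""
    return 6.0 * p * (1.0 - p)


def posteriors(threshold: float, density=beta22):
    """Return (Pr(p < p*), E[p | p < p*], E[p | p >= p*]) for the partition at p*."""
    mass_low = simpson(density, 0.0, threshold)
    mean_low = simpson(lambda p: p * density(p), 0.0, threshold) / mass_low
    mean_high = simpson(lambda p: p * density(p), threshold, 1.0) / (1.0 - mass_low)
    return mass_low, mean_low, mean_high


for t in (0.3, 0.5, 0.7):
    w, q_low, q_high = posteriors(t)
    # expected posterior w*q_low + (1-w)*q_high should equal the prior 1/2
    print(t, round(w, 4), round(q_low, 4), round(q_high, 4), round(w * q_low + (1 - w) * q_high, 6))
```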

Discussion
We have studied the cost of misperception in a model of decision making under uncertainty.
Our main technical result, Lemma 1, shows that the loss function can be entirely characterized from the value function of the underlying decision problem. In our applications, the decision problem is specified by a random payoff function, which makes our environment rich enough. Note however that our loss formula applies to all settings, whether or not the payoff function is random, as long as the value function admits a second derivative. We presented a series of applications of this loss formula to settings in which the agent has imperfect memory, and hence faces an error-management problem; these examples allowed us to microfound several well-known behavioural biases.
Microfoundations for behavioural biases can contribute to normative discussions of debiasing. Our model suggests that a bias relative to the precision-maximizing perception may be second-best optimal, and thus the presence of a bias is not enough to justify an intervention without further arguments. As the standard maladaptation argument goes, an intervention in the decision process may be beneficial if the agent's environment has changed since the biases evolved. Interestingly, the model suggests that an intervention may be justified even if the agent's environment has not changed since adaptation took place. This is the case if an intervening outsider knows more about the agent's decision problem than the perception designer (evolution) has known.

A Proofs
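Proof of Lemma 1. Note that $v(p) = \max_q V(p, q) = V(p, p)$. Applying the envelope theorem to $v(p) = \max_q V(p, q)$ gives

$$v'(p) = \frac{\partial V}{\partial p}(p, q)\Big|_{q = p}. \tag{8}$$

Using the linearity of the expected payoff with respect to probability and (8), we obtain

$$V(p, q) = v(q) + v'(q)(p - q),$$

and thus

$$L(p, q) = v(p) - V(p, q) = v(p) - v(q) - v'(q)(p - q)$$

is the error of the linear approximation of the value function v(p) when it is approximated at q. Expression (1) follows once we integrate by parts:

$$L(p, q) = \int_q^p \left(v'(s) - v'(q)\right) ds = \int_q^p (p - s)\, v''(s)\, ds.$$

The next result is an auxiliary lemma used in the proof of Proposition 1. For any action a, we let $u_{-a} = (u_{a'})_{a' \neq a} \in \mathbb{R}^{2(n-1)}$. Given a and $u_a$, we let the measure $Q_{u_a}$ on $\mathbb{R}^{2(n-1)}$ be given by $Q_{u_a}(S) = \int_S \rho(u_{-a}, u_a)\, du_{-a}$ for every Borel set $S \subseteq \mathbb{R}^{2(n-1)}$, and let $U^p_a = \{u_{-a} : u(b, p) < u(a, p) \text{ for all } b \neq a\}$.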
Proof of Lemma 7. Fix action a and $u_a$. Let $B^p_b = \{u_{-a} : u(a', p) < u(b, p) \text{ for all } a' \neq a, b\}$. Given $p, p_0 \in [0, 1]$, since the sets $(B^{p_0}_b)_{b \neq a}$ partition $\mathbb{R}^{2(n-1)}$ up to a set of $Q_{u_a}$-measure 0, we have: In order to evaluate each term of the sum we first write: We make a change of variables, $u_b = u_a - r(\cos\theta, \sin\theta)$, and observe that $u(a, p) > u(b, p)$ is equivalent to $\theta_p < \theta < \theta_p + \pi$, where $\theta_p = \arctan\left(-\frac{1-p}{p}\right)$. This leads to: We differentiate this expression with respect to p at $p = p_0$, and use the fact that on the boundary, for $\theta \in \{\theta_{p_0}, \theta_{p_0} + \pi\}$, one has $u(a, p_0) = u(b, p_0)$, and thus $\mathbf{1}_{B^{p_0}_b} = \mathbf{1}_{D_{ab}(u(a, p_0))}$:

$$\begin{aligned}
&\int_{u_{-ab} \in D_{ab}(u(a, p_0))} \left[ \int_{r > 0} r\,\rho\big(u_{-ab}, u_a + r(\cos\theta_{p_0}, \sin\theta_{p_0}), u_a\big)\, dr - \int_{r > 0} r\,\rho\big(u_{-ab}, u_a - r(\cos\theta_{p_0}, \sin\theta_{p_0}), u_a\big)\, dr \right] du_{-ab}\; \frac{d\theta_p}{dp}\Big|_{p = p_0} \\
&\quad = \int_{u_{-ab} \in D_{ab}(u(a, p_0))} \int_{r \in \mathbb{R}} r\,\rho\big(u_{-ab}, u_a + r(\cos\theta_{p_0}, \sin\theta_{p_0}), u_a\big)\, dr\, du_{-ab} \times \frac{d\theta_p}{dp}\Big|_{p = p_0}.
\end{aligned}$$
With a new change of variable $x = s + t$ we obtain: We group the terms (a, b) and (b, a) in the sum to get: It remains to prove that $v''(p)$ is finite. For a > b, let $v''_{ab}(p)$ be the term: With $w^\perp_p = (1 - p, p)$ and $\sigma_p = \sqrt{p^2 + (1-p)^2}$ we rewrite this as: With a change of variables, $X = \sigma_p(x + \nu)$, $Y = \sigma_p(t + \nu)$, this gives: which is finite under the tail condition, since $X\xi(X)$ is bounded by $\xi(X)$ on (0, 1) and by $X^2\xi(X)$ for X > 1, and $\int_{X \geq 0} X^2 \xi(X)\, dX$ is finite by assumption.
Proof of Proposition 2. By Lemma 1, the first-order condition for the interior optimal m gives (4).
Proof of Proposition 3. We prove point 1, point 2 being symmetric to it. For p ∈ (σ, 1−σ), we recall from footnote 9 that any message m < p − σ is dominated by p − σ. We now show that any message of the form m = p − η with η ∈ (0, σ) is dominated by p + η. Thus we want to show that: Given the symmetry assumption on the distribution g of ε, this is equivalent to: It is sufficient to prove that, for every ε ∈ (0, σ), Let us first examine the case in which ε < η. By applications of Lemma 1 we have: where the final inequality uses assumption (5) and the fact that both η − ε and η + ε are positive.
Let us now consider the case in which ε ≥ η. In this case, Lemma 1 gives: where again the final inequality follows from (5).
We have proved that no message m < p can be optimal. It remains to show that m = p is not optimal either. Using the symmetry of g and (5), we obtain: Thus, the message m = p cannot satisfy the first-order condition (4) that is necessary for optimality. Since the optimal message exists, it must be that m * (p) > p.
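Proof of Lemma 3. We first relate the current setting, which has a continuous payoff state $\Delta_1 \in \mathbb{R}$, to our main model with binary payoff state $\theta \in \{0, 1\}$. We do this by rescaling the problem. For each $\Delta_1$ and $\hat\Delta_1$, we fix $\underline{\Delta}(\Delta_1, \hat\Delta_1) < \min\{\Delta_1, \hat\Delta_1\}$ and $\overline{\Delta}(\Delta_1, \hat\Delta_1) > \max\{\Delta_1, \hat\Delta_1\}$. For brevity we omit the arguments of $\underline{\Delta}$ and $\overline{\Delta}$, and let $p, q \in (0, 1)$ be given by

$$p = \frac{\Delta_1 - \underline{\Delta}}{\overline{\Delta} - \underline{\Delta}}, \qquad q = \frac{\hat\Delta_1 - \underline{\Delta}}{\overline{\Delta} - \underline{\Delta}}.$$

We define the utility function u by

$$u(a, \theta) = \begin{cases} x^1_1 + x^1_2 & \text{if } a = 1, \\ x^1_1 + \underline{\Delta} + x^2_2 & \text{if } a = 2 \text{ and } \theta = 0, \\ x^1_1 + \overline{\Delta} + x^2_2 & \text{if } a = 2 \text{ and } \theta = 1, \end{cases}$$

and, as in our main model, let L(p, q) be the loss from the misperception of the objective probability p of the state θ = 1 as the subjective value q. The definition of L(p, q) does not depend on $x^1_1$, since it enters the payoffs of both actions additively. We show that the two loss functions are isomorphic in the sense that $L(p, q) = \hat{L}(\Delta_1, \hat\Delta_1)$.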
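We now compute the value function v of the rescaled decision problem with random utilities $(u(a, \theta))_{a,\theta}$. Let us denote $\Delta_1(s) = s\overline{\Delta} + (1 - s)\underline{\Delta}$ for $s \in [0, 1]$. We have:

$$v(s) = \mathbb{E}\left[(x^1_1 + x^1_2)\, \mathbf{1}\{\Delta_1(s) + \Delta_2 \leq 0\}\right] + \mathbb{E}\left[(x^1_1 + \Delta_1(s) + x^2_2)\, \mathbf{1}\{\Delta_1(s) + \Delta_2 > 0\}\right],$$

where the first summand stands for the contingencies in which the agent optimally chooses the first object and receives $x^1_1 + x^1_2$, and the second summand stands for the contingencies in which the agent optimally chooses the second object and receives $x^2_1 + x^2_2 = x^1_1 + \Delta_1(s) + x^2_2$. It follows that:

$$v''(s) = (\overline{\Delta} - \underline{\Delta})^2\, \psi(-\Delta_1(s)) = (\overline{\Delta} - \underline{\Delta})^2\, \psi(\Delta_1(s)),$$

where the last equality follows from the assumed symmetry of ψ.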
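Finally, we use Lemma 1 to compute the loss in the original decision problem:

$$\hat{L}(\Delta_1, \hat\Delta_1) = L(p, q) = \int_p^q (s - p)\, v''(s)\, ds = \int_p^q (s - p)(\overline{\Delta} - \underline{\Delta})^2\, \psi(\Delta_1(s))\, ds = \int_{\Delta_1}^{\hat\Delta_1} (\tilde\Delta - \Delta_1)\, \psi(\tilde\Delta)\, d\tilde\Delta,$$

where we used the change of variables $\tilde\Delta = \Delta_1(s)$ to establish the last equality.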
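The rescaling can be checked numerically under the illustrative assumption that $\Delta_2 \sim N(0, 2)$. Up to the additive constant $\mathbb{E}[x^1_1 + x^1_2]$, the rescaled value function is $v(s) = \mathbb{E}[(\Delta_1(s) + \Delta_2)^+]$, and for Gaussian $\Delta_2$ this expectation has the standard closed form $a\,\Phi(a/\tau) + \tau^2\psi(a)$ with $a = \Delta_1(s)$ and $\tau = \sqrt{2}$. The sketch compares a finite-difference $v''$ with $(\overline{\Delta} - \underline{\Delta})^2\psi(\Delta_1(s))$, and the rescaled loss $v(p) - v(q) - v'(q)(p - q)$ with the closed form of (6). The bounds passed as `lo` and `hi` are arbitrary, and all names are illustrative.

```python
import math

TAU = math.sqrt(2.0)  # assumed: Delta_2 ~ N(0, TAU^2), e.g. x^1_2, x^2_2 i.i.d. N(0, 1)


def Phi(z: float) -> float:
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def psi(d: float) -> float:
    """Density of Delta_2."""
    return math.exp(-0.5 * (d / TAU) ** 2) / (TAU * math.sqrt(2.0 * math.pi))


def delta1(s: float, lo: float, hi: float) -> float:
    """Delta_1(s) = s * hi + (1 - s) * lo."""
    return s * hi + (1 - s) * lo


def v(s: float, lo: float, hi: float) -> float:
    """Rescaled value function minus E[x^1_1 + x^1_2]: E[(Delta_1(s) + Delta_2)^+]."""
    a = delta1(s, lo, hi)
    return a * Phi(a / TAU) + TAU ** 2 * psi(a)


def v1(s: float, lo: float, hi: float) -> float:
    """v'(s) = (hi - lo) * Pr(Delta_2 > -Delta_1(s))."""
    return (hi - lo) * Phi(delta1(s, lo, hi) / TAU)


def v2_numeric(s: float, lo: float, hi: float, h: float = 1e-4) -> float:
    """Central second difference of v at s."""
    return (v(s + h, lo, hi) - 2 * v(s, lo, hi) + v(s - h, lo, hi)) / h ** 2


def loss_rescaled(d1: float, d1_hat: float, lo: float, hi: float) -> float:
    """L(p, q) = v(p) - v(q) - v'(q)(p - q) at the rescaled beliefs p, q."""
    p, q = (d1 - lo) / (hi - lo), (d1_hat - lo) / (hi - lo)
    return v(p, lo, hi) - v(q, lo, hi) - v1(q, lo, hi) * (p - q)


def loss_6(d1: float, d1_hat: float) -> float:
    """Closed form of (6) for Gaussian psi."""
    return TAU ** 2 * (psi(d1) - psi(d1_hat)) - d1 * (Phi(d1_hat / TAU) - Phi(d1 / TAU))


print(v2_numeric(0.4, -2.0, 3.0), 25.0 * psi(delta1(0.4, -2.0, 3.0)))
print(loss_rescaled(0.5, 1.5, -2.0, 3.0), loss_6(0.5, 1.5))
```

The rescaled loss does not depend on the chosen bounds, which is why they can be fixed freely for each pair $(\Delta_1, \hat\Delta_1)$.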
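Proof of Proposition 4. As we have shown in the proof of Lemma 3, in the rescaled problem the value function satisfies $v''(s) = (\overline{\Delta} - \underline{\Delta})^2\, \psi(\Delta_1(s))$. Since ψ is single-peaked and symmetric around zero, when $\Delta_1 > 0$ we have $v''(p - \alpha) > v''(p + \alpha)$ for all positive α, and symmetrically, when $\Delta_1 < 0$, $v''(p + \alpha) > v''(p - \alpha)$. Then, by Proposition 3, for $\Delta_1 > 0$ we have $m^*(p) > p$, and symmetrically, for $\Delta_1 < 0$ we have $m^*(p) < p$, as needed.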
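Proof of Lemma 5. Recall that F and f are the cdf and the pdf of the standard normal random variable, respectively. Then,

$$\begin{aligned}
v(p) &= \mathbb{E}\max\{\tilde u,\; p\,u(1, 1) + (1 - p)\,u(1, 0)\} \\
&= \tilde u\, F\!\left(\frac{\tilde u}{\sigma_p}\right) + \int_{\tilde u}^{\infty} x\, \frac{1}{\sigma_p} f\!\left(\frac{x}{\sigma_p}\right) dx \\
&= \tilde u\, F\!\left(\frac{\tilde u}{\sigma_p}\right) + \sigma_p\, f\!\left(\frac{\tilde u}{\sigma_p}\right),
\end{aligned}$$

where $\sigma_p = \sqrt{p^2 + (1-p)^2}$. We have used that the lottery value $p\,u(1, 1) + (1 - p)\,u(1, 0)$ is $N(0, \sigma_p^2)$ in the second line. There, the first summand stands for contingencies in which the lottery value is below $\tilde u$ and the agent obtains $\tilde u$. The second summand stands for the choice of the lottery. The last line follows from the substitution $z = x/\sigma_p$.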
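Proof of Proposition 5. Let

$$q(p; p^*) = \begin{cases} \underline{q}(p^*) & \text{if } p < p^*, \\ \overline{q}(p^*) & \text{if } p \geq p^* \end{cases}$$

denote the Bayesian posterior formed at p when the threshold probability is $p^*$.
The agent's problem is

$$\max_{p^* \in [0, 1]} \mathbb{E}_p\, v\left(q(p; p^*)\right).$$

To utilize our characterization of the loss L(p, q), we express the objective in two equivalent forms that both feature the loss function.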
In step 1, we show that the agent's problem is equivalent to

$$\max_{p^*} \mathbb{E}_p\, L\left(q(p; p^*), 1/2\right). \tag{10}$$

The expression $\mathbb{E}_p L(q(p; p^*), 1/2)$ is the difference in expected payoffs between the partially informed agent and an uninformed agent. The more useful the partial information retained by the forgetful agent, the larger this difference. To see that (10) is equivalent to the agent's problem, write

$$\begin{aligned}
\mathbb{E}_p\, v(q(p; p^*)) &= \mathbb{E}_p\left[v(q(p; p^*)) - V(q(p; p^*), 1/2)\right] + \mathbb{E}_p\, V(q(p; p^*), 1/2) \\
&= \mathbb{E}_p\, L(q(p; p^*), 1/2) + v(1/2),
\end{aligned}$$

where in the second line we have used $v(1/2) = V(1/2, 1/2) = \mathbb{E}_p V(q(p; p^*), 1/2)$. The last equality follows from the linearity of the function V(p, q) with respect to its first argument, and from $\mathbb{E}_p\, q(p; p^*) = 1/2$ (since the posteriors $q(p; p^*)$ satisfy the martingale property).
In step 2 we prove that the agent's problem is equivalent to

$$\min_{p^*} \mathbb{E}_p\, L\left(p, q(p; p^*)\right).$$

The expression $\mathbb{E}_p L(p, q(p; p^*))$ represents the difference in expected payoffs between the fully informed agent and the partially informed agent. The more useful the partial information, the smaller this difference. To see this equivalence, write

$$\begin{aligned}
\mathbb{E}_p\, v(q(p; p^*)) &= \mathbb{E}_p\, V(q(p; p^*), q(p; p^*)) \\
&= \mathbb{E}_p\, V(p, q(p; p^*)) \\
&= \mathbb{E}_p\, v(p) - \mathbb{E}_p\, L(p, q(p; p^*)),
\end{aligned}$$

where in the second line we have again used the linearity of V with respect to its first argument, and that $\mathbb{E}_p[p \mid q(p; p^*) = q] = q$ for both Bayesian posteriors $q \in \{\underline{q}(p^*), \overline{q}(p^*)\}$.
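Steps 1 and 2 can be checked numerically in a simple case. The sketch assumes a uniform density ϕ on [0, 1], so that the posteriors are $p^*/2$ and $(1 + p^*)/2$, and uses as an example the overprecision value function with c = 1, with $V(p, q) = v(q) + v'(q)(p - q)$ as in the proof of Lemma 1. It evaluates $\mathbb{E}_p v(q(p; p^*))$ directly and through the two loss-based expressions; names are illustrative.

```python
import math


def v(p: float) -> float:
    """Example value function (overprecision setting with c = 1)."""
    return math.sqrt(p ** 2 + (1 - p) ** 2)


def v1(p: float) -> float:
    """Its first derivative v'(p)."""
    return (2 * p - 1) / v(p)


def V(p: float, q: float) -> float:
    """Payoff at true belief p of the action chosen at q: linear in p, tangent to v at q."""
    return v(q) + v1(q) * (p - q)


def L(p: float, q: float) -> float:
    """Loss from misperceiving p as q."""
    return v(p) - V(p, q)


def simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson rule for the integral of f from a to b (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def objectives(threshold: float):
    """Three expressions of the agent's objective under a uniform density on [0, 1]."""
    w = threshold                                    # Pr(p < p*)
    q_lo, q_hi = threshold / 2, (1 + threshold) / 2  # E[p | p < p*], E[p | p >= p*]
    direct = w * v(q_lo) + (1 - w) * v(q_hi)         # E_p v(q(p; p*))
    step1 = v(0.5) + w * L(q_lo, 0.5) + (1 - w) * L(q_hi, 0.5)
    step2 = (simpson(v, 0.0, 1.0)
             - simpson(lambda p: L(p, q_lo), 0.0, threshold)
             - simpson(lambda p: L(p, q_hi), threshold, 1.0))
    return direct, step1, step2


print(objectives(0.3))
```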
In step 3, we use the objective (10) to rule out optimality of p * > 1/2. We show, using (10), that for any p * > 1/2, the agent is better off using a threshold of 1 − p * rather than p * . Let us rewrite the objective in (10) as where the fractions are the expressions for Pr(p < p * ) and Pr(p ≥ p * ), respectively.
It remains for us to prove that $p^* \neq 1/2$ at the optimum, which we do in step 4.¹⁵ Let us write the

¹⁵ We are grateful to an anonymous referee who has improved this part of the proof.

B Illusion of Control: General Distributions
We consider binary action sets in this section and utilities $u(a, \theta) = \tilde u_a + \tau_\theta \eta_a$, $\tau_1 > \tau_0 > 0$, $a \in \{1, 2\}$. We dispense with the assumption of Gaussian payoffs and provide sufficient conditions on general payoff distributions for the monotonicity of $v''$.
The proof of the lemma is at the end of this subsection. Let us provide here a heuristic argument that emphasizes the connection between the likelihood of ties and the second derivative of the value function. Let $\tau(p) = \tau_1 p + \tau_0(1 - p)$, let $\Delta(p) = \delta + \tau(p)\eta$ be the payoff difference between the two actions, and let $p^*$ be the random belief at which the agent is indifferent between the two actions; that is, for each δ and η, $p^*$ solves $\Delta(p^*) = 0$. For each η, the conditional density of $p^*$ is $\chi(-\tau(p)\eta)\left|\frac{d\Delta}{dp}\right| = \chi(-\tau(p)\eta)\,|(\tau_1 - \tau_0)\eta|$. Thus, the unconditional density of $p^*$ is $\int_{-\infty}^{\infty} \chi(-\tau(p)\eta)\,|(\tau_1 - \tau_0)\eta|\,\phi(\eta)\, d\eta$.
The intensity of the kink of the value function is $|(\tau_1 - \tau_0)\eta|$. Expression (14) for $v''$ augments the last integral by the intensity of the kinks.
The last lemma allows us to establish the monotonicity of $v''$ under a regularity condition without computing v.
Lemma 9. If the density χ(δ) is single-peaked with its maximum at 0, then $v''$ is decreasing.
The difference $\delta = \tilde u_2 - \tilde u_1$ has a single-peaked density when $\tilde u_1$ and $\tilde u_2$ are i.i.d. draws from many common densities, including the normal and Pareto densities. Additionally, we provide a condition on the joint distribution of $\tilde u_1$ and $\tilde u_2$ that is sufficient for the single-peakedness of χ(δ). Recall that a function $h : \mathbb{R}^2 \to \mathbb{R}$ is supermodular if for any $(x, y, x', y') \in \mathbb{R}^4$: $h(\max\{x, x'\}, \max\{y, y'\}) + h(\min\{x, x'\}, \min\{y, y'\}) \geq h(x, y) + h(x', y')$.
The supermodularity assumption is related to the definition of affiliated random variables (Milgrom and Weber, 1982). Random variables are affiliated if the logarithm of their joint density is supermodular. Our assumption applies to the density instead of the log-density.
The following result concludes this section by summarizing conditions under which upward bias obtains.
Corollary 4. If $u(a, \theta) = \tilde u_a + \tau_\theta \eta_a$ with $\tau_1 > \tau_0 > 0$, and the joint density g is both symmetric and supermodular, then $v''$ is decreasing and the solution of the naive perception problem (3) is biased upwards: $m^*(p) > p$.
Proof of Lemma 9. The result follows from (14): since $\tau_1 > \tau_0$, τ(p) is increasing in p, which implies that $\chi(-\tau(p)\eta)$ is decreasing in p for all p and η.