STARTING SMALL TO COMMUNICATE

We analyze a repeated cheap-talk game in which the receiver is privately informed about the conflict of interest between herself and the sender and either the sender or the receiver controls the stakes involved in their relationship. We focus on payoff-dominant equilibria that satisfy a Markovian property and show that if the potential conflict of interest is large, then the stakes increase over time, i


INTRODUCTION
We analyze a repeated cheap-talk game in which the receiver is privately informed about the conflict of interest between herself and the sender and either the sender or the receiver controls the stakes involved in their relationship. For example, the sender could be a manufacturer who is better informed about demand and makes retail-price recommendations to a retailer, who decides on the actual retail price and may have different objectives than the manufacturer. The retailer might decide how much to buy from the manufacturer, in which case the receiver controls the stakes, or the manufacturer might decide how much to deliver to that particular retailer, in which case it is the sender who controls the stakes.
Or the receiver could be an investor who first decides how much to save and then how to allocate these funds after receiving advice from an informed financial advisor. The investor is privately informed about her preferences, which might include risk preferences or some behavioral bias. This would be a situation in which the receiver chooses the stakes involved in the relationship. Alternatively, it could be the advisor (sender) who chooses the stakes by controlling which investment opportunities to present to the investor in each period. For example, he may present more important ones (or a large number of them) at the beginning and less important ones later, or vice versa.
In situations where the sender is a subordinate in a hierarchical relationship, we could also think of the problem faced by the receiver as the optimal design of the importance of the decisions made by the sender over time, i.e., the sender's career path. At what level of the hierarchy should the receiver start the sender and how should she go about promoting him? Is it best to start him at a very low rank and keep him there for a long time, or should the career of the sender progress at a steady pace? What is the role of potential conflict of interest between the sender and the receiver in the optimal design of the career path of the sender?
In our model, each period a sender observes a payoff-relevant state of the world and communicates this information to the receiver. The receiver observes the message sent by the sender and makes a decision. The state of the world and the decision jointly determine the payoffs, which are revealed at the end of the period. The overall payoff of each player is equal to the weighted sum of period payoffs, where the weight of each period is determined by the size of the stakes (or the importance of the decision) in that period. The sender would like the decision to match the state of the world while the receiver might be biased. More crucially, the sender's preferences are common knowledge while those of the receiver are her private information.
We assume that the information on the state of the world is "soft," i.e., it cannot be verified, and that the messages are costless. This makes the communication phase in each period a "cheap-talk" game, i.e., the sender may lie and this has no direct costs for him. We also assume that the decisions of the receiver are not contractible. This could be due to legal reasons or because the decisions are impossible to reproduce before courts. Our third crucial assumption is that states of the world are independently distributed across periods. This implies that the sender decides how much information to reveal each period without having to worry about its informational implications for the future states. Finally, we assume that the receiver's preferences are similar for each decision, i.e., she either shares the preferences of the sender or is biased in the same manner for all the decisions. Therefore, our model is more suitable for situations in which the decisions are related, such as a series of investment decisions or budgetary decisions for different departments. An alternative, and perhaps more plausible, way to interpret the model is that the receiver makes the same decision in each period and the state of the world observed by the sender corresponds to the optimal decision for that period.
We assume that the receiver is either an unbiased type, who myopically chooses the decision best suited to the state given her beliefs in each period, or a biased type who acts strategically. The unbiased type resembles a commitment type that is common in the reputation literature. But, unlike a standard commitment type who always plays the same action, the unbiased receiver plays a best response to her beliefs in any given period. Furthermore, in standard models of repeated games and reputation the discount rates are fixed, while in our model either the sender or the receiver chooses them strategically. Our aim is to characterize the perfect Bayesian equilibria of the resulting extensive-form game with incomplete information.
In order to gain some intuition about the major forces at work in the model, note that the sender would like to receive his favorite decision, i.e., the unbiased decision which matches the state, in each period. Therefore, if he believes that the receiver is going to make the unbiased decision with high enough probability, then he has an incentive to reveal the state of the world truthfully. The biased receiver, on the other hand, would like to make a decision that is best for her, i.e., the biased decision, in any period and for that reason she would like to receive accurate information. However, if she makes a decision that is different from the decision that would be made by the unbiased commitment type, she would be revealed as biased and receive no information in the future. This introduces reputation concerns in the sense that she may masquerade as the unbiased receiver and act against her own interest today, in order to receive better information in the future.
It is clear that the receiver benefits from truthful communication. It turns out that, ex-ante, the sender also benefits from truthful communication, irrespective of whether the receiver is biased or not. In fact, if he could commit to a communication strategy before learning the state in the stage game, he would commit to full revelation. Therefore, the stage game exhibits both conflict of interest, because of the possible bias, and common interest, because of the common preference for truthful communication.
These considerations imply that the sender (or the receiver) may choose the stakes in a strategic manner in order to utilize the reputational incentives and facilitate communication. In particular, if relatively larger stakes are left for the future, then the biased receiver may choose to play the unbiased action early on in the game and this may enable truthful communication.
As is usual in cheap-talk games, our model exhibits multiple equilibria. In order to circumvent this problem, we focus on equilibria that satisfy a Markovian property and yield the highest equilibrium payoffs for all the players, which we call the payoff-dominant equilibria. We show that if the potential bias is large and the initial reputation of the receiver is bad enough, then there is a unique payoff-dominant equilibrium outcome in which the size of the stakes increases over time. In other words, the time path of the stakes exhibits "gradualism" or "starting small," which has been a recurring theme in economics in different contexts. Our paper contributes to this literature by providing an alternative rationale for these phenomena, which is based on utilizing reputational incentives to maintain healthy communication in a relationship.
We show that the time path of the stakes is chosen in such a way that the biased receiver is indifferent between the biased and the unbiased actions in each period. She mixes between these actions in a way that builds her reputation (after each successive unbiased action) at just the right speed in order to facilitate communication in every period. If her reputation were to evolve faster, then truthful communication would fail in the current period, while if it were to evolve slower, it would fail in the future. The sender reveals the state truthfully in every period as long as he has always observed the unbiased action in the past.
Interestingly, the sender does not prefer a screening equilibrium in which the biased receiver reveals herself at the beginning of the game. This is because once the biased receiver reveals herself, the sender can no longer communicate with her and, as we have mentioned before, ex-ante the sender prefers to communicate even with the biased receiver. The biased receiver does not prefer to be screened either because, if her initial reputation is bad, she will receive information neither in the period she reveals herself nor afterwards. The unbiased receiver could potentially benefit from being revealed but only if that helps her receive more information. In the payoff-dominant equilibrium outcome, she receives complete information in every period except perhaps the first period in which the biased receiver plays a mixed strategy. However, if the biased receiver reveals herself by playing the biased action with probability one in the first period, rather than mixing, then the sender cannot communicate in that period. Thus, screening does not increase the amount of information provided to the unbiased receiver either.
Our results further imply that as the potential conflict of interest between the sender and the receiver increases, initial stakes become smaller but they grow faster. This is due to the fact that as the bias becomes larger, the future must become relatively more important in order to provide sufficient incentives to the biased receiver to play the unbiased action in the current period. We also show that both the sender and the receiver prefer to spread the total stake in their relationship over as many periods as possible. If the potential bias is large, this would mean that the stakes remain very small for a long period of time and then increase quickly towards the end of the relationship.
Finally, we should note that since the sender fully reveals the state in every period as long as he observes the unbiased action, a payoff-dominant equilibrium is also the most informative one on the equilibrium path. Therefore, payoff-dominance is potentially a reasonable equilibrium selection in our context.

THE MODEL
A sender and a potentially biased receiver play a repeated cheap-talk game for $N$ periods. Since it is more convenient to do so, we will count the periods in reverse, so that the first period is labeled $N$, the second $N-1$, and so on. In each period $i$, the following stage game is played:
1. The sender (or the receiver) chooses the parameter $\delta_i \in [0,1]$ for period $i$. The parameter $\delta_i$ represents the proportion of the total stakes deferred to subsequent periods and $1-\delta_i$ represents the proportion made in period $i$. Since there are no subsequent periods in the last period, we set $\delta_1 = 0$.
2. Nature chooses the state of the world $\theta_i \in \{0,1\}$. We assume that each state is equally likely and that states are independent across periods.
3. The sender observes the state $\theta_i$ and chooses a message $m_i \in \{0,1\}$.
4. The receiver observes the sender's message and chooses an action $a_i \in \mathbb{R}$ without observing the state of the world.
We define the parameter $\gamma_i$ as the importance of the decision or the proportion of the total stakes made in period $i$. More precisely, $\gamma_N = 1-\delta_N$ and $\gamma_i = (1-\delta_i)\prod_{j=i+1}^{N}\delta_j$ for $i < N$. The period $i$ payoffs of the sender and the receiver are $-\gamma_i (a_i-\theta_i)^2$ and $-\gamma_i (a_i-\theta_i-\beta)^2$, respectively, where $\beta \in \{0, b\}$ and $b > 0$. The parameter $b$ measures the divergence of the preferences of the sender and the receiver, or simply the "bias" of the receiver. We assume that nature chooses $\beta = b$ with probability $p \in (0,1)$ before the game begins and privately informs the receiver.
The payoff of each player over the N periods is simply the sum of the payoffs from each period.
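As an illustration of how the deferral parameters translate into period weights, the following Python sketch assumes that each period uses a share $1-\delta_i$ of whatever stake has not yet been allocated, so that the weights sum to one when the last $\delta$ is zero. This weighting rule and the name `period_weights` are assumptions made for the example, not notation taken from the text.

```python
def period_weights(deltas):
    """Map deferral parameters to period weights.

    `deltas` lists (delta_N, ..., delta_1) in chronological order; periods are
    counted in reverse, so the last entry belongs to the final period.
    Assumed rule: each period uses a share 1 - delta of the remaining stake.
    """
    weights = []
    remaining = 1.0  # share of the total stakes not yet allocated
    for delta in deltas:
        weights.append(remaining * (1.0 - delta))
        remaining *= delta
    return weights


# Three periods in which most of the stakes are deferred early on.
gammas = period_weights([0.9, 0.8, 0.0])
print(gammas)
```

With these inputs the weights rise over time, which is the "starting small" pattern discussed in the introduction.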
The state of the world, the messages, and the decisions of the receiver are unverifiable and hence cannot be contracted upon. Furthermore, as the payoff functions imply, the messages have no direct payoff consequence. This implies that the communication between the sender and the receiver is "cheap talk" and that outcome-contingent contracts cannot be written. After a period is over, the sender and receiver observe their payoffs and therefore the receiver learns the state in that period and the sender learns the receiver's action.
We assume that the unbiased, type $\beta = 0$ receiver is a commitment type who, in each period, plays the action that is perfectly aligned with the sender's preferences. More precisely, fix a period $i$ and let $\lambda \in [0,1]$ be the probability assigned by the receiver to the event that $\theta_i = 1$. Define the best period action for type $\beta \in \{0, b\}$ as follows:
$$a_\beta(\lambda) \equiv \arg\max_{a \in \mathbb{R}} \; -\left[\lambda (a-1-\beta)^2 + (1-\lambda)(a-\beta)^2\right] = \lambda + \beta.$$
We refer to $a_0(\lambda)$ as the unbiased action and to $a_b(\lambda)$ as the biased action. The unbiased receiver is a commitment type (or an automaton) who plays action $a_0(\lambda)$ in each period, i.e., she picks the myopic best response of a receiver who has zero bias and therefore chooses an action that is perfectly aligned with the sender's preferences. The biased receiver, in contrast, is rational and chooses her period action strategically.
The main question analyzed in the paper is the equilibrium allocation of the stakes or the importance of decisions over time. We assume that any $\delta_i \in [0,1]$ can be chosen at the beginning of each period $i$ and that at least one period has a positive weight. In other words, the importance of the period $i$ decision can be fine-tuned in any desired way. This could be motivated in three different ways: (1) the parameter $\gamma_i$ is the proportion of the total stakes in the relationship that is assigned to period $i$; (2) there is a large set of decisions and each period a subset of these decisions is chosen; (3) there is a large set of decisions with varying importance and each period one decision from this set is chosen.
We will analyze two different versions of the model: (1) the sender chooses the parameter $\delta_i$ (sender-game); (2) the receiver chooses the parameter $\delta_i$ (receiver-game). In what follows, we will first describe the sender-game and then explain how the receiver-game differs from it.
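Let $o_i = (\delta_i, \theta_i, m_i, a_i)$ denote a period $i$ outcome and $O_i$ the set of all possible period $i$ outcomes.
For any $i < N$, let $H_i$ be the set of all histories before decision $i$ is made, i.e., sequences of the type $(o_N, o_{N-1}, \ldots, o_{i+1})$. The sender's belief that the receiver is biased is given by $p_i$, defined on $H_i$. The sender's strategy specifies a choice of $\delta_i$ after each history $h \in H_i$, denoted $\tau_i$, and a communication strategy $\mu_i$, where $\mu_i(h, \delta_i, \theta_i)$ denotes the probability of sending message 1 after history $h$, $\delta_i$, and $\theta_i$. The receiver moves after histories of the type $(h, \delta_i, \theta_i, m_i)$ where $h \in H_i$. For any history $h \in H_i$, a period $i$ information set for the receiver is given by
$$I(h, \delta_i, m_i) = \{(h, \delta_i, \theta_i, m_i) : \theta_i \in \{0,1\}\}.$$
In other words, before making a decision in period $i$, the only thing that is not known by the receiver is $\theta_i$. Let the set of all period $i$ information sets be $I_i$. The receiver's belief that $\theta_i = 1$ is given by $\lambda_i : I_i \to [0,1]$. Since the unbiased receiver is a commitment type, we will only describe strategies for the biased receiver. The biased receiver's (mixed) strategy is given by $\alpha_i : I_i \to \Delta(\mathbb{R})$, where $\Delta(\mathbb{R})$ denotes the set of all probability distributions with support in $\mathbb{R}$. For ease of exposition we will sometimes write $\lambda_i(h, \delta_i, m_i)$ and $\alpha_i(h, \delta_i, m_i)$ for any $h \in H_i$, $\delta_i \in [0,1]$, and $m_i \in \{0,1\}$. The collection $\sigma = (\tau_i, \mu_i, \alpha_i, p_i, \lambda_i)_{i=1}^{N}$ constitutes an assessment and we focus our attention on perfect Bayesian equilibria (PBE) of the game that satisfy Properties 1 and 2 that we define below.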
Property 1. Fix an assessment $\sigma$, a period $i$, a history $h \in H_i$, and an outcome $o_i = (\delta_i, \theta_i, m_i, a_i)$. If $a_i \neq a_0(\lambda_i(h, \delta_i, m_i))$, then $\sigma$ is a PBE only if $p_j(\hat h) = 1$ in any period $j = i-1, \ldots, 1$ and history $\hat h \in H_j$ that follows $o_i$.
This property implies that equilibrium beliefs put probability one on the receiver being the biased type (i.e., the strategic player) after histories that contain an action which is different from the unbiased action. It is automatically satisfied in a sequential equilibrium because the unbiased action is the unique action that is available to the unbiased type. However, we work with perfect Bayesian equilibria because there are certain difficulties in defining sequential equilibria for games with infinite action sets.
Property 2. For any $i = N, \ldots, 1$ and $h, h' \in H_i$, […]
The first part of this property states that past history matters only to the extent that it changes the reputation of the receiver, while the second part states that the receiver plays a symmetric strategy. Together, they imply that strategies do not depend on the past communication behavior of the sender. Since the sender's past communication behavior has no effect on current and future payoffs or the states of the world, this is a Markovian property in the sense that strategies are independent of payoff-irrelevant histories. In particular, this restriction eliminates punishments in the form of "no information revelation" or "playing the biased action" after histories in which the sender did not communicate truthfully. In addition to its Markovian nature, this property is also implied by "renegotiation-proofness," because even after histories in which the sender has lied, both parties have an incentive to choose a continuation equilibrium in which there is full communication. In Section 3.2 we will comment on how our results change when this restriction is removed.
From this point on, we restrict attention to PBE that satisfy Properties 1 and 2. Hence, when we say that an assessment σ constitutes an equilibrium we mean that the assessment is a PBE and the assessment satisfies Properties 1 and 2.
Remark 1. The receiver-game differs from the game described above in that it is the receiver who chooses $\delta_i$ at the beginning of each period $i$. Therefore, $\tau_i$ is the receiver's strategy and is specified as $\tau_i$ : […]. The sender's belief in period $i$ is a mapping $p_i : H_i \times [0,1] \to [0,1]$, and hence different $\delta_i$ may lead to different beliefs. Finally, Property 2 changes as follows: For any $i = N, \ldots, 1$ and […]

PRELIMINARIES
As is usual in cheap-talk games, there are many equilibria of the game defined above, even under the Markovian restriction introduced in Property 2. In this paper, we focus on the payoff-dominant equilibria, i.e., equilibria that yield the highest equilibrium payoffs for the players. For expositional reasons, we will also restrict attention to equilibria in which, for any $i \in \{N, N-1, \ldots, 1\}$, $h \in H_i$, and $\delta_i \in [0,1]$: (1) the sender sends message $m = 0$ after observing $\theta_i = 0$, i.e., $\mu_i(h, \delta_i, 0) = 0$; and (2) the receiver puts positive probability only on the biased and unbiased actions, i.e., $\alpha_i(h, \delta_i, m) \in \Delta\{a_0(\lambda_i(h, \delta_i, m)), a_b(\lambda_i(h, \delta_i, m))\}$. We should, however, note that restriction (1) is without loss of generality in terms of equilibrium outcomes and restriction (2) is satisfied in any equilibrium.
Given these restrictions, we simplify notation and describe period $i$ strategies by functions $\tau_i : H_i \to [0,1]$, $\mu_i : H_i \times [0,1] \to [0,1]$, and $q_i : H_i \times [0,1] \to [0,1]$, where $\tau_i(h)$ determines the choice of $\delta_i$, $\mu_i(h, \delta_i)$ determines the probability that the sender sends message 1 after $\delta_i$ and $\theta_i = 1$, and the function $q_i(h, \delta_i)$ is a distributional strategy (see Milgrom and Weber (1985)) for the receiver that determines the total probability with which the receiver plays the biased action. In the receiver-game, the only change is in $\tau_i$, which becomes $\tau_i : H_i \times \{0, b\} \to [0,1]$ so that $\tau_i(h, \beta)$ is the $\delta_i$ choice of the type $\beta$ receiver after history $h$.
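Fix an assessment $\sigma$, a period $i$, a history $h \in H_i$, and $\delta_i \in [0,1]$. Let $\Pr(m)$ be the total probability that the sender sends message $m \in \{0,1\}$ in this assessment in period $i$ after $(h, \delta_i)$. Let $q = q_i(h, \delta_i)$ and $\lambda(m) = \lambda_i(h, \delta_i, m)$. We can then write the sender's and the biased receiver's ex-ante costs in that period as follows:
$$\underbrace{q b^2}_{\text{Cost of bias}} + \underbrace{\sum_{m \in \{0,1\}} \Pr(m)\,\lambda(m)\,(1-\lambda(m))}_{\text{Cost of miscommunication}} \quad \text{and} \quad \underbrace{\left(1 - \frac{q}{p}\right) b^2}_{\text{Cost of bias}} + \underbrace{\sum_{m \in \{0,1\}} \Pr(m)\,\lambda(m)\,(1-\lambda(m))}_{\text{Cost of miscommunication}},$$
where $p = p_i(h) > 0$ is the probability assigned to the biased receiver.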
These costs are composed of two components: The first component (cost of bias) comes from the fact that the receiver plays the biased action with total probability $q$ after both messages. The second component (cost of miscommunication) comes from the fact that the sender may not provide full information and is equal to the expected conditional variance of the state of the world. If, for example, the sender's message provides no information on $\theta_i$, then $\lambda(m) = 1/2$ for $m \in \{0,1\}$ and the cost of miscommunication is equal to $1/4$. If it is perfectly informative, then $\lambda(1) = 1$ and $\lambda(0) = 0$ and the cost of miscommunication is equal to zero. The cost of the unbiased receiver is simply the cost of miscommunication.
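The two components described above can be checked numerically. The sketch below assumes quadratic loss, a biased action equal to the unbiased action plus $b$, equally likely states, and a sender who always sends $m = 0$ in state 0 and sends $m = 1$ in state 1 with probability `mu`. The function names are illustrative and do not come from the text.

```python
def miscommunication_cost(mu):
    """Expected conditional variance of the state given the message.

    Assumes type theta = 0 always sends m = 0 and type theta = 1 sends
    m = 1 with probability mu; both states are equally likely.
    """
    pr1 = 0.5 * mu                 # only type theta = 1 ever sends m = 1
    pr0 = 1.0 - pr1
    lam1 = 1.0                     # m = 1 reveals theta = 1
    lam0 = 0.5 * (1.0 - mu) / pr0  # Bayes' rule after m = 0
    return pr1 * lam1 * (1.0 - lam1) + pr0 * lam0 * (1.0 - lam0)


def sender_stage_cost(q, b, mu):
    """Sender's ex-ante period cost: cost of bias plus cost of miscommunication."""
    return q * b**2 + miscommunication_cost(mu)


print(miscommunication_cost(0.0))  # uninformative message
print(miscommunication_cost(1.0))  # fully informative message
```

The two printed values correspond to the babbling and full-revelation cases mentioned in the text.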

One-period Model.
To fix ideas, we begin our analysis with the simple case of $N = 1$. Let $q$ be the total probability with which the biased action is played and note that sequential rationality implies that the biased receiver plays the biased action with probability one after both messages, i.e., $q = p$. As is the case in cheap-talk games, there is always an equilibrium in which the sender's message provides no information about the state, the so-called "babbling equilibrium." Instead, suppose that the sender communicates truthfully in equilibrium and, without loss of generality, assume that type $\theta = 0$ sends message $m = 0$ and type $\theta = 1$ sends message $m = 1$. Therefore, after message $m \in \{0,1\}$, the unbiased action is $m$ and the biased action is $m + b$. Optimality of the strategy of the type $\theta = 1$ sender implies that the cost of sending message $m = 1$ is no greater than the cost of sending message $m = 0$:
$$q b^2 \le (1-q) + q (b-1)^2.$$
This is equivalent to $qb \le 1/2$ or $q \le \bar q$, where $\bar q \equiv \frac{1}{2b}$.
Optimality of the strategy of type $\theta = 0$ implies that
$$q b^2 \le (1-q) + q (1+b)^2,$$
which is always satisfied. Conversely, we can show that if $q \le \bar q$, then there is an equilibrium with full revelation. If $\bar q/2 < q < \bar q$, then there is also an equilibrium in which type $\theta = 1$ completely mixes, but since this plays no major role in our analysis we relegate its proof to Section 7 (see the proof of Lemma 2). Therefore, if $q > \bar q$ there is no information revelation in equilibrium. Note that truthful communication becomes an issue only when $\bar q < 1$, or equivalently $b > 1/2$. We summarize this discussion in the following lemma for easy reference.
Lemma 1. Let $N = 1$ and $p \in [0,1]$ be the probability that the receiver is biased. In any equilibrium, the receiver plays the biased action with probability one, i.e., $q = p$. If $p > \bar q$, then the sender's message provides no information. If $p \le \bar q$, then there is an equilibrium where the sender truthfully reports the state and if $\bar q/2 < p < \bar q$, then there is an equilibrium where the sender's report is partially truthful (i.e., the sender sends the same message with positive probability in both states). These are the only equilibria.
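The truth-telling threshold in Lemma 1 can be illustrated with a short check of the sender's incentive constraints. The sketch assumes quadratic loss for the sender and that, under truthful beliefs, the receiver plays $m$ with probability $1-q$ and $m+b$ with probability $q$; the threshold $1/(2b)$ is the one stated in the text, while the function names are hypothetical.

```python
def message_cost(theta, m, q, b):
    """Sender's expected loss in state theta from sending m under truthful beliefs."""
    return (1.0 - q) * (m - theta) ** 2 + q * (m + b - theta) ** 2


def truthful_is_incentive_compatible(q, b):
    """True if neither sender type gains by misreporting the state."""
    return all(
        message_cost(theta, theta, q, b) <= message_cost(theta, 1 - theta, q, b)
        for theta in (0, 1)
    )


b = 1.0
q_bar = 1.0 / (2.0 * b)  # threshold on the total probability of the biased action
print(q_bar)
print(truthful_is_incentive_compatible(0.4, b))  # q below the threshold
print(truthful_is_incentive_compatible(0.6, b))  # q above the threshold
```

Only the constraint of type $\theta = 1$ can bind, because the biased action pushes the decision upward.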
Note that the equilibria described in Lemma 1 are Pareto ranked: the more informative equilibrium yields a strictly higher expected payoff to both the sender and the receiver. In fact, in the truthful equilibrium, the receiver's cost is equal to zero whereas the (ex-ante) cost of the sender is $pb^2$. In the partially informative equilibrium, the receiver's cost is $(1/2 - pb) \in (0, 1/4)$, whereas the sender's is equal to $pb^2 + (1/2 - pb)$. In the babbling equilibrium, the expected costs of the receiver and the sender are $1/4$ and $pb^2 + 1/4$, respectively.
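The Pareto ranking can be tabulated for given parameters. The following sketch simply encodes the cost expressions stated above together with the existence conditions of Lemma 1, taking the threshold to be $1/(2b)$; the helper name `lemma1_costs` is an assumption made for illustration.

```python
def lemma1_costs(p, b):
    """Return {equilibrium: (receiver_cost, sender_cost)} for the one-period game."""
    q_bar = 1.0 / (2.0 * b)
    bias_cost = p * b**2  # sender's cost of bias, since q = p when N = 1
    costs = {"babbling": (0.25, bias_cost + 0.25)}
    if p <= q_bar:
        costs["truthful"] = (0.0, bias_cost)
    if q_bar / 2.0 < p < q_bar:
        partial = 0.5 - p * b  # cost of miscommunication when type 1 mixes
        costs["partial"] = (partial, bias_cost + partial)
    return costs


for name, (receiver, sender) in lemma1_costs(0.4, 1.0).items():
    print(name, receiver, sender)
```

For $p = 0.4$ and $b = 1$ all three equilibria exist, and both players' costs fall as the equilibrium becomes more informative.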
Remark 2. We should note that the observation in Lemma 1 is true for any period in which the communication incentives of the sender depend only on the total probability with which the biased action is played in that period. Property 2 ensures that this is indeed the case and hence we have Lemma 2, whose proof is in Section 7.
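3.2. Two-period Model.
In this subsection, we provide some intuition for our general result by analyzing the two-period version of the model. First note that in the two-period game, the overall costs of the receiver of type $\beta$ and the sender are
$$(1-\delta_2)\,E\left[(a_2-\theta_2-\beta)^2\right] + \delta_2\,E\left[(a_1-\theta_1-\beta)^2\right] \quad \text{and} \quad (1-\delta_2)\,E\left[(a_2-\theta_2)^2\right] + \delta_2\,E\left[(a_1-\theta_1)^2\right],$$
respectively. Define
$$\delta_2^* \equiv \frac{4b^2}{1+4b^2} \quad \text{and} \quad q_2^*(p) \equiv \max\left\{0, \frac{p-\bar q}{1-\bar q}\right\}.$$
Note that, if the total probability of playing the biased action in the first period is $q_2^*(p)$, then Bayes' rule implies that the probability assigned to the biased receiver in the next period (after the unbiased action) is
$$p_1 = \frac{p - q_2^*(p)}{1 - q_2^*(p)}.$$
This implies that $p_1 = \bar q$ if $p > \bar q$ and $p_1 = p$ if $p \le \bar q$. In other words, $q_2^*(p)$ is the smallest probability of playing the biased action in the first period that makes full revelation in the last period possible.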
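The role of Bayes' rule here can be illustrated with a small computation. The sketch below assumes that the smallest first-period probability of the biased action is the one that brings the posterior after an unbiased action exactly down to the threshold $1/(2b)$ whenever the prior exceeds it; this closed form is inferred from the statement that $p_1$ equals the threshold when $p$ is above it, and the function names are hypothetical.

```python
def q_bar(b):
    """Truth-telling threshold on the total probability of the biased action."""
    return 1.0 / (2.0 * b)


def posterior_after_unbiased(p, q):
    """Bayes' rule: belief that the receiver is biased after an unbiased action.

    p is the prior on the biased type and q < 1 the total probability of the
    biased action, so the biased type plays unbiased with total mass p - q.
    """
    return (p - q) / (1.0 - q)


def q_star(p, b):
    """Assumed smallest q that keeps the next-period belief at or below q_bar."""
    threshold = q_bar(b)
    if p <= threshold:
        return 0.0
    return (p - threshold) / (1.0 - threshold)


p, b = 0.8, 1.0
q = q_star(p, b)
print(q, posterior_after_unbiased(p, q))
```

With $p = 0.8$ and $b = 1$, mixing so that the biased action has total probability of about 0.6 moves the posterior to the threshold 0.5.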
For concreteness, let us assume that it is the receiver who chooses $\delta_2$. We will later comment on the model in which the sender chooses $\delta_2$. We say that a period 2 outcome is cooperative if $\delta_2^*$ is chosen and the unbiased action is played. Define assessment $\sigma^*$ as follows. Both types of the receiver choose $\delta_2^*$ and the sender believes that the receiver is biased with probability one after any other $\delta_2$. The unbiased receiver always plays the unbiased action given her beliefs. The biased receiver mixes after $\delta_2^*$ so that the total probability of the biased action is $q_2^*(p)$, and plays the biased action with probability one (total probability $p$) after any other $\delta_2$. Type $\theta_2 = 0$ sender always sends message $m_2 = 0$. Type $\theta_2 = 1$ sends message $m_2 = 1$, i.e., the sender fully reveals the state, if and only if $\delta_2^*$ has been chosen and it is incentive compatible to reveal the state, i.e., $q_2^*(p) \le \bar q$. In the last period, the biased receiver plays the biased action with probability one and the sender reveals the state truthfully if and only if the period 2 outcome is cooperative and it is incentive compatible to reveal truthfully, i.e., $p_1 \le \bar q$. Beliefs are derived from Bayes' rule whenever possible.
Fix $\delta_2 \in [0,1]$, and assume that after this choice of $\delta_2$, the probability assigned to the biased receiver is $p_2$ and that the biased receiver plays the biased action with total probability $q_2 \in [0, p_2]$ after both messages. The sender's ex-ante total cost conditional on $\delta_2$ is
$$(1-\delta_2)\left(q_2 b^2 + c_2\right) + \delta_2\Big[\underbrace{q_2\left(b^2 + \tfrac{1}{4}\right)}_{\text{Cost after biased action}} + \underbrace{(1-q_2)\left(p_1 b^2 + c_1\right)}_{\text{Cost after unbiased action}}\Big],$$
where $p_1 = (p_2 - q_2)/(1 - q_2)$ by Bayes' rule, the cost of miscommunication in the last period is equal to $1/4$ and $c_1$ after the biased and unbiased actions, respectively, and $c_2$ is the cost of miscommunication in the first period.
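The sender's total cost can be evaluated for concrete numbers. The sketch below assumes the following decomposition: in the first period the sender bears the cost of bias $q_2 b^2$ plus $c_2$; after a biased action the receiver is known to be biased and no information is revealed in the last period; after an unbiased action the posterior is $p_1$ and the cost of miscommunication is $c_1$. The function name and the example inputs are illustrative assumptions.

```python
def sender_total_cost(delta2, p2, q2, b, c2, c1):
    """Sender's ex-ante two-period cost under the assumed decomposition (q2 < 1)."""
    p1 = (p2 - q2) / (1.0 - q2)               # posterior after the unbiased action
    first_period = q2 * b**2 + c2
    after_biased = q2 * (b**2 + 0.25)         # revealed as biased, then no information
    after_unbiased = (1.0 - q2) * (p1 * b**2 + c1)
    return (1.0 - delta2) * first_period + delta2 * (after_biased + after_unbiased)


# Example with b = 1 and p = 0.8: most of the stakes are deferred (delta2 = 0.8),
# the biased action has total probability 0.6 in the first period, there is no
# information in the first period (c2 = 1/4) and full information in the last
# period after the unbiased action (c1 = 0).
gradual = sender_total_cost(0.8, 0.8, 0.6, 1.0, 0.25, 0.0)
babbling = 0.8 * 1.0**2 + 0.25                # no information in either period
print(gradual, babbling)
```

In this example the gradual arrangement costs the sender less than the outcome with no information revelation in either period.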
Under $\sigma^*$, the sender's cost is equal to
$$(1-\delta_2^*)\left(q_2^*(p)\, b^2 + c_2^*\right) + \delta_2^*\left[q_2^*(p)\left(b^2 + \tfrac{1}{4}\right) + \left(p - q_2^*(p)\right) b^2\right],$$
the biased receiver's total cost is
$$(1-\delta_2^*)\, c_2^* + \frac{\delta_2^*}{4}, \qquad (3.3)$$
and the unbiased receiver's cost is
$$(1-\delta_2^*)\, c_2^*, \qquad (3.4)$$
where $c_2^* = 0$ if $q_2^*(p) \le \bar q$, and $c_2^* = 1/4$ otherwise.
We first show that $\sigma^*$ is an equilibrium. Let us start with the last period. The biased receiver plays the biased action with probability one in the last period, which is the unique sequentially rational behavior. The sender reports truthfully if and only if the previous period's outcome is cooperative and $p_1 \le \bar q$, which is also sequentially rational as we have shown above (Lemma 1). Now let us consider the first period behavior. After histories that start with $\delta_2^*$, the sender assigns probability $p$ to the biased receiver and the receiver plays the biased action with total probability $q_2^*(p)$. This implies that communicating truthfully is sequentially rational if $q_2^*(p) \le \bar q$ (see footnote 17). If the receiver plays the unbiased action, then she induces a cooperative history and posterior belief $p_1 \le \bar q$ (as we have shown before, this follows from the definition of $q_2^*(p)$). Therefore, she learns the state perfectly in the next period and plays her best period action. This implies that the total cost of playing the unbiased action is $(1-\delta_2^*)(b^2 + c_2)$, where $c_2 \in [0, 1/4]$ is the cost of miscommunication. If the receiver plays the biased action, then her cost is equal to the same cost of miscommunication $c_2$ in the current period, but she induces a history that is not cooperative and receives no information in the next period, which costs her $1/4$; her total cost is therefore $(1-\delta_2^*) c_2 + \delta_2^*/4$. The definition of $\delta_2^*$ implies that $(1-\delta_2^*) b^2 = \delta_2^*/4$, i.e., the receiver is indifferent between the biased and the unbiased actions and hence playing the biased action with total probability $q_2^*(p)$ is sequentially rational.
Now consider histories that start with $\delta_2 \neq \delta_2^*$. The sender assigns probability one to the biased receiver, which is trivially consistent with Bayes' rule, and provides no information. Since $\delta_2 \neq \delta_2^*$, the history is not cooperative irrespective of what the receiver does, which implies that the sender will provide no information in the next period. Therefore, it is sequentially rational to play the biased action with probability one after any $\delta_2 \neq \delta_2^*$. Finally, choosing $\delta_2 \neq \delta_2^*$ leads to no information revelation and each type of receiver playing their best period actions in both periods. Therefore, the total cost of the receivers is $1/4$, which is greater than or equal to the cost of choosing $\delta_2^*$ for both types of the receiver, given in equations (3.3) and (3.4). This proves that choosing $\delta_2^*$ is sequentially rational for both types and completes the proof that $\sigma^*$ is a perfect Bayesian equilibrium of the two-period game.
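The indifference that pins down the first-period deferral can be checked numerically. The sketch below assumes that the biased receiver loses $b^2$ plus the cost of miscommunication when she plays the unbiased action and then bears no cost in the last period, while the biased action costs her only the miscommunication term today but $1/4$ in the last period. The closed form used for the critical deferral is derived from this indifference and, like the function names, is an assumption of the example.

```python
def delta_star(b):
    """Deferral share solving (1 - d) * b**2 == d / 4 (assumed closed form)."""
    return 4.0 * b**2 / (1.0 + 4.0 * b**2)


def cost_if_unbiased_action(delta2, b, c2):
    # Bias cost plus miscommunication today; full information (zero cost) is
    # assumed in the last period, so this is a lower bound in general.
    return (1.0 - delta2) * (b**2 + c2)


def cost_if_biased_action(delta2, c2):
    # Only miscommunication today, but no information in the last period.
    return (1.0 - delta2) * c2 + delta2 * 0.25


b, c2 = 1.0, 0.25
d = delta_star(b)
print(d, cost_if_unbiased_action(d, b, c2), cost_if_biased_action(d, c2))
print(cost_if_unbiased_action(0.5, b, c2), cost_if_biased_action(0.5, c2))
```

At the critical deferral the two costs coincide; with less of the stakes deferred, the biased action is strictly cheaper, so reputation building cannot be sustained.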
Remark 3. If it is the sender who chooses $\delta_2$, then we only change the assessment by specifying that the sender chooses $\delta_2^*$. The proof that this is an equilibrium is the same except that we have to show it is optimal for the sender to choose $\delta_2^*$. If the sender chooses $\delta_2 \neq \delta_2^*$, there is no information revelation and the biased receiver plays the biased action with probability one in both periods. Therefore, the cost of the sender is equal to $pb^2 + 1/4$ while the cost of choosing $\delta_2^*$ is at most $pb^2 + (1-\delta_2^*)/4 + \delta_2^*/4 = pb^2 + 1/4$.
We will now show that the outcome of $\sigma^*$ is the best equilibrium outcome for each player irrespective of who chooses $\delta_2$. We first state a few preliminary facts that will be useful later on.
Fact 1. If $\delta_2 < \delta_2^*$, then in any equilibrium the biased receiver plays the biased action with probability one in the first period.
Proof of Fact 1. The cost of playing the biased action in the first period is at most $(1-\delta_2) c_2 + \delta_2/4$, whereas the cost of playing the unbiased action is at least $(1-\delta_2)(b^2 + c_2)$. Since $\delta_2 < \delta_2^*$ implies $(1-\delta_2) b^2 > \delta_2/4$, playing the biased action is strictly better. The intuition behind this fact is simple: When the future is not important enough, the future reputational benefit of playing the unbiased action cannot outweigh its current cost. Therefore, the biased receiver plays the biased action.
Footnote 17. The key to this observation is the fact that the sender's continuation payoff depends only on whether $\delta_2 = \delta_2^*$ and whether the receiver plays the unbiased action. This implies that it is sequentially rational to provide full information if and only if the total probability of the biased action in the first period is smaller than or equal to $\bar q$.
Fact 2. In any equilibrium, the probability assigned to the biased receiver in the last period after the unbiased action is at most $\bar q$. Therefore, if the belief on the biased receiver is $p_2 \ge p$, then the total probability of the biased action is at least $q_2^*(p)$ in the first period.
Proof of Fact 2. Suppose, for contradiction, that in equilibrium $p_1 > \bar q$ in the last period after the unbiased action. Lemma 1 implies that the sender will not provide any information in the last period even after the unbiased action, which, in turn, implies that the biased receiver plays the biased action with probability one in the first period. But then Bayes' rule implies that $p_1 = 0$ after the unbiased action, which contradicts the hypothesis that $p_1 > \bar q$. Therefore, we conclude that $p_1 \le \bar q$ in any equilibrium.
If $p \le \bar q$, then $q_2^*(p) = 0$ and hence the total probability of the biased action in the first period satisfies $q_2 \ge q_2^*(p)$ in any equilibrium. Assume therefore that $p > \bar q$. The definition of $q_2^*$, Bayes' rule, and $p_1 \le \bar q$ imply
$$q_2 \ge \frac{p_2 - \bar q}{1 - \bar q} \ge \frac{p - \bar q}{1 - \bar q} = q_2^*(p).$$
We will first show that the equilibrium that we specified, i.e., $\sigma^*$, yields the highest equilibrium payoff for the receiver if $p > \bar q$.
Claim 1. If the sender chooses $\delta_2$ and $p > \bar q$, then $\sigma^*$ is optimal for the biased receiver.
Proof of Claim 1. Fix an equilibrium $\sigma \neq \sigma^*$ and suppose that the receiver plays the biased action with total probability $q_2 > \bar q$ in period 2 in $\sigma$. As we have shown before (Lemma 2), this implies that the sender provides no information in that period. Since the biased receiver plays the biased action with positive probability by assumption, this implies that the biased receiver's cost is $1/4$. The biased receiver's cost under $\sigma^*$ is $(1-\delta_2^*) c_2^* + \delta_2^*/4$, which is at most $1/4$. Suppose now that $q_2 \le \bar q$ and note that Fact 2 implies $0 < q_2^*(p) \le q_2 \le \bar q < p$. This implies that the cost under $\sigma^*$ is equal to $\delta_2^*/4$. The cost under $\sigma$ is at least $\delta_2/4$ because the receiver plays the biased action with positive probability in period 2. Furthermore, Fact 1 and $q_2 < p$ imply that $\delta_2 \ge \delta_2^*$. We conclude that $\delta_2/4 \ge \delta_2^*/4$, i.e., the cost under $\sigma$ is greater than or equal to the cost under $\sigma^*$.
Claim 2. If the receiver chooses $\delta_2$ and $p > \bar q$, then $\sigma^*$ is optimal for the biased receiver.
Proof of Claim 2. Fix an equilibrium $\sigma \neq \sigma^*$ and let $\delta_2$ be played with positive probability by the biased receiver in $\sigma$ such that after $\delta_2$ the posterior on the biased receiver is $p_2 \geq p$.$^{18}$ Let $q_2$ denote the receiver's strategy after the history where $\delta_2$ has been chosen and note that the biased receiver's total cost conditional on any $\delta_2$ played with positive probability must be the same. Once we replace $p$ with $p_2$, the rest of the proof is exactly the same as the proof of Claim 1.
Intuitively, $p > q$ implies that the biased receiver must play the biased action with positive probability in the first period, because otherwise her reputation next period will be $p > q$ and she will receive no information from the sender, which makes playing the unbiased action suboptimal in the first period. Therefore, her total cost is $(1-\delta_2)c_2 + \delta_2/4$, i.e., she accrues the highest information cost in the last period and hence she would like to lower $\delta_2$. But if $\delta_2$ goes below $\delta_2^*$, Fact 1 and $p > q$ imply that she receives no information in the first period. Therefore, the best for her is the minimum $\delta_2$ that may possibly allow information revelation in the first period, i.e., $\delta_2^*$.
Claim 3. If the sender chooses δ 2 and p > q, then σ * is optimal for the unbiased receiver.
Proof of Claim 3. If $q_2^*(p) \leq q$, then $\sigma^*$ is optimal for the unbiased receiver because her cost under $\sigma^*$ is equal to zero. If, on the other hand, $q_2^*(p) > q$, then her cost under $\sigma^*$ is equal to $(1-\delta_2^*)/4$. Fix any other equilibrium $\sigma$ and suppose that the receiver plays the biased action with total probability $q_2$ in this assessment. Fact 2 implies that $q_2 \geq q_2^*(p) > q$. This implies that the unbiased receiver's cost under $\sigma$ is at least $(1-\delta_2)/4$. If $\delta_2 \leq \delta_2^*$, then $(1-\delta_2)/4 \geq (1-\delta_2^*)/4$, that is, the unbiased receiver's cost under $\sigma$ is greater than or equal to the cost under $\sigma^*$.
Assume now that δ 2 > δ * 2 and suppose, for contradiction, that the cost under σ is smaller than the cost under σ * , i.e., (1 1 implies that the cost of playing the biased action is larger than the cost of playing the unbiased action for the biased receiver.This contradicts the assumption that the biased receiver plays the biased action with positive probability under σ.
Unlike the biased receiver, the unbiased receiver prefers a large δ 2 because she accrues no information cost in the last period.However, if δ 2 is larger than δ * 2 and the unbiased receiver's cost is smaller than her cost in σ * , then the biased receiver would be better off with playing the unbiased action, which contradicts Fact 2.
Claim 4. If the receiver chooses δ 2 and p > q, then σ * is optimal for the unbiased receiver.

Proof of Claim 4. If $q_2^*(p) \leq q$, then $\sigma^*$ is optimal for the unbiased receiver because her cost under $\sigma^*$ is equal to zero. If, on the other hand, $q_2^*(p) > q$, then her cost under $\sigma^*$ is equal to $(1-\delta_2^*)/4$. Fix an equilibrium $\sigma \neq \sigma^*$ and let $\delta_2$ be played with positive probability by the biased receiver in $\sigma$ such that after $\delta_2$ the posterior on the biased receiver is $p_2 \geq p$. Let $q_2$ denote the receiver's strategy after the history where $\delta_2$ has been chosen. Fact 2 implies that $q_2 \geq q_2^*(p) > q$. Therefore, the sender provides no information after $\delta_2$.
If the unbiased receiver also plays $\delta_2$ with positive probability, then her cost is at least $(1-\delta_2)/4$ and the rest of the proof is exactly the same as the proof of Claim 3. If the unbiased receiver does not play $\delta_2$, then the sender provides no information after $\delta_2$ and the cost of playing $\delta_2$ for the biased receiver is 1/4. Let $\delta_2'$ be played by the unbiased receiver with positive probability and $c_2'$ be the cost of miscommunication after $\delta_2'$. Optimality of playing $\delta_2$ for the biased receiver implies that the cost of playing $\delta_2$ must be no larger than the cost of playing $\delta_2'$ and then the biased action, i.e., $1/4 \leq (1-\delta_2')c_2' + \delta_2'/4$, and hence $c_2' = 1/4$. This implies that the cost of the unbiased receiver is at least $(1-\delta_2')/4$. Once $\delta_2$ is replaced with $\delta_2'$, the rest of the proof is the same as the proof of Claim 3. In particular, if $\delta_2' > \delta_2^*$ and the unbiased receiver's cost is smaller than her cost in $\sigma^*$, then the biased receiver would be better off choosing $\delta_2'$ and playing the unbiased action rather than choosing $\delta_2$ and playing the biased action.
The key observation in the model where the receiver chooses $\delta_2$ is that when $q_2^*(p) > q$, the information cost must be 1/4 after some $\delta_2$ chosen with positive probability by the biased receiver. This also implies that the information cost is 1/4 after any $\delta_2$ chosen by the unbiased receiver since, otherwise, the biased receiver would rather choose that $\delta_2$.
Remark 4. When $p \leq q$, equilibrium $\sigma^*$ is not optimal for the receiver. However, there is an equilibrium which is receiver-optimal and has the same property of starting small. In this equilibrium $\delta_2 = 1$, the sender provides full information in both periods and the biased receiver plays the unbiased action in the first and the biased action in the last period. In other words, the players effectively play a one-period game with the same prior. The costs of both the biased and the unbiased receivers are equal to zero in this equilibrium, while the cost of the biased receiver is at least $\delta_2^*/4$ in equilibrium $\sigma^*$.
Before we prove that $\sigma^*$ is also optimal for the sender, we will establish that in any equilibrium $\sigma$ and conditional on any $\delta_2$ and prior belief $p$, the continuation cost of the sender, which we will denote $C^S(\sigma \mid \delta_2, p)$, is greater than his cost in $\sigma^*$ conditional on the same belief, i.e., $C^S(\sigma^* \mid p)$.
Proof of Fact 3. Fix any equilibrium σ and assume that δ 2 < δ * 2 in σ.Fact 1 implies that the biased receiver plays the biased action with probability one in the first period, i.e., q 2 = p.The sender's cost conditional on δ 2 in such an equilibrium is at least pb If p ≤ q, then q * 2 p = c * 2 = 0, which implies that C S σ|δ 2 , p − C S σ * |p > 0. If p > q, then c 2 = 1/4.Therefore, the sender's cost is at least p b 2 + 1/4 , which is strictly greater than his cost in σ * : where the second equality follows from the definition of q * 2 p , the third from the definition of q, and the last inequality from q * 2 p < p and b > 1/2 .Assume now that δ 2 > δ * 2 .If p ≤ q, then q * 2 p = c * 2 = 0 and the sender's cost under σ * is δ * 2 pb 2 .The sender's cost in any equilibrium with δ 2 > δ * 2 is at least δ 2 pb 2 , which is strictly greater than δ * 2 pb 2 .Intuitively, under strategy profile σ * , the sender minimizes the weight of the last period (where the biased receiver plays the biased action with probability one) subject to the constraint δ 2 ≥ δ * 2 , which provides incentives to the biased receiver to choose the unbiased action in the first period.
If p > q, then the sender's cost is where q 2 ≥ q * 2 p by Fact 2. Note that C δ 2 , q 2 , c 2 , c 1 ≥ C δ 2 , q 2 , c 2 , 0 since deleting the communication costs in the last period could only decrease the sender's cost.Also, note that the func- 4 is strictly increasing in q 2 : the sender would rather have the receiver play the biased action later rather than sooner. 19Therefore, we find that Intuitively, the sender would prefer not to put much weight on the last period, where the biased receiver plays the biased action with probability one.Therefore, , which is strictly increasing in q 2 and hence strictly greater than C S σ * |p .
Claim 5. σ * is optimal for the sender.
Proof.If the sender chooses δ 2 , or the receiver chooses δ 2 and both types pool on the same δ 2 , then σ * is optimal for the sender by Fact 3.More generally, suppose that the biased and unbiased receivers play mixed strategies.In particular, consider an equilibrium σ where supp [τ] = δ 2 , δ 2 .Let p(δ ) = p and p(δ 2 ) = p , i.e., p and p are the receiver's reputation levels after δ 2 and δ 2 , respectively. 20  Bayesian consistency implies that τ δ 2 p +τ δ 2 p = p.Fact 3 implies that There are two cases to consider.Case 1: We have q * 2 p ≤ q or we have q * 2 p > q and q * 2 p > q.Alternatively, Case 2: We have q * 2 p > q, q * 2 p ≤ q and q * 2 p > q.We begin with the analysis of Case 1.If q * 2 p ≤ q or if q * 2 p > q and q * 2 p > q, then a direct computation implies that τ δ 19 Note that Bayes' rule implies that the total probability of the biased action is fixed and equal to the prior probability of the biased receiver: p = q 2 + 1 − q 2 p 1 .Therefore, if q 2 decreases p 1 increases and vice versa. 20The argument we provide below is based on the convexity of C (σ * |p) in p and by Jensen's inequality generalizes to the case where the supp [τ] is an arbitrary set.We comment on this further in the proof of the main result in Section 7.
Case 2: We have q * 2 p > q, q * 2 p ≤ q and q * 2 p > q.In this case, the argument that was used in Case 1 cannot be directly applied but we will show, after a small modification, that an argument again based on the convexity of C S (σ * |•) can be used to establish the result.Note that q 2 p ≥ q * 2 p and q 2 p ≥ q * 2 p > q.This is because if q 2 p ≤ q < q * 2 p , then q 1 p > q.This is so because q * 2 p is constructed such that the receiver's reputation in period 1 is exactly q, hence if q 2 p < q * 2 p , then the receiver's reputation in period 1 exceeds q in period 1 after the sender observes the unbiased action in period 2. However, if q 1 p > q, then the sender will not communicate in period 1 and the biased receiver will not play the unbiased action in period 2, contradicting q 2 p < q * 2 p .The biased receiver's cost after δ 2 is equal to 1/4 because the sender does not communicate in period 2, and the biased receiver plays the biased action with positive probability.Hence, for any other δ 2 in the support of τ we must have 1 4 because the biased receiver chooses δ 2 with positive probability.Therefore, c 2 = 1 4 , i.e., the sender does not communicate in the first period after observing δ 2 either.
Case 2 a: Suppose δ 2 < δ * 2 .Fact 1 implies that the biased receiver plays the biased action with probability one in the first period, i.e., q 2 = p .The sender's cost conditional on δ 2 in such an equilib- where the first inequality follows because q 2 (p ) ≥ q * 2 (p ) and the second because the expression is decreasing in δ 2 whenever q * 1 p = q.Therefore, On the other had, if q * 1 p < q, then C S (σ we cannot replace δ 2 by δ * 2 as we did above because p < q and the right-hand side is not necessarily increasing in δ 2 .We instead proceed by picking two new reputation levels q and p such that τ δ 2 q + τ δ 2 p = p.Note that p > q ≥ p and τ δ 2 p + τ δ 2 p = p, therefore p < p ≤ p .We will show Intuitively, by picking these two new reputation levels, we shift some of the probability of playing the biased action from period 2 to period 1 if the probabilities of playing the biased action is given by q * 1 (•) and q * 2 (•).This decreases the sender's expected cost.More precisely, under q * the probabilities of playing the biased action are equal to τ δ 2 p + τ δ 2 1 − q * 2 (p ) q and τ δ 2 q * 2 (p ) in periods 1 and 2, respectively, if the reputation levels are p and p .In contrast, these probabilities are equal to τ δ 2 ) q + τ δ 2 1 − q * 2 ( p) q and τ δ 2 q * 2 ( p) in periods 1 and 2, respectively, under the new pair of reputation levels.Moreover, the total probability of playing the biased action sums to p under both pairs of reputation levels.Therefore, a direct computation shows that the total probability of playing the biased action goes down by p − q 1− q τ δ 2 in period 2 and increases by the same amount in period 1.However, as we have seen before, such a change decreases the sender's cost for any choice of stakes.
Therefore, we find
To summarize, if the initial reputation of the receiver is bad, i.e., $p > q$, then in the unique payoff-dominant equilibrium outcome: (1) The relative size of the stakes in the first period is chosen so that the receiver is indifferent between the biased and the unbiased actions for that period, given that the sender will communicate truthfully after the unbiased action and will provide no information otherwise; (2) The receiver mixes in such a way that her reputation next period is just good enough to make truthful communication possible; (3) A larger share of the stakes is left to the future, i.e., starting small is the unique payoff-dominant equilibrium outcome.
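As a worked step, item (2) of the summary can be made explicit. The sketch below assumes the Bayes' rule identity stated in the paper's footnote on the two-period model, $p = q_2 + (1 - q_2)p_1$, and sets the next-period belief $p_1$ equal to the threshold $q$. It illustrates how $q_2^*(p)$ is constructed when $p > q$; it is not a reproduction of the displayed equations of the proofs.

```latex
\begin{align*}
p &= q_2 + (1 - q_2)\,p_1
  && \text{(Bayes' rule; the biased action reveals the biased type)} \\
p_1 &= \frac{p - q_2}{1 - q_2} = q
  && \text{(reputation just good enough for truthful communication)} \\
q_2^*(p) &= \frac{p - q}{1 - q} = 1 - \frac{1 - p}{1 - q}
  && \text{(solving for } q_2 \text{)}
\end{align*}
```

The last expression agrees with the value of $q_2$ quoted in Remark 7 for payoff-dominant equilibria.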
Remark 5. In order to understand the role of reputational concerns, it might be instructive to compare the equilibrium we have presented above with the scenario in which the sender learns the receiver's type. Assuming that $b > 1/2$, once the sender learns that the receiver is biased, he will provide no information and the receiver will play the biased action in each period. Therefore, the cost of the sender will be $b^2 + 1/4$. If, on the other hand, the sender learns that the receiver is unbiased, then the sender's cost will be zero in every period. Therefore, if the sender knows that he will learn the type of the receiver, his ex ante cost is $p(b^2 + 1/4)$. As we have shown above, this cost is higher than his cost in our equilibrium (see the inequalities in (3.5)). Intuitively, in our equilibrium, the probability of the biased action is smaller in each period, which directly benefits the sender, and also leads to a higher probability of truthful communication in the future. This is the main reason why the sender does not prefer to screen the receiver early on in the game.
Remark 6. We should also note that, for our results, it is not necessary that the sender and the receiver share the same $\delta_2$. The main role of $\delta_2$ is to provide incentives to the biased receiver to play the unbiased action with positive probability. It is easy to show that if the sender has a fixed $\delta_2$ and the choice variable is the size of the stakes for the receiver, then our main result still goes through, i.e., $\sigma^*$ is still the payoff-dominant equilibrium.
Remark 7. In the analysis above we have restricted the search for payoff-dominant equilibrium to those in which the Markov property, i.e., Property 2, holds.It is easy to adapt the arguments in the text and show that, in any payoff-dominant equilibrium, the receiver plays the biased action with the same probability after any message sent with positive probability.In other words, the symmetry part of the property is not binding for our results in the two-period model.
If, however, we allow the sender to coordinate his communication strategy across periods, then the incentive compatibility constraint for telling the truth is relaxed. When the Markov property holds, the binding constraint for telling the truth in the first period is given by $q_2 b^2 \leq q_2 b^2 + 1 - 2q_2 b$, or $q_2 \leq q$, where $q_2$ is the total probability with which the receiver plays the biased action. Without the Markov restriction, this constraint is given by
which is equivalent to
Otherwise, the non-Markov payoff-dominant equilibrium is exactly the same as the Markov equilibrium. In particular, $q_2 = 1 - (1-p)/(1-q)$ in any payoff-dominant equilibrium, both Markov and non-Markov. Since $q(\delta) < q$ for any $b > 1/2$, truth-telling is optimal for a larger set of prior probabilities in the non-Markov payoff-dominant equilibrium.
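The Markov truth-telling constraint in Remark 7 can be simplified in one line. The sketch below only rearranges the inequality as stated there. Reading the resulting bound as the threshold $q$ is an inference from the phrase "or $q_2 \leq q$", not a definition quoted from the text, and it assumes $b > 0$.

```latex
\begin{align*}
q_2 b^2 \le q_2 b^2 + 1 - 2 q_2 b
  \iff 0 \le 1 - 2 q_2 b
  \iff q_2 \le \frac{1}{2b}.
\end{align*}
```

Under this reading $q = 1/(2b)$, which is below one exactly when $b > 1/2$ and equals $1/2$ when $b = 1$, the value reported with Figure 1.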

THE MAIN RESULT
In this section, we will show that the main results we have obtained in the two-period version of the model go through in the general model with N periods: There is a unique sender-optimal equilibrium outcome and this outcome is also optimal for the receiver as long as the receiver has a sufficiently bad initial reputation.In other words, for sufficiently bad initial reputation levels, there is a unique equilibrium outcome that Pareto dominates all other equilibrium outcomes.If the potential conflict of interest is large enough, i.e., b > 1/2, this equilibrium outcome is characterized by "starting small," i.e., increasing the stakes over time as long as the receiver does the "right thing."We also show that as the receiver's potential bias b increases, the initial stakes become smaller but they grow faster.
In order to facilitate the definition of the payoff-dominant equilibrium we first need some preliminary definitions.Let δ * 1 = 0 and define δ * i recursively as For any p ∈ [0, 1], let21 Define the set of period i histories H * i as follows.H * N = { } and, for any i = N − 1, . . ., 1, a history (2) for all j = N −1, . . ., i +1, period j outcome, o j = δ j , θ j , m j , a j , is such that δ j = δ * j and a j = m j .In other words, a history belongs to H * i if in each previous period j , δ * j has been chosen and the receiver played the unbiased action after each message believing that the sender was telling the truth (except in period N , where she believes the sender is telling the truth if and only if doing so is sequentially rational for the sender).
We will define the payoff-dominant equilibrium assessment σ * = τ i , µ i , q i , p i , λ i for the game in which the sender chooses the allocation of decisions and will describe later how it is different in the receiver-game.After each history in H * i , the sender chooses δ * i and after any other history he chooses δ i = 0, i.e., If h ∈ H * i and δ * i has been chosen in period i , then the receiver's total probability of playing the biased action is equal to q * i p i (h) , where q * i is defined in (4.3); otherwise, the biased receiver plays the biased action with probability one, i.e.,22 The sender communicates truthfully in period i as long as the total probability of the biased action in that period is less than or equal to q, the history belongs to H * i , and δ * i has been chosen.Since we assumed type θ i = 0 sender sends message m i = 0, this implies that the probability with which type In any period i , the unbiased action is given by a 0 (λ i (h, δ i , m)), where Beliefs on the receiver's type are defined as follows: p N ( ) = p and Note that if the players play according to σ * up to and including period i + 1, which implies that i is chosen or if in period i the receiver plays an action that is different from the unbiased action, i.e., plays a i = m i .In that case, the sender assigns probability one to the event that the receiver is biased, provides no information, and terminates the game by choosing The assessment σ * in the receiver-game differs from the above definition as follows: Replace the definition of τ i given in (4.4) with and add In other words, both types of the receivers choose δ * i as long as the history is cooperative and the sender's belief does not change after observing δ * i , while it assigns probability one to the biased receiver after any other δ i .
Theorem 1. The assessment $\sigma^*$ is a perfect Bayesian equilibrium and induces the unique sender-optimal equilibrium outcome. If $p > 1 - (1-q)^{N-1}$, then $\sigma^*$ is also a receiver-optimal equilibrium and therefore payoff-dominant.
In the equilibrium σ * described above, the sender (or the receiver) leaves a proportion δ * i of the total stakes in the relationship to subsequent periods on the equilibrium path.This proportion leaves the receiver exactly indifferent between the biased and the unbiased actions in period i , given that in each subsequent period, the sender communicates truthfully after observing the unbiased action in all prior periods and provides no information otherwise.
In order to further describe the equilibrium, let us first focus on the case in which $\sigma^*$ is payoff-dominant, i.e., $p > 1 - (1-q)^{N-1}$. In this case the receiver plays the biased action with total probability equal to $1 - (1-p)/(1-q)^{N-1}$ in period $N$ and plays the biased action with total probability $q$ thereafter. The sender reports the state truthfully in every period except possibly the first period, i.e., period $N$ in our notation. In period $N$, the total probability of the biased action may exceed $q$ and if this is the case the sender communicates no information. In other words, if the receiver has a sufficiently bad initial reputation, then informative communication may fail in the first period, i.e., period $N$, but communication is fully informative thereafter.
If $p \leq 1 - (1-q)^{N-1}$, then the sender-optimal equilibrium is not receiver-optimal anymore. In the sender-optimal equilibrium $\sigma^*$, the receiver plays the unbiased action with probability one until the game reaches period $k$, where $k$ is the first period (largest integer) such that $p > 1 - (1-q)^{k-1}$. The receiver plays the biased action with total probability equal to $1 - (1-p)/(1-q)^{k-1} \leq q$ in period $k$ and plays the biased action with total probability $q$ in all subsequent periods. The receiver's reputation remains constant and equal to $1 - p$ until the game reaches period $k$ and then monotonically increases in each period to reach exactly $1 - q$ in the last period of the game. The sender reports the state truthfully in every period after observing the unbiased action.
Figure 1 plots the importance parameter ($\gamma_i$), the reputation of the receiver ($1 - p_i$), and the total probability with which the receiver plays the biased action ($q_i$) for each period $i$, when the bias is equal to 1, the prior probability of the biased receiver is 0.9, and the total number of periods is 10. Note that $q = 1/2$ and hence $k = 4$.
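The path described for the sender-optimal equilibrium can be traced numerically. The following is a minimal sketch, not the authors' code. It assumes that the threshold is $q = 1/(2b)$ (consistent with $q = 1/2$ at $b = 1$), that the total probability of the biased action in period $i$ is $\max\{0,\, 1 - (1-p_i)/(1-q)^{i-1}\}$ (inferred from the expressions given for periods $N$ and $k$), and that beliefs update by Bayes' rule after the unbiased action. The function name `equilibrium_path` and its return format are hypothetical.

```python
def equilibrium_path(p, b, n_periods):
    """Trace k, q_i and the receiver's reputation 1 - p_i on the path of sigma*.

    Assumptions (illustrative, see lead-in): threshold q = 1/(2b) with b > 1/2,
    prior 0 < p < 1, periods counted down from N (first) to 1 (last).
    """
    if not (0.0 < p < 1.0 and b > 0.5 and n_periods >= 1):
        raise ValueError("need 0 < p < 1, b > 1/2 and at least one period")
    q = 1.0 / (2.0 * b)

    # k: largest period index with p > 1 - (1 - q)^(k - 1); i = 1 always qualifies.
    k = max(i for i in range(1, n_periods + 1)
            if p > 1.0 - (1.0 - q) ** (i - 1))

    belief = p  # probability assigned to the biased receiver
    rows = []
    for i in range(n_periods, 0, -1):
        # Total probability of the biased action in period i (zero before period k).
        q_i = max(0.0, 1.0 - (1.0 - belief) / (1.0 - q) ** (i - 1))
        rows.append({"period": i, "q_i": q_i, "reputation": 1.0 - belief})
        if i > 1:
            # Bayes' rule after observing the unbiased action.
            belief = (belief - q_i) / (1.0 - q_i)
    return k, rows


if __name__ == "__main__":
    # Parameters stated for Figure 1: bias 1, prior 0.9, ten periods.
    k, rows = equilibrium_path(p=0.9, b=1.0, n_periods=10)
    print("k =", k)
    for row in rows:
        print(row)
```

With the Figure 1 parameters the sketch returns $k = 4$, and the reputation stays at $1 - p$ until period 4 and then rises to $1 - q$ in the last period, matching the verbal description.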
Consider an equilibrium $\sigma^{**}$ where in all periods $N$ through $i^*(p) + 1$, the stakes are chosen to be zero, i.e., $\delta_i = 1$, the sender communicates truthfully, and the receiver plays the unbiased action with probability one. Starting in period $i^*(p)$, the equilibrium $\sigma^{**}$ then follows $\sigma^*$. Lemma 6 in the Appendix proves that this equilibrium is receiver-optimal irrespective of the identity of the player who chooses the stakes. In this equilibrium, players act as if the game effectively begins in period $i^*(p)$. Notice that this equilibrium also entails starting small. The unbiased receiver is indifferent between this equilibrium and $\sigma^*$ because the sender communicates truthfully in every period. On the other hand, the biased receiver's payoff in equilibrium $\sigma^{**}$ strictly exceeds her payoff in equilibrium $\sigma^*$. This is because the stakes are chosen to be zero in exactly those periods in which the unbiased action is played with probability one in equilibrium $\sigma^*$. That is, the receiver maintains her reputation at zero cost in periods $N$ through $i^*(p) + 1$.
Remark 9. In Remark 7, we have commented on how the results would change in the two-period model if we remove restrictions imposed by the Markov property, i.e., Property 2. For $N > 2$, we conjecture that any payoff-dominant non-Markov equilibrium involves similar behavior to that in the Markov equilibrium, except that truth-telling constraints are relaxed, i.e., in (4.6) $q$ is replaced with a properly defined $q_i(\delta_i, \delta_{i-1}, \ldots, \delta_2) > q$ in a similar way as in the two-period model.
We should also note that, if the sender is a short-run player, i.e., the receiver faces a series of senders each of whom interacts with the receiver for only one period, then our main results go through in the general model without imposing this Markovian restriction.
Remark 10. We have assumed that the unbiased receiver is a commitment type who plays her myopic best response in each period. The equilibrium we have specified remains an equilibrium even if the unbiased receiver is a fully strategic player. However, there may be other Markovian equilibria where the unbiased receiver plays actions other than the unbiased action with positive probability.

Allocation of Stakes.
We are now ready to answer the main question that motivated our model: What is the equilibrium path of the stakes or the importance of the decisions?
Proposition 1. Suppose that play unfolds according to equilibrium $\sigma^*$ and denote by $\gamma_i^*$ the proportion of the total stakes in period $i$ in this equilibrium. If $b > 1/2$, then the equilibrium path is characterized by progressively larger stakes, i.e., $\gamma_N^* < \gamma_{N-1}^* < \cdots < \gamma_1^*$. As $b$ increases, the initial stakes become smaller but they grow faster, i.e., $\gamma_j^*/\gamma_i^*$ is increasing in $b$ for all $j < i$.
Let $a = 4b^2$. The proof of Proposition 1 shows that the proportions form a geometric sequence: the proportion of the stakes in the first period, i.e., period $N$, is given by $\gamma_N^* = 1/(1 + a + \cdots + a^{N-1})$, and then each subsequent proportion is just $a$ times the previous one. If $a > 1$, i.e., $b > 1/2$, this implies that each period receives more weight than the previous one. More precisely, the growth rate of $\gamma_i$ is equal to $a - 1 > 0$, i.e., the greater the potential bias of the receiver, the higher the growth rate of the stakes. If, on the other hand, $a < 1$, i.e., $b < 1/2$, then the size of the stakes decreases over time.
In order to gain some intuition for this result let us consider the two-period model where the first period's proportion is $\gamma_2$ and the last period's $\gamma_1$. If the biased receiver plays the biased action in the first period, then her cost is $\gamma_1/4$. If she plays the unbiased action, then her cost is $\gamma_2 b^2$, assuming that after playing the unbiased action she receives full information as she does in our equilibrium. Therefore, the receiver is indifferent between the biased and the unbiased actions if and only if $\gamma_1/4 = \gamma_2 b^2$ or $\gamma_1 = a\gamma_2$.
Figure 2 plots the evolution of the stakes over time for three different bias parameters, two of which are greater than 1/2 and one is smaller than 1/2.Observe that, when the potential bias is larger, the stakes are initially smaller but they grow faster.This is because the larger the bias the more important the future must become in order for the biased receiver to play the unbiased action.Since the total size of the stakes is normalized to one, this implies smaller stakes at the beginning and a higher growth rate.
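The comparison across bias levels can be reproduced with a few lines of code. This is a minimal sketch under two assumptions taken from the discussion of Proposition 1: each period's share is $a = 4b^2$ times the share of the period before it, and the shares sum to one. The function name `stakes_path` and the three bias values are hypothetical illustrations, not the parameters used in Figure 2.

```python
def stakes_path(b, n_periods):
    """Return the shares [gamma_N, ..., gamma_1] in chronological order.

    Assumes a geometric path with ratio a = 4 * b**2 and total stakes
    normalized to one (an illustration of Proposition 1, not the proof).
    """
    if b <= 0 or n_periods < 1:
        raise ValueError("need b > 0 and at least one period")
    a = 4.0 * b * b
    # t = 0 is the first period played (period N); t = N - 1 is the last (period 1).
    weights = [a ** t for t in range(n_periods)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Two hypothetical biases above 1/2 and one below, as in the description of Figure 2.
    for b in (1.0, 0.75, 0.4):
        shares = stakes_path(b, n_periods=5)
        print(f"b = {b}: " + ", ".join(f"{s:.4f}" for s in shares))
```

Normalizing by the sum of the weights, rather than using a closed form with $a - 1$ in the numerator, keeps the sketch valid at $b = 1/2$, where $a = 1$ and the stakes are spread evenly.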
Finally, we can show that the equilibrium costs of the players are decreasing in the number of periods N .
Proposition 2. Cost of the sender strictly decreases in N in any sender-optimal equilibrium.Cost of (both types of) the receiver is decreasing in N in any receiver-optimal equilibrium.
This result implies that, if the sender or the receiver had a choice over the number of periods over which to spread the total stakes available in their relationship, then they would choose as many periods as possible.Of course, this neglects any cost of time, which would act as a countervailing force.

RELATED LITERATURE
The main issue addressed in this paper, i.e., dynamic allocation of stakes, has been analyzed in prisoners' dilemma type environments before. Watson (1999) and Watson (2002) analyze an infinitely repeated prisoners' dilemma type game with incomplete information and variable stakes over time. In the stage game, the "low" type player prefers to "betray," which benefits herself, injures the other player, and ends the game, while the "high" type prefers cooperation as long as the other player also cooperates. He shows that starting with small stakes supports perpetual cooperation between the high types as an equilibrium outcome. Watson (2002) assumes that players commit to the way stakes change over time while Watson (1999) characterizes an equilibrium in which the stakes satisfy a renegotiation condition. Andreoni and Samuelson (2006) study a twice repeated prisoners' dilemma game with incomplete information and variable stakes. Players are conditional cooperators in the sense that, in the stage game, a type-α player prefers to cooperate if she believes that the other player cooperates with at least α probability.$^{23}$ They characterize the equilibria of this game with exogenously given stakes and show that "starting small" leads to the best payoffs for the players.$^{24}$ The main point of departure of our model from these papers is that, in contrast to a prisoners' dilemma game, our stage game is a game of strategic communication that exhibits common interest as well as conflict of interest. Furthermore, we assume that one of the players has the authority to determine the stakes involved in their relationship and analyze how they are determined in equilibrium.$^{25}$
23 They allow α to be negative or greater than one, which corresponds to unconditional cooperators or defectors, respectively.
24 Andreoni and Samuelson (2006) also test their theory experimentally and find empirical support for their predictions. Andreoni et al. (2016) extend this paper so that players choose the stakes themselves in the experiment. They show that the subjects indeed choose the payoff maximizing strategy of starting small. Other papers that feature gradualism as an optimal or equilibrium outcome include Marx and Matthews (2000), Blonski and Probst (2004), and the loan model in Section 6 of Sobel (1985).
25 We also show that gradualism is not always the best equilibrium arrangement for the sender. Indeed, if the potential conflict of interest between the receiver and the sender is small enough, then the opposite arrangement of "starting big" turns out to be optimal for the sender.
Our paper is also related to Aghion et al. (2004), which shows that, if control rights are not contractible, then transferring them unconditionally to the agent and learning her willingness to cooperate could be the optimal arrangement for the principal. In their model, the principal would like to learn the agent's type and the "bad" agent has no incentive for reputation, while in our case the "bad" receiver has an incentive to maintain a "good" reputation and the sender does not prefer to screen the agent types because doing so hinders communication.$^{26}$
In each period of our model, the sender and the receiver are involved in a cheap-talk game, which was introduced by Crawford and Sobel (1982). They analyze the equilibrium communication behavior between an informed but biased sender and an uninformed receiver and show that the informativeness of equilibrium decreases in the degree of the sender's bias. There are two main differences between Crawford and Sobel (1982) and our model: (1) The degree of preference divergence between the sender and receiver is the private information of the receiver; (2) The game is repeated, where in each period a new state of the world is realized but preferences remain the same.
Morris (2001) also differs from Crawford and Sobel along those two dimensions. The main difference is that in Morris (2001) the bias is the private information of the sender whereas in our model it is the private information of the receiver. Morris (2001) finds that the unbiased sender, who prefers to inform the receiver about the state of the world, may choose not to do so in the first period in order to be regarded as unbiased and hence better inform the receiver in the future. In contrast, in our model, the biased receiver may mimic the unbiased receiver in order to maintain a good reputation and receive better information in the future. Furthermore, we analyze the equilibrium allocation of the stakes over time whereas it is exogenously given in Morris (2001). Morgan and Stocken (2003) analyze a one-period cheap-talk game with a sender with uncertain preferences, whereas Sobel (1985) and Benabou and Laroque (1992) are earlier papers that analyze repeated cheap-talk games, except that they assume that the unbiased (or good) sender always tells the truth. Li and Madarász (2008) extend Morgan and Stocken (2003) so that the bias can be in either direction and compare equilibria under known and unknown biases, while Dimitrakas and Sarafidis (2005) allow the bias to have an arbitrary distribution. Our model differs from these papers in that we assume the bias is the receiver's private information and that the cheap-talk game is repeated.
Another related paper is Ottaviani and Sørensen (2001) in which a sequence of privately informed experts, who are exclusively concerned about their reputation for being well-informed, offer public advice to an uninformed receiver. They show that reputational concerns may lead to herding by the experts.$^{27}$ Our model can also be framed as a model of sequential cheap-talk with multiple experts (senders) but we have a receiver who is privately informed about the preference divergence between herself and the experts, and it is the receiver who is concerned about reputation.$^{28}$
Our work is also related to the literature on pandering. Maskin and Tirole (2004) analyze a two-period model where in the first period an official chooses a policy, which determines whether she stays in office in the second period. They show that if the official's desire to stay in office is sufficiently strong, then in the first period she could choose a popular action, i.e., she could pander to public opinion even if she does not think that the public opinion is the optimal policy. In our model, incentives to pander come from the desire to receive better information rather than the desire to stay in office.$^{29}$
Another related strand of literature is the one on career concerns pioneered by Holmström (1999), in which an employee's concern about her reputation for talent leads her to exert costly effort even without explicit incentives provided by a contract.$^{30}$ In our model, concern for reputation for being unbiased arises from the receiver's incentives to obtain accurate information and leads her to act in the interest of the sender.
26 Also related is Halonen (1997), which shows that a joint ownership structure and a concern for reputation may help solve the hold-up problem and implement the first best in a (twice) repeated game with incomplete information.
27 Also see Ottaviani and Sørensen (2006a,b) in which an expert with reputational concerns (but no bias) fails to provide full information to the receiver.
28 There are other models in which multiple experts with known biases are involved in simultaneous or sequential cheap-talk, among which are Gilligan and Krehbiel (1989), Austen-Smith (1990), and Krishna and Morgan (2001).

CONCLUDING REMARKS
We have analyzed a model of repeated cheap-talk in which the conflict of interest is the private information of the receiver and either the sender or the receiver can determine how the stakes involved in their relationship evolve over time. We find that, if the potential conflict of interest is large, then the stakes increase over time, i.e., "starting small" or "gradualism" is the unique (payoff-dominant) equilibrium arrangement.
Basically, the stakes are designed in order to utilize the receiver's incentives to build reputation for being unbiased and facilitate communication. In equilibrium, the receiver mixes between her own and the sender's favorite action and the sender communicates truthfully with the receiver throughout their relationship as long as the receiver always does the "right thing." We also showed that if the potential bias is small, i.e., smaller than 1/2, then this pattern is reversed and the size of the stakes decreases over time. Note that if b < 1/2, or more generally the initial reputation of the receiver is good enough to make truthful communication possible even in the one-shot game, i.e., pb < 1/2, the sender-optimal equilibrium is not receiver-optimal. In fact, in this case, the sender-optimal equilibrium is sustained by the sender's off-the-equilibrium threat to communicate no information if the receiver plays the biased action. This threat is perfectly credible when b > 1/2, because the only equilibrium behavior once the receiver is revealed to be biased is to reveal no information. The same is not true when b < 1/2. Since the sender prefers truthful communication ex-ante, such a threat may be regarded as non-credible. If that is the position one takes, then our results should be deemed most convincing and interesting for those cases in which the potential bias of the receiver is large enough.
The current work raises many other questions and could be extended in a number of ways. For example, what would be the equilibrium path of stakes in a situation where reputational concerns create perverse incentives as in Morris (2001), Ely and Välimäki (2003), Maskin and Tirole (2004), or Kartik and Van Weelden (2015)? As opposed to what happens in our model, would it be optimal to front-load the decisions in order to avoid such perverse incentives? More technical extensions include richer type spaces for the players, but our preliminary analyses of such models have so far proved non-trivial.

PROOFS
Proof.[Proof of Theorem 1] We start with the proof of Lemma 2.
Proof.[Proof of Lemma 2] Let λ m = λ i (h, δ i , m), q = q (h, δ i ), and note that the period cost of sending message m for type θ ∈ {0, 1} is Property 2 implies that the continuation payoff does not depend on the message and hence only the period payoff matters for sequential rationality of the sender.As it is always the case in cheap-talk models, there is always an equilibrium in which the sender's strategy is completely uninformative irrespective of his beliefs, the so called "babbling equilibrium."Suppose that in equilibrium the sender provides full information to the receiver.Sequential rationality of type θ i = 0 is always satisfied, whereas sequential rationality of type plays a completely mixed strategy, then λ 2 1 + 2λ 1 qb = λ 2 0 + 2λ 0 qb, which implies λ 0 = λ 1 .This implies that both types mix with equal probabilities and hence the sender's strategy is non-informative.Therefore, in any other type of equilibrium behavior, type 0 must be playing a pure strategy while type 1 completely mixes.Suppose, without loss of generality, that type 0 sends message 0. This implies that λ 1 = 1 and λ 0 ∈ (0, 1/2).It is easy to show that type 0's sequential rationality is satisfied while type 1's sequential rationality implies that λ 0 = 1 − 2qb, which, in turn, implies that 1/4 < qb < 1/2.Lemma 3. Suppose that the sender chooses the allocation δ, then the assessment σ * is a perfect Bayesian equilibrium.
Proof. Fix a history $h \in H_1^*$ and note that under $\sigma^*$ the biased receiver plays the biased action with probability one and the sender provides information if and only if $p_1(h) \le q$. The receiver's strategy is sequentially rational since period 1 is the last period, and the sender's strategy is sequentially rational by Lemma 2.
Let $i > 1$ and fix a history $h_i \in H_i^*$ such that $p_i(h_i) < 1$. If $\delta_i = \delta_i^*$, then under $\sigma^*$ the receiver is indifferent between the biased and unbiased actions after any message $m_i$. To see this, note that it is true for $i = 2$, as we previously showed in section 3.2. Suppose that it is true in period $i - 1$. If, in period $i$, the receiver chooses the biased action, then she induces a history that is not in $H_{i-1}^*$, which implies that in period $i - 1$ the sender chooses $\delta_{i-1} = 0$, provides no information, and the receiver plays the biased action. Therefore, the cost of playing the biased action is $\delta_i^*/4$. If she plays the unbiased action instead, then she suffers a cost equal to $b^2$ in period $i$ but induces a history in $H_{i-1}^*$. In the next period, the sender chooses $\delta_{i-1}^*$ and provides full information. Under the induction hypothesis, her expected cost starting from period $i - 1$ is equal to $\delta_{i-1}^*/4$, i.e., the cost of playing the biased action in period $i - 1$. Therefore, the cost of playing the unbiased action in period $i$ is equal to $(1 - \delta_i^*) b^2 + \delta_i^* \delta_{i-1}^*/4 = \delta_i^*/4$, which, in turn, implies that she is indifferent between the biased and unbiased actions in period $i$.
Therefore, playing the biased action with total probability $q_i^*(p_i(h))$ is optimal after such histories.
Lemma 2 implies that the communication strategy of the sender (see (4.6)) is sequentially rational.
If the sender chooses $\delta_i \neq \delta_i^*$, then in period $i$ he provides no information and the biased receiver plays the biased action. In period $i - 1$, he chooses $\delta_{i-1} = 0$, provides no information, and the biased receiver again plays the biased action. Therefore, his expected cost of choosing $\delta_i \neq \delta_i^*$ is equal to $p_i(h) b^2 + \frac{1}{4}$. If he chooses $\delta_i^*$, then the receiver plays the biased action with total probability $q_i^*(p_i(h)) \le p_i(h)$, and this cannot lead to a higher expected cost. Therefore, it is sequentially rational for the sender to choose $\delta_i^*$. If $h_i \notin H_i^*$ or $p_i(h) = 1$, then the biased receiver plays the biased action with probability one. This is sequentially rational because the sender provides no information in any subsequent period.
The sender is willing to provide no information because babbling is always an equilibrium of the cheap-talk game. Moreover, it is sequentially rational for the sender to choose $\delta_i = 0$ because his continuation payoff is equal to $p_i(h_i) b^2 + \frac{1}{4}$ and independent of his choice of $\delta_i$. Finally, it is straightforward to check that the beliefs defined in (4.7) and (4.8) satisfy Bayes' rule whenever it can be applied conditional on reaching any $h \in H_i$.
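The indifference step in the proof above pins down $\delta_i^*$ recursively. The following is a sketch of that calculation, under the assumption (suggested by the cost expressions $\delta_i^*/4$ and $(1-\delta_N)/4$ used in these proofs) that period $i$ carries weight $1-\delta_i$ and the continuation carries weight $\delta_i$. The shorthand $a = 4b^2$ is the one introduced in the proof of Proposition 1, and $\delta_1^* = 0$ because period 1 is the last period.

```latex
\begin{align*}
\underbrace{\frac{\delta_i^*}{4}}_{\text{biased action in period } i}
  &= \underbrace{(1-\delta_i^*)\,b^2
     + \delta_i^*\,\frac{\delta_{i-1}^*}{4}}_{\text{unbiased action in period } i}, \\
\delta_i^* &= a\,(1-\delta_i^*) + \delta_i^*\,\delta_{i-1}^*,
  \qquad a = 4b^2, \\
\delta_i^* &= \frac{a}{1 + a - \delta_{i-1}^*},
  \qquad \delta_1^* = 0 .
\end{align*}
```

For instance, under these assumptions $\delta_2^* = a/(1+a)$, which lies strictly between 0 and 1 for any $b > 0$.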
Lemma 4. Suppose that the receiver chooses the allocation $\delta$. Then the assessment $\sigma^*$ is a perfect Bayesian equilibrium.
Proof. Fix a history $h \in H_1^*$ and note that under $\sigma^*$ the biased receiver plays the biased action with probability one and the sender provides information if and only if $p_1(h) \le q$. The biased receiver's strategy is sequentially rational since period 1 is the last period, and the sender's strategy is sequentially rational by Lemma 2.
Let $i > 1$ and fix a history $h_i \in H_i^*$ such that $p_i(h_i) < 1$. If $\delta_i = \delta_i^*$, then under $\sigma^*$ the biased receiver is indifferent between the biased and unbiased actions after any message $m_i$. To see this, note that it is true for $i = 2$, as we previously showed in section 3.2. Suppose that it is true in period $i - 1$.
If, in period $i$, the receiver chooses the biased action, then she induces a history that is not in $H_{i-1}^*$, which implies that in period $i - 1$ the receiver chooses $\delta_{i-1} = 0$, the sender provides no information, and the biased receiver plays the biased action. Therefore, the cost of playing the biased action is $\delta_i^*/4$. If she plays the unbiased action instead, then she suffers a cost equal to $b^2$ in period $i$ but induces a history in $H_{i-1}^*$. In the next period, the receiver chooses $\delta_{i-1}^*$ and the sender provides full information. Under the induction hypothesis, her expected cost starting from period $i - 1$ is equal to $\delta_{i-1}^*/4$, i.e., the cost of playing the biased action in period $i - 1$. Therefore, the cost of playing the unbiased action in period $i$ is equal to $(1 - \delta_i^*) b^2 + \delta_i^* \delta_{i-1}^*/4 = \delta_i^*/4$, which, in turn, implies that she is indifferent between the biased and unbiased actions in period $i$. Therefore, playing the biased action with total probability $q_i^*(p_i(h))$ is sequentially rational after such histories. Moreover, Lemma 2 implies that the communication strategy of the sender (see (4.6)) is sequentially rational because $q_i^*(p_i(h)) \le q$.
If the receiver chooses $\delta_i \neq \delta_i^*$, then in period $i$ the sender provides no information and the biased receiver plays the biased action. In period $i - 1$, the receiver chooses $\delta_{i-1} = 0$, the sender provides no information, and the biased receiver again plays the biased action. Therefore, the expected cost of choosing $\delta_i \neq \delta_i^*$ is equal to $1/4$ for both the biased and unbiased receivers. The expected cost of choosing $\delta_i^*$ is equal to zero for the unbiased receiver, which is clearly lower than the cost $1/4$ that results from choosing $\delta_i \neq \delta_i^*$. The expected cost of choosing $\delta_i^*$ is equal to $\delta_i^*/4$ for the biased receiver because this receiver is indifferent between the biased and unbiased actions following $\delta_i^*$. This cost is also lower than the cost $1/4$ that results from choosing $\delta_i \neq \delta_i^*$. If $h_i \notin H_i^*$ or $p_i(h) = 1$, then the biased receiver plays the biased action with probability one. This is sequentially rational because the sender provides no information in any subsequent period.
The sender is willing to provide no information because babbling is always an equilibrium of the cheap-talk game. Moreover, it is sequentially rational for the receiver to choose $\delta_i = 0$ because her continuation payoff is equal to $1/4$ and independent of her choice of $\delta_i$.
Finally, it is straightforward to check that the beliefs defined in (4.7) and (4.8) satisfy Bayes' rule whenever it can be applied conditional on reaching any $h \in H_i$.
Proof. We first argue that $\sigma^*$ is receiver optimal for the biased receiver.
Claim 6. Suppose that the sender chooses the parameter $\delta$. We argue that the assessment $\sigma^*$ is biased receiver optimal.
Proof. Assume that the assessment $\sigma^*$ is receiver optimal for all reputation levels $p > 1 - (1 - q)^{i-1}$ in the $i$ stage communication game. Under this induction hypothesis, we show that $\sigma^*$ is receiver optimal in the $i + 1$ stage communication game for all reputation levels $p > 1 - (1 - q)^{i}$, i.e., the cost under assessment $\sigma^*$ is smaller than the cost under any other assessment $\sigma$ given the induction hypothesis.
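The induction moves between the thresholds $1-(1-q)^{i}$ and $1-(1-q)^{i-1}$ through Bayes' rule. A sketch of the arithmetic follows. It assumes, in line with the expression $(p-q)/(1-q)$ that appears later in this appendix, that $p$ is the probability that the receiver is biased and that observing the unbiased action, when the biased action is played with total probability $x$, updates $p$ to $(p-x)/(1-x)$. Here $q_{i+1} \le q$ is the total probability of the biased action in period $i+1$.

```latex
\begin{align*}
p_i &= \frac{p - q_{i+1}}{1 - q_{i+1}}
      \;\ge\; \frac{p - q}{1 - q}
      && \text{since } q_{i+1} \le q \text{ and } p \le 1, \\
    &> \frac{1 - (1-q)^{i} - q}{1 - q}
      && \text{since } p > 1 - (1-q)^{i}, \\
    &= 1 - (1-q)^{i-1}.
\end{align*}
```

The first inequality uses the fact that $(p-x)/(1-x)$ is weakly decreasing in $x$ whenever $p \le 1$, so a smaller probability of the biased action leaves a weakly worse posterior after the unbiased action is observed.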
Fix an assessment $\sigma \neq \sigma^*$ and assume, on the way to a contradiction, that $C^R_{i+1}(\sigma) < C^R_{i+1}(\sigma^*)$.
Step 1. Suppose that the receiver plays the biased action with total probability $q_{i+1} > q$ in period $i + 1$. This implies that the receiver's cost is $1/4$ since the sender cannot communicate in period $i + 1$ given that $q_{i+1} > q$. However, the receiver's cost under $\sigma^*$ is at most $1/4$.
Step 2. On the other hand if q i +1 ≤ q, then p > 1 − (1 − q) i and Bayes' rule together imply that the receiver's reputation in period i , p i , is strictly greater than 1 − (1 − q) i −1 .There are two cases to consider: 4 where h i is the history under which the unbiased action is played in period i .However, this would contradict Therefore, δ i +1 < 1.In order for the biased receiver to play the unbiased action in period i we must have, 1 leading to a contradiction.
Case 2. q i +1 ≥ q * i +1 .The facts that q i +1 ≥ q * i +1 and q i +1 ≤ q together imply that q * i +1 ≤ q.If q * i +1 ≤ q, then the cost under σ * is equal to δ * i +1 /4.The cost under σ is at least δ i +1 /4 because the receiver must play the biased action with positive probability in period i +1.The fact that the receiver is indifferent between the biased and the unbiased action implies the following equalities: where C R i (σ) is the biased receiver's cost in the continuation game under the strategy σ.The induction hypothesis and Therefore showing that the cost under σ exceeds the cost under σ * .We now complete the inductive argument by showing that σ * is optimal for the biased receiver for i = 2 if p > 1 − (1 − q) 2−1 = q.This conclusion follows immediately from the argument above because for any two strategy profiles σ and σ * we have Claim 7. Suppose that the receiver chooses the parameter δ.We argue that the assessment σ * is biased receiver optimal.
Proof. Assume that the assessment $\sigma^*$ is receiver optimal for all reputation levels $p > 1 - (1 - q)^{i-1}$ in the $i$ stage communication game. Under this induction hypothesis, we show that $\sigma^*$ is receiver optimal in the $i + 1$ stage communication game for all reputation levels $p > 1 - (1 - q)^{i}$. Fix an assessment $\sigma \neq \sigma^*$ and assume, on the way to a contradiction, that $C^R_{i+1}(\sigma) < C^R_{i+1}(\sigma^*)$. Let $\delta_{i+1}$ be a $\delta$ in the support of $\sigma$ such that $p_{i+1} \ge p$. Bayes' rule implies that there must be such a $\delta_{i+1}$ in the support of $\sigma$. Let $q_{i+1}$ denote the receiver's strategy after the history where $\delta_{i+1}$ has been chosen.
Suppose that the receiver plays the biased action with total probability q i +1 > q in period i + 1.This implies that the receiver's cost is 1/4 since the sender cannot communicate in period i + 1 given that q i +1 > q.However, the receiver's cost under σ * is at most 1/4.On the other hand if q i +1 ≤ q, then p i +1 > 1 − (1 − q) i and Bayes' rule together imply that the receiver's reputation in period i , p i , is strictly greater than 1−(1− q) i −1 .There are two cases to consider: Case 1. q i +1 < q * i +1 p i +1 where q * i +1 p i +1 is the probability of playing the biased action under σ * if the initial reputation is equal to p i +1 and given that δ * i +1 is chosen by the receiver.If q i +1 < q * i +1 p i +1 , then q * i (p i ) > q by construction and therefore C R i (σ * ) = 1 4 .Note that if δ i +1 = 1, then, by the induction hypothesis, 4 where h i is the history under which the unbiased action is played in period i .However, this would contradict our initial hypothesis that Therefore, δ i +1 < 1.In order for the biased receiver to play the unbiased action in period i we must have, 1 ) by the induction hypothesis we have leading to a contradiction.
Case 2. q i +1 ≥ q * i +1 p i +1 where q * i +1 p i +1 is the probability of playing the biased action under σ * if the initial reputation is equal to p i +1 and given that δ * i +1 is chosen by the receiver.Note that q * i +1 p i +1 ≥ q * i +1 p because p i +1 ≥ p.The facts that q i +1 ≥ q * i +1 p i +1 ≥ q * i +1 p > 0 and q i +1 ≤ q together imply that q * i +1 p ≤ q.If q * i +1 p ≤ q, then the cost under σ * is equal to δ * i +1 /4.The cost under σ is δ i +1 /4 because the receiver plays the biased action with positive probability in period i + 1.The fact that the receiver is indifferent between the biased and the unbiased action implies the following equalities: where C R i (σ) is the biased receiver's cost in the continuation game under the strategy σ.The induction hypothesis and showing that the cost under σ exceeds the cost under σ * .Claim 8. Suppose that the sender chooses the parameter δ.We argue that σ * is receiver optimal for the unbiased receiver.
Proof. If $q_N^* \le q$, then $\sigma^*$ is optimal for the unbiased receiver because her cost under $\sigma^*$ is equal to zero since the sender communicates truthfully in each period.
If, on the other hand, $q_N^* > q$, then her cost under $\sigma^*$ is equal to $(1 - \delta_N^*)/4$ because the sender communicates truthfully in each period except period $N$. Choose any other assessment $\sigma$.
Case 1. Suppose that the receiver plays the biased action with total probability $q_N \le q$. Then $q_N^* > q$ and $q_N \le q$ together imply that $q_j > q$ for some $j < N$, so the sender will communicate no information in period $j$. This implies that the biased receiver will play the biased action in period $j + 1$ with probability one, which further implies that the sender will communicate no information in period $j + 1$ either. Working backwards in this way, we find that the biased receiver will play the biased action in each period $i > j$, contradicting our initial hypothesis that $q_N \le q$.
Case 2. Suppose that the receiver plays the biased action with total probability p > q N > q.In this case, the sender communicates no information in period N .Therefore, the unbiased receiver's cost , that is, the unbiased receivers cost under σ * is at most the cost under σ.
The biased receiver is indifferent between the biased action and the unbiased action in period N .
Therefore, playing the biased action in period N is a best response for the receiver given the senders strategy under σ and must result in a cost that is less than or equal to any other strategy that the receiver may use.In particular, the cost must be less than or equal to the cost of playing the unbiased action in each period i ∈ {N , î + 1} and then playing the biased action in period î .Therefore, where the second inequality follows because c i ≥ 0 for all i .
The fact that the biased receiver is indifferent between the biased and unbiased action in each period i > 1 under strategy σ * implies that 4 and the equality displayed above together imply that The facts that δ i > δ * i for i > î and δ î ≤ δ * î together imply that The first inequality above follows because δ î ≤ δ * î .The second inequality above follows because γ i and because 1 4 δ î ≤ b 2 (since b 2 ≥ 1 4 and δ î ≤ 1).However, the above inequality implies that which leads to a contradiction.
Claim 9. Suppose that the receiver chooses the parameter $\delta$. We argue that $\sigma^*$ is receiver optimal for the unbiased receiver.
Proof. If $q_N^* \le q$, then $\sigma^*$ is optimal for the unbiased receiver because her cost under $\sigma^*$ is equal to zero since the sender communicates truthfully in each period.
If, on the other hand, q * N > q, then her cost under σ * is equal to 1 − δ * N /4 because the sender communicates truthfully in each period except period N .Choose any other assessment σ.Let δ N be a δ in the support of σ such that 1 > p N ≥ p. Bayes' rule implies that there must be such a δ N in the support of σ.Let q N denote the receiver's strategy after the history where δ N has been chosen.Case 1. Suppose that the receiver plays the biased action with total probability q N ≤ q after each δ N in the support of σ.However, then q * N > q, and q N ≤ q together imply that q j > q after some δ j for j < N in the support of σ.However, then the unravelling argument detailed further above implies that q N = p > q leading to a contradiction.Case 2. Suppose that the biased receiver chooses δ N and plays the biased action with total probability p N > q N > q.If δ N is in the support of the unbiased receiver's strategy, then the unbiased receiver's cost under σ is at least (1 − δ N ) /4.Alternatively, suppose that the biased receiver chooses a δ N under σ that is not in the support of the unbiased receiver.In this case, the sender communicates no information in period N .Let δ N denote any choice of δ in the support of the the unbiased receiver. Then and therefore c N = 1 4 .Hence, the unbiased receiver's cost under σ is at least , that is, the unbiased receivers cost under σ * is at most the cost under σ.Suppose instead that δ N > δ * N .Let î denote the largest i < N such that δ i ≤ δ * i for some δ i in the support of the unbiased receiver.Note that î is possibly equal to one where δ 1 = δ * 1 = 0. Also, note that we have δ i > δ * i for i > î and δ î ≤ δ * î by construction.
Suppose that δ N is in the support of the unbiased receiver.Therefore, choosing δ N and playing the biased action in period N is a best response for the biased receiver given the senders strategy under σ and must result in a cost that is less than or equal to any other strategy that the biased receiver may use.In particular, the cost must be less than or equal to the cost of choosing δ N in the support of the unbiased receiver and playing the unbiased action in each period i ∈ N , î + 1 and then playing the biased action in period î .Therefore, where the second inequality follows because c i ≥ 0 for all i .
Alternatively, suppose that the biased receiver chooses a δ N under σ that is not in the support of the unbiased receiver.In this case, the sender communicates no information in period N .Let δ N denote any choice of δ in the support of the the unbiased receiver.Choosing δ N is a best response for the biased receiver given the senders strategy under σ and must result in a cost that is less than or equal to any other strategy that the biased receiver may use.In particular, the cost must be less than or equal to the cost of choosing δ N in the support of the unbiased receiver and playing the unbiased action in each period i ∈ N , î + 1 and then playing the biased action in period î .Therefore, The fact that the biased receiver is indifferent between the biased and unbiased action in each period i > 1 under strategy σ * implies that 4 and the equality displayed above together imply that The first inequality above follows because δ î ≤ δ * î .The second inequality above follows because γ i and because 1 4 δ î ≤ b 2 (since b 2 ≥ 1 4 and δ î ≤ 1).However, the above inequality implies that which leads to a contradiction.
Claims 6-9 complete the proof of the lemma.
The assessment that chooses τ i = δ i = 1 in all periods N through i * (p) and then follows σ * is receiver optimal irrespective of the identity of the player that chooses δ.
Proof. First note that the above specified assessment is optimal for the unbiased receiver because the sender communicates truthfully in each period where $\gamma_i > 0$.
Choose an assessment σ.Let σ * * denote the above specified assessment.Assume that the above described assessment is receiver optimal for all N − 1 period games.Note that because the receiver is indifferent between the unbiased and biased action in each period 1 < i ≤ i * (p) and the receiver plays the biased action in period 1.
If the receiver plays the biased action with probability q N > q in period N , then the receiver's cost under σ is equal to 1/4 which is clearly worse than her cost under the above specified assessment.
If q N = 0, then the cost under σ is clearly higher than the cost under σ * * .This is because Moreover, the receiver is indifferent between the biased and unbiased actions in each period i ≤ i * .Therefore, Suppose that q * i * ≤ q N ≤ q.The fact that q N ≤ q implies that i The fact that the receiver is indifferent between the biased and unbiased action in period N implies that concluding the proof.
Lemma 7. For an $N$ period game, suppose that $\sigma = (\tau_i, \mu_i, q_i, p_i)$ is such that:
1. The belief $p_j = 1$ for some $j \le N$ and $p_i < 1$ for all $i > j$.
2. The choices $q_i$ and $\mu_i$ are sequentially rational under $\sigma$.
3. The probability $q_i(h) \le q$ for all $i \ge j$, where only the unbiased action has been played in history $h$ and $h$ is a history which has positive probability under $\sigma$.
4. The communication strategy $\mu_i$ entails truthful communication in each period $N > i \ge j$.
5. The total probability of playing the biased action in periods $N$ through $j$ is equal to $p$.
6. The $\gamma_i$ implied by $\tau_i$ is equal to zero for all periods $i < j$.
Then $C^S_{N-j+1}(\sigma) \ge C^S_N(\sigma^*)$, where the initial reputation is equal to $p$.
Proof. We will prove this for the case where the period where $p_j = 1$ is taken as period 1. This is without loss because we can take $N^* = N - j + 1$. Moreover, a simple calculation shows $C^S_{N-j+1}(\sigma^*) \ge C^S_N(\sigma^*)$. The assessment $\sigma$ that satisfies 1-6 is feasible for the following minimization problem.
γ j )q i = 0 for all i > 1 (7.2) In the optimization above γ i = N j =i +1 δ j +1 (1 − δ i ) and q i is the total probability of playing the biased action in periods i and 1 {q N > q} is the indicator function which is equal to one if q N > q and zero otherwise.The objective function is the total cost of the sender under the assumption that the sender communicates truthfully in every period except possibly period N .Constraint (7.1) states that playing the biased action is at least as costly as the unbiased action in every period except the last, i.e., cancelling i j =1 γ j from both sides we obtain the constraint.This constraint must hold, because if it did not, then the receiver would play the biased action with probability one in that period.However, then the probability of playing the unbiased action would exceed q in that period.This would however contradict step 2. Constraint (7.2) says that Constraint (7.1) can only hold strictly in periods where the receiver plays the biased action with probability zero.Constraint (7.3) says that the receiver must play the biased action with probability at most q.This follows from step 1. Constraint (7.4) says that the biased type eventually plays the biased action.Therefore, the total probability of the biased action is at least equal to the prior probability that the sender faces a biased receiver.
The optimization problem above is feasible because the assessment that we constructed satisfies all of the constraints. Moreover, the constraint set is compact. Therefore, the optimization problem admits a solution. Below we argue that our assessment solves the problem and therefore is sender optimal.
We argue that Constraint (7.1) holds with equality for all i > 1 in any solution to the optimization problem.Suppose that 4b 2 i j =2 γ j < i −1 j =1 γ j for some i < N and suppose that i is the largest index where this constraint does not hold with equality.This implies that q i = 0 because of the second constraint.
We show that if we increase γ i by ∆ to γi so that the i th constraint binds, decrease γ i +1 by ∆ to γi+1 , set qi = q i +1 , and qi+1 = 0 and leave all other variables unchanged, then all the constraints continue to hold.However, we show that this new feasible choice has strictly lower cost and dominates the old plan. Let This choice of ∆ ensures that the i th constraint binds with equality.Note that Hence, γi+1 > 0. Also, note that the i + 1 st constraint now holds strictly.This is because Also, all constraints j > i + 1 continue to hold with equality: Note that this new strategy entails strictly less cost for the sender.This is because we have decreased δ i and the sender's cost is decreasing in δ i .See the proof of Step 2 for an argument.However, this contradicts the assertion that the initial plan solved the optimization problem.Therefore this line of reasoning establishes that all the such constraints must hold with equality in the optimal solution.where p−q 1−q = q.Hence, To complete the proof, note that q N (p) ≥ q * N (p) is automatically true if q * N = 0.If q * N p > 0, then q * i (h i ) = q in any subsequent period where h i is any history on the equilibrium path where only the unbiased action has been observed.If q N (p) < q * N (p), then there must exist a history on the equilibrium path h i where only the unbiased action has been played such that q i (h i ) > q.However, then the sender communicates no information in period i < N leading to a contradiction.Lemma 9. Fix an assessment σ.Let period m * ≥ 1 be such that p m * = 1 under σ.The assessment σ is sender optimal only if the miscommunication costs are equal to zero in every period m * ≤ i < N and if γ m * = 0 for all i < m * under σ.
Proof. Suppose $m^* = 1$. We come back to the case where $m^* > 1$ at the end of the proof.
On the way to a contradiction, suppose that there are information costs $x_j > 0$ in some period $j < N$ in a sender-optimal assessment $\sigma$ which satisfies Properties 1 and 2 after a history where only the unbiased action has been observed. Under this hypothesis, we show that there is an assessment $\sigma'$, which satisfies Properties 1 and 2 and has strictly lower costs for both the sender and the receiver than assessment $\sigma$. In this new assessment the sender communicates truthfully in each period except possibly period $N$ after observing the unbiased action. This is incentive compatible because we have $q_i \le q$ in every period $i < N$ by assumption.
Case 1. Suppose that q N ≤ q.Then the sender communicating truthfully in period N is incentive compatible Let N > j * ≥ 1 be a period such that there are miscommunication costs in this period and no information costs in any period i ∈ { j * − 1, ..., 1}.Period j * is well defined since we allow for j * to equal one.In any period i > j * the following inequality holds where z < i is any period in which the receiver plays the biased action with positive probability.Such a period z must exist because the receiver will play the biased action with probability one in the last period of the game.The sum i −1 k=z ( i l =k+1 δ l )(1 − δ k )x k ≥ 0 is the total miscommunication costs that the receiver incurs in the posited equilibrium in the periods {i −1, ..., z}.The inequality holds because the receiver must prefer to play the unbiased action until period z and then switch to the biased action in period z.Moreover, note that in any period i > j where the receiver plays a mixed strategy the inequality holds with equality.support, the argument follows without alteration for the case where τ s support is arbitrary by simply applying Jensen's inequality.
Proof.[Proof of Proposition 1] Define D 1 = 1, let a = 4b 2 and note that δ * i , i = 2, . . ., N , is defined by the following system of equations: for all i = 2, . . ., N .This, in turn, can be reduced to the following difference equation with initial condition D 1 = 1: for any i = 1, . . ., N − 1.This proves the claim.
It is now easy to show that the growth rate of the importance parameter $\gamma$ is $a - 1$ and that $\gamma_N$ decreases in $a$.
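The growth-rate claim can be checked numerically. The sketch below is not taken from the paper: it assumes the recursion $\delta_i^* = a/(1 + a - \delta_{i-1}^*)$ with $\delta_1^* = 0$ (the form implied by the receiver's indifference condition when period $i$ has weight $1-\delta_i$) and the period weights $\gamma_i = (\prod_{j>i} \delta_j)(1-\delta_i)$. The function name `stakes_path` and the parameter values are hypothetical.

```python
def stakes_path(b, n_periods):
    """Return (delta, gamma), each indexed 1..n_periods (index 0 unused).

    Periods are counted backwards, so period 1 is the last one.
    ASSUMPTION: delta follows delta_i = a / (1 + a - delta_{i-1}) with
    delta_1 = 0 and a = 4 * b**2, as implied by the indifference condition.
    """
    if b <= 0 or n_periods < 1:
        raise ValueError("need b > 0 and at least one period")
    a = 4.0 * b * b
    delta = [None, 0.0]
    for i in range(2, n_periods + 1):
        delta.append(a / (1.0 + a - delta[i - 1]))

    # gamma_i = (product of delta_j for j > i) * (1 - delta_i)
    gamma = [None] * (n_periods + 1)
    tail_product = 1.0  # running product of delta_j over j > i
    for i in range(n_periods, 0, -1):
        gamma[i] = tail_product * (1.0 - delta[i])
        tail_product *= delta[i]
    return delta, gamma


if __name__ == "__main__":
    b, n = 0.75, 6  # hypothetical values, so a = 2.25
    a = 4.0 * b * b
    delta, gamma = stakes_path(b, n)
    # Calendar order: period n is played first, period 1 last.
    for i in range(n, 1, -1):
        growth = gamma[i - 1] / gamma[i] - 1.0
        print(f"period {i} -> {i - 1}: growth of gamma = {growth:.6f}")
    print(f"a - 1 = {a - 1.0:.6f}")
```

Under these assumptions the weights sum to one, consecutive weights differ by the factor $a$, and $\gamma_N = 1/(1 + a + \dots + a^{N-1})$, which is decreasing in $a$.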
Proof.[Proof of Proposition 2] Let the prior be p > q and k the largest integer such that p > 1 − (1 − q) k−1 .Assume that k ≥ 2. Theorem 1 and the discussion that follows it implies that, if N ≥ k, the cost of the sender is equal to since q k ≤ q.Cost when N < k, on the other hand, is at least qb 2 , because the total probability of the biased action is greater than or equal to q in period N .This implies that it is strictly better to choose N ≥ k rather than N < k.Let W i = γ 1 + • • • + γ i and note that W k = D N /D k by definition.Since γ k = W k − W k−1 , cost can be written as Note that p ≤ 1 − (1 − q) k by definition of k.Therefore, if N ≥ k, then there is no cost of miscommunication in period N and hence the cost of the unbiased receiver is equal to zero while the cost of the biased receiver is δ k /4 in the receiver-optimal equilibrium.If N < k, however, there is a cost of miscommunication in period N , which implies that the cost of the unbiased receiver is (1 − δ N ) /4 and the cost of the biased receiver is 1/4.Therefore, any N ≥ k minimizes the cost for the receivers.
This implies that the cost of the receivers is (weakly) decreasing in N .
If $k = 1$ or $p \le q$, then the cost is equal to $\gamma_1 p b^2 = D_N p b^2$. Equation (7.11) implies that $D_N$ is strictly decreasing in $N$, which implies that the cost of the sender is decreasing in $N$. Furthermore, if $b > 1/2$, then $\lim_{N \to \infty} D_N = 1 - 1/a > 0$, which implies that the lower bound on the total cost is strictly positive. If $b < 1/2$, then $\lim_{N \to \infty} D_N = 0$.
When k = 1 or p ≤ q, the cost of both types of the receivers in the receiver-optimal equilibrium is equal to zero, and hence it is constant in N .
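The integer $k$ used in the proof of Proposition 2 can be computed directly from the prior and the communication threshold. The sketch below is an illustration only: `q_bar` stands for the threshold written $q$ in the proofs, and `reputation_path` traces a hypothetical path in which the biased action is played with total probability exactly `q_bar` in every period until truthful communication becomes possible in a one-shot game. This need not be the order in which $\sigma^*$ assigns the probabilities. The update rule $(p - q)/(1 - q)$ is the Bayes' rule expression that appears in the appendix.

```python
def periods_needed(p, q_bar):
    """Smallest integer k >= 1 such that p <= 1 - (1 - q_bar)**k.

    For 0 < p < 1 this is also the largest k with p > 1 - (1 - q_bar)**(k - 1).
    """
    if not (0.0 < p < 1.0 and 0.0 < q_bar < 1.0):
        raise ValueError("need 0 < p < 1 and 0 < q_bar < 1")
    k = 1
    while p > 1.0 - (1.0 - q_bar) ** k:
        k += 1
    return k


def reputation_path(p, q_bar):
    """Probability of the biased type after each observed unbiased action.

    ASSUMPTION: the biased action is played with total probability q_bar
    in every period until the reputation drops to q_bar or below.
    """
    if not (0.0 < p < 1.0 and 0.0 < q_bar < 1.0):
        raise ValueError("need 0 < p < 1 and 0 < q_bar < 1")
    path = [p]
    while path[-1] > q_bar:
        path.append((path[-1] - q_bar) / (1.0 - q_bar))
    return path


if __name__ == "__main__":
    p, q_bar = 0.5, 0.2  # hypothetical prior and threshold
    print("k =", periods_needed(p, q_bar))
    print("reputation path:", reputation_path(p, q_bar))
```

With these hypothetical values the loop stops at $k = 4$, and the reputation path has four entries, one for each period needed before the remaining probability of the biased type falls below the threshold.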