A Common Value Experimentation with Multiarmed Bandits

We study common value experimentation with multiarmed bandits and give an application of this experimentation. We investigate the second derivatives of value functions at cutoffs when an agent switches actions among multiarmed bandits. When consumers have an identical but unknown preference, we obtain necessary and sufficient conditions under which they purchase products from only two of the multiple sellers. The Markov perfect equilibrium and the socially efficient allocation in K-armed markets are discussed.


Introduction
A question of financial interest arises when one must choose the best good among multiple products. In Robbins [1], this problem is described as a decision maker facing $K$ slot machines (called arms) who has to choose one of them at each instant. Gittins and Jones [2] and Michael et al. [3] calculate the value of pulling an arm, i.e., the Gittins index, in discrete time. By comparing this value with the Gittins indices of all other arms, Michael et al. [3] show that the optimal strategy for a $K$-armed problem is a $K$-dimensional discounted Markov decision chain and that the value of pulling each arm is independent of the cutoff. Karatzas [4] and Kaspi and Mandelbaum [5] transform the problem into a standard optimal stopping problem. Bolton and Harris [6] and Bergemann and Välimäki [7] show that when there are $K \geq 2$ sellers who sell different products and $N$ consumers whose preferences are identical (but unknown), the optimal strategy of the consumers is to buy from the same seller; i.e., the equilibria are symmetric. Cohen and Solan [8] study two-armed bandit problems in continuous time where the payoff processes are Lévy processes, and obtain the Hamilton-Jacobi-Bellman (HJB) equation for the problem. They conclude that the optimal strategy is a cutoff strategy when the arms have two types. For other optimal strategies and control approaches, the reader is referred to [9-11] and the references therein.
Jin and Xi [12] show that the second derivatives of value functions are equal at the cutoff under value matching and smooth pasting when there are two arms of different types, and conclude that the optimal strategy in the market is an interval strategy. Here, the value function is assumed to satisfy $v(x) \in C^2(\mathbb{R}^+)$ and the cutoff is $x^* \in \mathbb{R}^+$; value matching means $v(x^*-) = v(x^*+)$ and smooth pasting means $v'(x^*-) = v'(x^*+)$ (see [13, 14]), where $v(x-) := \lim_{y \to x^-} v(y)$ and $v(x+) := \lim_{y \to x^+} v(y)$. In [12], an application is given to the strategic pricing of two vendors in a competitive market. There are $N \geq 2$ consumers who share the same type, either $H$ or $L$. The two vendors produce two different kinds of goods, one for each type. Jin and Xi [12] describe the socially efficient allocation and the pricing strategies of the two vendors. Moreover, they use value matching, smooth pasting, and second derivatives of value functions to discuss the Markov perfect equilibrium and the socially efficient allocation.
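Value matching and smooth pasting constrain only the one-sided limits of $v$ and $v'$ at the cutoff; they do not by themselves force the second derivatives to agree, which is why equality of second derivatives is a separate statement. The minimal sketch below checks the three conditions for a hypothetical piecewise function with cutoff $x^* = 1$ (the function and the helper names are illustrative assumptions, not taken from [12]).

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Piece:
    """One branch of a piecewise value function with its first two derivatives."""
    v: Callable[[float], float]
    dv: Callable[[float], float]
    d2v: Callable[[float], float]


def check_cutoff(left: Piece, right: Piece, x_star: float, tol: float = 1e-12) -> Dict[str, bool]:
    """Compare the one-sided limits of v, v', v'' at the cutoff x_star."""
    return {
        "value_matching": abs(left.v(x_star) - right.v(x_star)) < tol,
        "smooth_pasting": abs(left.dv(x_star) - right.dv(x_star)) < tol,
        "equal_second_derivatives": abs(left.d2v(x_star) - right.d2v(x_star)) < tol,
    }


# Hypothetical example: v(x) = x^2 for x < 1 and v(x) = 2x - 1 for x >= 1.
left = Piece(v=lambda x: x * x, dv=lambda x: 2.0 * x, d2v=lambda x: 2.0)
right = Piece(v=lambda x: 2.0 * x - 1.0, dv=lambda x: 2.0, d2v=lambda x: 0.0)
result = check_cutoff(left, right, x_star=1.0)
print(result)
# {'value_matching': True, 'smooth_pasting': True, 'equal_second_derivatives': False}
```

Both branches meet with the same value and slope at $x^* = 1$, yet $v''(1-) = 2 \neq 0 = v''(1+)$.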
In this article, we investigate multiarmed bandits, whereas [12] studies two-armed bandits. We consider multiple sellers instead of two. In general, multiple sellers sell the same type of goods in a market, but the quality, utilities, and prices of the goods sold by each seller differ. Therefore, it is reasonable to increase the number of arms from two to $K \geq 2$. It is commonly believed that two of the products are best suited to type $H$ and type $L$, respectively. For example, the first seller offers the highest utility to type $H$ but the lowest utility to type $L$. Conversely, a second seller offers the highest utility to type $L$ but the lowest utility to type $H$, and the utilities offered by the other sellers to both types lie between those of these two sellers. Intuitively, one might expect a rational consumer to buy only from these two sellers, but this strategy is not always optimal; it requires certain conditions. We derive necessary and sufficient conditions for this strategy.
To obtain our results under certain assumptions on the multiarmed bandit model, we follow the methods of [12] and calculate the optimal cutoff points by solving the corresponding ordinary differential equations. We then obtain necessary and sufficient conditions for the strategy under which consumers purchase products from only two sellers among multiple sellers.
The rest of the paper is organized as follows. In Section 2, we introduce a multiarmed model under certain assumptions and present a result about the model. In Section 3, we give an application of the model. In Section 3.1, we discuss the optimal choice of consumers in market equilibrium. Based on this optimal choice, we derive HJB equations for the consumers' utility functions. Using solutions of the HJB equations, value matching, and smooth pasting, we obtain the cutoff point at the Markov perfect equilibrium and the necessary and sufficient conditions for the common experimentation. In Section 3.2, we obtain the cutoff point at the socially efficient allocation in the same way as in Section 3.1, and discuss the relationship between the Markov perfect equilibrium and the socially efficient allocation in $K$-armed markets.

The Model
Jin and Xi [12] consider one agent and a bandit with two arms. We study multiarmed bandits and consider the case where there is only one real-valued state $x(t) \in \Upsilon$, where $\Upsilon \subseteq \mathbb{R}$ is a connected set. The instantaneous flow payoff of arm $j$ at state $x$ is $f_j(x)$, $j = 1, 2, \ldots, K$. Let $r > 0$ be the discount rate. For each arm $j$, there is a probability space $(\Omega_j, \mathcal{F}_j, P_j)$ endowed with a filtration $\{\mathcal{F}^j_t, t \geq 0\}$, and $T_j(t)$ is the total measure of time up to time $t$ during which arm $j$ has been chosen. From [12], when there are $K$ arms in the market, the updating of $x$ by arm $j$ satisfies (1), where $W_j(t)$ is a standard Brownian motion, $W_j(t)$ is independent of $W_i(t)$ ($i \neq j$), and $N_j$ is a Poisson random measure independent of $W_j$. The state $x$ is updated by the arms $j = 1, 2, \ldots, K$, and $x(t) = \sum_{i=1}^{K} x_i(t)$. Let $\gamma_j(x(t-), x_i)$ be the change of the state when there is a Poisson jump. Equation (1) describes the state changing from $x_j$ to $x_i$ ($i \neq j$); by itself, it does not describe jumps into $x_j$ from other states. However, because state $x_j$ can jump to any other state and any other state can jump back to $x_j$, equation (1) taken over all $j$ covers all jump processes that describe changes of the state.
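To make the time-allocation bookkeeping concrete, the sketch below simulates a state driven by whichever arm a policy selects, using an Euler scheme. It assumes a generic jump-diffusion for each arm (drift $\mu_j(x)$, volatility $\sigma_j(x)$, jumps of size $\gamma_j(x)$ arriving at a constant finite rate $\lambda_j$); these coefficients, the policy, and the function name `simulate` are hypothetical illustrations rather than the exact dynamics in (1). It also tracks $T_j$, the time spent on arm $j$, so that $\sum_j T_j$ equals the elapsed time.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Coef = Callable[[float], float]


def simulate(x0: float,
             mus: Sequence[Coef], sigmas: Sequence[Coef], gammas: Sequence[Coef],
             lams: Sequence[float], policy: Callable[[float], int],
             horizon: float, n_steps: int, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Euler scheme for a state driven by the arm the policy selects (hypothetical model).

    Returns the state path and the total time T_j spent on each arm j.
    Requires lams[j] * horizon / n_steps < 1 for the jump approximation.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    x, path = x0, [x0]
    time_on_arm = [0.0] * len(mus)
    for _ in range(n_steps):
        j = policy(x)                       # arm chosen at the current state
        time_on_arm[j] += dt                # T_j grows only while arm j is pulled
        dW = rng.gauss(0.0, math.sqrt(dt))  # fresh Brownian increment for arm j
        # Bernoulli approximation of a Poisson arrival: P(jump in dt) ~ lam_j * dt.
        jump = gammas[j](x) if rng.random() < lams[j] * dt else 0.0
        x = x + mus[j](x) * dt + sigmas[j](x) * dW + jump
        path.append(x)
    return path, time_on_arm


# Two deterministic arms (no noise, no jumps) and a cutoff policy at x = 0.5.
path, T = simulate(
    x0=0.0,
    mus=[lambda x: 1.0, lambda x: -1.0],
    sigmas=[lambda x: 0.0, lambda x: 0.0],
    gammas=[lambda x: 0.0, lambda x: 0.0],
    lams=[0.0, 0.0],
    policy=lambda x: 0 if x < 0.5 else 1,
    horizon=1.0, n_steps=100,
)
```

With the cutoff policy, the state rises on arm 1 until it reaches the cutoff and then oscillates around it, switching arms at each step.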
Assumption 2 (see [12]). Assume that the first derivatives of $f_j(x)$, $\mu_j(x)$, $\sigma_j(x)$, and $\gamma_j(x, \cdot)$ with respect to $x$ are bounded. Namely, there exists $C > 0$ such that these derivatives are bounded by $C$. Using (2) and the dynamic programming principle (see [15] for details), we obtain, for any $h > 0$, a recursive relation for the value function $V$. Applying Itô's lemma to $e^{-rt} V(x_t)$ and using the properties of Poisson random measures (see [13]), we expand the terms of this relation. Substituting (6) into (4), applying the mean value theorem for integrals, and letting $h \to 0$, we obtain the HJB equation, where $\nu^m_j$ is the finite intensity measure of $N^m_j$. From [13], there exists a unique solution to (1).
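For orientation, a standard HJB equation for optimally switching among $K$ jump-diffusion arms, assuming no switching costs and finite intensity measures, has the form below. It is a generic sketch written in the coefficients $f_j$, $\mu_j$, $\sigma_j$, $\gamma_j$, $\nu_j$ used for the arms here, and may differ in detail from the HJB equation derived above.

```latex
rV(x) = \max_{j \in \{1,\dots,K\}} \Big\{ f_j(x) + \mu_j(x)\,V'(x) + \tfrac{1}{2}\sigma_j^2(x)\,V''(x)
        + \int \big[ V\big(x + \gamma_j(x,z)\big) - V(x) \big]\,\nu_j(dz) \Big\}
```

A cutoff is a state at which the maximizing index $j$ changes; value matching and smooth pasting then concern the one-sided limits of $V$ and $V'$ at that state.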
Theorem 3 gives a necessary condition under which the second derivatives of the value functions are equal at every cutoff when there are $K$ arms in the market.
When $K = 2$, i.e., when there are two arms in the market, the agent has two states, $x_1$ and $x_2$. In this case, we only need to consider the agent jumping between the two states, i.e., the arm index at time $t$ takes values in $\{1, 2\}$. Thus, there is a single cutoff, which is the case discussed in [12].

Application
The application of Theorem 3 is similar to that in [12]. The difference is that there are $K \geq 2$ sellers offering different products in the market. We index these sellers by $j = 1, 2, \ldots, K$. We assume that all consumers have the same type $\theta$, either $H$ or $L$. Let $u_{jH}$ and $u_{jL}$ be the utilities of consumers of type $H$ and type $L$, respectively, who buy good $j$. We assume $u_{1H} > u_{2H} > \cdots > u_{KH}$ and $u_{1L} < u_{2L} < \cdots < u_{KL}$; i.e., the more likely the consumers are to be of type $H$, the more they tend to choose sellers with smaller indices. Let $p \in [0, 1]$ denote the common belief that the type is high; then the expected utility from seller $j$ is $u_j(p) := p\,u_{jH} + (1 - p)\,u_{jL}$. Denoting $a_j := u_{jH} - u_{jL}$ and $b_j := u_{jL}$, the utility can be written as
$$u_j(p) = a_j p + b_j,$$
where $a_j + b_j > a_{j+1} + b_{j+1}$ and $b_j < b_{j+1}$, so that $a_j > a_{j+1}$. At any time, all market participants observe all previous outcomes. Because of uncertain external factors, the observed flow utility $u_{ji}(p)$ is only a noisy signal of the true value (see [12] for details).
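The linear parametrization $u_j(p) = a_j p + b_j$ with $a_j = u_{jH} - u_{jL}$ and $b_j = u_{jL}$ can be checked numerically. The sketch below, using hypothetical utilities for $K = 3$ sellers, computes $(a_j, b_j)$, verifies the orderings ($u_{jH}$ decreasing and $u_{jL}$ increasing in $j$), and finds the seller with the highest expected utility at a given belief. Prices and learning are ignored here, and the helper names are illustrative.

```python
from typing import List, Sequence, Tuple


def linear_coefficients(u_H: Sequence[float], u_L: Sequence[float]) -> Tuple[List[float], List[float]]:
    """a_j = u_jH - u_jL and b_j = u_jL, so that u_j(p) = a_j * p + b_j."""
    a = [h - l for h, l in zip(u_H, u_L)]
    b = list(u_L)
    return a, b


def satisfies_orderings(a: Sequence[float], b: Sequence[float]) -> bool:
    """Check a_j + b_j > a_{j+1} + b_{j+1} (u_jH decreasing) and b_j < b_{j+1} (u_jL increasing)."""
    return all(a[j] + b[j] > a[j + 1] + b[j + 1] and b[j] < b[j + 1]
               for j in range(len(a) - 1))


def expected_utilities(a: Sequence[float], b: Sequence[float], p: float) -> List[float]:
    """u_j(p) = p * u_jH + (1 - p) * u_jL = a_j * p + b_j for each seller j."""
    return [aj * p + bj for aj, bj in zip(a, b)]


def best_seller(a: Sequence[float], b: Sequence[float], p: float) -> int:
    """1-based index of the seller with the highest expected utility at belief p (prices ignored)."""
    u = expected_utilities(a, b, p)
    return max(range(len(u)), key=u.__getitem__) + 1


u_H = [10.0, 8.0, 5.0]   # hypothetical utilities for type H (decreasing in j)
u_L = [2.0, 4.0, 7.0]    # hypothetical utilities for type L (increasing in j)
a, b = linear_coefficients(u_H, u_L)
print(a, b)                                            # [8.0, 4.0, -2.0] [2.0, 4.0, 7.0]
print(satisfies_orderings(a, b))                       # True
print(best_seller(a, b, 0.8), best_seller(a, b, 0.2))  # 1 3
```

In this hypothetical example all three lines meet at $p = 0.5$, so seller 2 is never strictly best; in a static, price-free setting this loosely mirrors the case of coinciding cutoffs.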
In this signal, $W_j(t)$ is independent of $W_i(t)$ ($i \neq j$). The belief $p$ evolves over time; following [16], we describe it as a learning process and denote it by $p_t$. Without loss of generality, we assume that $n_j$ consumers choose seller $j$, with $\sum_{j=1}^{K} n_j = N \geq 2$. From the results in [6, 7], $p_t$ satisfies the corresponding belief-updating equation. In the following subsections, the Markov perfect equilibrium and the socially efficient allocation in the $K$-armed market are discussed.
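As a discrete-time illustration of how the common belief $p_t$ learns from noisy utility signals, the sketch below performs Bayes' rule in log-odds form. It assumes each observation from a seller is $y = u_{\theta}\,\Delta t + \sigma\sqrt{\Delta t}\,\varepsilon$ with $\varepsilon \sim N(0, 1)$ independent across observations; this discretization, the single noise level $\sigma$, and the function names are assumptions for illustration, not the continuous-time equation from [6, 7].

```python
import math
from typing import Sequence


def _logistic(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^{-z})."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def update_belief(p: float, signals: Sequence[float], u_H: float, u_L: float,
                  sigma: float, dt: float) -> float:
    """Posterior P(type = H) after i.i.d. Gaussian signals from one seller.

    Assumed signal model (hypothetical discretization):
        y = u_theta * dt + sigma * sqrt(dt) * eps,  eps ~ N(0, 1).
    Requires 0 < p < 1.
    """
    var = sigma ** 2 * dt
    log_odds = math.log(p / (1.0 - p))
    for y in signals:
        # Log-likelihood ratio of type H vs type L; the Gaussian
        # normalizing constants cancel because both share the same variance.
        log_odds += ((y - u_L * dt) ** 2 - (y - u_H * dt) ** 2) / (2.0 * var)
    return _logistic(log_odds)
```

With $n_j$ consumers buying from seller $j$, one would pass all $n_j$ signals from that seller; signals from different sellers can be folded in one after another because the noise terms are independent, so their log-likelihood ratios add.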

Markov Perfect Equilibrium.
Let $P_j$ denote the price of seller $j$'s good. The price depends on the belief $p$ at each instant $t$, so $P_j$ is a mapping $[0, 1] \to \mathbb{R}$, $j = 1, 2, \ldots, K$. We denote by $s_i$ the choice of the $i$th consumer, which depends not only on the common belief $p$ but also on the prices of the sellers' goods. Given the observed choices of the previous $i - 1$ consumers, the $i$th consumer maximizes utility. Let $V(p)$ denote the maximum utility of this consumer, and let $n'_j$ denote the number of other consumers choosing seller $j$; $V(p)$ then satisfies an HJB equation. There exists $d \in \{1, 2, \ldots, K\}$ such that $n'_d = n_d - 1$ and $n'_i = n_i$ for $i \neq d$. Because of price competition, the consumer obtains equal utility from the competing sellers, and the resulting indifference conditions hold for all $i, j = 1, 2, \ldots, K$ with $i \neq j$. We now turn to the sellers' pricing. Denote by $w_j(p)$ the $j$th seller's utility; if $n_j$ consumers buy good $j$ at price $P_j$, $w_j(p)$ satisfies the corresponding value equation. In equilibrium, $n_j = 0$ or $n_j = N$, because all consumers choose the same seller (see [6, 7]). When $n_j = 0$, we write the utility of seller $j$ under the assumption that all consumers choose another seller $d$, i.e., $d \in \{1, 2, \ldots, K\} \setminus \{j\}$ and $n_d = N$.
When exactly one consumer chooses seller $j$, i.e., $n_j = 1$, we obtain the corresponding utility of seller $j$. As a rational market participant, a seller with no customers adjusts its price so that its payoff equals the payoff it would obtain if exactly one consumer chose it. This gives the price of seller $j$'s good, and from (20) we then obtain the price of seller $j$. There are $K - 1$ cutoffs $c_j$, $j = 1, 2, \ldots, K - 1$, where $0 < c_1 \leq c_2 \leq \cdots \leq c_{K-1} < 1$. When the common belief $p \in (c_{i-1}, c_i)$, consumers choose seller $i$, $i = 1, 2, \ldots, K$ (with $c_0 := 0$ and $c_K := 1$). If $p = c_i$, consumers are indifferent between sellers $i$ and $i + 1$ by value matching. When $p \neq c_i$, we have the following result.

Theorem 4. All consumers choose only the first seller or the last seller in the market if and only if the cutoffs $c_j$, $j = 1, 2, \ldots, K - 1$, are all equal to $\frac{\kappa_0 - (b_1 - b_K)}{(a_1 - a_K) - \kappa_0}$, where $\kappa_0$ is a constant determined by the model parameters.

Proof. First, we prove sufficiency. When all cutoffs are equal to $\frac{\kappa_0 - (b_1 - b_K)}{(a_1 - a_K) - \kappa_0}$, the intervals $(c_{i-1}, c_i)$ for $i = 2, \ldots, K - 1$ are empty, so all consumers choose only the first seller (for $p < c_1$) or the last seller (for $p > c_{K-1}$).
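The choice rule "$p \in (c_{i-1}, c_i)$ implies seller $i$" and the sufficiency argument can be checked mechanically. The sketch below, with hypothetical cutoffs for $K = 4$ sellers, maps a belief to the chosen seller and collects the sellers used over a grid of interior beliefs. Ties at a cutoff, where the consumer is indifferent, are broken toward the lower-indexed seller; that tie-breaking and the helper names are assumptions of the sketch.

```python
from bisect import bisect_left
from typing import Sequence, Set


def chosen_seller(p: float, cutoffs: Sequence[float]) -> int:
    """Seller index (1-based) chosen at belief p.

    Implements p in (c_{i-1}, c_i) -> seller i, with c_0 = 0 and c_K = 1,
    for sorted cutoffs [c_1, ..., c_{K-1}]. At a cutoff the consumer is
    indifferent; this sketch breaks the tie toward the lower-indexed seller.
    """
    # bisect_left counts how many cutoffs lie strictly below p.
    return bisect_left(cutoffs, p) + 1


def sellers_used(cutoffs: Sequence[float], grid_size: int = 1001) -> Set[int]:
    """Sellers chosen for beliefs on the interior grid k / grid_size, k = 1, ..., grid_size - 1."""
    return {chosen_seller(k / grid_size, cutoffs) for k in range(1, grid_size)}


# K = 4 sellers, hypothetical cutoffs.
print(sellers_used([0.2, 0.5, 0.8]))   # {1, 2, 3, 4}: distinct cutoffs, every seller is used
print(sellers_used([0.5, 0.5, 0.5]))   # {1, 4}: equal cutoffs, only sellers 1 and K = 4
```

When all cutoffs coincide, every middle interval $(c_{i-1}, c_i)$ is empty, so only the first and last sellers can ever be chosen.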
Theorem 4 gives necessary and sufficient conditions for the consumers' choice. When consumers choose only the first seller and the last seller, the multiarmed bandit problem reduces to a two-armed bandit problem. In other words, the other sellers gradually disappear from the market because they make no sales, and the multiarmed bandit problem becomes the two-armed bandit problem discussed in [12].

Socially Efficient Allocation.
We consider the optimal choice of a planner facing the multiarmed bandit problem. Let $W(p)$ denote the total social surplus function, which satisfies the planner's HJB equation. Assume that $\Pi_j(p)$ is the total social surplus in a neighborhood of $p$ if the planner chooses seller $j$. From (41), we obtain the corresponding equation for $\Pi_j(p)$, where $j = 1, 2, \ldots, K$.
There are $3K - 3$ unknown parameters and $3K - 3$ equations in system (45), and the determinant of its coefficient matrix is nonzero for all $K > 1$, i.e., the coefficient matrix of system (45) is nonsingular. Thus, system (45) has a unique solution. The results in [12] give the necessary and sufficient condition under which the Markov perfect equilibrium with cautious strategies is socially efficient with two-armed bandits.
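The step "nonsingular coefficient matrix, hence a unique solution" can be illustrated with exact Gauss-Jordan elimination. The sketch below uses rational arithmetic and reports singularity when no pivot exists. The entries of system (45) are model-specific, so the $3 \times 3$ matrix below (the size $3K - 3$ for $K = 2$) is a hypothetical example, and `solve_exact` is an illustrative helper, not a routine from the source.

```python
from fractions import Fraction
from typing import List, Optional, Sequence


def solve_exact(A: Sequence[Sequence[float]], rhs: Sequence[float]) -> Optional[List[Fraction]]:
    """Solve A x = rhs exactly by Gauss-Jordan elimination with Fractions.

    Returns None if A is singular, i.e. the system has no unique solution.
    """
    n = len(A)
    # Augmented matrix [A | rhs] with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None  # column is a combination of earlier ones: singular
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]   # hypothetical nonsingular coefficient matrix
rhs = [1, 2, 3]
print(solve_exact(A, rhs))               # [Fraction(1, 3), Fraction(1, 3), Fraction(2, 3)]
print(solve_exact([[1, 2], [2, 4]], [1, 2]))  # None (singular)
```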
The proof of Corollary 5 is similar to that of Theorem 2 in [12], so we omit it.
Corollary 5 gives the necessary and sufficient conditions under which the Markov perfect equilibrium with cautious strategies is socially efficient with multiarmed bandits. Jin and Xi [12] present these conditions for the two-armed market. Thus, Theorem 4 extends part of the results in [12].

Conclusion
We study common value experimentation with multiarmed bandits and present an application. This extends the two-armed bandits in [12] to multiarmed bandits. We derive the HJB equation for multiarmed bandits. In the application, we obtain necessary and sufficient conditions under which consumers choose from only two sellers. These conditions guarantee that the Markov perfect equilibrium with cautious strategies is socially efficient. In future work, we will solve for all the cutoffs in system (45) when they differ and give general solutions for these cutoffs.