Average optimality in a Poissonian bandit with switching arms

Mathematical Methods of Operations Research

Abstract

A symmetric Poissonian two-armed bandit becomes, in terms of a posteriori probabilities, a piecewise deterministic Markov decision process. For the case of switching arms, only one of which creates rewards, we solve the average optimality equation explicitly and prove that a myopic policy is average optimal.



Additional information

Supported by NSF grant DMS-9404177

About this article

Cite this article

Donchev, D.S., Yushkevich, A.A. Average optimality in a Poissonian bandit with switching arms. Mathematical Methods of Operations Research 45, 265–280 (1997). https://doi.org/10.1007/BF01193865

  • DOI: https://doi.org/10.1007/BF01193865
