Multi-Armed Bandit Problem Revisited

  • Contributed Papers
Journal of Optimization Theory and Applications

Abstract

In this paper, we revisit aspects of the multi-armed bandit problem treated in earlier work (Ref. 1). We derive an alternative proof of the optimality of the Gittins index rule under the discounted reward criterion; the proof makes no explicit use of the interchange argument. The ideas of the proof are then extended to establish the asymptotic optimality of the index rule under the average reward criterion. Problems involving superprocesses and arm-acquiring bandits are also reexamined, and the properties of an optimal policy for an arm-acquiring bandit are discussed.
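As an illustration of what an index rule looks like under the discounted criterion, the following minimal sketch computes a Gittins-style index for the simplest possible case: an arm whose rewards form a known, deterministic, finite sequence. In that case the index is the largest ratio of discounted reward to discounted time over stopping horizons 1, ..., n. The function names `gittins_index_deterministic` and `pick_arm`, and the restriction to deterministic arms and horizons within the given sequence, are assumptions made for this example; they are not taken from the paper, which treats general stochastic arms.

```python
def gittins_index_deterministic(rewards, beta):
    """Index of a deterministic arm with reward sequence `rewards`.

    Returns max over tau = 1..len(rewards) of
        sum_{t < tau} beta**t * rewards[t] / sum_{t < tau} beta**t.
    Illustrative only: real bandit arms are stochastic processes.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")

    best = float("-inf")
    discounted_reward = 0.0  # numerator accumulated up to the current horizon
    discounted_time = 0.0    # denominator accumulated up to the current horizon
    discount = 1.0           # beta**t for the current step t
    for r in rewards:
        discounted_reward += discount * r
        discounted_time += discount
        best = max(best, discounted_reward / discounted_time)
        discount *= beta
    return best


def pick_arm(arms, beta):
    """Index rule: return the position of the arm with the largest index."""
    indices = [gittins_index_deterministic(a, beta) for a in arms]
    return max(range(len(arms)), key=lambda i: indices[i])
```

With `beta = 0.5`, an arm paying `[0.0, 3.0]` has index 1.0 (the zero first reward is outweighed by the later payoff), so the rule prefers it to an arm paying `[0.9, 0.0]`, whose index is 0.9.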

References

  1. Varaiya, P., Walrand, J., and Buyukkoc, C., Extensions of the Multi-Armed Bandit Problem: The Discounted Case, IEEE Transactions on Automatic Control, Vol. 30, 1985.

  2. Gittins, J. C., Multi-Armed Bandit Allocation Indices, Wiley, New York, New York, 1989.

  3. Gittins, J. C., and Jones, D., A Dynamic Allocation Index for the Sequential Design of Experiments, Progress in Statistics, 1972 European Meeting on Statistics, Edited by J. Gani, K. Sarkadi, and I. Vince, North-Holland, Amsterdam, Holland, Vol. 1, pp. 241–246, 1974.

  4. Whittle, P., Multi-Armed Bandits and the Gittins Index, Journal of the Royal Statistical Society, Vol. 42, pp. 143–149, 1980.

  5. Glazebrook, K. D., Optimal Strategies for Families of Alternative Bandit Processes, IEEE Transactions on Automatic Control, Vol. 28, pp. 858–861, 1983.

  6. Mandelbaum, A., Discrete Multi-Armed Bandits and Multiparameter Processes, Probability Theory and Related Fields, Vol. 71, pp. 129–147, 1986.

  7. Mandelbaum, A., Continuous Multi-Armed Bandits and Multiparameter Processes, Annals of Probability, Vol. 15, pp. 1527–1556, 1987.

  8. Weber, R., On the Gittins Index for Multi-Armed Bandits, Annals of Applied Probability, Vol. 2, pp. 1024–1033, 1992.

  9. Weiss, G., Turnpike Optimality of Smith's Rule in Parallel Machines Stochastic Scheduling, Mathematics of Operations Research, Vol. 17, pp. 255–269, 1992.

  10. Whittle, P., Arm-Acquiring Bandits, Annals of Probability, Vol. 9, pp. 284–292, 1981.

  11. Ishikida, T., Informational Aspects of Decentralized Resource Allocation, Technical Report UCB/ERL/IGCT-M92/60, Interdisciplinary Group on Coordination Theory, University of California at Berkeley, Berkeley, California, 1992.

Additional information

Communicated by E. Polak

This research was supported by NSF Grant IRI-91-20074.

About this article

Cite this article

Ishikida, T., Varaiya, P. Multi-Armed bandit problem revisited. J Optim Theory Appl 83, 113–154 (1994). https://doi.org/10.1007/BF02191765

  • DOI: https://doi.org/10.1007/BF02191765
