Multi-Armed bandit problem revisited

Ishikida, T.; Varaiya, P.

doi:10.1007/BF02191765

Contributed Papers
Published: October 1994

Volume 83, pages 113–154 (1994)
Journal of Optimization Theory and Applications

T. Ishikida¹ &
P. Varaiya²

Abstract

In this paper, we revisit aspects of the multi-armed bandit problem studied in earlier work (Ref. 1). An alternative proof of the optimality of the Gittins index rule under the discounted reward criterion is derived; the proof makes no explicit use of the interchange argument. The ideas of the proof are then extended to derive the asymptotic optimality of the index rule under the average reward criterion. Problems involving superprocesses and arm-acquiring bandits are also reexamined, and the properties of an optimal policy for an arm-acquiring bandit are discussed.
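The rule whose optimality the abstract refers to is simple to state. At each step, play the arm whose current state x has the largest Gittins index ν(x) = sup over stopping times τ ≥ 1 of E[Σ_{t<τ} β^t r(X_t) | X_0 = x] / E[Σ_{t<τ} β^t | X_0 = x]. The paper is about proofs of optimality, not computation. The following is therefore only a minimal illustrative sketch. It assumes each arm is a finite-state Markov chain with transition matrix P, reward vector r and discount factor β in (0, 1). It computes the discounted indices with a largest-remaining-index recursion, which ranks states one at a time from the highest index down, and then applies the index rule. The function names `gittins_indices` and `index_rule` are hypothetical. The sketch does not cover the average reward criterion, superprocesses or arm-acquiring bandits.

```python
import numpy as np


def gittins_indices(P, r, beta):
    """Discounted Gittins index of every state of one finite-state Markov arm.

    Illustrative sketch (assumed setting): P is an n x n transition matrix,
    r a length-n reward vector, beta a discount factor in (0, 1). States are
    ranked one at a time, from the largest index down.
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    n = r.shape[0]
    if P.shape != (n, n):
        raise ValueError("P must be an n x n matrix")
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")

    index = np.empty(n)
    ranked = np.zeros(n, dtype=bool)  # states whose index is already known

    # The largest index belongs to the state with the largest one-step reward.
    top = int(np.argmax(r))
    index[top] = r[top]
    ranked[top] = True

    for _ in range(n - 1):
        # Keep only transitions into ranked states: from any start state the
        # arm is played until it first enters an unranked state (after time 0).
        Q = P * ranked[np.newaxis, :]
        A = np.eye(n) - beta * Q
        disc_reward = np.linalg.solve(A, r)         # E[sum_{t<tau} beta^t r(X_t)]
        disc_time = np.linalg.solve(A, np.ones(n))  # E[sum_{t<tau} beta^t]
        ratio = np.where(ranked, -np.inf, disc_reward / disc_time)
        nxt = int(np.argmax(ratio))  # next state in decreasing index order
        index[nxt] = ratio[nxt]
        ranked[nxt] = True
    return index


def index_rule(states, indices):
    """Return the arm whose current state has the largest Gittins index."""
    return max(range(len(states)), key=lambda k: indices[k][states[k]])


if __name__ == "__main__":
    # Arm A: state 0 pays 0 and moves to state 1, which pays 1 forever.
    arm_a = gittins_indices([[0.0, 1.0], [0.0, 1.0]], [0.0, 1.0], beta=0.9)
    # Arm B: a "safe" arm that pays 0.8 forever, so its index is 0.8.
    arm_b = gittins_indices([[1.0]], [0.8], beta=0.9)
    print(arm_a)  # state 0 has index beta = 0.9 (up to rounding), state 1 has 1.0
    print(index_rule([0, 0], [arm_a, arm_b]))  # 0: play arm A first
```

The example shows that the rule is not greedy. Arm A pays nothing now, yet playing it first and then forever earns β/(1 − β) = 9 in discounted reward, against 0.8/(1 − β) = 8 for the safe arm, and its index (0.9) correctly ranks it above the safe arm (0.8).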

References

1. Varaiya, P., Walrand, J., and Buyukkoc, C., Extensions of the Multi-Armed Bandit Problem: The Discounted Case, IEEE Transactions on Automatic Control, Vol. 30, 1985.
2. Gittins, J. C., Multi-Armed Bandit Allocation Indices, Wiley, New York, New York, 1989.
3. Gittins, J. C., and Jones, D., A Dynamic Allocation Index for the Sequential Design of Experiments, Progress in Statistics, 1972 European Meeting on Statistics, Edited by J. Gani, K. Sarkadi, and I. Vince, North-Holland, Amsterdam, Holland, Vol. 1, pp. 241–246, 1974.
4. Whittle, P., Multi-Armed Bandits and the Gittins Index, Journal of the Royal Statistical Society, Vol. 42, pp. 143–149, 1980.
5. Glazebrook, K. D., Optimal Strategies for Families of Alternative Bandit Processes, IEEE Transactions on Automatic Control, Vol. 28, pp. 858–861, 1983.
6. Mandelbaum, A., Discrete Multi-Armed Bandits and Multiparameter Processes, Probability Theory and Related Fields, Vol. 71, pp. 129–147, 1986.
7. Mandelbaum, A., Continuous Multi-Armed Bandits and Multiparameter Processes, Annals of Probability, Vol. 15, pp. 1527–1556, 1987.
8. Weber, R., On the Gittins Index for Multi-Armed Bandits, Annals of Applied Probability, Vol. 2, pp. 1024–1033, 1992.
9. Weiss, G., Turnpike Optimality of Smith's Rule in Parallel Machines Stochastic Scheduling, Mathematics of Operations Research, Vol. 17, pp. 255–269, 1992.
10. Whittle, P., Arm-Acquiring Bandits, Annals of Probability, Vol. 9, pp. 284–292, 1981.
11. Ishikida, T., Informational Aspects of Decentralized Resource Allocation, Technical Report UCB/ERL/IGCT-M92/60, Interdisciplinary Group on Coordination Theory, University of California at Berkeley, Berkeley, California, 1992.

Author information

Authors and Affiliations

¹ Department of Industrial Engineering and Operations Research, University of California, Berkeley, California
T. Ishikida (Post-Doctoral Fellow)
² Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California
P. Varaiya (Professor)

Additional information

Communicated by E. Polak

This research was supported by NSF Grant IRI-91-20074.

About this article

Cite this article

Ishikida, T., Varaiya, P. Multi-Armed bandit problem revisited. J Optim Theory Appl 83, 113–154 (1994). https://doi.org/10.1007/BF02191765

Issue Date: October 1994
DOI: https://doi.org/10.1007/BF02191765
