Abstract
In this paper, we revisit aspects of the multi-armed bandit problem treated in the earlier work (Ref. 1). An alternative proof of the optimality of the Gittins index rule is derived under the discounted reward criterion. The proof does not involve an explicit use of the interchange argument. The ideas of the proof are extended to derive the asymptotic optimality of the index rule under the average reward criterion. Problems involving superprocesses and arm-acquiring bandits are also reexamined. The properties of an optimal policy for an arm-acquiring bandit are discussed.
References
Varaiya, P., Walrand, J., and Buyukkoc, C., Extensions of the Multi-Armed Bandit Problem: The Discounted Case, IEEE Transactions on Automatic Control, Vol. 30, 1985.
Gittins, J. C., Multi-Armed Bandit Allocation Indices, Wiley, New York, New York, 1989.
Gittins, J. C., and Jones, D., A Dynamic Allocation Index for the Sequential Design of Experiments, Progress in Statistics, 1972 European Meeting on Statistics, Edited by J. Gani, K. Sarkadi, and I. Vince, North-Holland, Amsterdam, Holland, Vol. 1, pp. 241–246, 1974.
Whittle, P., Multi-Armed Bandits and the Gittins Index, Journal of the Royal Statistical Society, Vol. 42, pp. 143–149, 1980.
Glazebrook, K. D., Optimal Strategies for Families of Alternative Bandit Processes, IEEE Transactions on Automatic Control, Vol. 28, pp. 858–861, 1983.
Mandelbaum, A., Discrete Multi-Armed Bandits and Multiparameter Processes, Probability Theory and Related Fields, Vol. 71, pp. 129–147, 1986.
Mandelbaum, A., Continuous Multi-Armed Bandits and Multiparameter Processes, Annals of Probability, Vol. 15, pp. 1527–1556, 1987.
Weber, R., On the Gittins Index for Multi-Armed Bandits, Annals of Applied Probability, Vol. 2, pp. 1024–1033, 1992.
Weiss, G., Turnpike Optimality of Smith's Rule in Parallel Machines Stochastic Scheduling, Mathematics of Operations Research, Vol. 17, pp. 255–269, 1992.
Whittle, P., Arm-Acquiring Bandits, Annals of Probability, Vol. 9, pp. 284–292, 1981.
Ishikida, T., Informational Aspects of Decentralized Resource Allocation, Technical Report UCB/ERL/IGCT-M92/60, Interdisciplinary Group on Coordination Theory, University of California at Berkeley, Berkeley, California, 1992.
Additional information
Communicated by E. Polak
This research was supported by NSF Grant IRI-91-20074.
Cite this article
Ishikida, T., Varaiya, P. Multi-Armed bandit problem revisited. J Optim Theory Appl 83, 113–154 (1994). https://doi.org/10.1007/BF02191765
DOI: https://doi.org/10.1007/BF02191765