Abstract
We consider an application of the two-armed bandit problem to processing a large number N of data items when two alternative processing methods are available. We propose a strategy that compares the methods during the initial stages, of which there are at most r − 1, and at the final stage applies only the better method identified by that comparison. We find asymptotically optimal parameters of the strategy and observe that the minimax risk is of the order of $N^{\alpha}$, where $\alpha = 2^{r-1}/(2^{r} - 1)$. Under parallel processing, the total operation time is determined by the number r of stages rather than by the number N of data items.
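To give a feel for how the risk exponent depends on the number of stages, the following minimal sketch tabulates α for small r. It assumes the exponent reads α = 2^(r−1)/(2^r − 1); the function name `risk_exponent` is hypothetical and is not taken from the paper.

```python
from fractions import Fraction


def risk_exponent(r: int) -> Fraction:
    """Return alpha = 2^(r-1) / (2^r - 1) for a strategy with r stages.

    Assumption: this is the exponent in the minimax risk order N^alpha
    stated in the abstract; r counts all stages, including the final one.
    """
    if r < 1:
        raise ValueError("r must be a positive integer")
    return Fraction(2 ** (r - 1), 2 ** r - 1)


if __name__ == "__main__":
    # Print the exact fraction and a decimal approximation for r = 1..6.
    for r in range(1, 7):
        alpha = risk_exponent(r)
        print(f"r = {r}: alpha = {alpha} (approx. {float(alpha):.4f})")
```

Under this reading, α equals 1 for a single stage and decreases toward 1/2 as r grows, so a handful of stages already brings the exponent close to its limiting value.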
Additional information
Original Russian Text © A.V. Kolnogorov, 2012, published in Problemy Peredachi Informatsii, 2012, Vol. 48, No. 1, pp. 83–95.
About this article
Cite this article
Kolnogorov, A.V. Two-armed bandit problem for parallel data processing systems. Probl Inf Transm 48, 72–84 (2012). https://doi.org/10.1134/S0032946012010085
DOI: https://doi.org/10.1134/S0032946012010085