Two-armed bandit problem for parallel data processing systems

  • Large Systems
Problems of Information Transmission

Abstract

We consider an application of the two-armed bandit problem to processing a large number N of data items when two alternative processing methods are available. We propose a strategy that compares the methods during the initial stages, of which there are at most r − 1, and at the final stage applies only the method found to be better in the comparison. We find asymptotically optimal parameters of the strategy and observe that the minimax risk is of order N^α, where α = 2^(r−1)/(2^r − 1). Under parallel processing, the total operation time is determined by the number r of stages rather than by the number N of data items.
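As an illustration of the abstract, the following Python sketch computes the exponent α for a given number of stages r and simulates a simple compare-then-commit strategy. Only the formula for α is taken from the abstract. The Bernoulli success model, the batch sizes, the early-stopping rule with its `gap_factor` parameter, and the function names are hypothetical choices made for this example; they are not the asymptotically optimal parameters derived in the article.

```python
import math
import random
from fractions import Fraction


def risk_exponent(r: int) -> Fraction:
    """Return alpha = 2^(r-1) / (2^r - 1) for a strategy with r stages."""
    if r < 1:
        raise ValueError("r must be a positive integer")
    return Fraction(2 ** (r - 1), 2 ** r - 1)


def staged_strategy(n_items, batch_sizes, success_probs, rng, gap_factor=2.0):
    """Compare two methods in batches, then commit to the apparent winner.

    batch_sizes[i] is the number of items given to EACH method in comparison
    stage i, so the strategy has at most len(batch_sizes) + 1 stages.
    Outcomes are modeled as Bernoulli successes (an assumption of this sketch).
    Returns (index of the chosen method, total number of successes).
    """
    if 2 * sum(batch_sizes) > n_items:
        raise ValueError("comparison stages use more items than are available")
    wins = [0, 0]
    per_method = 0  # items processed so far by each method
    for m in batch_sizes:
        # Within a stage, all 2*m items could be processed in parallel.
        for method in (0, 1):
            wins[method] += sum(
                rng.random() < success_probs[method] for _ in range(m)
            )
        per_method += m
        # Hypothetical early-stopping rule: skip the remaining comparison
        # stages once the gap is large relative to sqrt(sample size).
        if abs(wins[0] - wins[1]) > gap_factor * math.sqrt(per_method):
            break
    best = 0 if wins[0] >= wins[1] else 1
    remaining = n_items - 2 * per_method
    # Final stage: every remaining item goes to the apparently better method.
    final = sum(rng.random() < success_probs[best] for _ in range(remaining))
    return best, wins[0] + wins[1] + final


if __name__ == "__main__":
    for r in range(1, 6):
        alpha = risk_exponent(r)
        print(f"r = {r}: alpha = {alpha} = {float(alpha):.4f}")
    print(staged_strategy(10_000, [100, 1_000], (0.6, 0.5), random.Random(0)))
```

For r = 1 the exponent is 1, for r = 2 it is 2/3, and as r grows it decreases toward 1/2, so each additional comparison stage lowers the order of the minimax risk while the number of sequential steps stays equal to r.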

Author information

Corresponding author

Correspondence to A. V. Kolnogorov.

Additional information

Original Russian Text © A.V. Kolnogorov, 2012, published in Problemy Peredachi Informatsii, 2012, Vol. 48, No. 1, pp. 83–95.

About this article

Cite this article

Kolnogorov, A.V. Two-armed bandit problem for parallel data processing systems. Probl Inf Transm 48, 72–84 (2012). https://doi.org/10.1134/S0032946012010085

  • DOI: https://doi.org/10.1134/S0032946012010085
