Solving two‐armed Bernoulli bandit problems using a Bayesian learning automaton

Ole‐Christoffer Granmo (Department of ICT, University of Agder, Aust‐Agder, Norway)

International Journal of Intelligent Computing and Cybernetics

ISSN: 1756-378X

Article publication date: 8 June 2010

Abstract

Purpose

The two‐armed Bernoulli bandit (TABB) problem is a classical optimization problem in which an agent sequentially pulls one of two arms attached to a gambling machine, with each pull resulting in either a reward or a penalty. The reward probability of each arm is unknown, so the agent must balance exploiting its existing knowledge of the arms against obtaining new information. The purpose of this paper is to report research into a completely new family of solution schemes for the TABB problem: the Bayesian learning automaton (BLA) family.

Design/methodology/approach

Although computationally intractable in many cases, Bayesian methods provide a standard for optimal decision making. The BLA avoids this intractability by not performing the Bayesian computations explicitly. Instead, it is based simply on counting rewards and penalties, combined with random sampling from a pair of twin Beta distributions. This is intuitively appealing, since the Beta distribution is the Bayesian conjugate prior for a binomial parameter.
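
A minimal sketch of the counting-and-Beta-sampling scheme described above is given below; the simulated two-armed environment, function names, and parameter values are illustrative assumptions for exposition, not code from the paper.

```python
import random

def bla_pull(counts):
    """Sample an estimate for each arm from its Beta posterior and pick the larger one."""
    samples = [random.betavariate(a, b) for (a, b) in counts]
    return 0 if samples[0] >= samples[1] else 1

def run(reward_probs=(0.9, 0.6), horizon=1000, seed=1):
    # reward_probs and horizon are illustrative, not values used in the paper.
    random.seed(seed)
    counts = [[1, 1], [1, 1]]  # one (reward, penalty) counter pair per arm
    best_arm = max(range(2), key=lambda i: reward_probs[i])
    optimal_pulls = 0
    for _ in range(horizon):
        arm = bla_pull(counts)
        reward = random.random() < reward_probs[arm]  # Bernoulli feedback
        counts[arm][0 if reward else 1] += 1          # count reward or penalty
        optimal_pulls += (arm == best_arm)
    return optimal_pulls / horizon

if __name__ == "__main__":
    print(f"fraction of optimal-arm pulls: {run():.3f}")
```

The only per-step work is incrementing a counter and drawing two Beta samples, which is how the scheme sidesteps explicit Bayesian computation.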

Findings

The BLA is proven to be instantaneously self‐correcting, and it converges to pulling only the optimal arm with probability as close to unity as desired. Extensive experiments demonstrate that the BLA does not rely on external learning speed/accuracy control. It also outperforms established non‐Bayesian top performers for the TABB problem. Finally, the BLA provides superior performance in a distributed application, namely, the Goore game (GG).

Originality/value

The value of this paper is threefold. First, the reported BLA takes advantage of the Bayesian perspective for tackling TABBs, yet avoids the computational complexity inherent in Bayesian approaches. Second, the improved performance offered by the BLA opens up the possibility of increased accuracy in a number of TABB‐related applications, such as the GG. Third, the reported results form the basis for a new avenue of research, even for cases where the reward/penalty distribution is not Bernoulli. Indeed, the paper advocates a Bayesian methodology used in conjunction with the appropriate conjugate prior.

Citation

Granmo, O. (2010), "Solving two‐armed Bernoulli bandit problems using a Bayesian learning automaton", International Journal of Intelligent Computing and Cybernetics, Vol. 3 No. 2, pp. 207-234. https://doi.org/10.1108/17563781011049179

Publisher: Emerald Group Publishing Limited

Copyright © 2010, Emerald Group Publishing Limited