Two-alternative optimization of moderate batch data processing

We consider optimization of moderate batch data processing in the framework of the Bernoulli two-armed bandit problem with an indefinite control horizon. We assume that there are two alternative processing methods whose efficiencies are a priori unknown and may differ for various reasons, including legislative ones. One has to find the more efficient method and to ensure its predominant usage. The arriving batches of data with similar properties have moderate and possibly uncertain sizes. The problem is considered in the minimax setting. According to the main theorem of game theory, the minimax risk and the minimax strategy are sought as Bayesian ones corresponding to an approximately worst-case prior distribution concentrated on a finite set of parameters. Numerical experiments show that this approach provides good approximations of the minimax strategy and the minimax risk.


Introduction
We consider optimization of data processing in the framework of the Bernoulli two-armed bandit problem (see, e.g., [1]) when two alternative processing methods are available whose efficiencies are a priori unknown and may differ for various reasons, including legislative ones. The problem is also well known as a problem of adaptive learning and adaptive control [2]-[4]. A Bernoulli two-armed bandit is described by a random control process $\xi_n$, $n = 1, \dots, N$, where $\xi_n = 1$ corresponds to successfully processed data item number $n$ and $\xi_n = 0$ corresponds to unsuccessfully processed data item number $n$. The goal is to maximize (in some sense) the cumulative expected number of successfully processed data items. The total number of data items $N$ is also called the control horizon and is assumed to take moderate values. Note that the case of large $N$, which corresponds to big data processing, is considered in [5]-[8], where different strategies, including batch data processing, are investigated.
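As an illustration of the controlled process, the sketch below (in Python, with illustrative function names not taken from the paper) draws each $\xi_n$ from a Bernoulli law whose parameter depends on the processing method chosen by a strategy. The included "play the empirical winner" rule is only a demonstration strategy, not the strategy studied in the paper.

```python
import random

def simulate_bandit(strategy, p, N, seed=0):
    """Simulate a Bernoulli two-armed bandit over a horizon of N items.

    p = (p1, p2) are the success probabilities of the two processing
    methods (unknown to the strategy); strategy maps the history of
    (action, outcome) pairs to the next action, 1 or 2.
    Returns the total number of successfully processed items.
    """
    rng = random.Random(seed)
    history, successes = [], 0
    for _ in range(N):
        action = strategy(history)                     # choose a method
        xi = 1 if rng.random() < p[action - 1] else 0  # Bernoulli outcome
        history.append((action, xi))
        successes += xi
    return successes

def greedy(history):
    """'Play the empirical winner': try each method once, then use the
    one with the higher observed success rate (a demo strategy only)."""
    stats = {1: [0, 0], 2: [0, 0]}                     # [successes, trials]
    for a, xi in history:
        stats[a][0] += xi
        stats[a][1] += 1
    for a in (1, 2):
        if stats[a][1] == 0:
            return a
    return max((1, 2), key=lambda a: stats[a][0] / stats[a][1])
```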
A control strategy $\sigma$ determines, generally in a randomized way, the choice of the action $y_n \in \{1, 2\}$ (i.e., of the processing method) at the point of time $n$, depending on the current history of the process. Let $\theta = (p_1, p_2)$ denote the parameter of the bandit, where $p_\ell$ is the probability of successful processing under the $\ell$-th method. A regret, which is also often called the loss function, is defined as follows:
$$L_N(\sigma, \theta) = N \max(p_1, p_2) - \mathbf{E}_{\sigma, \theta}\Bigl(\sum_{n=1}^{N} \xi_n\Bigr), \qquad (1)$$
and describes the expected losses of the total income due to incomplete information; here $\mathbf{E}_{\sigma, \theta}$ denotes the expectation with respect to the measure generated by the strategy $\sigma$ and the parameter $\theta$. The minimax risk is defined as
$$R_N^M = \min_\sigma \max_\theta L_N(\sigma, \theta),$$
and the corresponding optimal strategy $\sigma^M$ is called the minimax strategy. According to the main theorem of game theory, the minimax risk and the minimax strategy can be determined as Bayesian ones calculated over the worst-case prior distribution, at which the Bayesian risk attains its maximum value. In what follows, we determine approximately the worst-case prior distributions concentrated on finite sets of parameters; these sets are symmetric, i.e., invariant under the permutation of $p_1$ and $p_2$. Note that finite sets of parameters in the Bernoulli two-armed bandit problem were considered in [9]. Given a prior distribution $\lambda$ on the finite set $\Theta$, the Bayesian risk is defined as follows:
$$R_N^B(\lambda) = \min_\sigma \sum_{\theta \in \Theta} \lambda(\theta) L_N(\sigma, \theta), \qquad (2)$$
and the corresponding optimal strategy $\sigma^B$ is called the Bayesian strategy. If the parameters are properly assigned, one can expect that the approximate equality $R_N^M \approx R_N^B(\lambda)$ holds. The rest of the paper is organized as follows. A recursive Bellman-type equation for determining the Bayesian risk and the Bayesian strategy is presented in Section 2. In Section 3 a recursive equation for determining the regret is derived. In Section 4 we present numerical results. Section 5 contains the conclusion.
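The regret, i.e., the expected shortfall relative to always using the better method, can be estimated by straightforward Monte Carlo simulation. The sketch below uses illustrative parameter values and a deliberately poor strategy that always applies the first method, for which the regret equals $N (p_2 - p_1)$ when $p_2 > p_1$; none of these choices come from the paper.

```python
import random

def estimate_regret(strategy, p, N, trials=4000, seed=1):
    """Monte Carlo estimate of the regret
    L_N(sigma, theta) = N * max(p1, p2) - E[ sum of xi_n ]."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        history, successes = [], 0
        for _ in range(N):
            a = strategy(history)                   # action chosen by sigma
            xi = 1 if rng.random() < p[a - 1] else 0
            history.append((a, xi))
            successes += xi
        total += successes
    return N * max(p) - total / trials

# A deliberately poor strategy: always apply the first method.
always_first = lambda history: 1
# With theta = (0.3, 0.7) its true regret is N * (0.7 - 0.3).
```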

Recursive equation for determining the Bayesian risk and the Bayesian strategy
Assume that $n$ data items have been processed and denote by $x_\ell$, $y_\ell$ the numbers of successfully and unsuccessfully processed items under the $\ell$-th action ($\ell = 1, 2$), so that $x_1 + y_1 + x_2 + y_2 = n$. One can see that the following equality holds for the posterior distribution:
$$\lambda(\theta \mid x_1, y_1, x_2, y_2) = \Lambda^{-1} \lambda(\theta)\, p_1^{x_1} (1 - p_1)^{y_1} p_2^{x_2} (1 - p_2)^{y_2},$$
where $\theta = (p_1, p_2)$ and $\Lambda$ is the normalizing constant. The standard recursive Bellman-type equation for determining the Bayesian risk (2), calculated with respect to the regret (1), is as follows. Let $R(x_1, y_1, x_2, y_2)$ denote the minimal residual Bayesian risk given the history $(x_1, y_1, x_2, y_2)$; then
$$R(x_1, y_1, x_2, y_2) = \min_{\ell = 1, 2} R^{(\ell)}(x_1, y_1, x_2, y_2),$$
with the terminal condition $R(x_1, y_1, x_2, y_2) = 0$ if $x_1 + y_1 + x_2 + y_2 = N$. Here $R^{(\ell)}$ is the residual risk if the $\ell$-th action is applied at the current point of time and the control is optimal afterwards:
$$R^{(\ell)}(x_1, y_1, x_2, y_2) = \sum_{\theta \in \Theta} \lambda(\theta \mid x_1, y_1, x_2, y_2) \bigl( \max(p_1, p_2) - p_\ell \bigr) + P^{(\ell)}_1 R(\dots, x_\ell + 1, \dots) + P^{(\ell)}_0 R(\dots, y_\ell + 1, \dots),$$
where $P^{(\ell)}_1 = \sum_{\theta \in \Theta} \lambda(\theta \mid x_1, y_1, x_2, y_2)\, p_\ell$ and $P^{(\ell)}_0 = 1 - P^{(\ell)}_1$ are the posterior probabilities of success and failure under the $\ell$-th action. If $R^{(1)} = R^{(2)}$, the choice of the action is arbitrary. The Bayesian risk (2) is calculated by the formula $R_N^B(\lambda) = R(0, 0, 0, 0)$, and the Bayesian strategy prescribes to choose the $\ell$-th action if $R^{(\ell)}(x_1, y_1, x_2, y_2) < R^{(3-\ell)}(x_1, y_1, x_2, y_2)$. These formulas follow from the dynamic programming principle.
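A minimal computational sketch of this backward recursion, assuming a symmetric two-point parameter set with uniform prior (an illustrative choice, not the paper's actual parameters): it recursively maximizes the expected number of successes over posterior states and subtracts the result from the prior-averaged oracle income, which is equivalent to minimizing the residual regret.

```python
# Symmetric two-point parameter set and uniform prior
# (illustrative values only, not taken from the paper):
THETA = ((0.4, 0.6), (0.6, 0.4))
PRIOR = (0.5, 0.5)

def bayes_risk(N):
    """Bayesian risk R_N^B(lambda) for the regret, computed by the
    backward Bellman-type recursion over posterior weights."""

    def value(weights, n):
        # Maximal expected number of further successes with n items left.
        if n == 0:
            return 0.0
        best = 0.0
        for arm in (0, 1):
            # Posterior probability of success under this action.
            ps = sum(w * th[arm] for w, th in zip(weights, THETA))
            gain = 0.0
            if ps > 0.0:
                post = tuple(w * th[arm] / ps for w, th in zip(weights, THETA))
                gain += ps * (1.0 + value(post, n - 1))
            pf = 1.0 - ps
            if pf > 0.0:
                post = tuple(w * (1.0 - th[arm]) / pf
                             for w, th in zip(weights, THETA))
                gain += pf * value(post, n - 1)
            best = max(best, gain)
        return best

    # Oracle income: prior-averaged N * max(p1, p2).
    oracle = N * sum(w * max(th) for w, th in zip(PRIOR, THETA))
    return oracle - value(PRIOR, N)
```

The recursion branches over both actions and both outcomes at each step, so this naive version is only feasible for small horizons; memoization over success/failure counts would be needed for larger $N$.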

Determination of the regret
Let us define the regret calculated with respect to the prior distribution $\lambda$ and a fixed strategy $\sigma$ as follows:
$$L_N(\sigma, \lambda) = \sum_{\theta \in \Theta} \lambda(\theta) L_N(\sigma, \theta). \qquad (13)$$
One can use a recursive equation to determine it, which has the same structure as the Bellman-type equation of Section 2 but without minimization over the actions: at each history $(x_1, y_1, x_2, y_2)$ the applied action $\ell$ is the one prescribed by the strategy $\sigma$ rather than the minimizing one.
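The regret of a fixed history-dependent strategy at a given parameter can be computed exactly by such a recursion over all outcome branches, with no minimization over actions. The sketch below uses illustrative names and is feasible only for small horizons; averaging `exact_regret` over a prior then gives the quantity (13).

```python
def expected_successes(strategy, p, N):
    """Exact E[ sum of xi_n ] for a history-dependent strategy at
    theta = (p1, p2), by recursion over all outcome branches."""
    def go(history, n):
        if n == 0:
            return 0.0
        a = strategy(history)        # action prescribed by the strategy
        ps = p[a - 1]                # success probability of that action
        return (ps * (1.0 + go(history + [(a, 1)], n - 1))
                + (1.0 - ps) * go(history + [(a, 0)], n - 1))
    return go([], N)

def exact_regret(strategy, p, N):
    """L_N(sigma, theta) = N * max(p1, p2) - E[ sum of xi_n ]."""
    return N * max(p) - expected_successes(strategy, p, N)
```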

Numerical experiments
In this section, we present the results of numerical experiments. The values of the control horizon were chosen as