A quantum speedup in machine learning: Finding an N-bit Boolean function for a classification

We compare quantum and classical machines designed for learning an N-bit Boolean function in order to address how a quantum system improves the machine learning behavior. The machines of the two types consist of the same number of operations and control parameters, but only the quantum machines utilize the quantum coherence naturally induced by unitary operators. We show that quantum superposition enables quantum learning that is faster than classical learning by expanding the approximate solution regions, i.e., the acceptable regions. This is also demonstrated by means of numerical simulations with a standard feedback model, namely random search, and a practical model, namely differential evolution.


Introduction
Over the past few decades, quantum physics has brought remarkable innovations into fields of various disciplines. For example, there are exponentially faster quantum algorithms, compared to their classical counterparts [1][2][3]. The physical limit of measurement precision has been improved in quantum metrology [4,5], and a large number of protocols offering higher security have been proposed in quantum cryptography [6,7]. These achievements are enabled by appropriate usage of quantum effects such as quantum superposition and quantum entanglement.
Another important scientific area is machine learning, which is a subfield of artificial intelligence and one of the most advanced automatic control techniques. While learning is usually regarded as a characteristic of humans or living things, machine learning enables a machine to learn a task [8]. Machine learning has been attracting great attention, with its novel ability to learn. On one hand, machine learning has been studied to provide an understanding of the learning of a real biological system, in a theoretical manner. On the other hand, machine learning is also expected to provide reliable control techniques for use in designing complex systems in a practical manner [8].
Recently, the hybridizing of the two scientific fields described above, quantum technology and machine learning, has received great interest [9][10][11][12]. One question naturally arises: can machine learning be improved by using favorable quantum effects? Several attempts to answer this question have been made in the past few years, for example using quantum perceptrons [13], neural networks [14][15][16], and quantum-inspired evolutionary algorithms [17,18]. Most recently, remarkable studies have been carried out [19][20][21][22]. In [19], a learning speedup for the quantum machine was observed with a lower memory requirement for a specific example, namely the kth-root NOT operation. In [20], a strategy for designing a quantum algorithm was introduced, establishing a link between the learning speedup and the speedup of the quantum algorithm found. In [21,22], the authors showed quantum speedup for the task of classifying large data sets. However, it is still unclear which quantum effects work in machine learning and how they work, particularly in the absence of a fair comparison between classical and quantum machines.
In this work, we consider a binary classification problem as a learning task. Such a classification can be realized by an N-bit Boolean function that maps the set of N-bit binary strings {0, 1}^N into {0, 1} [23]. The main objective of this paper is to compare a quantum machine with a classical machine. The two machines are designed equivalently; the only difference is that the quantum machine can exploit quantum effects, whereas the classical machine cannot. The machines are analyzed in terms of the acceptable region, defined as a localized solution region of the parameter space. The analysis shows that the quantum machine can learn faster because quantum superposition expands its acceptable region. To make the analysis more explicit, we analyze further using random search, a standard model for learning-performance analysis [24]. In this primitive model, we validate the quantum speedup, showing that the overall number of iterations required to complete the learning is proportional to O(e^{αD}), with α ≃ 3.065 for the classical machine and α ≃ 0.238 for the quantum machine, where D is the size of the search space. We then employ differential evolution as a learning model, taking into account more realistic circumstances. By means of numerical simulations, we show that the quantum speedup is still observed in this case.

Classical and quantum machines
Machine learning can be decomposed into two parts: the machine and the feedback. The machine performs various tasks depending on its internal parameters, and the feedback adjusts the parameters of the machine so that the machine performs a required task, called the target. Learning is the process of finding suitable parameters for the machine, whereby the machine is expected to generate the desired results for the target. This decomposition has been widely adopted at the fundamental level of machine-learning studies [8].
In this work, we assign to the machine a binary classification problem as its task: the machine is to learn a target N-bit Boolean function f: {0, 1}^N → {0, 1}. This function can be written using the positive polarity Reed-Muller expansion [25]:

f(x) = ⊕_{k=0}^{2^N − 1} a_k ∏_{j ∈ C_k} x_j,

where ⊕ denotes modulo-2 addition and the Reed-Muller coefficients a_k are either 0 or 1. Here, C_k is an index set defined as follows: j is an element of C_k if and only if the bit k_j equals 1 when k is written as an N-bit string k_N k_{N−1} ... k_1 (for k = 0, C_0 is empty and the corresponding term is the constant a_0). The Boolean function can be implemented by a reversible circuit, as shown in figure 1, where an additional bit channel, called the work channel, and controlled operations are employed [26,27]. (Figure 1: the 2^N operations G_k are conditioned on the input bits x; the constant input c on the work channel is set to 0, which gives rise to the output bit y.) A single-bit operation G_0 is placed on the work channel, and the 2^N − 1 controlled-G_k operations act on the work channel when all of their control bits x_j (j ∈ C_k) are 1. The operation G_k is either the identity (i.e., doing nothing) if a_k = 0, or NOT (i.e., flipping an input bit to its complement) if a_k = 1. As an example, a one-bit Boolean function (i.e., N = 1) has 2^2 = 4 possible sets of Reed-Muller coefficients (a_0, a_1), which determine all possible one-bit Boolean functions. Table 1 gives the four one-bit Boolean functions with their Reed-Muller coefficients and the corresponding operations.
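The Reed-Muller evaluation just described can be sketched in a few lines of Python (the function name is ours, not from the paper):

```python
def reed_muller_eval(coeffs, x):
    """Evaluate f(x) = XOR over k of a_k * prod_{j in C_k} x_j.

    coeffs -- list of 2**N Reed-Muller coefficients a_k (each 0 or 1)
    x      -- list of N input bits (x_1, ..., x_N), each 0 or 1
    j belongs to C_k exactly when bit j of k (1-indexed) is 1.
    """
    n = len(x)
    y = 0
    for k, a_k in enumerate(coeffs):
        if a_k == 0:
            continue
        term = 1
        for j in range(1, n + 1):      # product over j in C_k
            if (k >> (j - 1)) & 1:
                term &= x[j - 1]
        y ^= term                      # modulo-2 addition
    return y
```

For N = 1, for instance, (a_0, a_1) = (1, 1) gives f(x) = 1 ⊕ x_1, i.e., NOT, while (0, 1) gives the identity function f(x) = x_1.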
With this reversible circuit model, we now define the classical and quantum machines. The classical machine consists of classical channels and operations, and its Boolean function is evaluated with classical bits x, y, and c. We let the Reed-Muller coefficients a_k be determined probabilistically by internal parameters p_k: the operation G_k performs the identity with probability p_k and NOT with probability 1 − p_k. These probabilistic operations are introduced primarily to provide a fair comparison with the quantum machine, which naturally employs probabilistic operations. We construct the quantum machine by making only the work channel quantum; the input channels are left classical, as the input information in our work is classical. The Boolean function of the quantum machine is thus evaluated with the signal on the work channel encoded into a qubit state. The classical probabilistic operations G_k are accordingly replaced by unitary operators Ĝ_k, chosen such that Ĝ_k performs the identity operation with probability p_k, i.e., |⟨0|Ĝ_k|0⟩|^2 = p_k. Note that the relative phases φ_k of the Ĝ_k are free parameters, suitably chosen before the learning. The feedback adjusts only the parameters p_k, which are controllable in both the classical and the quantum experimental setups [28,29].

Table 1. The four possible one-bit Boolean functions, with their Reed-Muller coefficients (a_0 and a_1) and operations (G_0 and G_1). These are common to the classical and quantum cases.
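The unitary Ĝ_k can be checked numerically. The 2×2 parametrization below is our own choice satisfying |⟨0|Ĝ_k|0⟩|² = p_k (the paper's exact convention may differ); the last function exhibits the interference term on the work channel for the input x = 1 in the one-bit case:

```python
import cmath

def G_hat(p, phi):
    """A 2x2 unitary with |<0|G|0>|^2 = p and free relative phase phi.
    This parametrization is our own choice; any unitary satisfying the
    probability constraint serves the argument in the text equally well."""
    a, b = p ** 0.5, (1.0 - p) ** 0.5
    return [[a, cmath.exp(-1j * phi) * b],
            [cmath.exp(1j * phi) * b, -a]]

def is_unitary(U, tol=1e-12):
    """Check U^dagger U = I for a 2x2 matrix given as nested lists."""
    for i in range(2):
        for j in range(2):
            s = sum(U[k][i].conjugate() * U[k][j] for k in range(2))
            if abs(s - (1.0 if i == j else 0.0)) > tol:
                return False
    return True

def prob_output0(p0, phi0, p1, phi1):
    """P(y = 0 | x = 1) for N = 1: both G_0 and G_1 act on the work qubit,
    so the amplitude <0| G_1 G_0 |0> carries an interference term."""
    G0, G1 = G_hat(p0, phi0), G_hat(p1, phi1)
    v0, v1 = G0[0][0], G0[1][0]          # the state G_0 |0>
    amp = G1[0][0] * v0 + G1[0][1] * v1
    return abs(amp) ** 2
```

For p_0 = p_1 = 1/2 and equal phases, prob_output0 returns 1: the identity-identity and NOT-NOT paths interfere fully constructively, whereas the corresponding classical probability is p_0 p_1 + (1 − p_0)(1 − p_1) = 1/2.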
These classical and quantum machines are equivalent to each other: they have the same circuit structure and exactly the same number of control parameters p_k. Moreover, a single classical operation G_k and the quantum operator Ĝ_k cannot be discriminated by measuring the distribution of outcomes for the same input x and parameters p_k.

The acceptable region
A target Boolean function is represented by a point in the 2^N-dimensional search space spanned by the probabilities p_k. For example, the four possible learning targets f_j (j = 1, 2, 3, 4) of the one-bit Boolean function correspond to the four corner points of the unit square, since a_k = 0 demands p_k = 1 and a_k = 1 demands p_k = 0. Similarly, the machine behavior is characterized as a point Q_m = (p_0, p_1); different points lead to different probabilistic tasks performed by the machine. Learning is then simply a process of moving Q_m toward a given target point in the search space. It is, however, usually impractical (actually, impossible in real circumstances) to locate Q_m exactly at the target point. Instead, it is feasible to find approximate solutions near the exact target, i.e., the learning is expected to lead the point Q_m into a region near the target point [8]. We call such a region an acceptable region for the approximate target functions. As the learning time and convergence depend primarily on the size of the acceptable region, a larger acceptable region is usually expected to make the learning faster [30]. In this sense, we examine the acceptable regions of the classical and quantum machines.
The acceptable region is defined as the set of points for which the error ϵ = 1 − F is less than or equal to a tolerable value ϵ_t. Here, F is the figure of merit of the machine performance, called the task fidelity; it quantifies how well the machine performs a target function and is defined as

F = (1/2^N) Σ_x Σ_y √( P(y|x) P_τ(y|x) ),     (6)

where P(y|x) is the conditional probability of obtaining an output y given an input x, and the target probabilities P_τ(y|x) are those of the target function. For example, the target probabilities for f_1 in table 1 are P_τ(0|x) = 1 and P_τ(1|x) = 0 for all x. The term Σ_y √( P_τ(y|x) P(y|x) ) in equation (6) corresponds to the closeness of the two probability distributions P(y|x) and P_τ(y|x) for the given x [31]. The task fidelity F increases as the outputs approach the required outputs; F becomes unity only when the machine reproduces the target for all x, and is otherwise less than 1. The acceptable region can thus be seen as the set of probabilities p_k such that 1 − F ≤ ϵ_t, so a higher F guarantees a wider acceptable region for a given tolerance ϵ_t.
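A minimal sketch of the task-fidelity computation, assuming the Bhattacharyya-type overlap F = (1/2^N) Σ_x Σ_y √(P(y|x) P_τ(y|x)), which is our reading of equation (6); the dictionary encoding of the conditional distributions is ours:

```python
from math import sqrt

def task_fidelity(P, P_target):
    """F = (1 / 2^N) * sum_x sum_y sqrt(P(y|x) * P_tau(y|x)).
    P and P_target map each input x (a tuple of bits) to the pair
    (P(y=0|x), P(y=1|x))."""
    total = 0.0
    for x, dist in P.items():
        total += sum(sqrt(p * q) for p, q in zip(dist, P_target[x]))
    return total / len(P)

# Target f_1 (constant 0): P_tau(0|x) = 1, P_tau(1|x) = 0 for every x.
target = {(0,): (1.0, 0.0), (1,): (1.0, 0.0)}
# A maximally undecided machine, for comparison.
uniform = {(0,): (0.5, 0.5), (1,): (0.5, 0.5)}
```

A machine that reproduces the target exactly gives F = 1, while the undecided machine above gives F = √(1/2) ≈ 0.707.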
Let us begin with the simplest case: the target function f_1, the one-bit constant function f_1(x) = 0. From equation (6), its task fidelity reduces to

F(p_0, p_1) = (1/2) [ √P(0|0) + √P(0|1) ],     (8)

which is common to the classical and the quantum machines. For the classical machine, P(0|0) = p_0 and P(0|1) = p_0 p_1 + (1 − p_0)(1 − p_1), so equation (8) is evaluated as

F_c = (1/2) [ √p_0 + √( p_0 p_1 + (1 − p_0)(1 − p_1) ) ].

For the quantum machine, the output probability for x = 1 acquires an interference term governed by Δ, the difference of the phases of the two unitaries Ĝ_0 and Ĝ_1. The task fidelity F_q of the quantum machine is evaluated as

F_q = (1/2) [ √p_0 + √( p_0 p_1 + (1 − p_0)(1 − p_1) + 2 √( p_0 p_1 (1 − p_0)(1 − p_1) ) cos Δ ) ],     (12)

where the additional term proportional to cos Δ is manifestly the result of quantum superposition. From equation (12), we see that F_q > F_c provided that 0 < p_j < 1 (j = 0, 1) and cos Δ > 0. The phase Δ thus plays an important role in helping the quantum machine via constructive interference, leading to F_q > F_c. The task fidelities for the other three targets are listed in table 2. Note that, for every target function f_j, F_q can always be made larger than F_c by choosing appropriate free parameters φ_0 and φ_1 before the learning. Therefore, the quantum machine has wider acceptable regions than the classical machine for a given tolerance.

Table 2. The task fidelities of the quantum and classical machines, in terms of the probabilities p_0 and p_1, for each target function of the one-bit Boolean function. The phase Δ is defined in the main text and plays an important role in quantum machine learning.

In figure 2, the task fidelity and the acceptable region of each machine are shown for the target f_1 with Δ = 0, the choice that maximizes the difference between the two machines; the acceptable region of the quantum machine is about 5.6 times the size of that of the classical machine. The optimal phase condition for improving the task fidelity can be generalized to an arbitrary N-bit Boolean function (N > 1); we provide one such condition in equation (14).
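For the one-bit target f_1, the classical and quantum fidelities can be compared numerically. The formulas below are our reconstruction from the circuit model (the original table 2 may use an equivalent form), and the Monte Carlo estimate illustrates the enlarged acceptable region; the precise enlargement factor depends on the tolerance ϵ_t:

```python
from math import sqrt
import random

def F_c(p0, p1):
    """Classical task fidelity for the one-bit target f_1(x) = 0
    (our reconstruction from the circuit model)."""
    return 0.5 * (sqrt(p0) + sqrt(p0 * p1 + (1 - p0) * (1 - p1)))

def F_q(p0, p1, cos_delta=1.0):
    """Quantum task fidelity: F_c plus an interference term proportional
    to cos(Delta); Delta = 0 (cos_delta = 1) is the optimal choice."""
    inter = 2.0 * sqrt(p0 * p1 * (1 - p0) * (1 - p1)) * cos_delta
    return 0.5 * (sqrt(p0) + sqrt(p0 * p1 + (1 - p0) * (1 - p1) + inter))

def region_ratio(eps_t=0.05, n=100000, rng=None):
    """Monte Carlo estimate of (quantum acceptable area) / (classical one)."""
    rng = rng or random.Random(0)
    q = c = 0
    for _ in range(n):
        p0, p1 = rng.random(), rng.random()
        q += 1 - F_q(p0, p1) <= eps_t
        c += 1 - F_c(p0, p1) <= eps_t
    return q / max(c, 1)
```

With Δ = 0, F_q ≥ F_c holds pointwise, so the quantum acceptable region contains the classical one and the estimated area ratio comes out well above 1.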

Learning speedup via an expanded acceptable region
This section is devoted to the learning time. For a numerical simulation, we employ random search as the feedback; it is often used for studying learning performance, rather than for any practical purpose [24]. Random search runs as follows. First, all 2^N control parameters p_k are chosen at random; then, the task fidelity is measured with the chosen parameters. These two steps constitute a single iteration. The iterations are repeated until the condition F ≥ 1 − ϵ_t is satisfied for a given ϵ_t. After a sufficient number of simulations, we calculate the mean iteration number n_c = Σ_n n P(n), where P(n) is the probability of completing the learning at the nth iteration. This mean iteration number n_c quantifies the learning time, and the results of the numerical simulations for n_c are given in table 3: quantum learning is demonstrably faster than classical learning. This is a direct result of the wider acceptable region of the quantum machine, since in random search n_c = 1/γ, where γ is the ratio of the volume of the acceptable region to that of the whole search space. We demonstrate this by comparing the results for n_c with the acceptable-region ratios γ found from Monte Carlo simulation, given in table 3; the acceptable region is thus the main feature directly influencing the learning time in random search. Also, in figure 3, the data for n_c in table 3 are well fitted by ln n_c = αD + β, implying that the size of the acceptable region decreases exponentially as the dimension D = 2^N of the parameter space increases, i.e., n_c = O(e^{αD}) [32]. The fitting parameters are

α = 3.065 ± 0.072, β = 3.188 ± 1.196 (classical); α = 0.238 ± 0.008, β = 2.267 ± 0.127 (quantum).     (15)

It is remarkable that the exponent α in the quantum case is much smaller than that in the classical case.
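The random-search procedure above can be sketched end to end for the one-bit target f_1; the fidelity expressions are our reconstruction from the circuit model, with the optimal phase Δ = 0 assumed for the quantum machine:

```python
import random
from math import sqrt

def fidelity(p0, p1, quantum):
    """Task fidelity for the one-bit target f_1(x) = 0 (our reconstruction),
    with the optimal phase Delta = 0 in the quantum case."""
    p_same = p0 * p1 + (1 - p0) * (1 - p1)      # P(y=0 | x=1), classical
    if quantum:
        p_same += 2 * sqrt(p0 * p1 * (1 - p0) * (1 - p1))  # interference
    return 0.5 * (sqrt(p0) + sqrt(p_same))

def random_search(eps_t, quantum, rng):
    """Draw the p_k uniformly at random each iteration until 1 - F <= eps_t;
    return the number of iterations used."""
    n = 0
    while True:
        n += 1
        if 1 - fidelity(rng.random(), rng.random(), quantum) <= eps_t:
            return n

rng = random.Random(1)
trials = 500
nc_classical = sum(random_search(0.05, False, rng) for _ in range(trials)) / trials
nc_quantum = sum(random_search(0.05, True, rng) for _ in range(trials)) / trials
```

The empirical means reproduce the qualitative behavior in the text: nc_quantum comes out far smaller than nc_classical, reflecting n_c = 1/γ with the larger quantum acceptable region.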
It follows from the above that the acceptable region is the main feature directly influencing the learning time in random search. In the previous section we showed that one can always prepare a quantum machine whose acceptable region is larger than that of the classical one. We therefore conclude that the learning time can be shorter in the quantum case than in the classical case. The results of the numerical simulations also support the assertion that the quantum machine learns much faster, particularly in a large search space. We stress again that this quantum speedup is enabled by quantum superposition together with appropriately arranged phases.

Applying differential evolution
We now consider a more practical learning model, taking real circumstances into account. A general analysis of learning efficiency is very complicated, as many factors influence the learning behavior; moreover, the most efficient learning algorithms tend to use heuristic rules and are problem-specific [33,34]. Nevertheless, it is commonly held, on heuristic grounds, that the acceptable region is a key factor in the efficiency of learning [32]. In this sense, we conjecture that the quantum machine offers a quantum speedup even with a practical learning method.
We apply differential evolution (DE), known as one of the most efficient learning methods for global optimization [30]. We start with M control-parameter vectors p_i = (p_0, p_1, ..., p_{2^N − 1}), i = 1, ..., M, whose components are the control parameters of the machine. In DE, these vectors p_i evolve by 'mating' their components p_k with one another. Equation (6) is used as the criterion for how well a machine with parameters p_i fits the target. This process is iterated until the task fidelity reaches the required accuracy 1 − ϵ_t (see [30] or [20] for the detailed differential evolution procedure).
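The 'mating' step can be sketched as one generation of the textbook DE/rand/1/bin scheme. Since the exact DE variant and tuned parameter values are not specified here, the differential weight F and crossover rate CR below are conventional choices, not the authors' tuned ones:

```python
import random

def de_step(pop, fitness, F=0.8, CR=0.9, rng=random):
    """One generation of DE/rand/1/bin over parameter vectors in [0, 1]^D.
    pop: list of parameter lists; fitness: callable to maximize
    (e.g., the task fidelity of equation (6))."""
    M, D = len(pop), len(pop[0])
    new_pop = []
    for i in range(M):
        # mutation: v = a + F * (b - c), with a, b, c distinct from i
        a, b, c = rng.sample([j for j in range(M) if j != i], 3)
        v = [min(1.0, max(0.0, pop[a][k] + F * (pop[b][k] - pop[c][k])))
             for k in range(D)]
        # binomial crossover with one guaranteed mutant component
        jrand = rng.randrange(D)
        trial = [v[k] if (rng.random() < CR or k == jrand) else pop[i][k]
                 for k in range(D)]
        # greedy selection: keep the better of trial and parent
        new_pop.append(trial if fitness(trial) >= fitness(pop[i]) else pop[i])
    return new_pop
```

Because selection is greedy per individual, the best fitness in the population never decreases from one generation to the next.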
We perform the numerical simulations for N from 1 to 7. The results are averaged over 1000 realizations with M = 50 and ϵ_t = 0.05. The target is the constant function f(x) = 0 for all x, and the optimal phase condition of equation (14) is chosen for the quantum machine. The free parameters of differential evolution (e.g., the crossover rate and the differential weight) are chosen to achieve the best learning efficiency for the classical machine (footnote 6). Nevertheless, we expect the quantum machine to still exhibit the quantum speedup, assisted by quantum superposition with the optimal phases of equation (14). The mean task fidelity, averaged over the M vectors, is given in figure 4(a). In both the classical and the quantum cases the mean task fidelities approach 1, but the quantum machine converges much faster in all cases. We also investigate the learning time n_c as the dimension D = 2^N of the parameter space increases, as depicted in figure 4(b). The data are well fitted by the trial function n_c ≃ β D^α, with α ≃ 3.82, β ≃ 0.97 for the classical machine and α ≃ 1.61, β ≃ 0.80 for the quantum machine (footnote 7). The quantum machine thus still exhibits the speedup, with smaller α and β. We therefore expect such a quantum speedup to be achievable even in real circumstances.

Summary and discussion
We investigated the learning performance of two machines on the task of finding an N-bit Boolean function, as used in a binary classification problem. The two machines were designed equivalently, to make their comparison as convincing as possible. The critical difference between the two machines was that the operations of the quantum machine are described by unitary operators and can therefore exploit quantum superposition. The learning processes of the two machines were characterized in terms of the acceptable region: the localized region of the parameter space containing the approximate solutions. We found that the quantum machine has a wider acceptable region, induced by quantum superposition. Using a standard feedback method, namely random search, we demonstrated that the learning time is inversely proportional to the size of the acceptable region. It was also shown that a wider acceptable region makes the learning faster; the learning time scales as O(e^{αD}), with α ≃ 3.065 for classical learning and α ≃ 0.238 for the quantum machine. We then applied a practical learning method, namely differential evolution, to our main task, and again observed the learning speedup of the quantum machine.

Footnote 6: One may worry about the crossover point (for N ≥ 5) in figure 4(a), in connection with the validity of the quantum learning speedup as ϵ_t → 0. However, the appearance of the crossover is due to the DE optimization of the free parameters, which are optimized here for the classical machine. The crossover can be removed by choosing appropriate free parameters for each machine.

Footnote 7: Such a polynomial dependence, obtained with differential evolution, is a considerable improvement and quite distinct from the case of random search, which exhibits exponential dependence.
Here, we wish to recall that the maximized learning speedup of the quantum machine is achieved by choosing suitable phases as in equation (14). From a practical perspective, one may consider that an additional task, such as finding the relative phases, is required to ensure the remarkable performance of the quantum learning machine for other N-bit Boolean function targets. Alternatively, such an issue could be resolved by synchronizing the relative phases with the control parameters in the quantum machine, still yielding the learning speedup (see appendix B for details).
We expect our work to motivate researchers to study the role of various quantum effects in machine learning, and to open up new possibilities for improving the machine learning performance. It is still open whether the quantum machine can be improved more by using other quantum effects, such as quantum entanglement.
Here we assume further that the search space is isotropic around S, so that machines on the surface of the hypersphere d(Q, S) = δ have the same task fidelity. This assumption is physically reasonable for a very small tolerable error. Thus, without loss of generality, we consider the near-solution machine corresponding to a point Q on the sphere d(Q, S) = δ satisfying |s_k − p_k| = c for all k, where δ = c √(2^N). In these circumstances, P(f(x)|x) for a classical near-solution machine is necessarily smaller than 1, depending on δ. On the other hand, if we choose the optimal phases φ_k, then P(f(x)|x) can be 1 in the quantum machine, without any dependence on δ.

This supports the assertion that the practical quantum machine always learns faster than the classical machine, whereas the performance of the original quantum machine depends on the target function. We then obtained the learning time of the practical quantum machine, shown in figure B1(b). These data are also well fitted by ln n_c = αD + β, with α ≃ 0.985 ± 0.101, to be compared with the classical case of equation (15). This result shows that a considerable learning speedup is still achieved with the practical quantum machine, even though it takes somewhat more time than the original machine with the optimal relative phases (n_c ∼ O(e^{0.238 D})).