Reinforcement learning for semi-autonomous approximate quantum eigensolver

The characterization of an operator by its eigenvectors and eigenvalues allows us to know its action over any quantum state. Here, we propose a protocol to obtain an approximation of the eigenvectors of an arbitrary Hermitian quantum operator. This protocol is based on measurement and feedback processes, which characterize a reinforcement learning protocol. Our proposal is composed of two systems, a black box named environment and a quantum state named agent. The role of the environment is to change any quantum state by a unitary matrix $\hat{U}_E = e^{-i\tau\hat{E}}$, where $\hat{E}$ is a Hermitian operator and τ is a real parameter. The agent is a quantum state which adapts to some eigenvector of $\hat{E}$ by repeated interactions with the environment, a feedback process, and semi-random rotations. With this proposal, we can obtain an approximation of the eigenvectors of a random qubit operator with average fidelity over 90% in fewer than 10 iterations, and surpass 98% in fewer than 300 iterations. Moreover, for the two-qubit cases, the four eigenvectors are obtained with fidelities above 89% in 8000 iterations for a random operator, and fidelities of 99% for an operator with the Bell states as eigenvectors. This protocol can be useful to implement semi-autonomous quantum devices which should be capable of extracting information and deciding with minimal resources and without human intervention.

For the first group, we have two classes of algorithms. One of them is supervised learning, which uses a previously labeled data set, named training data, to infer a labeling criterion that is then used to classify new data; a remarkable example is pattern recognition algorithms [22-24]. The other class is unsupervised learning. In this case, training data are not necessary, and the approach is to group the unlabeled data into different sets, where each set is characterized by the mean value of some property of its constituents. The groups are constructed to optimize some indicator of the dispersion in each subset with respect to the value that characterizes it, e.g. the standard deviation. An example of these algorithms is the clustering problem [25, 26].
• Feedback loop: The outcome of the measurement is communicated to a command center, which can perform a unitary transformation $\hat{\mathcal{D}}_j$ (a quantum gate) over the state of A in order to change the possible results in the next information extraction step.
• Decision process: If the outcome of the measurement process is the state $|\phi\rangle_{A,j}$ with $j \neq 0$, then $|\phi\rangle_{A,0}$ changes when system A interacts with E; therefore, $|\phi\rangle_{A,0}$ cannot be an eigenvector of $\hat{E}$. In this case, we define the unitary transformation $\hat{\mathcal{D}}_j$ in terms of a random angle α in the range $[-w\pi, w\pi]$, with w the searching range given by the RF. We note that $\hat{\mathcal{D}}_j$ is a pseudo-random rotation in the subspace spanned by $\{|\phi\rangle_{A,0}, |\phi\rangle_{A,j}\}$. For this outcome we redefine the state of A as $\hat{\mathcal{D}}_j|\phi\rangle_{A,0}$ and start again with the information extraction step.
If the outcome of the measurement process is $|\phi\rangle_{A,0}$, then $|\phi\rangle_{A,0}$ could be an eigenvector of $\hat{E}$. We point out that the eigenvectors of an operator remain unchanged, up to a global phase, under the action of any function of this operator. In this case, we apply the identity operator, keep the same state $|\phi\rangle_{A,0}$, and start again with the information extraction step. Figure 1 shows a scheme of the policy of the algorithm.
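The invariance used in this step can be written out explicitly. The following short worked equation assumes the environment acts as $\hat{U}_E = e^{-i\tau\hat{E}}$, and writes $\lambda_j$ and $|v_j\rangle$ for the eigenvalues and eigenvectors of $\hat{E}$:

```latex
\hat{E}\,|v_j\rangle = \lambda_j\,|v_j\rangle
\;\Longrightarrow\;
\hat{U}_E\,|v_j\rangle = e^{-i\tau\hat{E}}\,|v_j\rangle = e^{-i\tau\lambda_j}\,|v_j\rangle ,
\qquad
\bigl|\langle v_j|\hat{U}_E|v_j\rangle\bigr|^{2} = 1 .
```

Hence, if A is prepared in an eigenvector, the outcome $|\phi\rangle_{A,0}$ is obtained with certainty. The converse does not hold for a single shot, which is why this outcome only indicates that the state could be an eigenvector.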
For the RF we define the reward rate r<1 and the punishment rate p>1. If the outcome of the measurement is $|\phi\rangle_{A,0}$ we define $\bar{w} = w \cdot r$, and $\bar{w} = w \cdot p$ otherwise. Finally, we rename $w = \bar{w}$ for the next iteration of the algorithm, which means that when we measure $|\phi\rangle_{A,0}$ we reduce the searching range, and we increase it otherwise. The initial value for w is chosen according to the problem.
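The update rule above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `update_search_range` and the default values r=0.9, p=1.2 are hypothetical choices, not values prescribed by the protocol.

```python
def update_search_range(w, outcome_is_initial_state, r=0.9, p=1.2):
    """Return the searching range w for the next iteration.

    r and p are illustrative defaults (assumptions); the protocol only
    requires a reward rate r < 1 and a punishment rate p > 1.
    """
    if not (0 < r < 1 and p > 1):
        raise ValueError("expected reward rate r < 1 and punishment rate p > 1")
    # Reward: measuring |phi>_{A,0} shrinks the range.
    # Punishment: any other outcome widens it.
    return w * r if outcome_is_initial_state else w * p


# Example: two rewards, one punishment, one reward.
w = 1.0
for outcome in [True, True, False, True]:
    w = update_search_range(w, outcome)
print(w)
```

Only the current value of w has to be kept between iterations, which matches the minimal-storage character of the protocol.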
As we can note, the protocol does not need to store the states or the full history of the algorithm; it only needs to store the final operation $\hat{D}^{(N)}$, by classically storing the parameters that characterize this operation.
To ensure the convergence of our algorithm, we define the VF as the value of w. This implies that, when $w \to 0$, our protocol converges. For a correct choice of r and p we have that $w \to 0$ only if the measurement process yields the outcome $|\phi\rangle_{A,0}$ many times in a row. This means that the overlap between $|\phi\rangle_{A,0}$ and the state obtained after the interaction with E is close to 1. As this is an iterative protocol, we define the following notation for the remainder of the article: any superindex between parentheses refers to the iteration of the algorithm, e.g. $|\phi\rangle^{(4)}_{A,0}$ is the state of A before the interaction with E in the fourth iteration. Similarly, $\hat{\mathcal{D}}^{(k)}_j$ is the unitary transformation defined in the decision process for the iteration k. As a special case, the superindex (1) refers to the initial values, e.g. $w^{(1)}$ represents the initial searching range.
It is necessary to mention that our algorithm uses one single-shot measurement per loop, which is an advantage over employing an expectation value or the fidelity. The latter require hundreds of measurements for a two-level system, so our proposal is exposed to noise sources for less time. Also, as we use pseudo-random operations $\hat{D}^{(k)}$, the effect of any noise in the gate can be seen as part of the randomness of the protocol.
For the explicit form of $\bar{\theta}^{(k)}$ and $\bar{\phi}^{(k)}$ in terms of α, β, τ and the eigenvalues of $\hat{E}$, see appendix A. Moreover, for the explicit form of $\Delta\theta^{(k)}$ and $\Delta\phi^{(k)}$, see appendix B. Now, to perform the measurement process over the state of A after its interaction with E, we apply a basis-rotation matrix in order to measure in the basis $\{|0\rangle, |1\rangle\}$ for all iterations. After the measurement process, the state of A is the measured basis state $|m^{(k)}\rangle$, and we start the algorithm again.
The operations applied at this stage involve the Pauli matrix $\sigma_x$ and the pseudo-random operator $\hat{\mathcal{D}}^{(k)}_1$ defined by equation (2). After the measurement process, we apply over $|m^{(k)}\rangle$ an operator expressed through the spin operators $\hat{S}_j = \sigma_j/2$, with $\sigma_j$ the Pauli matrix j. For this case, the RF that defines the value of w for the next iteration uses r and p, the reward rate and punishment rate, respectively, described previously.
When the algorithm converges, the state of A approximates an eigenvector of $\hat{E}$. In order to explore the complete space we must choose $w^{(1)} = 1$.
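The single-qubit loop described above (interaction with E, measurement in a rotated basis, pseudo-random rotation, update of w) can be sketched numerically. This is a simplified illustration under stated assumptions, not the authors' implementation. The pseudo-random rotation is parametrized as $e^{-i\hat{S}_z\alpha}e^{-i\hat{S}_x\beta}$ with both angles drawn from $[-w\pi, w\pi]$. The agent is re-prepared in $\hat{D}|0\rangle$ at the start of every iteration. The values of τ, r and p are illustrative. All function names are hypothetical.

```python
import numpy as np


def random_hermitian(dim, rng):
    """Random Hermitian matrix used as the environment operator E."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (a + a.conj().T) / 2


def environment_unitary(E, tau):
    """U = exp(-i tau E), built from the eigendecomposition of Hermitian E."""
    vals, vecs = np.linalg.eigh(E)
    return (vecs * np.exp(-1j * tau * vals)) @ vecs.conj().T


def random_rotation(w, rng):
    """Assumed parametrization: exp(-i Sz alpha) exp(-i Sx beta)."""
    alpha, beta = rng.uniform(-w * np.pi, w * np.pi, size=2)
    rz = np.diag([np.exp(-1j * alpha / 2), np.exp(1j * alpha / 2)])
    c, s = np.cos(beta / 2), np.sin(beta / 2)
    rx = np.array([[c, -1j * s], [-1j * s, c]])
    return rz @ rx


def run_protocol(E, tau=1.0, r=0.9, p=0.9 ** -2, iterations=300, seed=0):
    """Return the accumulated rotation D and the final searching range w."""
    rng = np.random.default_rng(seed)
    U = environment_unitary(E, tau)
    D = np.eye(2, dtype=complex)  # agent state is D|0>
    w = 1.0                       # initial searching range w^(1) = 1
    for _ in range(iterations):
        state = D[:, 0]                         # agent before E
        amplitudes = D.conj().T @ (U @ state)   # rotate back, measure in {|0>, |1>}
        prob_same = min(1.0, float(abs(amplitudes[0]) ** 2))
        if rng.random() < prob_same:            # single-shot outcome |0>: reward
            w *= r
        else:                                   # outcome |1>: rotate and punish
            D = D @ random_rotation(w, rng)
            w *= p
    return D, w


rng = np.random.default_rng(1)
E = random_hermitian(2, rng)
D, w = run_protocol(E)
_, vecs = np.linalg.eigh(E)
fidelities = np.abs(vecs.conj().T @ D[:, 0]) ** 2
print("fidelity with each eigenvector:", fidelities, "final w:", w)
```

Only the 2×2 matrix D and the scalar w are carried between iterations. The fidelities are computed here solely to inspect the result and play no role in the loop itself.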

Single-qudit case
In this case, the agent is a d-dimensional system, or qudit, and the operator $\hat{E}$ is described by a $d \times d$ Hermitian matrix with eigenvalues $\{\lambda_j\}$ and eigenvectors $\{|v_j\rangle\}$, with $j \in \{0, 1, 2, \ldots, d-1\}$. In the kth iteration of the algorithm, A interacts with E; subsequently, we apply the basis-rotation operator and perform the measurement process in the basis $\{|j\rangle\}$, where $m^{(k)}$ denotes the outcome of the measurement process. In this case the decision process applies the operator defined by equation (11), with its constituent operators redefined for the qudit, which determines the state of A for the next iteration. Finally, the RF updates the value of the searching range according to the outcome.
Once the algorithm converges, after $N_0$ iterations, the state of A is an approximate eigenvector of $\hat{E}$. In order to find another eigenvector of $\hat{E}$, we start the algorithm again from iteration $N_0 + 1$, calculating the operator $\hat{u}_j$ as in equation (22) and modifying the decision process and the RF accordingly. These changes mean that we perform the protocol in the subspace orthogonal to the approximate eigenvector already found. When the algorithm converges again, after $N_1$ more iterations, we have two approximate eigenvectors. Therefore, to obtain the next eigenvector we perform the algorithm again, but in the subspace orthogonal to the eigenvectors already found, and we repeat this procedure until we have approximations of the d eigenvectors of $\hat{E}$.
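The conclusions describe the rotations used by the protocol as pseudo-random two-level rotations. A minimal sketch of how such a rotation can be embedded in a d-dimensional space is given below. The parametrization $e^{-i\hat{S}_z\alpha}e^{-i\hat{S}_x\beta}$ on the selected pair of levels and the function name `two_level_rotation` are assumptions made for illustration; they are not taken from equations (11) or (22).

```python
import numpy as np


def two_level_rotation(d, a, b, alpha, beta):
    """d x d unitary acting as a qubit rotation on span{|a>, |b>}
    and as the identity on every other basis state."""
    if a == b or not (0 <= a < d and 0 <= b < d):
        raise ValueError("a and b must be distinct levels in range(d)")
    rz = np.diag([np.exp(-1j * alpha / 2), np.exp(1j * alpha / 2)])
    c, s = np.cos(beta / 2), np.sin(beta / 2)
    rx = np.array([[c, -1j * s], [-1j * s, c]])
    u = np.eye(d, dtype=complex)
    u[np.ix_([a, b], [a, b])] = rz @ rx  # embed the 2x2 block
    return u


# Example: rotate levels |0> and |2> of a four-level system.
u = two_level_rotation(4, 0, 2, alpha=0.3, beta=1.1)
print(np.round(u, 3))
```

Restricting each rotation to two levels keeps the number of classical parameters per step fixed, regardless of d.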

Multiqubit case
For this case, we can regard the system A as a qudit, where the basis states $|j\rangle$ now correspond to the binary representation of j with $\log_2 d$ digits. For example, for d=16 we have 4 digits, each of which represents the state of a qubit; then $|5\rangle = |0101\rangle$. Also, we can produce the different operators $\hat{u}_j$ using controlled-NOT gates and single-qubit rotations [53]. Therefore, we can map this problem to the qudit case, obtaining the same algorithm as in the previous case.
As we can see from this section, our protocol does not need to encode quantum information in a classical processor, which is an advantage over classical algorithms that need to characterize the quantum interactions by quantum tomography. The latter requires hundreds of measurements of the quantum system, using more resources in this process than the entire proposed algorithm. Moreover, as our algorithm finds the eigenstates statistically, it is simpler than a full quantum algorithm that finds the eigenstates exactly, which makes our protocol experimentally feasible. References [51, 52] show the experimental implementation of an algorithm that employs the same basic steps on which our current algorithm is based, for the case of quantum states instead of quantum operators, opening the door to the implementation of this work.
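The mapping between a qudit label and its multiqubit bit string can be illustrated with a short Python helper. The function name `qudit_label_to_qubits` is hypothetical.

```python
def qudit_label_to_qubits(j, d):
    """Binary representation of the basis label j using log2(d) digits."""
    n_qubits = d.bit_length() - 1
    if d < 2 or (1 << n_qubits) != d:
        raise ValueError("d must be a power of two")
    if not 0 <= j < d:
        raise ValueError("j must satisfy 0 <= j < d")
    return format(j, f"0{n_qubits}b")


print(qudit_label_to_qubits(5, 16))  # prints 0101
```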

Numerical results
It is convenient to define the following quantities for the numerical analysis of the protocol: the parameter ν, given by $p \cdot r^{\nu} = 1 \Rightarrow p = r^{-\nu}$, with r (p) the reward (punishment) rate, the total number of rewards $n_r$ and the total number of punishments $n_p$ in the algorithm. The VF of our algorithm is the value of $w = r^{n_r} p^{n_p}$. Moreover, when ν is larger, the algorithm needs more iterations to converge, but it achieves larger fidelities. This is the exploration versus exploitation balance known in RL. Here, we perform the simulation for the single- and two-qubit cases for different values of ν and r. Remember that for all cases we choose $w^{(1)} = 1$. Finally, as the unitary operator $\hat{u}_j$ given by equation (22) depends on pseudo-random angles, we run the algorithm many times, defining the mean fidelity $\mathcal{F}$ and the mean searching range $\mathcal{W}$ as the averages over repetitions given in equation (33), where $|E_\ell\rangle$ is the ℓth eigenvector of $\hat{E}$ and the index i refers to the ith repetition of the protocol. In all subsequent cases we use 1000 repetitions.
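The interplay between the two rates and the value function can be checked numerically. The sketch below assumes that the rates are linked by $p = r^{-\nu}$ (a relation reconstructed from the surrounding text, so it should be treated as an assumption) and that the initial searching range is 1. The function names are hypothetical.

```python
def punishment_rate(r, nu):
    """Punishment rate under the assumed relation p = r**(-nu)."""
    return r ** (-nu)


def search_range(r, nu, n_rewards, n_punishments, w_initial=1.0):
    """Value function w after n_rewards rewards and n_punishments punishments."""
    p = punishment_rate(r, nu)
    return w_initial * r ** n_rewards * p ** n_punishments


# With nu = 2, two rewards cancel one punishment, so w returns to about 1.
print(punishment_rate(0.9, 2))
print(search_range(0.9, 2, n_rewards=2, n_punishments=1))
```

Under this assumption w shrinks only when rewards outnumber punishments by more than a factor ν, which is one way to read the slower convergence reported for larger ν.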

Single-qubit case
To assess the general performance of our protocol, we start with $\hat{E}$ described by a random Hermitian matrix. Figure 2 shows the mean fidelity $\mathcal{F}^{(k)}_0$ for different values of the reward rate r and the parameter ν. From this figure, we can see that for r=0.9 and ν=2 we obtain $\mathcal{F}^{(k)}_0 > 0.98$ with k<300. Also, in all cases we have $\mathcal{F}^{(k)}_0 > 0.90$ for k<10. This means that with a small number of iterations we can obtain good fidelities for the eigenvector of a completely random single-qubit operator. On the other hand, we observe that when r and ν are larger, the maximum value of $\mathcal{F}^{(k)}_0$ increases, but we need more iterations for the convergence of the algorithm. Figure 3 shows the mean searching range $\mathcal{W}^{(k)}$ for the same cases. From this figure we can clearly see how the algorithm needs fewer iterations when r and ν decrease, with the extreme case of r=0.6, ν=1, where the algorithm converges before 70 iterations. Now, we consider the particular example $\hat{E} = \hat{S}_x = \frac{1}{2}\hat{\sigma}_x$. In this case, the distance on the Bloch sphere between $|0\rangle$ and the eigenstates of $\hat{E}$ is the largest possible. Figure 4 shows that our algorithm converges in few iterations to good approximations of the eigenvectors: we obtain the eigenvectors with fidelity above 98% in 400 iterations, for the case ν=2 and r=0.9.
As we can see, the maximum fidelity for the case $\hat{E} = \hat{S}_x$ has decreased with respect to the random one. This is because the distance between $|0\rangle$ and the eigenvectors of $\hat{S}_x$ is larger than the distance between $|0\rangle$ and the eigenvectors of $\hat{E}$ in the random case; therefore, the convergence of the protocol is worse.

Two-qubit case
This case is analogous to the single-qudit case with d=4. First, to assess the general performance, we consider $\hat{E}$ as a random two-qubit operator. Moreover, we use 1000 repetitions and calculate the mean fidelity $\mathcal{F}^{(k)}_j$ and the mean searching range given by equation (33). Figure 5 shows the numerical calculation for r=0.9 and ν={1.5, 2}. It shows again that for small ν the convergence is faster but the maximum value of $\mathcal{F}_j$ is smaller. Furthermore, with ν=2 we need 8500 iterations for the four approximate eigenvectors to converge. With ν=1.5, we only need 6000 iterations. Nevertheless, ν=2 yields the higher fidelities. Also, we can see from the evolution of $\mathcal{W}^{(k)}$ that the number of iterations needed for the convergence is smaller each time that the algorithm starts again to approximate the next eigenvector, that is, $N_0 > N_1 > N_2$. Finally, we consider as a special case $\hat{E} = \hat{B}$, where $\hat{B}$ is the operator given by equation (34), whose eigenvectors are the maximally-entangled Bell states. Figure 6 shows the performance of our protocol for this case. We can see that we obtain high fidelities ($\mathcal{F}_j > 0.99$) with only 1000 iterations to approximate the four eigenvectors. We obtain this performance because our algorithm is sensitive to the number of product states involved in each subspace (the dimension of the subspace) and not to the total dimension of the operator $\hat{E}$. In this case, the operator $\hat{B}$ is block-diagonal, where one block acts on the subspace $\{|00\rangle, |11\rangle\}$ and the other on $\{|01\rangle, |10\rangle\}$. This implies that the present case is similar to two independent single-qubit cases. In figure 6, we can see that from k=1 to k=500 we approximate the eigenstates of the first block, that is $|\phi^{\pm}\rangle$, at the same time, and from k=501 to k=1000 we approximate the eigenstates of the second block, $|\psi^{\pm}\rangle$, where both cases have a performance similar to the single-qubit case.
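The block-diagonal structure invoked above can be verified directly. The sketch below builds an operator with the four Bell states as eigenvectors and checks that it does not couple the subspaces $\{|00\rangle, |11\rangle\}$ and $\{|01\rangle, |10\rangle\}$. The eigenvalues are arbitrary illustrative numbers (hypothetical), since any choice gives the same block structure.

```python
import numpy as np

s = 1 / np.sqrt(2)
# Basis order: |00>, |01>, |10>, |11>
bell = {
    "phi+": s * np.array([1, 0, 0, 1]),
    "phi-": s * np.array([1, 0, 0, -1]),
    "psi+": s * np.array([0, 1, 1, 0]),
    "psi-": s * np.array([0, 1, -1, 0]),
}
# Hypothetical eigenvalues, chosen only for illustration.
eigenvalues = {"phi+": 0.3, "phi-": -1.2, "psi+": 0.8, "psi-": 2.0}

# Spectral decomposition: B = sum_k lambda_k |k><k|
B = sum(eigenvalues[k] * np.outer(v, v.conj()) for k, v in bell.items())

block_a = [0, 3]  # {|00>, |11>}
block_b = [1, 2]  # {|01>, |10>}
coupling = B[np.ix_(block_a, block_b)]
print(np.allclose(coupling, 0))  # True
```

Each 2×2 block can therefore be treated like an independent single-qubit problem, in line with the two-stage convergence seen in figure 6.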

Conclusions
We propose and analyze an approximate quantum eigensolver based on RL with minimal resources. This proposal can be classified as a hybrid classical-quantum algorithm, in which a classical optimization algorithm modifies a quantum system to improve a quantum task, using a feedback loop combined with partially random unitary gates. This is in contrast with other hybrid algorithms that measure fidelities or some expectation value in each step. Therefore, our proposal is advantageous with respect to the usual hybrid algorithms, in the sense that our protocol needs minimal storage, saving only the last step of the algorithm, and employs just one single-shot measurement per iteration, instead of fidelity or expectation-value measurements, which decreases the effect of noise sources. Moreover, our protocol uses pseudo-random two-level rotations, so that it is not necessary to implement high-fidelity operations, because the randomness of the algorithm absorbs the errors of the gates. For this reason, our algorithm would be experimentally feasible in almost any current quantum platform. Additionally, we validated our proposal with numerical calculations for four different choices of the operator $\hat{E}$: a random single-qubit operator, the $\hat{S}_x$ operator, a random two-qubit operator, and the $\hat{B}$ operator defined by equation (34). As a general rule, our algorithm reaches higher fidelities for the approximate eigenvectors for large values of ν and r, but the convergence in this case is slower. This is related to the balance between exploration and exploitation typical of RL algorithms. Moreover, our algorithm is sensitive to the size of the different subspaces spanned by product states and not to the size of the total space of the operator $\hat{E}$. This is the case shown in figure 6, where the eigenvectors are the maximally-entangled Bell states. We point out that, in order to improve the performance of the protocol in future extensions, it could be interesting to study dynamical reward rates (r) and a dynamical parameter ν.
Finally, due to its simplicity, the minimal resources it employs, and the fact that it needs only a basic classical processor (command center) capable of performing pseudo-random rotations, our protocol can be useful for the development of near-future semi-autonomous quantum devices, which will have to make decisions with incomplete information obtained by interaction with the external environment.

Acknowledgments
We acknowledge support from Financiamiento Basal para Centros Científicos y Tecnológicos de Excelencia

Data availability statement
The data that support the findings of this study are openly available at https://github.com/PanchoAlbarran/EigenSolver.