Speeding-up the decision making of a learning agent using an ion trap quantum processor

We report a proof-of-principle experimental demonstration of the quantum speed-up for learning agents utilizing a small-scale quantum information processor based on radiofrequency-driven trapped ions. The decision-making process of a quantum learning agent within the projective simulation paradigm for machine learning is implemented in a system of two qubits. The latter are realized using hyperfine states of two frequency-addressed atomic ions exposed to a static magnetic field gradient. We show that the deliberation time of this quantum learning agent is quadratically improved with respect to comparable classical learning agents. The performance of this quantum-enhanced learning agent highlights the potential of scalable quantum processors taking advantage of machine learning.


INTRODUCTION
The past decade has seen the parallel advance of two research areas, quantum computation [1] and artificial intelligence [2], from abstract theory to practical applications and commercial use. Quantum computers, operating on the basis of information coherently encoded in superpositions of states that could be considered classical bit values, hold the promise of exploiting quantum advantages to outperform classical algorithms, e.g., for searching databases [3], factoring numbers [4], or even for precise parameter estimation [5,6]. At the same time, artificial intelligence and machine learning have become integral parts of modern automated devices using classical processors [7][8][9][10]. Despite this seemingly simultaneous emergence and promise to shape future technological developments, the overlap between these areas still offers a number of unexplored problems [11]. It is hence of fundamental and practical interest to determine how quantum information processing and autonomously learning machines can mutually benefit from each other.
Within the area of artificial intelligence, a central component of modern applications is the learning paradigm of an agent interacting with an environment [2,12,13], illustrated in Fig. 1 (a), which is usually formalized as so-called reinforcement learning. This entails receiving perceptual input and being able to react to it in different ways. The learning aspect is manifest in the reinforcement of the connections between the inputs and actions, where the correct association is (often implicitly) specified by a reward mechanism, which may be external to the agent. In this very general context, one approach to explore the intersection of quantum computing and artificial intelligence is to equip autonomous learning agents with quantum processors for their deliberation procedure. That is, an agent chooses its reactions to perceptual input by way of quantum algorithms or quantum random walks. The agent's learning speed can then be quantified in terms of the average number of interactions with the environment until targeted behavior (reactions triggering a reward) is reproduced by the agent with a desired efficiency. This learning speed cannot generically be improved by incorporating quantum technologies into the agent's design [14].

* wunderlich@physik.uni-siegen.de
However, a recent model [15] for learning agents based on projective simulation (PS) [13] allows for a generic speed-up in the agent's deliberation time during each individual interaction. This quantum improvement in the reaction speed has been established within the reflecting projective simulation (RPS) variant of PS [15]. There, the desired actions of the agent are chosen according to a specific probability distribution that can be modified during the learning process. This is of particular relevance for adapting to rapidly changing environments [15], as we shall elaborate on in the next section. For this task, the deliberation time of classical RPS agents is proportional to the quantities $1/\delta$ and $1/\epsilon$. These characterize the time needed to generate the specified distribution in the agent's internal memory and the time to sample a suitable (e.g., rewarded rather than unrewarded) action from it, respectively. A quantum RPS (Q-RPS) agent, in contrast, is able to obtain such an action quadratically faster.

FIG. 1.
(a) Learning agents receive perceptual input ("percepts") from and act on the environment. The projective simulation (PS) decision-making process draws from the agent's memory and can be modeled as a random walk in a clip network, which, in turn, is represented by a stochastic matrix $P$. (b) Q-RPS agents enhance the relative probability of (desired) actions (green columns) compared to other clips (grey) that may include undesired actions or percepts (blue) within the stationary distribution of $P$ before sampling, achieving a quadratic speed-up w.r.t. classical RPS agents.
Here, we report on the first proof-of-principle experimental demonstration of a quantum-enhanced reinforcement learning system, complementing recent experimental work in the context of (un)supervised learning [16][17][18]. We implement the deliberation process of an RPS learning agent in a system of two qubits that are encoded in the energy levels of one trapped atomic ion each. Within experimental uncertainties, our results confirm the agent's action output according to the desired distributions and within deliberation times that are quadratically improved with respect to comparable classical agents. This laboratory demonstration of speeding up a learning agent's deliberation process can be seen as the first experiment combining novel concepts from machine learning with the potential of ion trap quantum computers where complete quantum algorithms have been demonstrated [19][20][21][22] and feasible concepts for scaling up [23][24][25] are vigorously pursued.

EXPERIMENTAL IMPLEMENTATION OF RANK-ONE RPS
The proof-of-principle experiment that we report in this paper demonstrates the quantum speed-up of quantum-enhanced learning agents. That is, we are able to empirically confirm both the quadratically improved scaling $O(1/\sqrt{\epsilon})$ for the average number of calls of the diffusion operator before sampling one of the desired actions (see Methods), and the correct output according to the tail of the stationary distribution. Here, $\epsilon$ denotes the initial probability of finding a flagged action within the stationary distribution $\boldsymbol{\alpha} = \{a_i\}$, and the tail is defined as the first $n$ components of $\boldsymbol{\alpha}$. Correct output according to the tail means that $a_j/a_k = b_j/b_k$ for all $j, k \in \{1, \dots, n\}$, where $b_j$ denotes the final probability that the agent obtains the flagged action labeled $j$. Note that the Q-RPS algorithm enhances the overall probability of obtaining a flagged action such that

$$\tilde{\epsilon} = \sum_{i=1}^{n} b_i \approx 1,$$

whilst maintaining the relative probabilities of the flagged actions according to the tail of $\boldsymbol{\alpha}$, as illustrated in Fig. 1 (b). For the implementation we hence need at least a three-dimensional Hilbert space, which we realize in our experiment using two qubits encoded in the energy levels of two trapped ions (see the experimental setup section): two states to represent two different flagged actions (represented in our experiment by $|00\rangle$ and $|01\rangle$), and at least one additional state for all non-flagged actions ($|10\rangle$ and $|11\rangle$ in our experiment). The preparation of the stationary state is implemented by

$$U_P\,|00\rangle = R_1(\theta_1, \pi/2)\, R_2(\theta_2, \pi/2)\,|00\rangle = |\alpha\rangle,$$

where $R_j(\theta, \phi)$ is a single-qubit rotation on qubit $j$, i.e.,

$$R_j(\theta, \phi) = \exp\!\left[-i\,\frac{\theta}{2}\,\big(\cos\phi\, X_j + \sin\phi\, Y_j\big)\right]. \tag{3}$$

Here, $X_j$, $Y_j$, and $Z_j$ denote the usual Pauli operators of qubit $j$. The total probability $\epsilon = a_{00} + a_{01}$ for a flagged action within the stationary distribution is then determined by $\theta_1$ via $\epsilon = \cos^2(\theta_1/2)$, whereas $\theta_2$ determines the relative probabilities of obtaining one of the flagged actions via $a_{00}/\epsilon = \cos^2(\theta_2/2)$.
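The state preparation described above can be checked numerically. The following is a minimal NumPy sketch, not the authors' code; it assumes the rotation convention $R_j(\theta,\phi) = \exp[-i\frac{\theta}{2}(\cos\phi\, X_j + \sin\phi\, Y_j)]$, the phase $\phi = \pi/2$ for both preparation pulses (which gives real, positive amplitudes $\sqrt{a_i}$), and qubit 1 as the left tensor factor. The helper names (`rot`, `angles_from_flagged`, `prepare_alpha`) and the probabilities $a_{00} = 0.03$, $a_{01} = 0.02$ are hypothetical.

```python
import numpy as np

# Pauli matrices and identity for a single qubit.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def rot(theta, phi):
    """R(theta, phi) = exp[-i theta/2 (cos(phi) X + sin(phi) Y)] (assumed convention)."""
    n_sigma = np.cos(phi) * X + np.sin(phi) * Y
    # For a unit rotation axis n: exp(-i a n.sigma) = cos(a) I - i sin(a) n.sigma.
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * n_sigma


def angles_from_flagged(a00, a01):
    """Invert eps = cos^2(theta1/2) and a00/eps = cos^2(theta2/2)."""
    eps = a00 + a01
    theta1 = 2 * np.arccos(np.sqrt(eps))
    theta2 = 2 * np.arccos(np.sqrt(a00 / eps))
    return theta1, theta2


def prepare_alpha(theta1, theta2):
    """Return U_P|00> with U_P = R_1(theta1, pi/2) R_2(theta2, pi/2)."""
    U_P = np.kron(rot(theta1, np.pi / 2), rot(theta2, np.pi / 2))
    ket00 = np.array([1, 0, 0, 0], dtype=complex)  # basis order |00>, |01>, |10>, |11>
    return U_P @ ket00


a00, a01 = 0.03, 0.02  # hypothetical flagged-action probabilities
theta1, theta2 = angles_from_flagged(a00, a01)
alpha = prepare_alpha(theta1, theta2)
probs = np.abs(alpha) ** 2
print(np.round(probs, 4))
```

For these inputs the sketch yields the probabilities 0.03, 0.02, 0.57, and 0.38 for $|00\rangle$, $|01\rangle$, $|10\rangle$, $|11\rangle$. Because $|\alpha\rangle$ is a product state here, the two non-flagged probabilities inherit the same ratio as the flagged ones.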
The reflection over the flagged actions $\mathrm{ref}_A$ is here given by a $Z$ rotation, defined by $R_{j,z}(\theta) = \exp[-i\frac{\theta}{2} Z_j]$, with rotation angle $-\pi$ for the first qubit,

$$\mathrm{ref}_A = R_{1,z}(-\pi) = i\,Z_1,$$

which equals $2\Pi_A - \mathbb{1}$, with $\Pi_A = |00\rangle\langle 00| + |01\rangle\langle 01|$, up to a global phase. The reflection over the stationary distribution can be performed by a combination of single-qubit rotations determined by $\theta_1$ and $\theta_2$ and a CNOT gate,

$$\mathrm{ref}_\alpha = 2|\alpha\rangle\langle\alpha| - \mathbb{1} = U_P\,\big(2|00\rangle\langle 00| - \mathbb{1}\big)\,U_P^\dagger,$$

where the conditional sign flip $2|00\rangle\langle 00| - \mathbb{1}$ is realized, up to a global phase, by a CNOT gate combined with fixed single-qubit rotations. The reflection can hence be understood as two calls to $U_P$ (once in terms of $U_P^\dagger$) supplemented by fixed single-qubit operations and a CNOT gate [26]. The total gate sequence for a single diffusion step (consisting of a reflection over the flagged actions followed by a reflection over the stationary distribution) can hence be decomposed into single-qubit rotations and CNOT gates and is shown in Fig. 2. The speed-up of the rank-one Q-RPS algorithm w.r.t. a classical RPS agent manifests in terms of a quadratically smaller average number of calls to $U_P$ (or, equivalently, to the diffusion operator $D = \mathrm{ref}_\alpha\,\mathrm{ref}_A$) until a flagged action is sampled. Since the final probability of obtaining a desired action is $\tilde{\epsilon} \equiv \sum_{i=1}^{n} b_i$, we require $1/\tilde{\epsilon}$ samples on average, each of which is preceded by the initial preparation of $|\alpha\rangle$ and $k$ diffusion steps. The average number of uses of $U_P$ to sample correctly is hence

$$C = \frac{2k(\epsilon) + 1}{\tilde{\epsilon}},$$

which we refer to as 'cost' in this paper. In the following, it is this functional relationship between $C$ and $\epsilon$ that we put to the test, along with the predicted ratio $a_{00}/a_{01}$ of occurrence of the two flagged actions.
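A state-vector sketch of the rank-one diffusion step can be used to check that $k$ applications of $D = \mathrm{ref}_\alpha\,\mathrm{ref}_A$ amplify the flagged probability while keeping $b_{00}/b_{01} = a_{00}/a_{01}$, and to evaluate the cost $C$. It applies the ideal operators directly rather than the experimental pulse sequence, and it assumes the same conventions as the preparation formulas ($\phi = \pi/2$ for $U_P$, qubit 1 as the left tensor factor). The helper names and the choice $\epsilon = (\pi/14)^2$, for which $\pi/(4\sqrt{\epsilon}) - 1/2 = 3$ exactly, are hypothetical.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def rot(theta, phi):
    """Assumed convention R(theta, phi) = exp[-i theta/2 (cos(phi) X + sin(phi) Y)]."""
    axis = np.cos(phi) * X + np.sin(phi) * Y
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * axis


def rot_z(theta):
    """R_z(theta) = exp[-i theta/2 Z]."""
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])


def run_rank_one_qrps(a00, a01, k):
    """Apply k diffusion steps D = ref_alpha ref_A to |alpha>; return outcome probabilities."""
    eps = a00 + a01
    theta1 = 2 * np.arccos(np.sqrt(eps))
    theta2 = 2 * np.arccos(np.sqrt(a00 / eps))
    U_P = np.kron(rot(theta1, np.pi / 2), rot(theta2, np.pi / 2))
    ket00 = np.array([1, 0, 0, 0], dtype=complex)
    alpha = U_P @ ket00
    # ref_A: Z rotation by -pi on qubit 1, i.e. i*Z_1 = 2*Pi_A - 1 up to a global phase.
    ref_A = np.kron(rot_z(-np.pi), I2)
    # ref_alpha = U_P (2|00><00| - 1) U_P^dagger: two calls to U_P.
    ref_alpha = U_P @ (2 * np.outer(ket00, ket00.conj()) - np.eye(4)) @ U_P.conj().T
    D = ref_alpha @ ref_A
    psi = alpha
    for _ in range(k):
        psi = D @ psi
    return np.abs(psi) ** 2


eps = (np.pi / 14) ** 2                       # hypothetical: k(eps) = 3 without rounding
k = int(round(np.pi / (4 * np.sqrt(eps)) - 0.5))
b = run_rank_one_qrps(eps / 2, eps / 2, k)    # input ratio r_i = a00/a01 = 1
eps_tilde = b[0] + b[1]
cost = (2 * k + 1) / eps_tilde                # average number of uses of U_P
print(k, round(eps_tilde, 4), round(cost, 2), round(1 / eps, 2))
```

In this ideal sketch the quantum cost is about 7 uses of $U_P$, compared with an average of $1/\epsilon \approx 19.9$ samples for a classical agent, and the output ratio $b_{00}/b_{01}$ stays equal to the input ratio of 1.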

The experimental setup
Two $^{171}$Yb$^{+}$ ions are confined in a linear Paul trap with axial and radial trap frequencies of 2π × 117 kHz and 2π × 590 kHz, respectively. After Doppler cooling, the two ions form a linear Coulomb crystal, which is exposed to a static magnetic field gradient of 19 T/m, generated by a pair of permanent magnets. The ion-ion spacing in this configuration is approximately 10 µm. Magnetic gradient induced coupling (MAGIC) between ions results in an adjustable qubit interaction mediated by the common vibrational modes of the Coulomb crystal [27]. In addition, qubit resonances are individually shifted as a result of this gradient and become position dependent. This makes the qubits distinguishable and addressable by their frequency of resonant excitation. The addressing frequency separation for this two-ion system is about 3.7 MHz. All coherent operations are performed using radio frequency (RF) radiation near 12.6 GHz, matching the respective qubit resonances [28]. A more detailed description of the experimental setup can be found in Refs. [21,27,29].
The qubits are encoded in the hyperfine manifold of each ion's ground state, representing an effective spin-1/2 system. The qubit states $|0\rangle$ and $|1\rangle$ are represented by the energy levels $|^2S_{1/2}, F=0\rangle$ and $|^2S_{1/2}, F=1, m_F=+1\rangle$, respectively. The ions are Doppler cooled on the resonance $|^2S_{1/2}, F=1\rangle \leftrightarrow |^2P_{1/2}, F=0\rangle$ with laser light near 369 nm. Optical pumping into long-lived metastable states is prevented using laser light near 935 nm and 638 nm. The vibrational excitation of the Doppler-cooled ions is further reduced by employing RF sideband cooling for both the center-of-mass mode and the stretch mode. This leads to a mean vibrational quantum number of $\langle n \rangle \leq 5$ for both modes. The ions are then initialized in the qubit state $|0\rangle$ by state-selective optical pumping with a 2.1 GHz red-shifted Doppler-cooling laser on the $|^2S_{1/2}, F=1\rangle \leftrightarrow |^2P_{1/2}, F=1\rangle$ resonance. The desired qubit states are prepared by applying an RF pulse resulting in a coherent qubit rotation with precisely defined rotation angle and phase (Eq. (3)). The required number of diffusion steps is then applied to both qubits, using appropriate single-qubit rotations and a two-qubit ZZ interaction given by

$$U_{ZZ}(\theta) = \exp\!\left[-i\,\frac{\theta}{2}\, Z_1 Z_2\right],$$

which is directly realizable with MAGIC [27]. A CNOT gate ($U_{\mathrm{CNOT}}$), with qubit 1 as control and qubit 2 as target, can then be performed, e.g., via

$$U_{\mathrm{CNOT}} \simeq R_2(\tfrac{\pi}{2}, \tfrac{\pi}{2})\, R_{1,z}(-\tfrac{\pi}{2})\, R_{2,z}(-\tfrac{\pi}{2})\, U_{ZZ}(\tfrac{\pi}{2})\, R_2(-\tfrac{\pi}{2}, \tfrac{\pi}{2}),$$

where $\simeq$ denotes equality up to a global phase. The number of single-qubit gates is reduced by merging adjacent single-qubit rotations from $\mathrm{ref}_A$ and $\mathrm{ref}_\alpha$ (see Fig. 2), which simplifies the algorithm to the sequence shown in Fig. 5 of Methods. During the evolution time of 4.24 ms for each diffusion step, both qubits are protected from decoherence by applying universally robust (UR) dynamical decoupling (DD) pulses [30]. The complete pulse sequence for the experiment reported here can be found in Fig. 5 of Methods.
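The CNOT construction from the ZZ interaction can be verified numerically. The sketch below assumes the conventions $R_j(\theta,\phi) = \exp[-i\frac{\theta}{2}(\cos\phi\, X_j + \sin\phi\, Y_j)]$, $R_{j,z}(\theta) = \exp[-i\frac{\theta}{2}Z_j]$, and $U_{ZZ}(\theta) = \exp[-i\frac{\theta}{2}Z_1Z_2]$, with qubit 1 as the left tensor factor, and checks that the product of gates equals $U_{\mathrm{CNOT}}$ up to a global phase. It also checks that the J-coupling of $2\pi \times 59$ Hz and the evolution time of 4.24 ms quoted for Fig. 5 give $J\tau \approx \pi/2$, under the assumption (not stated explicitly in the text) that the conditional evolution realizes $U_{ZZ}(J\tau)$.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def rot(theta, phi):
    """Assumed convention R(theta, phi) = exp[-i theta/2 (cos(phi) X + sin(phi) Y)]."""
    axis = np.cos(phi) * X + np.sin(phi) * Y
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * axis


def rot_z(theta):
    """R_z(theta) = exp[-i theta/2 Z]."""
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])


def u_zz(theta):
    """U_ZZ(theta) = exp[-i theta/2 Z1 Z2]; Z1 Z2 is diagonal with entries (1, -1, -1, 1)."""
    return np.diag(np.exp(-1j * theta / 2 * np.array([1, -1, -1, 1])))


def on1(gate):
    return np.kron(gate, I2)  # single-qubit gate on qubit 1 (control)


def on2(gate):
    return np.kron(I2, gate)  # single-qubit gate on qubit 2 (target)


CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

# Rightmost factor acts first: basis change on the target, ZZ plus Z rotations (= CZ), undo.
candidate = (on2(rot(np.pi / 2, np.pi / 2)) @ on1(rot_z(-np.pi / 2)) @ on2(rot_z(-np.pi / 2))
             @ u_zz(np.pi / 2) @ on2(rot(-np.pi / 2, np.pi / 2)))
phase = candidate[0, 0]
print(np.allclose(candidate / phase, CNOT), np.round(phase, 4))

J = 2 * np.pi * 59.0  # J-coupling in rad/s
tau = 4.24e-3         # conditional evolution time in s
print(J * tau, np.pi / 2)
```

The middle three factors form a controlled-Z gate (times the phase $e^{i\pi/4}$), and the $Y$ rotations on qubit 2 turn it into a CNOT. Other decompositions with the same effect are possible; this one is only an illustration.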

Results
As discussed above, our goal is to test the two characteristic features of rank-one Q-RPS: (i) the scaling of the average cost $C$ with $\epsilon$, and (ii) the sampling ratio for the different flagged actions. For the former, we expect a scaling of $1/\sqrt{\epsilon}$, while for the latter we expect the ratio of the numbers of occurrences of the two actions to be maintained with respect to the relative probabilities given by the stationary distribution. Therefore, our first set of measurements studies the behavior of the cost $C$ as a function of the total initial probability $\epsilon$. The second set of measurements studies the behavior of the output probability ratio $r_f = b_{00}/b_{01}$ as a function of the input probability ratio $r_i = a_{00}/a_{01}$.

FIG. 3.
After the preparation of $|\alpha\rangle$, $k$ diffusion steps are applied before an action is sampled. This procedure is repeated until a flagged action is obtained, accumulating a certain cost $C$, whose average is shown on the vertical axis. Measurements are performed for different values of $\epsilon$ corresponding to $k = 1$ to $k = 7$ diffusion steps. The dashed black line and the solid blue line represent the behavior expected for ideal Q-RPS ($1/\sqrt{\epsilon}$) and ideal classical RPS ($1/\epsilon$), respectively. The fit to the experimental data confirms that the scaling follows a $1/\sqrt{\epsilon}$ behavior, and thus is consistent with Q-RPS. The data show that the experimental Q-RPS outperforms the classical RPS within the range of $\epsilon$ chosen in the experiment. The error bars on the x-axis represent the uncertainties in preparing the quantum states. The error bars on the y-axis represent the statistical errors.
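The scaling test can be illustrated with the ideal model. The sketch below computes, for $k = 1, \dots, 7$, the ideal cost $C = (2k+1)/\tilde{\epsilon}$ using the standard amplitude-amplification result $\tilde{\epsilon} = \sin^2[(2k+1)\arcsin\sqrt{\epsilon}]$, fits a power law in log-log space, and adds binomial shot noise for 1600 repetitions per point. The choice $\epsilon_k = [\pi/(4k+2)]^2$, which makes $\pi/(4\sqrt{\epsilon}) - 1/2 = k$ exactly, is a hypothetical stand-in for the experimental values, and the least-squares fit is not necessarily the authors' fitting procedure.

```python
import numpy as np


def ideal_success(eps, k):
    """Ideal probability of a flagged outcome after k amplitude-amplification steps."""
    return np.sin((2 * k + 1) * np.arcsin(np.sqrt(eps))) ** 2


ks = np.arange(1, 8)
eps = (np.pi / (4 * ks + 2)) ** 2             # hypothetical epsilon for each k
eps_tilde = ideal_success(eps, ks)
cost_quantum = (2 * ks + 1) / eps_tilde       # C = (2k + 1) / eps_tilde
cost_classical = 1 / eps                      # classical average number of samples

# Fit log C = slope * log(eps) + intercept; ideal Q-RPS predicts slope -1/2.
slope, intercept = np.polyfit(np.log(eps), np.log(cost_quantum), 1)

# Shot-noise-limited estimate of eps_tilde from 1600 repetitions per point.
rng = np.random.default_rng(seed=1)
hits = rng.binomial(1600, eps_tilde)
cost_estimate = (2 * ks + 1) / (hits / 1600)

print(f"fitted exponent: {slope:.3f}")
print(np.round(cost_quantum, 2))
print(np.round(cost_classical, 2))
```

The fitted exponent comes out close to $-1/2$, and the ideal quantum cost stays below the classical $1/\epsilon$ for every $k$ in this range, including $k = 1$.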
For the former, a series of measurements is performed for different values of $\epsilon$ corresponding to $k = 1$ to $k = 7$ diffusion steps after the initial state preparation. To obtain the cost $C = [2k(\epsilon) + 1]/\tilde{\epsilon}$, where $\tilde{\epsilon} = b_{00} + b_{01}$, we measure the probabilities $b_{00}$ and $b_{01}$ after $k$ diffusion steps, repeating the experiment 1600 times for fixed $\epsilon$. The average cost is then plotted against $\epsilon$ as shown in Fig. 3. The experimental data show that the cost scales with $\epsilon$ as $1/\sqrt{\epsilon}$, as desired. This is in good agreement with the behavior expected for the ideal Q-RPS algorithm. In the range of chosen probabilities $\epsilon$, the experimental result of Q-RPS outperforms the classical RPS, as shown in Fig. 3. Therefore, we demonstrate that the experimental efficiency is already good enough not only to obtain improved scaling, but also to outperform the classical algorithm, despite the offset in the cost function and the finite precision of the quantum algorithm. The deviation from the ideal behavior is attributed to a small detuning of the RF pulses implementing coherent operations, as we discuss in the Supplementary Materials. For the second set of measurements, we select a few calculated probabilities $a_{00}$ and $a_{01}$ in order to obtain different values of the input ratio $r_i = a_{00}/a_{01}$ between 0 and 2, whilst keeping $k(\epsilon)$ in a range between $k = 1$ and $k = 3$. For these probabilities $a_{00}$ and $a_{01}$, the corresponding rotation angles $\theta_1$ and $\theta_2$ of the RF pulses used for preparation are extracted using Eq. (4) and Eq. (5). We then perform the Q-RPS algorithm for the specific choices of $k$ and repeat it 1600 times to estimate the probabilities $b_{00}$ and $b_{01}$. We finally obtain the output ratio $r_f = b_{00}/b_{01}$, which is plotted against the input ratio in Fig. 4. The experimental data follow a straight line with an offset from the behavior $r_f/r_i = 1$ expected for an ideal Q-RPS agent.
The slopes of the two fitted linear functions agree within their respective errors, showing that the deviation of the output ratio from the ideal result is independent of the number of diffusion steps. In addition, this indicates that this deviation is not caused by the quantum algorithm itself, but by the initial state preparation and/or by the final measurement process, where such a deviation can be caused by an asymmetry in the detection fidelity. Indeed, the observed deviation is well explained by a typical asymmetry in the detection fidelity of 3% as encountered in the measurements presented here. This implies reliability of the quantum algorithm also for a larger number of diffusion steps. A detailed discussion of experimental sources of error is given in the Supplementary Materials.

CONCLUSION
We have investigated a quantum-enhanced deliberation process of a learning agent implemented in an ion trap quantum processor. Our approach is centered on the projective simulation [13] model for reinforcement learning. Within this paradigm, the decision-making procedure is cast as a stochastic diffusion process, that is, a (classical or quantum) random walk in a representation of the agent's memory.
The classical PS framework can be used to solve standard textbook problems in reinforcement learning [31][32][33], and has recently been applied in advanced robotics [34], adaptive quantum computation [35], as well as in the machine-generated design of quantum experiments [36]. We have focused on reflecting projective simulation [15], an advanced variant of the PS model based on "mixing" (see Methods), where the deliberation process allows for a quantum speed-up of Q-RPS agents w.r.t. their classical counterparts. In particular, we have considered the interesting special case of rank-one Q-RPS. This provides the advantage of the speed-up offered by the mixing-based approach, but is also in one-to-one correspondence with the hitting-based basic PS using two-layered networks, which has been applied in classical task environments [31][32][33][34][35][36].
In a proof-of-principle experimental demonstration, we verify that the deliberation process of the quantum learning agent is quadratically faster compared to that of a classical learning agent. The experimental uncertainties in the reported results, which are in excellent agreement with a detailed model, do not interfere with this genuine quantum advantage in the agent's deliberation time. We achieve results for the cost $C$ for up to 7 diffusion steps, corresponding to an initial probability $\epsilon = 0.01$ to choose a flagged action. The systematic variation of the ratio $r_i$ between the input probabilities $a_{00}$ and $a_{01}$ for flagged actions, together with the measurement of the ratio $r_f$ between the learning agent's output probabilities $b_{00}$ and $b_{01}$ as a function of $r_i$, shows that the quantum algorithm is reliable independent of the number of diffusion steps.
This experiment highlights the potential of a quantum computer in the field of quantum-enhanced learning and artificial intelligence. A practical advantage, of course, will become evident once larger percept spaces and general rank-$N$ Q-RPS are employed. Such extensions are, from the theory side, unproblematic given that the modularized nature of the algorithm makes it scalable. An experimental realization of such large-scale quantum-enhanced learning will be feasible with the implementation of scalable quantum computer architectures. Meanwhile, all essential elements of Q-RPS have been successfully demonstrated in the proof-of-principle experiment reported here.

METHODS
A generic picture for modeling autonomous learning scenarios is that of repeated rounds of interaction between an agent and its environment. In each round the agent receives perceptual input ("percepts") from the environment, processes the input using an internal deliberation mechanism, and finally acts upon (or reacts to) the environment, i.e., performs an "action" [13]. Depending on the reward system in place and the given percept, such actions may be rewarded or not, which leads the agent to update its deliberation process; that is, the agent learns.
Within the projective simulation (PS) [13] paradigm for learning agents, the decision-making procedure is cast as a (physically motivated) stochastic diffusion process within an episodic compositional memory (ECM), i.e., a (classical or quantum) random walk in a representation of the agent's memory containing the interaction history. One may think of the ECM as a network of clips that can correspond to remembered percepts, remembered actions, or combinations thereof. Mathematically, this clip network is described by a stochastic matrix (defining a Markov chain) $P = (p_{ij})$, where the $p_{ij}$ with $0 \leq p_{ij} \leq 1$ and $\sum_i p_{ij} = 1$ represent transition probabilities between the clips labeled $i$ and $j$ with $i, j \in \{1, 2, \dots, N\}$. The learning process is implemented through an update of the $N \times N$ matrix $P$, which, in turn, serves as a basis for the random walks in the clip network. Different types of PS agents vary in their deliberation mechanisms, update rules, and other specifications.
In particular, one may distinguish between PS agents based on "hitting" and "mixing". For the former type of PS agent, a random walk could, for instance, start from a clip $c_1$ called by the initially received percept. The first "step" of the random walk then corresponds to a transition to clips $c_j$ with probabilities $p_{1j}$. The agent then samples from the resulting distribution $\{p_{1j}\}_j$. If such a sample provides an action, e.g., if the clip $c_k$ is "hit", this action is selected as output; otherwise, the walk continues on from the clip $c_k$. An advanced variant of the PS model based on "mixing" is reflecting projective simulation (RPS) [15]. There, the Markov chain is first "mixed", i.e., an appropriate number of steps is applied until the stationary distribution is (approximately) attained, before a sample is taken. This, or other implementations of random walks in the clip network, provide the basis for the PS framework for learning. The classical PS framework can be used to solve standard textbook problems in reinforcement learning [31][32][33], and has recently been applied in advanced robotics [34], adaptive quantum computation [35], as well as in the machine-generated design of quantum experiments [36].
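The hitting-based deliberation can be sketched as a short random walk. In this hypothetical example (not taken from the paper), clip 0 is the percept clip, clip 1 is an intermediate clip, and clips 2 and 3 are actions; each row of `P` lists the probabilities of hopping from that clip to every clip, matching the $p_{1j}$ notation used for the first step. The function name `hitting_walk` and the cap `max_steps` are illustrative choices.

```python
import random


def hitting_walk(P, start, actions, rng, max_steps=1000):
    """Hop through the clip network from `start` until an action clip is hit.

    P[i][j] is the probability of moving from clip i to clip j (row convention).
    Returns the action clip that was hit, or None if max_steps is exceeded.
    """
    clip = start
    for _ in range(max_steps):
        clip = rng.choices(range(len(P)), weights=P[clip])[0]
        if clip in actions:
            return clip
    return None


# Hypothetical 4-clip network: 0 = percept, 1 = intermediate clip, 2 and 3 = actions.
P = [
    [0.0, 0.5, 0.3, 0.2],
    [0.2, 0.0, 0.4, 0.4],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
rng = random.Random(42)
outcomes = [hitting_walk(P, start=0, actions={2, 3}, rng=rng) for _ in range(10000)]
freq_2 = outcomes.count(2) / len(outcomes)
print(f"fraction of walks ending in action 2: {freq_2:.3f}")
```

For this network the exact hitting probability of action 2 from the percept clip is 5/9, obtained by solving $h_0 = 0.3 + 0.5\,h_1$ and $h_1 = 0.4 + 0.2\,h_0$, so the printed fraction should be close to 0.556.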
Here, we focus on RPS agents, where the deliberation process based on mixing allows for a speed-up of Q-RPS agents w.r.t. their classical counterparts [15]. In contrast to basic hitting-based PS agents, the clip network of RPS agents is structured into several sub-networks, one for each percept clip, and each with its own stochastic matrix $P$. In addition to being stochastic, these matrices specify Markov chains that are ergodic [15], which ensures that the Markov chain in question has a unique stationary distribution, i.e., a unique eigenvector $\boldsymbol{\alpha}$ with eigenvalue $+1$, $P\boldsymbol{\alpha} = \boldsymbol{\alpha}$. Starting from any initial state, continued application of $P$ (or its equivalent in the quantized version) mixes the Markov chain, leaving the system in the stationary state.
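For an ergodic chain, the stationary distribution can be obtained either as the eigenvector of $P$ with eigenvalue $+1$ or by mixing, i.e., by repeated application of $P$ from an arbitrary start. The sketch below illustrates both routes for a hypothetical three-clip matrix whose columns sum to one ($\sum_i p_{ij} = 1$), so that probability vectors evolve as $\boldsymbol{\alpha} \mapsto P\boldsymbol{\alpha}$.

```python
import numpy as np

# Hypothetical 3-clip ergodic chain. Columns sum to one (sum_i p_ij = 1),
# so a probability vector evolves as alpha -> P @ alpha.
P = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.6, 0.3],
              [0.2, 0.2, 0.4]])
assert np.allclose(P.sum(axis=0), 1.0)

# Stationary distribution: eigenvector of P with eigenvalue +1, normalized to sum 1.
eigvals, eigvecs = np.linalg.eig(P)
idx = np.argmin(np.abs(eigvals - 1.0))
alpha = np.real(eigvecs[:, idx])
alpha = alpha / alpha.sum()

# Mixing: repeated application of P from an arbitrary initial distribution.
p = np.array([1.0, 0.0, 0.0])
for _ in range(50):
    p = P @ p

print(np.round(alpha, 4), np.round(p, 4))
```

Both routes give $\boldsymbol{\alpha} = (9/28, 12/28, 7/28) \approx (0.3214, 0.4286, 0.25)$ for this matrix; its other eigenvalues, 0.3 and 0.2, set how quickly the mixing converges.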
As part of their deliberation process, RPS agents generate stationary distributions over their clip space, as specified by $P$, which is updated as the agent learns. These distributions have support over the whole sub-network clip space, and additional specifiers, called flags, are used to ensure an output from a desired subset of clips. For instance, standard agents are presumed to output actions only, in which case only the actions are "flagged". This ensures that an action will be output, while maintaining the relative probabilities of the actions.
The same mechanism of flags, which can be thought of as short-term memory, is also used to eliminate iterated attempts of actions which did not yield rewards in recent time-steps. This leads to a more efficient exploration of correct behavior.
In the quantum version of RPS, each clip $c_i$ is represented by a basis vector $|i\rangle$ in a Hilbert space $\mathcal{H}$. The mixing process is then realized by a diffusion process on two copies of the original Hilbert space. On the doubled space $\mathcal{H} \otimes \mathcal{H}$, a unitary operator $W(P)$ (called the Szegedy walk operator [37,38]) and a quantum state $|\alpha\rangle$ with $W(P)|\alpha\rangle = |\alpha\rangle$ take the roles of the classical objects $P$ and $\boldsymbol{\alpha}$. Both $W(P)$ and $|\alpha\rangle$ depend on a set of unitaries $U_i$ on $\mathcal{H}$ that act as $U_i|0\rangle = \sum_j \sqrt{p_{ij}}\,|j\rangle$ for some fixed reference state $|0\rangle \in \mathcal{H}$. The more intricate construction of $W(P)$ is given in detail in [26]. The feature of the quantum implementation of RPS that is crucial for us here is an amplitude amplification similar to Grover's algorithm [3], which incorporates the mixing of the Markov chain and allows outputting flagged actions after an average of $O(1/\sqrt{\epsilon})$ calls to $W(P)$, where $\epsilon$ is the probability of sampling an action from the stationary distribution. The algorithm achieving this is structured as follows. After an initialization stage where $|\alpha\rangle$ is prepared, a number of diffusion steps are carried out. Each such step consists of two parts. The first part is a reflection over the states corresponding to actions in the first copy of $\mathcal{H}$, i.e., an operation

$$\mathrm{ref}_A = \Big(2\sum_{i=1}^{n}|i\rangle\langle i| - \mathbb{1}\Big) \otimes \mathbb{1}, \tag{10}$$

where $A = \mathrm{span}\{|1\rangle, \dots, |n\rangle\}$ denotes the subspace of the clip network corresponding to actions. In the second part, an approximate reflection over the state $|\alpha\rangle$, the mixing, is carried out, i.e., an operation designed to approximate $\mathrm{ref}_\alpha = 2|\alpha\rangle\langle\alpha| - \mathbb{1}$ [15]. This second step involves $O(1/\sqrt{\delta})$ calls to $W(P)$. The two-part diffusion steps are repeated $O(1/\sqrt{\epsilon})$ times before a sample is taken from the resulting state by measuring in the basis $\{|i\rangle\}_{i=1,\dots,N}$. If an action is sampled, the algorithm concludes and that action is chosen as output. Otherwise, all steps are repeated.
Since the algorithm amplifies the probability of sampling an action (almost) to unity, carrying out the deliberation procedure with the help of such a Szegedy walk hence requires an average of $O(1/\sqrt{\delta\epsilon})$ calls to $W(P)$. In comparison, a classical RPS agent would require an average of $O(1/\delta)$ applications of $P$ to mix the Markov chain, and an average of $O(1/\epsilon)$ samples to find an action. Q-RPS agents can hence achieve a quadratic speed-up in their reaction time.
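To make the comparison concrete for the sampling part alone (rank-one case, so $\delta = 1$ and the cost of mixing is ignored), the following Monte Carlo sketch draws the number of samples a classical agent needs, which is geometrically distributed with mean $1/\epsilon$, and the number of $U_P$ uses a quantum agent needs, namely $2k+1$ per round with the ideal per-round success probability $\sin^2[(2k+1)\arcsin\sqrt{\epsilon}]$. The value $\epsilon = 0.01$, the seed, and the number of trials are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
eps = 0.01          # hypothetical probability of a flagged action in the stationary state
trials = 20000

# Classical agent: draw samples until a flagged action appears (geometric, mean 1/eps).
classical = rng.geometric(eps, size=trials)

# Quantum agent: each round costs 2k + 1 uses of U_P and succeeds with eps_tilde.
k = int(round(np.pi / (4 * np.sqrt(eps)) - 0.5))
eps_tilde = np.sin((2 * k + 1) * np.arcsin(np.sqrt(eps))) ** 2
quantum_rounds = rng.geometric(eps_tilde, size=trials)
quantum_cost = (2 * k + 1) * quantum_rounds

print(f"k = {k}, classical mean = {classical.mean():.1f}, quantum mean = {quantum_cost.mean():.1f}")
```

With $\epsilon = 0.01$ the classical mean is close to 100 samples, while the quantum agent needs about 15 uses of $U_P$ ($k = 7$), in line with the $O(1/\epsilon)$ versus $O(1/\sqrt{\epsilon})$ scaling.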
Here, it should be noted that, its elegance notwithstanding, the construction of the approximate reflection for general RPS networks is extremely demanding for current quantum computational architectures. Most notably, this is due to the requirement of two copies of $\mathcal{H}$, on which frequently updated coherent conditional operations need to be carried out [26,39,40]. However, as we shall explain now, these requirements can be circumvented for the interesting class of rank-one Markov chains. In this special case, the entire Markov chain $P$ can be represented on one copy of $\mathcal{H}$ by a single unitary $U_P = U_i\ \forall i$, since all columns of $P$ are identical. Conceptually, this simplification corresponds to a situation where each percept-specific clip network contains only actions and the Markov chain is mixed in one step ($\delta = 1$). In such a case one uses flags to mark desired actions. Interestingly, these minor alterations also allow one to establish a one-to-one correspondence with the hitting-based basic PS using two-layered networks, which has been applied in classical task environments [31][32][33][34][35][36].
Let us now discuss how the algorithm above is modified for the rank-one case with the flagging mechanism in place. First, we restrict $A$ to be the subspace of the flagged actions only, assuming that there are $n \ll N$ of these, and we denote the corresponding probabilities within the stationary distribution by $a_1, \dots, a_n$. In the initialization stage, the state $|\alpha\rangle = \sum_{i=1}^{N}\sqrt{a_i}\,|i\rangle$ is prepared. Then, an optimal number of $k$ diffusion steps [3] is carried out, where

$$k(\epsilon) = \mathrm{round}\!\left(\frac{\pi}{4\sqrt{\epsilon}} - \frac{1}{2}\right) \tag{11}$$

and $\epsilon = \sum_{i=1}^{n} a_i$ is the probability to sample a flagged action from the stationary distribution. Within the diffusion steps, the reflections over all actions of Eq. (10) are replaced by reflections over flagged actions, i.e.,

$$\mathrm{ref}_A = 2\sum_{i=1}^{n}|i\rangle\langle i| - \mathbb{1}.$$

In the rank-one case, the approximate reflection over the stationary distribution becomes an exact reflection $\mathrm{ref}_\alpha$ over the state $|\alpha\rangle$ and can be carried out on one copy of $\mathcal{H}$ [26]. After the diffusion steps, a sample is taken and the agent checks if the obtained action is marked with a flag. If this is the case, the action is chosen as output; otherwise, the algorithm starts anew. While a classical RPS agent requires an average of $O(1/\epsilon)$ samples until obtaining a flagged action, this number reduces to $O(1/\sqrt{\epsilon})$ for Q-RPS agents. This quantum advantage is particularly pronounced when the overall number of actions is very large compared to $n$ and the environment is unfamiliar to the agent or has recently changed its rewarding pattern, in which case $\epsilon$ may be very small. Given some time, both agents learn to associate rewarded actions with a given percept, suitably add or remove flags, and adapt $P$ (and by extension $\boldsymbol{\alpha}$).
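The rank-one algorithm with flags can be written compactly for a general number of clips $N$. The sketch below (not the authors' implementation) stores $|\alpha\rangle$ as a real vector, applies $\mathrm{ref}_A = 2\sum_{i \le n}|i\rangle\langle i| - \mathbb{1}$ and $\mathrm{ref}_\alpha = 2|\alpha\rangle\langle\alpha| - \mathbb{1}$ for the $k(\epsilon)$ steps of Eq. (11), and returns the ideal output distribution. The stationary distribution used here, with $N = 64$ clips and $n = 3$ flagged actions, and the function name `rank_one_qrps` are hypothetical.

```python
import numpy as np


def rank_one_qrps(a, n):
    """Ideal rank-one Q-RPS deliberation.

    a: stationary distribution over N clips; the first n entries are the flagged actions.
    Returns (k, eps, b), with b the probabilities of measuring each clip.
    """
    a = np.asarray(a, dtype=float)
    eps = a[:n].sum()
    k = int(round(np.pi / (4 * np.sqrt(eps)) - 0.5))   # Eq. (11)
    alpha = np.sqrt(a)                                 # |alpha> = sum_i sqrt(a_i) |i>
    flagged = np.zeros_like(a)
    flagged[:n] = 1.0                                  # diagonal of the projector onto A
    psi = alpha.copy()
    for _ in range(k):
        psi = (2 * flagged - 1) * psi                  # ref_A
        psi = 2 * alpha * (alpha @ psi) - psi          # ref_alpha
    return k, eps, psi ** 2


N, n = 64, 3
a = np.full(N, 0.99 / (N - n))                # non-flagged clips share probability 0.99
a[:n] = [0.006, 0.003, 0.001]                 # flagged actions, eps = 0.01
k, eps, b = rank_one_qrps(a, n)
print(k, round(b[:n].sum(), 4), np.round(b[:n] / b[:n].sum(), 3))
```

Here seven diffusion steps raise the flagged probability from 0.01 to above 0.99, while the relative weights of the three flagged actions stay at 0.6, 0.3, and 0.1, as required by the tail condition.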
In the short run, however, classical agents may be slow to respond and the advantage of a Q-RPS agent becomes apparent. Despite the remarkable simplification of the algorithm for the rank-one case with flags, the quadratic speed-up is hence preserved.

FIG. 5.
Experimental sequence for Q-RPS. RF1 and RF2 each indicate a time axis for a qubit. The qubits are prepared in the desired input states using single-qubit rotations implemented by applying RF pulses. For each RF pulse, the two parameters within the parentheses represent the specific rotation angle and phase according to Eqs. (3) to (5) in the main text. Dynamical decoupling (DD) during the conditional evolution $U_{ZZ}(\pi/2)$ (indicated by a green box) is implemented using RF pulses (indicated in yellow). Ten sets of 14 pulses each (UR14) are applied [30] during the evolution time $\tau = 4.24$ ms with a J-coupling between the two ions of $2\pi \times 59$ Hz. The diffusion step is repeated $k$ times according to Eq. (11) in Methods. Laser light near 369 nm is used for cooling and to initialize the ions in the qubit state $|0\rangle \equiv |^2S_{1/2}, F=0\rangle$. At the end of the coherent manipulation, laser light is used again for state-selective detection and also for Doppler cooling.
With the optimal $k(\epsilon)$, it is expected that the probability of obtaining a flagged action is close to 100%. However, the success probability in our experiment lies between 66% (for $k = 7$) and 88% (for $k = 1$). In what follows, we discuss several reasons for this.

Scaling error
Even in an ideal scenario without noise or experimental imperfections, the success probability $\tilde{\epsilon}$, as defined in Eq. (4) of the main text, after $k$ diffusion steps is usually not equal to unity, and depends on the specific value of $\epsilon$. This behavior originates from the step-wise increase of the number of diffusion steps $k = \mathrm{round}\big(\frac{\pi}{4\sqrt{\epsilon}} - \frac{1}{2}\big)$ in the algorithm. The success probability is hence only 100% if $\frac{\pi}{4\sqrt{\epsilon}} - \frac{1}{2}$ is itself an integer, so that no rounding is needed. The change of the ideal success probability with deviations of $\epsilon$ from such specific values is largest for small numbers of diffusion steps (e.g., $k = 1$), where it can drop down to 82% (neglecting the cases where it is not advantageous to use a quantum algorithm at all). For larger numbers of diffusion steps, the exact value of $\epsilon$ no longer plays an important role for the ideal success probability, provided that the correct number of diffusion steps is chosen. For example, for $k = 6$, the ideal success probability is larger than 98% independently of the exact value of $\epsilon$. Throughout this paper, we have chosen $\epsilon$ such that $\frac{\pi}{4\sqrt{\epsilon}} - \frac{1}{2}$ is always close to an integer (see Tab. I), so that the deviation from a 100% success probability due to the theoretically chosen $\epsilon$ is negligible compared to other error sources.
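The dependence of the ideal success probability on $\epsilon$ can be tabulated with a short sketch. It assumes the standard amplitude-amplification result $\tilde{\epsilon} = \sin^2[(2k+1)\arcsin\sqrt{\epsilon}]$ with $k$ from the rounding rule above, scans a hypothetical grid of $\epsilon$ values, and reports the worst-case ideal success probability within the range of $\epsilon$ belonging to each $k$. It ignores the additional criterion of whether the quantum algorithm is advantageous at all, so its numbers for small $k$ need not coincide with the figures quoted above.

```python
import numpy as np


def optimal_k(eps):
    """k = round(pi / (4 sqrt(eps)) - 1/2), evaluated elementwise."""
    return np.rint(np.pi / (4 * np.sqrt(eps)) - 0.5).astype(int)


def ideal_success(eps, k):
    """Ideal probability of a flagged outcome after k diffusion steps."""
    return np.sin((2 * k + 1) * np.arcsin(np.sqrt(eps))) ** 2


eps_grid = np.linspace(0.005, 0.2, 400_001)   # hypothetical scan range
k_grid = optimal_k(eps_grid)
success = ideal_success(eps_grid, k_grid)

# Worst-case ideal success probability over the epsilon range that maps to each k.
worst_case = {k: float(success[k_grid == k].min()) for k in range(2, 8)}
for k, s in worst_case.items():
    print(f"k = {k}: minimum ideal success probability {s:.3f}")
```

The worst case for $k = 6$ stays just above 0.98, whereas for small $k$ the ideal success probability varies much more strongly across the corresponding range of $\epsilon$.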
However, in a real experiment, the initial state, and therefore $\epsilon$, can only be prepared with a certain accuracy. This can lead to an inaccurate estimation of the optimal number of diffusion steps. Compared with the ideal case, an assumed accuracy of ±1% for the preparation has only a small effect on the success probability $\tilde{\epsilon}$ (a drop of less than 5%) for $\epsilon \gg 0.01$, corresponding to $k \leq 3$. However, when $\epsilon$ does not fulfil the aforementioned condition and approaches $\epsilon \approx 0.01$ from above, corresponding to $k = 6$, the success probability drops down to $\tilde{\epsilon} = 70\%$ due to a non-optimal choice of $k$.
The preparation accuracy depends on the detuning $\Delta\omega$ of the RF pulses for single-qubit rotations as well as on the uncertainty $\Delta\Omega$ in the determination of the Rabi frequency $\Omega$. The calibration of our experiment revealed $\Delta\omega/\Omega < 0.05$ and $\Delta\Omega/\Omega = 0.0015$, leading to an error in $\epsilon$ of $\pm 2.5 \cdot 10^{-3}$ and a decrease of the success probability $\tilde{\epsilon}$ of less than 0.04. The detuning $\Delta\omega$ and the uncertainty $\Delta\Omega$ of the Rabi frequency not only influence the state preparation at the beginning of the quantum algorithm, but also its fidelity, as is detailed in the next paragraph.
To prevent decoherence during conditional evolution, we use 140 MW π pulses per diffusion step and ion. Therefore, even a small detuning influences the fidelity of the algorithm. Consequently, the error induced by detuning is identified as the main error source, leading, for example, to $\tilde{\epsilon} \approx 0.70$ for $k = 6$ and $\Delta\omega/\Omega = 0.03$. This error is much larger than the error caused by dephasing (which is still present after DD is applied) or the detection error. To estimate the error by dephasing, we assume an exponential decay with $\gamma\tau \approx 1/14$ for a single diffusion step of duration $\tau \approx 4$ ms, which would lead to $\tilde{\epsilon} \approx 0.90$ for $k = 6$. Here, $\gamma$ indicates the experimentally diagnosed rate of dephasing, and $\tau$ is the time of coherent evolution. The influence of the detuning on the cost of our algorithm is shown in Fig. 6 for different detunings. Here, we simulated the complete quantum algorithm including the experimentally determined dephasing and detection errors for $\Delta\omega/\Omega \in \{0.01, 0.03, 0.05\}$. The experimental data are consistent with an average relative detuning of $\Delta\omega/\Omega = 0.03$. Note that the detuning not only influences the single-qubit rotations that are an integral part of the quantum algorithm, but also leads to errors during the conditional evolution when dynamical decoupling pulses are applied.

Ratio error
In the ideal algorithm, the output ratio $r_f = b_{00}/b_{01}$ of the two flagged actions represented by the states $|00\rangle$ and $|01\rangle$ at the end of the algorithm equals the input ratio $r_i$. However, in the experiment we have observed deviations from $r_f/r_i = 1$. During the measurements for the investigation of the scaling behavior (Fig. 3 in the main text), we fixed $r_i = 1$. The observed output ratios vary within $0.98 \leq r_f/r_i \leq 1.33$. That is, the probability $b_{00}$ to obtain the state $|00\rangle$ is increased w.r.t. $b_{01}$. Also during the measurements testing the output ratio, we observe that the output ratios are larger than the input ratios. An asymmetric detection error could be the cause of this observation. Typical detection errors in our experiment are $d_B = 0.06$, the probability of detecting a bright ion ($|1\rangle$) as dark, and $d_D = 0.02$, the probability of detecting a dark ion ($|0\rangle$) as bright. In Fig. 7 we compare the measured output ratios with the output ratios calculated assuming only the above-mentioned detection errors. Fig. 7 shows that the experimental data, both for one step and for three steps, are well approximated by the simulation when the experimentally determined detection error is taken into account. Thus, the deviation of the measured ratios from the ideal result can be traced back mainly to the unbalanced detection error. In addition, errors in the preparation of the input states also play a role, especially when preparing very large or very small ratios, leading to either $a_{00}$ or $a_{01}$ being close to the preparation accuracy of $\leq 2.5 \cdot 10^{-3}$. At the same time, the detuning plays a less prominent role in these measurements because fewer dynamical decoupling pulses were required due to the small number of diffusion steps.
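The effect of an asymmetric detection error on the measured ratio can be estimated with a simple classical confusion-matrix model. The sketch below assumes that the two ions are detected independently, uses the quoted values $d_B = 0.06$ and $d_D = 0.02$, and applies them to a hypothetical ideal output distribution with $r_i = 1$; it is not the simulation used for Fig. 7.

```python
import numpy as np

d_B = 0.06  # probability of detecting a bright ion (|1>) as dark
d_D = 0.02  # probability of detecting a dark ion (|0>) as bright

# Single-ion confusion matrix M[measured, true] in the basis (|0> dark, |1> bright).
M = np.array([[1 - d_D, d_B],
              [d_D, 1 - d_B]])
# Two ions detected independently; basis order |00>, |01>, |10>, |11> (qubit 1 on the left).
M2 = np.kron(M, M)

b_ideal = np.array([0.45, 0.45, 0.05, 0.05])  # hypothetical ideal output with r_i = 1
b_meas = M2 @ b_ideal
r_f = b_meas[0] / b_meas[1]
print(np.round(b_meas, 4), round(r_f, 3))
```

With these numbers the apparent ratio rises to about 1.08 although the ideal ratio is exactly 1, because misidentifying the bright state $|1\rangle$ of qubit 2 as dark moves population from $|01\rangle$ to $|00\rangle$ more often than the reverse.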
Moreover, the detuning during these measurements could be kept below $\Delta\omega/\Omega = 0.03$, leading to an average success probability of $\tilde{\epsilon} = 0.85$ also for $k = 3$ diffusion steps, compared to $\tilde{\epsilon} = 0.77$ for $k = 3$ during the measurements investigating the scaling (see Tab. I).