Randomized benchmarking of single and multi-qubit control in liquid-state NMR quantum information processing

Being able to quantify the level of coherent control in a proposed device implementing a quantum information processor (QIP) is an important task for both comparing different devices and assessing a device's prospects with regard to achieving fault-tolerant quantum control. We implement in a liquid-state nuclear magnetic resonance QIP the randomized benchmarking protocol presented by Knill et al. (PRA 77: 012307 (2008)). We report an error per randomized $\frac{\pi}{2}$ pulse of $(1.3 \pm 0.1) \times 10^{-4}$ with a single-qubit QIP and show an experimentally relevant error model where the randomized benchmarking gives a signature fidelity decay which cannot be interpreted as a single error per gate. We explore and experimentally investigate multi-qubit extensions of this protocol and report an average error rate for one- and two-qubit gates of $(4.7 \pm 0.3) \times 10^{-3}$ for a three-qubit QIP. We estimate that these error rates are still not decoherence limited and thus can be improved with modifications to the control hardware and software.


I. INTRODUCTION
Quantum information processing devices have the potential to revolutionize our understanding of computational complexity and solve certain problems exponentially faster than current classical algorithms. In order to achieve these goals the ability to coherently control a large number of two level quantum systems (qubits) will have to be demonstrated. An important issue in this research path is to be able to quantify the level of control demonstrated. A clear, systematic and standardized algorithm is needed to be able to report the relevant level of control achieved in a given system. Such a protocol would be useful in a number of ways: it should provide a fair and transparent way to compare different devices and technologies; it should provide a way to quantify engineering improvements to the same device and it should provide a rough measure of the device's prospects with regards to fault-tolerant computation [1].
Full characterization of any quantum process, and hence calculation of the fidelity of control, is possible through a procedure known as quantum process tomography (QPT) [2]. However, there are a number of caveats with this approach. It is difficult to analyse and to reconstruct a completely positive map from the results when there are errors in the preparation and readout steps and/or there is noise in the measurements [3]. Indeed, to quantify the error in a certain gate with QPT, readout and preparation pulses with a lower error level than the gate being measured are required. QPT gives full characterization of a particular quantum gate in a particular setting. Although this is useful information, it does not necessarily tell us how another gate will perform, or even how the same gate will perform as part of a larger computation. Finally, full QPT requires an exponential number of experiments, making it experimentally prohibitive for QIPs larger than a few qubits.
* These authors contributed equally to this work.
Ultimately, full knowledge of a quantum operation is often not needed to provide an answer to the above problems. Randomization has been proposed as a useful technique for revealing a smaller number of relevant coarse-grained parameters of the channel [4]. By twirling a channel with random, Haar-distributed unitaries, the channel is reduced to a depolarizing channel with a single parameter that describes the strength of the noise and thus the average gate fidelity. This approach benefits from the concentration of measure in large Hilbert spaces, whereby the average fidelity can be estimated with only a few experiments [5]. The technique can be generalized to a sequence of random unitaries, with the fidelity decay measured as a function of an increasing number of gates. The rate of fidelity decay can then be related to the average gate fidelity.
Generating fully Haar-random unitaries for this protocol is inefficient, as it requires an exponential number of continuous parameters and thus an exponential number of elementary gates to describe and create a Haar-random unitary gate. Fortunately, previous work has shown that the Clifford group is a unitary 2-design, meaning it is sufficient to sample from the n-qubit Clifford group to depolarize an n-qubit channel and to estimate its average fidelity [6,7]. Efficient methods exist for generating random Clifford gates from elementary one- and two-qubit gates [6,8], and it is even possible to reduce the number of gates required by using pseudo-random Clifford gates obtained either from a prescribed algorithm [7] or by simply multiplying together randomly chosen one- and two-qubit gates [9]. Randomized benchmarking of single-qubit Clifford group gates was formalized in a protocol presented by Knill et al. [10], where the fidelity decay under a sequence of random Clifford group operations is measured and the average gate fidelity can then be calculated.
Liquid-state NMR offers a clean system with high-fidelity control built on decades of engineering experience in NMR spectroscopy. Utilizing this control, liquid-state NMR QIPs have provided many demonstrations of quantum algorithms and simulations [11,12] and are an ideal testbed for exploring ideas about quantum control for quantum information processing purposes [13]. Here we present results of applying these randomized benchmarking protocols to both single- and multiple-qubit gates in a liquid-state NMR QIP. In these experiments we are able to quantify the control achieved by both standard pulse techniques on a single qubit and more advanced pulse shaping approaches from optimal control theory in the multi-qubit setting. While our single-qubit experiments followed Ref. [10], there are potentially many generalizations of the protocol to more than one qubit, and we suggest two such protocols. Finally, it is difficult to obtain analytical results in the case of benchmarking pulse-dependent errors. Indeed, we find and analyze an experimentally relevant error model where randomized benchmarking fails to reveal a single average error per gate. This serves to highlight the difficulty in devising universal efficient benchmarking protocols.

II. PROTOCOLS
The protocols are a form of generalized motion reversal applied to efficient gate fidelity estimation [5]. The basic steps are to apply sequences of random unitary gates and then measure the average fidelity decay as a function of the number of gates. With the assumption that the errors are independent of the gate performed and that the gates are chosen uniformly according to the invariant Haar measure, the series of random gates and the averaging over different gate sequences will effectively depolarize the noise. That is, the state after a self-inverting sequence of n gates is given by:
$$|\rho(n)\rangle\rangle = \hat{\Lambda}\hat{U}_n \hat{\Lambda}\hat{U}_{n-1} \cdots \hat{\Lambda}\hat{U}_1 |\rho(0)\rangle\rangle, \qquad \hat{U}_n \hat{U}_{n-1} \cdots \hat{U}_1 = \mathbb{1},$$
where $|\rho(i)\rangle\rangle$ is the density matrix $\rho(i)$ after the $i$'th operation represented as a vector in Liouville space, $\hat{U}_i = U_i^* \otimes U_i$ is the superoperator representation of the unitary gate $U_i$ and $\hat{\Lambda}$ is the noise superoperator [14]. Under the averaging, $\hat{\Lambda}$ becomes a depolarizing channel $\hat{\Lambda}_{ave}$ [5],
$$\hat{\Lambda}_{ave}(\rho) = p\rho + (1 - p)\frac{\mathbb{1}}{D},$$
where $D$ is the dimensionality of the system and the depolarizing parameter $p$ is related to the original noise operator by
$$p = \frac{\mathrm{Tr}(\hat{\Lambda}) - 1}{D^2 - 1}.$$
Therefore, we expect the average fidelity of the output state with respect to an arbitrary input state after n gates to decay exponentially to a saturation level which depends on the dimension of the Hilbert space:
$$F_n = p^n + \frac{1 - p^n}{D} = \left(1 - \frac{1}{D}\right) e^{-\gamma n} + \frac{1}{D},$$
where we have defined the decay rate $\gamma = -\ln p$. Measuring the decay of the average fidelity thus gives us concrete information about the strength of the noise, without giving the details of the action of the noise. From an error correction and fault-tolerance perspective, the schemes are usually developed regardless of the specifics of the action of the noise, and the strength of the noise is the most relevant piece of information.
From the strength of the noise the average gate fidelity can be calculated:
$$F_{ave} = p + \frac{1 - p}{D}.$$
Because the gate fidelity corresponds to a second order polynomial in the gate and its complex conjugate (also known as a (2, 2) polynomial), the average gate fidelity over the Haar measure can be evaluated using a unitary 2-design so that the continuous integral over the unitaries can be replaced by a sum over, for example, the finite Clifford group $\mathcal{C}$ [6,7], i.e.
$$\int dU\, \langle\psi| U^{\dagger} \Lambda\!\left(U |\psi\rangle\langle\psi| U^{\dagger}\right) U |\psi\rangle = \frac{1}{|\mathcal{C}|} \sum_{C_i \in \mathcal{C}} \langle\psi| C_i^{\dagger} \Lambda\!\left(C_i |\psi\rangle\langle\psi| C_i^{\dagger}\right) C_i |\psi\rangle.$$
Then the sequence of random unitaries becomes a sequence of random Clifford group gates. The use of Clifford gates for benchmarking has a number of justifications. Clifford group operations are of paramount importance in most fault-tolerant constructions based on stabilizer codes. The Clifford group operations are the main computational elements and universality is granted via state preparation of so-called "magic states", e.g. states of the form $\cos\frac{\pi}{8}|0\rangle + \sin\frac{\pi}{8}|1\rangle$ [15]. The performance of many computational steps can be bootstrapped through the use of higher fidelity Clifford group operations, e.g. several noisy magic states can be purified with ideal Clifford gates to create one magic state with a lower error rate [15]. Moreover, the state's evolution under Clifford group operations can be efficiently tracked classically, allowing an efficient construction of a recovery gate and/or prediction of the ideal final state [8].
The protocols are designed to extract the average gate fidelity, which under reasonable assumptions about the error model should be the computationally relevant quantity. Algorithms using only Clifford operations, for example many fault-tolerant constructions, can be Pauli randomized at every step (whether it occurs inherently as part of a teleportation [16] or is explicitly put in), and so the quantity measured by randomized benchmarking should be close to the error rate experienced in an algorithm. It is certainly true that with many qubits the Hilbert space is large enough to hide a worst case fidelity of 0 while the average fidelity is very high. It is therefore possible that some very large, highly correlated and specially designed error will go undetected by this benchmarking procedure. However, this would seem to require a contrived, unphysical error model. Furthermore, for one and two qubit gates the Hilbert space is too small for the worst case and average fidelity to be significantly different. Finally, because it is too difficult to show fault-tolerance for an arbitrary distribution of errors, proofs [17] and simulations [18] of fault-tolerance schemes rely on a stochastic distribution of error locations or a depolarizing error model respectively, for which the average fidelity should be the relevant quantity to measure.
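As a numerical illustration of how a noise channel maps onto a single depolarizing parameter and an average gate fidelity, the following minimal sketch assumes the standard relations $p = (\mathrm{Tr}\,\hat{\Lambda} - 1)/(D^2 - 1)$, $F_{ave} = p + (1-p)/D$ and $F_n = p^n + (1 - p^n)/D$ for a trace-preserving channel, together with $\mathrm{Tr}\,\hat{\Lambda} = \sum_k |\mathrm{Tr}\,K_k|^2$ for Kraus operators $K_k$. The function names and the dephasing example are hypothetical and are not taken from the experiments.

```python
import math

def depolarizing_parameter(kraus_ops):
    """Depolarizing parameter p of the twirled channel.

    kraus_ops: list of D x D matrices given as nested lists.
    Uses Tr(Lambda_hat) = sum_k |Tr K_k|^2, valid for
    Lambda_hat = sum_k K_k^* (x) K_k.
    """
    dim = len(kraus_ops[0])
    trace_superop = sum(
        abs(sum(k[i][i] for i in range(dim))) ** 2 for k in kraus_ops
    )
    return (trace_superop - 1.0) / (dim ** 2 - 1.0)

def average_gate_fidelity(p, dim):
    """Average gate fidelity of a depolarizing channel with parameter p."""
    return p + (1.0 - p) / dim

def fidelity_after_n_gates(p, dim, n):
    """Expected state fidelity after n randomized gates."""
    return p ** n + (1.0 - p ** n) / dim

# Hypothetical example: single-qubit dephasing with phase-flip probability q.
q = 1e-3
identity = [[1, 0], [0, 1]]
pauli_z = [[1, 0], [0, -1]]
kraus = [
    [[math.sqrt(1 - q) * x for x in row] for row in identity],
    [[math.sqrt(q) * x for x in row] for row in pauli_z],
]
p = depolarizing_parameter(kraus)
f_ave = average_gate_fidelity(p, 2)
print(p, f_ave, fidelity_after_n_gates(p, 2, 100))
```

For this dephasing example the twirled channel has $p = 1 - 4q/3$, so the average gate fidelity is $1 - 2q/3$ even though the underlying noise is not itself depolarizing.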

A. Single Qubit
In the case of the single qubit benchmarking we followed exactly the implementation of Knill et al. [10]. For depolarizing one qubit noise, the single qubit Clifford operations are isomorphic to the 48 operations parametrized as
$$e^{\pm i \frac{\pi}{4} Q}\, e^{\pm i \frac{\pi}{2} P},$$
where $Q \in \{X, Y, Z\}$ and $P \in \{\mathbb{1}, X, Y, Z\}$, that is, a $\pi$ pulse (or Pauli operation) followed by a $\frac{\pi}{2}$ pulse (or a symplectic operation). The symplectic operations are deemed the "computationally relevant" operations that advance the computation while the Pauli operations serve only to redefine the Pauli frame.
The circuit implemented is shown in Figure 1. To perform an approximate averaging, a series of 192 computational gates was chosen at random and truncated at a series of different lengths. Random Pauli operations were then inserted between each computational gate. The initial state was chosen to be the thermal state in NMR:
$$\rho_{th} \approx \frac{1}{2}\left(\mathbb{1} + \alpha Z\right),$$
where $\alpha$ denotes the small thermal polarization. The identity component is unobservable in NMR and can be considered a large error in the preparation or measurement which is normalized out by the protocol. The state was tracked through the computational gates and the recovery gate R was chosen to return the state to either +Z or −Z with equal probability. The state was then read out with a 90 degree readout pulse and the fidelity measured by comparing the integral of the signal to a reference spectrum. For each truncation, the Pauli operations were randomized 8 times. Each point was further averaged over four different computational gate sequences and the averaged fidelity from the 32 experiments for each truncation was used in the fitting.
One technical point to note is that rotations about the Z axis are implemented through an abstract frame change (changing the phase of subsequent pulses and potentially the observation) and take no time. However, for consistency, a delay equivalent to the $\frac{\pi}{2}$ or $\pi$ pulse time was executed for those gates. This is the procedure followed in [10]. However, performing the Z rotation in this manner (as opposed to physically implementing the gate) is not as effective at depolarizing the noise, because in commuting the Z rotation through the pulse sequence it is also assumed that the Z rotation commutes with the noise operation. In situations where the noise is dominated by dephasing this may be appropriate, but in the general case this is not true.
FIG. 1: Quantum circuit implementing single qubit benchmarking. A fiducial state is prepared and a sequence of computational gates G is applied. The recovery gate R is chosen to return the system to a known final state. The Pauli gates P interleaved with the computational gates induce a Pauli randomization.

B. Multiple Qubits
In the case of more than one qubit, it is difficult to prescribe the correct gate set for determining an error per gate. The gates should depolarize the noise but at the same time the error per gate should be meaningful in relation to the fault-tolerant thresholds. It would be ideal to quantify the error per gate for one and two qubit gates and also storage errors for wait steps. However, it is difficult to isolate the errors for only these gates if the errors do not satisfy the independent error model, in which each gate's errors are described by a quantum operation acting only on the qubits which the gate affects. In realistic situations it is most likely that applying a gate to qubit a could induce an error on qubit b.
One possibility is to choose a generating gate set consisting of single qubit Clifford generators (say the Hadamard and phase gates) and controlled NOT's between pairs of qubits. This will generate the multi-qubit Clifford group and indeed after only a small number of gates will approximate a 2-design necessary for depolarizing the noise [9]. The multi-qubit protocol then becomes:
1. Choose a series of lengths of computational gates to measure the fidelity decay at. The number of random gates necessary to achieve depolarization of the noise depends on the number of qubits and may be large. Thus we expect only the asymptotic error rate to be meaningful.
2. For each truncation length choose $n_g$ random sequences of computational gates from the generating set of the full n-qubit Clifford group.
3. Determine a recovery sequence which will return the state to one with a known definite output upon measurement in the absence of error. This can either undo the entire sequence to return to the input state or ensure that one stabilizer has a certain measurement outcome as suggested in Ref. [10]. Because the Clifford group operations can be efficiently tracked this is possible to do efficiently on a classical computer and should have no more than $O(n^2/\log n)$ gates [8].
4. Apply some parallelization routine to the random sequence of gates to ensure that the number of wait steps does not grow with the size of the computer. This parallelization step allows a fair comparison between different size QIP, say a 5 and a 50 qubit computer. The error per time step may be larger in the 50 qubit computer but many more gates are possible in each timestep.
5. Measure the fidelity decay as in the single qubit case. An exponential fit to the fidelity decay will reveal the average error per one and two qubit gate. It is possible that the average error could mask a distribution of error rates such that for example all single qubit gates are perfect but the two qubit gates are much worse. However, more detailed, but still coarse grained information is available by doing more experiments (see Sec. IV).
Numerical simulations have confirmed that this protocol will return the correct asymptotic (beyond ≈ 30 gates for the 3 qubit case) error rate for a variety of error models such as dephasing and pulse dependent unitary errors. For the latter, it should be mentioned that we made the assumption that the errors were of the same strength, hence numerically verifying the conjecture in Ref. [5] for this case. Not surprisingly, larger amounts of randomization are required compared with the single qubit protocol.
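The exponential fit in step 5 can be sketched as follows. This is a minimal illustration, assuming the measured fidelity follows $F_n = A\,p^n + 1/D$ with a known saturation level $1/D$ and an unknown prefactor $A$ that absorbs preparation and readout errors; the error per gate is then taken as $1 - F_{ave} = (1 - p)(1 - 1/D)$. The function name and the synthetic data are hypothetical, and a real analysis would also propagate the uncertainty of each point.

```python
import math

def fit_error_per_gate(lengths, fidelities, dim):
    """Fit F_n = A * p**n + 1/dim by linear regression on log(F_n - 1/dim).

    Returns the error per gate (1 - p) * (1 - 1/dim).
    Every fidelity must lie above the saturation level 1/dim.
    """
    xs = list(lengths)
    ys = [math.log(f - 1.0 / dim) for f in fidelities]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    p = math.exp(slope)  # the slope of the log plot is ln(p)
    return (1.0 - p) * (1.0 - 1.0 / dim)

# Synthetic, noise-free data for a hypothetical single-qubit experiment.
true_error = 1e-3
dim = 2
p_true = 1.0 - true_error / (1.0 - 1.0 / dim)
lengths = list(range(10, 200, 10))
fidelities = [0.48 * p_true ** n + 1.0 / dim for n in lengths]
estimate = fit_error_per_gate(lengths, fidelities, dim)
print(estimate)
```

Because only the slope of the semi-log plot enters the estimate, a constant loss of signal in preparation or readout changes the intercept but not the extracted error per gate.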

III. EXPERIMENT
The experiments were performed in liquid state NMR on a 700 MHz Bruker Avance spectrometer using a TCI cryogenic probe. The cryo-probe provides enhanced sensitivity and an associated improvement in signal-to-noise ratio, but the high quality factor of the probe resonant circuit leads to phase-transient and radiation damping effects.

A. Single Qubit
The proton spins of unlabeled chloroform were chosen as the single qubits. A sample was made from a 0.3% aqueous solution of unlabeled chloroform dissolved in d6-acetone. The sample was not vacuum-pumped, to avoid unnecessarily long $T_1$ relaxation times. The $T_1$ was measured to be 7 seconds through inversion recovery and the $T_2$ to be 4.5 seconds using a standard CPMG sequence. The unrefocussed $T_2^*$ was 0.45 seconds, calculated from the spectral linewidth of the NMR signal.
To address the amplitude and phase transient issues with the high Q cryoprobe, 24 µs Gaussian shaped $\frac{\pi}{2}$ pulses were used, which avoid these unwanted effects due to their more slowly varying amplitude profile. Since the largest part of the errors is expected to be due to pulse miscalibration, amplifier drift and r.f. inhomogeneity, composite pulses robust to r.f. field variation were also tested. The BB1 family of pulses from Wimperis et al. [19] are robust to pulse length (calibration) errors $\epsilon$ up to order $\epsilon^6$ and are universally compensating in that they are robust unitary operations rather than robust for a particular state to state transformation. Their usefulness in experimental QIP has been previously reported [20]. The pulses consist of a compensating block followed by the desired pulse, so that a rotation by an angle $\theta$ about the x axis can be replaced by
$$(\pi)_{\phi_1}\,(2\pi)_{\phi_2}\,(\pi)_{\phi_1}\,(\theta)_{0},$$
written in time order with $(\beta)_{\phi}$ denoting a rotation by $\beta$ about the axis at phase $\phi$ in the x-y plane, where $\phi_1$ and $\phi_2$ depend on the pulse flip angle:
$$\phi_1 = \arccos\left(-\frac{\theta}{4\pi}\right), \qquad \phi_2 = 3\phi_1.$$
The location of the compensating block is not important and it can be placed before or after the pulse. The pulse can even be symmetrized by placing the compensating block between two halves of the pulse [21].
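The robustness of the BB1 sequence to calibration errors can be checked numerically. The sketch below assumes the standard BB1 form, a compensating block $(\pi)_{\phi_1}(2\pi)_{\phi_2}(\pi)_{\phi_1}$ with $\phi_1 = \arccos(-\theta/4\pi)$ and $\phi_2 = 3\phi_1$ applied before the $\theta$ pulse, and models the miscalibration as a common fractional scaling $(1+\epsilon)$ of every rotation angle. Pulses are ideal, instantaneous rotations, so pulse shaping, off-resonance effects and relaxation are ignored; all function names are illustrative.

```python
import cmath
import math

def rotation(theta, phi):
    """SU(2) rotation by theta about the axis (cos phi, sin phi, 0)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s * cmath.exp(-1j * phi)],
            [-1j * s * cmath.exp(1j * phi), c]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def infidelity(ideal, actual):
    """1 - |Tr(ideal^dagger actual)|^2 / 4 for 2x2 unitaries."""
    overlap = sum(ideal[i][j].conjugate() * actual[i][j]
                  for i in range(2) for j in range(2))
    return 1.0 - abs(overlap) ** 2 / 4.0

def plain_pulse(theta, eps):
    """Uncompensated x rotation with fractional amplitude error eps."""
    return rotation(theta * (1 + eps), 0.0)

def bb1_pulse(theta, eps):
    """BB1 composite x rotation with the same fractional error on every pulse."""
    phi1 = math.acos(-theta / (4 * math.pi))
    phi2 = 3 * phi1
    scale = 1 + eps
    # The compensating block acts first, so it sits rightmost in the product.
    u = rotation(math.pi * scale, phi1)
    u = matmul(rotation(2 * math.pi * scale, phi2), u)
    u = matmul(rotation(math.pi * scale, phi1), u)
    return matmul(rotation(theta * scale, 0.0), u)

theta = math.pi / 2
ideal = rotation(theta, 0.0)
for eps in (0.0, 0.02, 0.05):
    print(eps,
          infidelity(ideal, plain_pulse(theta, eps)),
          infidelity(ideal, bb1_pulse(theta, eps)))
```

In this idealized model the compensating block reduces to the identity when $\epsilon = 0$, and for small $\epsilon$ the composite pulse infidelity falls far below that of the plain pulse, at the cost of a total rotation angle of $4\pi + \theta$.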
The results of the single qubit benchmarking with BB1 composite pulses are shown in Figure 2. It is clear that the pulse fidelity is low and furthermore that the curve does not fit a single exponential decay well. However, these results can be explained by the r.f. field strength variation across the sample. This r.f. inhomogeneity is particularly bad in cryogenic probes [22]. Indeed, by measuring the r.f. inhomogeneity profile and simulating the experiment across that variation we were able to reproduce the results both quantitatively and qualitatively, showing that we understand the error model well. The result can be interpreted intuitively in that we expect spins which see an r.f. field very different to the ideal field to very quickly end up at some random point on the Bloch sphere, whereas those close to the ideal field strength will closely track the ideal evolution for many gates. Thus we expect the fidelity to initially decay quickly (with large fluctuations) as the spins at the edge of the r.f. profile are depolarized and then for the fidelity to level off and decay much more slowly. This intuitive picture is confirmed in a more detailed analysis in Appendix A. It is also interesting to note that with this pulse-dependent coherent error model, it is impossible to average the fidelity decay to a single exponential; it is always a sum of exponentials with different decay rates. This error model is not restricted to ensemble effects but would also apply in the case where a parameter (say a laser power in an ion trap) slowly varies so that it is constant for the time of one experiment but fluctuates from experiment to experiment.
FIG. 2: (Color online) Experimental fidelity as a function of number of randomized gates for a single qubit using BB1 composite pulses plotted on a semi-log plot. The fidelity decay is clearly non-exponential indicating incoherent pulse dependent errors [23]. This effect is caused by the large distribution of r.f. field strengths across the sample shown in the inset. Also shown are the results from simulations of the pulse sequence (▽) averaged over the measured r.f. profile. The simulations match the experimental results both qualitatively and quantitatively.
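The sum-of-exponentials behavior can be illustrated with a toy model. The sketch below assumes that, after randomization, a spin whose r.f. field is off by a fraction $\epsilon$ sees a depolarizing channel with $p(\epsilon) = 1 - \frac{4}{3}\sin^2(\epsilon\theta/2)$, which is the value implied by the average fidelity of an over-rotation by $\epsilon\theta$ of a simple (uncompensated) pulse of flip angle $\theta$. The r.f. profile used here is hypothetical, not the measured one, and the function names are illustrative.

```python
import math

def depolarizing_parameter(eps, theta=math.pi / 2):
    """Depolarizing parameter for a rotation error of angle eps * theta."""
    return 1.0 - (4.0 / 3.0) * math.sin(eps * theta / 2.0) ** 2

def ensemble_fidelity(n_gates, rf_profile):
    """Fidelity averaged over (fractional r.f. error, weight) pairs."""
    total_weight = sum(w for _, w in rf_profile)
    return sum(
        w * (1.0 + depolarizing_parameter(e) ** n_gates) / 2.0
        for e, w in rf_profile
    ) / total_weight

def local_decay_rate(n_gates, rf_profile):
    """Decay rate of (F - 1/2) between n_gates and n_gates + 1."""
    a = ensemble_fidelity(n_gates, rf_profile) - 0.5
    b = ensemble_fidelity(n_gates + 1, rf_profile) - 0.5
    return -math.log(b / a)

# Hypothetical r.f. profile: most spins near the ideal field, a few far from it.
rf_profile = [(-0.20, 0.1), (-0.10, 0.2), (0.01, 0.4), (0.05, 0.2), (0.10, 0.1)]
early = local_decay_rate(1, rf_profile)
late = local_decay_rate(150, rf_profile)
print(early, late)
```

The local decay rate is larger at short sequence lengths, where the badly calibrated spins dominate the loss of fidelity, than at long lengths, where only the well calibrated spins still contribute, so no single error per gate describes the whole curve.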
In NMR, the issues arising from r.f. inhomogeneity can be largely eliminated by running an r.f. selection sequence. This is a sequence of pulses and gradients that leaves polarization on only a subset of the ensemble of processors that experience an r.f. field within a certain range, say ±2% of the ideal field strength [24]. For calibration purposes and again to avoid the sharp transitions of hard pulses we developed a numerically optimized control pulse which implemented the r.f. selection. The pulse was designed to rotate spins outside the ±2% range of desired powers to the x-y plane while leaving the calibrated spins along the z-axis. The unwanted spins were then dephased using magnetic field gradient techniques. This dramatically improves the results and gives a single exponential decay which we fit to give an error per randomized computational gate of $(1.3 \pm 0.1) \times 10^{-4}$ (see Figure 3). A drawback of the r.f. selection sequence is that small fluctuations in the pulse power from the amplifier or changes in the resonant circuit give large (up to 5%) changes in the output signal. These were normalized through a stroboscopic observation of the signal after r.f. selection for each experiment.
An estimate of the expected error rate due to intrinsic decoherence can be made from the measured $T_1$ and $T_2$ values. The combined time for a randomized computational gate using BB1 composite pulses is 516.8 µs (including delays between pulses to avoid overheating). A map consisting of purely $T_1$ and $T_2$ decoherence acting for this time would imply an error per randomized gate of $5 \times 10^{-5}$. This represents a lower bound on the expected error rate which we should be able to reach with hardware and software improvements. If the $T_2^*$ rather than the $T_2$ is used in the decoherence model, the estimated error per gate climbs to $4 \times 10^{-4}$. The randomized gate sequence will somewhat refocus the static field inhomogeneities contributing to $T_2^*$, but they are not explicitly refocussed. The remaining impediments of incoherence across the ensemble members and the fluctuations in power from the amplifier could be overcome with even more robust and compensated pulses, although there is a tradeoff between more highly compensated pulses and the increased losses due to intrinsic decoherence because of the longer pulse times.
FIG. 3: The error bars (68% confidence) indicate the uncertainty from randomization (i.e. different computational sequences and Pauli randomizations give different fidelities due to coherent or biased errors). The uncertainty in each measurement due to signal to noise and fluctuations in the amount of signal from the r.f. selection sequence is less than 0.5%. The fidelity decay is a good fit to a single exponential shown in red (dashed line) with 68% confidence fits and reveals an error per gate of $(1.3 \pm 0.1) \times 10^{-4}$.
For comparison purposes, we also tested other pulse types with the same protocol. Using only simple uncompensated Gaussian pulses we obtain an error rate of $(2.1 \pm 0.1) \times 10^{-4}$ and using GRAPE numerically optimized pulses [25], an error rate of $(1.8 \pm 0.2) \times 10^{-4}$. The GRAPE pulses were numerically optimized to 99.999% fidelity (Hilbert-Schmidt (HS) norm) over a range of r.f. powers ±3% from the ideal power. They were 100 µs in length and discretized at 1 µs. It is somewhat surprising that the numerically optimized pulses cannot match the performance of the BB1 pulses. However, the BB1 pulses are well suited to compensating for systematic deviations from the ideal pulse shape which manifest themselves as calibration errors. Numerically optimized pulses are somewhat robust to noise in the pulse generation: because the controls are at a local maximum of fidelity, any deviation gives no change in the fidelity to first order. However, numerically optimized pulses are still more sensitive to other imperfections in the implementation. For example, the optimization and robustness assume the control fields are constant at each time step in the discretized pulse. In the experiment, finite bandwidth effects and noise prevent exact implementation of this and lead to a loss of fidelity.
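The decoherence-limited estimates quoted above can be reproduced to within rounding with a simple relaxation model. The sketch below assumes Bloch-equation relaxation during the gate time, with transverse components shrinking by $e^{-t/T_2}$ and the longitudinal component by $e^{-t/T_1}$, for which the average fidelity with the identity is $F_{ave} = \frac{1}{2} + \frac{1}{6}\left(2e^{-t/T_2} + e^{-t/T_1}\right)$. This is one plausible reading of "purely $T_1$ and $T_2$ decoherence" and not necessarily the exact map used for the quoted numbers; the function name is illustrative.

```python
import math

def decoherence_error_per_gate(gate_time, t1, t2):
    """Average infidelity of pure T1/T2 relaxation acting for gate_time.

    All times are in seconds. Transverse Bloch components decay as
    exp(-t/T2) and the longitudinal component as exp(-t/T1).
    """
    f_ave = 0.5 + (2.0 * math.exp(-gate_time / t2)
                   + math.exp(-gate_time / t1)) / 6.0
    return 1.0 - f_ave

gate_time = 516.8e-6  # randomized computational gate with BB1 pulses
t1 = 7.0              # measured T1
err_t2 = decoherence_error_per_gate(gate_time, t1, 4.5)        # using T2
err_t2_star = decoherence_error_per_gate(gate_time, t1, 0.45)  # using T2*
print(err_t2, err_t2_star)
```

With the measured relaxation times this model gives roughly $5 \times 10^{-5}$ using $T_2$ and roughly $4 \times 10^{-4}$ using $T_2^*$, in line with the estimates in the text.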

B. Multiple Qubits
A three qubit molecule was made from a sample of selectively labelled $^{13}$C tris(trimethylsilyl)silane-acetylene dissolved in deuterated chloroform [23]. The structure and a table of the natural Hamiltonian parameters are shown in Fig. 4. C$_1$ and C$_2$ were isotopically labelled with $^{13}$C; the rest of the molecule contained natural abundances and was ignored for the purposes of this experiment.
FIG. 4: Structure of the molecule and table of the natural Hamiltonian parameters obtained from spectral fitting. The diagonal elements give the chemical shifts with respect to the transmitter frequencies while the off-diagonal elements give the J-couplings. $T_1$'s and $T_2$'s (seconds) are measured from standard inversion recovery and CPMG echo sequences respectively.
Control was achieved through the GRAPE optimal control technique [25]. The pulses were optimized to above 99.95% HS fidelity over a range of r.f. powers ±3% from the ideal power. The pulses were discretized at 2 µs as a balance between smoothness and spectrometer memory constraints. Single qubit pulses were 1.2 ms long; CNOT gates between H and C$_1$ (with any single qubit gate on C$_2$) were 2.4 ms; and CNOT gates between C$_1$ and C$_2$ (with any single qubit gate on H) were 4 ms. These pulses are not time-optimal but have low enough powers for experimental implementation. Shorter pulses tended to require infeasibly high power levels, which lead to probe heating during long computational sequences. Non-linearities in the pulse generation and transient effects from the probe's resonant circuit lead to distortions in the implementation of shaped pulses. To avoid this, the r.f. field at the sample was detected through a pickup coil and corrected through a simple feedback loop. This correction procedure was only applied to individual pulses, and the longer term power inverse droop we observed [30] was not corrected but should instead be handled by engineering robust pulses. Due to finite spectrometer memory we were limited to 120 gates in a computational sequence. Each truncation was averaged over 48 different computational gate sequences. The same numerically optimized r.f. selection sequence used in the single qubit experiment was applied before each experiment to the proton nuclei. Polarization on the carbon nuclei was dephased with gradient techniques, giving the starting deviation density matrix ZII (using product operator notation).
A sequence of random gates was constructed in the following manner. The Clifford group generating set was chosen to be the Hadamard and $PHP^{\dagger}$ (a Hadamard gate conjugated by a phase gate) single qubit gates and CNOT gates between nearest neighbors. With a probability of 2/3, a random single qubit gate was performed and with probability 1/3, a random CNOT was implemented [8]. The resulting state was then tracked and a recovery sequence to return the state to ±ZII calculated. To design the recovery sequence, Hadamard or $PHP^{\dagger}$ gates were applied to each qubit such that their individual state was either I or Z. This state was then transformed into the final ZII by finding the minimal number of CNOT gates needed to transfer all the polarization back to the first qubit. The algorithm is general and efficient in the number of qubits. These final recovery gates were not counted in the total number of gates and will not affect the asymptotic error rate. The entire sequence was then parallelized with a simple iterative scheme of repeatedly checking whether gates in series could be compressed into a single parallel gate. For example, a CNOT gate between qubits 2 and 3 followed by a Hadamard gate on qubit 1 would be compressed to a single timestep which implements both gates in parallel. The fidelity of the state was then measured through a readout pulse on the proton spin.
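The sequence construction and parallelization described above can be sketched as follows. This is a minimal illustration under stated assumptions: qubits are labelled 0 (H), 1 (C$_1$) and 2 (C$_2$) on a linear chain, both CNOT directions between neighbors are allowed, and the parallelization is a single greedy pass that merges a gate into the current timestep when it acts on different qubits. State tracking and the recovery sequence are omitted, and all names are hypothetical rather than those of the software used in the experiment.

```python
import random

SINGLE_QUBIT_GATES = ["H", "PHPdag"]  # Hadamard and P H P^dagger
NEIGHBOUR_CNOTS = [(0, 1), (1, 0), (1, 2), (2, 1)]  # (control, target) pairs
N_QUBITS = 3

def random_sequence(n_gates, rng):
    """Draw single-qubit gates with probability 2/3 and CNOTs with 1/3."""
    sequence = []
    for _ in range(n_gates):
        if rng.random() < 2.0 / 3.0:
            gate = rng.choice(SINGLE_QUBIT_GATES)
            sequence.append((gate, (rng.randrange(N_QUBITS),)))
        else:
            sequence.append(("CNOT", rng.choice(NEIGHBOUR_CNOTS)))
    return sequence

def parallelize(sequence):
    """Greedily merge consecutive gates on disjoint qubits into one timestep.

    Gate order is preserved: a gate joins the current timestep only if none
    of its qubits is already used there, otherwise a new timestep starts.
    """
    steps = []
    for gate in sequence:
        qubits = set(gate[1])
        if steps and qubits.isdisjoint(q for g in steps[-1] for q in g[1]):
            steps[-1].append(gate)
        else:
            steps.append([gate])
    return steps

rng = random.Random(0)
sequence = random_sequence(120, rng)
steps = parallelize(sequence)
print(len(sequence), "gates in", len(steps), "timesteps")

# The example from the text: a CNOT on the two carbons followed by a
# Hadamard on the proton collapses to a single timestep.
example = parallelize([("CNOT", (1, 2)), ("H", (0,))])
```

A single greedy pass is only one way to realize the iterative compression; repeated passes that also commute gates past one another could shorten the sequence further, at the cost of more bookkeeping.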
The results are shown in Figure 5. The results fit an exponential decay well and give an error per gate of $(4.7 \pm 0.3) \times 10^{-3}$, approximately an order of magnitude larger than the single qubit results. Again, an estimate of the lower bound on the error rate can be obtained from the measured $T_1$'s and $T_2$'s. Assuming an independent and uncorrelated error model (which is unlikely but does not significantly affect the result) gives an average error per gate of $1.5 \times 10^{-3}$. Moreover, from the design of the pulses, we would expect an error of $4.4 \times 10^{-4}$, which is an order of magnitude smaller than the experimentally measured error rate. This leads us to suspect that there are still errors in the implementation of the pulses and/or knowledge of the chemical properties of the molecule that are not currently handled by our pulse design.

IV. EXTENSIONS TO THE MULTI-QUBIT PROTOCOLS
More detailed information about the errors can be obtained by combining the ideas of previous randomization protocols [26,27] with the randomized computational sequences. For example, one may wish to determine on which qubits the errors are occurring or the difference in error rate between one and two qubit gates. The steps of the proposed protocol are as follows:
1. Perform the single qubit benchmarking procedure on each qubit individually. These numbers will give an estimate of the error per gate for single qubit gates. Unfortunately, as discussed above, the errors are unlikely to follow an independent error model and the possibility that performing the single qubit gates induces errors on non-target qubits needs to be checked. This can be achieved by measuring the fidelity of the identity operation on the other n − 1 qubits. Efficient procedures exist for this measurement. For example, performing single qubit Clifford gates at the beginning and their inverses at the end of the sequence and then randomizing allows an estimation of the fidelity of the channel with a small number of experiments [26]. A possible concern is that the error model on either the single qubit or the remaining n − 1 qubits might be highly non-Markovian. However, the benchmarking procedure should effectively act as a randomized dynamical decoupling sequence preventing entanglement between the two sub-systems [28].
2. Two qubit benchmarking, using the procedure described above, can then be performed on all pairs of qubits. Knowing the single qubit error rates from the first step, it should be possible to extract an estimate of the two-qubit error rate. The action on the other n − 2 qubits should be characterized as in the first step to assess the fidelity of the wait steps.
3. This procedure can be iterated to all groups of 3 qubits and so on, but because most fault-tolerant constructions are specified in terms of one and two qubit gates, going as far as all pairs should be sufficient.

V. CONCLUSIONS
We have demonstrated implementations of single and multi-qubit benchmarking in liquid state NMR. In both instances the control is still not decoherence limited and improvements through both hardware and software should be possible. Potential software improvements include pulses that are more robust to calibration errors and to noise in the pulse generation, and better modeling of the system and apparatus. Efforts in hardware improvements will be focussed on ensuring the implementation of the optimal control pulses is as close as possible to the ideal optimal control pulse.
Simulations and proofs of fault-tolerant constructions suggest that given certain architecture assumptions, an error rate of $10^{-4}$ is sufficiently low to enable arbitrarily long quantum computations. Here, the single qubit experiments demonstrated close to that level of control. However, showing fault-tolerant levels of control on small demonstration systems does not imply a scalable quantum computer is possible. Indeed the benchmarking of the three qubit system yielded an error per gate an order of magnitude worse than the single qubit system. It is important to investigate how the level of control scales with the system size and how compatible the architecture is with the assumptions of the fault-tolerant construction before concluding anything about the fault-tolerance capabilities of a given system. Multi-qubit benchmarking protocols will be an important part of that investigation.

APPENDIX A
(A6)
Therefore, the effective averaged channel action is given by $\hat{\Lambda}_{ave}(\rho) = \bar{p}\rho + (1 - \bar{p})\frac{\mathbb{1}}{D}$. The gate fidelity obtained by numerically integrating Eq. A6 using the measured r.f. distribution is compared to numerical simulations of the experimental sequences under the measured r.f. distribution in Fig. 6, which clearly demonstrates the non-exponential behavior of the decay.
FIG. 6: Numerical simulation and analytical prediction of the fidelity decay of the randomized benchmarking protocol under a r.f. inhomogeneity error model plotted on a semi-log plot. The agreement of the two curves demonstrates that the error model is well understood. The small discrepancy is due to the finite number of runs in the numerical simulations. This curve decays faster than that in Figure 2 because this analysis uses simple pulses whereas the experiment used robust composite pulses.