Accelerated Randomized Benchmarking

Quantum information processing offers promising advances for a wide range of fields and applications, provided that we can efficiently assess the performance of the control applied in candidate systems. That is, we must be able to determine whether we have implemented a desired gate, and refine accordingly. Randomized benchmarking reduces the difficulty of this task by exploiting symmetries in quantum operations. Here, we bound the resources required for benchmarking and show that, with prior information, we can achieve several orders of magnitude better accuracy than in traditional approaches to benchmarking. Moreover, by building on state-of-the-art classical algorithms, we reach these accuracies with near-optimal resources. Our approach requires an order of magnitude less data to achieve the same accuracies and to provide online estimates of the errors in the reported fidelities. We also show that our approach is useful for physical devices by comparing to simulations. Our results thus enable the application of randomized benchmarking in new regimes, and dramatically reduce the experimental effort required to assess control fidelities in quantum systems. Finally, our work is based on open-source scientific libraries, and can readily be applied in systems of interest.

Quantum information processing devices offer great promise in a variety of different fields, including chemistry and material science, data analysis and machine learning [1][2][3][4], as well as cryptography [5]. Over the past few years, proposals have been advanced for quantum information processing past the classical scale, based on node-based architectures [6,7]. In addition, rapid progress has been made towards experimental implementations that might allow for developing such devices [8,9]. An impediment in this effort, however, is presented by the difficulty of calibrating and diagnosing quantum devices.
In particular, in the development of quantum information processing, an important experimental challenge is to efficiently characterize the quality with which we can control a quantum system. By characterizing the quality of a quantum gate that is implemented by a control pulse, we can then reason about the utility of that gate for quantum information processing tasks. For instance, we can estimate the feasibility of and the resources required to implement error correction using that control by comparing to proven and numerically estimated fault-tolerance thresholds [10,11], or can adjust our control sequences to account for differences between our control model and the actual system. * cgranade@cgranade.com; http://cgranade.com/; Literate source code for this work is available at https: //github.com/cgranade/accelerated-randomized-benchmarking. To view the code online, visit http://nbviewer.ipython.org/ github/cgranade/accelerated-randomized-benchmarking/blob/ master/src/model_testing.ipynb.
In cases where only the quality of a quantum gate or set of gates is required, randomized benchmarking has proven to be a useful means of extracting this information with relatively little experimental effort [12], has been demonstrated in a variety of experimental settings [9,[13][14][15][16][17][18][19]. Randomized benchmarking has also been used to improve gate fidelities by characterizing crosstalk [20] or distortions [21]. Extracting fidelity information can often be useful in diagnosing performance and problems with a device in lieu of full characterization [22]. Moreover, randomized benchmarking has also been used to extract information about the completely positive and unital parts of linear maps [23].
Here, using near-optimal data processing together with prior information, we accelerate the data processing used in benchmarking experiments, such that to achieve the accuracy demanded of benchmarking protocols, we require orders of magnitude less experimental data. We also extend results on the achievable estimation quality in the presence of finite sampling [24] and prior information, then show that our accelerated methods are nearly optimal. Our data processing methods also provide estimates of their own performance, such that our approach thus enables randomized benchmarking to be used where data collection costs make existing benchmarking protocols impractical. Thus, our work complements recent results on the robustness of randomized benchmarking [25] to provide an experimentally useful tool.
Randomized benchmarking has been recently used to adaptively calibrate control designed by optimal control theory methods such as GRAPE [26], allowing for differences between the control model and the actual system to be adjusted for in experimental practice [27]. These methods are applied in a control design and calibration step, however, and do not allow for control for to be recalibrated dynamically. Whereas randomized benchmarking is performed at the inner-loop of current control calibration algorithms [22], any data collection overhead in benchmarking becomes a very significant cost to control calibration as a whole. Thus, by reducing the data requirements using both better fitting methods and strong prior information, we can enable new applications, such as extending control calibration to an online context.
Here, we show that by using prior information together with the sequential Monte Carlo (SMC) parameter estimation algorithm, we can obtain very accurate estimates even in the limit of one bit of data per sequence length, using instead a variety of sequence lengths to probe the performance of our gate set. We also show that for gates with fidelities near unity, increasing the length of benchmarking sequences offers little compared to repeating experiments at already optimal sequence lengths. The SMC algorithm is based on Bayesian methods, which have been used successfully in a variety of quantum information processing tasks [28][29][30][31][32][33][34]. SMC has recently been used in quantum information to learn states [30] and Hamiltonians [35,36], and to provide robust error bounds on inferred parameters [37]. The primary cost incurred by the SMC algorithm is that the data must be simulated repeatedly; though this can be mitigated by using quantum resources [38][39][40]. Here we show that since the symmetries afforded by random benchmarking experiments can be used to simulate datasets with costs that are constant with respect to the dimension of the Hilbert space of interest [12], SMC can be implemented with little overhead. Thus, randomized benchmarking mitigates the primary disadvantage of SMC by removing the need to simulate the quantum system. Moreover, the method of hyperparameters [35] generalizes our approach to allow gate fidelities to be nontrivial functions of some other parameter of interest, such that the underlying parameter is learned directly. This approach is especially relevant if, for example, the effect of the unknown hyperparameter depends on an experimental choice, such that distinct benchmarking experiments can be used in concert in a straightforward way.
Our work proceeds first by defining the benchmarking model that we use, then showing bounds on the estimation of the parameters of this model using the Cramer-Rao bound. We then apply sequential Monte Carlo to the benchmarking model and compare to the performance of traditional methods, and to the optimal performance achievable with prior information, showing that our method offers distinct advantages, and is nearly optimal.

I. INTERPRETATION OF LIKELIHOOD AS MARGINALIZATION
Magesan et al [12] showed that the average fidelity F g taken over all randomized benchmarking sequences of a given length can be expressed in terms of the survival probability where E ψ is a measurement effect corresponding to a fiducial state ρ ψ , and whereŜ i m =Ŝ i m • · · · •Ŝ i 1 is the superoperator representing the sequence i m . In particular, the expectation value of this survival probability over all sequences of a given length m was shown to produce the uniform-average fidelity (2) We may thus interpret the fidelity averaged over a unitary design as a probability of survival in an experiment in which we do not know the sequence being performed. As discussed in detail in Appendix A, if sequences are fairly drawn from the 2-design independently of all other experimental choices, then this is a valid assumption, such that the marginalized survival probability can be taken as the likelihood for our randomized benchmarking model. Note that in the remainder of the paper, we will let ψ be fixed, and will drop the notation conditioning on this assumption.
Using the expansion of the marginalized survival F g (m) given by Magesan et al [12], we can rewrite the likelihood in a way that explicitly depends on the parameters of interest, and that no longer requires simulating the quantum dynamics of the system. Thus, we can use Bayesian methods without simulating the system under study. In particular, we consider the zerothorder model for parameters A 0 , B 0 and p given by and where F ave is the fidelity of the average channel Λ = E i,j [Λ i,j ], taken over time steps i and elements of the gate set j. By these definitions, for ideal preparation, evolution and measurement, A 0 = 1 − 1 d and B 0 = 1 d . Since we will often use the example of a qubit, we thus have that the ideal A 0 = B 0 = 1/2. A sketch of the derivation of this model is given in Figure 1. The interpretation of first-and higher-order models follows in a similar manner. Since we use the zeroth-order model as an example in this work, we will drop the subscript-0 for brevity.
Because the fidelity of a channel is invariant under Clifford twirling, the parameter p represents the strength of the depolarizing channel of fidelity F ave produced by twirling the average channel Λ, and can be used to recover F ave . Similarly, in the interleaved protocol [41], we consider two probabilities, p ref and pC , respectively representing the sequences with m random Clifford gates multiplied together, or interleaved with some gate C under study. From these probabilities, we can extract the referenced probability of gate error p := p ref /pC . Each of p ref and pC is traditionally extracted from a fit to the zeroth-or first-order model [54].

II. ACHIEVABLE ACCURACY
We now consider only the interleaved model since it is more general. For brevity, we represent the model by a vector x = (p, p ref , A, B), so that the likelihood function for the interleaved model is where we have labeled the survival event by "1" to more easily allow for using binomial distributions to consider sums over multiple measurements of the same sequence length.
Having defined our model, it is critical to account for the accuracy with which we can estimate the parameters using finite data records. Here, we extend the results of Epstein et al [24] by explicitly calculating the Fisher information of Pr(1|x). We can find a bound on the achievable estimation error in this model by appealing to the Cramér-Rao Bound [42], which states that the Fisher information matrix I(x) bounds the error matrix E(x) of any unbiased estimatorx by the inequality If the Fisher information matrix is singular, as is the case here when all of the measured sequences are of the same length, the inverse is taken to be the Moore-Penrose pseudo-inverse. With at least four different sequence lengths, however, we can break the degeneracy. Since this number depends on the dimension of the model and not the underlying Hilbert space, only four measurements are required to break the degeneracy, even for systems of higher dimension than qubits.
It is often the case that we are only interested inp, the survival probability, and hence the gate fidelity, of a particular gate [13]. In this case, we can bound the error of only that parameter by looking at a single element of the error and Fisher information matrix as To find the Fisher information for randomized benchmarking, we derive the Fisher score q of this model, conditioned on 1, where the similar expression for the outcome "0" follows immediately. With this, we can calculate the mance of the continuous infinity of possible estimators we could choose. However, given the difficulty of analytically computing the inverse of sums over Fisher information matrices of this form, we use numerical methods for its evaluation. In particular, QInfer [43] performs this calculation automatically, given an implementation of (8).
In experimentally relevant regimes, the task is to gain further accuracy when it is known a priori that the fidelity is high. To minimize the error in estimatingp, we maximize the corresponding element of the Fisher information matrix. Note that, as is shown in Figure 2, this optimum depends strongly on the value of A and B whenp, p ref ≈ 1. For ideal measurements and unital channels, as d → ∞ we have B → 0 and A → 1 such that for large systems, As shown in Figure 2, this can grow large for |1 − F| → 0, but even for fidelities near thresholds, such as |1 − F| ≈ 10 −3 as considered by [9], m opt remains manageable at about 500. The above calculation is relevant in scenarios where the parameters not of interest (that is, A and B) are known fairly well and the gate fidelity is already known to be near unity. If we have prior information that is not of this form, Bayesian analysis is better suited to the task.
The Bayesian analogue of Fisher information analysis is a straightforward generalization. We begin with a distribution π(x), called a prior, over the parameters. Ideally, this is a faithful encoding of the the experimenter's prior information, but the following analysis works equally well for any distribution. In particular, given a prior distribution π(x), the Bayesian information matrix J is then defined as [44] To calculate this we can perform a Monte Carlo integral over the prior by drawing samples x ∼ π and evaluating I at each x. The Bayesian Cramer-Rao Bound (BCRB) then states that the error matrix E : The calculation of the BCRB is naturally included into the sequential Monte Carlo algorithm, such that our approach bounds its own performance based on the best experimental data available. Moreover, contrary to the Cramer-Rao bound in Eq. (6), it is known that the mean of the posterior distribution minimizes the error [45].
Thus, we need not seek the optimal estimator, as it naturally arises from a representation of the posterior.

III. NUMERICAL EXAMPLES
In the numerical examples we consider here, we choose π to be a normal distribution with a mean vector (p, p ref , A, B) = (0.95, 0.95, 0.3, 0.5) and equal diagonal covariances given by a deviation of σ = 0.01. The least-squares fit estimator is seeded with an initial guess drawn from this prior, so as to fairly compare the estimators. This distribution is intersected with the hard constraints implied by definitions of the parameters, which defines the support of the prior as This distribution was chosen as the likelihood model is less degenerate given these constraints, such that it is easier to reason about bounds for approximately unimodal estimation strategies. Our choice of prior is not critical, however, as we will show later that our algorithm recovers well from the case in which we choose a "bad" prior.
To demonstrate the Bayesian approach, we compare the standard least squares fit (LSF) performance to the sequential Monte Carlo (SMC) algorithm [35], which computes estimates by updating the probability of each of a finite list of hypotheses according to Bayes' rule. In the case of randomized benchmarking, this consists of computing (5) for each hypothesis after each batch of measurements. We note that the cost of computing (5) is independent of the dimension of the system, such that randomized benchmarking explicitly avoids simulating quantum evolution with classical resources.
There are essentially two experimental design choices an experimenter can make: the length of the sequence m, and the number of repetitions K. In the first comparison, we fix the sequence length and vary K. In particular, we take all sequence lengths up to 100 for the reference signal and 50 for the interleaved signal. For each such K, we plot the mean squared error for the SMC and LSF estimators, along with the posterior variance, which provides an online estimate of the performance of SMC, and the Bayesian Cramer-Rao Bound. The results, shown in Figure 3, demonstrate that SMC can be used to obtain useful estimates ofp with a few orders of magnitude less data than is used by least-squares fitting. Moreover, this advantage becomes more pronounced as the number of shots per sequence length approaches one, such that SMC is especially useful in cases where data collection is expensive. We note that this advantage reflects both the performance of SMC itself, and the ability of SMC to take advantage of prior information: for small amounts of data, the least-squares fit estimator chooses estimates far from the initial guess drawn from the prior distribution, while the SMC estimate instead refines the prior. Moreover, SMC can accurately characterize its own performance and can obtain significantly closer accuracy to the ultimate bound given by the BCRB. These advantages are similar to other cases in which SMC shows a large advantage over traditional fitting methods for handling data that is far from deterministic [35,46].
In Figure 3 (bottom), we show the performance of SMC and LSF when the sequence lengths m vary and the number of shots K per sequence length is fixed, demonstrating that SMC can improve upon LSF especially for very short sequences. Moreover, we see the benefit from increasing the sequence length is minimal compared to repeating experiments at a given sequence length near the optimum length found from the Cramer-Rao bound.

IV. BENCHMARKING WITH SIMULATED GATES
Thus far in the analysis, we have used as a simulator the same zeroth-order model as is used to process and interpret the data. To demonstrate the utility of our approach in comparison with traditional LSFbased benchmarking, we now simulate gates according to a cumulant expansion, with physically realistic models. In particular, we use the superconducting model of [47] together with optimal control theory [26] to generate a set of gates implementing the target unitaries {1, X, Y, Z, H, P}, where H is the Hadamard gate, and where P = |0 0| + i |1 1| is the phase gate. We then use the superoperatorsŜ U for implementing each target unitary U obtained from a cumulant simulation [48,49] to sample from the likelihood function (1) [55].
To process these samples, we then use the zerothorder likelihood function (5) both as a model for sequential Monte Carlo and as a trial function for least-squares fitting. Since the actual implemented gates are known, we can compute the true parameters for comparison. In Table I, we show the true parameters, the result obtained using SMC, and the result obtained using least-squares fitting. The most important thing to note is that correct parameters are a distance 6.90 σ from the prior (meaning the true parameters are outside of the 99.9999998% credible ellipse). This shows that even in the case when the prior information fails to accurately capture the uncertainty in the true model, SMC still does well, providing evidence that our accelerated methods may also be robust, even when used to measure the fidelities of sets of gates with errors that are correlated between distinct gate types, or that include non-trivial unitary components [56]. We show this in more detail in Figure 4, comparing the posterior and prior distributions overp to the true and LSF-estimated values.
Finally, in Figure 5, we demonstrate the advantage of our method in the presence of physical gates to- gether with a more reasonable prior, and using approximately 10-fold less data than in Figure 4. Taken with other evidence of the robustness of SMC methods [39,46], these results thus show that our method is useful and provides advantages in data collection costs in experimentally-reasonable conditions. We also note that LSF provides an accurate estimate of p for the simulations with physical gates, but it appears to be at the expense of providing poor estimates for A and B. Given that the errors inp and those in A and B are not in general uncorrelated, that LSF often provides such poor estimates of A and B makes the estimates of p derived from LSF difficult to trust.
In this work, we have discussed the fundamental limits of the randomized benchmarking technique that are incurred due to small data sets, and have shown an algorithm that reliably saturates this optimum. In doing so, we have shown that by using sequential Monte Carlo, with a moderate tradeoff in computational costs, one can obtain as much as two orders of magnitude improvement in estimation accuracy, such that data collection requirements are similarly reduced by as much as a hundred-fold. Given the wide and expanding use of randomized benchmarking in experimental practice, this then translates to a significant performance benefit both in benchmarking, and in experimental protocols derived from benchmarking.
The second term corresponds to the mean variance over each fixed sequence i m , and governs how well we can estimate each F(m, i) individually. The first term, however, is more interesting, in that it measures the variance over sequences of the per-sequence survival probability p m,i = E d [d|i, m]. By the argument of Wallman and Flammia [25], this is small when the fidelity being estimated is close to 1; that is, when the gates being benchmarked are very good. For gates that are farther from the ideal Clifford operators, however, or for applications such as tomography via benchmarking [23], this term is not negligible, mandating that many different sequences must be taken forF g (m) to be a useful estimate of F g (m).
By demanding that each individual shot be drawn from an independently chosen sequence, our approach avoids this and samples from d|m directly. In this way, we see a similar effect as in [32]. In particular, it is not advantageous to concentrate one's sampling on one point, but to spread samples out and gain experimental variety. Here, the one shot per sequence limit plays the role of the one sample per time-point limit in the earlier discussion.