Introduction

While semiconductor spin qubits have been pioneered with GaAs-based quantum dot devices1,2,3,4,5,6,7, the adoption of isotopically purified silicon to avoid decoherence from nuclear spins has led to coherence times approaching one second8,9 and control fidelities above 99.9%10,11,12, thus meeting the requirements for scalable quantum computing regarding single-qubit performance. These results are generally achieved with resonant microwave control of individual spins via electric or magnetic fields. An additional key requirement is to controllably couple multiple qubits. A natural and widespread approach is to use the exchange interaction between tunnel-coupled electron spins, which can also be used to manipulate qubits encoded in two or more electron spins13,14. Advantages of this approach include a short gate duration and the avoidance of microwaves, which is a considerable simplification regarding power dissipation, complexity of control systems and addressability in the context of scaling to large qubit numbers. Exchange-based two-qubit gates of individual spins as well as two-electron single-qubit gates have reached fidelities up to about 98%15,16 before our work. However, qubit control via the exchange interaction is also associated with certain challenges like the need for strong driving well beyond the rotating wave approximation, nonlinear coupling to control fields, a susceptibility to charge noise that scales with the interaction strength17, and a high sensitivity to the detailed shape of baseband control pulses.

To address these difficulties, we numerically optimized control pulses for exchange-based single-qubit gates with fidelities approaching 99.9% in previous work18. Remaining inaccuracies in these optimized pulses can be removed by a closed-loop gate set calibration protocol (GSC), which allows the iterative tune-up of gates using experimental feedback18,19. In comparison to automated calibration of single-spin12 and superconducting qubits20,21, GSC extracts tomographic information about all unitary degrees of freedom to improve convergence. In addition to recalibration of drifting parameters, this method is suited for tune-up of gates with initial infidelities larger than 10%18. Here, it allows us to optimize roughly an order of magnitude more parameters than before12,20,21 to fully leverage the degrees of freedom provided by our hardware.

With this approach we achieve accurate control of GaAs-based singlet-triplet qubits encoded in two electron spins with a fidelity of 99.50 ± 0.04% (in contrast to a preliminary preprint22 of the present study, we used a device whose charge noise level is more representative of those reported by other groups17). For comparison, state-of-the-art single-spin control in GaAs leads to a fidelity of 96%23. Furthermore, our result is in the range required by certain quantum error correction schemes24,25 and validates corresponding simulations18, which also predict a similar performance for two-qubit control26. A comparable fidelity of 99.6% has also recently been demonstrated for singlet-triplet single-qubit gates in Si27. In addition, we demonstrate a low leakage rate of 0.13% out of the subspace of valid qubit states, an important consideration for any qubit encoded in multiple spins28. A similar leakage rate of 0.17% has been observed in an isotopically purified Si device using a three-spin encoding29. Besides addressing the challenges of exchange-based qubit control, these results also open new perspectives for GaAs-based devices.

Results

Singlet-triplet qubit

The S − T0 spin qubit14 used in this work is illustrated in Fig. 1 and can be described by the Hamiltonian \(H=\frac{\hslash J(\epsilon )}{2}{\sigma }_{x}+\frac{\hslash \Delta {B}_{z}}{2}{\sigma }_{z}\) in the \((\left|\uparrow \downarrow \right\rangle =\left|0\right\rangle ,\left|\downarrow \uparrow \right\rangle =\left|1\right\rangle )\) basis, where arrows denote electron spin up and down states. J(ϵ) denotes the exchange splitting between the singlet \(\left|{\rm{S}}\right\rangle =(\left|\uparrow \downarrow \right\rangle -\left|\downarrow \uparrow \right\rangle )/\sqrt{2}\) and sz = 0 triplet state \(\left|{{\rm{T}}}_{0}\right\rangle =(\left|\uparrow \downarrow \right\rangle +\left|\downarrow \uparrow \right\rangle )/\sqrt{2}\), while ΔBz is the magnetic field gradient across both dots from different nuclear-spin polarizations2. The remaining triplet states, \(\left|{{\rm{T}}}_{+}\right\rangle =\left|\uparrow \uparrow \right\rangle\) and \(\left|{{\rm{T}}}_{-}\right\rangle =\left|\downarrow \downarrow \right\rangle\), represent undesirable leakage states. J(ϵ) is manipulated by the detuning ϵ, the potential difference between both dots. We use standard state initialization and readout based on electron exchange with the lead, Pauli blockade and charge sensing (see Methods). For single-qubit operations, ϵ is pulsed on a nanosecond timescale using an arbitrary waveform generator (AWG), whereas ΔBz is typically stabilized at 2π · (42.1 ± 2.8) MHz by dynamic nuclear polarization (DNP)5. The resulting dynamics are illustrated in Fig. 1. In our simulations we use the experimentally motivated model \(J(\epsilon )={J}_{0}\exp (\epsilon /{\epsilon }_{0})\)17 to capture the nonlinear relation between control voltage and exchange coupling.
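
As a short worked step, the Hamiltonian above can be written in matrix form in the stated basis. At fixed detuning it generates a rotation at the angular frequency Ω about an axis that is tilted away from the ΔBz axis by an angle θ. The symbols Ω and θ are introduced here for illustration only and do not appear in the original text.

\[
H=\frac{\hslash }{2}\left(\begin{array}{cc}\Delta {B}_{z} & J(\epsilon )\\ J(\epsilon ) & -\Delta {B}_{z}\end{array}\right),\qquad {E}_{\pm }=\pm \frac{\hslash \Omega }{2},\qquad \Omega =\sqrt{J{(\epsilon )}^{2}+\Delta {B}_{z}^{2}},\qquad \tan \theta =\frac{J(\epsilon )}{\Delta {B}_{z}}
\]

For \(J\ll \Delta {B}_{z}\) the tilt θ is small and the qubit precesses essentially about the ΔBz axis, whereas for \(J\gg \Delta {B}_{z}\) the rotation axis approaches the exchange axis.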

Fig. 1: ST0 qubit energy diagram and Bloch sphere.

The eigenenergies change as a function of detuning ϵ, which is used to control the exchange coupling J(ϵ). The ϵ pulses presented in this work start and finish at a baseline \({\epsilon }_{\min }\) with low J and pulse to higher values for short periods. The maximum amplitude is constrained to below the S−T+ anticrossing at large ϵ. We choose the convention that J(ϵ) points along the y-axis of the Bloch sphere. For low ϵ amplitudes, the qubit rotates about ΔBz, the z-axis of the Bloch sphere. Large amplitude ϵ pulses rotate the qubit about the y-axis and thus enable arbitrary single-qubit gates.

Numerical pulse optimization

To experimentally implement accurate single-qubit π/2 rotations around the x- and y-axis (denoted by π/2x and π/2y), we use a control loop adapted from ref. 18 (illustrated in Fig. 2a, b) in conjunction with numerically optimized control pulses. To obtain a reasonably accurate system model for the numerical optimization procedure, we measure the step response of our electrical setup, J0, ϵ0, and ΔBz. In addition, we determine the coherence properties of the qubit to construct a noise model including quasistatic hyperfine noise, quasistatic charge noise, and white charge noise. The details of the noise and control model are discussed further below and in Supplementary Notes 1 to 6. Next, we use this model to numerically optimize pulses consisting of Nseg piece-wise constant nominal detuning values \({\epsilon }_{j},j=1\ldots {N}_{{\rm{seg}}}\) to be programmed into the AWG with a segment duration of 1 ns. The last four to five segments are set to the same baseline level \({\epsilon }_{\min }\) for all gates to minimize errors arising from pulse transients of previous pulses. We choose \({\epsilon }_{\min }\) such that \(J({\epsilon }_{\min })\ll \Delta {B}_{z}\). Typical optimized pulse profiles \({\epsilon }_{j}^{g},j=1\ldots {N}_{{\rm{seg}}}\) for two gates g = π/2x and g = π/2y are shown in Fig. 2a.
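
To make the piece-wise constant pulse model concrete, the following minimal sketch computes the ideal unitary of such a pulse from the Hamiltonian and the exponential exchange model given earlier. It is not the optimization code used in this work: it ignores the measured step response and all noise sources, and the values of `J0`, `EPS0`, `EPS_MIN` and the pulse shape are hypothetical placeholders. Only the ΔBz value and the 1 ns segment duration are taken from the text.

```python
import numpy as np

# Pauli matrices in the (|up,down> = |0>, |down,up> = |1>) basis used in the text.
SIGMA_X = np.array([[0, 1], [1, 0]], dtype=complex)
SIGMA_Z = np.array([[1, 0], [0, -1]], dtype=complex)
IDENTITY = np.eye(2, dtype=complex)


def exchange(eps, j0, eps0):
    """Exchange model J(eps) = J0 * exp(eps / eps0), as an angular frequency."""
    return j0 * np.exp(eps / eps0)


def segment_unitary(j, dbz, dt):
    """Closed-form exp(-i H dt / hbar) for H / hbar = (J / 2) sx + (dBz / 2) sz."""
    omega = np.hypot(j, dbz)  # total rotation rate sqrt(J^2 + dBz^2)
    if omega == 0.0:
        return IDENTITY.copy()
    axis = (j * SIGMA_X + dbz * SIGMA_Z) / omega
    half_angle = 0.5 * omega * dt
    return np.cos(half_angle) * IDENTITY - 1j * np.sin(half_angle) * axis


def pulse_unitary(eps_segments, j0, eps0, dbz, dt=1e-9):
    """Time-ordered product of segment unitaries (first segment acts first)."""
    u = IDENTITY.copy()
    for eps in eps_segments:
        u = segment_unitary(exchange(eps, j0, eps0), dbz, dt) @ u
    return u


# Illustrative numbers only: dBz is taken from the text, while J0, eps0,
# eps_min and the pulse shape are hypothetical placeholders.
DBZ = 2 * np.pi * 42.1e6   # rad/s
J0 = 2 * np.pi * 100e6     # rad/s (assumed)
EPS0 = 0.3                 # mV (assumed)
EPS_MIN = -2.0             # mV (assumed), gives J(eps_min) << dBz

pulse = np.full(36, EPS_MIN)   # 36 segments of 1 ns each, ending at the baseline
pulse[5:20] = 0.2              # arbitrary excursion to larger J
gate = pulse_unitary(pulse, J0, EPS0, DBZ)
```

A numerical optimizer would vary the entries of `pulse` until `gate` matches the target rotation. A realistic model would additionally filter the nominal segment values with the measured step response before evaluating J(ϵ).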

Fig. 2: Gate set calibration (GSC).

a Numerical pulse optimization based on a realistic but inaccurate qubit model provides initial optimal control pulses (blue) for 36 ns long π/2x and π/2y gates. According to the model, the pulses shown in red are actually seen by the qubit. b Next, these pulses are optimized on the experiment using closed-loop feedback. Eight error syndromes \(\tilde{{S}_{i}}\) are extracted in each iteration by applying the gate sequences from Table 1. In order to remove gate errors, the syndromes \(\tilde{{S}_{i}}\) are minimized by adjusting the pulse segments' amplitudes \({\epsilon }_{j}^{g}\). c Typically, GSC converges within 15 iterations and can recover from charge rearrangements in the quantum dot (indicated by a red dot, see Supplementary Note 16). Before iteration 1, the gate fidelity is typically so low that randomized benchmarking33 (RB) cannot be used to reliably extract the gate fidelity. This is remedied by scaling the pulses before the first iteration, leading to an average Clifford gate fidelity between 63 and 70%. In this specific calibration run, the feedback loop improved the fidelity of the gate set first to 99.0 ± 0.1%, then to 99.3 ± 0.1% (after disabling the decoherence syndromes \(\tilde{{S}_{7}}\) and \(\tilde{{S}_{8}}\)) and eventually to 99.50 ± 0.04% (after adding a small correction of 0.05 to SM). All of these fidelities are extracted using RB. d, e For a different gate set consisting of two 24 ns long pulses, we performed self-consistent state tomography32. After a few GSC iterations, the simulated Bloch sphere trajectories (right) can be reproduced in the experiment (left). A major portion of the remaining deviation can be attributed to concatenation errors with the measurement pulses, specifically when states following large J pulses are determined.

Experimental gate set calibration

Since our control model does not capture all effects to sufficient accuracy to directly achieve high-fidelity gates, these pulses need to be refined using experimental feedback. Hence, error information about the gate set is extracted in every iteration of our control loop. Standard quantum process tomography cannot be applied to extract this information as it requires well-calibrated gates, which are not available before completion of the control loop. We solve this bootstrap problem with a self-consistent method that extracts eight error syndromes \({S}_{i},i=1\ldots 8\) in each iteration18. The first six syndromes are primarily related to over-rotation and off-axis errors while the remaining two syndromes are proxies for decoherence. A syndrome Si is measured by preparing \(\left|0\right\rangle\), applying the corresponding sequence Ui of gates from Table 1, and determining the probability \(p(\left|0\right\rangle )\) of obtaining the state \(\left|0\right\rangle\) by measuring the sequence \({10}^{3}\) to \({10}^{4}\) times. For perfect gates, the first six syndromes30 should yield \(p(\left|0\right\rangle )=0.5\), corresponding to \({S}_{i}=\langle {\sigma }_{z}\rangle =0\). The last two syndromes should yield \(p(\left|0\right\rangle )=0\) (Si = −1). Deviations of Si from the expected values indicate decoherence and systematic errors in the gate set. To make our method less sensitive to state preparation and measurement (SPAM) errors, we also prepare and read out a completely mixed state with measurement result SM, and a triplet state \(\left|{{\rm{T}}}_{0}\right\rangle\), which yields the measurement result ST after correcting for the approximate contrast loss of the triplet preparation (see Supplementary Note 12). GSC then minimizes the norm of the modified error syndromes \(\tilde{{S}_{i}}={S}_{i}-{S}_{{\rm{M}}}\) for i = 1…6 and \(\tilde{{S}_{i}}={S}_{i}-{S}_{{\rm{T}}}\) for i ∈ {7, 8}.
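
The bookkeeping of the modified syndromes can be sketched in a few lines. This is a hedged illustration rather than the analysis code of this work: the function names are hypothetical, and the contrast-loss correction of the triplet reference (Supplementary Note 12) is assumed to have been applied to `s_triplet` beforehand.

```python
import numpy as np


def syndrome(p0):
    """Convert return probabilities into S = <sigma_z> = 2 * p(|0>) - 1."""
    return 2.0 * np.asarray(p0, dtype=float) - 1.0


def modified_syndromes(s, s_mixed, s_triplet):
    """Return S~_i = S_i - S_M for i = 1..6 and S~_i = S_i - S_T for i = 7, 8."""
    s = np.asarray(s, dtype=float)
    if s.shape != (8,):
        raise ValueError("expected exactly eight syndromes")
    reference = np.array([s_mixed] * 6 + [s_triplet] * 2, dtype=float)
    return s - reference


# Ideal case: perfect gates and perfect references give a vanishing residual.
ideal_p0 = [0.5] * 6 + [0.0, 0.0]
residual = modified_syndromes(syndrome(ideal_p0), s_mixed=0.0, s_triplet=-1.0)
cost = np.linalg.norm(residual)
```

Because the references are measured with the same preparation and readout as the syndromes, a common offset in the measured values cancels in the differences, which is the reason for the reduced SPAM sensitivity described above.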

Table 1 Tomographic gate sequences.

For swift convergence, we start the control loop with numerically optimized pulses \({\epsilon }_{j}^{g}\), which theoretically implement the desired operations without systematic errors and with minimal decoherence by partially decoupling from slow noise (similar to dynamically corrected gates31). First, we scale these pulses by ±10% in 2% increments and measure which scaling achieves the lowest \(\tilde{{S}_{i}}\). GSC then optimizes the best pulses by minimizing \(\tilde{{S}_{i}}\) with the Levenberg–Marquardt algorithm (LMA). In each LMA iteration, we use finite differences to experimentally estimate derivatives \(d\tilde{{S}_{i}}/d{\epsilon }_{j}^{g}\), which are subsequently used to calculate updated pulse amplitudes \({\epsilon }_{j}^{g}\). Note that ΔBz is not calibrated since the control via DNP is more involved than adjusting ϵ.
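
The structure of this loop can be sketched as follows, with a toy function standing in for the experiment. Several points are assumptions rather than details from the text: the scaling scan is read as scale factors from 0.90 to 1.10, forward differences are used for the derivatives, and the accept/reject damping schedule is a generic textbook choice. `toy_measure`, `TARGET` and `MIXING` are purely hypothetical.

```python
import numpy as np


def best_scaling(measure, eps):
    """Scan scale factors 0.90, 0.92, ..., 1.10 and keep the lowest residual norm."""
    scales = 1.0 + np.arange(-10, 11, 2) / 100.0
    norms = [np.linalg.norm(measure(s * eps)) for s in scales]
    return scales[int(np.argmin(norms))]


def finite_difference_jacobian(measure, eps, step):
    """Forward differences dS~_i/d eps_j; one extra measurement per parameter."""
    base = measure(eps)
    jac = np.empty((base.size, eps.size))
    for j in range(eps.size):
        shifted = eps.copy()
        shifted[j] += step
        jac[:, j] = (measure(shifted) - base) / step
    return base, jac


def calibrate(measure, eps_start, step=1e-3, damping=1e-2, iterations=15):
    """Levenberg-Marquardt loop with a simple accept/reject damping schedule."""
    eps = np.array(eps_start, dtype=float)
    for _ in range(iterations):
        residual, jac = finite_difference_jacobian(measure, eps, step)
        normal = jac.T @ jac + damping * np.eye(eps.size)
        trial = eps + np.linalg.solve(normal, -jac.T @ residual)
        if np.linalg.norm(measure(trial)) < np.linalg.norm(residual):
            eps, damping = trial, damping / 3.0  # accept, reduce damping
        else:
            damping *= 3.0                       # reject, increase damping
    return eps


# Toy stand-in for the experiment: eight smooth "syndromes" of three
# parameters that vanish at TARGET. Entirely hypothetical.
TARGET = np.array([0.30, -0.20, 0.50])
MIXING = np.vstack([np.eye(3), 0.5 * np.eye(3), 0.1 * np.ones((2, 3))])


def toy_measure(eps):
    return MIXING @ np.tanh(eps - TARGET)


start = TARGET / 1.04                       # mis-scaled initial guess
scale = best_scaling(toy_measure, start)    # picks the factor 1.04
tuned = calibrate(toy_measure, scale * start + 0.05)
```

In an experiment every call to `measure` costs real measurement time, so the number of free parameters directly sets the cost of each Jacobian estimate.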

Convergence

Pulses with Nseg ≥ 24 lead to reliable convergence in several calibration runs, typically within 15 iterations as shown in Fig. 2c. Thus, all experiments were performed using 36 ns long gates. An exception is the extraction of gate trajectories in Fig. 2d, e, where 24 ns long gates were used. We have chosen the calibration algorithm such that it only adjusts those segments \({\epsilon }_{j}^{g}\) that are not at the baseline, resulting in 50 free parameters for the 36 ns long π/2x and π/2y gates shown in Fig. 2a. Sometimes, moderate charge rearrangements in our sample lead to a deterioration of the optimized gates. As a remedy we run GSC again, resulting in slightly different gates than before. While the initial tune-up took about 2 days, recalibration from previously tuned-up gates only takes on the order of minutes to hours, depending on how much the sample tuning changed in between. To visualize the experimental gates, we perform self-consistent quantum state tomography (QST)32 and extract state information after each segment \({\epsilon }_{j}^{g}\). As seen in Fig. 2d, e, the qubit state trajectories for model and experiment closely resemble each other, indicating that the GSC-tuned pulses remain close to the optimum found in simulations.

Fidelity and leakage benchmarking

In order to determine the gate fidelity \({\mathcal{F}}\), we apply randomized benchmarking (RB) after completion of GSC. In RB, the fidelity is obtained by applying sequences of randomly chosen Clifford gates, here composed of π/2x and π/2y gates, to the initial state \(\left|0\right\rangle\). The last Clifford operation of each sequence is chosen such that \(\left|0\right\rangle\) would be recovered if the gates were perfect33. For imperfect gates, the return probability \(p(\left|0\right\rangle )\) decays as a function of sequence length and the decay rate indicates the average error per gate. In addition, we also apply an extended RB protocol, which omits the last Clifford from each RB sequence28. Without leakage, averaging over many randomly chosen sequences should yield \(p(\left|0\right\rangle )=50 \%\). However, for nonzero leakage we expect a single exponential decay of \(p(\left|0\right\rangle )\) as a function of increasing sequence length since the additional leakage states have the same readout signature as \(\left|1\right\rangle\) (see methods).

We indeed find such a decay law, indicated in blue in Fig. 3. A joint fit of the standard (red) and leakage detection (blue) RB data yields \({\mathcal{F}}=99.50\pm 0.04 \%\) and an incoherent gate leakage rate \({\mathcal{L}}=0.13\pm 0.03 \%\)28 (in the same GSC run we observed fidelities of 99.4% and higher six times after different iterations in separate RB runs). Since our pulses operate close to the S − T+ transition while \(\left|{{\rm{T}}}_{-}\right\rangle\) is far away in energy, leakage should predominantly occur into the \(\left|{{\rm{T}}}_{+}\right\rangle\) level. In another sample we observed 0.4 ± 0.1% leakage and much higher fast charge noise levels (indicated by a low echo time of \({T}_{2}^{{\rm{echo}}}=183\ {\rm{ns}}\) for exchange oscillations)22. This suggests that leakage is predominantly driven by charge noise.
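
For orientation, the standard RB decay can be fitted as sketched below. This is not the joint fit model of this work, which also includes the leakage-detection data and is given in Supplementary Note 18. The sketch assumes the textbook single-exponential model p(m) = a·α^m + b and the standard relation between the decay constant α and the average fidelity per Clifford. Converting to a fidelity per π/2 gate would additionally require the average number of gates per Clifford, which is not reproduced here. The synthetic data and all parameter values are made up.

```python
import numpy as np


def fit_rb_decay(lengths, p0, alphas=np.linspace(0.9, 0.99999, 5000)):
    """Fit p(m) = a * alpha**m + b: scan alpha, solve a and b by least squares."""
    lengths = np.asarray(lengths, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    best = None
    for alpha in alphas:
        design = np.column_stack([alpha ** lengths, np.ones_like(lengths)])
        coeff, *_ = np.linalg.lstsq(design, p0, rcond=None)
        sse = float(np.sum((design @ coeff - p0) ** 2))
        if best is None or sse < best[0]:
            best = (sse, alpha, coeff[0], coeff[1])
    _, alpha, a, b = best
    return alpha, a, b


def average_fidelity(alpha, dim=2):
    """Standard RB relation F = 1 - (1 - alpha) * (dim - 1) / dim."""
    return 1.0 - (1.0 - alpha) * (dim - 1) / dim


# Synthetic, noise-free data with a made-up decay constant.
m = np.arange(1, 201, 10)
p = 0.45 * 0.98 ** m + 0.5
alpha_fit, a_fit, b_fit = fit_rb_decay(m, p)
clifford_fidelity = average_fidelity(alpha_fit)
```

Scanning α and solving for a and b linearly avoids a nonlinear optimizer and its dependence on starting values, at the price of a resolution limited by the α grid.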

Fig. 3: Characterization of optimized gate sets.

The overall fidelity of a gate set consisting of 36 ns long π/2x and π/2y gates is determined using RB (red) after a specific calibration run. Each data point is an average over 50 randomly chosen sequences of the respective length; the error bars indicate the standard error of the mean. In order to determine incoherent gate leakage we supplement the standard protocol (red) by a variant that omits the last inversion pulse (blue)28. Simultaneously fitting both curves yields \({\mathcal{F}}=99.50\pm 0.04 \%\) and a leakage rate \({\mathcal{L}}=0.13\pm 0.03 \%\). For this measurement, ΔBz was stabilized with a standard deviation of \({\sigma }_{\Delta {B}_{z}}=2\pi \cdot 2.8\ {\rm{MHz}}\). The exact fit model is given in Supplementary Note 18.

Agreement with theoretical results

We previously predicted fidelities approaching 99.9% for GaAs-based S − T0 qubits18 with the best reported noise levels5,17. To make a sample-specific comparison, we measured \({T}_{2}^{* }\) and \({T}_{2}^{{\rm{echo}}}\) by performing free induction decay and echo experiments for both hyperfine and exchange driven oscillations at several detunings. For J(ϵ) = 2π · 119 MHz we find that 726 coherent exchange oscillations are possible within \({T}_{2}^{{\rm{echo}}}=6.1\ \mu\)s. This is larger than the 585 oscillations reported in ref. 17 for a comparable charge noise sensitivity dJ/dϵ ≈ 2π · 160 MHz/mV. Based on these measurements, we construct a noise model including white charge noise and contributions from quasistatic hyperfine and charge noise (see Supplementary Note 3) to evaluate the fidelities of the numerically optimized gates used as a starting point for GSC (see Supplementary Note 6). Without systematic errors, their theoretical fidelities of 99.86% (π/2x) and 99.38% (π/2y) correspond to an average fidelity of 99.62%. The good agreement with the experimentally observed average fidelity of 99.50 ± 0.04%, which includes residual systematic errors, indicates that this noise model can be used to obtain good estimates of the achievable fidelities. Using this model we estimate that speeding up the gate pulses by a factor of 6 would increase the fidelity to at least 99.8% by further reducing the effect of ΔBz fluctuations. Additional improvements are possible by using barrier control34 or reducing charge noise and residual systematic errors.
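
The two derived numbers in this paragraph follow directly from the quoted values. The symbols \({N}_{{\rm{osc}}}\) and \(\bar{{\mathcal{F}}}\) are introduced here only for this check.

\[
{N}_{{\rm{osc}}}=\frac{J}{2\pi }\,{T}_{2}^{{\rm{echo}}}=119\ {\rm{MHz}}\times 6.1\ \mu {\rm{s}}\approx 726,\qquad \bar{{\mathcal{F}}}=\frac{99.86 \% +99.38 \% }{2}=99.62 \%
\]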

Discussion

One implication of our results is that the unavoidable presence of nuclear spins in GaAs spin qubits, which is often thought of as prohibitive for their technological prospects, actually does not preclude the fidelities required for fault-tolerant quantum computing. While the overhead associated with mitigating nuclear-spin-induced decoherence remains a disadvantage5, GaAs quantum dot devices currently tend to be more reproducible and require less challenging lithographic feature sizes. Furthermore, they avoid the complication of several near-degenerate conduction band valleys and are better suited for the conversion between spin states and flying photonic qubits due to the direct band gap35,36. Given that GaAs has certain advantages over Si (but also other disadvantages), our study contributes to a well-founded comparison of the two material systems. While the measured fidelities are somewhat lower than achieved in some Si devices, further improvement is possible. Moreover, two-qubit gates will be a more decisive factor for scalability. Our simulations based on the same type of model validated by the present experiments predict two-qubit fidelities of 99.9%26.

Independent of the host material, a strength of S − T0 qubits is their gate duration of a few tens of nanoseconds. These gates require only sub-gigahertz electrical control, in contrast to the manipulation frequencies typically used for single-spin qubits, which often lead to slower gates with durations on the order of one microsecond. Furthermore, these relaxed control hardware requirements could substantially facilitate the adoption of integrated cryogenic control electronics.

Although driven by the needs of GaAs-based S − T0 qubits, we expect that our approach is equally viable for other qubit types—specifically if many free gate parameters need to be tuned, if it is not intuitive how these parameters affect the qubit operations or if the qubit control model is not very accurate. As such, future work will focus on implementing exchange-mediated two-qubit gates26 in GaAs or Si, and experimentally studying the factors determining the success rate of GSC. With appropriate parallelization and simplifications (e.g., fixing the gradients used by GSC), our method could also be adapted for the calibration of larger qubit arrays.

Methods

Qubit system

This work was performed using two different samples. The first sample is identical to the one in ref. 22 and was used to establish the calibration routine and measure the Bloch sphere trajectories. The gate fidelities were obtained in a second sample with lower charge noise.

We work in a dilution refrigerator at an electron temperature of about 130 mK. A lateral double quantum dot is defined in the two-dimensional electron gas of a doped, molecular-beam epitaxy-grown GaAs/AlGaAs-heterostructure by applying voltages to metallic surface gates. For both samples, we use the same gate layout as ref. 7 and ref. 37 shown in Fig. 2b with two dedicated RF gates (yellow) for controlling the detuning. As we only apply RF pulses to these gates and no DC bias, we can perform all qubit operations without the need for bias tees, which reduces pulse distortions.

Quantum gates are performed in the (1,1) charge configuration, where one electron is confined in the left and one in the right quantum dot. In this regime, the computational subspace is defined by the sz = 0 triplet state \(\left|{{\rm{T}}}_{0}\right\rangle\) and the spin singlet state \(\left|{\rm{S}}\right\rangle\). The other sz = ±1 (1,1) triplet states \(\left|\uparrow \uparrow \right\rangle\) and \(\left|\downarrow \downarrow \right\rangle\) are split off energetically via the Zeeman effect by applying an external magnetic field of 500 mT.

We always read out and initialize the dot in the \((\left|\uparrow \downarrow \right\rangle ,\left|\downarrow \uparrow \right\rangle )\) basis by pulsing slowly from (0,2) to (1,1) and thus adiabatically mapping singlet \(\left|{\rm{S}}\right\rangle\) and triplet \(\left|{{\rm{T}}}_{0}\right\rangle\) to \(\left|\uparrow \downarrow \right\rangle\) and \(\left|\downarrow \uparrow \right\rangle\) (see Supplementary Note 7).

Readout calibration

For measuring the quantum state, we discriminate between singlet and triplet states by Pauli spin blockade. Through spin-to-charge conversion2, the resistance of an adjacent sensing dot depends on the spin state and can be determined by RF-reflectometry38. In this manner, we obtain different readout voltages for singlet and triplet states but cannot distinguish between \(\left|{{\rm{T}}}_{0}\right\rangle\) and the triplet states \(\left|{{\rm{T}}}_{\pm }\right\rangle\).

The measured voltages are processed in two ways. First, binning on the order of \({10}^{4}\) consecutive single-shot measurements yields bimodal histograms where the two peak voltages roughly correspond to the singlet and triplet state. Second, the measured voltages are averaged over many repetitions of a pulse to reduce noise.

For the benchmarking experiments (which were performed using the second sample), we linearly convert the averaged voltages to probabilities \(p(\left|0\right\rangle )\). The parameters of the linear transformation are obtained by fitting the histograms with a model that takes the decay of \(\left|{{\rm{T}}}_{0}\right\rangle\) to \(\left|{\rm{S}}\right\rangle\) during the readout phase into account4.

In the first sample we also observed considerable excitation of \(\left|{\rm{S}}\right\rangle\) to \(\left|{{\rm{T}}}_{0}\right\rangle\). Thus, we modify the histogram fit model for the self-consistent state tomography data (which was obtained with the first sample) by introducing the excitation of \(\left|{\rm{S}}\right\rangle\) to \(\left|{{\rm{T}}}_{0}\right\rangle\) as an additional fit parameter.

For GSC, the averaged voltages Ui corresponding to the error syndromes Si do not need to be explicitly converted to probabilities \(p(\left|0\right\rangle )\). Since mixed and triplet state reference voltages UM and UT are measured alongside the error syndromes, it is attractive to directly minimize the norm of \(\tilde{{U}_{i}}={U}_{i}-{U}_{{\rm{M}}}\) for i = 1…6 and \(\tilde{{U}_{i}}={U}_{i}-{U}_{{\rm{T}}}\) for i ∈ {7, 8}.

Note that RB is insensitive to state preparation and measurement (SPAM) errors. In addition, we decrease the sensitivity of GSC to SPAM errors by using reference measurements for a completely mixed and a triplet state. Therefore, our readout calibration does not need to be especially accurate or precise.

Further information regarding readout can be found in Supplementary Notes 7 to 11.