Machine Learning for Continuous Quantum Error Correction on Superconducting Qubits

Continuous quantum error correction has been found to have certain advantages over discrete quantum error correction, such as a reduction in hardware resources and the elimination of error mechanisms introduced by having entangling gates and ancilla qubits. We propose a machine learning algorithm for continuous quantum error correction that is based on the use of a recurrent neural network to identify bit-flip errors from continuous noisy syndrome measurements. The algorithm is designed to operate on measurement signals deviating from the ideal behavior in which the mean value corresponds to a code syndrome value and the measurement has white noise. We analyze continuous measurements taken from a superconducting architecture using three transmon qubits to identify three significant practical examples of non-ideal behavior, namely auto-correlation at short temporal lags, transient syndrome dynamics after each bit-flip, and drift in the steady-state syndrome values over the course of many experiments. Based on these real-world imperfections, we generate synthetic measurement signals from which to train the recurrent neural network, and then test its proficiency when implementing active error correction, comparing this with a traditional double threshold scheme and a discrete Bayesian classifier. The results show that our machine learning protocol is able to outperform the double threshold protocol across all tests, achieving a final state fidelity comparable to the discrete Bayesian classifier.


I. INTRODUCTION
The prevalence of errors acting upon quantum states, either as a result of imperfect quantum operations or decoherence arising from interactions with the environment, severely limits the implementation of quantum computation on physical qubits. A variety of methods have been proposed to suppress the frequency of these errors, such as dynamic decoupling [1], application of a penalty Hamiltonian [2], decoherence-free subspace encoding [3], and near-optimal recovery based on process tomography [4,5]. In addition to these tools for error prevention, there exist many schemes for quantum error correction (QEC) that are able to return the system to its proper configuration after an error occurs [6]. The ability to correct errors rather than just suppress them is vital to the development of fault-tolerant quantum computation [7].
An essential feature of QEC is the measurement of certain error syndrome operators, which provides information about errors on the physical qubits without collapsing the logical quantum state. In the canonical approach, quantum error correction is conducted in a discrete manner, using quantum logic gates to transfer the qubit information to ancilla qubits and subsequently making projective measurements on these to extract the error syndromes. However, in contrast to this theoretical idealization of instantaneous projections of the quantum state, experimental implementation of such measurements inherently involves performing weak measurements over finite time intervals [8], with the dispersive readouts in superconducting qubit architectures constituting the prime example of this in today's quantum technologies [9][10][11][12]. This has motivated the development of continuous quantum error correction (CQEC) [13][14][15][16][17][18][19][20][21][22], where the error syndrome operators are measured weakly in strength and continuously in time.
CQEC operates by directly coupling the data qubits to continuous readout devices. This avoids the ancilla qubits and periodic entangling gates found in discrete QEC, reducing hardware resources. Additionally, the presence of these entangling gate sequences and ancillas introduces additional error mechanisms, occurring in between entangling gates or on ancillas, that can cause logical errors [20,22]. On noisy quantum hardware, multiple rounds of entangling gates and ancilla readouts are required to accurately identify the system state [23]. All of this is also avoided by measuring data qubits directly, as in CQEC.
In addition to quantum memory, CQEC naturally lends itself to modes of quantum computation involving continuous evolution under time-dependent Hamiltonians, such as adiabatic quantum computing [24] and quantum simulation [25]. Given that the Hamiltonians considered generally do not commute with the error operators, the action of an error induces spurious Hamiltonian evolution within the corresponding error subspace until the error is ultimately diagnosed and corrected, resulting in the accrual of logical errors [21]. CQEC can effectively shorten the spurious evolution time in the error subspaces, and therefore increase the target state fidelity in quantum annealing.
Previous theoretical work on CQEC has focused primarily on measurement signals that behave in an idealized manner [19][20][21], such that each sample is assumed to be i.i.d. Gaussian with a mean given by one of the syndrome eigenvalues. However, in real dispersive readout signals we observe a wide variety of "imperfections" caused by hardware limitations and post-processing effects, which can lead to more complicated syndrome dynamics or significant alterations to the noise distribution. A well-calibrated CQEC protocol should be designed to take into account any significant non-ideal behavior for a given architecture. However, it is often difficult to generate a precise mathematical description of the imperfections present in real measurement signals.
Machine learning algorithms offer a solution to this problem, as they can be optimized to solve a task by looking directly at the relevant data instead of relying on hard-coded decision rules. Highly expressive models involving multiple neural network layers have proven to be particularly effective at solving complex tasks such as image recognition and language translation [26]. The recurrent neural network (RNN) is a popular sequential learning model, because it operates on inputs of varying length and provides an output at each step. After being trained on a set of non-ideal measurement signals, an RNN can function as a CQEC algorithm by generating probabilities which describe the likelihood of an error at a given time step. Most importantly, the flexibility of the algorithm allows it to handle imperfections in the signal that would otherwise be impractical to model.
In this paper we investigate the performance of an RNN-based CQEC algorithm which acts on measurement signals with non-ideal behavior. We emphasize here active correction, in which errors are corrected during the experiment as soon as they are observed. To quantify the benefits of using a neural network, we compare the RNN to a conventional double threshold scheme as well as to a discrete Bayesian classifier. The first threshold scheme for CQEC was by Sarovar et al. [16], who used the sign of the averaged measurement signals (i.e., a threshold at zero) to identify the error subspace. This filter was improved upon in Atalaya et al. [19] and Atalaya, Zhang et al. [21], as well as in Mohseninia et al. [20], by adding a second threshold to better detect errors that affect multiple syndromes. We chose to compare our RNN model to the threshold scheme in [21], since it had superior performance in numerical tests (see App. G).
The remainder of the paper is structured as follows. Sec. II reviews the three-qubit bit-flip code that will be used to evaluate the three models, and outlines the idealized mathematical formulation of CQEC. In Sec. III we use physical experimental data to characterize the imperfections that are present in typical superconducting qubit signals. We find that the noise possesses a significant amount of auto-correlation, while the syndromes demonstrate complex transient behavior after every bit-flip, as well as drift of the mean values over time. Sec. IV then describes in detail the double threshold, discrete Bayesian, and RNN-based models that we will be comparing. In Sec. V we test the error correction capabilities of the models using four different sets of synthetic data, each displaying a different characteristic feature or set of features of non-ideal behavior. We show that the RNN is able to outperform the double threshold across all synthetic experiments, achieving results comparable to those of the Bayesian model. Sec. VI summarizes our findings and proposes directions for future work.
In the continuous operation of the three-qubit bit-flip code, the error syndrome operators S_k, k = {1, 2}, are continuously and simultaneously measured to yield the following idealized signals for each S_k as a function of time t:

I_k(t) = Tr[S_k ρ(t)] + ξ_k(t)/√(Γ_m^k).    (1)

Here ρ(t) is the density matrix of the three physical qubits and Γ_m^k is the measurement strength that determines the time to sufficiently resolve the mean values of the syndromes under constant variance. Specifically, 1/Γ_m^k is the time needed to distinguish between the eigenvalues of S_k with a signal-to-noise ratio (SNR) of 1 [27]. In the Markovian approximation, ξ_k(t) is Gaussian white noise, i.e., ξ_k(t) = Ẇ_k(t) where W_k(t) is a Wiener process, with the two-time correlation function ⟨ξ_k(t)ξ_{k'}(t')⟩ = δ_{kk'} δ(t − t'), where ⟨·⟩ denotes the average over an ensemble of noise realizations. In the continuous operation, the observer receives noisy voltage traces with means proportional to the syndrome operator eigenvalues and variances that determine the continuous measurement collapse timescales. Monitoring both error syndromes with streams of noisy signals represents a gradual gain of knowledge of the measurement outcome to diagnose bit-flip errors that occur. We shall refer to the parity of I_k(t) as even or odd depending on whether the mean value of I_k(t) is positive or negative. In an actual experiment we will only have access to the averaged signals taken at discrete time steps separated by Δt, which we denote by I_{k,t} at time step t:

I_{k,t} = Tr[S_k ρ(t)] + ΔW_{k,t}/(√(Γ_m^k) Δt),    (2)

where ΔW_{k,t} ∼ N(0, Δt). We shall assume that ρ(t) only changes due to bit-flips at the beginning of each time step Δt, for very small Δt.
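As a concrete illustration, the discretized signal of Eq. (2) can be simulated directly. The sketch below uses our own (hypothetical) parameter names, with `gamma_m` and `dt` standing in for Γ_m^k and Δt, and assumes ideal white noise: each averaged sample is the syndrome mean plus a scaled Wiener increment, giving a per-sample variance of 1/(Γ_m Δt).

```python
import numpy as np

def simulate_ideal_signal(syndrome_means, gamma_m=1.0, dt=0.032, seed=0):
    """Idealized averaged syndrome samples I_{k,t} per Eq. (2): each sample is
    the syndrome mean Tr[S_k rho(t)] (supplied directly here) plus
    Delta W / (sqrt(Gamma_m) * dt), with Delta W ~ N(0, dt)."""
    rng = np.random.default_rng(seed)
    means = np.asarray(syndrome_means, dtype=float)
    dW = rng.normal(0.0, np.sqrt(dt), size=means.shape)
    return means + dW / (np.sqrt(gamma_m) * dt)

# Example: one syndrome flips from +1 to -1 halfway through the record,
# mimicking an injected bit-flip error at the midpoint.
means = np.concatenate([np.ones(500), -np.ones(500)])
signal = simulate_ideal_signal(means, gamma_m=1.0, dt=0.032)
```

Averaging many such noisy samples recovers the underlying syndrome means, which is the gradual gain of knowledge described above.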
In previous work, Ref. [20] compared the performance of a linear approximate Bayesian classifier and the double threshold model with one threshold fixed at y = 0 and another threshold at y > 0 in correcting the three-qubit bit-flip code for quantum memory. Ref. [21] analyzed the double threshold model with two varying thresholds in correcting the three-qubit bit-flip code, and applied it to quantum annealing under bit-flip errors X q with which the chosen annealing Hamiltonian does not commute. In the current work, we shall study the performance of machine learning algorithms both in quantum memory and in quantum annealing.
The stochastic master equation (SME) [8] governing the evolution of ρ(t) under measurements with a finite rate of information extraction implied by Eq. (1), in the presence of bit-flip errors, is given by [16,21]

ρ̇(t) = −i[H(t), ρ] + Σ_{k=1,2} { Γ_φ^k (S_k ρ S_k − ρ) + √(η_k Γ_φ^k) [S_k ρ + ρ S_k − 2 Tr(S_k ρ) ρ] ξ_k(t) } + Σ_{q=1,2,3} γ_q (X_q ρ X_q − ρ).    (3)

The first term describes coherent evolution of the three-qubit state under a Hamiltonian H(t), which can, for instance, be a quantum annealing Hamiltonian. The second term describes the back-action induced by the simultaneous continuous measurement of the error syndrome operators S_1 and S_2 on the three-qubit state, where Γ_φ^k is the measurement-induced ensemble dephasing rate of the corresponding error syndrome operator S_k. The measurement strength Γ_m^k is related to the detector efficiency η_k as Γ_m^k = 2Γ_φ^k η_k. The first two terms can be obtained by substituting operators c_k ∝ S_k into the general form of the SME for continuous measurements [8]. The third term describes the decoherence of the three-qubit state in the presence of bit-flip errors, with γ_q, q = {1, 2, 3}, denoting the bit-flip error rate of the q-th physical qubit. While the idealized measurement signals mentioned above assume no effect induced by the physical experimental apparatus in the qubit readouts, there are various imperfections of the measurement signals in practice that make the error diagnosis more challenging. We shall first present the characteristics of these measurement signals from physical experiments below and explain their implications for our purpose.

A. Characteristics of CQEC Measurement Signals
The superconducting qubits are monitored using voltage signals from homodyne measurements of the parity operators that are derived from tones reflected off the resonator (see App. B). The resonator signal is fed into a Josephson parametric amplifier (JPA) in order to increase the signal strength without adding a significant amount of noise. The amplified radio frequency signals are then demodulated and digitized. After a further digital demodulation, the signals are processed with an exponential anti-aliasing filter with a time constant of 32 ns. This filtered signal, which is averaged in ∆t = 32 ns bins, is then streamed from the digitizer card to the computer.
Due to the effects of the amplifier and resonator, we expect that measurements performed on such real superconducting devices will deviate from the idealized behavior predicted by Eq. (1). In particular, we can anticipate the following three imperfections: 1. The noise will possess a high degree of positive auto-correlation at short temporal lags due to the narrow low-pass bandwidth of the JPA and anti-aliasing filter.
2. When a bit-flip occurs, the syndrome means will change gradually rather than instantaneously as the resonator reaches its new steady state. These periods are referred to as resonator transients to stress their temporary nature, and arise because of time-dependent changes in the measurement strength Γ_m^k (see App. E).

FIG. 1. The measurement signals of the two syndrome operators S_1 = Z_1Z_2 and S_2 = Z_2Z_3 on the transmon qubits. The even (odd) parity signal, i.e., S_k = +1 (−1), has a voltage readout that is centered at an arbitrary negative (positive) value, according to Eq. (B3). We note that the experimental voltage readout of even parity is centered at the negative mean by design. The upper figure is the raw voltage signal readout of a single experimental run. The lower figure is the averaged voltage readout over 47,494 post-selected runs. The qubits are initialized to |100⟩ and an X_2 bit-flip is artificially injected at t = 3.0 µs, resulting in a new state |110⟩. The oscillation pattern is explained in App. E.
3. The values of the syndromes will drift over time due to small changes in experimental conditions (e.g. temperature). Unlike the other imperfections, this effect is only noticeable when comparing across quantum trajectories rather than within them.
These non-ideal behaviors in the measurement signals extracted from our typical physical experiments will be incorporated into our simulated experiments in Sec. V. Fig. 1 shows experimental dispersive readouts taken from three transmon qubits [28] over the span of 6 µs [22]. The blue and orange lines are a record of the outputs from the two resonators, each measuring a different pair of qubits for their syndromes. The top figure shows the measurement signals from a single experiment, which contain large amounts of auto-correlated noise. During the experiment an X_2 error was injected at 3.0 µs, flipping the system from |100⟩ to |110⟩, but the weak-measurement noise largely obscures its effect on the syndrome values.
To reveal these underlying syndromes, the bottom figure of Fig. 1 shows an average over the measurements from roughly 47,500 experiments, each initialized to |100 and injected with an X 2 error at 3.0 µs. It takes approximately 2 µs after initialization for the syndromes to reach their steady-state values for |100 , as the number of photons in each resonator increases from zero gradually. We ignore this effect in our analysis, as it will only occur once at the start of an experiment. After the X 2 error is injected, the syndromes do not instantaneously jump to a new pair of values but instead enter a transitory period which can include significant oscillations. These transients derive from the time-dependent changes in the measurement rate Γ k m (t) analyzed in App. E. This period lasts for roughly 2 µs, after which the syndromes stabilize at their new steady-state values for |110 .
Depending on the underlying hardware, a measurement signal may be generated on a wide variety of different scales, such as the arbitrary voltage scale in Fig. 1.
To denote a signal generically on any scale, we write the measurement samples as

I_{k,t} = S̄_{k,t} + √(τ_k) ε_t,    (4)

where S̄_{k,t} is the scaled mean of the k-th resonator at step t, τ_k is the scaled variance of the k-th resonator, and ε_t ∼ N(0, 1). In this notation, the physical quantities Γ_m and Δt from Eq. (2) have been absorbed into S̄_{k,t} and τ_k.

B. Impact of Auto-correlations
Unlike the other imperfections, the challenge posed by auto-correlated signal noise can be characterized theoretically. If the Gaussian noise in I_{k,t} is correlated, then the distribution of noise samples can be parameterized in terms of a covariance matrix Σ whose off-diagonal elements determine the degree of correlation. For simplicity we restrict our analysis to dependencies that are Markovian, such that I_{k,t} depends only on the preceding measurement I_{k,t−1}, though our conclusions are not limited to this regime. Using a correlation coefficient 0 < ρ < 1, the joint Gaussian log-density describing I_{k,t} and I_{k,t−1} is

log p(I_{k,t}, I_{k,t−1}) = −½ (Ĩ_{k,t}, Ĩ_{k,t−1}) Σ^{−1} (Ĩ_{k,t}, Ĩ_{k,t−1})^T + A,   Σ = τ_k [[1, ρ], [ρ, 1]],

where Ĩ_{k,j} ≡ I_{k,j} − S̄_{k,j} denotes the centered signal sample at step j and A is the log of the normalization constant. We shall assume hereafter that the signal has been rescaled such that S̄_{k,j} = ±1. The effect of auto-correlations on error correction is best characterized in terms of how it impacts the usefulness of the syndrome measurements. To be more precise, we know that the purpose of each measurement is to provide some information about whether the underlying syndrome value of the state is 1 or −1. When framed in these terms, we can formalize and quantify a notion of measurement "usefulness" using Bayesian theory, specifically a ratio called the Bayes factor, which we denote as φ [29]. This factor can be written in log form as

log φ_{k,t} = log p(I_{k,t} | I_{k,t−1}, S̄_{k,t} = +1) − log p(I_{k,t} | I_{k,t−1}, S̄_{k,t} = −1),    (5)

and quantifies how much evidence I_{k,t} gives about the underlying syndrome value if we have already seen the previous measurement I_{k,t−1}. The larger the magnitude of log φ_{k,t}, the more useful I_{k,t} is for our task, with its sign simply indicating whether the evidence supports a value of 1 or −1. Let Q = Σ^{−1}. By making the substitutions σ^{−1} = Q_22 and μ = S̄_{k,t} − (Q_12/Q_22)(I_{k,t−1} − S̄_{k,t}) in the unconditional log-densities −(I_{k,t} − μ)²/(2σ) + A, each of the conditional log-densities in Eq. (5) can be written as

log p(I_{k,t} | I_{k,t−1}, S̄_{k,t}) = −(I_{k,t} − μ)²/(2σ) + A,

where A is again the normalization constant [30]. Expanding the numerator and keeping only the terms that depend on S̄_{k,t} gives

−(I_{k,t} − μ)²/(2σ) = Q_22 (1 + Q_12/Q_22) S̄_{k,t} (I_{k,t} + (Q_12/Q_22) I_{k,t−1}) + ⋯,

where we ignore the other terms since they will cancel when computing log φ_{k,t}. After substituting this representation back into Eq. (5) we get

log φ_{k,t} = 2 Q_22 (1 + Q_12/Q_22) (I_{k,t} + (Q_12/Q_22) I_{k,t−1}) = 2 (I_{k,t} − ρ I_{k,t−1}) / (τ_k (1 + ρ)),    (6)

where the value of log φ_{k,t} depends not only on I_{k,t} and I_{k,t−1} but also on the variance and auto-correlation of the measurements. To see the impact of the auto-covariance more clearly, we compute the expectation value E[log φ_{k,t}] with respect to a Gaussian distribution centered on the true syndrome value S_{k,t} = ±1. Since Eq. (6) is linear, we can simply substitute S_{k,t} for I_{k,t} and I_{k,t−1} to get E[log φ_{k,t}]. After taking its magnitude, we have

|E[log φ_{k,t}]| = 2 (1 − ρ) / (τ_k (1 + ρ)),    (7)

which decreases as the value of ρ increases. Eq. (7) shows that positive auto-correlation (ρ > 0) in the signal makes each of our measurements less useful than if the noise had been uncorrelated (ρ = 0), which means that it will take longer for us to determine the value of S̄_{k,t} at a given measurement strength. This result can be understood by imagining that S̄_{k,t} and I_{k,t−1} are competing to determine the value of I_{k,t}, with smaller ρ favoring S̄_{k,t}. The more that S̄_{k,t} affects the measurement, the more that the measurement in turn tells us about S̄_{k,t}, and thus the more useful it is to us. When ρ is large, the value of I_{k,t} tends to lie very close to the value of I_{k,t−1} regardless of whether S̄_{k,t} is 1 or −1, and therefore the measurement does not reveal much new information about the syndrome.
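Under the two-sample Gaussian model above (variance τ, correlation ρ, means rescaled to ±1), the log Bayes factor and its expected magnitude take simple linear closed forms. The minimal sketch below (function names are ours, and the constant factors should be treated as illustrative of the derivation rather than definitive) makes the key qualitative point explicit: the expected evidence per sample shrinks as ρ grows.

```python
def log_bayes_factor(i_t, i_prev, rho, tau=1.0):
    """Log Bayes factor for syndrome value +1 vs -1 given the current sample
    i_t and the previous sample i_prev, for a bivariate Gaussian with
    correlation rho and variance tau (syndrome means rescaled to +/-1)."""
    return 2.0 * (i_t - rho * i_prev) / (tau * (1.0 + rho))

def expected_magnitude(rho, tau=1.0):
    """|E[log phi]| when both samples sit at the true mean: 2(1-rho)/(tau(1+rho))."""
    return 2.0 * (1.0 - rho) / (tau * (1.0 + rho))
```

For uncorrelated noise (ρ = 0) each sample carries the full evidence 2/τ; as ρ → 1 the expected evidence per sample goes to zero, matching the intuition that a highly correlated sample mostly repeats its predecessor.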

A. Double Thresholds
The double threshold protocol from [21] uses two standard signal processing methods, filtering and thresholding, to identify errors. The raw measurement signal is first passed through an exponential filter to smooth out oscillations, and then this averaged value is compared to a pair of adjustable threshold values to determine the state of the system. A slightly different double threshold protocol was proposed in [20], which used boxcar averaging and fixed one of the thresholds at zero.
To estimate the definite error syndromes from the noisy measurements, we first filter the raw signals I_k(t) to obtain corresponding filtered signals Ī_k(t) according to

dĪ_k(t)/dt = [I_k(t) − Ī_k(t)] / τ,

where τ is the averaging time parameter, and whose discretized version is similar. In the regime where t − t_0 ≫ τ, where t_0 is the time of the last filtered-signal reset, Ī_k(t) reads as

Ī_k(t) = (1/τ) ∫_{t_0}^{t} e^{−(t−t')/τ} I_k(t') dt'.
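In discrete time this exponential filter becomes a simple first-order update. A minimal sketch (parameter names are ours), assuming an Euler discretization Ī ← Ī + (Δt/τ)(I − Ī), with `init` playing the role of the reset value applied after each correction:

```python
def exponential_filter(raw, tau, dt, init):
    """Discretized version of the averaging ODE dIbar/dt = (I - Ibar)/tau.
    `raw` is the list of raw samples, `tau` the averaging time, `dt` the
    sample spacing, and `init` the value the filter is reset to."""
    alpha = dt / tau                 # effective smoothing weight per step
    out, ibar = [], init
    for sample in raw:
        ibar += alpha * (sample - ibar)
        out.append(ibar)
    return out

# Example: the filter responds gradually when the raw signal flips sign.
filtered = exponential_filter([1.0, 1.0, -1.0, -1.0], tau=0.5, dt=0.1, init=1.0)
```

The lag visible in the example is exactly the transient delay that motivates resetting the filter after each correction, as discussed below.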

Thresholds for Error Correction
After filtering the measurement signals, we then apply a double thresholding protocol to the filtered signals Ī_1(t) and Ī_2(t) that is parameterized by the two thresholds Θ_1 and Θ_2, where Θ_1 is the threshold for the −1 value of the error syndromes and Θ_2 is the threshold for the +1 value of the error syndromes. If at least one of Ī_1(t) or Ī_2(t) is found to lie within the interval (Θ_1, Θ_2), we declare the error syndromes to be uncertain and do not perform any error correction operation. Otherwise, we apply the following procedure, in accordance with the standard approach for error diagnosis and correction. If both Ī_1(t) > Θ_2 and Ī_2(t) > Θ_2, then we diagnose the error syndromes as (S_1 = +1, S_2 = +1) and accordingly perform no error correction operation. If Ī_1(t) < Θ_1 and Ī_2(t) > Θ_2, then we diagnose the error syndromes as (S_1 = −1, S_2 = +1) and accordingly perform the error correction operation C_op = X_1. If both Ī_1(t) < Θ_1 and Ī_2(t) < Θ_1, then we diagnose the error syndromes as (S_1 = −1, S_2 = −1) and accordingly perform the error correction operation C_op = X_2. If Ī_1(t) > Θ_2 and Ī_2(t) < Θ_1, then we diagnose the error syndromes as (S_1 = +1, S_2 = −1) and accordingly perform the error correction operation C_op = X_3.
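The decision logic above can be encoded compactly. The sketch below is our own illustrative rendering of the four syndrome regions and the uncertainty band; the string labels for the correction operations are hypothetical names, not part of the protocol in [21].

```python
def diagnose(ibar1, ibar2, theta1, theta2):
    """Double-threshold diagnosis: theta1 guards the -1 syndrome value and
    theta2 the +1 value.  Returns 'X1', 'X2', or 'X3' for the correction to
    apply, 'I' when no error is diagnosed, or None when either filtered
    signal falls inside the uncertainty band (theta1, theta2)."""
    if theta1 < ibar1 < theta2 or theta1 < ibar2 < theta2:
        return None                      # uncertain: take no action yet
    if ibar1 > theta2 and ibar2 > theta2:
        return 'I'                       # (+1, +1): no error
    if ibar1 < theta1 and ibar2 > theta2:
        return 'X1'                      # (-1, +1): correct qubit 1
    if ibar1 < theta1 and ibar2 < theta1:
        return 'X2'                      # (-1, -1): correct qubit 2
    return 'X3'                          # (+1, -1): correct qubit 3
```

Widening the band (Θ_1, Θ_2) trades detection latency for robustness against noise-induced misdiagnoses, which is why both thresholds are tuned numerically.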
In quantum annealing, we note that the error correction operations are applied immediately after the error syndromes are diagnosed, to minimize the aforementioned spurious Hamiltonian evolution. The action of an error correction operation C_op, assumed to be instantaneous, changes the three-qubit state ρ(t) according to

ρ(t) → C_op ρ(t) C_op†,

which applies to the other models in our work as well. We note that the parameters {τ, Θ_1, Θ_2} constitute the minimal set of tunable parameters. When the measurement signals I_k have white noise, their optimal values for minimizing the logical error rate can be obtained from Eq. (43) in [21] together with numerical optimizations.
We further reset the filtered signals Ī_k(t) to the corresponding initial syndrome values at the same instant, to avoid the transient delay with which the filtered signals would otherwise reflect the application of the error correction operation on the state. Inherent within any error correction protocol, however, is the implicit assumption that the correction properly removes the error, which may not necessarily be the case if the error was misdiagnosed.
We note that the Ī_k(t) used by the double threshold model in CQEC consists of weighted contributions from every raw signal taken prior to t and after the last correction. The discrete Bayesian model and the RNN-based model that we discuss in this work can both operate on raw signals, using all historical signals taken prior to a given t. This is in contrast to the projective measurement on ancilla superconducting qubits in discrete QEC, which applies a matched filter [31] to raw signals taken only within each detection round.

B. Discrete Bayesian Classifier
One weakness of the double-threshold scheme is that its predictions are essentially all-or-nothing, since there is no in-built quantity that expresses the model's confidence. This contrasts with probabilistic classifiers, which generate probability values for each prediction class instead of only a single guess. By framing the classification problem in terms of probabilities, we can incorporate our knowledge of the error and noise distributions into our model in a mathematically rigorous manner.
Since each qubit in our system will experience either one or zero net flips after every time step, there are eight different ways that a state can be altered by bit-flips, and therefore eight different classes that our classifier must track. We denote each of the possible bit-flip configurations using the state that |000⟩ is taken to by the error, such that |001⟩ denotes a flip on the third qubit, |110⟩ denotes a flip on the first and second qubits, and so on. The goal of a probabilistic error corrector is to accurately determine the probability of all eight "error states" at time step t given the measurement histories M_t^1 and M_t^2, where M_t^k = {I_{k,1}, ..., I_{k,t}}. We write this posterior probability as

p̂(s_t) ≡ p(s_t | M_t^1 M_t^2),    (8)

where s_t ∈ {0, ..., 7} denotes the digital representation of the error state at step t.
In the remainder of this subsection we consider a probabilistic classifier constructed using Bayes' theorem, which makes predictions based on the posterior probabilities of the different basis states at each time step [32]. Starting with the knowledge of the initial state, this model uses a Markov chain and a set of Gaussian likelihoods to update our beliefs about the system conditioned on the specific measurement values that we observe.
The Bayesian algorithm described in this section is derived by assuming that the mean of a given measurement I k,t is always determined by the state of the system at the end of the time step. This is equivalent to assuming that errors always happen at the beginning of each time step (see Sec. II). Since our method for generating quantum trajectories follows this assumption, the Bayesian model is theoretically optimal for the numerical tests carried out in Sec. V without mean drift or resonator transients. As the length of the step ∆t between measurements goes to zero, this algorithm converges to the Wonham filter [33], which is known to be optimal for continuous quantum filtering of error syndromes [34]. This filter is similar to the discretized, linear Wonham filter derived in [20], except that our filter does not rely on first-order approximations of the Markov evolution or Gaussian functions.

Model Structure
Using Bayes' theorem, the posterior probability of Eq. (8) can be rearranged into the recursive form

p̂(s_t) ∝ p(I_{1,t} I_{2,t} | s_t, M_{t−1}^1 M_{t−1}^2) Σ_{s_{t−1}} p(s_t | s_{t−1}) p̂(s_{t−1}),    (9)

where we assume that the occurrence of an error is independent of any previous measurements and that I_{k,t} depends on the error state at time t along with past signal values due to auto-correlations. This recursive expression describes a Bayesian filter which takes prior information about the error state of the system and updates it based on the transition probabilities p(s_t|s_{t−1}) and measurement likelihoods p(I_{1,t} I_{2,t} | s_t, M_{t−1}^1 M_{t−1}^2). The filter can be easily implemented once we have functional forms for these two terms, which we describe next.

Markovian State Transitions
The Markovian assumption inherent in p(s_t|s_{t−1}) is reasonable, given that the net effect of an additional bit-flip error depends only on the error state of the system before the error. We assume hereafter that the error rate γ_q is identical for all three qubits, i.e., γ_q = γ. This allows us to model the errors as a Markov chain [35] with an 8 × 8 rate matrix Q given by

Q_ij = γ if i and j differ in exactly one bit,  Q_ii = −3γ,  Q_ij = 0 otherwise,    (10)

where we define our basis such that index i ∈ {0, ..., 7} corresponds to the error state whose classical binary representation is equal to i, e.g., 5 → |101⟩. Since Q only gives the rate of transition per unit time, we need to compute the transition matrix J in order to get probabilities for a finite step. This matrix can be derived from Q as

J = e^{QΔt},

where Δt is the length of the time step. Element J_ij gives the probability of transitioning from error state i to error state j across the time step, so we can relate p(s_t|s_{t−1}) to J as p(s_t = j | s_{t−1} = i) = J_ij. Using J, the sum in Eq. (9) can be evaluated as

p̃(s_t) = Σ_{s_{t−1}} p(s_t | s_{t−1}) p̂(s_{t−1}),    (11)

to give probabilities p̃(s_t) which take into account the transitions induced by bit-flip errors during the time step.
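The rate and transition matrices can be built directly from the bit-string labels of the error states. A brief sketch, assuming equal bit-flip rates γ on all three qubits (function names are ours; SciPy's `expm` supplies the matrix exponential):

```python
import numpy as np
from scipy.linalg import expm

def rate_matrix(gamma):
    """8x8 generator Q for independent bit-flips at rate gamma per qubit:
    Q[i, j] = gamma when the binary labels i and j differ in exactly one
    bit, with the diagonal chosen so each row sums to zero (here -3*gamma)."""
    Q = np.zeros((8, 8))
    for i in range(8):
        for j in range(8):
            if i != j and bin(i ^ j).count('1') == 1:
                Q[i, j] = gamma
        Q[i, i] = -Q[i].sum()
    return Q

def transition_matrix(gamma, dt):
    """Finite-step transition probabilities J = exp(Q * dt)."""
    return expm(rate_matrix(gamma) * dt)

J = transition_matrix(gamma=0.01, dt=0.032)
```

Each row of J is a probability distribution, and for γΔt ≪ 1 it is sharply peaked on the diagonal, reflecting that errors within a single time step are rare.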

Measurement Likelihoods
The measurement likelihood p(I_{1,t} I_{2,t} | s_t M_{t−1}^1 M_{t−1}^2) describes the probability of generating signal values I_{1,t} and I_{2,t} given that the system is in error state s_t and that we had previously measured the values in M_{t−1}^1 and M_{t−1}^2. Since the noise from each syndrome is independent, we can factor the likelihood as

p(I_{1,t} I_{2,t} | s_t M_{t−1}^1 M_{t−1}^2) = p(I_{1,t} | s_t M_{t−1}^1) p(I_{2,t} | s_t M_{t−1}^2),    (12)

with I_{1,t} and I_{2,t} contributing independently to the probability.
If the noise source is assumed to be Gaussian, then the probability density for each I_{k,t} has the form

p(I_{k,t} | s_t M_{t−1}^k) = (2πσ²)^{−1/2} exp[−(I_{k,t} − μ_{k,t})² / (2σ²)],    (13)

where μ_{k,t} and σ² are the mean and variance of the signal conditioned on the past measurements M_{t−1}^k. In practice the auto-correlations rapidly decay, so we only need to condition on a small number of recent measurements. Hence, we let m_{k,t−1} be the vector of these measurements, and let c be the vector of their corresponding covariance values. Then

μ_{k,t} = S̄_{k,t} + c^T Σ^{−1} (m_{k,t−1} − S̄_{k,t} 1),   σ² = τ_k − c^T Σ^{−1} c,

where 1 is a vector of ones with the same dimension as m_{k,t−1}, Σ is the covariance matrix of the variables in m_{k,t−1}, and S̄_{k,t} is the mean corresponding to error state s_t [30]. Since the system always begins in the coding subspace, each error state maps to a definite error subspace and therefore has definite syndrome values regardless of how the logical state was initialized.
After the measurement pair I_{k,t} is received, the Gaussian likelihood functions are used to convert the probabilities from Eq. (11) into the next posteriors p̂(s_t) as

p̂(s_t) ∝ p(I_{1,t} | s_t M_{t−1}^1) p(I_{2,t} | s_t M_{t−1}^2) p̃(s_t),    (14)

which become probabilities after normalization.
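One full filter update then amounts to a matrix-vector product, an elementwise likelihood weighting, and a normalization. A minimal sketch under the assumptions above (names are ours; `lik1` and `lik2` hold the two Gaussian likelihoods evaluated at each of the eight error states):

```python
import numpy as np

def bayes_update(posterior, J, lik1, lik2):
    """One step of the discrete Bayesian filter: propagate the previous
    posterior through the transition matrix J, weight by the per-syndrome
    likelihoods p(I_{1,t}|s) and p(I_{2,t}|s), and renormalize."""
    prior = posterior @ J            # transition step: tilde-p(s_t)
    unnorm = prior * lik1 * lik2     # likelihood weighting: hat-p before norm
    return unnorm / unnorm.sum()

# Trivial demonstration: with no transitions and flat likelihoods the
# posterior is unchanged.
p0 = np.array([1.0, 0, 0, 0, 0, 0, 0, 0])
p1 = bayes_update(p0, np.eye(8), np.ones(8), np.ones(8))
```

Only the unnormalized products need to be tracked per step, which keeps the filter cheap enough to run in real time alongside the measurement stream.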

Procedure for Error Correction
The probabilities from Eq. (14) can be understood as describing how likely it is that the system is in each of the eight error states based on the judgment of the model. Whenever |000⟩ does not have the highest probability, we can infer that at least one error has occurred and take the appropriate action to correct it. This procedure, which effectively takes the argmax of the posteriors, can be altered if certain forms of misclassification are more costly than others, or if the act of making a correction itself carries some cost. The procedure can also be modified so that it is more robust to imperfections in the signal, as we do in Sec. IV C by introducing the τ_ignore and τ_streak hyperparameters.
Whenever any correction is made, we must update the model with this information by permuting its probabilities to reflect the applied bit-flip. In our example, a correction on the second qubit would lead us to swap the probabilities between pairs of error states which differ in only the second qubit, e.g., |010⟩ ↔ |000⟩. Without this update the model will continue to recommend the same correction repeatedly, as it does not realize that the state of the system has been changed.
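This permutation is just a relabeling of the eight basis states by an XOR on the affected bit. A small sketch (our own convention: qubit 1 is the most significant bit of the 3-bit state label, consistent with 5 → |101⟩):

```python
def apply_correction(posterior, qubit):
    """After a correction X_q is applied, swap probabilities between error
    states that differ only in qubit q.  `posterior` is a length-8 sequence
    indexed by the binary state label; `qubit` is 1, 2, or 3."""
    bit = 1 << (3 - qubit)   # qubit 1 -> bit 4, qubit 2 -> bit 2, qubit 3 -> bit 1
    return [posterior[s ^ bit] for s in range(8)]

# Example: a correction on qubit 2 swaps |000> <-> |010>, |001> <-> |011>, etc.
p = apply_correction(list(range(8)), qubit=2)
```

Applying the same permutation twice returns the original probabilities, as expected for a self-inverse bit-flip.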
A connection can be made between the Bayesian algorithm described here and the maximum likelihood decoder (MLD) commonly used in discrete error correction [36]. Given a specific noise channel and qubit encoding, the MLD is the protocol with the greatest probability of successfully correcting an error, assuming that we have access to projective measurements of the syndromes. The Bayesian model can be viewed as an extension of the MLD to the continuous measurement regime, where the syndrome measurements provide us with incomplete knowledge of the error subspace. As the variance of the Gaussian measurement noise goes to zero, the Bayesian model reduces to the standard MLD protocol for the three-qubit bit-flip code.

Impact of Signal Imperfections
Compared to thresholding schemes, the Bayesian classifier described here is far more sensitive to the assumptions we make about the noise and error distributions. Such sensitivity can be an advantage, since it allows for near optimal performance when our knowledge of these distributions is accurate.
Of course, when our assumptions about the distributions are wrong, the accuracy of the model can suffer significantly. Out of the three imperfections described in Sec. III, only the auto-correlation of neighboring samples is directly accounted for in the model. The resonator transients occur over relatively short time intervals, so they are likely to have only a modest impact on the model's performance. The syndrome drift also has a negative impact, as the mean values of the Gaussian distributions are key parameters in the model. If there is a discrepancy between the actual signal means and our pre-programmed values, then every measurement likelihood calculation will be biased.
We explore the size and significance of these effects for all three of our models in Sec. V.

C. Recurrent Neural Network (RNN)
Neural networks are a subset of the broader family of machine learning methods based on acquiring a learned representation of the data; they consist of parameterized layers of linear transformations and nonlinear activation functions. RNNs are a class of neural network in which the layers connect temporally, combining the input at the current time step with a hidden representation from the previous time step to form the representation for the current time step. They are thus well suited to representing the time dependence of continuously measured error syndromes over discrete time steps. Using a training set of labeled signals, the RNN can learn the properties of the weak measurement signal and the structure of the underlying bit-flip channel, which allows it to accurately detect errors as they occur.
The dynamics of a simple recurrent neural network can be expressed by the following equations:

h_t = σ_h(W_h x_t + U_h h_{t−1} + b_h),
y_t = σ_y(W_y h_t + b_y).

For each time step t, the network accepts the input vector x_t and, along with the hidden state vector from the previous time step h_{t−1}, performs a linear transformation parameterized by the weight matrices W_h and U_h and the bias vector b_h before applying a nonlinear activation function σ_h. The result is the hidden state vector for the current time step, h_t, which is acted upon by an analogous series of operations defined by W_y, b_y and σ_y to produce the output vector y_t. We note that the hidden state h_t effectively encodes a description of the history of inputs {x_{t'}}_{t'=0}^{t'=t}, which allows the network to extract temporal, non-Markovian features from the data.
In our context, the input at each time step is the vector of measurement signals together with the initial basis state. Moreover, instead of the standard recurrent neural network architecture, we use a long short-term memory network (LSTM) [37], a particular type of recurrent neural network that involves cell states and various gates to evade the vanishing-gradient problem of the standard RNN architecture [38]. Nevertheless, the same principle underlying the standard RNN applies. The output y_t of the LSTM layer is subsequently passed through a dense layer and a softmax activation to produce the posterior probabilities of the eight basis states p(s_t|M_t^k), and we select the basis state with the highest posterior as the prediction ŝ_t.
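The recurrence above can be made concrete with a minimal numpy sketch. For simplicity this uses a plain RNN cell with a softmax head rather than the stacked LSTM the paper actually employs; the dimensions (5 inputs, 32 hidden units, 8 output states) and all weights are illustrative placeholders, not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, d_out = 5, 32, 8   # input, hidden, and output sizes (illustrative)

W_h = rng.normal(scale=0.1, size=(d_h, d_in))
U_h = rng.normal(scale=0.1, size=(d_h, d_h))
b_h = np.zeros(d_h)
W_y = rng.normal(scale=0.1, size=(d_out, d_h))
b_y = np.zeros(d_out)

def softmax(z):
    z = z - z.max()               # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def rnn_step(x_t, h_prev):
    """h_t = tanh(W_h x_t + U_h h_{t-1} + b_h); y_t = softmax(W_y h_t + b_y)."""
    h_t = np.tanh(W_h @ x_t + U_h @ h_prev + b_h)
    y_t = softmax(W_y @ h_t + b_y)
    return h_t, y_t

# Feed a short random signal; h carries the history of inputs forward.
h = np.zeros(d_h)
for t in range(10):
    x = rng.normal(size=d_in)
    h, y = rnn_step(x, h)

prediction = int(np.argmax(y))    # index of the most probable basis state
```

An LSTM replaces the single tanh update with gated cell-state updates, but the outer structure (recurrent hidden state, dense softmax head, argmax prediction) is the same.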

Training
Training samples for the RNN require accurate labeling of the states corresponding to the measurement signals at every time step. In reality, however, decoherence effects such as amplitude damping and thermal excitation prevent us from knowing the correct state of the system at an arbitrary time. As a result, to train the RNN we must resort to measurement signals with a well-defined underlying quantum state. This can be achieved by simulating the measurement signals on states in the absence of unwanted decoherence effects, as described in detail in Sec. V. In the simulations, we provide the measurement strength, the single-qubit bit-flip error rate, and the initial quantum state as input parameters, and the simulation produces a large number of quantum trajectories to serve as the training samples for the RNN. We then train the RNN to diagnose bit-flip errors on the three-qubit system, and the trained RNN can subsequently be used to actively correct errors as they occur. We note that the same information used to generate the training samples is also provided as prior knowledge to the double threshold and Bayesian models, both of which require an explicit estimate of the measurement strength as well as the assumption of a certain error rate.
We maximize the likelihood of the RNN parameters on the training set by minimizing the batch-total cross-entropy loss function, defined as

L = −(1/N) Σ_{n=1}^{N} Σ_{t=1}^{T} log p_n(s_t),

where p_n(s_t) stands for the posterior probability of the true basis state s_t at time step t in the n-th sample, while N denotes the mini-batch size and T denotes the total number of steps in each training sample.
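As a concrete sketch, the batch cross-entropy loss described in words above (log-probability of the true state, summed over time steps and averaged over the mini-batch) can be computed as follows; the array shapes and function name are illustrative, not from the paper.

```python
import numpy as np

def batch_cross_entropy(posteriors, true_states):
    """Cross-entropy loss L = -(1/N) sum_n sum_t log p_n(s_t).

    posteriors:  (N, T, 8) network outputs (softmax probabilities)
    true_states: (N, T)    integer labels of the true basis states
    """
    N, T, _ = posteriors.shape
    # Pick out p_n(s_t), the probability assigned to the true state.
    p_true = np.take_along_axis(posteriors, true_states[..., None], axis=2)[..., 0]
    return -np.log(p_true).sum() / N

# A perfectly confident, always-correct network incurs zero loss.
N, T = 4, 6
labels = np.zeros((N, T), dtype=int)
perfect = np.zeros((N, T, 8))
perfect[..., 0] = 1.0
assert batch_cross_entropy(perfect, labels) == 0.0
```

Minimizing this loss is equivalent to maximizing the likelihood of the labels under the network's predicted posteriors.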
To update the parameters to minimize the loss, we perform an iterative training procedure where for each step and parameter w, one applies a gradient descent update of the form w ← w − η(∂L/∂w), where the gradients ∂L/∂w are computed via backpropagation through the computation graph of the network.
In our experiments, the gradient descent update is performed using the ADAM optimizer [39]. We adopt a two-layer stacked LSTM with a hidden state size of 32. This small hidden size limits the largest matrix-vector multiplication in the computation, hence the memory required, and also limits the number of parameters, facilitating the implementation of the network in real-time experiments. We further provide a comparison of the performance of different hidden state sizes in App. D and show that both smaller LSTMs and the gated recurrent unit (GRU) architecture [40] offer comparable performance for our purpose. The number of stacked LSTM/GRU layers and the hyperparameters, such as the batch size in training, are tuned with the assistance of Ray Tune [41].

Re-calibration Method for Error Correction
When performing active error correction, we once again wish to avoid any delay before the posterior probabilities output by the network reflect the application of an error correction operation C_op on the system. In the case of the Bayesian classifier, we permute the elements of the vector of posterior probabilities, which encodes the state of the model, in accordance with the error correction operation. For the RNN, however, we cannot apply a particular transformation to the hidden state such that the vector of posterior probabilities output by the network is permuted in an analogous manner, since the function mapping the hidden state to the output vector of posterior probabilities is highly nontrivial.
Any such delay, during which the network remains unaware that the quantum state has been corrected, is harmful: if another error X_q occurs during the delay, it compounds with the correction C_op of the first error and induces a logical error at the next error correction operation. To see this clearly, consider physical qubits initially in |000⟩, where a first error X_1 results in the state |100⟩. After detecting the error, the model makes a correction that instantly returns the state to |000⟩. However, the RNN still believes the qubits to be in |100⟩ until some later time t_realize, after accepting a sufficient number of x_t's that allow it to predict |000⟩. If a second error X_2 occurs before t_realize, the syndromes become (S_1 = −1, S_2 = −1) because the state becomes |010⟩, whereas the RNN, still believing the state to be |100⟩, will eventually predict |101⟩, which has the same syndromes; this is equivalent to diagnosing an X_3 error. After applying a second correction C_op = X_3, the physical qubits are in |111⟩, constituting a logical error. In other words, since we are not capable of injecting knowledge of a correction operation into the RNN, a correction operation looks like an error to the RNN, and active correction effectively increases the bit-flip error rate γ in the eyes of the network. Although the correction is correlated with the detected error, the network is generally trained on quantum trajectories with uncorrelated random bit-flip error instances. As will be explained in Sec. V A, a greater γ induces more logical errors, so we conclude that the naive approach of active correction with the RNN suffers from more logical errors. Therefore, we propose the following re-calibration protocol to effectively hide the action of any error correction operation from the network, so that there is no delay in the posterior probabilities to begin with.
We specifically keep track of all the error correction operations that have been applied up to the present time t: N_{q,t} = number of X_q corrections applied.
When the measurement signals I_{1,t} and I_{2,t} have symmetric noise around their respective mean values and the possible means of I_{k,t} are always equal and opposite, each correction C_op changes the mean of I_{1,t} by a factor of −1 if C_op = X_1, changes the mean of I_{2,t} by a factor of −1 if C_op = X_3, and changes the means of both I_{k,t} by a factor of −1 if C_op = X_2. To hide all the corrections made in the past, the measurement signals provided as input to the network for all subsequent time steps are then flipped according to N_{q,t},

Ĩ_{1,t} = (−1)^{N_{1,t}+N_{2,t}} I_{1,t},   Ĩ_{2,t} = (−1)^{N_{2,t}+N_{3,t}} I_{2,t},

which we call the re-calibrated signals. From the perspective of the RNN taking in Ĩ_{k,t}, it appears as if no error correction operation has been applied to the physical qubits.
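The sign-flipping bookkeeping just described is simple enough to sketch directly. This is an illustrative sketch with names of our own choosing, assuming the stated convention that X_1 flips the mean of I_1, X_3 flips the mean of I_2, and X_2 flips both.

```python
class Recalibrator:
    """Track applied corrections and sign-flip the syndrome signals."""

    def __init__(self):
        self.N = [0, 0, 0]      # counts of X1, X2, X3 corrections so far

    def record_correction(self, q):
        """Record a correction X_{q+1}, with q in {0, 1, 2}."""
        self.N[q] += 1

    def recalibrate(self, I1_t, I2_t):
        """Return the re-calibrated signals (I1~, I2~).

        X1 and X2 each flip the mean of I1; X2 and X3 each flip the
        mean of I2, so the accumulated signs are (-1)^(N1+N2) and
        (-1)^(N2+N3)."""
        s1 = (-1) ** (self.N[0] + self.N[1])
        s2 = (-1) ** (self.N[1] + self.N[2])
        return s1 * I1_t, s2 * I2_t

r = Recalibrator()
r.record_correction(1)                            # an X2 flips both signals
assert r.recalibrate(0.7, -0.3) == (-0.7, 0.3)
r.record_correction(0)                            # a later X1 flips I1 back
assert r.recalibrate(0.7, -0.3) == (0.7, 0.3)
```

Because only the parity of each exponent matters, two corrections on the same qubit cancel, exactly as two physical flips would.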
When at some time step the network predicts a different state ŝ_t, we then perform our error correction operation relative to the previously predicted state ŝ_{t−1}.

Adaptation to Resonator Transients for Probabilistic Models
When the possible means of I_{k,t} are not equal and opposite, as occurs during the resonator transients upon applying C_op, the re-calibration method breaks down, because flipping the means of either or both I_{k,t} does not reproduce the means that would be seen had no correction been applied. A solution is to impose an ignore period τ_ignore immediately after a correction is applied at some time t. During (t, t + τ_ignore], no input x_t is fed into the RNN, so the hidden state of the network is frozen until the ignore period ends. The re-calibrated signals are accepted by the network only after t + τ_ignore, which reduces the risk of incorrect predictions during the transients, but effectively increases the detection time of any error that occurs during the ignore period.
Imposing τ_ignore should be accompanied by a measure ensuring that the RNN diagnoses any error with sufficiently high confidence, so that fewer false alarms are followed by an ignore period τ_ignore upon correction. A feasible measure in practice is to commit to an error correction operation only if the RNN predicts the same state {ŝ_{t'}}_{t'=t}^{t'=t+τ_streak}, different from the old state ŝ_{t−1}, for a streak of τ_streak time steps; τ_streak is a discrete quantity that is easy to optimize. The pair {τ_ignore, τ_streak} then constitutes a minimal set of tunable hyperparameters for the task of active correction in the presence of resonator transients, and applies to the Bayesian classifier of Sec. IV B as well.
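The combined τ_ignore/τ_streak gating logic can be sketched as a small state machine over a stream of per-step predictions. This is our own illustrative sketch of the rule described above, not the paper's code; any classifier's per-step argmax predictions could stand in for `predictions`.

```python
def gated_corrections(predictions, tau_ignore, tau_streak):
    """Return (step, new_state) pairs at which a correction is triggered.

    A correction fires only after a state different from the current one
    has been predicted for tau_streak consecutive steps; after it fires,
    the next tau_ignore steps are skipped (where the hidden state of the
    network would be frozen)."""
    corrections = []
    current = predictions[0]        # state the controller currently assumes
    candidate, streak = None, 0     # challenger state and its streak length
    skip_until = -1
    for t, s in enumerate(predictions):
        if t <= skip_until:
            continue                # inside the ignore period
        if s == current:
            candidate, streak = None, 0
        elif s == candidate:
            streak += 1
            if streak >= tau_streak:
                corrections.append((t, s))
                current, candidate, streak = s, None, 0
                skip_until = t + tau_ignore
        else:
            candidate, streak = s, 1
    return corrections

preds = [0, 0, 1, 1, 1, 1, 0, 0]
fired = gated_corrections(preds, tau_ignore=2, tau_streak=2)  # -> [(3, 1), (7, 0)]
```

Larger τ_streak suppresses noise-induced false alarms at the cost of a longer detection time, which is exactly the trade-off tuned in Sec. V.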

V. SIMULATED EXPERIMENTS
To evaluate the effectiveness of the three models described in Sec. IV, we test their error correction capabilities on a large number of synthetic measurement sequences. The motivation for using artificial data instead of real data is twofold. First, by using artificial data we can precisely control the underlying measurement distribution, which allows us to separate out the effects of the different imperfections identified in Sec. III. Second, it is important that we know the true state of the system at every time step, as this is necessary both to train the RNN and to calculate intermediate fidelity values. Such knowledge would not be possible on a near-term quantum computer due to strong undesirable decoherence.
To ensure that our simulations are grounded in reality, we model them on data taken from a superconducting qubit device. Fig. 1 shows measurements taken from this reference data, which consists of approximately 1.6 × 10^6 sequences lasting 6 µs each [42]. The sequences consist of 192 measurement pairs (one for each resonator), sampled every 32 ns. The data contains both "flat" sequences, in which no bit-flip occurs, as well as sequences in which a bit-flip is deliberately applied to one of the three qubits to induce a state transition. Since these bit-flips are all applied at precisely the same time, we are able to track how the signal mean changes during the transient period.
Across all of our tests we employ four different simulation schemes, each of which is described below. The schemes are designated with letters A-D in order of how much non-ideal behavior they include, with Scheme A having no imperfections and Scheme D having all three imperfections. In all schemes, we ignore the thermal excitation for each qubit, since a typical excitation rate is on the order of 1 ms −1 .

Scheme A: Idealized Behavior
In our first scheme, the simulated signal simply conforms to the idealized behavior given by Eq. (1). At the beginning of each measurement sequence the system is set to a specified initial state in the coding subspace, and the state at the next time step is then determined by sampling a number n_q of bit-flips X_q for each qubit from the Poisson distribution P(n_q) = e^{−γ∆t}(γ∆t)^{n_q}/n_q!, where ∆t is the time step size. These errors are applied to the corresponding qubits to obtain the next state. This cycle of sampling and propagating errors is repeated until we have generated a sufficiently long sequence of states.
To create the corresponding I k,t , we sample a univariate Gaussian distribution at each time step with variance (Γ k m ∆t) −1 and a mean of ±1 determined by the syndrome eigenvalue at that step. Our reference data has Γ k m ≈ 4.7 × 10 6 s −1 , ∆t = 32 × 10 −9 s, η k ≈ 0.5, where Γ k m needed to be estimated from the measurement signals while ∆t was known to us in advance. This sequence of Gaussian samples plus the underlying states provides a complete description of a system in the context of our error correction task.
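The two sampling steps of Scheme A (Poisson bit-flips, then Gaussian syndrome signals) can be sketched as follows. The values of ∆t and Γ_m follow the reference values quoted above, while the bit-flip rate γ and all names are illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(1)
dt = 32e-9                        # time step (s), from the reference data
gamma = 0.04e6                    # single-qubit bit-flip rate (1/s), illustrative
Gamma_m = 4.7e6                   # measurement rate (1/s), from the reference data
sigma = (Gamma_m * dt) ** -0.5    # std dev of each I_k sample, variance (Gamma_m*dt)^-1
T = 625                           # number of steps (~20 us)

state = np.zeros(3, dtype=int)    # start in |000>
states, I = [], []
for t in range(T):
    # Poisson-sample flips per qubit; an even flip count cancels out.
    flips = rng.poisson(gamma * dt, size=3) % 2
    state = (state + flips) % 2
    states.append(state.copy())
    s1 = 1 - 2 * (state[0] ^ state[1])   # syndrome Z1Z2 eigenvalue, +-1
    s2 = 1 - 2 * (state[1] ^ state[2])   # syndrome Z2Z3 eigenvalue, +-1
    I.append(rng.normal([s1, s2], sigma))
I = np.asarray(I)                 # shape (T, 2): the two measurement signals
```

The pair (`states`, `I`) is exactly the "underlying states plus Gaussian samples" description of a trajectory referred to above.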

Scheme B: Auto-correlations
As a first step away from ideal behavior, we consider noise that is correlated across time. The data generation process for this scheme is effectively the same as that of Scheme A, except that the noise must be sampled sequentially in order to correctly capture the auto-correlations. In our reference data we find that significant auto-correlations extend back roughly four steps, with autocovariances c_k^T ≈ 5.94 · (0.61, 0.25, 0.1, 0.05), whose i-th element is the covariance at lag i. These values were found by taking every contiguous subsequence of length five in our reference data and using them all to compute a covariance matrix. We can simulate Gaussian noise with these auto-correlations one step at a time using Eqs. (12,13).
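Sequential sampling of correlated Gaussian noise can be sketched with a standard conditional-Gaussian (Yule-Walker) recursion, standing in for Eqs. (12,13). We assume here that the factor 5.94 is the lag-0 variance, with the quoted values giving lags 1-4; this reading, and all names, are our own.

```python
import numpy as np

# Autocovariance at lags 0..4 (assumption: 5.94 is the lag-0 variance).
c = 5.94 * np.array([1.0, 0.61, 0.25, 0.1, 0.05])

# Covariance matrix of four consecutive samples, and their covariance
# with the next sample.
C = np.array([[c[abs(i - j)] for j in range(4)] for i in range(4)])
v = c[1:5][::-1]              # cov(x_t, [x_{t-4}, ..., x_{t-1}])
a = np.linalg.solve(C, v)     # regression coefficients of x_t on the past
var = c[0] - v @ a            # conditional variance of each new sample

rng = np.random.default_rng(2)
x = list(rng.multivariate_normal(np.zeros(4), C))   # seed the first four samples
for _ in range(10_000):
    x.append(rng.normal(a @ x[-4:], np.sqrt(var)))  # one step at a time
x = np.array(x)

# The sample lag-1 autocovariance should land near the target 5.94 * 0.61.
lag1 = np.cov(x[:-1], x[1:])[0, 1]
```

Because the regression coefficients are fitted to the target autocovariances, the stationary process reproduces them up to lag 4 while remaining cheap to sample sequentially.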

Scheme C: Auto-correlations with Resonator Transients
For our third scheme, we keep the auto-correlations from Scheme B but alter the behavior of the syndrome values so that they include the resonator transients seen in Fig. 1 and explained in App. E. To incorporate these patterns into our simulation, we first extract the mean values of the transient patterns from our reference data, consisting of 94 steps in total, for each of the twenty-four different single-flip transitions. Our sequence generation process is then identical to Scheme B, except that after an error occurs the next 94 measurements are sampled from Gaussians centered on the transient means instead of the syndrome eigenvalues. The pattern that we use is matched to the state of the system before and after the error. After the transient period has elapsed, the means are set back to ±1 and further samples are generated as usual until another error occurs.

Scheme D: All Imperfections
Our final simulation scheme takes the auto-correlations and resonator transients from Scheme C and adds an underlying drift term to the syndrome means. Since our reference data contains over a million trajectories collected over the span of multiple hours, it is possible to observe significant differences in the syndrome means between trajectories separated by large amounts of time, possibly due to temperature fluctuations.
For our experiments we elected to apply a linear drift ∆_i proportional to i/N, where i is an index that arbitrarily orders the different measurement sequences that we generate and N is the total number of these sequences. This drift term is added to every measurement in the i-th sequence, resulting in a uniform shift of the overall signal means. The net drift across all runs represents a 40% change, which is consistent with the magnitude of the drift observed in our reference data.

A. Quantum Memory State Tracking
In quantum memory, it suffices to track the basis states in response to the bit-flip errors that have occurred and to apply error correction operations only when needed. We generated 30,000 trajectories of length T = 20 µs from all four simulation schemes with a pre-defined single-qubit error rate as our testing samples, with equal portions of trajectories initialized in each of the eight basis states. While the RNN model employed here is trained on 100,000 quantum trajectories from the corresponding simulation scheme, the error rate, noise variance, and auto-correlations input to the Bayesian model are also estimated from those quantum trajectories. The tunable parameters of the double threshold model are numerically optimized in schemes with imperfections; the filtering time τ typically lies in the range 0.3−1.6 µs, with larger τ for smaller γ.
In Fig. 2, we compare the final fidelity F = |⟨ψ_T|ψ_0⟩|^2 with respect to the initial state for the three models in tracking these quantum trajectories subject to bit-flips. The trend is that the final fidelity decreases as a function of the single-qubit error rate γ. This is because the higher the error rate, the more likely it is that two different bit-flips occur before the first bit-flip is corrected, resulting in a logical error upon correction and therefore a lower final fidelity. For instance, a state starting at |000⟩ is flipped to |001⟩ at t_1 and later also flipped to |011⟩ at t_2 > t_1, with t_2 < t_1 + t_detect, where t_detect is the detection time of the first error. Subsequently, the model, perceiving syndromes (S_1 = −1, S_2 = +1), will eventually make a C_op = X_1 correction and change the state to |111⟩, leading to a logical error. From this argument, it is also evident that a shorter detection time is beneficial.
From Fig. 2, we see that the RNN and the Bayesian classifier outperform the double threshold in all simulation schemes, whereas the RNN approximates the Bayesian classifier in all schemes. As discussed in Sec. IV B, the Bayesian classifier is the optimal model of the three in Schemes A and B where there are only auto-correlations in the signals, which is validated in this task. The fact that their performances in Schemes C and D are very similar to that in Scheme B indicates that the resonator transient pattern and the drifting of the means do not have a significant effect on all three models.
It is reasonable that the drift has only a small negative effect on the two probabilistic models, since the drift is usually on the order of the separation between the mean values of the two parities, which is in turn one order of magnitude smaller than the standard deviation of the noise. The large noise variance obscures the drifting means, making the drifted signals appear to be noisier signals with fixed means.

B. Extending T1 Time of the Logical Qubit
Although the models are motivated by correcting bit-flip errors, they can also be exploited to extend the T_1 time of the logical qubit |1_L⟩ = |111⟩. For this task, actively correcting the state is required, as opposed to merely tracking it. While for practical purposes the RNN model is trained on 30,000 quantum trajectories under bit-flips with a length of T = 120 µs, the Bayesian model, whose parameters are estimated from the same set of trajectories, uses a different transition matrix generated by the Q shown in Eq. (F1), which takes into account the asymmetric probabilities of transitions between the ground and excited states. The parameters of the double threshold model are numerically optimized on the same set of quantum trajectories.
For the three-qubit system initialized to the fully excited state |111⟩, we inspect the population within Hamming distance 1 of the initial state, i.e., the population P_exc of the set of basis states {|111⟩, |110⟩, |101⟩, |011⟩}, since these states can be recovered to the initial state by a majority vote. We compare this P_exc against the population of the excited state |1⟩ of a bare qubit as a function of time in all four simulation schemes, with the results shown in Fig. 3. In all schemes, P_exc of the encoded three-qubit system decays much more slowly under active correction by any of the three models than the bare-qubit excited-state population. In all schemes, both the Bayesian and the RNN-based model outrun the double threshold model.
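The P_exc bookkeeping above amounts to summing the populations of all states within Hamming distance 1 of |111⟩, i.e., states with at least two excited qubits. A minimal sketch (our own indexing convention, |b1 b2 b3⟩ ↔ integer b1·4 + b2·2 + b3):

```python
def recoverable(state_index):
    """True iff the state is majority-vote recoverable to |111>,
    i.e., at least two of its three bits are 1."""
    return bin(state_index).count("1") >= 2

def p_exc(populations):
    """Sum the populations of all majority-vote-recoverable states.

    populations: length-8 sequence of basis-state probabilities,
    indexed by the integer encoding of |b1 b2 b3>."""
    return sum(p for s, p in enumerate(populations) if recoverable(s))

# Example: the recoverable states are indices 3, 5, 6, 7
# (|011>, |101>, |110>, |111>), matching P_exc = P7 + P6 + P5 + P3.
pops = [0.1, 0.0, 0.0, 0.2, 0.0, 0.2, 0.2, 0.3]
assert recoverable(7) and recoverable(3) and not recoverable(4)
assert abs(p_exc(pops) - 0.9) < 1e-12
```

The same helper applies unchanged to the bit-flip protection task of Sec. V C, where P_exc is again the figure of merit.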

C. Protecting against Bit-flip Errors
Similar to the task of extending the T_1 time of the state |1_L⟩, here we employ the three models to protect the initial state |1_L⟩ from bit-flips. As shown in Fig. 4, we compare the population P_exc of the three-qubit system against the excited population of the bare qubit over time. For Schemes A and B, both the Bayesian and the RNN-based model have an advantage over the double threshold. Furthermore, in Fig. 4 we extract the initial logical error rate Γ_L as a function of γ by computing the time derivative of P_exc at 9.6 µs for each γ. In either scheme, with any of the three models, Γ_L scales approximately quadratically in γ, and we see a strong suppression of Γ_L relative to a bare qubit or the uncorrected three qubits. We remark that, by introducing feedback based on noisy weak measurements, any correction protocol can underperform a majority vote on the encoded qubits without error correction at sufficiently small γ or runtime.
To better understand the performance of the models in this important task, we analyze the detection time spent in true positive detection as well as the number of false alarms when the three-qubit system is in |1_L⟩. The difference between a true positive and a false alarm is illustrated in Fig. 5, which shows the actual and predicted states of the system when an X_3 error occurs and when the model falsely detects an X_1 error. When a true error occurs, the system remains in the corresponding error subspace for a duration determined by the detection time of the model, after which the error is corrected. By contrast, when the model falsely detects that an error has occurred due to measurement noise, it improperly applies a bit-flip to the system and thus pushes it out of the code subspace. After more measurements are recorded, the model determines that the system is in an error subspace and fixes its mistake by applying another bit-flip.

FIG. 4. Each data point is averaged over 3,000 independent quantum trajectories. The three-qubit system is initialized to |1_L⟩ = |111⟩. As a comparison, the bare qubit (purple curve) is initialized to |1⟩ and is subject to a bit-flip rate of γ = 0.04 µs^−1. As a reference, the uncorrected three-qubit decay curve is shown in red (see App. F). In Schemes A and B, the Bayesian model is the best among the three, and the Bayesian and RNN-based models both outrun the double threshold model. Right: the initial logical error rate Γ_L at 9.6 µs as a function of the single-qubit error rate γ. The fitted quadratic curves show a strong suppression of Γ_L for all three models in both schemes.
As explained in Sec. V A, a shorter detection time is favorable and leads to better error correction, whereas we can expect more frequent false alarms for models with a shorter detection time as a trade-off, since such a model is more prone to making a correction. This is demonstrated in Fig. 6, where we can see that the best two models, the Bayesian and the RNN-based, both have a shorter detection time and more frequent false alarms at the same time. Nevertheless, for both of these models, the overall frequency of false positive detections remains low, on the order of 0.1 µs^−1.

FIG. 5. At 1.0 µs an X_3 error is applied to the system, and after a small delay the error is detected and corrected. At 3.0 µs the model falsely detects and then "corrects" for an X_1 error, which results in the system being temporarily pushed into an error subspace before the mistake is recognized and corrected.
There are visible small constant offsets between the prediction and the system state at the false alarm due to the streak time period imposed in the correction protocol.

D. Quantum Annealing with Time-dependent Hamiltonians
Having demonstrated a clear advantage of the RNN-based protocol over the double threshold protocol for tasks in the quantum memory setting, we now study the performance of our protocol for quantum annealing, using a time-dependent Hamiltonian that does not commute with the bit-flip errors. We note that the protocol is also applicable to evolution under quantum gate operations.
In quantum annealing, it is imperative to perform error diagnosis and correction in a manner that is both fast and accurate, in order to avoid accruing logical errors while single bit-flip errors are being diagnosed and corrected. This is because the action of an error X_q effectively transforms the Hamiltonian from H(t) to X_q H(t) X_q in the Heisenberg picture. Until the error is properly diagnosed and corrected, subsequent coherent evolution of the logical state in the code subspace is governed by the modified Hamiltonian X_q H(t) X_q. If the original Hamiltonian does not commute with the error, i.e. X_q H(t) X_q ≠ H(t), then such evolution will be spurious rather than as originally intended, causing logical errors to accrue.
For this simulated experiment (see App. C), the annealing Hamiltonian of Eq. (18), with strength Ω_0 and annealing schedules a(t) = 1 − t/T and b(t) = t/T, evolves ρ_0 = |ψ_0⟩⟨ψ_0| with |ψ_0⟩ = (|0_L⟩ + |1_L⟩)/√2. In the code subspace it acts as the intended logical Hamiltonian, whereas in any error subspace it acts as the spurious Hamiltonian X_q H(t) X_q. We adopt the reduction factor [21] as the metric for evaluating model performance, defined as R = (1 − F_bare)/(1 − F_encoded), whose numerator is the final infidelity of an unencoded bare qubit initialized to |0⟩ under the annealing Hamiltonian of Eq. (18), and whose denominator is the final infidelity of the three-qubit encoded state in the code subspace with respect to the target quantum state. As ȧ(t), ḃ(t) → 0, the target quantum state becomes the ground state of the target Hamiltonian. As shown in Fig. 7, at relatively low γ the Bayesian model achieves the highest reduction factor in Scheme A, while both the Bayesian and the RNN-based model outperform the double threshold. However, at sufficiently high error rates γ, the encoded qubits under active correction with any of the three models show no improvement over a single unencoded qubit, as expected.

VI. DISCUSSION
We have proposed an RNN-based CQEC algorithm that is able to outperform the popular double threshold algorithm across all tasks for each of the four simulation schemes tested in Sec. V. This result holds regardless of whether the algorithms are protecting a system from bit-flip errors or from amplitude damping, and applies in the case of both quantum memory and quantum annealing. The relative performance of the three models does not depend significantly on the underlying error rate or the duration of the experiment, unless either of these values is exceptionally large.
The mathematical simplicity of Eq. (1) is a product of many idealized assumptions, so we can expect that measurements taken from real quantum devices will not necessarily be as easy to describe. Our analysis of superconducting qubit measurements in Sec. III reveals several examples of non-ideal behavior in both the syndrome and noise distributions, and we expect similar findings in the outputs of other devices. While some signal imperfections can be accounted for in traditional CQEC algorithms, such as the incorporation of auto-correlations into the Bayesian classifier, most of them will not be easy to precisely characterize. It is in these situations that neural networks can best demonstrate their advantage, since they do not require any a priori description of the patterns within the measurement signals, but instead learn them directly from the training data. An interesting direction for further study is the extension of the RNN-based CQEC algorithm to correlated and leakage errors.
A CQEC algorithm should be practical to run on a sub-microsecond timescale, typically using an FPGA or other programmable, low-latency device. The Bayesian model requires division to normalize the posteriors, which is a very costly operation on FPGAs. This makes it challenging to implement the Bayesian model efficiently, although a more practical log-Bayesian approach has recently been developed [43]. The RNN-based model, by contrast, does not require division and avoids this problem. There are many precedents for running RNNs on FPGAs (see e.g. [44]). Since the RNN architecture used in our paper is small (further simplifications are discussed in App. D), its computational latency is sub-microsecond. Nevertheless, more work will be needed to determine how best to interface the RNN with the quantum computer in a feedback loop. Supervised learning also requires generating a sufficient amount of training data that incorporates the error information and the signal features. Further work could focus on determining the minimum amount and type of data that the RNN needs to train effectively, and on understanding how these needs change as the number of physical qubits in the error code increases.
Given low-latency implementations of the Bayesian and RNN-based models, an obvious next step for future work would be a direct comparison between these CQEC protocols and existing discrete QEC protocols on quantum hardware. Ristè et al. [45] have already demonstrated discrete QEC for a three-qubit bit-flip code on transmons, and recent work by Livingston et al. [22] has implemented a triple threshold CQEC protocol on similar hardware. By running experiments on a given physical device, a full comparison between discrete and continuous QEC can be made under realistic conditions. Due to the lack of both entangling gates and ancillas, we are optimistic that CQEC could significantly improve the speed and fidelity of many QEC codes.

Here ⟨r|0⟩ = (2π)^{−1/4} exp(−r^2/4) is the probe's ground state in the position basis, and P_0(r) = |⟨r|0⟩|^2 is the probability of measuring r when the probe is in the ground state. In the last line, we have used the Hermite polynomials to express the harmonic oscillator's first and second excited states in terms of its ground state.
We determine the probability of measuring a particular outcome r as an average over the states ρ of the cavity field coupled to the transmons [49]. Approximating r as a Gaussian variable, we then want to determine its mean and variance. Let ∆W be drawn from a Gaussian distribution with variance ∆t; the statistics of the measurement record of r can then be reproduced. The voltage operator to be measured results in a classical voltage in which A is a constant scaling factor in units of V · s^{1/2} characterising the physical noise power in a certain bandwidth. Using Eq. (B2), the measured voltage V has a variance that scales as ∆t^{−1}. The state of the transmons can be inferred from the homodyne measurement voltage in Eq. (B3) [49].
To implement a single parity measurement on two qubits, we dispersively couple two qubits to the same readout resonator. We tune the qubits to have the same dispersive coupling to the resonator so that the states $|01\rangle$ and $|10\rangle$ are indistinguishable on the I-Q plane. By making the dispersive shift $\chi$ much larger than the linewidth $\kappa$ of the resonator, we can make the reflected phase of $|00\rangle$ (close to $\pi$) and $|11\rangle$ (close to $-\pi$) overlap with one another, making them indistinguishable as well. The reflected phase response is shown in Fig. 9. Altogether we implement a full parity measurement of odd excitations vs. even excitations by measuring the I quadrature.
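This phase picture can be sketched with the standard one-sided-cavity reflection coefficient $r(\Delta) = (\kappa/2 - i\Delta)/(\kappa/2 + i\Delta)$, taking net dispersive shifts of $\pm 2\chi$ for $|00\rangle$ and $|11\rangle$ and $0$ for the odd states. The specific sign convention and parameter values are assumptions for illustration.

```python
import numpy as np

def reflected(delta, kappa=1.0):
    # Steady-state reflection coefficient of a one-sided cavity, probed at
    # the bare dressed frequency and detuned by the net dispersive shift.
    return (kappa / 2 - 1j * delta) / (kappa / 2 + 1j * delta)

chi = 10.0  # chi >> kappa
shifts = {"00": +2 * chi, "01": 0.0, "10": 0.0, "11": -2 * chi}
phases = {s: np.angle(reflected(d)) for s, d in shifts.items()}
I_quad = {s: reflected(d).real for s, d in shifts.items()}
# |00> and |11> both reflect with |phase| close to pi (the sign depends on
# convention), while |01> and |10> both give phase 0: measuring the
# I quadrature separates even (I ~ -1) from odd (I ~ +1) parity.
```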
These paths are shown in Fig. 10. Strictly speaking, the two sets of solutions apply only when there are no dynamics apart from the dispersive measurements.
The measurement strength is defined as in [49,52]; it scales the separation of the two parity signal means under constant noise variance (see Eq. (1)). In the odd-to-even parity transition, the path in phase space leading up to the steady states forms a tighter spiral as the ratio $|\chi/\kappa|$ gets larger. A tighter spiral translates to a more oscillatory $\Gamma(t)$, and thus to a more oscillatory signal mean [10]. In Fig. 11, the ring-up transient without clear oscillations is manifested in the measurement strength corresponding to the even-to-odd parity transition in Eq. (E2), whereas the ring-down transient with oscillations is manifested in the measurement strength corresponding to the odd-to-even parity transition in Eq. (E3). They show good agreement with experimental observations, such as those in Fig. 1.
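The two kinds of transients can be reproduced with a toy single-mode cavity model, $\dot\alpha = -(i\Delta + \kappa/2)\,\alpha + \varepsilon$, whose solution relaxes toward the steady state set by the new net dispersive shift after a parity flip. The drive amplitude and parameters below are illustrative assumptions.

```python
import numpy as np

def cavity_path(delta, alpha0, eps=1.0, kappa=1.0, tmax=0.5, n=500):
    # alpha(t) = a_ss + (alpha0 - a_ss) * exp(-(i*delta + kappa/2) * t):
    # relaxation toward the steady state for the new dispersive shift delta.
    t = np.linspace(0.0, tmax, n)
    a_ss = eps / (1j * delta + kappa / 2)
    return a_ss + (alpha0 - a_ss) * np.exp(-(1j * delta + kappa / 2) * t), a_ss

kappa, chi = 1.0, 10.0
a_odd = 1.0 / (kappa / 2)                 # steady state for delta = 0 (odd parity)
# odd -> even: final shift 2*chi != 0, so the path spirals (oscillatory mean);
# the spiral tightens as |chi/kappa| grows.
spiral, a_even = cavity_path(2 * chi, a_odd)
# even -> odd: final shift 0, so the path rings up without oscillation.
ringup, a_odd_ss = cavity_path(0.0, a_even)

def winding(path, a_ss):
    # Total phase accumulated by alpha(t) around its steady state.
    ph = np.unwrap(np.angle(path - a_ss))
    return abs(ph[-1] - ph[0])
```

With $\chi/\kappa = 10$, the odd-to-even path winds through many radians around its steady state, while the even-to-odd path accumulates essentially no phase.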

Appendix F: Population of States Subject to Amplitude Damping or Bit-flips

We recall that the population of the excited states $P_{\mathrm{exc}}$ is the ensemble population of the states that are at most one bit-flip away from the fully excited state $|111\rangle$, i.e., $P_{\mathrm{exc}} = P(|111\rangle) + P(|110\rangle) + P(|101\rangle) + P(|011\rangle) = P_7 + P_6 + P_5 + P_3$.
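Equivalently, the indices 7, 6, 5, 3 are exactly the 3-bit strings of Hamming weight at least two, which gives a one-line computation (the function name is illustrative):

```python
def p_exc(pops):
    # Total population of the computational states at most one bit-flip away
    # from |111>, i.e. 3-bit strings with Hamming weight >= 2: indices 7, 6, 5, 3.
    return sum(p for state, p in enumerate(pops) if bin(state).count("1") >= 2)
```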
Under $T_1$ decay at zero temperature, the transition matrix evolving the states for time $T$ is $J(T) = \exp(QT)$,

Fig. 10 caption: When the qubit pair goes from an even parity to an odd parity, e.g., $|00\rangle \to |10\rangle$, the blue line is the path of $\alpha_{eg}(t)$, while the blue cross shows the steady state of $\alpha_{gg}$, obtained from Eq. (E2). When the qubit pair goes from an odd parity to an even parity, e.g., $|10\rangle \to |00\rangle$, the orange spiral curve is the path of $\alpha_{gg}$, while the orange cross shows the steady state of $\alpha_{eg}$, obtained from Eq. (E3).
where $Q$ is the rate matrix generating the single-qubit $T_1$ decay transitions.

The double threshold boxcar filter in [20] employs a boxcar averaging of the measurement signals and two thresholds, one fixed at zero and the other at a variable position above zero. We compare its performance against the double threshold model (with an exponential filter and two variable thresholds) from [21] that was used in this work, by running the state-tracking task described in Sec. V A on Schemes A and D, as shown in Fig. 12. The double threshold method outperforms the double threshold boxcar in both schemes at relatively low error rates.

Fig. 12 caption: The final fidelity with respect to the initial state $|000\rangle$ in Schemes A and D with the double threshold exponential filter (DT) in [21] and the double threshold boxcar filter (DT Boxcar) in [20], as a function of the single-qubit bit-flip rate $\gamma$ at an operation time $T = 20\,\mu\mathrm{s}$ with a measurement strength $\Gamma_m = 4.7\,\mu\mathrm{s}^{-1}$. Each data point is averaged over 30,000 quantum trajectories.
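For reference, the two filtering strategies being compared can be sketched on a noiseless step signal. The filter constants, threshold values, and sign conventions below are illustrative assumptions rather than the tuned settings of [20] or [21].

```python
import numpy as np

def exp_filter(signal, lam=0.1):
    # Exponentially weighted moving average (the "exponential filter").
    out, acc = np.empty(len(signal)), signal[0]
    for i, s in enumerate(signal):
        acc = (1 - lam) * acc + lam * s
        out[i] = acc
    return out

def boxcar_filter(signal, n=20):
    # Causal boxcar average over (up to) the last n samples.
    return np.convolve(signal, np.ones(n) / n, mode="full")[: len(signal)]

def double_threshold(filtered, upper=0.5, lower=-0.5):
    # Declare a bit-flip once the filtered syndrome crosses the lower
    # threshold; clear the declaration once it recovers above the upper one.
    flags, flipped = np.zeros(len(filtered), dtype=bool), False
    for i, f in enumerate(filtered):
        if not flipped and f < lower:
            flipped = True
        elif flipped and f > upper:
            flipped = False
        flags[i] = flipped
    return flags

# A syndrome that jumps from +1 (no error) to -1 (bit-flip) halfway through.
sig = np.concatenate([np.ones(200), -np.ones(200)])
flags_exp = double_threshold(exp_filter(sig))
flags_box = double_threshold(boxcar_filter(sig))
```

Both variants flag the flip after a filter-dependent delay; with measurement noise, the filter choice and threshold placement trade detection latency against false positives.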