Repetitive Readout Enhanced by Machine Learning

Single-shot readout is a key component of scalable quantum information processing. However, many solid-state qubits with otherwise favorable properties lack single-shot readout capability. One solution is the repetitive quantum non-demolition (QND) readout technique, where the qubit is correlated with an ancilla, which is subsequently read out. The readout fidelity is then limited by the measurement back-action on the qubit. Traditionally, a threshold method is used, where only the total photon count is used to discriminate the qubit state, discarding all the information about the back-action hidden in the time trace of the repetitive readout measurement. Here we show that, by using machine learning (ML), one obtains higher readout fidelity by taking advantage of the time-trace data. ML is able to identify when the back-action happened and to correctly read out the original state. Since this information is already recorded (but usually discarded), the improvement in fidelity does not consume additional experimental time, and it can be directly applied to preparation-by-measurement and to quantum metrology applications involving repetitive readout.

In the repetitive QND protocol, a controlled-NOT (CNOT) gate is applied to correlate the qubit state to an ancilla, which is subsequently read out (figure 1(a)). If the readout operator commutes with the qubit's intrinsic Hamiltonian, in other words if the readout is QND, one can repeat this process multiple times to increase the signal-to-noise ratio until the desired fidelity is reached.
This protocol is also known as the repetitive readout technique, widely adopted in room-temperature NV research, where a nuclear spin state (the native ¹⁴N or a nearby ¹³C) is repetitively read out with the help of the NV electronic spin [12, 19]. In its implementations so far, the spin state was determined by comparing the total photon number collected through all the repetitive readouts with a previously established threshold (figure 1(b)). The detected photon counts are thus divided into two classes, referred to as the bright and dark states of the qubit.
In this threshold method (TM), the readout infidelity can be evaluated from the overlap between the photon count distributions of the bright and dark states. Two factors contribute to this overlap: inefficient optical readout [20], including photon shot noise and limited photon collection efficiency; and deviation from the QND condition. The first factor can be mitigated by embedding the emitter in photonic structures and by using better single-photon detectors. The second factor imposes a more fundamental constraint: if the readout operator does not fully commute with the system Hamiltonian, back-action from the measurement will eventually limit the number of photons that can be collected before the quantum information is destroyed [21, 22].
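The overlap argument above can be made concrete with a toy calculation. Assuming, purely for illustration, that the bright and dark photon-count distributions are Poissonian (the paper's actual photon statistics come from the 33-level model, not from ideal Poissonians), the best TM fidelity is found by scanning the integer threshold:

```python
import math

def poisson_pmf(k, lam):
    # Poisson probability mass, evaluated in log space for numerical stability.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def best_tm_fidelity(lam_dark, lam_bright, kmax=1000):
    """Scan all integer thresholds; counts <= threshold are called 'dark'.

    Returns the best fidelity F = (F_dark + F_bright) / 2, which is limited
    by the overlap of the two count distributions.
    """
    cum_dark = cum_bright = 0.0
    best = 0.0
    for th in range(kmax):
        cum_dark += poisson_pmf(th, lam_dark)      # P(dark counts <= th)
        cum_bright += poisson_pmf(th, lam_bright)  # P(bright counts <= th)
        best = max(best, 0.5 * (cum_dark + 1.0 - cum_bright))
    return best

# Collecting more photons widens the separation and raises the fidelity.
f_low = best_tm_fidelity(40, 60)
f_high = best_tm_fidelity(80, 120)
```

Doubling the mean counts at fixed contrast shrinks the overlap, which is exactly the "more photons, higher fidelity" side of the trade-off; the back-action side is what caps how far this can be pushed.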
To mitigate this effect, we propose to use the additional information carried by the measurement-induced state perturbation itself. Information about the perturbation is already recorded during typical experiments, in the form of the time trace of photon clicks from the repetitive readouts (figure 1(c)), but is usually discarded in the TM after extracting the total photon number. Identifying the perturbation and tracing back to the unperturbed original state using this information is the key to improving the fidelity of readout.
Unfortunately, finding an elegant analytical approach proves difficult: the photodynamics exhibits intrinsic randomness, and the inefficient photon collection yields noisy data, precluding a clean analytical treatment that could exploit the additional information. Machine learning (ML), on the other hand, is designed to discover hidden correlations in data and is widely used in classification problems [23]. It has recently been introduced in quantum information tasks to mitigate crosstalk in multi-qubit readout [24], to enhance quantum metrology [25, 26], to identify quantum phases of matter and phase transitions [27-29], to identify entanglement [30-32], and even to determine the existence of a quantum advantage [33], to name a few. In particular, ML has shown success in the efficient interpretation of quantum state tomography (QST), being robust to partial QST and to state-preparation-and-measurement (SPAM) errors [32, 34-36].
In this work, we apply ML to state discrimination in the repetitive readout of the NV center. To design and evaluate the ML method, we use the full information from time-trace data generated by quantum Monte Carlo simulation. We tried different supervised ML methods and mainly focused on a shallow neural network realized with the MATLAB® Neural Net Pattern Recognition tool (nprtool). We observed a consistent increase in readout fidelity using ML over the TM. The improvement in readout fidelity, albeit small, is robust over a parameter space that covers individual NV differences. One application of our results is in preparation-by-measurement: when one discards less trustworthy measurements, ML yields a more efficient initialization process than the TM.

[Figure 1 caption fragment: cumulative sums of individual time traces are taken before feeding the data to the neural network. W1 (W2) and b1 (b2) are the weights and bias of the hidden (output) layer, which are learnable parameters of the network. The output is the probability p1 (p2) of the state being dark (bright).]
Since in our method the training labels are readily available in experiments with very high fidelity [12-16], it can be directly applied to current experiments. Together with the robustness of our method to variations in the NV photodynamic parameters, we thus expect that the improved readout fidelity can be achieved in experiments.

Repetitive readout model and simulation
We consider, as an example, reading out the native ¹⁴N nuclear spin state through the electronic spin of an NV center at room temperature. The NV center's ground state is an electronic spin triplet (S = 1) and can be optically polarized to the |m_s = 0⟩ state. The other two sublevels |m_s = ±1⟩ have additional non-radiative decay channels under optical illumination, allowing optical readout of the spin state via the fluorescence intensity. The native ¹⁴N is a nuclear spin-1 (I = 1) and couples to the NV center through the hyperfine interaction. The ¹⁴N spin has no optical readout of its own, but it supports a CₙNOTₑ operation (control on the nuclear spin, NOT gate on the electronic spin), which correlates the ¹⁴N state to the NV state.
In the repetitive readout protocol, the NV starts in |m_s = 0⟩, and a CNOT gate correlates the nuclear spin state to the NV. A green laser then reads out the NV state, while also repolarizing it back to |m_s = 0⟩. Under a high magnetic field, where the NV and ¹⁴N energies are well separated, this process is approximately QND and can be repeated a few thousand times to accumulate signal, discriminating the bright state |m_I = 0, −1⟩ of the ¹⁴N in a single shot (figure 1). Still, the high magnetic field cannot fully eliminate the back-action of the measurement on the ¹⁴N, which is caused by the relatively strong excited-state transverse hyperfine interaction A⊥(S₊I₋ + S₋I₊). This perturbation causes flip-flops between the NV and the ¹⁴N, destroying the quantum information. In the TM, this perturbation prevents us from continuing to accumulate useful signal and reduces the fidelity of state discrimination. ML, instead, as we find, can identify the majority of such flips and therefore improve the readout fidelity. Ultimately, the readout fidelity is limited by flips that occur very early during the repetitive readout.
We used simulated data to explore the effectiveness of ML in repetitive readout and to better analyze the source of the improvement. To fully capture the photodynamics involved in the repetitive readout process, we employed a 33-level model, considering the NV⁻ electronic and ¹⁴N nuclear spins and the neutrally charged NV⁰ state. The model is described in more detail in the appendix. Most transition rates in the model have been accurately measured in independent experiments [37-40], and we use the values from Gupta et al [39]. The excited-state NV-¹⁴N transverse hyperfine interaction strength and the NV⁻ to NV⁰ (de)ionization rate at strong laser power have not been precisely determined before, and therefore a reasonable range is explored to cover possible variations among individual NVs, based on the results from [12, 13, 41, 42].
In the simulation, we assumed an intermediate magnetic field of 7500 G, typical for repetitive readout experiments, and a photon collection efficiency of 30%, standard with photonic structures such as solid immersion lenses or parabolic mirrors on the diamond [43-45]. A perfect CNOT gate was assumed. Correspondingly, the dark state is |m_I = +1⟩ and the bright state is |m_I = 0, −1⟩. We remark that it is possible to use the same protocol to read out a ¹³C rather than the ¹⁴N [13-16], given well-characterized hyperfine interaction strengths [46-49].

Neural network architecture
The network in nprtool is a two-layer feed-forward neural network (figure 1(c)). In all trainings, we used a data set of size 10,000, with a random portion of 15% used for validation. The input data is the time trace of single-photon detector clicks through the repetitive readout process (figure 1(c)). Because the total photon count is a good metric for state discrimination, we take the cumulative sum of the time trace of photon detections {x_k} before feeding it to the neural network, x̄_n = Σ_{k=1}^{n} x_k. After training, we used a test set of size 4000, generated in the same way as the training set but not used in training, to independently test the network. We performed Monte Carlo cross-validation, typically repeating the aforementioned training process 10 times; the average accuracy is used throughout this work. Error bars represent the standard error of the 10 results.
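As an illustration of this pipeline, the sketch below trains a minimal two-layer network on cumulative-sum inputs. Everything here is a stand-in: the traces are toy Bernoulli click sequences (not the quantum Monte Carlo data), the per-repetition click probabilities are invented, and the plain-NumPy gradient descent only mimics what nprtool does internally.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_traces(n, reps=300, p_bright=0.04, p_dark=0.02):
    """Toy click traces: label 1 = bright, 0 = dark (hypothetical rates)."""
    y = rng.integers(0, 2, n)
    p = np.where(y[:, None] == 1, p_bright, p_dark)
    clicks = rng.random((n, reps)) < p
    # Cumulative sum, subsampled to keep the input dimension small.
    return np.cumsum(clicks, axis=1)[:, ::10].astype(float), y

def train(X, y, hidden=8, lr=1.0, epochs=500):
    """Two-layer feed-forward net (tanh hidden layer, sigmoid output),
    full-batch gradient descent on the cross-entropy loss."""
    scale = X.max()
    Xn = X / scale
    n, d = Xn.shape
    W1 = rng.normal(0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, hidden); b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(Xn @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
        g = (p - y) / n                      # d(loss)/d(output pre-activation)
        gh = np.outer(g, W2) * (1.0 - h ** 2)
        W2 -= lr * h.T @ g; b2 -= lr * g.sum()
        W1 -= lr * Xn.T @ gh; b1 -= lr * gh.sum(0)

    def predict(Xt):
        h = np.tanh(Xt / scale @ W1 + b1)
        return (h @ W2 + b2) > 0             # True = bright
    return predict

Xtr, ytr = toy_traces(2000)
Xte, yte = toy_traces(1000)
predict = train(Xtr, ytr)
acc = np.mean(predict(Xte) == yte)
```

The cumulative sum keeps the total-count information (the last input feature) while preserving where along the trace the click rate changed, which is what the network exploits.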
We found that approximately 12.5 neurons per 1000 repetitions was a good balance between the increase in fidelity and avoidance of overfitting.

Results
We first investigate the influence of the repetition number on the readout fidelity. The fidelity F throughout this manuscript is defined as F = (F_bright + F_dark)/2, where F_bright and F_dark are the percentages of bright and dark states that are correctly read out, respectively. The repetition number influences the readout fidelity in two ways: 1. a larger repetition number means more photons detected and better separation between the photon count distributions of the bright and dark states (figure 1(b)); 2. a larger repetition number, however, also implies a longer illumination time and a higher probability for the ¹⁴N nuclear spin to flip, due to the large transverse hyperfine interaction in the excited state, which mixes the photon count distributions of the two initially different states. As a result of these competing effects, there is an optimal repetition number N_opt for the TM. The readout fidelity from ML, on the other hand, keeps improving as we increase the repetition number, even if the rate of increase slows down (figure 2(a)). At N_opt, we observed a 0.34% increase in fidelity with ML. Since the time-trace input for ML is recorded in all experiments, even those intended for the TM, this improved fidelity does not consume additional experimental time. One can add more repetitions in the experiment and harness a further increase of as much as 0.57% in readout fidelity (compared to the TM at N_opt). The improvement at N > N_opt suggests that ML is not only more robust against ¹⁴N flips, but actually extracts useful information from them. This is investigated in more detail later.
As mentioned earlier, the excited-state transverse hyperfine interaction strength A⊥ between the NV and the ¹⁴N, and the (de)ionization rates k_ion (k_deion) between NV⁻ and NV⁰ under strong illumination, have not yet been determined to satisfactory precision. We therefore explored a parameter range covering realistic values one might encounter in experiment, with k_ion around 90β MHz and A⊥ between −50 and −30 MHz, where β is a unit-less value proportional to the laser power. In the simulation, we chose β such that, for any combination of parameters, the NV would emit the same total number of photons in the bright state during the repetitive readout. Comparisons of the TM at N_opt, ML at N_opt, and ML at N = 8000 are shown in figure 2(b) for different A⊥ and k_ion. The trend matches figure 2(a): ML consistently outperforms the TM for both repetition numbers chosen.
To better understand how ML achieves higher fidelity, we take a closer look at cases where the ¹⁴N experienced flip-flops in the excited state, a major limit to the TM fidelity. We find that the neural network is able to extract information from the time-trace input to recognize whether a flip has occurred, and to recover the original state. Such flips can bring the photon count across the threshold, yielding a misclassification when using the TM. This is shown in figure 3, where we plot the cumulative sums of time traces in cases where flip(s) occurred. In figure 3(a), ML correctly assigns all these time traces to their original states, while the TM looks only at the total photon count at the end and compares it to the threshold (dashed line), making ∼25% wrong decisions. In figure 3(b), we show instances where ML gave the wrong classification. We notice that in those cases the ¹⁴N flip-flops happen at the very beginning, making the time traces indistinguishable from those of the opposite initial state with no flips. There is little hope of correctly reading out these states, which poses an ultimate limit to the readout fidelity.
Another important property of ML is generalization. We explore this generalization power by testing the network R, trained with {k_ion = 90β MHz, A⊥ = −50 MHz}, on data generated with different parameters.
First, we test the network R on a different (de)ionization rate, {k_ion = 110β MHz, A⊥ = −50 MHz}, obtaining a fidelity of 94.4(1)% from the network R, compared to 96.31(4)% from the TM. We attribute this deteriorated ML performance to the change in the photodynamics: under otherwise identical conditions, a different k_ion changes the relative photon count distributions of the bright and dark states. This change cannot be compensated by the laser intensity, and it renders the network R obsolete.
We then tested the network R on data with different transverse hyperfine strengths, A⊥ = {−40, −30} MHz. Intuitively, a small change in A⊥ does not change the photoluminescence pattern, but only slightly modifies the ¹⁴N flip-flop rate, which can be captured by the network, given its ability to recognize the occurrence of flip-flops. Indeed, we observed better fidelity from the network R than from the TM on the A⊥ = −40 MHz data, and fidelity comparable to the TM at A⊥ = −30 MHz, where the parameter has changed by 40% (table 1). Here we used the N_opt of the test data for both ML and the network R. These results indicate that, provided the variations in the NV parameters are small, it is possible to use a fixed network R to directly read out any NV, without the need to run experiments to generate the training data.

Application to initialization by readout
One scenario where even a modest increase in fidelity can be beneficial is state preparation-by-measurement [12-16]. In this widely adopted technique, to achieve a higher state preparation fidelity with the TM, two distinct thresholds are set, N_dark < N_th and N_bright > N_th, where N_th is the readout threshold. Measurements falling between the two thresholds are discarded, as they cannot be assigned to either the bright or the dark state with enough confidence. This leads to a lengthier state preparation routine. In ML, the neural network assigns to each input a probability p_bright (p_dark) of the state being bright (dark). A final step compares p_bright and p_dark and classifies accordingly. To achieve a higher fidelity, we discard cases where 0.5 − t < p_dark, p_bright < 0.5 + t, with an adjustable threshold t. We compare the state preparation fidelities from the TM and ML when discarding the same amount of data, and observe that ML maintains its advantage over the TM and scales more favorably with the ratio of discarded measurements (figure 4). This enables preparing a high-fidelity initial state more efficiently. We observed a similar improvement from unsupervised learning (see appendix), in agreement with [50].
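The ML-side rejection rule can be sketched as follows (the function name and the default threshold value are ours, for illustration):

```python
def classify_with_rejection(p_bright, t=0.1):
    """Mimics the double-threshold TM on the network's output probability:
    inputs with 0.5 - t < p_bright < 0.5 + t are discarded (returned as None),
    since neither state can be assigned with enough confidence."""
    if 0.5 - t < p_bright < 0.5 + t:
        return None                  # discard: re-run the preparation
    return "bright" if p_bright >= 0.5 + t else "dark"

# Raising t discards more borderline cases, raising the fidelity of the
# measurements that are kept, at the cost of a longer preparation routine.
decisions = [classify_with_rejection(p) for p in (0.95, 0.55, 0.12)]
```

Since p_dark = 1 − p_bright, testing p_bright against the band around 0.5 is equivalent to testing both probabilities.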

Conclusion and outlook
In conclusion, we have shown that ML techniques can exploit the hidden structure in the repetitive readout data of the NV center at room temperature to improve the state measurement fidelity. We used a quantum Monte Carlo simulation based on a 33-level NV model to generate data for machine learning, and found an improved single-shot readout fidelity over the traditional threshold method, which can be attributed to the ability of ML to correctly classify a larger number of readout trajectories that are perturbed by the measurement process itself. While we used simulations, the training process does not in general depend on knowledge of the model. In fact, the only information required is the label of the state (|m_I = +1⟩ or |m_I = 0, −1⟩), which is readily available in experiments by discarding less trustworthy data [12-16]. One can then use this data to train a network specific to the NV of interest, and expect an increase in readout fidelity in all subsequent repetitive readout experiments, free of any additional experimental time (although at the cost of increased computational time). Although individual NVs may have slightly different photodynamic parameters, they should be covered by the range explored in this work, and therefore the improvement in fidelity is expected to be ubiquitous.
In addition, the off-the-shelf MATLAB® deep learning toolbox we employed greatly reduces the complexity of the neural network architecture, making this improvement easily reproducible and more accessible to experimentalists.
Though small, the increase in fidelity does not require any additional experimental time, and it is readily compatible with experiments using repetitive readout of nuclear spins, including in quantum metrology [51-53], where it would improve the sensitivity.
To further shed light on the bright/dark decisions that affect the ML readout fidelity, one could use decision tree learning instead of a neural network. This could potentially inform optimized readout protocols with varying illumination times, or help further improve the neural network architecture. More broadly, ML could be applied to more complex systems, for example to help mitigate the crosstalk of fluorescence signals in a solid-state register consisting of a few nearby NV or other color centers [24].

Acknowledgments
This work was supported in part by the NSF grant EFRI-ACQUIRE 1641064 and by Skoltech.
The data that support the findings of this study are openly available at https://doi.org/10.6084/m9.figshare.9924911.v1.

Figure 4. More efficient state preparation-by-measurement. The state readout fidelity increases after discarding less trustworthy measurements, and this improves the state preparation. ML always outperforms the TM and scales more favorably with the ratio of discarded data. The solid curves are a guide to the eye. Error bars are the standard error of 10 training results, and are smaller than the markers.

Appendix A. The 33-level NV model

We used a 33-level model to fully describe the dynamics of the NV⁻-¹⁴N system in the repetitive readout process. This model includes the spin-1 triplet ground and excited states and the singlet metastable state of NV⁻, the spin-1/2 ground and excited states of NV⁰, and the nuclear spin-1 of the ¹⁴N, as illustrated in figure A1. The transition rates directly related to the NV photoluminescence have been precisely determined and reported in various works [37-40], although with some significant variations. For the simulation we took the values from Gupta et al [39] listed in table A1.
The exact (de)ionization mechanisms under 532 nm laser illumination have not yet been determined experimentally, nor have the (de)ionization rates under laser power comparable to the saturation power (measurements at weak power can be found in [54-56]). Here we assume that the (de)ionization k_ion (k_deion) occurs only in the excited states and obeys the selection rules illustrated in figure A1. To maintain the experimentally determined 70/30 ratio [54] between the charge states, we set k_deion = 2k_ion. The ionization rate is proportional to the laser intensity and is swept around k_ion ≈ 90β MHz, in accordance with [13].
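The quantum Monte Carlo underlying the simulated data is, at its core, a stochastic-jump unraveling of rate equations like these. The toy version below uses a 3-level caricature (ground, excited, metastable) with made-up rates, not the 33-level model or the table A1 values:

```python
import random

rng = random.Random(7)

# Toy 3-level scheme: 0 = ground, 1 = excited, 2 = metastable shelf.
# Rates are illustrative placeholders, in inverse time units.
RATES = {
    (0, 1): 30.0,   # optical pumping (proportional to laser power beta)
    (1, 0): 60.0,   # radiative decay: emits a detectable photon
    (1, 2): 10.0,   # intersystem crossing into the metastable state
    (2, 0): 1.0,    # slow metastable relaxation to the ground state
}

def trajectory_photons(T, state=0):
    """Gillespie-style jumps: draw an exponential waiting time from the total
    escape rate, then pick a channel with probability proportional to its rate."""
    t, photons = 0.0, 0
    while True:
        channels = [(dst, r) for (src, dst), r in RATES.items() if src == state]
        total = sum(r for _, r in channels)
        t += rng.expovariate(total)
        if t > T:
            return photons
        u = rng.random() * total
        for dst, r in channels:
            u -= r
            if u <= 0:
                if (state, dst) == (1, 0):
                    photons += 1        # count the emitted photon
                state = dst
                break

counts = trajectory_photons(100.0)
```

Each simulated readout repetition is one such trajectory; the time trace of photon clicks it produces is exactly the kind of data fed to the TM and to the network in the main text.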
When the magnetic field is applied along the NV axis, the ground-state NV⁻-¹⁴N Hamiltonian has a negligible effect on the repetitive readout, so it is not considered in the numerical simulation. The NV⁻ excited-state Hamiltonian reads

H_es = Δ_es S_z² + γ_e B S_z + Q I_z² + γ_n B I_z + S·A·I,

where S and I are the electronic and nuclear spin operators, Δ_es = 1.42 GHz is the zero-field splitting of the electronic spin, Q = −4.945 MHz is the nuclear quadrupole interaction [57], and γ_e = 2.802 MHz/G and γ_n = −0.308 kHz/G are the electronic and nuclear gyromagnetic ratios. The hyperfine tensor A is diagonal due to symmetry,

S·A·I = A_∥ S_z I_z + A_⊥ (S_x I_x + S_y I_y) = A_∥ S_z I_z + (A_⊥/2)(S₊I₋ + S₋I₊),

where A_∥ = −40 MHz was determined via ODMR experiments [58]. A_⊥ was believed to be similar to A_∥ and has recently been measured to lie between −40 and −50 MHz [41].
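As a numerical cross-check (our construction, for illustration; the simulation itself uses the full 33-level model), the 9 × 9 excited-state spin Hamiltonian can be assembled from Kronecker products of spin-1 operators. The transverse A⊥ term couples |m_s, m_I⟩ to |m_s ± 1, m_I ∓ 1⟩, which is the flip-flop channel discussed in the main text:

```python
import numpy as np

# Spin-1 operators in the m = +1, 0, -1 basis (hbar = 1).
Sz = np.diag([1.0, 0.0, -1.0])
Sp = np.sqrt(2.0) * np.diag([1.0, 1.0], 1)   # raising operator
Sm = Sp.T                                     # lowering operator
Id = np.eye(3)

# Parameters from the text, converted to MHz (B in gauss).
D_es, Q = 1420.0, -4.945
g_e, g_n = 2.802, -0.308e-3                   # MHz/G
A_par, A_perp = -40.0, -50.0
B = 7500.0                                    # field along the NV axis

# H acts on electron (first factor) x 14N nucleus (second factor).
H = (D_es * np.kron(Sz @ Sz, Id) + g_e * B * np.kron(Sz, Id)
     + Q * np.kron(Id, Sz @ Sz) + g_n * B * np.kron(Id, Sz)
     + A_par * np.kron(Sz, Sz)
     + 0.5 * A_perp * (np.kron(Sp, Sm) + np.kron(Sm, Sp)))
```

Diagonalizing H shows the flip-flop mixing directly: the A⊥ off-diagonal elements connect states that would otherwise be energy eigenstates, which is the microscopic origin of the measurement back-action.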
The NV⁰ excited-state Hamiltonian takes the form

H_es(NV⁰) = γ_e B S_z + γ_n B I_z + Q I_z².

Figure A1. The 33-level NV model used in our simulation, consisting of 11 electronic spin levels times 3 nuclear spin levels (level spacings not to scale). k_r, k_47 (= k_67), k_57, k_71 (= k_73), k_72 and k_ion are incoherent transition rates connecting the corresponding energy levels. The optical transition rate k_r between the excited and ground states is set equal for NV⁻ and NV⁰, and the optical transitions are assumed to be spin-conserving (the spin non-conserving part is <1% [37]). β is a dimensionless parameter given by the ratio of the laser power to the optical transition rate. k_(de)ion is the (de)ionization rate. We assume that the (de)ionization happens in the excited state and follows the selection rules depicted by the brown arrows.

The coherent dynamics contains a fast oscillation at a frequency ω much larger than the incoherent rates. We mitigate this issue by employing the Born-Oppenheimer approximation [59] in our numerical simulation and average out the fast oscillation at ω as follows.
We define δp_mn as the transition probability from the state |m⟩ to the state |n⟩ in the time step δt. [...] Notice that |⟨i|ψ(t)⟩|² is periodic with period 2π/ω, which is much smaller than the time step δt ≪ 1/k_ij. Thus, we assume that only the average effect of this oscillation is seen in each time step, and numerically compute the averaged transition probabilities.

Appendix B. Machine learning discussions

B.1. Recurrent neural network

A recurrent neural network (RNN) is a commonly used architecture specializing in time-series data, with the capability to capture correlations within the time series. In the main text, we showed results obtained using a shallow neural network. To see whether we gain by exploiting the correlations within the time series, we also tested the performance of an advanced recurrent architecture: the long short-term memory (LSTM) network. Due to the nature of recurrent neural networks, the training process is very time-consuming and therefore not suitable for exploring multiple parameters in our model. To speed up the training, we averaged the input time-trace data over 100 realizations, greatly reducing the training set dimension. Admittedly, this may have caused some loss of information. The result nevertheless still consistently outperforms the TM and is comparable to the shallow neural network shown in the main text (see table B1). One remark is that we did not take the cumulative sum of the input data, because the LSTM specializes in time-series data and is able to recognize quasi-periodic patterns.

B.2. Unsupervised learning
In the main text we compared the enhanced fidelities of the TM and of supervised learning after discarding less trustworthy data. Another possibility is to use unsupervised learning [50]. This method is of interest because unsupervised learning does not require any well-labelled data. We implemented the k-means algorithm, which classifies a given data set into k different groups.
We first use the TM readout to obtain a bright (dark) group of measurement trajectories. We then perform k-means on the bright (dark) group to further classify it into k subgroups. The fidelity increases when we discard the smallest subgroup. Compared to the TM, k-means gives better fidelity, as shown in figure B1.

Table B1. Comparison between the fidelities obtained through the TM, ML and LSTM under different parameters. All trainings and tests were conducted at the N_opt of that set of parameters. Overall, the LSTM algorithm has performance similar to the shallow neural network.
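A minimal version of the subgroup-discarding procedure described above (toy trajectories with invented click rates, and a bare-bones k-means rather than a tuned implementation) looks like:

```python
import numpy as np

rng = np.random.default_rng(3)

def kmeans(X, k=2, iters=50):
    """Bare-bones k-means; centers seeded at the extreme final counts so the
    two clusters start on opposite sides of the group."""
    centers = X[[int(X[:, -1].argmin()), int(X[:, -1].argmax())]].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy 'bright group': most trajectories click at a high rate throughout;
# a minority (mimicking flipped states) drops to a low rate after rep 40.
n, reps, n_flip = 300, 100, 30
clicks = rng.random((n, reps)) < 0.30
clicks[:n_flip, 40:] = rng.random((n_flip, reps - 40)) < 0.05
X = np.cumsum(clicks, axis=1).astype(float)

labels = kmeans(X)
keep = labels == np.bincount(labels).argmax()   # discard the smallest subgroup
```

No labels enter anywhere: the clustering separates the flipped minority purely from the shape of the cumulative trajectories, which is why this unsupervised variant can still improve the kept-data fidelity.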