Artificial Neural Network Syndrome Decoding on IBM Quantum Processors

Syndrome decoding is an integral but computationally demanding step in the implementation of quantum error correction for fault-tolerant quantum computing. Here, we report the development and benchmarking of Artificial Neural Network (ANN) decoding on IBM Quantum Processors. We demonstrate that ANNs can efficiently decode syndrome measurement data from the heavy-hexagonal code architecture and apply appropriate corrections to facilitate error protection. The current physical error rates of IBM devices are above the code's threshold and restrict the scope of our ANN decoder for logical error rate suppression. However, our work confirms the applicability of ANN decoding methods to syndrome data retrieved from experimental devices and establishes machine learning as a promising pathway for quantum error correction when quantum devices with below-threshold error rates become available in the near future.

The development of quantum processors has made remarkable progress over the last few years, with quantum devices consisting of more than 100 qubits currently accessible from multiple developers [1][2][3]. In principle, 100 qubits could allow computations intractable on classical supercomputers; however, the computational capabilities of the current generation of quantum processors are limited by high levels of physical noise [4]. Several studies have implemented and tested error mitigation strategies to suppress the detrimental impact of noise, with varying levels of success [5][6][7][8]. Ultimately, the full power of quantum computers can only be unleashed when Quantum Error Correction (QEC) techniques are implemented. These will allow efficient and scalable detection and correction of errors in quantum circuits, leading to fault-tolerant quantum computations [9][10][11][12]. Over recent decades, QEC codes have been theoretically developed to provide a means to suppress errors on logical information through encoding in a larger Hilbert space [12][13][14][15]. One of the leading QEC codes is the surface code, which offers a high logical error rate threshold based on nearest-neighbour interactions between qubits on a two-dimensional lattice [10, 16]. The implementation of surface code based QEC requires the classical processing of syndrome data, related to the physical error locations, to find appropriate corrections for physical qubits. However, this step, known as decoding, is a computationally intensive task. Recent work has theoretically shown that Artificial Neural Network (ANN) based decoders can facilitate fast and scalable decoding [17][18][19][20][21][22][23][24], which is crucial to prevent the accumulation of errors during any quantum computation. The next major milestone is to implement an ANN based syndrome decoder on quantum processors to directly benchmark its performance. This has only been reported by one recent paper to date, which is based on quantum devices developed by the Google team [25].
In this work, we develop an ANN based syndrome decoder and experimentally implement it on IBM Quantum Processors. Further, we assess its performance through comparison against the well-established graph-based Minimum Weight Perfect Matching (MWPM) technique, using PyMatching [26]. Our work shows that, in principle, ANN based syndrome decoders can efficiently process syndrome measurement data from IBM devices and suggest appropriate corrections, achieving a crucial step in the pipeline of QEC on quantum computational devices.
Historically, the surface code literature has been developed primarily for a square lattice arrangement of qubits [10, 16, 27]; however, the architecture of the IBM Quantum Processors is built on a heavy-hexagonal (HH) arrangement of qubits, as shown in Figure 1 (a). The motivation for such a qubit layout was to reduce the local connectivity of qubits. This addressed the physical difficulty of controlling many connections to each qubit and aimed to reduce cross-talk noise [28]. However, the HH format required the modification of the traditional square surface code construction to a hexagonal architecture with ancillary qubits, changing the underlying circuit structures for syndrome measurement. In 2020, Chamberland et al. laid out the foundational framework for QEC on heavy-hexagonal and heavy-square lattices of low-degree, locally connected qubits [28], introducing the HH QEC code. This original HH code was optimised to minimise the number of required physical qubits by removing some ancillary qubits on the boundaries of the hexagonal lattice while maintaining a lattice connectivity of at most three [28]. However, IBM have developed increasingly large devices on HH lattices without the original boundary optimisation [29], as shown in Figure 1 (a), as the original code layout was incompatible with being realised in the bulk of a HH lattice. This created a discrepancy between the HH code proposed by Chamberland et al. and the HH layout of physical qubits in IBM devices (see Supplementary Material Section S1 for details on the adjustment made). To address this disparity, we have modified the existing HH code by adjusting the original prescription's boundaries to fit with the bulk. This conforms with the IBM Quantum Processor layout, which is a crucial step in the direct implementation and benchmarking of our ANN decoder on IBM devices. A recent work by Sundaresan et al.
has also looked at the modified HH code for distance-three measurements [30]. However, our work is distinct as we investigate HH code threshold plots and an implementation comparison between distance-three and distance-five codes based on direct measurements on IBM devices.
Figure 1 (b) schematically illustrates a distance-three patch of the adjusted HH code shape as described within our work, where data qubits (orange) store useful information and ancilla qubits (grey) are used to facilitate syndrome measurements. These measurements are used to locate errors on physical qubits within the HH lattice. Typically, the syndrome measurements are collected over multiple rounds before they are decoded to find appropriate corrections for errors on data qubits and for errors arising in the syndrome measurement process itself. Figure 1 (c) schematically shows many cycles of the HH code being executed and corresponding syndromes measured for each cycle. The circuits used to measure the syndrome in both the X and Z basis are shown in Figure 1 (d), with the physical qubits numbered in Figure 1 (b) illustrated within a dashed box.
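As a simple illustration of how syndrome measurements relate to physical errors, each stabiliser measurement can be viewed as a parity check over the data qubits it touches. The sketch below uses a small hypothetical check matrix, not the actual HH stabilisers, to show how a single data-qubit error flips only the adjacent stabilisers:

```python
import numpy as np

# Toy parity-check matrix standing in for the HH stabiliser layout:
# each row is one stabiliser, each column one data qubit.
H = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1]])

error = np.array([0, 1, 0, 0])   # a single error on data qubit 1
syndrome = H @ error % 2         # only stabilisers touching the error flip
print(syndrome)                  # [1 1 0]
```

The decoder's task is the inverse problem: given the syndrome over several cycles, infer a correction consistent with it.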
The data collected from syndrome measurement over several cycles is processed by a classical syndrome decoding method. This prescribes adequate corrections to fix physical errors in data qubits and restore the logical state of the lattice. The construction of an efficient and scalable syndrome decoder is a challenging computational problem and has recently been the focus of intensive research [31, 32]. One of the leading syndrome decoder algorithms, MWPM, calculates corrections by matching pairs of changed stabilisers. It has received extensive development in many square lattice surface code studies [16, 26, 33-36]. Chamberland et al. applied the MWPM algorithm to the original HH layout to compute logical error rate curves for both X and Z logical errors [28].
We benchmarked the adjusted HH code using the MWPM decoder from the Python package PyMatching [26] and compared it to the Chamberland et al. work. In Figure 2, odd code distances, d, from three to eleven are tested, and the lowest clear crossover point can be seen at approximately 0.0007 in the X logical error plot on the left. This is taken as the benchmark threshold for the adjusted HH QEC code, as only distances three and five are experimentally tested. Note that the threshold, if computed from increasing code distance, would be slightly higher (approximately 0.001). We used the MWPM as implemented by PyMatching to confirm Chamberland et al.'s X logical error threshold of 0.0045, and a very similar threshold of 0.005 was found. Details can be found in the Supplementary Material Section S2. The addition of 2d − 2 extra ancilla qubits and 2d − 2 CNOTs has lowered the threshold physical error probability by a small amount.
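For intuition, MWPM pairs up flipped stabilisers so that the total length of the implied error chains is minimal. The brute-force sketch below is our own illustration and is exponential in the number of flips; PyMatching instead implements the polynomial-time blossom algorithm. The coordinates and Manhattan metric are toy choices, not the HH matching graph:

```python
def mwpm(nodes, dist):
    """Brute-force minimum-weight perfect matching over an even-sized
    list of flipped-stabiliser coordinates. Illustration only."""
    if not nodes:
        return [], 0
    a, rest = nodes[0], nodes[1:]
    best_pairs, best_weight = None, float("inf")
    # Pair the first node with every candidate, recurse on the remainder.
    for i, b in enumerate(rest):
        sub_pairs, sub_weight = mwpm(rest[:i] + rest[i + 1:], dist)
        weight = dist(a, b) + sub_weight
        if weight < best_weight:
            best_pairs, best_weight = [(a, b)] + sub_pairs, weight
    return best_pairs, best_weight

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

# Four flipped stabilisers on a toy lattice: the optimal matching pairs
# nearby flips, corresponding to short error chains between them.
flipped = [(0, 0), (0, 1), (2, 0), (2, 2)]
pairs, weight = mwpm(flipped, manhattan)
print(pairs, weight)  # [((0, 0), (0, 1)), ((2, 0), (2, 2))] 3
```

Each matched pair implies a correction along a shortest path between the two flipped stabilisers.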
Despite promising performance, it has been regularly discussed that the MWPM algorithm may not be fast enough relative to quantum state coherence times on current devices [27, 37-39]. Even the best adaptations of this algorithm are slow in the large-distance regime of QEC codes. The development of fast and scalable syndrome decoders is an active area of investigation, with recent proposals attempting to address the real-time decoding challenge [32]. Machine Learning (ML) based syndrome decoder construction has gained significant momentum in recent years, with some studies indicating that a faster and scalable syndrome decoding method may be possible by leveraging the computational efficiency and flexibility of ANN algorithms. For example, in the case of square surface code lattices, it has been shown that ANN syndrome decoders can offer highly promising performance when suggesting suitable corrections [17, 19, 38, 40-42], including testing on experimental data [25]. The low-level decoders developed in [17, 38, 40] were built in a similar manner to this work. They each show the ability of an ANN to learn the relationship between syndrome data and corrections after being given multiple training instances. The ANN decoder developed in this work was constructed using the Python package TensorFlow [45]. The decoder consists of an input layer, two hidden layers, and an output layer. The choice of the number of hidden layers is based on decoder performance: limited overfitting of training data occurred when two hidden layers were included. Using exclusively dense layers is the simplest layer structure for a neural network and requires no additional pruning or alterations [17]. This methodology allows for the quick, proof-of-concept construction of an ANN syndrome decoder for physical devices, and can give suitable corrections without much pre-/post-processing. Given that the input layer takes the entirety of the syndrome measurement at once, there is no need to explicitly distinguish between bulk stabilisers and boundary stabilisers when training the network.
The network is able to learn the direct relationship between observed syndrome patterns and appropriate corrections without needing to perform auxiliary tasks after corrections are applied, similar to the MWPM algorithm. The MWPM algorithm can provide exact corrections by pairing −1 eigenvalue stabilisers without the need for any pre-/post-processing, but it lacks speed of suggestion, especially as the distance of the code increases. The input layer is sized to feed in each X and Z stabiliser measurement separately, for each measurement cycle. The size of the output layer allows a value for both X and Z errors for each physical qubit. At the smallest distance, three, the sizes of the input and output layers are the same; however, the input layer size grows significantly faster than the output layer when the distance of the code is increased. Each entry in the input corresponds to a single stabiliser measurement, with the total equalling the number of stabilisers per cycle, n = (d² + 2d − 3)/2, multiplied by the number of cycles, d, for a total of d(d² + 2d − 3)/2. Similarly, each output pair corresponds to a single data qubit requiring an X and Z correction respectively. The total output size is 2d².
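A minimal forward-pass sketch of the dense architecture described above, using NumPy with randomly initialised weights standing in for a trained TensorFlow model; the hidden-layer widths here are illustrative assumptions, while the input and output sizes follow the formulas in the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d = 3                                # code distance
n_in = d * (d**2 + 2 * d - 3) // 2   # stabiliser measurements over d cycles
n_out = 2 * d**2                     # an X and a Z correction bit per data qubit
sizes = [n_in, 64, 32, n_out]        # hidden widths are illustrative choices

# Randomly initialised parameters stand in for a trained network.
weights = [rng.normal(0, 0.1, (a, b)) for a, b in zip(sizes, sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

def decode(syndrome):
    """Forward pass: ReLU hidden layers, sigmoid output as error probabilities."""
    h = syndrome
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)
    return sigmoid(h @ weights[-1] + biases[-1])

probs = decode(rng.integers(0, 2, n_in).astype(float))
assert probs.shape == (n_out,)
```

Note that at d = 3 the input and output sizes coincide (both 18), matching the observation in the text.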
Each layer is activated with the ReLU activation function, excluding the final layer, which uses a Sigmoid function to return values between 0 and 1. The BinaryCrossEntropy loss function was used, allowing the network output to be interpreted as the probability that an error was present at each qubit. Each value in the output of the final layer lies between 0 and 1, and these values are then processed in two ways. First, the values are truncated, such that each value in the correction suggestion is exactly 0 or 1, corresponding to a given correction being not required or required, respectively. If this correction is consistent with the final syndrome measurement cycle, the truncated prediction is kept. If not, the predictions are sampled using a Bernoulli trial, repeated until an appropriate correction is given [40]. Sampling a prediction could take many retries if the network is uncertain in its prediction. Therefore a cut-off point is used: after n re-samples, if no appropriate correction is given, resampling is stopped and it is assumed that a logical error has occurred in that instance [40]. Although there is theoretically a 50% chance that a logical error has occurred, for benchmarking purposes the occurrence of a logical error is assumed, and the additional logical errors are reflected in Figure 3.
Re-sampling can be a major computational overhead, furthering the need to cut off early, before qubits in the structure decohere. Given that this work only considered small distances of the HH QEC code, truncation of predictions often produced an appropriate correction, not always requiring re-sampling. The coherence time of current qubits is on the order of microseconds, and this work's re-sample time is also on the order of microseconds, forcing re-sampling to be avoided as much as possible [46]. This dense ANN methodology is fast enough to produce corrections within the coherence time of physical qubits in the lattice for these small distances [46]. The time taken to find corrections increases with code distance and may not be appropriate for large-distance codes. Instead, CNN techniques can be employed for decoding large-distance codes [18,20,47,48].
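The truncation-then-resampling procedure can be sketched as follows; the parity-based consistency check here is a toy stand-in for testing a candidate correction against the final syndrome cycle:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_correction(probs, consistent_with_syndrome, max_resamples=50):
    """Truncate the network output to a hard 0/1 correction; if it is
    inconsistent with the final syndrome cycle, Bernoulli-sample from the
    predicted probabilities until a consistent correction is found or the
    cut-off is reached, at which point a logical error is assumed."""
    guess = (probs >= 0.5).astype(int)            # truncation step
    if consistent_with_syndrome(guess):
        return guess
    for _ in range(max_resamples):
        guess = (rng.random(probs.shape) < probs).astype(int)  # Bernoulli trial
        if consistent_with_syndrome(guess):
            return guess
    return None  # assume a logical error, for benchmarking purposes

# Toy consistency check: the correction must have even parity.
probs = np.array([0.9, 0.6, 0.2, 0.1])
correction = sample_correction(probs, lambda c: c.sum() % 2 == 0)
print(correction)  # [1 1 0 0]
```

Here the truncated prediction is already consistent, so no resampling occurs; a confident network keeps the expected number of Bernoulli retries low.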
Our ANN decoder was rigorously trained on millions of simulated noise patterns, using uniform depolarising Pauli channels. The uniform depolarising noise model was simulated with an equal probability, p/3, of selecting each of the three Pauli gate errors X, Y and Z. Each qubit can experience each of these errors, and each CNOT on the lattice can experience some tensor product of two Pauli gate errors and the identity, excluding I ⊗ I. No bias or other error factors were included in this training. During training, circuits were modelled such that when Pauli errors occur on a state |ψ⟩, the result may be denoted E|ψ⟩, where E is the combination of errors on a single qubit. The goal of error correction is to detect and apply the appropriate correction to |ψ⟩ to turn the string of errors E into the identity, I, or to return the lattice to an equivalent logical state. We compute a correction E_c such that the correction succeeds if E_c E ∈ G, where G is the corresponding gauge group. This is simulated within this work by tracking each error which occurs on every qubit and multiplying the Pauli gate errors, where two of the same give the identity, X² = Y² = Z² = I, and XZ = ZX = Y up to a global phase.
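The error-tracking arithmetic reduces to multiplying single-qubit Paulis up to a global phase, which can be tabulated directly; the depolarising channel then draws X, Y or Z with probability p/3 each. A minimal sketch of this bookkeeping:

```python
import random

# Single-qubit Pauli multiplication up to global phase:
# X^2 = Y^2 = Z^2 = I, and products of distinct Paulis give the third.
PAULI_MUL = {
    ("I", "I"): "I", ("I", "X"): "X", ("I", "Y"): "Y", ("I", "Z"): "Z",
    ("X", "I"): "X", ("X", "X"): "I", ("X", "Y"): "Z", ("X", "Z"): "Y",
    ("Y", "I"): "Y", ("Y", "X"): "Z", ("Y", "Y"): "I", ("Y", "Z"): "X",
    ("Z", "I"): "Z", ("Z", "X"): "Y", ("Z", "Y"): "X", ("Z", "Z"): "I",
}

def depolarise(pauli_frame, qubit, p, rng=random):
    """With probability p, apply X, Y or Z (each with probability p/3) to the
    tracked Pauli frame of `qubit`, multiplying up to global phase."""
    if rng.random() < p:
        err = rng.choice("XYZ")
        pauli_frame[qubit] = PAULI_MUL[(pauli_frame[qubit], err)]

frame = {0: "I", 1: "X"}
depolarise(frame, 0, p=1.0)  # p = 1 forces an error, for illustration
# frame[0] is now X, Y or Z; accumulated errors multiply through the table
```

A correction succeeds when multiplying it into the tracked frame leaves every qubit's entry equal to I, or leaves an operator within the gauge group G.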
The ANN decoder developed in our work provides appropriate corrections based on the syndrome measurements over d cycles of the adjusted HH code. The ANN is able to functionally learn how stabiliser inversions are related to error chains within the lattice, including on the boundaries of the lattice where chains abruptly end. This work has explicitly shown that a dense ANN syndrome decoder can take as input an exact stabiliser syndrome measurement and return a prediction for an appropriate correction on par with, or better than, suggestions from the MWPM algorithm.
A first evaluation of the model was done with an error model similar to that used in training: a uniform depolarising noise model. The underlying physical error rate, p, was varied to test performance at different rates. Further, the decoders were then tested on device error models imported from IBM Quantum Experience: physical error rates were given for each qubit and two-qubit gate, and these were used as the underlying error probability p for each individual qubit and CNOT, instead of a uniform rate across the lattice. Finally, the circuits defined in Figure 1 (d) were constructed to fit distance-three and distance-five HH QEC codes and executed on multiple IBM devices. Figure 3 (a) and (b) display experimental and theoretical results from our ANN syndrome decoder. The decoder is tested on a simulated lattice of qubits in the form of IBM devices which suffer from uniform circuit-based noise (blue and orange line plots) and also on device noise models derived from error rates provided by five IBM quantum processors (open circle markings). In these plots, similar crossover behaviour is observed, and thus it can be inferred that the ANN syndrome decoder is able to decode the HH QEC code with the same overall properties as the MWPM algorithm. Note that the threshold for the ANN syndrome decoder is approximately 0.0005 for X logical errors, and hence reduced by a small amount compared to the MWPM threshold of 0.0007 from Figure 2 (a). In future, more sophisticated ML-based syndrome decoders, such as CNN decoders, can be designed to improve the threshold and scale to larger distances [20,42,44,47].
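The logical error rate at a given physical rate p is estimated by Monte-Carlo sampling: inject noise, decode, and count the fraction of shots in which decoding fails. A schematic version of this loop, with a dummy per-shot failure model standing in for the simulated lattice and decoder:

```python
import random

random.seed(0)

def estimate_logical_error_rate(p, shots, run_one_shot):
    """Fraction of shots in which decoding fails at physical error rate p."""
    failures = sum(run_one_shot(p) for _ in range(shots))
    return failures / shots

# Dummy stand-in: a shot "fails" with probability 10 * p**2, mimicking the
# quadratic suppression expected of a distance-3 code below threshold.
def toy_shot(p):
    return random.random() < 10 * p ** 2

rate = estimate_logical_error_rate(0.01, shots=20_000, run_one_shot=toy_shot)
```

Sweeping p for several code distances and plotting the resulting curves is what produces the crossover (threshold) plots of Figures 2 and 3.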
In Figure 3 (a) and (b), the blue circle markings correspond to distance-three sub-graphs and the orange to distance five. The horizontal uncertainty for each marking corresponds to the possible values of average physical error for each available sub-graph location, chosen with a heuristic described in Supplementary Material Section S3, with the marking placed at the median location. Interestingly, the markings lie in the approximate region of the simulated noise curves. This suggests that the ANN syndrome decoder is likely able to decode actual noise approximately as well as simulated noise. Due to the above-threshold error rates of current physical machines, distances above five were not tested, since this would only increase the logical error rate and may not provide additional insight.
Note that in Figure 3 (b), the device-derived error models seem to consistently give lower logical error rates than the equivalent uniform error models. This suggests some intricate phenomenon is occurring, which may be related to the choice of sub-graph location. Compared to what is expected under the uniform noise model, this results in a reduction of the rate of Z logical errors, which corrupt X logical operator values. This is not observed in Figure 3 (d), however, as the experimental data is not lower than the simulated uniform noise curve on average. Crosstalk and relaxation errors are missing from the simulated noise model but are possibly present on the physically realised devices, perhaps leading to this variation between experiment and simulation [49,50].
Despite the expeditious advances in quantum hardware, fault-tolerant quantum computation is still on the distant horizon. However, this work showed, for the first time, that the adjusted HH code, which matches the IBM quantum machine structure, can be decoded by both the MWPM algorithm and an ANN syndrome decoder. A dense ANN was shown to be compatible with the adjusted HH code and to perform experimentally in accordance with the error rates present on the devices. The experimental results in this work showed that stabiliser circuit decoding approximately followed the trend of the theoretical curves. It is therefore likely that lowering the physical error rate below the threshold will allow for arbitrary suppression of logical errors as the code distance is increased. This work's dense-style ANN lays the foundation for ANN decoding on physically realised IBM quantum machines.
In future, our work could be extended with the benchmarking of larger-distance code implementations on IBM devices, to demonstrate the expected drop in logical error rates with respect to code distance. However, this would require larger physical devices with error rates below the code threshold. A second line of study could be to implement and test more sophisticated ML-based decoders, such as CNN syndrome decoders, on quantum devices. In summary, our work has opened new avenues for experimentally realised, ML-based syndrome decoder implementation on quantum processors. This will be instrumental in realising fault-tolerant quantum computing in the near future, where larger size and lower error rate devices are anticipated to be available.

Supplementary Material for
"Artificial Neural Network Syndrome Decoding on IBM Quantum Processors"

S1. HEAVY HEXAGON ADJUSTMENT
Across the structure of the HH code, qubits are labelled as either data, flag or measurement qubits. These different qubit types are what facilitate the locating of errors in the HH code, and they form the basis of the stabiliser formalism for QEC codes. Although IBM quantum processing devices have been developed for some years, the HH code which directly corresponds to their physical layout has not been discussed often, with only a few current works directly implementing the adjusted HH structure on superconducting transmon qubits [30,50]. As stated in the main text, the HH boundary optimisation was not included when IBM physically realised their quantum devices.
SUPPLEMENTARY FIG. S1. Heavy Hexagon Boundary Adjustment. The adjustment is made from the original HH QEC code structure (left) to the HH QEC code structure which fits current IBM devices (right). The dashed box highlights the difference in boundary between the un-adjusted and adjusted codes. Circles represent qubits in the lattice, with larger circles being data qubits and smaller circles being ancillas. Yellow (X) and green (Z) squares refer to the stabilisers as measured by the flag and measurement qubits in the lattice. CNOTs are drawn on the lattice with corresponding directions for targets and controls.
In Figure S1, the boundary optimisation shown on the left is removed on the right. The structure shown on the right-hand side is physically implementable on IBM devices. Within the adjusted HH lattice on the right of Figure S1, there are three types of stabiliser generator: the X-type Bacon-Shor style operators; the weight-four Z-type plaquette operators, found in the bulk; and the weight-two Z-type edge operators. Here, i, j index the lattice of data qubits, with i as rows and j as columns. The stabiliser group, as used in QEC codes, is sufficiently defined by the stabiliser generators, which form the entire group under all products. Given the boundary conditions of the device, the edge operators are found along the top and bottom of the lattice when arranged in the alignment of Figure S1. This ensures that operators do not act on non-existent qubits. The result of measuring the stabilisers across the lattice is the syndrome measurement. These generators mutually commute, allowing for their collective simultaneous measurement. Given that there are many ancillary qubits on the lattice, gauge operators are defined on localised areas to measure local parity, and the stabilisers of each kind measure the parity of the gauge operators of that kind. The X and Z gauge operators are defined on data qubits indexed by i, j ∈ N ≤ d, with m ∈ N ≤ (d − 1)/2 and n ∈ N ≤ d; a constraint of i + j = odd applies to the first term in the X gauge operator set, and i + j = even in the second set. The measurements of these gauge operators, and hence of the stabilisers, are facilitated by the gauge operator circuits illustrated in Figure S2.

SUPPLEMENTARY FIG. S2. Heavy Hexagon Gauge Measurement Circuits. These two circuits describe the gauge operator measurements within the HH lattice. Orange circles represent data qubits, white circles represent flag qubits and black circles represent measurement qubits. These are numbered to match the numbering of qubits in Figure 1 of the main text. Flag/measurement qubits are initialised and measured at the beginning and final timesteps, with CNOTs connecting the qubits at corresponding timesteps.

S2. ORIGINAL HH CODE THRESHOLD

We used MWPM as implemented by PyMatching to confirm the original code's threshold value, for later comparison with the threshold value of the adjusted HH structure. In Figure S3, a threshold of 0.005 is given for the X logical error rate, which can be compared to the value of 0.0045 given by Chamberland et al. [28]. These values are very similar, and hence the value of 0.005 was used for comparison in the main text.

S3. SUB-GRAPH LOCATIONS
The five devices which were tested are capable of sustaining codes of distance greater than d = 5. Brisbane, Cusco, Nazca and Sherbrooke all contain 127 qubits and can therefore hold d = 7, and Seattle has 433 qubits and holds d = 13. Consequently, the position of smaller sub-graphs on these devices is important, since individual qubits and pairs of connected qubits are not uniform in their physical error probability. As illustrated in Figure S5, different sub-graph locations have different error rates, shown by the different qubit and connection colours. Given the variability of each sub-graph location, we constructed a systematic heuristic test which aimed to capture the overall suitability of each sub-graph location within the lattice of qubits. Using MWPM, each possible location was tested with only one simulated error source at a time. This showed how the logical error rate would be affected by each source. The results from this experiment were averaged and are shown in Table S2.
This table describes how the logical error rate would be affected by each physical error source, given the same underlying physical error rate, p. Using this test, we were able to heuristically rank every possible sub-graph location within each device's structure. Note that this test was completed with simulated noise only, and is not directly or explicitly related to the physically realised devices. For the plots generated in the main text, the rankings were used to create the horizontal uncertainty for the circle device markings. The lowest-ranked sub-graph's average physical error rate was used as the lower uncertainty value, and the highest-ranked sub-graph's average physical error rate as the upper. The marking was placed at the median sub-graph's average error rate.
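The way the rankings translate into plot markings can be sketched as below; the per-location average error rates are invented placeholder values, not data from the devices:

```python
# Hypothetical average physical error rates for four candidate sub-graph
# locations on one device (placeholder values, not measured data).
subgraph_avg_error = {
    "loc_A": 2.1e-3,
    "loc_B": 1.4e-3,
    "loc_C": 3.0e-3,
    "loc_D": 1.8e-3,
}

# Rank locations from best (lowest) to worst heuristic score; here the
# average error rate itself stands in for the heuristic score.
ranked = sorted(subgraph_avg_error, key=subgraph_avg_error.get)

lower = subgraph_avg_error[ranked[0]]     # lower horizontal uncertainty bound
upper = subgraph_avg_error[ranked[-1]]    # upper horizontal uncertainty bound
marking = subgraph_avg_error[ranked[len(ranked) // 2]]  # median sub-graph

print(lower, marking, upper)  # 0.0014 0.0021 0.003
```

The marking then sits at the median location's error rate, with the horizontal error bar spanning the best and worst candidate locations.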

FIG. 1. Neural Network Decoder Framework. a. shows the lattice connectivity of qubits of a 127-qubit device developed by IBM, with the colour range denoting error probabilities associated with single- and two-qubit gates; lighter being more error-prone. The shaded section represents a subsection of this device where the average error rate is lowest, in a region which supports a d = 3 HH error correction code. Dotted outlines indicate some other possible sub-graph locations. b. shows the qubits of a HH code, with orange circles representing the data qubits and light/dark grey circles representing the ancillary flag and measurement qubits respectively. Connecting lines represent the connectivity of two-qubit gates within the lattice. c. shows multiple cycles of the HH error syndrome measurement in the presence of circuit noise, including state preparation, readout and idle qubit errors. d. illustrates the circuits for X and Z gauge operator measurement of the HH code. e. represents an ANN based syndrome decoder as developed in this work. A large input layer takes the measurements over d cycles, and it linearly decreases over four layers to an output which is the size of the number of data qubits. f. shows a possible correction being sampled from the prediction given by the ANN based syndrome decoder. The appropriate correction is then applied to the IBM device.

FIG. 2. Benchmarking of the adjusted Heavy Hexagon Code with MWPM. Both the threshold and pseudo-threshold for X logical errors (left) and Z logical errors (right) for the adjusted HH code are shown, decoded by MWPM as implemented by PyMatching. Error bars are assigned with a probit corresponding to 97.5%.

FIG. 3. Neural Network Decoder Implementation on the Adjusted Heavy Hexagon Code. Threshold plots for the adjusted HH code decoded by an ANN, showing error rates of the X logical operator (a.) and Z logical operator (b.). Each point refers to an error model derived for each IBM device. The horizontal value of each point is the overall error rate of the specific sub-graph location chosen, and the horizontal uncertainty shows the range of overall error rates of each possible sub-graph location on each device, with the point placed on the median heuristic sub-graph score. Vertical confidence is found with a probit corresponding to 95.0%. c. and d. refer to the HH QEC code experimental circuit plots. The top right-hand corner of a. and b. is enlarged, and the points which refer to the circuits running on the IBM devices are also marked. Unfilled circles refer to the simulated noise model corrections and filled circles refer to the transpiled circuits run on devices.
In (a) and (b), the blue circle markings correspond to distance-three sub-graphs and the orange to distance five. Each data point has an alphabetical label giving the name of the IBM device: b = ibm_brisbane, c = ibm_cusco, n = ibm_nazca, s = ibm_sherbrooke, and se = ibm_seattle.

Figure 3 (c) and (d) plot results based on direct measurements from the IBM quantum processors. The plots show both the device noise simulations (open circles) from Figure 3 (a) and (b), as well as experimental points (coloured circles), for a direct comparison. For the experimental points, the adjusted HH QEC code syndrome measurement circuits were created and run on physically realised IBM devices. Each circuit was initialised twice, once for X measurements and once for Z measurements, and 10,000 shots were run for each case. The number of logical errors which occurred after the pass through the ANN syndrome decoder was lower on average than for the simulated noise models of the same devices at distance three, and roughly similar at distance five. Given that the points are all still within the same area or lower, it would follow that, if the device error rates were below the threshold of approximately 0.0005, increasing the distance of the code and using a suitable ANN syndrome decoder would facilitate fault-tolerant quantum computation [34].

Figure S4 illustrates the overall layout of a d = 5 adjusted HH code with some data errors and corresponding stabiliser measurements.
The logical error influence of each individual physical error source, given the same physical error rate.