Generation and verification of 27-qubit Greenberger-Horne-Zeilinger states in a superconducting quantum computer

Generating and detecting genuine multipartite entanglement (GME) of sizeable quantum states prepared on physical devices is an important benchmark for highlighting the progress of near-term quantum computers. A common approach to certify GME is to prepare a Greenberger-Horne-Zeilinger (GHZ) state and measure a GHZ fidelity of at least 0.5. We measure the fidelities using multiple quantum coherences of GHZ states on 11 to 27 qubits prepared on the IBM Quantum ibmq_montreal device. Combinations of quantum readout error mitigation (QREM) and parity verification error detection are applied to the states. A fidelity of $0.546 \pm 0.017$ was recorded for a 27-qubit GHZ state when QREM was used, demonstrating GME across the full device with a confidence level of 98.6%. We benchmarked the effect of parity verification on GHZ fidelity for two GHZ state preparation embeddings on the heavy-hexagon architecture. The results show that the effect of parity verification, while relatively modest, led to a detectable improvement of GHZ fidelity.


INTRODUCTION
In the race to scale-up quantum computers and demonstrate quantum advantage, an important technical milestone is the generation of entanglement across a device. Entanglement -or the inability to factorise a multi-qubit system into separable states -is typically seen as the essence of what differentiates quantum behaviour from classical. Indeed, it has been shown that quantum systems with low or no amounts of entanglement can be simulated efficiently on a classical computer [1][2][3]. For this reason, the ability to generate and maintain genuine multipartite entanglement (GME) is fundamental for quantum information processors (QIPs) to outperform classical computers.
There are various benchmarks that indicate the capabilities of a given QIP. For example, quantum volume [4] is a holistic number that takes into account qubit number and error rates of a device. Although bipartite entanglement can be rigorously quantified, it remains an open problem to do the same for GME. To demonstrate and validate GME, we select a state that is known to be entangled across multiple qubits and assess how well a device can construct that state. In particular, Greenberger-Horne-Zeilinger (GHZ) states are well suited to this purpose -in that they are GME states whose fidelity on a QIP can be efficiently estimated. The fidelity estimate comes from a combination of measuring the populations of the qubit states as well as their coherences. An approach used to detect GME is to use GME witness operators [5][6][7]. A negative expectation value with respect to a target state is a sufficient but not necessary condition for the state containing GME. It has been shown that measuring a GHZ fidelity of at least 0.5 is equivalent to measuring a negative GME witness expectation value, hence implying that the state exhibits GME [8].
In this work we create large GHZ states up to 27 qubits on a physical device and measure their fidelities. The experiments are performed on the IBM Quantum ibmq montreal device, which consists of 27 superconducting transmon qubits [9]. The device is from the series of IBM Quantum Falcon processors and was recently benchmarked at having a quantum volume of 64 [10]. When constructing states on QIPs, the actual entanglement of the states within the devices may be acceptably high, however the observed entanglement could be betrayed by erroneous measurement results. Using a quantum detector tomography (QDT) technique, measurement errors can be sufficiently understood and classically inverted to estimate the pre-measurement states [11]. We employ a well-known quantum readout error mitigation (QREM) technique to implement measurement correction within our experiments [12] which has previously been used in the certification of an 18-qubit GHZ state [13]. We investigate and justify the assumptions of this method when applied to the prepared noisy GHZ states to ensure that the corrections made do not inflate the actual fidelity of the states within the device. GME has been demonstrated copiously in the literature across many different QIP architectures. A plot summarising this history of results for state sizes N ≥ 3 qubits within gate-based quantum systems is shown in Figure 1. With QREM applied, we record a fidelity of 0.546±0.017 with 98.6% confidence for being above the 0.5 threshold for a 27-qubit GHZ state, which appears to be the largest demonstration of GME to-date [8,. Error bars represent the standard error (of the mean). Beyond implementation of QREM, we investigate the use of parity verification on the fidelity of the GHZ states created. Parity verifica-tion is a fundamental error detection protocol used within quantum error correction schemes towards the realisation of large-scale fault-tolerant quantum computing. Ancilla qubits are used to measure the parity of state qubits to detect errors, enabling erroneous computations to be corrected or discarded. We benchmark the effects of parity verification on entanglement generation for various sized GHZ states prepared on the ibmq montreal device. This work highlights the technical achievement in quantum hardware and the positive progress towards the realisation of practical quantum computers.

RESULTS
Detecting genuine multipartite entanglement in GHZ states GHZ states are highly entangled states. They can be prepared in gate-based quantum devices by initialising a single primary qubit to the |+ state and the other qubits to the |0 state, then CNOT gates are iteratively applied from the primary qubit (or any other qubit that has already had a CNOT applied in this manner) to each other qubit involved in the state, as shown in Figure 2. A GHZ state can be expressed as where N is the number of qubits in the state. This state has a convenient structure for detecting genuine multipartite entanglement since its density matrix only consists of non-zero entries in each of the the four corners. The GHZ fidelity, F , can be calculated by measuring two observables population P and coherence C as where P = ρ 00...0,00...0 + ρ 11...1,11...1 can be directly measured as the GHZ populations and C = |ρ 11...1,00...0 | + |ρ 00...0,11...1 | can be measured through parity oscillations [8,22], or multiple quantum coherences (MQC) [13,[42][43][44]. A GHZ fidelity greater than 0.5 is sufficient to demonstrate that the state exhibits GME [8]. The following description of the process for measuring the fidelity is adapted from the details presented in Appendix B. To measure the coherence, we use the method of MQC due to its promising robustness to noise by using a refocusing π-pulse similar to a Hahn echo [45] to refocus low frequency noise and reduce dephasing [13]. Since X gates stabilise the GHZ state, this does not affect fidelity computations, and the usual second pulse used to cancel the first in refocusing pulse sequences can be omitted. The coherence can be calculated by measuring the overlap is produced by rotating each qubit of the state ρ by φ about the Z-axis. Essentially, S φ is the probability of measuring the zero state after encoding the GHZ state, applying the φ phase rotations, then decoding the GHZ state. For an ideal GHZ state, the phase produced by rotating each individual qubit accumulates and is equivalent to adding a phase of N φ to the state, i.e.
1 √ 2 (|00 . . . 0 +e −iN φ |11 . . . 1 ), which reduces the overlap signal to where the constant term corresponds to the diagonal elements of the density matrix and the cosine term corresponds to the off-diagonal corner elements. By Fourier transforming S φ we can obtain the MQC amplitudes where φ = πj N +1 for j = 0, 1, . . . , 2N + 1 to detect up to frequency N + 1 and N = 2N + 2 is the number of angles φ in the summation. The coherence is then calculated as C = 2 √ I N . There is a slow free rotation that occurs in idle qubits throughout the computation. To help alleviate these drift effects, a refocusing π-pulse applied between the GHZ encoding and decoding steps is used. The procedure to compute each of the overlap signals S φ consists of the following steps.
1. Encode the GHZ state by applying the preparation circuit over the desired qubits.
2. Apply a refocusing π-pulse as an X gate on each qubit.
3. Apply the phase gate rotation of φ about the Zbasis.
4. Decode the GHZ state by applying the inverse of its preparation circuit.
5. The overlap signal S φ is then the probability of measuring the zero state |00 . . . 0 .
Further details involved in expressing the fidelity concretely are presented in Appendix B.

Quantum readout-error mitigation (QREM)
On noisy intermediate-scale quantum (NISQ) hardware, measurement represents one of the largest singlecomponent error sources. This is especially significant in low-depth circuits where the number of qubit readouts is comparable to the total number of gates applied. Typical readout error rates of a few percent per qubit can quickly scramble the results of any quantum algorithm. In particular, they may obfuscate the results of an otherwise relatively well-performing device. Fortunately, however, readout error is far more addressable than other quantum errors on QIPs. First, excluding the case of mid-circuit State Size (N) [14,15] [16] [17] [18] [13,19] This work [20] [ FIG. 1. History of experimentally prepared quantum states exhibiting N -qubit GME, where N ≥ 3, with at least 95% confidence in gate-based quantum systems. The year is the date of first publication, to the best of our knowledge. The plot includes superconducting [13][14][15][16][17][18][19], ion trap [8,[20][21][22][23], photonic (polarisation) [24][25][26][27][28][29][30][31], photonic (multiple degrees of freedom (DoF)) [32,33], nitrogen-vacancy (NV) centres in diamond [34], neutral atom [35,36], and quantum dot [37] systems. The circled marker for N = 27 in 2021 refers to the results of this work. measurement, readout errors do not propagate. Secondly, and more importantly, readout error can be treated as classical error to a good approximation for typical superconducting devices. We employ the quantum readout error mitigation (QREM) procedure developed in [12] in our experiments.
Readout error can be difficult to characterise. Formally, laboratory measurements output estimates of the object Tr[E(U ρU † )], for some noisy state ρ, intermediate unitary circuit U , and positive operator-valued measure (POVM) elements E. Typically, this POVM is the set {|0 0|, |1 1|}, i.e. measurement in the Z-basis. Any deviation from the ideal case in this object cannot be narrowed down to any one of the three components without further investigation. Even without intermediate operations, it is impossible to tell where an error falls on the continuum between a faulty state preparation with a perfect measurement, and perfect state preparation with a faulty measurement.
This noise falls under a category termed state preparation and measurement (SPAM) error. There are techniques equipped to self-consistently characterise SPAM [46], but are typically computationally costly. Instead, it often suffices as a good approximation for modern hardware to assume the former end of the spectrum: that is, a device capable of perfectly delivering a |0 state but with some error in measuring it. The magnitude of the different error sources are often two or more orders of magnitude apart, justifying the assumption. The benefit of this approximation is that the conditional probability distribution p(measurex|preparedy) can be characterised simply by preparing all states y, and counting all measurements x. Whilst the number of preparation states to consider grows exponentially in the number of qubits, it can simplify matters greatly to assume that each faulty POVM element has a tensor product structure. That is, that the readout errors are either local-only or have correlations with limited spatial extent. In the case of solely local errors, the characterisation can be performed in a constant number of circuits -by preparing and measuring both |0 ⊗n and |1 ⊗n . For limited local connections, this is constant with the number of qubits and exponential in the extent of the correlations.
For this work, we operate under the first-order regime of local errors. Each state |ψ is transformed according to |ψ = A local |ψ , where A local := n i=1 A i . Each stochastic matrix A i , is called the calibration matrix for qubit i and is defined as where the notation p i (x|y) indicates the probability of measuring the state |x given the prepared state |y for qubit i. The calibration matrices A i are constructed using the QDT [11] technique. In principle, the QREM procedure consists of collecting a vector of measurement outcomes and multiplying this by A −1 local in order to obtain the pre-readout state. In practice, however, imperfections in the calibration and the mitigation -caused at the very least by finite sampling error -can result in a final probability vector with negative elements. To combat this, a projection method is usually used to find the clos- FIG. 2. Example preparation circuits for 7-qubit GHZ states. Whenever a CNOT is applied within these circuits, the GHZ state grows in size by 1 qubit. (a) An ine cient embedding where all CNOTs are applied from the primary qubit, resulting in a CNOT depth of 6. (b) An optimal embedding where CNOTs are applied in parallel from qubits that are already included in the growing GHZ state, resulting in a CNOT depth of 3. When constructing GHZ states, we aim to prepare them using a similar embedding to this optimal one modulo the qubit layout of the quantum device.
dure. The maximum di↵erence in findings is small when compared to the error bars of our results. The initial measured probability vector can only contain probability values that are at least as large as the equivalent of a single shot, 1/(shot count), due to the finite number of shots, hence the maximum number of states with non-zero probability is equal to the shot count. However, applying the A 1 local to the probability vector often results in a high number of states having very small non-zero probabilities (sometimes much less than 1/100 of a shot). The amount of memory required to precisely store a single probability vector after applying A 1 local no longer scales with the number of shots. Instead, the memory scales as 2 N for N qubits, since sparse vectors are no longer useful. To help alleviate this computational resource requirement, we apply the inverse calibration matrices for each qubit A 1 i sequentially on the relevant elements of the probability vector. After every application of A 1 i , probability values that are below some small threshold (⇡ 1/10 of a shot) are set to zero. This method considerably reduced the overall computation time while only slightly deviating from the original results.

Parity Verification
We examine the extent to which the fidelity of GHZ states can be improved through error-detecting state preparation techniques. In particular, we employ the parity-checking technique used in [44] for the [[4, 2, 2]] code. The goal here is to extract information about errors occurring within the GHZ state without disturbing the state itself. This can be achieved with a parity check between qubits of the state, where two CNOTs (control (c) ! target (t), denoted CNOT c t ) are performed which are controlled on GHZ state qubits and target a single ancilla qubit initialised in the ground state. In the case of an even number of bit-flip errors, parity-checking will leave the state of the ancilla qubit unchanged. Alternatively, for an odd number of bit-flips, the state of the ancilla will flip, signalling a detectable error within the GHZ state. More explicitly, a parity check on qubits q 1 and q 2 using ancilla qubit a will result in the transformations Measuring the ancilla in the |1i a state implies a bit-flip error has occurred within the GHZ state, allowing the erroneous shot measurement to be discarded from the results. In the case of measuring the |11i q1,q2 |0i a state, the two bit-flip errors are not detected. However, if p is the single-qubit probability of a bit-flip occurring for qubits q 1 and q 2 , then the probability of bit-flips occurring on both q 1 and q 2 is small p 2 . For example, assuming bit-flip probabilities are moderate at p = 0.25, if |0i a is already measured (assuming no readout errors), then there is a probability of (1 p) 2 /((1 p) 2 + p 2 ) = 0.9 that the state has no bit-flip errors |00i q1,q2 and a probability of p 2 /((1 p) 2 + p 2 ) = 0.1 that the state has two bit-flip errors |11i q1,q2 . Under the typical framework of quantum error correction, the syndrome measurement of an ancilla qubit will be fed forward to correct the original state. Since error correction is not achievable within the current restrictions of quantum hardware, we instead focus on detection so that runs can be post-selected. This is usually seen as the condition under which fault-tolerance can be demonstrated on near-term quantum devices [44].

Fidelity of GHZ states with parity verification in the ibmq montreal device
In this section, we measure the fidelity on a range of GHZ state sizes prepared on the ibmq montreal device FIG. 2. Example preparation circuits for 7-qubit GHZ states. Whenever a CNOT is applied within these circuits, the GHZ state grows in size by 1 qubit. (a) An inefficient embedding where all CNOTs are applied from the primary qubit, resulting in a CNOT depth of 6. (b) An optimal embedding where CNOTs are applied in parallel from qubits that are already included in the growing GHZ state, resulting in a CNOT depth of 3. When constructing GHZ states, we aim to prepare them using a similar embedding to this optimal one modulo the qubit layout of the quantum device.
est physical probability vector (on the unit simplex, with positive elements summing to 1) [47]. To summarise: a stochastic calibration matrix is constructed for each individual qubit; the tensor product of the inverse of each calibration matrix is multiplied by the measured results vector; finally, the closest probability vector is found, representing the best estimate of the quantum state before readout error. In Appendix A, we estimate the size of the approximations made in performing QREM using this simplified procedure. The maximum difference in findings is small when compared to the error bars of our results.
The initial measured probability vector can only contain probability values that are at least as large as the equivalent of a single shot, 1/(shot count), due to the finite number of shots, hence the maximum number of states with non-zero probability is equal to the shot count. However, applying the A −1 local to the probability vector often results in a high number of states having very small non-zero probabilities (sometimes much less than 1/100 of a shot). The amount of memory required to precisely store a single probability vector after applying A −1 local no longer scales with the number of shots. In-stead, the memory scales as 2 N for N qubits, since sparse vectors are no longer useful. To help alleviate this computational resource requirement, we apply the inverse calibration matrices for each qubit A −1 i sequentially on the relevant elements of the probability vector. After every application of A −1 i , probability values with magnitude below some small threshold are set to zero. A threshold value of 1/10 of a shot was used, which equates to a probability of just over 1.2 × 10 −5 for 8192 shot experiments. This value was chosen because it was high enough to considerably reduce the overall computation time, while the approximation remains close to the full calculation. When testing this approximation on a 19-qubit state, the average computation time for each application of QREM dropped from 66 sec to 3.8 sec (performed on a laptop with 2.70 GHz Intel i7-7500U CPU and 16 GB RAM), and the fidelity was found to match the full calculation by up to five decimal places. The reduced computation time enabled each sample from the 27-qubit GHZ state to be computed within two days.

Parity Verification
We examine the extent to which the fidelity of GHZ states can be improved through error-detecting state preparation techniques. In particular, we employ the parity-checking technique used in [48] for the [[4, 2, 2]] code. The goal here is to extract information about errors occurring within the GHZ state without disturbing the state itself. This can be achieved with a parity check between qubits of the state, where two CNOTs (control (c) → target (t), denoted CNOT c t ) are performed which are controlled on GHZ state qubits and target a single ancilla qubit initialised in the ground state. In the case of an even number of bit-flip errors, parity-checking will leave the state of the ancilla qubit unchanged. Alternatively, for an odd number of bit-flips, the state of the ancilla will flip, signalling a detectable error within the GHZ state. More explicitly, a parity check on qubits q 1 and q 2 using ancilla qubit a will result in the transformations Measuring the ancilla in the |1 a state implies a bit-flip error has occurred within the GHZ state, allowing the erroneous shot measurement to be discarded from the results. In the case of measuring the |11 q1,q2 |0 a state, the two bit-flip errors are not detected. However, if p is the single-qubit probability of a bit-flip occurring for qubits q 1 and q 2 , then the probability of bit-flips occurring on both q 1 and q 2 is small p 2 . For example, assuming bit-flip probabilities are moderate at p = 0.25, if |0 a is already measured (assuming no readout errors), then there is a probability of (1 − p) 2 /((1 − p) 2 + p 2 ) = 0.9 that the state has no bit-flip errors |00 q1,q2 and a probability of p 2 /((1 − p) 2 + p 2 ) = 0.1 that the state has two bit-flip errors |11 q1,q2 . Under the typical framework of quantum error correction, the syndrome measurement of an ancilla qubit will be fed forward to correct the original state. Since error correction is not achievable within the current restrictions of quantum hardware, we instead focus on detection so that runs can be post-selected. This is usually seen as the condition under which fault-tolerance can be demonstrated on near-term quantum devices [48].
Fidelity of GHZ states with parity verification in the ibmq montreal device In this section, we measure the fidelity on a range of GHZ state sizes prepared on the ibmq montreal device and observe the effects of using parity-checking qubits to verify them. The following experiments prepare GHZ states and parity verification on ibmq montreal 's qubit layout via the diagrams shown in Figure 3. The parity verification CNOTs are performed directly after preparing the GHZ states. In the case of measuring the coherence, these CNOTs are in the middle of the circuit, since there is a GHZ decoding sequence of gates applied afterwards. Ideally the parity qubits would be measured immediately after their corresponding CNOTs have been applied to help reduce parity qubit error and abandon the computation as soon as possible in the case of an error, however superconducting devices do not yet robustly support this functionality and hence the parity qubits are measured at the end of the computation along with the other qubits. The population is calculated by summing the measured 00 . . . 0 and 11 . . . 1 occupancies from the prepared GHZ states and the coherence is calculated by using the MQC method. Additionally, QREM was used on GHZ state qubits to reduce the effects of noise due to measurement errors. The parity-checking qubit cannot have QREM applied because it is a single shot measurement that directly affects whether the shot is discarded. For all of the experiments, the standard errors were calculated from 8 independent runs of the experiment where all circuits were performed using 8192 shots. All computations were performed on the Spartan high performance computing system [49]. The corresponding CNOT circuit depths and counts for each experiment are shown in Table I. GHZ states of incrementally increasing sizes from 11 to 19 qubits were prepared using the embedding shown in Figure 3a. The smallest state of 11 qubits is the minimum size required to include a parity-checker qubit within the ibmq montreal device's layout. The initial Hadamard is performed on qubit 2 and CNOTs grow the state in the up-right and down-right directions. The parity qubit is at location 13 and was added to the circuits by applying two CNOTs with control qubits 12 and 14 and target qubit 13 directly after preparing the GHZ states. The resulting fidelity, population and coherence measurements were obtained within the same device calibration and are shown in Fig. 4, the overlap signals for state sizes 11, 13, 15, and 17 for the calculation of the coherences are shown in Figure 6a. Performing the parity-checking verification requires the CNOT circuit depth to increase by one for states of size 11 to 14, while the depth is unchanged for states of size 15 to 19 since the parity-checker CNOTs are performed in parallel to the preparation circuit. When using the parity check, there is a noticeable improvement to fidelity for state sizes that do not require the CNOT circuit depth to increase, namely for state sizes 15 to 19, where the average fidelity increases by 0.051 ± 0.007 with QREM and 0.031 ± 0.004 without QREM. The population values gain a consistent advantage through parity verification, while the coherence values appear mostly unchanged for sizes 15 to 19 and slightly decrease for sizes 11 to 14. The decrease in coherence is likely due to the inclusion of the additional CNOTs increasing the CNOT circuit depth and introducing noise. Measurement error on the parity-checking qubit could also be a contributing factor by resulting in a small number of the wrong circuits being discarded. We also observe that for these prepared states, the measured population and its change due to QREM are much larger than for the coherence. These observations are expected because dephasing is the dominant error channel in superconducting qubits, rather than population errors. Therefore we expect the coherence to be the limiting factor in generating these states to high fidelity. The effects of QREM on population are likely more drastic in their improvement than with the coherence because readout errors are significantly more likely to occur from |1 to |0 compared to the other way around. This leads to population measurements being far more sensitive to measurement error than the coherence curve (since the occupancies of both |00 . . . 0 and |11 . . . 1 are required to measure the population while only the occupancy of |00 . . . 0 is required for the coherence).
To further explore the effects of parity verification on larger states, GHZ states were prepared for incrementally increasing sizes from 19 to 25, with and without two parity-checking qubits. The ibmq montreal device has a convenient layout for preparing larger GHZ states, since it allows a relatively high number of branches when constructing them, allowing more CNOTs to be computed in parallel as shown in Figure 3b. The initial Hadamard is performed on qubit 13 and CNOTs grow the state in the four directions up-left, up-right, downright, down-left. The parity-checking qubits are located at 2 and 24 and were added to the circuits by applying, directly after preparing the GHZ states, four CNOTs, two with control qubits 1 and 3 and target qubit 2, and two with control qubits 23 and 25 and target qubit 24. Due to the parallel nature of this state embedding, each  13 12 , CNOT 13 14 , CNOT 12 10 , CNOT 12 15 , CNOT 14 11 , and CNOT 14 16 . The states are incrementally grown from the initial 19 qubits by including qubits in the order: 9, 6, 20, 17, 0 and 26. The 26 and 27-qubit states do not use qubits 2 and 24 as parity checking qubits, instead the 26-qubit GHZ state includes qubit 2 with a CNOT 3 2 in its construction while the 27-qubit state additionally includes qubit 24 with a CNOT 23 24 gate.  -checker qubits)  19  7  18  8  2 2  14  36  15  40  20  7  19  8  2 3  14  38  15  42  21  7  20  8  2 4  14  40  15  44  22  7  21  8  2 5  14  42  15  46  23  7  22  8  2 6  14  44  15  48  24  7  23  9  2 7  14  46  16  50  25  7  24  9  2 8  14  48  16  52  26  7  25  --14  50  --27  7  26  --14  52 -- and observe the e↵ects of using parity-checking qubits to verify them. The following experiments prepare GHZ states and parity verification on ibmq montreal 's qubit layout via the diagrams shown in Figure 3. The parity verification CNOTs are performed directly after preparing the GHZ states. In the case of measuring the coherence, these CNOTs are in the middle of the circuit, since there is a GHZ decoding sequence of gates applied afterwards. Ideally the parity qubits would be measured immediately after their corresponding CNOTs have been applied to help reduce parity qubit error and abandon the computation as soon as possible in the case of an error, however superconducting devices do not yet robustly support this functionality and hence the parity qubits are measured at the end of the computation along with the other qubits. The population is calculated by summing the measured 00 . . . 0 and 11 . . . 1 occupancies from the prepared GHZ states and the coherence is calculated by using the MQC method. Additionally, QREM was used on GHZ state qubits to reduce the e↵ects of noise due to  13 12 , CNOT 13 14 , CNOT 12 10 , CNOT 12 15 , CNOT 14 11 , and CNOT 14 16 . The states are incrementally grown from the initial 19 qubits by including qubits in the order: 9, 6, 20, 17, 0 and 26. The 26 and 27-qubit states do not use qubits 2 and 24 as parity checking qubits, instead the 26-qubit GHZ state includes qubit 2 with a CNOT 3 2 in its construction while the 27-qubit state additionally includes qubit 24 with a CNOT 23 24 gate.
prepared on the device (with embedding shown in Figure 3b). The experiments were performed on the same calibration as the previous experiments relating to the GHZ state sizes of 19 to 25 and the results are shown in Figure 5, with corresponding overlap signals for the calculation of coherence shown in Figure 6b. The measured fidelities using QREM were found to be 0.580±0.022 and 0.546 ± 0.017 for the 26 and 27-qubit states respectively. These fidelities are above the 0.5 threshold with confidence levels of 99.6% and 98.6% respectively, thus GME was detected in both states.
We suspect it is possible to extend parity verification to consist of three or more qubits with majority rules, instead of a single qubit, to reduce measurement error.  Figure 4. Each plot compares combinations of applying quantum readout-error mitigation (QREM) and using parity-checker qubits 2 and 24 to verify the equality of states of qubits 1 and 3, and qubits 23 and 25 respectively. The parity ∆ refers to the the difference between using and not using parity verification. The preparation of each state without parity verification requires the same CNOT circuit depth since the additional qubits can be included by applying CNOT gates in parallel. When including the two parity-checking qubits, the CNOT circuit depth increases by one for sizes 19 to 23 and increase by two for sizes 24 and 25. The standard errors were calculated from 8 independent runs of the experiment where all circuits were performed using 8192 shots. All corresponding CNOT circuit depths and counts are displayed in Table I. (a) The population is calculated as the probability of obtaining |00 . . . 0 or |11 . . . 1 upon measurement of the prepared GHZ state. Parity verification using the two parity-checking qubits is shown to significantly increase the population values for all state sizes from 19 to 25 qubits. (b) The coherence is calculated using the quantum multiple coherences (QMC) method. No increase in coherence is observed using parity verification for state sizes 19 to 25. For state sizes 19 to 21 in particular, the measured coherence decreases slightly. (c) The fidelity is calculated as the average over the population and coherence. When using parity verification, the fidelity increases on average by 0.072 ± 0.009 with QREM and 0.045 ± 0.007 without QREM over state sizes 19 to 25. This could be beneficial because current measurement error mitigation techniques cannot be applied to parity qubits due to the parity being a single shot measurement. However, it may not be worth it due to introduced CNOT errors potentially being larger than the original measure-ment error.
FIG. 6. Measured MQC overlap signals for various size GHZ states prepared on the ibmq montreal quantum device and corrected using QREM. The experimental results and standard errors were calculated from 8 independent runs of the experiment performed on the same device calibration where all circuits were executed using 8192 shots. The noise filtered data are calculated by taking the Fourier transform of the experimental results, zeroing frequencies that are not 0, ±N , then inverting the Fourier transform. There is a phase shift present in the measured S φ values that is likely caused by free rotation in idle qubits. The refocusing π-pulse is applied to the encoded GHZ state before decoding to help nullify these qubit drift effects, however the GHZ encoding and decoding operations take slightly different amounts of time to compute due to pulse alignment restrictions in the software. This phase shift is more pronounced when parity verification is applied since the parity-checking CNOTs are performed before the refocusing π-pulse, offsetting the number of gates before and after refocusing. (a) GHZ state is implemented using embedding 1 shown in Figure 3a. Parity verification uses qubit 13 as a single parity-checker (PC) qubit to verify the states of qubits 12 and 14. (b) GHZ state is implemented using embedding 2 shown in Figure 3b. Parity verification for states of sizes 19 to 25 qubits uses qubits 2 and 24 as parity-checkers to verify the equality of states of qubits 1 and 3, and qubits 23 and 25 respectively.

DISCUSSION
A variety of GHZ states were prepared on the 27qubit ibmq montreal device and the GHZ fidelities, pop-ulations and coherences were measured. We report the preparation of a 27-qubit state with a GHZ fidelity of 0.546 ± 0.017, demonstrating GME across the full de-vice with a confidence level of 98.6%. This is currently the largest state prepared on a quantum system that has been reported to exhibit GME. We further benchmarked the effects of parity verification on the fidelity, population and coherence. Two GHZ state preparation embeddings on the physical qubit layout were used. The first embedding generated GHZ states of incrementally increasing sizes from 11 to 19 qubits while using a single parity-checking qubit. When the parity-checker qubits did not require the CNOT depth of the circuit to increase, which was the case for state sizes 15 to 19, the fidelity increased by an average of 0.051 ± 0.007 with QREM and 0.031±0.004 without QREM. The fidelity was mostly unchanged for sizes 11 to 13 and was higher for size 14. The lack of fidelity improvement for state sizes 11 to 13 was most likely due to the noise introduced by the CNOTs of the parity-checking qubit and their implementation resulting in the CNOT circuit depth increasing by one for state sizes 11 to 14. The population values increased for all state sizes when the parity-checker qubit was used, while the coherence values appear mostly unchanged for state sizes from 15 to 19 and slightly decrease for sizes from 11 to 14. The second embedding generated GHZ states of incrementally increasing sizes of 19 to 25 qubits while using two parity-checker qubits. It utilised the layout of the ibmq montreal device by performing as many CNOTs as possible in parallel to minimise the CNOT circuit depth. The parity-checking qubits increased the CNOT circuit depth for state sizes 19 to 23 by one and for sizes 24 and 25 by two. We observed that the paritychecking procedure with two qubits increased the fidelity slightly for all state sizes resulting in an average increase of 0.072 ± 0.009 with QREM and 0.045 ± 0.007 without QREM. In particular, the 25-qubit state, which was the largest state tested using parity checkers, had its fidelity increase from 0.601 ± 0.015 to 0.664 ± 0.016 with QREM. The population values consistently increased for all state sizes, while the coherence values decreased slightly for state sizes 19, 20, 21 and 25 qubits and were unchanged for state sizes 22, 23 and 24. The results show that the effect of parity verification led to a detectable improvement of GHZ fidelity, although relatively modest on current IBM Quantum hardware.

AUTHOR CONTRIBUTIONS STATEMENT
GJM, GALW, CDH and LCLH conceived the experiments. GJM obtained and processed the main results, while GALW obtained and processed the QREM analysis of Appendix A. GJM and GALW wrote the manuscript with input from CDH and LCLH.

DATA AVAILABILITY
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.

ADDITIONAL INFORMATION
Non-financial competing interests: The authors are supported by the University of Melbourne through the establishment of an IBM Q Network Hub at the University.
In the main text, we briefly outlined some simplifying assumptions made in order to implement quantum error mitigation more efficiently. Respectively, these were: measurement error is significantly larger than preparation error, measurement error is predominantly classical, and measurement error is uncorrelated between qubits. Here, we examine these assumptions on chosen subsets of the device. To this end, we employ gate set tomography (GST) for characterisation. GST is a selfconsistent extension of regular quantum process tomography (QPT) which includes the SPAM operations in its maximum likelihood estimation. A characterisation of the ith qubit is performed across an entire gate set, whereby the preparation and measurement operators are also estimated. Importantly, this allows us to robustly evaluate the effects of the liberties we have taken. For an overview of GST and its implementation, see [46,50,51]. We use the wellknown quantum characterisation, verification and validation (QCVV) Python package PyGSTi for this part of the implementation [52].
Testing these assumptions takes place in three parts. First, we perform GST across a range of qubits to obtain an estimate for ρ 0 i , E 0 i , and E 1 i , where ρ 0 i is the initial density matrix, and E m i is the POVM element corresponding to when the detector clicks with the result m. Next, using these results, we compare the magnitudes of errors in ρ 0 i with E m i . We also determine the off-diagonals of E m i ; non-zero values correspond to coherence in the POVM effect, which is a strictly non-classical error. These values roughly represent the information discarded when the decision is made to use the stochastic calibration matrix (Eq. 5) rather than the complete POVM. From the POVM, we construct the calibration matrix A POVM i . This has the same definition as A i in Equation 5, except its values are obtained with a more comprehensive procedure. We compare these matrices with the end result obtained from the simpler method in the main text. A similar investigation into the rigours of QREM has been conducted in [53]. Ideally, the GST estimates for the noisy probes would be used as the basis for QREM, however for our purposes this would have greatly increased the experimental overhead. The number of experiments required for the chosen GST characterisation of a single qubit is 550, whereas the prepare-and-measure path takes only two circuits with no post-processing.
Finally, having validated the use of the simpler, cheaper stochastic matrix, we examine the assumption of locality of the measurement error. To do this, we select sets of four qubits. We construct the full calibration matrix across the four qubits -i.e. by preparing {|i } 15 i=0 and measuring. We then compare this to the tensor product of the four local calibration matrices. We note in hindsight that there have been recent developments for characterising correlated measurement error in an efficient manner [54]. Although our results do not suggest that the tensor product structure significantly impacts the outcome, this could be a more desirable method in future.

SPAM magnitudes and the calibration matrix
Across the device, we performed GST on six different qubits. From this, we obtain estimates of the initial density matrix ρ 0 i , and the two-outcome POVM {|0 0|, |1 1|}. We also obtain estimates for the gates that comprise the process, but these are not relevant to the discussion. LetĒ m i denote one of the noisy twooutcome effect operations for the ith qubit. Our aim is firstly to determine how classical the error onĒ m i is, as well as estimating the size of the error made in assuming perfect state preparation. The latter is not necessarily as simple as finding the size of the preparation error. Rather, we use our estimatesĒ m i to determine the calibration matrix that we would obtain if state preparation were perfect. Using the Born rule, the elements of this matrix A POVM i are: A subtlety in this practice is thatρ 0 andĒ m i are not uniquely fixed by any possible experiment; a type of gauge freedom exists in the estimate. Let Λ(·) be some error channel acting on either SPAM operation, and let G be some gate sequence. If Λ commutes with G then we have (A2) That is, there is a class of estimates for ρ 0 i and E m i which are completely physically equivalent under commuting transformations. An error map which fits this description is the depolarising channel E p : Therefore, in comparing the GST-produced calibration which preserve the positivity of ρ 0 i . Since these representations of the SPAM operations are physically indistinguishable, we can determine the closest A POVM i to A i for each qubit i. The remaining difference between the two will estimate the error produced in constructing the calibration matrix using the simpler methods of the main text. For qubits 1, 3, 12, 14, 23, and 25, we summarise the results of this process in Table II. The first three data columns respectively give direct estimates of the POVM 0-effect, the GST-derived calibration matrix, and the simplified calibration matrix. The final two columns respectively quantify from these the size of the approximation made from assuming perfect preparation and from assuming classical errors. The experiments were conducted with 8192 shots. Each of these values are small, both compared to the sampling error, and compared to the results in the main text.
We further examine these assumptions by conducting simulated experiments using the experimentally characterised data. Here, we aim to replicate the conditions of our results as much as possible in order to determine if there is any possibility of overestimating entanglement by using QREM. The procedure for this is as follows: • Generate a set of possible 6-qubit lab states: ρ = (1−p)·ρ GHZ +p·ρ noise for a range of p and different noise (in particular: maximal mixtures and random states), • Simulate out MQC with measurement error by using the GST estimates of the measurement probes {E i } to produce the complete coherence curve, • Then, using the measure-and-prepare calibration matrix A local , we apply QREM and invert the noise of the measurement effects, • Finally, determine the final error-mitigated fidelity, and compare to the actual state ρ.
We performed this across a variety of different ρ noise and values for p, each sampled 500 times. The maximum average fidelity overestimate found with QREM was 6 × 10 −4 . We suspect that the MQC method of estimating state fidelity is particularly robust to erroneous QREM. The reason for this is that QREM carries the possibility of overestimating the number of |0 ⊗n readouts. However, MQC creates a curve measuring |0 ⊗n amplitudes from peak to trough and then estimates the fidelity from the frequency here, which is more robust to translations of the entire curve. We compare the full calibration matrix to that which is acquired by taking the tensor product of all local terms. In the matrix plots, we subtract off the identity matrix in order to better resolve finer details.

Correlation of measurement error
The final assumption made in the name of circuit reduction is that of locality of the errors. In the previous subsection, we justified that single-qubit measurement errors can be accurately characterised for the purpose of QREM on this hardware using only two circuits. An extrapolation to n qubits would still require 2 n circuits to create the full calibration matrix. Instead, if we assume that measurement errors on different qubits are uncorrelated, we can take A local = n i=1 A i for the n qubit system.
To check whether this is valid, we collected calibration matrices for four sets of four qubits and compared them to the tensor product of their marginals. Here, A full is the calibration matrix for all 16 prepared and measured states. Meanwhile, A local is the tensor product of local calibration matrices obtained from the preparation and measurement of |0000 and |1111 . These results are summarised in Figure A.1. Here, we present matrix plots of each calibration matrix with the ideal case subtracted off. Each construction presents an almost identical depiction of the noise. One exception is the small occurrence of non-zero elements in A full for the set (15,18,17,21) which is not present in A local -however we believe this is not large enough to warrant accounting for. The difference between the local and full calibration matrices are quantified by their Frobenius distance. In all cases, this difference (representing the difference between 256 matrix elements) is small, and we are confident that results are not affected by making this simplifying assumption.
Appendix B: Using multiple quantum coherences to detect GME within noisy GHZ states GHZ states defined over arrays of qubits are more susceptible to noise than graph states defined over the same layout of qubits [55]. So to detect entanglement, the graph state is generally the preferred option. However the GHZ state has a convenient structure that can be utilised for determining the more strict GME property, by calculating its fidelity. The fidelity between a desired pure state ρ ideal and the actual state ρ is a similarity metric which can be written as When ρ ideal is a GHZ state, a fidelity value that is greater than or equal to 0.5 implies that ρ is GME. The fidelity of a GHZ state can be written as where the population P = ρ 00...0,00...0 + ρ 11...1,11...1 can be directly measured as the sum of the |00 . . . 0 and |11 . . . 1 occupancies from the GHZ state and the coherence C = |ρ 11...1,00...0 | + |ρ 00...0,11...1 | can be indirectly measured through Multiple Quantum Coherences (MQC) [13] or parity oscillations [8,22]. In this work, we use MQC since a refocusing π-pulse similar to a Hahn echo [45] can be used to refocus low frequency noise and reduce dephasing [13]. This is because MQC is more robust to noise and scales better with respect to measurement error mitigation. MQC is a technique that has been adapted from solid state nuclear magnetic resonance [42] to many-body correlations and quantum information scrambling in trapped ions [43,44]. More recently, it has been used to detect 18-qubit GME in the IBM Q System One quantum device [13]. MQC works by utilising the amplified phase accumulated when phase rotating individual qubits. To help properly understand MQC and how it can be applied to calculate the fidelity, we will go through, in detail, steps to express the fidelity more concretely. The following working has been adapted from Wei et al. [13] and Gärttner et al. [44]. We begin by writing down the following expression for the density matrix noting that ρ i,j = 0 when i, j < 0 or i, j > N . The coherence sectors ρ q account for the coherence between states |m and |m = m − q such that A m − A m = q. Note that ρ being Hermitian implies that ρ † q = ρ −q , and each ρ q being orthogonal to one another implies Tr(ρ q ρ p ) = δ q−p Tr(ρ q ρ −q ). Although the states ρ q cannot be directly observed, we will show that the multiple quantum coherence amplitude defined as I q := Tr(ρ q ρ −q ) = Tr(ρ −q ρ q ) = I −q , can. Since the density matrix of an ideal GHZ state only has non-zero values on each corner, the fidelity can be expanded as follows with population P = ρ 00...0,00...0 + ρ 11...1,11...1 and coherence C = 2 √ I N . Now that we have expressed the coherence with respect to the quantum coherence amplitude I N , we will show how I N can be measured. It can be calculated from the overlap signals S φ := Tr(ρ φ ρ), which is a directly measurable quantity where ρ φ := e −i φ 2 j σ j z ρe i φ 2 j σ j z is produced by rotating each qubit of the state ρ by φ about the Z-basis. For an ideal GHZ state, this is equivalent to adding a phase of N φ to the state, that is 1 √ 2 (|00 . . . 0 + e −iN φ |11 . . . 1 ), which has the overlap signal reduce to where the constant term corresponds to the diagonal elements of the density matrix and the cosine term corresponds to the off-diagonal corner elements. To see the behaviour of the overlap signal when more states are included, we can consider the case of general pure states and express ρ as (B28) When we substitute a 0 , a N = 1/ √ N and a i = 0 for i = 0, N , the expression reduces down to the ideal case for GHZ states shown in Equation B19. By including amplitudes of non-all-zero and non-all-one states, lower frequencies are introduced into the behaviour and the N frequency component is dampened.
The amplitude I N can be derived from S φ as follows where Eqs. B4 and B6 are used to help obtain Equation B30. Fourier transforming this result gives I q = N −1 | φ e iqφ S φ | where N is the number of angles φ within the summation. To ensure that the frequency N is detectable, measure S φ for φ = πj N +1 , where j = 0, 1, . . . , 2N + 1 which can detect up to frequency N + 1.