Optimisation of diamond quantum processors

Diamond quantum processors consisting of a nitrogen-vacancy (NV) centre and surrounding nuclear spins have been the key to significant advancements in room-temperature quantum computing, quantum sensing and microscopy. The optimisation of these processors is crucial for the development of large-scale diamond quantum computers and the next generation of enhanced quantum sensors and microscopes. Here, we present a full model of multi-qubit diamond quantum processors and develop a semi-analytical method for designing gate pulses. This method optimises gate speed and fidelity in the presence of random control errors and is readily compatible with feedback optimisation routines. We theoretically demonstrate infidelities approaching $\sim 10^{-6}$ for both single-qubit gates and a two-qubit CZ gate. Consequently, our method reduces the effects of control errors below that of the unavoidable decoherence that is intrinsic to the processors. Having developed this optimal control, we simulated the performance of a diamond quantum processor by computing quantum Fourier transforms. We find that the simulated diamond quantum processor is able to achieve fast operations with low error probability.


Introduction
arXiv:2002.00545v1 [quant-ph] 3 Feb 2020

Diamond is a promising architecture for quantum information processing [1,2,3,4,5,6] and quantum sensing/microscopy [7,8,9,10] at both cryogenic and room temperatures. Optimised diamond quantum processors are crucial building blocks for large-scale diamond quantum computers and the next generation of quantum sensors and microscopes that are enhanced by embedded quantum memories and signal processing [11,12,13]. To date, diamond quantum processors have been used to implement quantum error correction codes [3,4,14] and quantum algorithms [15,16], to detect metallo-protein molecules [17], and to simulate the helium hydride cation [5] and the topological phase transition of a quantum wire [6]. For technology applications, diamond quantum processors are distinguished from other quantum architectures by their ability to operate in ambient conditions and with relatively simple microwave, radio-frequency and off-resonant optical control systems [18]. The resulting improvements in complexity, robustness and cost make diamond one of the most flexible and widely applicable quantum technology platforms.
Diamond quantum processors consist of a nitrogen-vacancy (NV) centre with a local cluster of hyperfine-coupled nuclear spins. These coupled nuclear spins include the intrinsic N nuclear spin of the NV centre and isotopic $^{13}$C lattice impurities. Quantum computations are realised by using the electron spin of the NV centre as a quantum bus that initialises, mediates interactions between, and reads out the coupled nuclear spins, which act as the physical qubits. Scaling of the diamond quantum architecture requires a mechanism to couple multiple NV centres and their qubit clusters. Coupling of separate NV centres has been demonstrated at cryogenic temperatures using photons [19,20], and at room temperature using magnetic dipole coupling between proximate NV centres [21,2]. Spin chains [22] and coherent spin transport [23] have also been proposed as coupling mechanisms.
Previous work in optimising the quantum control of either the NV centre's electron spin or coupled nuclear spin qubits has already demonstrated excellent control fidelities, using techniques including dynamical decoupling [24,25,26,27,28,29,30,31] as well as numerical gate pulse-shaping techniques, such as the Chopped RAndom Basis (CRAB) quantum optimisation algorithm [32,33] and the GRadient Ascent Pulse Engineering (GRAPE) algorithm [34]. Experimental application of the CRAB algorithm to the NV electron spin has yielded ultra-fast single-qubit gates (i.e. beyond the rotating wave approximation) with fidelities of 0.95 ± 0.01 and 0.99 ± 0.016 for π/2 and π pulses, respectively [35]. The GRAPE algorithm has been used to demonstrate single electron spin operations with fidelity F ≈ 0.99 [2] and to generate entangled states of three nuclear spins with fidelities exceeding 85% [4]. Moreover, an average single-qubit gate fidelity of 0.999952 and a two-qubit gate fidelity of 0.992 have been reported using composite pulses and a modified GRAPE algorithm, respectively [36]. While some of the demonstrated gate fidelities are impressive, even with a gate fidelity of 0.999, effective surface code error correction is anticipated to require up to $10^4$ physical qubits for a fault-tolerant logical qubit to achieve logical error rates comparable to classical computers [37,38]. Therefore, there is a strong motivation to push for further reduction in gate errors.
There are two different optimisation problems to address to improve the performance of diamond quantum processors: improvement of initialisation/readout fidelities and improvement of gate fidelities and speeds. The initialisation/readout fidelities are optimised by selecting the $^{13}$C nuclear spin lattice sites that have the longest nuclear spin relaxation time during the projective single-shot optical readout process employed in diamond quantum processors [4]. Broadly speaking, the best lattice sites are those whose hyperfine field is well aligned with the NV centre's axis. As will be discussed later, this alignment is also important for achieving high gate fidelities. As such, there is some correlation between the choices made to optimise initialisation/readout fidelities and gate fidelities. Consequently, for simplicity, in the following we will not discuss the optimisation of initialisation/readout fidelities, but instead assume a particular selection of nuclear spin qubits with well-aligned hyperfine fields, and focus on the problem of optimising the gate fidelities and speeds.
Optimisation of the processor gate operations requires simultaneous maximisation of gate speed and minimisation of: (1) spurious effects of control fields on qubits other than the target qubit(s) (i.e. cross-talk), (2) the effects of random control field errors (i.e. those caused by fluctuations in amplitude, frequency and phase), and (3) the effects of decoherence. Additional practical requirements are that the gate design: (A) complies with the physical constraints of the control systems, (B) is readily incorporated into a feedback-based optimisation routine that uses measurements to optimise the actual physical processor (and not just models of the processor) and supports updating of the optimisation during operation to adjust for system drifts, and (C) is parameterised so that the degree of convergence to optimisation limits can be deterministically and systematically assessed (in order to support design decisions concerning the costs and benefits of further optimisation) and the dominant error modes can be diagnosed (to improve processor design). As GRAPE involves direct numerical solving of pulses using model systems, it does not readily achieve the practical requirements (B) and (C) and relies heavily on the accuracy of its models [39]. On the other hand, the nature of CRAB allows the integration of (B) but not (C) directly, owing to its inherent reliance on random numbers and the number of free parameters required to optimise the control field [33].
We propose a different approach to this optimisation problem. Our approach has two steps. The first step is to generate a complete semi-analytical basis of pulses that comply with (A) and minimise (1). The second step is to find the linear combination of these basis functions that minimises (2) for a given pulse length. This linear minimisation is fast and can include measurements of the processor and therefore readily complies with (B). Furthermore, since the basis is complete, the dimension of the non-trivial basis functions provides a clear parameter with which to analyse convergence to optimisation limits and to interpret different error modes, and thus also complies with (C). The principal strategy for minimising the effects of decoherence (3) is to maximise the gate speed, and thus minimise the time over which decoherence accumulates, whilst ensuring that the infidelity introduced by control errors is less than that introduced by decoherence. We have adopted this strategy rather than directly targeting the primary source of qubit decoherence (e.g. via dynamical decoupling) because the primary source is the NV centre's electron spin, whose strong interaction with the nuclear spins is required for the implementation of both one- and two-qubit gates. Indeed, owing to the strong interaction, the qubit coherence time $T_{2,n}$ is limited by the electron spin relaxation time $T_{1,e}$ [40,41], which is approximately 1.8 ms at room temperature [42]. Note that our discussion here, and the results of this paper, are in the context of room-temperature operation of diamond quantum processors. At cryogenic temperatures, weakly coupled nuclear spins can instead be employed as qubits and as a result, their decoherence is influenced by other mechanisms [43], at the cost of slower gate speeds.
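The second step of the approach can be illustrated with a minimal sketch. It assumes that, to second order in the noise, the averaged infidelity is a quadratic form $c^T Q c$ in the linear coefficients $c$, and that the target rotation fixes one linear constraint $a^T c = 1$. The matrix `Q`, the vector `a` and the function name `optimal_coefficients` are hypothetical stand-ins for illustration and are not the expressions derived in this paper.

```python
import numpy as np

def optimal_coefficients(Q, a):
    """Minimise c^T Q c subject to a^T c = 1 (Lagrange-multiplier solution)."""
    w = np.linalg.solve(Q, a)   # w = Q^{-1} a
    c = w / (a @ w)             # rescale so that a^T c = 1
    return c, float(c @ Q @ c)

rng = np.random.default_rng(0)
n_max = 6

# Hypothetical noise-sensitivity matrix (symmetric positive definite).
R = rng.normal(size=(n_max, n_max))
Q = R @ R.T + 1e-3 * np.eye(n_max)

# Hypothetical constraint: each basis pulse alone realises the target rotation.
a = np.ones(n_max)

costs = []
for m in range(1, n_max + 1):
    # Use only the first m basis functions.
    c, cost = optimal_coefficients(Q[:m, :m], a[:m])
    costs.append(cost)
    print(m, cost)
```

Because the problem with $m$ basis functions is a restriction of the problem with $m + 1$, the minimised cost in this sketch cannot increase as the basis grows, which is the property that makes the basis size a useful convergence parameter.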
In this paper, we report (i) a demonstration and analysis of our optimal design approach and (ii) simulation of an optimised diamond quantum processor. In section 2, we first discuss the operating principles of a diamond quantum processor before presenting a complete model of a processor, its gate operations and control system errors. Section 3 demonstrates the generation of gate basis functions, while section 4 demonstrates the optimisation of gate fidelities in the presence of control errors. In section 5, the effects of decoherence on a diamond quantum computer are investigated via master equation simulations and in section 6, we simulate the performance of 3- and 5-qubit quantum Fourier transforms (QFTs) on a diamond quantum processor. Fidelities of QFTs were chosen as a simple performance metric because QFTs are the foundation of many quantum algorithms. Thus, the fidelities of QFTs are basic indicators of the system's performance with more sophisticated algorithms.
2. Quantum control model of diamond quantum processors

Operating principles of diamond quantum processors
The NV centre is a point defect in diamond consisting of a substitutional nitrogen and an adjacent carbon vacancy [44]. Its electronic structure consists of a ground state spin triplet ($^3A_2$) and an excited state spin triplet ($^3E$) with two intermediate singlet levels ($^1E$ and $^1A_1$). There exist spin-selective non-radiative intersystem crossings between the triplet and singlet levels, which lead to initialisation of the electronic spin state upon optical excitation of the centre's $^3A_2 \rightarrow {}^3E$ transition as well as readout via the differing fluorescence intensities of the spin states (see Ref. [44] for further details). In addition to high-fidelity optical spin initialisation and readout, the NV centre also has the longest electron spin coherence time of any solid state spin at room temperature ($T_2 \approx 1.8$ ms) [42].
Each NV centre is coupled to a register of one or more nuclear spins, which we use as qubits [3,4]. The quantum register consists of the NV centre's intrinsic nitrogen nuclear spin and nearby $^{13}$C nuclear spins. Hyperfine coupling between the NV electron spin and the nuclear spins results in a splitting of the electronic and nuclear energy levels. This splitting depends on the particular hyperfine coupling strength between each nucleus and the NV electron spin, and also on the respective electron and nuclear spin states [45,46] (see figure 1b). We choose a register with non-overlapping hyperfine couplings, allowing the use of frequency selectivity to individually address each nuclear spin qubit in the register.
Key requirements for universal quantum computation are the initialisation and readout of the qubits, as well as the ability to apply single- and two-qubit gate operations. In diamond quantum computing, each of these processes relies on high-fidelity quantum gates on the electron and nuclear spins. Initialisation and readout of a diamond quantum register are performed via a projective, single-shot readout of the nuclear spin qubits. This measurement scheme involves initialising the electron spin, entangling the nuclear spin qubits with the electron spin using a C$_n$NOT$_e$ gate and then reading out the electron spin [49].

Figure 1: (a) Conceptual design of a diamond quantum computer (adapted from [18,47,48]). At the device scale (top), the quantum computer contains a diamond chip with an array of quantum processing nodes. Each node is formed by magnetically coupling a surrounding cluster of $^{13}$C nuclear spin qubits to the NV centre. Optical initialisation and readout of these quantum processing nodes via their NV centres are performed using an optical system placed below the diamond chip. At the scale of a single node (middle), surface microwave structures are used to realise single- and two-qubit gate operations. Internode two-qubit gate operations are mediated by spin quantum buses, which are realised through chains of substitutional N defects. At the cluster scale (bottom), the NV centre consists of a substitutional N defect (blue) adjacent to a carbon vacancy (transparent). The nuclear spins of the N defect and cluster of nearby $^{13}$C atoms are depicted in blue and orange respectively, while the NV centre's electronic spin is depicted in red. (b) The hyperfine structure of the NV centre which arises from the interaction of two nearby $^{13}$C nuclear spin qubits.

Optical initialisation and readout of the NV centre's electron spin are realised via a combination of spin-conserving optical transitions and spin-selective radiationless decay [44]. Using microwave pulses, this capability can be extended to the nuclear spin qubits by selectively swapping the electronic and nuclear spin states [4]. The computational and auxiliary subspaces are defined to be the $|{-1}\rangle$ and $|0\rangle$ electronic spin projections, respectively. Single-qubit gate operations are realised in the computational subspace using spectrally-selective microwave pulses. An entangling conditional-z (CZ) two-qubit gate operation is realised via selective 2π microwave pulses that involve, but do not occupy, the auxiliary subspace [4].

Single-qubit gate operations are realised using radiofrequency (RF) pulses. These pulses correspond to the $R_x$ and $R_y$ gates, which are rotations about the x and y axes, respectively. Other single-qubit gates can be constructed from combinations of these rotations. The intrinsic properties of the NV-nuclear spin system allow direct application of a CZ gate via microwave (MW) pulses. This entangling CZ gate is achieved by performing a selective 2π pulse conditional on the nuclear spin register being in a particular state [4]. The CZ gate can be combined with single-qubit gates to realise any other two-qubit gate. The splitting in the $^3A_2$ triplet ground state results in two types of subspaces, which we identify as the computational subspace and the auxiliary subspace (figure 1b). The natural computational subspace is either of the $m_s = \pm 1$ states, as they have non-zero hyperfine interactions, thus allowing the nuclear spin qubits to be individually addressed through frequency selectivity [50]. Whilst the choice of either $m_s = \pm 1$ state as the computational subspace is arbitrary, the $m_s = -1$ state is more often selected as it requires lower microwave frequencies for qubit gate operations. Single-qubit gates are realised in the computational subspace while a two-qubit CZ gate utilises, but does not occupy, the auxiliary subspace [4].
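As a minimal sketch of how other gates follow from this native gate set, the example below checks numerically that a Hadamard gate equals $R_x(\pi) R_y(\pi/2)$ up to a global phase, and that a CNOT gate is obtained from a CZ gate conjugated by Hadamards on the target qubit. The convention $R_k(\theta) = \exp(-i\theta\sigma_k/2)$ and the helper names are assumptions made for illustration; the leftmost tensor factor is taken as the first (control) qubit.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def rx(theta):
    """Rotation by theta about the x axis, exp(-i theta X / 2)."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * X

def ry(theta):
    """Rotation by theta about the y axis, exp(-i theta Y / 2)."""
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * Y

def equal_up_to_phase(U, V):
    """True if the unitaries satisfy U = exp(i phi) V for some global phase."""
    overlap = np.trace(V.conj().T @ U)
    return bool(np.isclose(abs(overlap), U.shape[0]))

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
CZ = np.diag([1, 1, 1, -1]).astype(complex)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

# Hadamard from the two native rotations (apply R_y first, then R_x).
hadamard_from_rotations = rx(np.pi) @ ry(np.pi / 2)

# CNOT (first qubit controls, second is target) from CZ and Hadamards.
cnot_from_cz = np.kron(I2, H) @ CZ @ np.kron(I2, H)

print(equal_up_to_phase(hadamard_from_rotations, H))
print(np.allclose(cnot_from_cz, CNOT))
```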

Model Hamiltonian
The Hamiltonian $H_I$ of the nuclear spins coupled to the NV centre is expressed in terms of the following quantities: $\gamma_i$ is the gyromagnetic ratio of the $i$th nucleus, $B_0$ is the background static magnetic field aligned with the NV axis, $B_1(t)$ is the applied radio-frequency field, $A_i$ is the hyperfine tensor of the $i$th nucleus, $S$ is the dimensionless electron spin operator and $I_i$ is the dimensionless nuclear spin operator of the $i$th nucleus. For this model, we apply the secular approximation, as a very strong magnetic field is applied along the z-axis during the operation of this quantum computer. This simplifies the nuclear spin Hamiltonian in the computational subspace. We simplify the expression further by diagonalising the nuclear spin Hamiltonian in the computational subspace via a rotation of the spin operators through the angles defined by their hyperfine interactions. Assuming that only nuclei with hyperfine fields nearly aligned with the NV axis are chosen, we make small-angle approximations and undo a rotation about the z-axis, which yields a Hamiltonian written in terms of $\omega_i$, the transition frequency of the $i$th nucleus. We then transform the Hamiltonian into the interaction picture, which gives the computational-subspace Hamiltonian used for single-qubit gate operations.

The auxiliary subspace is involved in performing an entangling CZ gate, which is enabled via the NV electron spin. In defining the effective Hamiltonian for two-qubit gate operations, we ignore the interactions with the $m_s = +1$ state of the electron spin and the direct interaction between the nuclear spins and the microwave field $B_1(t)$. This is possible because the microwaves are far detuned from these transitions.
As with the Hamiltonian for single-qubit gate operations, we transform $H$ into the interaction picture using a transformation operator $T$. In the transformed Hamiltonian for two-qubit gate operations, 0 and 1 denote the $m_I = -1/2$ and $+1/2$ nuclear spin projections, respectively. We use the notation in which the leftmost entry of a tensor product corresponds to the first qubit, i.e. $|q_1, q_2, \ldots\rangle$.

Control Pulses and Gate Operations
Focusing on single-qubit gate operations within the computational subspace, the applied radio-frequency field $B_1(t)$ can be parametrised as a linear combination of oscillating components. We can therefore write the evolution operator for the $i$th nucleus in terms of the gate time $\tau$ and the parameters $X_i$ and $Y_i$, which parametrise the rotations about the x and y axes, respectively, and thereby realise the gate operation. If the $j$th nucleus is the intended target, then the operations on all other nuclei ($i \neq j$) are simply identity operations. This gives separate conditions for the non-target nuclei ($i \neq j$) and for the target nucleus ($i = j$). We impose the restriction that $\omega = \omega_j$, where $\omega_j$ is the transition frequency of the targeted qubit. Using these gate parametrisations, we introduce inverse Fourier transforms for the functions $a(t)$ and $b(t)$. As the signal is finite in the time domain, we can enforce that $a(t)$ and $b(t)$ are zero outside $t \in [-\tau/2, \tau/2]$. This enables us to change the limits of the time integral to $\pm\infty$ and pass the time integral through the frequency integral. We also use an identity for further simplification. For $a(t)$ and $b(t)$ to be real functions with well-defined phases, we enforce additional conditions. Using these conditions, we arrive at expressions for the gate parameters in which $X_T$ and $Y_T$ are the intended angles of rotation about the x-axis and y-axis, respectively. At this point, there is no loss of generality in $a(t)$ and $b(t)$ because we have not yet applied the rotating wave approximation, and they can describe any control signal.

Statistical Model of Gate Errors
In reality, the control fields are noisy and the frequencies of the qubits fluctuate slowly between computational shots. Thus, the real control field is written with three additional free parameters, $\epsilon$, $\phi$ and $\delta$, which represent the amplitude, phase and frequency noise, respectively. We define a target evolution operator for a single-qubit gate on the $j$th nucleus within an $N$-qubit cluster, together with the actual evolution operator. The analytical expression for the infidelity compares an ideal gate to an experimental gate [51]. It is used because it requires minimal computation compared to other figures of merit, and is equivalent to the usual performance function implemented in the GRAPE algorithm [34]. We then expand the infidelity expression and keep only the terms up to second order. The control signal has the general form of a linear combination of basis functions with coefficients $c_n$ and $d_n$. As such, the set of linear coefficients $c_n$ and $d_n$ that minimise the infidelity caused by the amplitude, phase and frequency noise can be solved for. We assume that the noise in the qubit frequencies $\delta$, the fractional error $\epsilon$ of the pulse amplitudes (which scale as $1 + \epsilon$) and the phase noise $\phi$ are described by Gaussian distributions centred at zero with standard deviations $\sigma_\delta$, $\sigma_\epsilon$ and $\sigma_\phi$, respectively. Under this assumption, an analytical expression for the average infidelity can be written down. The minimum average gate infidelity is found when the minimisation conditions are satisfied for each function in the expansion.
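The Gaussian averaging described above can be illustrated with a minimal sketch for the simplest case: an x rotation whose angle is scaled by $1 + \epsilon$, with no frequency or phase noise. It assumes the infidelity measure $1 - |\mathrm{Tr}(U_T^\dagger U)|^2/d^2$ with $d = 2$, which is one common form of the GRAPE performance function and is not necessarily the exact expression used in this paper. The function names are hypothetical.

```python
import math
import random

def x_gate_infidelity(eps, theta=math.pi):
    """Infidelity 1 - |Tr(U_T^dagger U)|^2 / 4 for an x rotation by theta * (1 + eps)."""
    # Both rotations share the same axis, so Tr(U_T^dagger U) = 2 cos(theta * eps / 2).
    return math.sin(theta * eps / 2) ** 2

def average_infidelity(sigma, samples=200_000, seed=1):
    """Monte Carlo average over a zero-mean Gaussian fractional amplitude error."""
    rng = random.Random(seed)
    total = sum(x_gate_infidelity(rng.gauss(0.0, sigma)) for _ in range(samples))
    return total / samples

sigma_eps = 1e-3  # fractional amplitude noise, as quoted for sigma_epsilon

mc = average_infidelity(sigma_eps)

# Closed form: E[sin^2(pi eps / 2)] = (1 - exp(-(pi sigma)^2 / 2)) / 2.
exact = -0.5 * math.expm1(-((math.pi * sigma_eps) ** 2) / 2)

print(mc, exact)
```

For small $\sigma_\epsilon$ the closed form reduces to $\pi^2 \sigma_\epsilon^2 / 4$, about $2.5 \times 10^{-6}$ for $\sigma_\epsilon = 10^{-3}$, so in this toy case amplitude noise alone already sets an error scale of order $10^{-6}$ for a π rotation.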

Generation of Basis Functions
In this paper, we use frequency-shifted sinc functions as an ansatz for our control pulses in the frequency domain. Sinc functions were chosen as they represent pulses of finite duration in the time domain. As previously demonstrated in equations 2.26a and 2.26b, the key property is the amplitude at very specific frequencies ($\omega_i$, $\omega_j$). The interference of frequency-shifted sinc functions allows us to cancel the pulse amplitude at certain frequencies whilst amplifying it at others. In the parametrisation of $a(\omega)$ and $b(\omega)$, $f^{(n)}$ and $g^{(n)}$ are the pulse amplitudes in the frequency domain, $\mu^{(n)}$ and $\nu^{(n)}$ are the frequency shifts and $(n + 1)$ is the total number of basis functions used in the optimisation procedure.
If the frequency shifts for the $n$th solution are defined as integer multiples of $2\pi/\tau$, then the above simply becomes a Fourier cosine series, which is known to be a complete basis of even functions. However, as seen in equations 2.19a and 2.19b, the $a^{(n)}(0)$ and $b^{(n)}(0)$ terms would then always be 0 for $n \geq 1$. Hence, without any contributions from the $a^{(n)}(0)$ and $b^{(n)}(0)$ terms, generating an optimal pulse from only the $a^{(n)}(2\omega_j)$ and $b^{(n)}(2\omega_j)$ terms would require extremely large amplitudes.
One possible method to overcome this complication is to use Kadec's 1/4 theorem, whereby an additional small frequency shift is introduced to the Fourier series whilst retaining its completeness. It has been shown that when the additional frequency shift is bounded by a maximum value of $0.25\,(2\pi/\tau)$, the shifted functions still generate a complete set of sinc basis functions [52]. In this paper, we have chosen the upper bound to be $0.2\,(2\pi/\tau)$ as it enables us to resolve better solutions for the pulse amplitudes (see figure 2).

Figure 2: Infidelity against the pulse amplitude $f^{(1)}$ for a π rotation about the x-axis with $\tau = 1$ µs and $n = 1$. This is the first step of the optimisation procedure, where we generate optimal solutions for the pulse amplitudes in the absence of control errors. Due to the experimental hardware constraints, the maximum pulse amplitude in the time domain is set to be approximately 25 Mrad/s. Consequently, the corresponding search range for $f^{(n)}$ and $g^{(n)}$ is limited to $f^{(n)}, g^{(n)} \in [-5, 5]$, as the amplitude in the time domain has an additional factor of $2\sqrt{2\pi}$ from the inverse Fourier transform of the basis functions. An additional shift of $0.2\,(2\pi)$ allows us to resolve better solutions with lower infidelity compared to no additional shift and a $0.1\,(2\pi)$ shift within a bounded search range for the amplitudes. See text for further discussion of the optimisation procedures.
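The effect of the additional frequency shift can be checked with a minimal sketch. It assumes that each basis function has the spectrum $\mathrm{sinc}[(\omega - \mu)\tau/2]$ with $\mu = (n + s)\,2\pi/\tau$, where $s$ is the additional shift in units of $2\pi/\tau$. The normalisation, the exact parametrisation and the function names are assumptions made for illustration.

```python
import math

def sinc(x):
    """Unnormalised sinc, sin(x)/x, with the removable singularity at x = 0."""
    return 1.0 if x == 0 else math.sin(x) / x

def basis_amplitude(omega, n, tau, shift=0.0):
    """Spectrum at omega of a length-tau pulse offset by (n + shift) * 2*pi/tau."""
    mu = (n + shift) * 2 * math.pi / tau
    return sinc((omega - mu) * tau / 2)

tau = 1.0  # gate time in microseconds

rows = []
for n in range(1, 4):
    unshifted = basis_amplitude(0.0, n, tau)            # Fourier-series frequencies
    shifted = basis_amplitude(0.0, n, tau, shift=0.2)   # additional 0.2 * (2*pi/tau)
    rows.append((n, unshifted, shifted))
    print(n, unshifted, shifted)
```

In this sketch the unshifted functions with $n \geq 1$ vanish at zero frequency (up to rounding), whereas the shifted ones retain a finite amplitude there, which is why the shifted basis can contribute to the resonant term without requiring very large coefficients.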
The frequency shifts are then redefined to include this additional shift. As a result, there are many solutions in $a^{(n)}(\omega)$ and $b^{(n)}(\omega)$ that we can consider, and these solutions form a linear basis for the construction of optimal pulse functions. These optimal pulse functions are found by determining the linear coefficients that minimise the effects of the pulse errors on average. As an example, consider a two-qubit system which consists of $^{15}$N and $^{13}$C nuclear spins. Their respective hyperfine interactions are given by $A_N \approx 2\pi \times 3$ MHz [45] and $A_C \approx 2\pi \times 0.413$ MHz [4]. The background static magnetic field is chosen to be $B_0 = 0.62$ T [4]. Using these parameters, we demonstrate the optimisation procedure for an X gate targeted at the $^{13}$C nuclear spin ($X_2$). Since we are performing rotations only about the x-axis (y-axis), we assume there are no mixed signals in the pulse and thus no contributions from the $b(\omega)$ ($a(\omega)$) components. Hence, the equations satisfied by the solutions involve the transition frequencies $\omega_N = A_N + \gamma_N B_0$ and $\omega_C = A_C + \gamma_C B_0$. The first step of the optimisation routine is to generate a set of basis functions by minimising the infidelity in the absence of control errors with respect to the pulse amplitudes $f^{(n)}$ and $g^{(n)}$ in the frequency domain, as described by equation 2.24. This step minimises crosstalk between the qubits. The search range for the pulse amplitudes is constrained by the design of our MW/RF system, and this corresponds to the shortest gate time that we can perform for our qubit gate operations. In the time domain, the pulse amplitude $a(t)$ is determined by $\gamma_i B_1(t)$, where $\gamma_i$ is the gyromagnetic ratio of the nuclear spin qubits and $B_1(t)$ is the maximum amplitude of the oscillating MW/RF magnetic field. Using the typical values of an experiment, the pulse amplitude $a(t)$ is thus limited to a maximum value of approximately 25 Mrad/s.
The corresponding search range for $f^{(n)}$ and $g^{(n)}$ is then limited to $f^{(n)}, g^{(n)} \in [-5, 5]$, as the amplitude in the time domain has an additional factor of $2\sqrt{2\pi}$ from the inverse Fourier transform of the sinc basis functions. Similarly, the set of basis functions for a two-qubit CZ gate is generated using the same procedure. In this case, the pulse amplitude is limited to approximately 80 Mrad/s. The search range for $f^{(n)}$ is bounded to the region $f^{(n)} \in [0.5, 15]$, as the intrinsic infidelity expression used to describe two-qubit gate operations (equation B.1) is symmetrical. Starting from $f^{(n)} = 0$, the optimisation becomes stuck in a local minimum, and $f^{(n)} = 0$ clearly corresponds to no physical pulse. Thus, in order to find sensible solutions, we shifted the initial search boundary by 0.5 to steer the optimisation towards another local minimum. The equations that are satisfied by the solutions are given by equations A.9, A.10, A.11 and A.12.
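The arithmetic behind these search ranges can be reproduced with a short sketch. It reads the quoted factor as $2\sqrt{2\pi} \approx 5.01$ (an assumption about the garbled notation, supported by the fact that $25 / 5.01 \approx 5$) and simply divides the time-domain amplitude limits by it. The variable names are hypothetical.

```python
import math

# Factor relating frequency-domain amplitudes to time-domain amplitudes,
# read here as 2 * sqrt(2 * pi).
ft_factor = 2 * math.sqrt(2 * math.pi)

# Maximum time-domain pulse amplitudes quoted in the text, in Mrad/s.
limits = {"single-qubit": 25.0, "CZ": 80.0}

bounds = {}
for gate, max_amplitude in limits.items():
    bounds[gate] = max_amplitude / ft_factor
    # Also print the base-10 logarithm of the time-domain limit.
    print(gate, round(bounds[gate], 2), round(math.log10(max_amplitude), 2))
```

Under this reading the single-qubit limit maps to a frequency-domain bound of about 5, matching the quoted range, while the CZ limit maps to about 16, so the quoted upper bound of 15 sits slightly inside the hardware limit.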
As seen in table C1, the pulse amplitudes depend on the angle of rotation and the gate time. This is consistent with the formulation of our sinc basis functions, where larger amplitudes are expected to generate larger rotation angles for a given frequency shift and gate time. We are also able to resolve solutions with lower infidelity for smaller rotation angles at shorter gate times, as the pulse amplitudes are smaller. We observed a trend, shown in tables C1 and C2, where at short gate times, higher-order basis functions have solutions which correspond to the lower or upper bound of the allowed values. These pulse amplitudes minimise the infidelities within the search range imposed on them by the physical constraints. While some of the pulse amplitudes correspond to the maximum or minimum allowed values, the critical factor in this optimisation routine is the linear combination of these generated basis functions, as discussed in the following section.

Optimal Gates for Non-Ideal Control System
Using the computed optimal solutions, the infidelities of an X gate, a Hadamard gate and a CZ gate are calculated. We approximated the standard deviations of the Gaussian phase, amplitude and frequency noise to be $\sigma_\phi = 10^{-3}/2\pi$ MHz, $\sigma_\epsilon = 10^{-3}$, $\sigma_\delta = 2\pi \times 10^{-3}$ MHz for nuclear spins and $\sigma_\delta = 2\pi \times 27.5 \times 10^{-3}$ MHz for the electron spin [42]. Phase noise is excluded from the calculation of average infidelities for a CZ gate because we are performing a 2π pulse, for which the first-order effects of phase errors are negligible. As shown in figure 3, the average infidelities for an X gate, a π/2 rotation about the y-axis and a Hadamard gate fluctuate with a single basis function, since the function parameters depend on the local infidelity landscape during the initial basis function computation. However, the optimised linear combinations of two or more basis functions yield infidelities of approximately $10^{-6}$ for an X gate and a Hadamard gate. Infidelities as low as $10^{-7}$ can be achieved for a π/2 rotation about the y-axis. The infidelities decrease monotonically with an increasing number of basis functions, and only 3 basis functions are required for the infidelities to converge to the optimisation limits. This demonstrates the capability of this optimal control method and allows us to systematically assess the degree of convergence to the optimisation limits. For a system with two qubits, using more than 3 basis functions does not significantly lower the infidelities. In general, we expect to achieve infidelities of this order ($10^{-6}$) for any single-qubit operation.
Given that this method can achieve low infidelities, the main objective is to shorten the gate time without significantly affecting the infidelity, in order to minimise the effects of decoherence. We analyse the overall amplitudes of the respective linear combinations of the sinc basis functions in order to find the minimum gate time for which the pulse amplitude can still be practically generated. Using the estimated constraints described in the previous section, the maximum threshold for the amplitude is set to be $\log_{10}(25) \approx 1.40$. Figure 4 depicts the overall amplitudes of the basis functions in the linear combinations for different basis sizes and gate times. At some gate times, the linear combinations have larger amplitudes when more basis functions are used. Since the optimal control method optimises the linear combinations of the basis functions such that the infidelities decrease monotonically, a large amplitude in the initial basis function may result in larger amplitudes for the subsequent basis functions when the basis size is increased. Based on the generated data, we determined that a 0.5 µs X gate with an infidelity of $\sim 10^{-6}$ can be achieved with just two basis functions (figure 4b). Despite being able to perform a 0.5 µs X gate, we need to take into account the various gate times required for different types of rotations, where slower gates are required for larger rotation angles. Thus, on average, a conservative estimate for the fastest single-qubit gate that we are able to perform with $\sim 10^{-6}$ infidelity is approximately 1 µs. As in the single-qubit gate case, the maximum threshold for the amplitude of a two-qubit CZ gate is set to be $\log_{10}(80) \approx 1.9$. From figure 5, on average, gate infidelities of $10^{-4}$ to $10^{-6}$ can be achieved with 4 or more basis functions for gate times greater than 1.5 µs. The calculated infidelities decrease monotonically with increasing basis size.
However, more basis functions are required to fully demonstrate the convergence of these infidelities to the optimisation limits. Based on the optimisations shown in figures 5 and 6, the fastest two-qubit CZ gate that we can perform with an infidelity of $\sim 10^{-6}$ takes 1.5 µs and requires a linear combination of 6 basis functions.

Figure 5: The performance of a two-qubit CZ gate in the presence of frequency and amplitude noise. The infidelities are plotted as a function of the total number of basis functions $M$ and the gate time $\tau$. On average, gate infidelities of $10^{-4}$ to $10^{-6}$ can be achieved by using linear combinations of at least 4 basis functions for gate times greater than 1.5 µs. While the infidelities decrease monotonically with an increasing number of basis functions used in the optimisation, more basis functions are required to demonstrate the convergence of these infidelities to their respective optimisation limits.
Figure 6: Plots of the overall amplitudes, $A$, which generate the minimum averaged infidelity of a CZ gate, ranging from 1 basis function (a) to 6 basis functions (f). As for the X gate, at certain gate times the overall amplitudes are much larger when the number of basis functions is increased, and they depend on the initial solutions generated in the minimisation of the intrinsic infidelity expression (equation B.1).

Gate Performances in Presence of Decoherence
As mentioned in section 1, the nuclear spin qubits undergo pure dephasing due to the relaxation of the electron spin. Thus, the coherence time of the qubits is bounded by the relaxation time of the electron spin. To model this, we use the master equation in Lindblad form [53,54], in which the sum over $m$ runs over the decoherence mechanisms acting on the individual nuclear spins. Instead of solving the differential equations directly, we express the Lindblad equation in terms of a vectorized density matrix [55], where $G$ denotes the decoherent part of the Lindblad equation. In these expressions, the overline denotes the complex conjugate, $\dagger$ is the adjoint, $H$ is the Hamiltonian of the qubit system and $I$ is the $2^N \times 2^N$ identity matrix, where $N$ is the number of qubits in the system. A corresponding expression holds for a time-dependent Hamiltonian. The Lindblad operator $L$ describes the dephasing of the nuclear spin qubits induced by a random electron spin flip; in it, the nuclear spin $T_2$ is set by the relaxation time $T_1$ of the electron spin (1.8 ms) and $\sigma_{z,m}$ is the Pauli $z$ matrix of the $m$-th nuclear spin.
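The vectorized form can be made concrete with a short numerical sketch. The Python code below assumes a column-stacking convention for the vectorized density matrix, in which $A\rho B$ maps to $(B^{T} \otimes A)$ acting on the vector, and assumes the normalisation $L = \sqrt{1/(2T_2)}\,\sigma_z$ so that coherences decay as $e^{-t/T_2}$. Both choices, along with the function names `liouvillian` and `evolve`, are illustrative assumptions and may differ from the exact expressions used in this work.

```python
import numpy as np

SIGMA_Z = np.array([[1, 0], [0, -1]], dtype=complex)


def liouvillian(hamiltonian, lindblad_ops):
    """Return S with d vec(rho)/dt = S vec(rho), using column-stacked vec(rho)."""
    dim = hamiltonian.shape[0]
    eye = np.eye(dim, dtype=complex)
    # Coherent part: -i (I x H - conj(H) x I), valid because H is Hermitian.
    gen = -1j * (np.kron(eye, hamiltonian) - np.kron(hamiltonian.conj(), eye))
    # Decoherent part G, summed over the Lindblad operators.
    for op in lindblad_ops:
        op_dag_op = op.conj().T @ op
        gen = gen + (np.kron(op.conj(), op)
                     - 0.5 * np.kron(eye, op_dag_op)
                     - 0.5 * np.kron(op_dag_op.conj(), eye))
    return gen


def evolve(rho0, hamiltonian, lindblad_ops, t):
    """Propagate rho0 for time t under a time-independent generator.

    Uses an eigendecomposition, so it assumes the generator is diagonalisable.
    """
    evals, evecs = np.linalg.eig(liouvillian(hamiltonian, lindblad_ops))
    propagator = evecs @ np.diag(np.exp(evals * t)) @ np.linalg.inv(evecs)
    vec_t = propagator @ rho0.reshape(-1, order="F")
    return vec_t.reshape(rho0.shape, order="F")


T2 = 1.8e-3                                   # s, set by the electron T1 in the text
dephasing = np.sqrt(1 / (2 * T2)) * SIGMA_Z   # assumed normalisation of L
rho_plus = 0.5 * np.ones((2, 2), dtype=complex)
rho_t = evolve(rho_plus, np.zeros((2, 2), dtype=complex), [dephasing], 1.0e-6)
print(abs(rho_t[0, 1]))  # coherence remaining after 1 us of idle dephasing
```

For a time-dependent Hamiltonian, one simple option is to apply `evolve` over short steps during which the Hamiltonian is treated as constant.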
To assess the effects of decoherence due to dephasing, we consider the example of a perfect X gate. The aim is to solve for $\rho(t)$ and calculate its state fidelity with respect to $\rho_I$, where $\rho_I$ is the ideal density matrix without the effects of decoherence and $\rho$ is the simulated density matrix of the system. As shown in figure 7, the errors in both cases have magnitudes of approximately $10^{-3} \sim 10^{-4}$. These errors are much larger than those caused by frequency, phase and amplitude noise, as demonstrated in section 4. The effects of decoherence also increase with longer gate times. Because the electron-nuclear spin coupling is required for selective operations and cannot be removed, we conclude that, for a given electron $T_{1,e}$, the only way to improve the gate fidelity is to make the gate operations faster.
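A back-of-the-envelope estimate shows why errors of this order are expected. The sketch below is not the X-gate simulation of figure 7: it assumes an idle qubit prepared in the equal superposition $|+\rangle$, pure dephasing with coherences decaying as $e^{-t/T_2}$, and the pure-state fidelity $F = \langle + | \rho | + \rangle$, which gives an infidelity of $\tfrac{1}{2}(1 - e^{-\tau/T_2}) \approx \tau / (2T_2)$.

```python
import math

T2 = 1.8e-3  # s, nuclear-spin T2 taken equal to the electron T1 quoted in the text


def dephasing_infidelity(duration, t2=T2):
    """Infidelity of |+> after pure dephasing, using F = <+|rho|+>."""
    return 0.5 * (1.0 - math.exp(-duration / t2))


for tau in (0.5e-6, 1.0e-6, 1.5e-6):
    print(f"{tau * 1e6:.1f} us -> {dephasing_infidelity(tau):.2e}")
```

Under these assumptions the infidelity grows almost linearly with the gate time, which is consistent with the conclusion that faster gates are the only route to higher fidelity at fixed $T_{1,e}$.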

Simulation of QFT On A Diamond Quantum Processor
In this section, we set a benchmark for the optimal performance of a diamond quantum processor by simulating quantum algorithms. The benchmark will provide us with insights into the limits of the processor, which can be used to aid the design and comparison of the device in the near future.
Building on the results from section 5, since the errors due to decoherence are much larger than the gate errors, we can simulate quantum algorithms on the diamond quantum computer without considering the gate errors. The key metrics for the simulation are the error probability and the total computational time (ignoring the initial loading time of the computational control systems). These two metrics are chosen because computational time is the primary resource and the error rate is the key quality measure of a quantum computer. We chose to compute quantum Fourier transforms (QFT) because the QFT is widely used in quantum algorithms. A further motivation is its similarity to the algorithms used for enhanced quantum sensing with a register of nuclear spin qubits. The initial state was chosen such that the output state $|001\rangle$ (3-qubit QFT) or $|00001\rangle$ (5-qubit QFT) is the only outcome, with a probability of 1 [56].
In the circuit, $U(\theta)$ denotes the phase gate with phase $\theta$. Two dots joined by a line denote a controlled-phase gate, with the respective phase written in the circuit. This notation is possible because the matrix representation of the operation is the same regardless of which qubit is the control and which is the target, i.e. $C_1\mathrm{PHASE}_2 = C_2\mathrm{PHASE}_1$. For a 3-qubit quantum Fourier transform, gate operations are performed on the first three qubits only (starting from the top).
The simulations of the QFT on the diamond quantum computer are performed using equation 5.5 with an electron relaxation time of $T_{1,e} \approx 1.8$ ms. The simulations are solved iteratively for multiples of the fastest single- and two-qubit gate times, considering only the effects of decoherence. After decomposition into rotations about the x and y axes and CZ gates, the total numbers of pulses required for the 3-qubit (QFT3) and 5-qubit (QFT5) quantum Fourier transforms are 75 and 195, respectively (Appendix D).
The total computation time on a diamond quantum computer can be broken down into the shot time and the initialisation/readout time. The shot time is the total duration of the pulses required for an experiment. Assuming single-qubit gate times of 1 µs and a CZ gate time of 1.5 µs, the optimal pulse durations for QFT3 and QFT5 are approximately 79.5 µs and 208 µs, respectively. For the initialisation/readout (single-shot readout), we apply $M$ readout cycles per qubit. Thus, the total time for initialisation/readout is given by the product $n M t_c$, where $n$ is the number of qubits, $M$ is the number of readout cycles applied per qubit and $t_c$ is the time per cycle. The time per cycle is the sum of the time of the optical pulse required for readout ($t_{opt}$) and the time of the microwave pulse ($t_{mw}$) required to perform the CNOT gate for repetitive measurements. These two times are approximately 1 µs each, so $t_c \approx 2$ µs. $M$ is chosen to be 500, which is of the same order of magnitude as the numbers of repetitions that achieve an initialisation fidelity of 0.99 for a specific relative shift of the initialisation threshold [4]. The total time per shot, including single-shot readout, is then given by
$$T_{\mathrm{QFT3}} = 79.5 \times 10^{-6} + 3 \times 500 \times 2 \times 10^{-6} \approx 0.0031\ \mathrm{s} \qquad (6.2)$$
$$T_{\mathrm{QFT5}} = 208 \times 10^{-6} + 5 \times 500 \times 2 \times 10^{-6} \approx 0.0052\ \mathrm{s} \qquad (6.3)$$
Figure 9: (a) Single-shot simulated fidelity of the 3-qubit and 5-qubit quantum Fourier transforms. We assumed perfect initialisation/readout fidelity. (b) Simulated total computation time of QFT3 and QFT5, assuming optimal gate times of 1 µs and 1.5 µs for single-qubit gates and the CZ gate, respectively.
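The timing budget above can be checked with a few lines of Python. The sketch reproduces equations 6.2 and 6.3 and, as an inference that is not stated in the text, splits the quoted pulse totals into single-qubit and CZ pulses under the assumption that every pulse lasts either 1 µs or 1.5 µs. The function names `infer_gate_counts` and `total_time` are hypothetical.

```python
SINGLE_QUBIT_TIME = 1.0e-6  # s, single-qubit gate time
CZ_TIME = 1.5e-6            # s, CZ gate time
CYCLE_TIME = 2.0e-6         # s, t_opt + t_mw per readout cycle
READOUT_CYCLES = 500        # readout cycles per qubit


def infer_gate_counts(total_pulses, shot_time):
    """Split pulses into (single-qubit, CZ) counts that reproduce shot_time.

    Assumes every pulse is either a single-qubit pulse or a CZ pulse.
    """
    extra = shot_time - total_pulses * SINGLE_QUBIT_TIME
    cz = round(extra / (CZ_TIME - SINGLE_QUBIT_TIME))
    return total_pulses - cz, cz


def total_time(n_qubits, shot_time):
    """Shot time plus initialisation/readout time n * M * t_c."""
    return shot_time + n_qubits * READOUT_CYCLES * CYCLE_TIME


print(infer_gate_counts(75, 79.5e-6), total_time(3, 79.5e-6))
print(infer_gate_counts(195, 208e-6), total_time(5, 208e-6))
```

Under that assumption the quoted durations correspond to 9 CZ pulses for QFT3 and 26 for QFT5, and in both cases the readout cycles, not the gate pulses, dominate the time per shot.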
As depicted in figure 9a, the simulation of QFT3 achieves a higher fidelity ($\approx 0.962$) than that of QFT5 ($\approx 0.845$) when both are simulated using the optimal gate times of 1 µs for single-qubit gates and 1.5 µs for the CZ gate, because QFT3 has a smaller circuit size. Implementing fewer gates introduces fewer errors during the evolution of the quantum states, so an algorithm with a smaller circuit size achieves a greater output state fidelity. The larger decay constant in the fitted model of QFT5 indicates that it is important both to apply optimal control to the pulses, in order to obtain the fastest possible gates with the lowest infidelity, and to optimise the circuit size. This will give the best result when an experiment is performed on an actual diamond quantum computer. To simulate the performance of a diamond quantum computer over time, we first create a binomial distribution of probabilities with the respective optimal values for QFT3 and QFT5 simulated on a diamond quantum computer. We then simulate the probability distribution for a range of shots using the Markov chain Monte Carlo method, which allows us to approximate the probability distribution of the fidelities through random sampling. Owing to the short computation time per shot, the simulations of QFT3 and QFT5 on a diamond quantum computer converge to their respective optimal values of 0.94 and 0.84 in less than 10 s. These results can be used as a benchmark for comparison with other quantum computing architectures in the future.
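The convergence with the number of shots can be illustrated with a much simpler procedure than the Markov chain Monte Carlo simulation used here. The sketch below draws independent Bernoulli shots (plain Monte Carlo sampling, a deliberate simplification) and reports how many shots fit into a 10 s budget. It takes the single-shot fidelities of figure 9a as success probabilities and the per-shot times of equations 6.2 and 6.3; treating fidelity as a success probability is an assumption, and the function names are hypothetical.

```python
import math
import random


def shots_within(budget_seconds, time_per_shot):
    """Number of complete shots that fit in the time budget."""
    return int(budget_seconds // time_per_shot)


def estimate_success(probability, n_shots, seed=0):
    """Fraction of successful shots among n_shots independent Bernoulli trials."""
    rng = random.Random(seed)
    successes = sum(1 for _ in range(n_shots) if rng.random() < probability)
    return successes / n_shots


def standard_error(probability, n_shots):
    """Standard error of the estimated success fraction."""
    return math.sqrt(probability * (1.0 - probability) / n_shots)


for label, p, t_shot in (("QFT3", 0.962, 0.0031), ("QFT5", 0.845, 0.0052)):
    n = shots_within(10.0, t_shot)
    print(label, n, round(estimate_success(p, n), 3), round(standard_error(p, n), 4))
```

With a few thousand shots available in 10 s, the standard error of the estimate is below one percent for both circuits under these assumptions.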

Conclusion
In summary, we have presented a complete model of a diamond quantum processor, its gate operations and their implementations. We have developed a semi-analytical optimal control method which, in the absence of decoherence, theoretically produces the fastest gates with the highest fidelity to date. This optimal control method uses frequency-shifted sinc functions as an ansatz for the control pulses. We have demonstrated the method on an X gate, a Hadamard gate and a CZ gate. We find that the errors due to decoherence are much larger than the gate errors. Moreover, the simulated performance of a diamond quantum computer is promising: it can perform fast computations with low error probability. Our results will aid the design and development of diamond quantum computers and enhanced quantum sensors. A future extension is to implement a feedback control system as an optimal control method that tunes the pulse parameters based on the output of a physical diamond quantum computer.

Acknowledgments
We acknowledge support from the Australian Research Council (DP170103098 and DE170100169) as well as the Australian National University, University of Canberra and Charles Sturt University through the Discovery Translation Fund 2.0, managed by ANU Connect Ventures.
A swap gate can be constructed from three CNOT gates acting on a pair of qubits $a$ and $b$, where $a$ and $b$ denote any two qubits in the system. The phase gate and the controlled-phase gate are both parameterised by the intended phase $\theta$, and the construction used for the controlled-phase gate is one of several possible constructions. Note that the operations in these gate decompositions are applied from right to left. Using these decompositions, the total numbers of pulses required for QFT3 and QFT5 are 75 and 195, respectively.
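These decompositions can be checked numerically. The sketch below verifies the swap gate built from three CNOT gates and one standard construction of the controlled-phase gate from two CNOT gates and three phase gates, with each CNOT obtained from a CZ gate and Hadamard gates on the target. This particular controlled-phase construction is a textbook choice assumed for illustration and is not necessarily the one used in this work; the function names are hypothetical. Matrix products are read from right to left, and the basis ordering is $|ab\rangle$.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
CZ = np.diag([1, 1, 1, -1]).astype(complex)
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)


def phase(theta):
    """Single-qubit phase gate diag(1, exp(i theta))."""
    return np.diag([1, np.exp(1j * theta)])


# CNOT from CZ: Hadamard gates on the target qubit before and after the CZ gate.
CNOT_AB = np.kron(I2, H) @ CZ @ np.kron(I2, H)  # control a, target b
CNOT_BA = np.kron(H, I2) @ CZ @ np.kron(H, I2)  # control b, target a


def swap_from_cnots():
    """Swap gate as a product of three CNOT gates."""
    return CNOT_AB @ CNOT_BA @ CNOT_AB


def controlled_phase(theta):
    """Controlled-phase gate from two CNOT gates and three phase gates."""
    half = phase(theta / 2)
    return (np.kron(half, half) @ CNOT_AB
            @ np.kron(I2, phase(-theta / 2)) @ CNOT_AB)


print(np.allclose(swap_from_cnots(), SWAP))
print(np.allclose(controlled_phase(np.pi), CZ))
```

With this construction each controlled-phase gate uses two CZ pulses and each swap gate uses three, which would be consistent with the CZ counts implied by the quoted pulse durations, although the text does not confirm this.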