On the Quantum Performance Evaluation of Two Distributed Quantum Architectures

Distributed quantum applications impose requirements on the quality of the quantum states that they consume. When analyzing architecture implementations of quantum hardware, characterizing this quality forms an important factor in understanding their performance. Fundamental characteristics of quantum hardware lead to inherent tradeoffs between the quality of states and traditional performance metrics such as throughput. Furthermore, any real-world implementation of quantum hardware exhibits time-dependent noise that degrades the quality of quantum states over time. Here, we study the performance of two possible architectures for interfacing a quantum processor with a quantum network. The first corresponds to the current experimental state of the art in which the same device functions both as a processor and a network device. The second corresponds to a future architecture that separates these two functions over two distinct devices. We model these architectures as Markov chains and compare their quality of executing quantum operations and producing entangled quantum states as functions of their memory lifetimes, as well as the time that it takes to perform various operations within each architecture. As an illustrative example, we apply our analysis to architectures based on Nitrogen-Vacancy centers in diamond, where we find that for present-day device parameters one architecture is more suited to computation-heavy applications, and the other to network-heavy ones. Besides the detailed study of these architectures, a novel contribution of our work is a set of formulas that connect an understanding of waiting time distributions to the decay of quantum quality over time for the most common noise models employed in quantum technologies. This provides a valuable new tool for performance evaluation experts, and its applications extend beyond the two architectures studied in this work.


INTRODUCTION
Quantum communication promises to fundamentally enhance internet technology by enabling application capabilities that are impossible to attain classically. On the one hand, quantum communication could be used to link quantum processors at large distances, enabling quantum internet [29,59] applications such as secure communication [3,16], improved clock synchronization [30], or secure delegated quantum computation in the cloud [10]. On the other hand, quantum communication could connect quantum processors at short distances in order to link several smaller quantum processors together to form one more powerful quantum computing cluster [24].
To support distributed quantum applications, the architecture of a quantum network node should be capable of two key functions: first, it should enable local quantum computation, i.e., the execution of quantum gates and measurements, at each end node [59] in the network on which applications are run. Second, it should enable the generation of quantum entanglement between any two nodes in such a network. Entanglement is a special property of the state of two quantum bits (qubits), which cannot be simulated using any form of classical communication between the nodes. A typical quantum network application consists of both local quantum computations and the generation of entanglement, where different applications may have more demand for local quantum processing, or for entanglement generation. An example of an application that is computation-heavy is secure delegated quantum computation [10]. In contrast, quantum key distribution (QKD) [3,16,50] forms an example of an application that is network-heavy, i.e., it is dominated by entanglement generation and the only local operations are measurements. Balancing local and networked operations (entanglement generation) can also be important in the efforts to build a quantum repeater [2,8,9,23], i.e., a special quantum node that can eventually enable entanglement generation over arbitrarily long distances [18,35]. In this case, proposals for such repeaters employ both local quantum operations (e.g., to perform entanglement purification [4,14,15]), as well as entanglement generation with neighbouring repeater nodes.

Fig. 1. (a) Single-device architecture. (b) Double-device architecture. Two possible architectures for a quantum processor interfaced to a quantum network: in the first, the processor and the network device are the same device (single-device (SD) architecture). This device may have an internal logical or physical division into a computing or networking component. An example of a physical division is the use of a subset of its qubits for networking and others purely for computing. An example of a logical division is a scheduler switching between both functions but networking and computation are performed using the same qubits. In the second, two separate devices are used (double-device (DD) architecture). An application interacts with the system by making three types of requests: local quantum computations (on the computing component/device), network operations (entanglement generation), and movement (state transfer) of generated entanglement into the processor for further processing. The latter requires cooperation from both processing and network devices. For SD, a move could be achieved simply by transferring the state to another set of qubits on the same device. For DD, a move is much more complex, and can be realized, e.g., using entanglement generation between the processor and the network devices, followed by teleportation.

When analyzing the performance of quantum networks, one is typically interested in understanding traditional performance metrics, such as the throughput or latency of entanglement generation and local gate execution. Importantly, however, the performance analysis of quantum technologies also demands a characterization of the quality of the quantum execution, i.e., how noisy quantum states and operations are. Such a characterization is motivated both by long-term fundamental aspects of quantum applications, as well as the more short-term technological limitations of present-day quantum devices. In the classical world, a system is typically constructed in such a way that all errors are eliminated towards the application [13]. That is, an application sees essentially noise-free network transmissions and CPU operations. 
For many quantum network applications, however, noise-free transmission and quantum gate execution are not compulsory. A good example is QKD [3,16,50], where noise at the quantum level is dealt with using classical error correction, after measuring the quantum state, in a way that is specific to the application.
In quantum networked systems, fundamental tradeoffs exist between the quality of the quantum execution, and standard performance metrics such as throughput and latency. A key performance metric in a quantum network is the quality of entanglement (see Section 2.2.4) being generated between two remote network nodes, where one can choose to trade a higher throughput of entanglement generation, against a lower quality of the resulting entanglement and vice versa [13]. On a quantum processor, we furthermore want to understand the quality of a quantum gate's execution, and consequently the quality of the quantum program being executed. If quantum devices were perfect, a quantum gate could be executed with perfect quality, that is, the output is precisely as intended and no noise has occurred. In practice, however, technological limitations mean that gates on all present-day quantum processing platforms are noisy. Such noise can stem from inherent imperfections of the device (constant noise), as well as a time-dependent contribution that depends on the waiting time before the quantum state can undergo further processing. The latter form of noise is especially relevant when analyzing networked quantum processors, as is the focus of this work, where we frequently need to wait for a signal from the remote node before processing can continue. However, it also arises when trying to analyze any form of scheduling algorithm on an advanced quantum processor. In the quantum literature, the quality of a quantum state is measured by its fidelity, and the quality of executing a gate by its gate fidelity (see Section 2.2.4). Intuitively, the fidelity is a number in the interval [0, 1] that measures the closeness of the state (or gate) to a desired target implementation. The larger the fidelity, the closer we are to the target implementation, i.e., the higher the quality of the quantum state (or gate). 
In this work, we focus on these quantum performance measures. Specifically, we study gate fidelity in distributed quantum architectures, as well as the fidelity of entanglement generated by applications that run on them.
Given the need to perform both local quantum operations as well as network operations in order to realize distributed quantum applications, we here consider two different general architectures for interfacing a networked quantum processor to a quantum network (Figure 1). In the first, which we call the single-device (SD) architecture, the same device is used to perform both network operations as well as local quantum computation (Figure 1a). This is the case in all present-day implementations, such as networked quantum processors based on Nitrogen-Vacancy (NV) centers in diamond [43], or Ion Traps [31]. Abstractly, one can think of these as quantum processors that have two different types of qubits: communication qubits (networking component) with an optical interface for remote entanglement generation, and storage qubits which can only be used for local processing. Limits on experimental control typically prohibit the simultaneous execution of local (two-)qubit gates, and entangling operations. That is, while entanglement generation is in progress, local quantum processing is on hold, and vice versa. The time necessary for local gate execution only depends on the local processing speed. However, the time required for entanglement generation depends on the physical distance to the remote network node. Consequently, in a situation in which the remote node is at a distance, local processing may need to be suspended for a significant amount of time while entanglement generation is in progress.
In the second architecture, we hence consider a scenario in which the system is enhanced by the introduction of a dedicated network device solely used for the purpose of entanglement generation with remote network nodes (Figure 1b); we refer to this as the double-device (DD) architecture. In this architecture, the network device is linked internally to the processor. While it is not important how this is achieved physically for our general analysis, we provide an example architecture in which both devices are based on NV centers in diamond (Figure 2) where the internal interface is realized by teleporting the entanglement of the network device into the processor. This requires an additional step of producing entanglement between the network device and the processor to perform the teleportation transfer. Yet, since this entanglement needs to be produced only at very short (on-chip) distances, generation is fast. This means that remote entanglement generation via the external networking device and computations on the processor only need to be suspended for a short amount of time when the entanglement is transferred from the former to the latter. We remark that our analysis is fully general and could also be used to understand physical systems that divide the processor into networking and computing "zones", such as segmented ion traps [41,48], or ones that use two different physical systems, for example NV centers in diamond for the processor, but a simpler device such as a quantum memory based on atomic ensembles [49] as the network device.
When deliberating such architectural choices, several considerations are of concern: first, it is clear that performance may depend on whether we execute a computation-heavy, or a network-heavy application. Indeed, in the case of quantum key distribution, where there are no local quantum gates being executed and we simply measure the entanglement right away, the DD architecture may only introduce an unnecessary overhead in implementation. Second, we expect that the performance of both architectures depends on the inherent quality of the quantum devices used to realize them. One key concern is the ability of the quantum device to store quantum states during waiting times: a lower memory lifetime means that waiting times have a much larger impact on the quality of execution. Similarly, the quality of the interface between the processor and the network device is of concern in DD architectures, as it may reduce the quality of the entanglement being transferred. Finally, while the DD architecture may be of great intuitive appeal, it is much more cumbersome to realize experimentally since one additional device must be constructed. This raises a very practical question as to which option benefits application performance more: implementing the double-device architecture, or investing efforts into improving the quality of the components (e.g., to achieve higher memory lifetimes) in the single-device architecture.
Here, we make the following contributions in analyzing the two architectures:
□ We provide mathematical formulas for computing the gate and entanglement fidelities for standard quantum noise models. These formulas can be applied to any quantum performance analysis problem, where one would like to understand how a waiting time affects the quality of quantum gates and entanglement. As such, they allow standard methods from performance analysis that determine the waiting time distribution to be carried over to the quantum domain.
□ For the two architectures introduced above, we determine the most defining characteristics and operational features. We then incorporate these features into a model that is representative of both architectural designs. Specifically, we employ a continuous-time Markov chain (CTMC) to model entanglement generation in a regime where local quantum computation consumes negligible time. This is well motivated in the regime where the distance between processors is large as in a quantum internet, and the time required to produce entanglement dominates with respect to the time to perform local quantum gates. In this case, we obtain analytical expressions for the qubit waiting time distribution, subsequently allowing us to apply our fidelity computation method to obtain expressions for the average gate and entanglement fidelities for the two architectures in closed form. We later relax the assumption that local gates take negligible time, and explore the effects of more time-consuming computation via simulation. The latter is relevant when the processors are physically close.
□ Using the aforementioned analytical results, we determine the strengths and weaknesses of the two architectures. Our analysis can be used to examine general tradeoffs between the quality of the quantum devices, the application behaviour (computation or network-heavy), and the resulting fidelities for quantum gates and entangled states being produced. In a regime where DD quantum state transfer operations are more noisy than SD ones (e.g., when they rely on imperfect gates), we find that the SD architecture is the more suitable option for applications that are network-heavy, while the DD architecture benefits computation-heavy applications. While the DD architecture outperforms the SD design in terms of average gate quality, its more complex design makes it harder to implement in practice. We provide sufficient conditions indicating how much the quantum memory lifetime of the components used to implement the SD architecture would need to be improved, in order to achieve the same performance as the DD architecture.
□ We apply our analytical techniques to evaluate the performance of the two architectures under the assumption that they are realized using the Nitrogen-Vacancy center in diamond platform, a strong candidate for implementing near-term and future networked quantum nodes [1,21,22,40]. We explore the effect of state transfer operations on entanglement fidelity and where possible, present the pre-move and post-move entanglement fidelities in closed form. We validate our analytical results with NetSquid [12], a discrete-event quantum network simulator.
The rest of this paper is organized as follows: in Section 2, we discuss related work and cover the relevant quantum background. In Section 3, we introduce the CTMC that is used to model both architectures, and discuss the modeling assumptions. In Section 4 we determine the amount of time that a qubit must spend waiting in storage before it is processed. This waiting time distribution, along with a noise model, can be used to obtain the average gate and entanglement fidelities; in Section 5, we introduce a method for accomplishing this in a general setting and subsequently apply it to the architectures in Figure 2. In Section 6 we show that when the two architectures have memories of identical quality, the DD architecture always outperforms the SD architecture in terms of average gate fidelity. Interestingly, it is possible that an SD-architecture device with better memories (and thus with a similar performance to a DD device with poorer quality memories) may be the more economical option in terms of manufacturing cost. For this reason, in Section 6 we also present sufficient conditions that, if satisfied, ensure the SD outperforms the DD architecture in terms of gate fidelity. In Section 7, we present analytical and simulation results and make numerical observations in a variety of settings. We make concluding remarks and discuss extensions of the problem in Section 8.

Related Work
In practice, for any physical platform implementing a networked quantum processor, the quality of quantum gates and states is determined experimentally (see e.g., [1,22,28,37,40,44], among a multitude of others). The objective of these measurements was to characterize one specific setup, but not to explore tradeoffs of potential architectural designs. For a string of quantum repeaters with the goal of producing entanglement over long distances, some analytical studies exist that characterize the quality of very specific quantum states, and study their distribution to guarantee a minimum threshold quality; see, e.g., [7,20,33,52]. Some analytical studies also exist for the so-called quantum switch [56,57,58], wherein the authors study the maximum possible rate of entanglement switching and the expected number of entangled qubits in storage. These works are very different in spirit since they focus on the creation of quantum entanglement over long distances, and not on tradeoffs between network and computation operations as we do here. We emphasize that in this work we do not assume that the quantum architecture is used for any specific purpose, and make very few assumptions on the physical platforms used to realize potential architectures. Instead, our goal is to abstract these details in the form of configurable modeling parameters, e.g., the demand for entanglement, or the rate at which it is successfully generated. At the time of writing, we are also not aware of a study that considers the interactions between quantum computation and networking within a single system, and how contention for resources and processing time affects fidelity.

Quantum background
2.2.1 Qubits, Quantum States, and Quantum Gates. Here, we provide the necessary formalism needed in this work, and refer to e.g., [39] for a more in-depth introduction. Quantum information is encoded using quantum bits, or qubits, in contrast to the usage of bits in traditional computing. In addition to holding information in the form of discrete values such as 0 or 1, qubits may hold quantum states that are linear combinations of these values. A (pure) quantum state can be expressed as a vector $|\psi\rangle \in \mathbb{C}^d$ of length 1, where the dimension $d$ is often considered to be finite in quantum technologies. For $n$ qubits the dimension is $d = 2^n$, where it is customary to label basis elements of $\mathbb{C}^d$ by strings $x = x_1, \ldots, x_n \in \{0, 1\}^n$. The Dirac notation $|\cdot\rangle$ is used to represent a vector and is referred to as a ket while the conjugate transpose, $\langle\cdot| = |\cdot\rangle^\dagger = (|\cdot\rangle^*)^T$, is referred to as a bra.
Quantum information is manipulated through the application of quantum gates. A quantum gate is represented by a matrix $U \in \mathbb{C}^{d \times d}$, where $U$ is unitary, i.e., $U U^\dagger = U^\dagger U = I_d$, where $I_d$ is the identity matrix of dimension $d$. Applying a quantum gate $U$ to a state $|\psi\rangle$ gives us the state $|\psi'\rangle = U|\psi\rangle$. Quantum computing and networking applications are realized by applying a series of quantum gates to one or several qubits and then performing a measurement of the qubits to read out information in the quantum states.
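As a small illustration of the definitions above, the following Python sketch checks unitarity and applies a gate to a ket. The choice of the Hadamard gate and all helper names (`matvec`, `dagger`, `matmul`, `is_unitary`) are ours and serve only as an example; they do not appear in the analysis.

```python
import math

def matvec(U, v):
    """Multiply a square matrix U (list of rows) by a column vector v."""
    return [sum(U[i][j] * v[j] for j in range(len(v))) for i in range(len(U))]

def dagger(U):
    """Conjugate transpose of a square matrix."""
    n = len(U)
    return [[complex(U[j][i]).conjugate() for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Product of two square matrices of equal size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_unitary(U, tol=1e-12):
    """Check U U^dagger = I up to a numerical tolerance."""
    P = matmul(U, dagger(U))
    n = len(U)
    return all(abs(P[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]        # Hadamard gate (illustrative choice of U)
ket0 = [1, 0]                # |0>
ket_plus = matvec(H, ket0)   # |psi'> = U|psi> = (|0> + |1>)/sqrt(2)
norm = math.sqrt(sum(abs(a) ** 2 for a in ket_plus))  # unitaries preserve length
```

Because $U$ is unitary, the output vector again has length 1, which is what `norm` verifies numerically.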

Noisy Quantum States and Operations.
Noisy Quantum States. A convenient way of representing a quantum state $|\psi\rangle$ is as a density matrix $\rho = |\psi\rangle\langle\psi|$ which is obtained by taking the outer product of the ket and the bra of the state. Importantly, the density matrix formalism allows for the expression of noisy quantum states. For example, a probabilistic process that prepares a desired state $|0\rangle\langle 0|$ with probability $1 - p$, but fails and instead prepares $|1\rangle\langle 1|$ with probability $p$, results in a noisy quantum state $\rho = (1 - p)|0\rangle\langle 0| + p|1\rangle\langle 1|$. In general, the set of all quantum states on a $d$-dimensional quantum system (including noisy ones) corresponds to the set of matrices $S = \{\rho \in \mathbb{C}^{d \times d} : \rho \succeq 0,\ \mathrm{Tr}(\rho) = 1\}$, i.e., matrices that are positive semi-definite and normalized.
Noise Processes. Using this formalism we can now express the effect of noise on a quantum state. As an example, imagine a noise process that transforms a quantum state $\rho_{initial}$ that is placed into a quantum memory at time $t = 0$, into a noisy quantum state $\rho_{noisy}$ after a waiting time $t$. Qubits are susceptible to environmental noise that can inadvertently change their quantum state. Such noise can arise due to imperfect shielding of qubits from external influence as well as imperfect implementations of quantum gates. Mathematically, the set of all possible noise processes corresponds to the set of completely positive trace preserving maps (CPTPM) $\Lambda : S \to S$. The effect of environmental interaction on quantum states over time is often referred to as decoherence and is modeled through the use of noise models describing $\Lambda$. Common noise models include depolarizing noise, $\Lambda = \mathcal{D}_p$, which drives a quantum state towards the maximally noisy state, also called the maximally mixed state $I/2$. This state is the quantum equivalent of white noise. Here, time dependence is often expressed by letting $p = \frac{1}{4}(1 - e^{-t/T})$, for a fixed $T$ characterizing the quantum memory storing the quantum state. This allows one to express the noise incurred in a quantum memory storing a qubit, after a waiting time of $t$ has elapsed. A model of depolarizing noise is often used as a worst case estimate, when the physical noise process is insufficiently characterized.
In the literature describing implementations of quantum memories, we often have more information about the noise process of the quantum memory device. This noise is generally modeled as dephasing and damping noise (or a combination of both). Dephasing noise is expressed as $\mathcal{P}_p(\rho) = (1 - p)\rho + p Z \rho Z$, where similarly $p = \frac{1}{2}(1 - e^{-t/T_2})$ is used to express time-dependence, for a fixed $T_2$ characterizing the memory. This can be understood as an analogue of the classical binary symmetric channel, where a flip operation (here, $Z$) is applied with some probability $p$.
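Another common model is the amplitude damping noise channel $\mathcal{A}_\gamma(\rho) = K_0 \rho K_0^\dagger + K_1 \rho K_1^\dagger$, where $K_0$, $K_1$ have the form $K_0 = |0\rangle\langle 0| + \sqrt{1 - \gamma}\,|1\rangle\langle 1|$ and $K_1 = \sqrt{\gamma}\,|0\rangle\langle 1|$, and $\gamma = 1 - e^{-t/T_1}$ for a fixed $T_1$ characterizing the effects of the amplitude damping channel. This can be understood as the quantum analogue of a noisy channel of one-sided error which preserves $|0\rangle\langle 0|$, but damps $|1\rangle\langle 1|$ to $|0\rangle\langle 0|$ with an error probability $\gamma$.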
In most physical implementations of quantum devices, both $\mathcal{P}$ and $\mathcal{A}$ occur and the noise is described by a composite model $\mathcal{C}$ combining the two, where, in general, $T_2 < T_1$ [1,11,25,39,54]. Larger values of $T$, $T_1$, and $T_2$ correspond to a quantum memory with a longer memory lifetime.
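To make the time dependence of these noise models concrete, the following Python sketch applies each of them to a single-qubit density matrix. It is an illustration under stated assumptions rather than a definitive implementation: the depolarizing map is written in the form $(1 - 4p)\rho + 4p\,I/2$, which is one parameterization that reaches $I/2$ at $p = 1/4$; the amplitude damping map uses the standard Kraus form; and the composite map is assumed to be damping followed by dephasing. All function names are hypothetical.

```python
import math

I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]

def mm(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dag(A):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(A[j][i]).conjugate() for j in range(2)] for i in range(2)]

def lin(A, B, a, b):
    """Entrywise linear combination a*A + b*B."""
    return [[a * A[i][j] + b * B[i][j] for j in range(2)] for i in range(2)]

def depolarize(rho, t, T):
    # Assumed form: (1 - 4p) rho + 4p * I/2, with p = (1 - exp(-t/T)) / 4.
    p = 0.25 * (1 - math.exp(-t / T))
    return lin(rho, I2, 1 - 4 * p, 2 * p)

def dephase(rho, t, T2):
    # (1 - p) rho + p Z rho Z, with p = (1 - exp(-t/T2)) / 2.
    p = 0.5 * (1 - math.exp(-t / T2))
    return lin(rho, mm(mm(Z, rho), Z), 1 - p, p)

def amplitude_damp(rho, t, T1):
    # Standard Kraus form with gamma = 1 - exp(-t/T1).
    g = 1 - math.exp(-t / T1)
    K0 = [[1, 0], [0, math.sqrt(1 - g)]]
    K1 = [[0, math.sqrt(g)], [0, 0]]
    return lin(mm(mm(K0, rho), dag(K0)), mm(mm(K1, rho), dag(K1)), 1, 1)

def composite(rho, t, T1, T2):
    # Assumed composition: damping, then dephasing (these two maps commute).
    return dephase(amplitude_damp(rho, t, T1), t, T2)

plus = [[0.5, 0.5], [0.5, 0.5]]   # |+><+|
one = [[0, 0], [0, 1]]            # |1><1|
```

For instance, `dephase(plus, t, T2)` leaves the diagonal untouched and scales the off-diagonal entries by $e^{-t/T_2}$, while `amplitude_damp(one, t, T1)` moves population $1 - e^{-t/T_1}$ from $|1\rangle\langle 1|$ to $|0\rangle\langle 0|$.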
Noisy Quantum Gates. The effect of a noisy quantum gate can be described in an entirely analogous fashion. Note that in terms of the density formalism, the effect of applying a gate $U$ on a quantum state $\rho$ can be expressed as $U(\rho) = U \rho U^\dagger$, where we follow convention and use $U$ both to denote the unitary matrix, as well as the CPTPM as indicated by context. When modeling noise in quantum gates it is customary to model a noisy implementation $\mathcal{E}$ as the ideal gate, followed by possibly time dependent noise $\mathcal{N}$. That is, $\mathcal{E} = \mathcal{N} \circ U$, where $U$ is the ideal implementation of the gate. We will follow this custom here.
As an example, consider a situation in which we perform a gate $U$, but then incur a waiting time of $t$ before the next quantum operation is applied. If the total noise process (inherent noise in the gate, plus noise due to waiting) is described by dephasing and damping noise $\mathcal{C}$, then the initial state $\rho_{initial}$ is transformed to the noisy state $\rho_{noisy} = \mathcal{C}(U \rho_{initial} U^\dagger)$ after the waiting time of $t$ has elapsed.

Entanglement.
Most quantum applications rely on a special property known as entanglement that qubits can share. Mathematically, a state $\rho_{AB} \in \mathbb{C}^{d \times d} \otimes \mathbb{C}^{d \times d}$ of a combined quantum system of nodes (or qubits) $A$ and $B$ is called separable if and only if it can be written as a classical mixture (i.e., convex combination) of tensor products of single-node states (i.e., $\rho_{AB} = \sum_j p_j\, \rho_j^A \otimes \rho_j^B$ for some distribution $\{p_j\}_j$, and states $\{\rho_j^A\}_j$ on $A$ and $\{\rho_j^B\}_j$ on $B$). Intuitively, separable states have only classical correlations between $A$ and $B$, since we may toss a coin according to $p_j$ and then prepare individual states on $A$ and $B$ without any form of quantum interaction between them. Any state that is not separable is called entangled. In general, $|\Psi\rangle = \frac{1}{\sqrt{d}} \sum_j |j\rangle_A \otimes |j\rangle_B$ is a maximally entangled state between two $d$-dimensional systems $A$ and $B$. Such entangled states form the primary building block of most quantum network applications.

Quantum Quality Measure: Fidelity.
Fidelity of a state. The fidelity of a quantum state $\rho$ measures how well this state approximates a specific target state $|\psi\rangle$. It is the relevant quantity used to understand how a fixed noise process (e.g., during the preparation of the state) affects its quality, or how the quality of an already prepared state decreases as a function of a waiting time $t$ that this state spends in a quantum memory. Specifically, the fidelity of a state $\rho$ to a target state $|\psi\rangle$ is defined as [17,55] $F = F(\rho, |\psi\rangle) = \langle\psi|\rho|\psi\rangle$ (10) such that $F = 1$ iff $\rho$ is identical to the target state $|\psi\rangle$. The fidelity lies in the interval [0, 1] and larger values of $F$ indicate that $\rho$ is closer to the target state $|\psi\rangle$.
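As a brief worked example of Eq. (10), the Python sketch below evaluates the fidelity of the noisy state $(1 - p)|0\rangle\langle 0| + p|1\rangle\langle 1|$ to the target $|0\rangle$, which equals $1 - p$. The function names and the value $p = 0.1$ are illustrative assumptions.

```python
import math

def fidelity(rho, psi):
    """F(rho, |psi>) = <psi| rho |psi> for a pure target state psi."""
    n = len(psi)
    val = sum(complex(psi[i]).conjugate() * rho[i][j] * psi[j]
              for i in range(n) for j in range(n))
    return val.real

def noisy_zero(p):
    """Density matrix (1 - p)|0><0| + p|1><1|."""
    return [[1 - p, 0], [0, p]]

ket0 = [1, 0]
ket_plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]

F_noisy = fidelity(noisy_zero(0.1), ket0)                   # 1 - p
F_mixed = fidelity([[0.5, 0], [0, 0.5]], ket_plus)          # maximally mixed state
```

The maximally mixed state has fidelity 1/2 to any pure single-qubit target, which is why fully depolarized memories carry no useful information about the stored state.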
Gate fidelity. The average gate fidelity measures how well a real-world implementation $\mathcal{E}$ approximates a desired target gate $U$, and is defined as (see e.g., [38]) $F_{gate}(\mathcal{E}, U) = \int d\Psi\, \langle\Psi| U^\dagger \mathcal{E}(|\Psi\rangle\langle\Psi|) U |\Psi\rangle$, where $d\Psi$ is the Haar (uniform) measure on the set of quantum states. That is, the gate fidelity measures how well the implementation approximates the target gate when applied to a specific input state $|\Psi\rangle$, averaged over all possible input states. When $\mathcal{E} = \mathcal{N}_t \circ U$ (see above) for some time-dependent noise $\mathcal{N}_t$, we will also use the shorthand $F_{gate}(t) = F_{gate}(\mathcal{N}_t \circ U, U)$ (12) to denote the resulting average fidelity. Eq. (12) is the relevant quantity when we are interested in the question: provided we had to wait time $t$ after executing the gate (e.g., due to a scheduling decision), what is the resulting gate fidelity? We remark that since $U$ is unitary, the case where time-dependent noise is applied before the execution of the gate instead reduces to simply studying $\int d\Psi\, \langle\Psi| \mathcal{N}_t(|\Psi\rangle\langle\Psi|) |\Psi\rangle$. As we will see, our later formulas apply to both cases.
Entanglement fidelity. The entanglement fidelity [51] measures the quality of an initially maximally entangled state after it was stored in a noisy memory on node $A$ (or $B$) for time $t$. Specifically, for a noise process $\mathcal{I}_A \otimes \mathcal{N}_B$ (no noise on $A$, and time-dependent noise $\mathcal{N}$ on system $B$), the entanglement fidelity is defined as $F_{ent} = \langle\Psi| (\mathcal{I}_A \otimes \mathcal{N}_B)(|\Psi\rangle\langle\Psi|) |\Psi\rangle$, where $|\Psi\rangle$ is the maximally entangled state defined above. We remark that the case of noise on $A$ and $B$ can always be dealt with by observing that for any matrix $M$ applied to $A$, this can be translated to applying $M^T$ to $B$: $(M \otimes I)|\Psi\rangle = (I \otimes M^T)|\Psi\rangle$. That is, noise on $A$ and $B$ can be understood by applying both types of noise to the $B$ system in succession. It turns out that the gate fidelity and entanglement fidelity are related as [51] $F_{gate}(\mathcal{E}, U) = \frac{d\, F_{ent} + 1}{d + 1}$, where $d$ is the dimension of $A$ and $B$. For qubits, $d = 2$.
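The following Python sketch illustrates how a waiting time distribution translates into an average fidelity, using dephasing noise as an example. It relies on two facts that follow from the definitions above: for dephasing with $p = \frac{1}{2}(1 - e^{-t/T_2})$ the entanglement fidelity is $1 - p$, and gate and entanglement fidelity are related by $(d F_{ent} + 1)/(d + 1)$. The assumption that the waiting time is exponentially distributed with rate `rate` is ours and is made purely for illustration; it is not the waiting time distribution derived for the architectures. All names are hypothetical.

```python
import math
import random

def ent_fidelity_dephasing(t, T2):
    """Entanglement fidelity 1 - p(t) under dephasing for waiting time t."""
    p = 0.5 * (1 - math.exp(-t / T2))
    return 1 - p

def gate_from_ent(F_ent, d=2):
    """Average gate fidelity from entanglement fidelity: (d F_ent + 1)/(d + 1)."""
    return (d * F_ent + 1) / (d + 1)

def avg_gate_fidelity_exp_wait(rate, T2):
    """Closed form for W ~ Exp(rate): E[exp(-W/T2)] = rate / (rate + 1/T2)."""
    laplace = rate / (rate + 1 / T2)
    return gate_from_ent(0.5 * (1 + laplace))

def monte_carlo(rate, T2, n=100000, seed=1):
    """Sample waiting times and average the resulting gate fidelities."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        w = rng.expovariate(rate)   # assumed exponential waiting time
        total += gate_from_ent(ent_fidelity_dephasing(w, T2))
    return total / n

closed_form = avg_gate_fidelity_exp_wait(rate=1.0, T2=1.0)
estimate = monte_carlo(rate=1.0, T2=1.0)
```

The closed form only requires the Laplace transform of the waiting time evaluated at $1/T_2$, so any waiting time distribution with a known transform could be substituted for the exponential one used here.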

MODELING THE ARCHITECTURES
We first provide a summary of the architecture attributes to be considered in the modeling and analysis of our problem. First, motivated by the limits of implemented quantum devices, in the SD architecture all operations must be performed sequentially: e.g., computation may not be performed when entanglement generation or a state transfer is in progress. In the DD architecture, entanglement generation and computation (assumed to be independent of each other) may be performed in parallel, as these operations take place in separate devices. When a state transfer operation is in progress, however, both devices in the DD architecture must wait until its completion before servicing another computation or entanglement generation request. This is motivated by the same limit that prevents the simultaneous execution of network and computation operations in the SD architecture: a network operation is needed at the computation device in the DD architecture to transfer the entangled state, but this time only for the time needed to produce entanglement with the very close network device. In both architectures, a state transfer is required before a new entanglement request may be serviced, as this frees up the communication qubit required for further entanglement generation attempts.
To define the state space of our problem, it is helpful to classify the processes that make use of the quantum processors as follows: (i) entanglement generation, (ii) state transfer operations (we interchangeably refer to these as moving operations), and (iii) computation. Each class of operations (or jobs) is associated with an arrival rate and a processing rate, as specified in Table 1. We assume that all request arrivals are Poisson and all processing times are exponentially-distributed.

Fig. 5. Example queue occupancy: $C$, $E$, and $M$ represent computational, entanglement, and moving job, respectively. The numbers in each slot represent the processing order: job $E$ is currently being executed, but all $C$ jobs may be processed before the next request's processing begins. The second $E$ request is fifth to be processed, since the first $E$ request must be followed by a moving operation.
$\lambda_E$ represents the demand for entanglement, i.e., the entanglement request rate either from a user or an application. The corresponding parameter $\mu_E$ represents the rate at which (remote) entanglement is generated; this is a function of the link length. Entanglement generation attempts at the elementary link level are often modeled as Bernoulli trials with some fixed success probability $p_{gen}$ for each attempt (see, e.g., [7,20,52,57]). In [58], the authors model the time between successful generation attempts as an exponentially-distributed random variable (r.v.) to accommodate their use of a CTMC when modeling a quantum network node; we adopt their convention here.
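To illustrate the relation between the two conventions, the Python sketch below compares the mean time to the first success of repeated Bernoulli attempts with the mean of an exponential r.v. whose rate is matched to it. The attempt duration `tau`, the rate matching $\mu_E = p_{gen}/\tau$, and the numerical values are assumptions introduced for this example only and are not device parameters from this work.

```python
import random

def geometric_time(p_gen, tau, rng):
    """Time until first success; each attempt lasts tau and succeeds w.p. p_gen."""
    attempts = 1
    while rng.random() >= p_gen:
        attempts += 1
    return attempts * tau

def compare(p_gen=0.01, tau=1e-5, n=20000, seed=7):
    """Return (mean Bernoulli-trial time, mean exponential time, 1/mu_E)."""
    rng = random.Random(seed)
    mu_E = p_gen / tau   # assumed matching: same mean generation time tau / p_gen
    geo = sum(geometric_time(p_gen, tau, rng) for _ in range(n)) / n
    exp = sum(rng.expovariate(mu_E) for _ in range(n)) / n
    return geo, exp, 1 / mu_E

geo_mean, exp_mean, target_mean = compare()
```

For small $p_{gen}$ the geometric distribution of the number of attempts is well approximated by an exponential distribution, which is what makes the CTMC convention a reasonable modeling choice in this sketch.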
When entanglement is successfully generated, we assume that the state of the entangled qubit is eventually moved from the networking component to the computing component for processing, e.g., in the single-NV example, the state is transferred from the electronic spin to the carbon spin. Since this moving operation may not be requested immediately, we introduce $\lambda_M$ as the moving request rate. The time to physically perform such moving operations is exponentially-distributed with parameter $\mu_M$; in general $\mu_M$ is lower for DD architectures than for SD ones, as the former requires more complex gate sequences to perform device-device state transfers: e.g., in the double-NV example, NV-NV entanglement generation is required. Finally, computational jobs arrive according to a Poisson process with parameter $\lambda_C$, and their processing times are exponentially-distributed with parameter $\mu_C$. Note that, save for entanglement generation, our use of the exponential distribution in modeling processing times is largely motivated by the resulting simplicity of the model and its analysis. A more realistic way to model state transfers for an SD design implemented with an NV center in diamond, for instance, would be to assume that their service times are deterministic (albeit, for the double-NV design, the use of the exponential distribution is well-justified due to the need to generate on-chip entanglement when servicing state transfer requests). In practice, the sojourn time distribution in each state is determined not only by the application, but also by the physical platform (e.g., NV center in diamond, ion traps, atomic ensembles). Depending on the latter, it is possible that even the time it takes to perform a local state transfer is a random variable. In the interest of keeping the assumptions as general as possible and the results interpretable, we opt for the exponential distribution. 
However, if necessary, one may accommodate arbitrary service time distributions by modeling the architecture as a semi-Markov process. Depending on the specific application, it may also be appropriate to include computational jobs as an additional phase of the QBD process. Several other extensions of our model may also be considered depending on the use case -see Section 8 for further discussion -but lie outside the scope of this work.
In summary, the processing rates , , and depend on the properties of the architecture, while the request arrival rates , , and depend on application demands. When the value is high or is low, the application may be thought of as networking-heavy (i.e., entanglement-heavy), while high values of and low values of correspond to a computation-heavy application. In general, is much greater than and even for the DD architecture, as local gates are far less time-consuming than entanglement generation in a network where processors are separated by large distances. For this reason, when constructing the model we make the simplifying assumption that = ∞, i.e., we assume that computational job processing times are negligible. As a consequence, computational jobs may be processed whenever the processor(s) would otherwise be idle (i.e., waiting for a moving request to arrive), as well as in between events (e.g., immediately after the completion of a moving request but right before the entanglement generation of the next entanglement job), as their processing does not affect the rest of the system. As a result, a computational job need only wait for the completion of a single event, either entanglement generation or a moving request, and as soon as this event has been completed, all computational jobs that are in the queue may be processed instantaneously¹. An example is shown in Figure 5. These modeling assumptions ultimately allow us to obtain all necessary performance measures in closed form; however, in Section 7 we remove the assumption of negligible processing times for computational jobs and observe the effects on the average fidelity numerically. Lower values of may be used to model "atomic" gate sequences which must not be interrupted by any other task or operation.
With the aforementioned assumptions, we may model both the SD and DD architectures as / 3 /1 queueing systems, where the arrivals correspond to entanglement requests and are a Poisson process with rate , and the service times are hypo-exponentially distributed with three service stages each of which are exponentially-distributed with parameters , , and . Figure 4 depicts the CTMC representing this queueing system: each state of the form / corresponds to outstanding entanglement requests in the system, with the first job in the th stage of its processing. Entanglement requests are processed according to a first-in, first-out (FIFO) policy: the first stage ( = 1) is entanglement generation, the second ( = 2) is awaiting the arrival of a moving request, and the third ( = 3) is the execution of a moving request. Note that the next entanglement request cannot begin processing until all three stages of the previous request have been completed, since the communication qubit must be freed before entanglement generation may be attempted again. State 0 corresponds to the case with no outstanding entanglement requests.
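The chain described above can be explored numerically. The following is a minimal sketch, not the authors' implementation: it builds the generator of the three-stage queue on a truncated state space and solves for the stationary distribution with plain Gaussian elimination. The names `lam` (entanglement request rate), `mu` (entanglement generation rate), `alpha` (moving request arrival rate), `delta` (moving execution rate) and the truncation level `max_jobs` are labels assumed here for illustration.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            if f != 0.0:
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def stationary_distribution(lam, mu, alpha, delta, max_jobs=30):
    """Stationary distribution of the chain truncated at max_jobs requests.

    State 0 is the empty system; state (n, j) means n outstanding requests
    with the head-of-line request in stage j (1: generation, 2: awaiting a
    moving request, 3: executing the move).
    """
    n_states = 3 * max_jobs + 1

    def idx(n, j):
        return 3 * (n - 1) + j

    Q = [[0.0] * n_states for _ in range(n_states)]

    def add(src, dst, rate):
        Q[src][dst] += rate
        Q[src][src] -= rate

    stage_rate = {1: mu, 2: alpha, 3: delta}
    add(0, idx(1, 1), lam)
    for n in range(1, max_jobs + 1):
        for j in (1, 2, 3):
            s = idx(n, j)
            if n < max_jobs:                 # arrival (dropped at the truncation level)
                add(s, idx(n + 1, j), lam)
            if j < 3:                        # advance to the next service stage
                add(s, idx(n, j + 1), stage_rate[j])
            else:                            # departure; next request starts stage 1
                add(s, idx(n - 1, 1) if n > 1 else 0, delta)

    # Solve pi Q = 0 with sum(pi) = 1: transpose Q and replace one equation.
    A = [[Q[r][c] for r in range(n_states)] for c in range(n_states)]
    A[-1] = [1.0] * n_states
    b = [0.0] * n_states
    b[-1] = 1.0
    return solve_linear_system(A, b)


def stage_probabilities(lam, mu, alpha, delta, max_jobs=30):
    """Return (P[empty], [P[stage 1], P[stage 2], P[stage 3]])."""
    pi = stationary_distribution(lam, mu, alpha, delta, max_jobs)
    stages = [sum(pi[3 * (n - 1) + j] for n in range(1, max_jobs + 1))
              for j in (1, 2, 3)]
    return pi[0], stages
```

For a stable system with a sufficiently high truncation level, the aggregate stage probabilities should agree with the Little's-law values `lam/mu`, `lam/alpha` and `lam/delta`, which gives a quick sanity check on the construction.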
Note from the CTMC that a computational request only waits whenever its arrival coincides with the system being in states of the form /1 or /3 in the SD architecture, or if the system is in a state /3 in the DD architecture. Thus, by the memoryless property of the exponential distribution, a computational job's waiting time distribution is given either by ( ) = − or by ( ) = − , depending on the state of the system upon arrival. We may obtain the probabilities of arrival into states /1 and /3 from the stationary distribution of the CTMC. We remark that when computing the entanglement fidelity, we are interested in the amount of time a newly-entangled qubit must wait for a state transfer request to arrive, i.e., we require the waiting time distribution, and not the sojourn time distribution of that qubit (which also includes the amount of time it takes to perform the moving operation). The reason is that the specific gate sequence used to perform the state transfer in a given architecture already implicitly accounts for the time it takes to execute the gates. A note on mathematical notation: in the remainder of the paper, we use superscripts (1) and (2) to denote parameters corresponding to the SD and DD architectures, respectively. E.g., (1) corresponds to the moving rate in the SD architecture, while $T_1^{(2)}$ and $T_2^{(2)}$ refer to the memory lifetimes of the DD architecture.

WAITING TIME DISTRIBUTIONS
The CTMC shown in Figure 4 has been studied in literature: specifically, it is a quasi birth-death (QBD) process with the special property that one of the blocks in its generator matrix is a rank 1 matrix, allowing us to compute the rate matrix explicitly using the results in [32]. For completeness, we include the rate matrix derivation and ergodicity condition for this Markov chain in Appendix A.1. For the following computations, assume that the mean drift condition is satisfied so that a stationary distribution exists. Recall from the discussion in Section 3 that to determine the waiting time distribution of a computational job, it suffices to compute the stationary probabilities of states /1 and /3, which we label /1 and /3 , respectively. Specifically, rather than having to derive the individual stationary probability of each state in phase 1 or 3, we need only to compute the aggregate probabilities We are now ready to compute the waiting time distributions of computational jobs in both types of architectures. Recall that in the SD architecture, a computational job may be processed immediately, as long as there is no ongoing entanglement generation or moving job in progress. Specifically, if a computational request arrives while the QBD process is in a state /2, then it is processed immediately (the waiting time is zero), and otherwise, the request is queued behind the ongoing entanglement or moving request. Thus, the waiting time (conditioned on the stage ) for a computational job in the SD architecture is given by 1 , where 1 is the indicator function, is an exponentially-distributed r.v. with mean 1/ and probability density function (p.d.f) ( ), and 1 is an exponentially-distributed r.v. with mean 1/ (1) and p.d.f. 1 ( ), with (1) the rate of completing a moving request (after its arrival, meaning that this is the rate of solely executing the gates to fulfill a moving request) in the SD architecture. Then, by the law of total probability (cf. Eq. 
(2.26) in [42]), the marginal p.d.f. for the waiting time of computational jobs in the SD architecture is given by where ( ) is the Dirac delta function, defined as Next, we consider the waiting time distribution of computational jobs in the DD architecture, wherein a computational request need only wait if a moving request is actively being executed. Letting 2 be the p.d.f. of an ∼ ( (2) ) r.v., this distribution is given by When using 1 and 2 to compute average gate fidelity in Section 5.2, we will use the definition where ≤ ≤ and ( ) is a function continuous on the interval [ , ].
Another quantity of interest is the waiting time distribution of a newly-entangled qubit while it awaits a state transfer request. For both architectures, this is given by ( ) = − .
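The mixed distributions described in this section (an atom at zero plus exponential components) can be written down directly. The sketch below is illustrative only: `p_gen_stage` and `p_move_stage` stand for the aggregate stationary probabilities of the generation and move-execution stages, and `mu`, `delta_sd`, `delta_dd` are assumed names for the entanglement generation rate and the two moving rates.

```python
import math


def sd_waiting_cdf(t, p_gen_stage, p_move_stage, mu, delta_sd):
    """P(W <= t) for a computational job in the SD architecture."""
    if t < 0:
        return 0.0
    p_zero = 1.0 - p_gen_stage - p_move_stage   # job is served immediately
    return (p_zero
            + p_gen_stage * (1.0 - math.exp(-mu * t))
            + p_move_stage * (1.0 - math.exp(-delta_sd * t)))


def dd_waiting_cdf(t, p_move_stage, delta_dd):
    """P(W <= t) for a computational job in the DD architecture."""
    if t < 0:
        return 0.0
    return (1.0 - p_move_stage) + p_move_stage * (1.0 - math.exp(-delta_dd * t))


def sd_mean_wait(p_gen_stage, p_move_stage, mu, delta_sd):
    """Mean waiting time in SD: each exponential branch contributes p / rate."""
    return p_gen_stage / mu + p_move_stage / delta_sd


def dd_mean_wait(p_move_stage, delta_dd):
    """Mean waiting time in DD: only an ongoing move delays the job."""
    return p_move_stage / delta_dd
```

Evaluating either CDF at `t = 0` returns the probability that the job does not wait at all, which is the weight of the Dirac delta term in the corresponding density.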

FORMULAS FOR COMPUTING QUANTUM FIDELITIES
We now provide several general formulas that can be used to link an understanding of waiting times to the gate and entanglement fidelities for the standard noise models used to describe quantum devices. Our formulas for the gate fidelities can be applied to the situations described in Section 2.2.4, where we need to wait for a time before or after executing a quantum gate. In this work, this waiting time occurs since we need to suspend quantum processing when performing network operations (see Section 3). However, we remark that our formulas are applicable to any other situations where such waiting times arise, such as for example in the analysis of algorithms for scheduling gates on a quantum processor. We emphasize that our formulas also apply to a situation where one would simply want to understand the average reduction in quality when storing a qubit in memory, which corresponds to applying the trivial gate = I.
Using the known link between gate and entanglement fidelities [51] in Eq. (14), these formulas can also be directly applied to understand how the quality of an entangled link decays as a function of a waiting time . The use case in this work is to understand the decay of entanglement due to waiting for gate or move operations (see Section 3). However, it can also be applied to any other situation where a waiting time is incurred before the entanglement can be processed.
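As a small illustration of the link between the two fidelities, the sketch below encodes the standard relation between the average gate fidelity and the entanglement fidelity of a channel acting on a d-dimensional system, F_avg = (d F_e + 1)/(d + 1). Eq. (14) is not reproduced in this excerpt, so its exact form in the paper is assumed to coincide with this relation; the function names are hypothetical.

```python
def gate_from_entanglement_fidelity(f_e, dim=2):
    """Average gate fidelity from entanglement fidelity (dim = 2 for a qubit)."""
    return (dim * f_e + 1.0) / (dim + 1.0)


def entanglement_from_gate_fidelity(f_avg, dim=2):
    """Inverse of the relation above."""
    return ((dim + 1.0) * f_avg - 1.0) / dim
```

For example, a fully depolarized qubit has entanglement fidelity 1/4, which corresponds to an average gate fidelity of 1/2.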
To the non-quantum expert, it may come as a surprise that the formulas above only depend on the noise, but not on the specific gate . As we will see, this is simply a consequence of being unitary, combined with the fact that we uniformly average over the set of all quantum states and this average does not change when a unitary is applied.
Given an understanding of the waiting time distribution, one may then readily compute the average gate fidelity due to waiting as where ( ) is the measure over waiting times resulting from a specific model, and N is the quantum channel.
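To make the averaging step concrete, the sketch below integrates a time-dependent fidelity against an exponential waiting time density and compares the result with a closed form. It assumes pure dephasing with phase-flip probability (1 - exp(-t/T2))/2, for which the average gate fidelity after a wait of t is 2/3 + exp(-t/T2)/3; this parametrization is an assumption made for illustration and may differ from the noise models defined in Section 2.

```python
import math


def dephasing_gate_fidelity(t, t2):
    """Average gate fidelity after storing a qubit for time t (assumed model)."""
    return 2.0 / 3.0 + math.exp(-t / t2) / 3.0


def average_over_exponential_wait(fidelity, rate, steps=20000, tail=40.0):
    """Integrate fidelity(t) against the Exp(rate) density with Simpson's rule.

    The integral is truncated at tail / rate, where the density is negligible.
    `steps` must be even.
    """
    upper = tail / rate
    h = upper / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * rate * math.exp(-rate * t) * fidelity(t)
    return total * h / 3.0


def closed_form_average(rate, t2):
    """Closed form of the same average: 2/3 + x / (3 (x + 1)), with x = rate * T2."""
    x = rate * t2
    return 2.0 / 3.0 + x / (3.0 * (x + 1.0))
```

The closed form has the x/(x + 1) structure that reappears in the analytical comparison of the two architectures, where x is the product of a service rate and a memory lifetime.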

Derivation
We remark that there are several methods to obtain the same result, and for completeness we present a self-contained derivation using only elementary facts from quantum information theory in Appendix E.1. Here, we make use of a result in [5] for the case where the quantum channels are applied to qubits, as relevant for the standard noise models above. From [5] we have that for any one qubit quantum channel E used to approximate a gate that orig (E, ) = 1 2 Tr where , and are the Pauli matrices defined in Section 2. When E = • N (i.e., noise in the gate is modeled by applying first a noisy channel N followed by the ideal implementation of ) we have that since is unitary ( † = † = I), Using (25), the Pauli matrices, and the definitions of the quantum noise channels (see Section 2), matrix algebra then yields the claimed formulas above. For the case where E = N • (i.e., the noisy gate is modeled by first applying the ideal gate and then applying a noise process N , a common convention in quantum technologies), we also obtain where this time we have made use of the fact that the Haar (uniform) measure on the set of quantum states is invariant under the application of a unitary , since a unitary simply permutes the set of quantum states | ⟩ = | ′ ⟩. We remark that in quantum information operations do not generally commute and N • ≠ • N for most choices of and N .
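The gate-independence argument can be checked numerically. The sketch below uses the single-qubit expression F = 1/2 + (1/12) Σ Tr(U σ U† E(σ)), summed over the three Pauli matrices, which is the standard form of the result for single-qubit maps and is assumed here to match the formula quoted from [5]. The dephasing channel with phase-flip probability `p` and the Hadamard gate are illustrative choices.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def trace(a):
    return a[0][0] + a[1][1]


X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
IDENTITY = [[1, 0], [0, 1]]
HADAMARD = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
            [1 / math.sqrt(2), -1 / math.sqrt(2)]]


def dephasing(p):
    """Channel rho -> (1 - p) rho + p Z rho Z (assumed noise model)."""
    def channel(rho):
        zrz = matmul(matmul(Z, rho), Z)
        return [[(1 - p) * rho[i][j] + p * zrz[i][j] for j in range(2)]
                for i in range(2)]
    return channel


def unitary(u):
    """Channel rho -> U rho U^dagger."""
    ud = dagger(u)
    return lambda rho: matmul(matmul(u, rho), ud)


def compose(outer, inner):
    return lambda rho: outer(inner(rho))


def average_gate_fidelity(channel, u):
    """Average fidelity of `channel` with respect to the ideal gate `u`."""
    ud = dagger(u)
    total = 0.0
    for sigma in (X, Y, Z):
        target = matmul(matmul(u, sigma), ud)
        total += complex(trace(matmul(target, channel(sigma)))).real
    return 0.5 + total / 12.0


p = 0.1
noise_then_gate = compose(unitary(HADAMARD), dephasing(p))   # E = U o N
gate_then_noise = compose(dephasing(p), unitary(HADAMARD))   # E = N o U
f_noise_first = average_gate_fidelity(noise_then_gate, HADAMARD)
f_gate_first = average_gate_fidelity(gate_then_noise, HADAMARD)
f_memory_only = average_gate_fidelity(dephasing(p), IDENTITY)
```

All three values coincide with 1 - 2p/3, the average fidelity of the dephasing channel alone, which illustrates that the result depends on the noise and not on the gate.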

Application to our problem
As discussed in Section 2.2, fidelity acts as a measure by which we may evaluate the performance of single- and double-device architectures. For computation requests, we determine an associated gate fidelity that reflects the quality of the quantum gate(s) that are applied in the computation. For moving requests, we consider the fidelity of the entanglement that is delivered to applications.
We evaluate the average gate fidelity (Eq. (24)) for computation requests on the SD and DD architectures using the composite noise model C and the waiting time distributions in (17), (18): where we use (19) to evaluate the integrals above. We also evaluate the average fidelity of the entanglement that is moved into memory. The waiting time distribution for moving requests for both architectures is given by ( ) = − , so that

ANALYTICAL EVALUATION
In Section 4, we derived the waiting time distributions of computational jobs in each of the architectures, and in Section 5 we determined how these distributions translate into the average gate fidelity. Using these results, it is possible to compare the average gate fidelity of the SD to that of the DD architecture. Specifically, we will show that when computational job processing times are negligible (i.e., when the queueing systems are both represented by the CTMC in Figure  4) and the memories in both architectures are of identical manufacturing quality, then the DD architecture always outperforms the SD architecture in terms of the average gate fidelity. Our standard assumption that ≤ (2) holds for the following discussion.
Proposition 1. When = ∞, the mean drift conditions (15) for the SD and DD architectures are satisfied, and (N , ) is identical for both architectures, the DD architecture yields a higher average gate fidelity than the SD architecture.
See Appendix B for a proof of Proposition 1. It is worth emphasizing that the result in Prop. 1 holds when the two Markov chains modeling the architectures are stable; i.e., while the DD architecture yields better performance for average gate fidelity, cf. (1) it is also more difficult to ensure its stability, since in general, (1) > (2) . For the remainder of this section, we focus on the composite noise model for storage introduced in Section 5.2. Now, suppose that the characteristic memory times of the two architectures are not identical. As discussed previously, in such cases it is possible that an SD architecture with higher-quality memories is the more cost-effective option while also yielding a higher average gate fidelity than a DD architecture with memories of poorer quality. This brings up a natural question: given a DD architecture with fixed characteristic memory times (2)   1 and (2) 2 , what conditions must the memory lifetimes of an SD architecture satisfy in order to outperform the former in terms of average gate fidelity? From Eqs. (26) and (27), we see that (1) > (2) when For the following discussion, it is useful to keep in mind that for a constant > 0, lim →∞ /( + 1) = 1/ . This implies that both sides of the inequality above are positive. Recall from previous discussion that the remote entanglement generation rate is a smaller value than the moving rates (1) and (2) . From (28), it is easy to see that the value of plays a significant role in determining how much the memory lifetimes must compensate to improve the performance of the SD architecture.
To obtain a more interpretable and intuitive understanding of how high the SD architecture memory times must be, we derive a sufficient condition in terms of solely (1) 2 , (2) 1 , , (1) , and  . This condition serves as a good bound to (28) when the memory lifetimes (2)   1 and (2) 2 are not too far apart from each other, and becomes tighter as increases.
Proposition 2. Assume is greater than one and < (2) < (1) . Then the SD architecture on average achieves a higher gate fidelity than the DD architecture when See Appendix C for a proof of this proposition. Condition (29) tells us that the faster the entanglement generation rate, the smaller the memory requirements are on the SD architecture to ensure that it outperforms the DD architecture. This is intuitive since in the SD architecture, entanglement generation is the most time-consuming and therefore the most detrimental operation to the gate fidelity. Condition (29) also tells us that the faster the state transfer requests are executed in the DD architecture, the better memories are required for the SD architecture -an intuitive consequence of the fact that the most detrimental operations to gate fidelity in the DD architecture are moving requests.
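The qualitative content of Propositions 1 and 2 can be illustrated with a simplified calculation. The sketch below is not the composite noise model of the paper: it assumes pure dephasing (average gate fidelity 2/3 + exp(-t/T2)/3 after a wait of t), a perfect gate when no waiting occurs, and aggregate stage probabilities `lam/mu` and `lam/delta` obtained from Little's law for a stable queue. All parameter names are hypothetical.

```python
def waiting_loss(rate, t2):
    """1 - (average dephasing gate fidelity) for an Exp(rate) waiting time."""
    return 1.0 / (3.0 * (rate * t2 + 1.0))


def sd_gate_fidelity(lam, mu, delta_sd, t2):
    """SD: a job waits if it arrives during generation or move execution."""
    return (1.0
            - (lam / mu) * waiting_loss(mu, t2)
            - (lam / delta_sd) * waiting_loss(delta_sd, t2))


def dd_gate_fidelity(lam, delta_dd, t2):
    """DD: a job waits only if it arrives during move execution."""
    return 1.0 - (lam / delta_dd) * waiting_loss(delta_dd, t2)


# Identical memories: DD is ahead whenever mu <= delta_dd (assumed values).
same_memories = (sd_gate_fidelity(1.0, 10.0, 1667.0, 0.002),
                 dd_gate_fidelity(1.0, 700.0, 0.002))

# Much better SD memory and fast generation: SD can overtake DD.
better_sd_memory = (sd_gate_fidelity(1.0, 500.0, 1667.0, 1.0),
                    dd_gate_fidelity(1.0, 700.0, 0.002))
```

Under these assumptions, the SD loss term for entanglement generation alone already exceeds the entire DD loss when the memories are identical and the generation rate does not exceed the DD moving rate, which mirrors the statement of Proposition 1.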

SIMULATION AND NUMERICAL OBSERVATIONS
Our goal in this section is to study the average gate and entanglement fidelities for the two architectures in a variety of settings in order to gain an understanding of regimes that are most suitable to each. We will also explore differences in manufacturing quality, and examine cases where it is preferable to use an SD design of better quality than a DD design with poorer quality. This question is especially relevant when cost-effectiveness is an important factor, as the DD design is expected to be the more expensive option (when comparing to an SD design of identical quality).
We use MATLAB to simulate the architectures and obtain the waiting time distributions for computational and entanglement jobs. We then use NetSquid [12] to simulate the storage of qubits according to the obtained waiting time distributions. NetSquid is a discrete-event network simulator for quantum information; it provides a hardware-validated model of the NV center in diamond platform, and we use this model to evaluate the gate and entanglement fidelities.
Our analytical results apply to the case when computational jobs have negligible processing time ( = ∞); thus, simulating the case where < ∞ provides additional insight. When < ∞, computational jobs may no longer be processed instantaneously. Thus, we require an additional rule for handling multiple computational requests in the queue. Motivated by the fact that entanglement generation with remote nodes is the most time-consuming operation for our architectures, we give state transfer (moving) jobs non-preemptive priority over computational jobs, i.e., when a moving job arrives while a computation is in progress, the former begins processing immediately after the completion of the computational request, even if other computational jobs were already in the queue prior to its arrival. In all other cases, jobs are processed according to a FIFO policy.
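The scheduling rule just described can be sketched as a selection function that is called only when the processor finishes a job (which is what makes the priority non-preemptive). This is a simplified illustration with two hypothetical queues; it does not model entanglement generation or the authors' MATLAB simulation.

```python
from collections import deque


def next_job(moving_queue, compute_queue):
    """Pick the next job once the current one has completed.

    Moving jobs take priority over queued computational jobs; within each
    class the order is FIFO. Returns (kind, job) or None if both are empty.
    """
    if moving_queue:
        return ("move", moving_queue.popleft())
    if compute_queue:
        return ("compute", compute_queue.popleft())
    return None


# Two computational jobs were queued before the moving job arrived.
compute_queue = deque(["c1", "c2"])
moving_queue = deque(["m1"])
order = []
while True:
    job = next_job(moving_queue, compute_queue)
    if job is None:
        break
    order.append(job)
```

The moving job is served first even though it arrived after both computational jobs, while a computation already in progress would be allowed to finish because `next_job` is never invoked mid-service.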
For the following discussion, all rates are in terms of (# arrivals)/sec and (# jobs processed)/sec, unless otherwise specified. In all simulations, each run lasts for $10^5$ s, and each data point is an average of five runs, a number that ensures sufficiently small error bars. We next motivate some of the parameter values used in the remainder of this section, many of which are inspired by the NV center in diamond platform. For state transfers within the SD architecture, we fix (1) at 1667 Hz, since, per Figure 4 in [43], a local swap to memory takes 600 μs. State transfers within the DD architecture depend on the exact implementation of the transfer procedure across the inter-device interface. For the DD architecture, we often fix (2) = 700 Hz, as we expect the state transfer rate in this architecture to be between a third and a half of that of the SD architecture. Note that, according to [43] and [47], this value is rather optimistic for the NV platform, since, e.g., a Bell-state measurement (an operation that is part of the state transfer gate sequence for the DD architecture) alone takes 1 ms. Next, when exploring different values for the computation rate, we consider not only the variation in gate duration, but also the possibility of more time-consuming atomic gate sequences that must not be interrupted by any other operations; using [13] as a guide, we set to values in the range [$10^3$, $10^5$]. For the $T_1$ and $T_2$ values, we also use [13] as a guide. Finally, when choosing parameter values for the request arrival rates, we ensure that both the SD and DD systems are stable in the number of outstanding entanglement requests.
Effects of Device Memory Lifetimes on Gate Fidelity. Recall from our analytical evaluation that the DD architecture achieves higher gate fidelity than the SD architecture when both devices have the same $T_1$ and $T_2$ parameters characterizing their memory lifetimes. This phenomenon can be observed in Figure 6 for two different entanglement generation regimes: = 10 corresponds to a quantum network setting, where the remote nodes are distant, while = 500 represents closely located quantum nodes, e.g., as may be found in a quantum computing cluster. Observe that for the latter, the fidelity differences are less pronounced. This can be explained by the fact that faster entanglement generation rates are less detrimental to the SD architecture's gate fidelity than slower ones, as computational jobs wait less before being serviced. Fabrication of networked quantum hardware is a complex task, and achieving comparable memory lifetimes between the SD and DD architectures may prove to be difficult. Interfacing the two processors that make up the DD architecture may introduce additional sources of noise or complicate the process of properly shielding qubits in order to maintain adequate memory lifetimes. As a result, it is possible that a high-quality SD design is both the more economical and the more functional choice, compared to a poorer-quality DD design. A potential instance of this is presented in Figure 7, where the memory lifetimes of the SD design are five times those of the DD design. The advantages of the SD design are especially notable in the higher entanglement generation rate regime.
In Figure 8, we explore the effect of manufacturing differences on the average gate fidelity further, focusing only on the analytical case ( = ∞). Here, we assume that $T_1^{(2)} = 2$ s and $T_2^{(2)} = 0.002$ s; (1) = 1667 Hz, = 150 and = 1000, with the usual requirement that $T_1 > T_2$ for each architecture. From the figure, we observe that the $T_2$ time plays a significant role in improving gate fidelity: note in particular that for lower values of $T_2^{(1)}$, the SD architecture does not outperform the DD architecture, even for very high values of $T_1^{(1)}$.

Effects of Processing Rates on Gate Fidelity. Processing rates determine the amount of time needed to complete requests; lower rates consequently result in longer waiting times. Our analytical evaluation assumed that = ∞ so that computational requests do not interfere with entanglement and moving requests. We use the computational job processing rate, , to capture the behavior of different applications, where smaller processing rates may correspond to computation requests performing a sequence of atomic gates or computation requiring "implicit" state transfers (see Appendix D.1). In both Figures 6 and 7, we observe that more time-consuming computations are more hospitable to the SD architecture's gate fidelity. This phenomenon arises from the non-preemptive priority of moving jobs, which interrupt computation for longer periods in the DD architecture. To see how the individual average gate fidelities evolve as a function of , see Appendix F.
Effects of Arrival Rates on Gate Fidelity. The computation, entanglement, and moving request arrival rates impact the number of jobs that are issued to the system and consequently the probability that jobs block one another when non-zero processing time is assumed. Since no jobs can be processed in parallel on the SD architecture, this leads to an overall increase in waiting time for all requests. In contrast, computation jobs in the DD architecture are only blocked by moving jobs, while entanglement jobs may be processed in parallel with computation jobs. When considering the results in Figures 6 and 7, one key observation is that both the entanglement request and moving request arrival rates impact the gate fidelity of computation jobs on the DD architecture. By increasing the arrival rate of entanglement requests, additional moving requests are submitted, which leads to a decrease in the gate fidelity of computation requests (as further evidenced by Figures 14 and 15 in Appendix F). In contrast, we observe an increase in the gate fidelity of the SD architecture here as the entanglement generation rate also increases, leading to a decrease in the amount of time that queued computation requests are blocked from processing.
Effects of Moving Entanglement. We now examine the effects of waiting time on the pre- and post-move entanglement fidelity in both architectures when they are realized with the NV center in diamond platform. Recall from Section 2.2 that the SD and DD architectures process moving requests in different ways. The SD architecture performs a sequence of local gates, whereas the DD architecture must execute a network operation to transfer entanglement from the networking to the computing device (see Appendix D.2 for further details). Since quantum gates in both architectures are imperfect, it is important to highlight the differences in entanglement fidelity once a moving request has been processed. The pre-move entanglement fidelity is computed immediately before the transfer request is processed, i.e., this measure incorporates the effects of waiting time both from needing to wait for the arrival of the transfer request and from the possible extra waiting time due to an in-progress computation (recall that transfer requests have non-preemptive priority over computational jobs). The post-move entanglement fidelity is computed immediately after the entanglement has been moved from the networking to the computing component. For the SD design, we are able to obtain the post-move fidelity in closed form (see Appendix E.2). We present it in Figure 9, as a function of the moving request rate and the memory lifetimes of the architecture. As a reference, basic QKD demands an entanglement fidelity of at least 0.81 [19]. Figure 9 shows that, as expected, the moving request arrival rate (i.e., the application's responsiveness to processing newly-generated entanglement) may be reduced if higher memory lifetimes are available. For the DD architecture, we obtain the entanglement fidelity via NetSquid. Figure 10a presents its pre- and post-move fidelities for varying $T_2^{(2)}$ times, a parameter to which the post-move fidelity is highly sensitive. 
Indeed, we observe that the moving request arrival rate is not nearly as impactful to the fidelity as $T_2^{(2)}$. Figure 10b presents a comparison of the two architectures' post-move fidelities; the SD design clearly outperforms the DD design for each of the $T_2$ values. Figure 11 shows the reduction between the pre-move and post-move entanglement fidelities for the SD architecture, in two entanglement generation regimes. It is immediately evident that in the high entanglement generation rate regime (e.g., one that may be representative of a distributed quantum cluster setting), the pre- and post-move fidelities fare far better than in the low entanglement generation rate regime (e.g., one that is more representative of a quantum network with distant nodes). A reason for this is that faster processing of entanglement requests in the SD design frees up processing time for computation. Consequently, there are fewer computation requests left in the queue, so that new entanglement requests can be processed more quickly as well.
Summary of findings. From our analysis and numerical observations, we find stark contrasts between the two architectures. On the one hand, when implemented with memories of identical quality, the DD design dominates in terms of gate fidelity. However, in a more practical scenario, wherein the DD design's more complex manufacturing would impair its memory lifetimes, the SD design can yield higher gate fidelities, and is more robust to longer computation times. Further, for present-day parameters, the SD design is more hospitable to the entanglement fidelity. The advantages of the SD design are especially evident in the high entanglement generation rate regime. We thus conclude that the DD design is more suitable for settings such as long-distance quantum communication, with lower entanglement generation rates and lighter computational demands. In contrast, the SD design is better suited for settings such as a distributed quantum computing cluster where high entanglement generation rates can be achieved and longer computations must be performed.

CONCLUSION
Quantum distributed applications impose quality constraints on the quantum states that they consume. When such applications are executed on architectures with physical limitations, such as imperfect gates or a limited amount of parallelism, resource contention can significantly impact performance, as some quantum states may be forced to wait in storage while others are being processed. In this work, we studied the effects of waiting times on the gate and entanglement fidelities for two distributed quantum architectures. We accomplished this by deriving formulas for average fidelity as a function of the waiting time distribution for a quantum state awaiting processing, as well as a noise model that governs quantum state evolution during storage. We obtained the waiting time distributions from the analysis of a Markov chain that models both of the quantum architectures in a regime where computation consumes a negligible amount of time; we later relaxed this assumption to study the effects of more time-consuming computation via simulation. We discovered that certain architecture implementations are more suitable for environments that are computation-heavy, while others are suitable for entanglement-heavy applications. Our average fidelity formulas are applicable in scenarios beyond those studied in this work, and may serve as a useful tool for performance evaluation experts.
Several extensions of our problem formulation are possible. First, we examined only two possible architecture implementations in this manuscript, but other realizations of distributed quantum architectures can be proposed. For instance, in the DD architecture, one could equip both of the devices with an interface to the outside world, so that both devices can perform local computation as well as remote entanglement generation. It is not entirely obvious what advantages one would gain in such a setup, and much like we have observed in the current work, the performance of such a system will depend on entanglement generation rates and the quality of the interface between the two devices (for transporting qubits from one to the other), as well as the application type (e.g., computation-heavy vs. entanglement-heavy). Also, in the current work we assumed that each architecture has a single link to the outside world; an extension of the problem would be to consider multiple links, each associated with a different entanglement generation rate. Second, even with the two architectures examined in the current work, we have not studied all possible use cases. For instance, in the SD design, one could take further advantage of processor idle time by allowing computation to occur during entanglement generation, while the device is awaiting a heralding signal from the link. This would require the state of the (not yet entangled) qubit to be moved to a storage qubit, and then back again to the communication qubit; thus, for a rigorous analysis, one would have to account for how these state transfer operations would reflect on the final entanglement fidelity. Finally, from a modeling perspective, one could relax several of our assumptions, e.g., that computational jobs consume zero time -something that we have so far only explored via simulation.

ACKNOWLEDGMENTS
This work was supported in part by the NWO ZK QSC Ada Lovelace Fellowship. The authors thank Filip Rozpędek, Guus Avis, Francisco Ferreira da Silva, and David Maier for useful discussions and careful reading of an earlier version of the manuscript.

A.1 Rate Matrix
Define the following useful variables: The infinitesimal generator of the CTMC in Figure 4 is given by , and the 0's are vectors or matrices of appropriate dimensions. The ergodicity condition for Markov chains whose generators have tridiagonal block structure are well-studied in literature, see, e.g., [36]; we derive it here for completeness. The QBD process driven by is ergodic if and only if where is the equilibrium distribution of the generator 0 + 1 + 2 and is a vector of all ones. To find , we use the relation which, along with the normalizing condition on , yield Thus, after defining + + and with the assumption that > 0 since all rates are positive, (31) becomes or, written another way, Intuitively, (32) indicates that for ergodicity, the average time between entanglement request arrivals must exceed the average processing times summed over all three stages (entanglement generation, waiting for a moving request to arrive, and performing the moving operation). Henceforth, assume that (32) is satisfied. Next, we obtain the rate matrix of the generator. Note that 0 is of rank 1; we may rewrite it as follows: By the results in [32], this means that the rate matrix may be computed explicitly; it is given by After some algebra, we obtain which matches the explicit rate matrix computation in [34] for the / /1 queue when = 3.
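The verbal form of the ergodicity condition (the mean time between entanglement request arrivals must exceed the summed mean durations of the three service stages) translates into a one-line check. The parameter names below are assumed labels for the entanglement request rate (`lam`), the entanglement generation rate (`mu`), the moving request arrival rate (`alpha`), and the moving execution rate (`delta`).

```python
def utilization(lam, mu, alpha, delta):
    """Arrival rate times the mean total service time over the three stages."""
    return lam * (1.0 / mu + 1.0 / alpha + 1.0 / delta)


def is_ergodic(lam, mu, alpha, delta):
    """True when the mean inter-arrival time exceeds the mean service time."""
    return utilization(lam, mu, alpha, delta) < 1.0
```

Because the DD architecture typically has a lower moving rate than the SD architecture, the same arrival rates yield a higher utilization for DD, so stability should be checked separately for each architecture.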

A.2 Probability that a Computational Job Must Wait
Our goal here is to compute ∞ =1 /1 and ∞ =1 /3 . To derive these quantities, we use the global balance principle to "cut" the chain three different ways (the first cut isolates the states /3, the second isolates the states 0 and /1, and the third isolates the states /2), obtaining the following balance equations (below, /2 is the stationary probability of state /2): We can rewrite (35) as follows: where 0 is the stationary probability of state 0, and using (37), (38) becomes To obtain 0 , we use the balance equations of the QBD process along with the normalizing condition: 0 1 00 01 where 1 ≡ 1/1 1/2 1/3 ; 00 , 01 , 10 , 1 , and 0 are blocks of the CTMC generator matrix defined in Appendix A.1, is the rate matrix of the QBD process derived in Appendix A.1, is a vector of all ones, and is an identity matrix of the same dimensions as .
From (42) and (43), we obtain where Δ ≡ ( − − − ). Note that Δ > 0 follows directly from the ergodicity of the chain. Using (41), (44) and the definition of Δ, we obtain Using (37), we have

B PROOF OF PROPOSITION 1
Proof. Recall that the SD architecture's average gate fidelity is given by (1) while for the DD architecture, (2) Next, note that for a function 0 ≤ ( ) ≤ 1 and , > 0 with ≥ , Since (2) ≥ , it follows from (49) that
Proof. (Proposition 2) For this proof, it is useful to keep in mind that 1 is always greater than 2 for any architecture. From this fact, and by Lemma 2, it follows that to satisfy (28), it suffices to ensure that which we obtained by substituting each (1) 1 on the left-hand side of (28) with (1) 2 and each (2) 2 with (2) 1 on the right-hand side of (28). Next, (59) is equivalent to 1 ( To ensure (60) holds, it is sufficient to have 2 Solving (61)
Here, we provide a brief overview of the Nitrogen-Vacancy (NV) center in diamond platform and its characteristics as relevant to our problem. For additional information, we refer the reader to, for example, [13]. The NV center in diamond platform is a few-qubit (at present, no more than 10 [6]) quantum processor capable of executing arbitrary quantum gates and measurements and has demonstrated entanglement establishment over a distance of 1.3 km [21]. This hardware has also been used to demonstrate key quantum network protocols required for long-distance networking such as entanglement swapping and distillation [27,43]. Qubits in the NV center in diamond platform may be divided into two types: communication qubits and storage qubits. Communication qubits are those that are equipped with optical interfaces, allowing them to establish entanglement with communication qubits in other quantum processors. Storage qubits, on the other hand, can only store quantum states and have quantum gates applied to them.
In the particular case of the NV center in diamond, the set of quantum gates that can be applied to storage qubits is limited to a time-dependent rotation about the $Z$ axis, while arbitrary quantum gates may be applied to the communication qubit.
Interactions between qubits in the NV center in diamond platform are mediated by an electronic spin (the NV) that acts as a communication qubit. The remaining qubits are $^{13}$C spins that act as storage qubits and are magnetically coupled to the electronic spin. This property of NV in diamond hardware restricts the parallelism of quantum gates on different qubits, as multiple quantum gates may not be applied to the NV at the same time, resulting in serial execution of quantum gates. Furthermore, most quantum gates may only be performed on quantum states held by the NV qubit, while $^{13}$C spins may only be initialized and undergo $Z$-rotations. This means that the application of a quantum gate to a state held by a storage qubit requires exchanging the quantum state of the NV with said storage qubit.
One implication of these physical restrictions on computation with the NV center in diamond is that for our model of the single-NV architecture (Section 3), computational jobs may require state transfers before they can be serviced. Specifically, such a situation may arise whenever the system is awaiting a state transfer request (to move the state of a newly-entangled qubit from the NV to a storage qubit), while one or more computation requests are in the queue. Recall that during such "idle" periods, according to our modeling framework, computation is allowed in the SD architecture. However, depending on the type of computation that is being requested (see discussion above), the processor may need to perform a state transfer prior to the computation (NV → storage qubit) to free up the NV qubit, perform the computation (recall that when = ∞, all computational jobs may be processed instantaneously), and finally, perform another state transfer (storage qubit → NV) to move the state of the entangled qubit back to the NV. One may wonder: why perform the latter move when the state of the entangled qubit must eventually be moved to storage? The reason is that future computation requests may require the use of the entangled qubit in the NV. On the other hand, it may be possible that all computation requests involve the entangled qubit in the NV, in which case the extra transfers are not required. Since we have no knowledge of the computational request requirements a priori, we simply assume within our model that "implicit" state transfers are performed when necessary, and that they also (akin to computation requests) consume a negligible amount of time. In a sense, our simulations in Section 7 relax this assumption by varying computation request processing rates.
Applications that infrequently shuffle quantum states between qubits correspond to the case where is very large while applications that move quantum states more frequently correspond to the case when is smaller and comparable to .
In addition to the limitations on quantum gates, the NV qubit is the sole communication qubit that may be equipped with an optical interface for establishing entanglement with qubits in other quantum processors. This further restricts parallelism of operations as quantum gates may not be applied to any qubits when the processor is being used to establish entanglement. Studies on experimental realizations of such hardware have shown that the fidelity and the rate at which entangled states may be established between such processors decrease as the distance over which entanglement must be established increases [13,46]. Combined with existing challenges in emitted photon collection [47], this means that establishing entanglement will dominate the execution time of applications and incur additional latency for performing local quantum gates between qubits. Furthermore, the process of establishing entanglement introduces noise on quantum states that are stored in the remaining qubits of the system [26], which can severely reduce the fidelity of stored states.

D.2 Transferring Quantum States
D.2.1 Single-Device NV. Our NetSquid implementation simulates the transfer of a quantum state from the NV electron spin qubit into a carbon storage qubit through the sequence of gates shown in the circuit of Figure 12 [27]. Here, the electron spin is in some quantum state |Ψ⟩ and the carbon storage is initialized to the state |0⟩. Gates are applied to each qubit in order from left to right, and lines from the NV to a gate on the carbon denote a controlled quantum gate. For reference, the quantum gates used in our simulations are defined as cos (  2 ) , where ⊗ denotes the tensor (Kronecker) product of two matrices. In NetSquid, gates in the NV center in diamond platform are modeled as the application of the perfect gate after applying time-independent depolarizing noise D that is parameterized by an associated depolarizing probability depending on the gate G being applied. For the gate sequence in Figure 12, the depolarizing parameters for each gate are summarized in Table 2.
Table 2. Depolarizing parameters for gates in the NV hardware. We remark that no two NV devices are exactly identical. Individual values have not been realized simultaneously for producing entanglement, which would allow a direct comparison to simulation. We thus focus on simulation parameters that enable a comparison to entanglement generation hardware and provide references to motivations for our chosen values.

| Gate | Qubits Depolarized | NV | Carbon |
|---|---|---|---|
| Electron Initialization [44] | Electron | 0.02 | - |
| Electron ( ) [27] | - | - | - |
| Electron ( ) [27] | - | - | - |
| Carbon Initialization [6] | Carbon | - | 0.006/4 |
| Carbon ( ) [53] | Carbon | - | 0.001/3 |
| Electron-Carbon (± ) [27] | Electron and Carbon | 0.005 | 0.005 |
| Electron-Carbon (± ) [27] | Electron and Carbon | 0.005 | 0.005 |

As an example, consider applying the carbon rotation by $\pi/2$ to some state $\rho = |\psi\rangle\langle\psi|$ in a carbon qubit. The new state is $R\,\mathcal{D}(\rho)\,R^{\dagger}$, where $R$ denotes the rotation and $\mathcal{D}$ is the depolarizing noise with depolarizing probability $p = 0.001/3$.
D.2.2 Double-Device NV. Our NetSquid implementation for simulating quantum state transfer from the NV electron spin of the networking device to the NV electron spin of the computing device expands upon the gate sequence in Figure 12 and makes use of quantum teleportation [4]. Once entanglement that was originally held by the networking device electron spin has been moved to the networking device carbon storage (using the gate sequence from Figure 12), the gate sequence in Figure 13 is performed to transfer the state to the computing device electron spin [43].
Here, the "Entangle" box represents establishing entanglement between the electron spins of the networking device and the computing device, the |0⟩ box represents initializing the networking device electron spin to the |0⟩ state, and boxes containing arcs with an arrow represent measuring the qubit in the standard ($Z$) basis. The "Entangle" operation instantly establishes a perfect entangled state between the electron spins of the computing device and the networking device, thus giving an optimistic evaluation of the state transfer process in the DD architecture. Connections between measurement boxes and a gate on the computing electron spin represent conditional execution of the gate based on the measurement result. In the first case, measuring |0⟩ on the networking electron spin means we perform a ( ) gate on the computing NV, while measuring |1⟩ on the networking NV means we perform a (− ) gate. Similarly, in the second case we perform a ( ) gate on the computing electron spin if we measure |0⟩ on the networking NV and a ( ) gate if we measure |1⟩ on the networking electron spin.
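The measurement-conditioned corrections described above follow the general pattern of quantum teleportation. The sketch below is a textbook version, not the gate sequence of Figure 13: it assumes CNOT, Hadamard, X and Z gates instead of the NV's native rotations, ignores all noise, and models the "Entangle" step as a perfect Bell pair. The function name `teleport` and the qubit ordering are illustrative choices.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = (X + Z) / np.sqrt(2)
P0 = np.array([[1, 0], [0, 0]], dtype=complex)   # projector onto |0>
P1 = np.array([[0, 0], [0, 1]], dtype=complex)   # projector onto |1>

def kron(*ops):
    """Tensor product of several operators, leftmost factor first."""
    out = np.eye(1, dtype=complex)
    for op in ops:
        out = np.kron(out, op)
    return out

def teleport(psi, m0, m1):
    """Teleport psi from qubit 0 to qubit 2 for a fixed outcome (m0, m1).

    Qubits 1 and 2 start in a perfect Bell pair (the assumed 'Entangle' step).
    Returns (outcome probability, corrected state of qubit 2)."""
    bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
    state = np.kron(psi, bell)
    cnot01 = kron(P0, I2, I2) + kron(P1, X, I2)   # control qubit 0, target qubit 1
    state = kron(H, I2, I2) @ (cnot01 @ state)
    # Project qubits 0 and 1 onto the chosen measurement outcome.
    projector = kron(P1 if m0 else P0, P1 if m1 else P0, I2)
    state = projector @ state
    prob = np.vdot(state, state).real
    state = state / np.sqrt(prob)
    received = state.reshape(2, 2, 2)[m0, m1, :]
    # Conditional correction: X if m1 = 1, then Z if m0 = 1.
    correction = np.linalg.matrix_power(Z, m0) @ np.linalg.matrix_power(X, m1)
    return prob, correction @ received
```

For any normalized input, each of the four outcomes occurs with probability 1/4 and the corrected output matches the input state, which is the noiseless baseline against which gate noise in the transfer would be measured.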

E.1 Alternative Average Gate Fidelity Derivation for Noise Channels
For completeness, we present an alternative, self-contained derivation of the average gate fidelity that depends only on a number of well-known tricks in quantum information. Our proof is based on the Choi-Jamiolkowski theorem establishing a duality between quantum states and quantum channels. First, we will make use of the notion of the Choi state of a quantum channel. Here, it will be sufficient to note that for channels N : S → S where S denotes the state of a quantum system , the Choi state for N is defined (see e.g., [60]) as where is the maximally entangled state on and an identical quantum system ′ . We will also make use of the partial transpose operation Γ. For any operator ′ ∈ C × ⊗ C × the partial transpose is defined as which corresponds to taking the transpose on ′ , but not on . We first establish the following lemma.
where is the dimension of the quantum system , Π sym is the projector onto the symmetric subspace of C × ⊗ C × , sym is the dimension of said symmetric subspace, and N is the Choi state of N .
Proof. Using the Choi-Jamiolkowski isomorphism (see e.g., [60]), we can rewrite Using the definition for the average gate fidelity (12) and (68), we can then write where we have used the fact that □
Given the lemma above, we can now readily evaluate the gate fidelity for any channel of interest in two steps. First, we need an expression for the projector $\Pi_{\mathrm{sym}}$ onto the symmetric subspace. It is well known (see e.g., [45]) that a full set of so-called mutually unbiased bases in dimension $d = 2^n$ forms a 2-design, i.e.,
$$\int d\Psi \, \left(|\Psi\rangle\langle\Psi|\right)^{\otimes 2} = \frac{1}{d(d+1)} \sum_{b} \sum_{k} \left(|b_k\rangle\langle b_k|\right)^{\otimes 2},$$
where the sum extends over bases indexed by $b$ and $|b_k\rangle$ denotes the $k$-th basis state of basis $b$. For $d = 2$, i.e., a single qubit, these bases are simply the eigenbases of the operators $X$, $Y$, and $Z$ defined in Section 2.2. Using (72), this allows us to write where $d_{\mathrm{sym}} = 3$. In such small dimensions, it is also easy to write $\Pi_{\mathrm{sym}} = |\Phi_{00}\rangle\langle\Phi_{00}| + |\Phi_{01}\rangle\langle\Phi_{01}| + |\Phi_{10}\rangle\langle\Phi_{10}|$, where |Φ ⟩ = I ⊗ |Φ⟩ denotes the first three Bell states (excluding the singlet). Second, we need to compute the Choi states of the noise channels defined in Section 2.2.2, which can readily be achieved by using (62).
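As a numerical sanity check of the 2-design argument, the sketch below averages the state fidelity over the six eigenstates of $X$, $Y$, and $Z$ and compares the result with the standard relation $F_{\mathrm{avg}} = (d F_e + 1)/(d + 1)$, where $F_e$ is the overlap of the Choi state with the maximally entangled state. This relation is used here in place of the lemma's expression, which is not reproduced. The channel conventions (depolarizing as $\rho \mapsto (1-p)\rho + p\,I/2$, dephasing as $\rho \mapsto (1-p)\rho + p\,Z\rho Z$) are assumptions and may differ from those of Section 2.2.2; all function names are illustrative.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def depolarizing(p):
    """Assumed convention: rho -> (1 - p) rho + p tr(rho) I/2."""
    return lambda rho: (1 - p) * rho + p * np.trace(rho) * np.eye(2) / 2

def dephasing(p):
    """Assumed convention: rho -> (1 - p) rho + p Z rho Z."""
    return lambda rho: (1 - p) * rho + p * (Z @ rho @ Z)

def choi_state(channel, d=2):
    """Choi state: the channel applied to one half of a maximally entangled pair."""
    tau = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            E = np.zeros((d, d), dtype=complex)
            E[i, j] = 1.0
            tau += np.kron(channel(E), E) / d
    return tau

def entanglement_fidelity(channel):
    """Overlap of the Choi state with the maximally entangled state."""
    phi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
    return np.vdot(phi, choi_state(channel) @ phi).real

def mub_average_fidelity(channel):
    """Average of <psi| N(|psi><psi|) |psi> over the six eigenstates of X, Y, Z."""
    total = 0.0
    for pauli in (X, Y, Z):
        _, vecs = np.linalg.eigh(pauli)
        for k in range(2):
            psi = vecs[:, k]
            rho = np.outer(psi, psi.conj())
            total += np.vdot(psi, channel(rho) @ psi).real
    return total / 6
```

Under these assumed conventions, the six-state average gives $1 - p/2$ for depolarizing noise and $1 - 2p/3$ for dephasing noise, and both agree with $(2F_e + 1)/3$.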

E.2 Average Post-Move Entanglement Fidelity in Single-NV Architecture
Using Lemma 4 we may also obtain analytic expressions for the entanglement fidelity of the state that was moved into memory.
We may analytically compute the gate fidelity for the move-to-memory gate sequence in Figure 12 as where , , and are the depolarizing probabilities for Carbon Initialization, Carbon rotations, Electron rotations, and Electron-Carbon gates, respectively. We may now obtain the average post-move gate fidelity by integrating over the waiting time distribution of move requests. From (83) we see that we have an upper bound of ≈ 0.956 on the average post-move fidelity.
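The step of integrating a fidelity over a waiting time distribution can be illustrated with a small sketch. It does not reproduce (83): it assumes a hypothetical decay model in which the fidelity relaxes exponentially from a zero-wait value `f0` toward 1/2 with memory lifetime `T`, and an exponentially distributed waiting time with rate `mu`. The value 0.956 quoted above is used only as an illustrative `f0`; all function and parameter names are assumptions.

```python
import math
import random

def fidelity_after_wait(t, f0, T):
    """Assumed decay model: fidelity relaxes from f0 toward 1/2 with lifetime T."""
    return 0.5 + (f0 - 0.5) * math.exp(-t / T)

def average_fidelity_closed_form(f0, T, mu):
    """E[F(W)] for W ~ Exp(mu), using E[exp(-W/T)] = mu / (mu + 1/T)."""
    return 0.5 + (f0 - 0.5) * mu / (mu + 1.0 / T)

def average_fidelity_monte_carlo(f0, T, mu, samples=200_000, seed=1):
    """Sample waiting times and average the decayed fidelity."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += fidelity_after_wait(rng.expovariate(mu), f0, T)
    return total / samples
```

In this toy model the average can never exceed `f0`, and it approaches `f0` as the waiting rate `mu` grows relative to `1/T`, which is the sense in which a zero-wait fidelity acts as an upper bound.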

F SINGLE-DEVICE AND DOUBLE-DEVICE AVERAGE GATE FIDELITY
In Figures 6 and 7, we presented differences between the SD and DD architecture average gate fidelities. In Figures 14 and 15, we observe the individual average gate fidelity of each architecture design for various entanglement request and generation scenarios, as the computational job processing rate varies. Figure 14 corresponds to the same memory quality regime as Figure 6: namely, one in which the memory lifetimes are equal for the two architectures. Figure 15 corresponds to the same memory quality regime as Figure 7, where the memory lifetimes for the DD architecture are five times shorter than those of the SD architecture.
In both figures, we observe that as the entanglement generation rate increases, the average gate fidelity of the SD architecture increases, while the fidelity decreases for the DD architecture (in these particular examples, this holds even as the entanglement request rate scales up with the generation rate, although it may not hold in general for higher request rates). This is an expected result: recall that in the SD architecture, computational jobs wait both for entanglement generation as well as for moving requests, while in the DD architecture they only wait for moving requests. In addition, moving jobs have non-preemptive priority over computational jobs when < ∞. Thus, for a fixed moving request rate, faster entanglement generation only aids computational jobs in the SD design. In contrast, in the DD design, higher entanglement generation rates lead to more moving requests, thus increasing the likelihood of computation being interrupted by these requests.