Certification of a functionality in a quantum network stage

We consider testing the ability of quantum network nodes to execute multi-round quantum protocols. Specifically, we examine protocols in which the nodes are capable of performing quantum gates, storing qubits and exchanging said qubits over the network a certain number of times. We propose a simple ping-pong test, which provides a certificate for the capability of the nodes to run certain multi-round protocols. We first show that in the noise-free regime the only way the nodes can pass the test is if they do indeed possess the desired capabilities. We then proceed to consider the case where operations are noisy, and provide an initial analysis showing how our test can be used to estimate parameters that allow us to draw conclusions about the actual performance of such protocols on the tested nodes. Finally, we investigate the tightness of this analysis using example cases in a numerical simulation.


Introduction
network can realize the functionality, even those that are farthest apart. Therefore, the time τ can be thought of as the maximum time it takes any two nodes to communicate.
To certify that a quantum network achieves the functionality defined by this stage of development, we consider a set of protocols that pass a qubit state |ψ⟩ between the nodes A and B a number k of times, apply gates, and measure at the end. We choose the testing nodes A and B to be farthest apart in the network.
Many existing tests can be used to estimate whether the operations above can each be performed individually with high accuracy. Examples include quantum state [4] and process tomography [5], gate set tomography [6,7], randomized benchmarking [8-10] and capacity estimation to verify the quality of qubit transmission [11]. The concept of self-testing even allows such estimates to be made with only partial trust in the devices (the so-called device-independent setting) [12,13]. Having estimated the quality of each individual operation with metrics such as the diamond distance, it is straightforward to derive an overall estimate of how well protocols in this stage may be executed [3]. Yet running many individual tests is rather inefficient, and one may wonder whether there exists an integrated test that instills confidence that we are capable of performing protocols up to a certain number of rounds using the quantum memory network.
Another approach to testing quantum devices comes from the literature on (interactive) proof systems, where a verifier interacts with one or more provers who are trying to convince the verifier that a certain assertion is true, or indeed that they possess certain capabilities. A well-known example of such work is the question of whether a classical polynomially bounded verifier can convince herself that (two non-communicating) provers holding a quantum computer do indeed have full quantum computing capabilities [14]. Restricting to a single prover, there also exists a verification protocol under complexity-theoretic assumptions [15]. This line of research is not concerned with the quality of specific operations, but rather aims to obtain a certificate of the provers' general ability to solve certain tasks. Such tests are appealing as they measure general aptitude (for example, in the domain of quantum computation, the ability to execute quantum algorithms) but do not typically make specific statements such as the actual number of physical qubits involved. Consequently, such tests usually require a large amount of resources to be executed.

Results
Here we take a first step towards finding effective tests to certify that a network has reached the quantum memory stage of development (see definition 1). We propose a test which can be interpreted from two different angles. First, we interpret it as a prover-verifier type protocol inspired by interactive proof literature, to certify that the network has certain capabilities. Second, we interpret it as a tomography-type protocol where we estimate certain properties of operations.
• Ping-pong test. We formulate our test in a bipartite scenario where nodes A and B exchange quantum registers according to a defined set of rules. We call our test the ping-pong game as it is executed by passing qubits back and forth between two nodes. Additionally, the nodes apply gates specified by a gate set G. An important parameter of our test is the number of times k that the nodes pass (ping-pong) the state around.
• Prover-verifier view. Our protocol can be viewed as a simple game that the provers (the nodes) play against the verifier with the objective of convincing the verifier that they are capable of executing any protocol in the quantum memory stage, which has a specific form. In particular, we show that the provers win the k-round ping-pong game with probability one if and only if they are capable of executing perfectly any protocol of the following form: for any possible starting state |ψ⟩, each node is capable of executing one possible gate G ∈ G, before sending the resulting state to the other node. The nodes continue in this form for k rounds, before measuring at the end. Moreover, in the case when the winning probability is strictly less than one, we certify that the nodes sent information about the state at least a certain number m < k of times.
• Estimation view. In the estimation view we take a different perspective, with the objective of estimating the quality of the operations performed by the nodes, as opposed to certifying their capabilities. We use the statistics of the ping-pong test to assess a measure of the overall quality of the network. We then compare this to the quality one would expect from combining the estimates of the individual devices used in the network. What is more, we estimate the performance of k-round protocols based on our ping-pong test.
In order to evaluate the accuracy of our analytical results, we compare our analytical estimate with numerical estimates for a specific example of a k-round protocol influenced by noise.
This paper is organized as follows. In section 3 we define the k-round protocols and introduce our test. Then, inspired by the interactive proof literature, in section 4 we view our test in the prover-verifier setting. In section 5 we view our test in the context of estimation.

k-round protocols
We start by formally describing k-round protocols. A bipartite k-round protocol between any two nodes A and B consists of the following consecutive operations:
(a) Local preparation PREP of a perfect qubit state |ψ⟩ by node A.
(b) Deterministically sending the local qubit from node A to node B and vice versa, using a quantum channel E_{A→B}. Note that the time t_send it takes to send a qubit (or a classical bit) from node A to B is upper-bounded by the distance between them and the transmission speed of the qubit carrier. For example, for optical qubits the transmission speed can be understood as the speed of photons in a fiber [3].
(c) Storing the local qubit by nodes A or B, denoted by M_A and M_B respectively. Storage of the qubit takes time t_M.
(d) Applying an arbitrary local operation by a node on the local qubit. We describe this operation by a gate G_A ∈ G and G_B ∈ G, where G is an arbitrary set of gates, for example the single-qubit Clifford gates. Executing a circuit of depth z takes time z .
(e) Perfect local measurement of the local qubit at the end of the protocol. The measurements are specified by operators Π_A and Π_B for nodes A and B respectively.
Steps (b)-(d) are performed in rounds j = 1, . . . , k, a total of k times. We call k the depth of the protocol. Each round takes time Δt = t_send + t_M + z , so that t_{j+1} − t_j = Δt for all j. Without loss of generality we assume here that the protocols always start at node A. Note that the parity of j indicates at which node the single qubit is located, i.e., for odd j the qubit is held (sent) by A and for even j by B. Therefore, we denote the local operations performed by A or B in the jth round simply by M_j, G_j. In particular, in this notation E_j means that a qubit is sent by A and received by B for odd j (E_j ≡ E_{A_j→B_j}), and vice versa for even j.

Figure 1. A schematic illustrating a single execution of depth κ = 2 of the ping-pong test, test 1.
Definition 1 (k-round protocols). We define a k-round protocol as a map of the form Π ∘ P_k ∘ PREP, where:
• PREP is a preparation of a local qubit |ψ⟩ (step (a)),
• P_k is a map describing k rounds of local operations (memories M_j and gates G_j) as well as sending a qubit between A and B (steps (b)-(d)),
• Π is a local measurement of all the local qubits (step (e)). Note that depending on the parity of k the measurement is performed either on A's or B's side.

Test
In this section we describe our ping-pong test. The test is a simple instance of a k-round protocol as in definition 1. As we will see in the next sections, passing the test will allow us to draw conclusions about the whole class of k-round protocols. Since our test will later be viewed from two different angles, we introduce a node V which interacts with the nodes A and B. In the prover-verifier view, section 4, the node V acts as a verifier. In the estimation view, section 5, the nodes A and B can themselves take up the role of V. We choose the testing nodes A and B to be farthest apart in the network. For those nodes the test is hardest to pass, since they must account for the longest communication delays.

General ping-pong test
In our test, the task of the nodes is to send ('ping-pong') an unknown state an unknown number of times and, at every ping-pong round, to apply a quantum operation given by V, see figure 1. Additionally, at every round V gives the nodes a challenge bit f, instructing them either to output the quantum state or to continue the ping-pong. At the end of each execution of the test, i = 1, . . . , n, the nodes output a state. V measures this output and produces a single classical bit v_i: 1 means 'accept' and 0 means 'reject', see test 1. As stated before, we assume that the nodes' operations are independent and identical across executions i of the test. This implies that the v_i are independent and identically distributed (IID) random variables. We define the winning rate in such a game as the ratio of wins to the total number of executions:

$R = \frac{1}{n} \sum_{i=1}^{n} v_i.$  (2)

Test 1. General ping-pong test (k, G, X).
Fix maximum depth k, gate set G and set of states X. Fix a total number of executions n.
 1: for i = 1, . . . , n do
 2:   V chooses depth κ uniformly at random and constructs a challenge string f_κ = 1 · · · 110 of length κ
 3:   V samples independently κ gates from the set G and creates a sequence g_κ = G_1 · · · G_κ
 4:   V samples a state |ψ⟩ ∈ X and distributes it to A   [t_1 = 0]
 5:   for j = 1, . . . , κ do
 6:     if j odd then
 7:       A sends |ψ⟩ to B using E_j   [t_j]
 8:       B stores the received state in memory M_j   [t_j + t_send]
 9:       V gives a classical description of G_j to B   [t_j + t_send + t_M]
10:       B applies G_j to the state in the memory
11:       V distributes a challenge bit f_j ∈ {0, 1} to B according to the string f_κ   [t_j + t_send + t_M + ]
12:       if f_j = 1 then

The ping-pong test of depth κ for a sequence of chosen gates g_κ = G_1, · · · , G_κ can be associated with an operator composed of the channels E_j, memories M_j and gates G_j for j = 1, . . . , κ. In a single execution of test 1, the test can succeed with a certain probability. For all executions i, we denote this probability, conditioned on a specific input state |ψ⟩, a fixed depth κ and a fixed sequence of gates g_κ, by p_{|ψ⟩, g_κ, κ}, and the corresponding probability of failure by 1 − p_{|ψ⟩, g_κ, κ}. Note that p_{|ψ⟩, g_κ, κ} does not depend on the execution i, since we assume that executions are IID. The probability is taken with respect to the measurement performed by V at the end of each execution i. We fix the figure of merit to be the average probability P that the nodes succeed (v_i = 1) in the test.

Definition 2 (Average probability of success for test 1). We denote by P the probability of success in the general ping-pong test, test 1, averaged over depths κ, strings of gates g_κ of length κ, and states |ψ⟩ ∈ X. By the IID assumption, this average does not depend on the execution i. Here k is the maximum depth of the test, X is the chosen set of states and G is the chosen set of gates.

Teleportation-based ping-pong test
In the case when X is the set of all single-qubit states, the average probability of success gives us an estimate of the average fidelity of the test, see section 5. This would require sampling from X according to the Haar measure in the test. However, the same can be achieved more efficiently by sampling from the finite set of the six Pauli states X. The reason is that X forms a 2-design, meaning that discrete uniform averaging of polynomials of degree 2 over the states in X reproduces the Haar average over the full state space. A similar observation holds for Haar sampling from a set of gates G in the case when G is the full unitary group. Then, it is enough to sample from the Clifford group of single-qubit gates Cliff to reproduce the average probability of success. Note that this allows us to estimate the average fidelity of the test even in the case when one is not able to implement the full unitary group. Lastly, we remark that any set of states and unitary gates with 2-design properties can be used in place of the Pauli states and Clifford gates. For more details on the 2-design properties of the above sets see appendix D.

Therefore, we consider a more efficient version of the ping-pong test, test 2. Motivated by the above and by the fact that in a quantum network the quantum channels between the nodes are realized by quantum teleportation, we choose:
(a) The set of states is the set of six Pauli eigenstates, |ψ⟩ ∈ X, with a uniform probability distribution 1/|X| = 1/6;
(b) The set of gates is the Clifford set for a single qubit, C_j ∈ Cliff, with a uniform probability distribution 1/|Cliff|;
(c) Sending a qubit from node A to B is done with perfect deterministic teleportation.

Test 2. Teleportation-based ping-pong test (k, Cliff, X).
Fix maximum depth k, fix the gate set to the Clifford set Cliff and the set of states to the set of six Pauli states X. Fix the total number of executions n.
 1: for i = 1, . . . , n do
 2:   V chooses depth κ and constructs a challenge string f_κ = 1 · · · 110 of length κ
 3:   V samples independently and uniformly at random κ gates from the set Cliff and creates a sequence c_κ = C_1 · · · C_κ
 4:   V samples independently and uniformly at random a state |ψ⟩ ∈ X and distributes it to A   [t_1 = 0]
 5:   for j = 1, . . . , κ do
 6:     if j odd then
 7:       A sends |ψ⟩ to B using deterministic teleportation   [t_j]
 8:       B stores half of his teleportation EPR pair in memory M^T_j for time τ
 9:       V gives a classical description of C_j to B   [t_j + τ]
10:       B applies C_j to the state in the memory
11:       V distributes a challenge bit f_j ∈ {0, 1} to B according to the string f_κ   [t_j + τ + ]
12:       if f_j = 1 then
21:       B sends |ψ⟩ to A using deterministic teleportation   [t_j]
22:       A stores half of her teleportation EPR pair in memory M^T_j for time τ
23:       V gives a classical description of C_j to A   [t_j + τ]
24:       A applies C_j to the state in the memory
25:       V distributes a challenge bit f_j ∈ {0, 1} to A according to the string f_κ   [t_j + τ + ]
26:       if f_j = 1 then
27:         continue
28:       else
29:         A outputs her state
31:         V decides on the value v_i ('0' reject, '1' accept)
32:         break
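The 2-design property of the six Pauli eigenstates can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names are hypothetical and not taken from the paper. It compares the uniform average of |ψ⟩⟨ψ| ⊗ |ψ⟩⟨ψ| over the six states with the known Haar average for a qubit, (I + SWAP)/6, which is the normalized projector onto the symmetric subspace.

```python
import numpy as np

def pauli_eigenstates():
    """The six single-qubit Pauli eigenstates as state vectors (illustrative helper)."""
    s = 1 / np.sqrt(2)
    return [
        np.array([1, 0], dtype=complex),        # |0>
        np.array([0, 1], dtype=complex),        # |1>
        np.array([s, s], dtype=complex),        # |+>
        np.array([s, -s], dtype=complex),       # |->
        np.array([s, 1j * s], dtype=complex),   # |+i>
        np.array([s, -1j * s], dtype=complex),  # |-i>
    ]

def second_moment(kets):
    """Uniform average of rho (x) rho over a finite set of pure states."""
    total = np.zeros((4, 4), dtype=complex)
    for ket in kets:
        rho = np.outer(ket, ket.conj())
        total += np.kron(rho, rho)
    return total / len(kets)

def haar_second_moment():
    """Haar average of rho (x) rho for a qubit: (I + SWAP) / 6."""
    swap = np.array([[1, 0, 0, 0],
                     [0, 0, 1, 0],
                     [0, 1, 0, 0],
                     [0, 0, 0, 1]], dtype=complex)
    return (np.eye(4) + swap) / 6

if __name__ == "__main__":
    six = pauli_eigenstates()
    print(np.allclose(second_moment(six), haar_second_moment()))      # six states
    print(np.allclose(second_moment(six[:2]), haar_second_moment()))  # Z eigenstates only
```

In this sketch the full set of six states matches the Haar second moment, while restricting to the two Z eigenstates does not, which illustrates why all three Pauli bases are sampled.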
We describe the teleportation-based ping-pong test with a triple (k, Cliff, X). Note that in this case the quantum channel at round j, E_j, is equivalent to applying a quantum memory M^T_j to one half of the EPR pair held by one of the provers. We set τ = t_M + t_send, which is the time required to generate one maximally entangled state and send a classical message from node A to B. Hence, a teleportation-based ping-pong test of depth κ for a sequence of chosen Clifford gates c_κ = C_1, · · · , C_κ can be associated with an operator T_κ, equation (6). For a detailed mathematical description of the test, we refer the reader to appendix B. By using definition 2 with the set of Pauli states X and the set of Clifford gates Cliff, the average probability of success for the teleportation-based ping-pong test, test 2, is

$P = \frac{1}{k} \sum_{\kappa=1}^{k} \frac{1}{|\mathrm{Cliff}|^{\kappa}} \sum_{c_\kappa} \frac{1}{|X|} \sum_{|\psi\rangle \in X} p_{|\psi\rangle, c_\kappa, \kappa}.$

Note that in test 2 the sampling of depths, gates and states is done uniformly at random. Using the definition of the expected value and the IID assumption, we can relate the expected value of the winning rate to P.

Lemma 1.
The expected value of the winning rate R in test 2, equation (2), is equal to the average probability of success P, $E[R] = P$.

Corollary 1 (Finite statistics). The probability that the winning rate R differs from the average probability of success P by more than ε is exponentially small in $n\epsilon^2$,

$\Pr[|R - P| \geq \epsilon] \leq 2e^{-2n\epsilon^2}.$

Furthermore, let us set $\delta = 2e^{-2n\epsilon^2}$. If one fixes the confidence δ and the accuracy ε, then the minimum number of executions n necessary to attain these parameters is given by $n \geq \frac{\ln(2\delta^{-1})}{2\epsilon^2}$.
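The sample-size statement of corollary 1 can be evaluated directly. The sketch below is an illustration only, with hypothetical function names; it assumes the two-sided Hoeffding bound δ = 2 exp(−2nε²) stated in the corollary.

```python
import math

def hoeffding_tail(n: int, epsilon: float) -> float:
    """Upper bound 2*exp(-2*n*eps^2) on Pr[|R - P| >= eps] after n executions."""
    return 2.0 * math.exp(-2.0 * n * epsilon ** 2)

def min_executions(delta: float, epsilon: float) -> int:
    """Smallest integer n with n >= ln(2/delta) / (2*eps^2)."""
    if not (0.0 < delta < 1.0) or epsilon <= 0.0:
        raise ValueError("need 0 < delta < 1 and epsilon > 0")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

if __name__ == "__main__":
    # Example: accuracy 0.01 with failure probability at most 0.05.
    n = min_executions(delta=0.05, epsilon=0.01)
    print(n, hoeffding_tail(n, 0.01))
```

Because n scales as 1/ε², halving the accuracy parameter roughly quadruples the number of executions required.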

Prover-verifier view
In this section we interpret our test, test 2, in the prover-verifier view. Specifically, we view our test as an interactive game played between a verifier V (a trusted third party) and two provers (the nodes A and B) [16]. An interactive game is a situation where provers exchange a fixed-sized quantum register with the verifier n times. The verifier is honest and wants to verify a certain statement, operating according to a defined set of rules. However, potentially dishonest provers optimize towards a strategy that causes the verifier to output 1 (accept). We further assume a standard scenario, where the provers agree on their strategy prior to the beginning of the test and do not communicate to readjust it during the execution, see definition 4. In contrast to the interactive proof literature, in our framework we consider finitely many test executions and can therefore also make non-asymptotic statistical statements. In this view, performing test 2 allows us to certify that the provers have the capability to perform k-round protocols. Indeed, if the provers follow the test then they can convince the verifier that they do so and achieve a high average probability of success. On the other hand, if the provers do not follow the test, they cannot achieve a high probability of success and the verifier detects this behavior with high probability. Formally, we require that the test satisfies:
• Completeness: if the provers are able to execute protocols that are certified by the test, then they succeed in a game against the verifier, i.e. achieve a winning rate above a certain winning threshold t, R > t, see equation (2);
• Soundness: if the provers are not able to execute protocols certified by the test, then they can only achieve a winning rate R ≤ t.

Sending channel
Let us now introduce a framework that formalizes what we mean by a round of a quantum communication.
Whereas numerous schemes to describe local operations exist [4-10], it is not clear how to certify a round of quantum communication. To achieve this, we will assume that the provers are not honest, and might therefore employ an arbitrary strategy leading to a high probability of success. In particular, they might even try not to use a communication channel at all in some rounds of the protocol. As a consequence, we have to specify what we mean by a round of communication.
For sending classical bits one typically considers the following scenario: A chooses a random bit b_{A_0} ∈_R {0, 1} at time t_0 and wishes to send it to B. We then say that the nodes used a classical channel E_cl : A_0 → B_1 if at time t_1 the bit is available on node B with probability 1. In analogy, we could say that quantum communication through a quantum channel E : A_0 → B_1 occurred if at time t_0 a quantum state |ψ⟩_{A_0} was input on node A and at time t_1 it appeared on node B with probability 1. Note that in the classical case, we can prove that the channel was used to send information about the bit only for one round, by giving a uniformly random bit to A and asking B to guess it. Indeed, if B guesses it with probability higher than 1/2, then some information must have traveled from A to B. Given a single bit as an input, one cannot generalize that to many rounds with a 'ping-pong' type of protocol like test 2. This is due to the fact that before A sends information to B in the first round, she can keep a copy of the bit. However, this issue can be avoided in a quantum setting due to the no-cloning theorem [17]. Indeed, if A gets a random unknown state and B is able to output the exact same state (with probability 1), then not only did all the (quantum) information about the state travel from A to B, but also A could not have kept any information about the state to herself (see theorem 2).
While the above definition provides a good intuition of what is going on, it becomes impractical when states do not have a unit probability of being transmitted through a channel (which in relation to our test means t < 1). In such a scenario, classically, we can say that the nodes used a classical channel E_cl : A_0 → B_1 if the probability of correctly identifying A's input bit on B's side increased from time t_0 to time t_1. This implies that some information about the bit must have been transferred from A to B, see figure 2. Our definition of quantum communication is, therefore, a generalization of the above to the quantum case. We say that quantum communication E : A_0 → B_1 occurred if the probability of correctly outputting A's input quantum state on B's side increased in time, see definition 3.
In words, we say that a sending channel was used by the nodes if the fidelity averaged over all states, and optimized over all operations Γ that the nodes can locally do, increased from instant t_0 to t_1. Note that the above definition implies that any communication, quantum or classical, which increases the fidelity of the state is considered a sending channel. As an example consider the following strategy. Node A receives an unknown state from the verifier, measures it in the standard basis and sends the measurement outcome to B. Without loss of generality, let this measurement outcome be 0. Before receiving A's measurement outcome, B has average probability 1/2 of correctly passing the verifier's test. However, after receiving A's measurement outcome, B can locally prepare the state |0⟩, which increases the average probability of correctly identifying the verifier's state to 2/3. Therefore, there exists a purely classical strategy which satisfies our definition. As a consequence, we say that whenever the nodes do not use a sending channel E, no communication (quantum or classical) occurred between them.
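The values 1/2 and 2/3 in the classical example can be reproduced numerically. The sketch below is illustrative and uses hypothetical function names. It assumes that averaging over the six Pauli eigenstates stands in for the Haar average, which is justified here because both quantities are polynomials of degree at most 2 in |ψ⟩⟨ψ|, and that without communication B simply outputs the fixed state |0⟩.

```python
import numpy as np

def pauli_eigenstates():
    """The six single-qubit Pauli eigenstates as state vectors (illustrative helper)."""
    s = 1 / np.sqrt(2)
    return [
        np.array([1, 0], dtype=complex),
        np.array([0, 1], dtype=complex),
        np.array([s, s], dtype=complex),
        np.array([s, -s], dtype=complex),
        np.array([s, 1j * s], dtype=complex),
        np.array([s, -1j * s], dtype=complex),
    ]

def no_communication_fidelity(kets):
    """B outputs the fixed state |0> without hearing from A: fidelity |<0|psi>|^2."""
    return float(np.mean([abs(ket[0]) ** 2 for ket in kets]))

def measure_and_prepare_fidelity(kets):
    """A measures in the standard basis and B prepares the reported outcome |b>.

    Outcome b occurs with probability |<b|psi>|^2 and then has fidelity
    |<b|psi>|^2 with the input, so the expected fidelity is sum_b |<b|psi>|^4.
    """
    return float(np.mean([sum(abs(ket[b]) ** 4 for b in (0, 1)) for ket in kets]))

if __name__ == "__main__":
    states = pauli_eigenstates()
    print(no_communication_fidelity(states))     # average success without communication
    print(measure_and_prepare_fidelity(states))  # average success with one classical bit
```

The increase from the first value to the second is what makes this purely classical strategy count as a sending channel in the sense described above.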

Definition 3 (Sending channel). A channel E
where ρ ψ then we talk about an exact sending channel.

Definition 4 (m-cheating). Provers A and B are m-cheating if their cheating strategy uses a sending channel E between them at most m times. We assume that the provers choose a strategy (in which rounds they use a sending channel and in which they do not) prior to the beginning of the test.

Exact completeness and soundness
To investigate the power of test 2 in verifying capabilities of the network, we first consider an instructive case when P = 1. If the nodes are able to perfectly execute the test then they succeed with a unit probability, trivially satisfying the completeness, see theorem 1. On the other hand, if we demand that the nodes always succeed in the game, we can ask the question whether the nodes have the ability to perfectly execute protocols that have the form of test 2, i.e., whether the test is sound. We answer this question positively in theorem 2 below.
Theorem 1 (Exact completeness). If the provers are honest and are able to perfectly execute test 2, then they succeed with probability P = 1.

Theorem 2 (Exact soundness).
If the provers win the test with P = 1 then they must be able to perfectly execute test 2 and they use an exact sending channel E between them k times.
Idea of the proof. To prove the theorem, we argue that P = 1 implies that the probability of winning p |ψ, c κ ,κ for all states, all Clifford gates and all depths should be 1 (in particular, this implies that the provers are able to apply the required Clifford gates on the input state). Therefore, the average fidelity at every depth κ should be 1. That is, if at step κ − 1 A has fidelity 1 it means that the state on A is pure, and by a purifying argument, B's average fidelity at step κ − 1 must be 1/2. At step κ B has fidelity 1, which means that whatever channel A and B have used between step κ − 1 and κ, it must be an exact sending channel (see definition 3). For more details see appendix E.1.
Note that in practice we are only able to observe the winning rate R and, due to the finite statistics of our test, we cannot certify P = 1.

Completeness and soundness
Therefore, let us explore the implications of test 2, given that a winning rate R > t is observed. If the provers are honest and their devices are sufficiently good, their winning rate should be larger than the threshold t with high probability. More specifically, let memories and gates at every round j be described in terms of the average fidelity. Assume that the quality of memory and gates is the same at every round j, i.e. for all j, the average fidelity is μ̄. Below we show that for honest provers, a certain fidelity of operations implies a bound on the winning rate. In order to satisfy both completeness and soundness we choose the winning threshold t > 5/6, since test 2 does not lead to any conclusion in the case when t ≤ 5/6, see

Theorem 3 (Completeness). If provers are honest and their individual operations satisfyμ h
, then the winning rate R in test 2 is bounded by R ≥ t with probability at least $1 - e^{-n\epsilon^2}$, where t ∈ (5/6, 1] is a winning threshold and ε is given by equation (9).
Idea of the proof. Using the 2-design properties of the set of states X and the set of gates Cliff, we show that in the regime where the fidelity μ̄ is the same for every round j, we can express the average probability of success as a sum of powers of μ̄. That is, P = h_k(μ̄), see appendix E.2 for details. Since we want the winning rate R to be higher than the threshold t, we invert the function h_k to obtain a bound on the fidelity of the devices μ̄. We plot the inverse h_k^{-1}(t) in figure 3 for t ∈ (5/6, 1].

Moreover, we can ask whether the converse of the above statement is true, i.e. whether a certain winning rate R > t implies something about test 2. When the provers are honest, we can reverse the completeness statement, obtaining a bound on the quality of their devices. If the provers are dishonest (m-cheating), then they do not have to follow the test exactly. However, in this case we will show that the winning rate R allows us to certify that the provers used a sending channel (definition 3) a certain number of times.

Theorem 4 (Soundness). If the provers are m-cheating then the winning rate in test 2 is bounded by
Idea of the proof. In the case when the provers are m-cheating they can agree on a cheating strategy which uses a quantum channel E between them at most m times, see definition 4. To prove soundness in this case we look at the average probability of winning for A and B at time steps κ − 1 and κ. In appendix E.2 we argue that whenever the provers use the channel E, this probability is bounded by 1. On the other hand, whenever they do not use the channel and no communication occurred, we argue that the average probability of winning at both time steps is bounded by 5/6, which is the bound provided by the approximate cloning theorem [18]. Since the nodes use the channel E at most m times, their overall average probability of winning P is bounded by $\frac{1}{k}\left(m + \frac{5}{6}(k - m)\right)$.
The above theorem implies that when we do not trust the nodes, the higher the m we would like to certify, the higher the winning threshold should be. Indeed, for P ≥ t we obtain m ≥ k(6t − 5). If we now set t = 1 − η, for some small η, then m ≥ k − 6kη. For m ∼ k, one should set at least η = O(k^{−1}).
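The relation between the threshold t and the certified number of sending rounds m can be tabulated with a few lines of code. This is a minimal sketch with hypothetical function names; it only encodes the bound P ≤ (m + 5(k − m)/6)/k from the proof idea and its rearrangement m ≥ k(6t − 5). Exact rational arithmetic is used so that thresholds such as 0.95 are not distorted by floating-point rounding before the ceiling is taken.

```python
import math
from fractions import Fraction

def max_success_m_cheating(k: int, m: int) -> Fraction:
    """Upper bound (m + (5/6)(k - m)) / k on P for m-cheating provers."""
    if not 0 <= m <= k:
        raise ValueError("need 0 <= m <= k")
    return Fraction(6 * m + 5 * (k - m), 6 * k)

def min_sending_rounds(k: int, t) -> int:
    """Smallest integer m consistent with P >= t, i.e. m >= k(6t - 5)."""
    # Floats are snapped to a nearby simple fraction (an assumption of this sketch).
    t = Fraction(t).limit_denominator(10 ** 6)
    return max(0, math.ceil(k * (6 * t - 5)))

if __name__ == "__main__":
    k = 10
    for t in (Fraction(5, 6), Fraction(9, 10), Fraction(19, 20), Fraction(1)):
        m = min_sending_rounds(k, t)
        print(t, m, max_success_m_cheating(k, m))
```

For a threshold at 5/6 the bound certifies nothing (m = 0), and only t = 1 certifies all k rounds, which matches the scaling η = O(k^{−1}) discussed above.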

Remark.
Note that in theorem 2 we are able to fully certify the action of the provers, even if they are not trusted. In particular, we know that they have perfectly sent the state to each other k times. On the other hand, theorem 4 only certifies the use of some quantum or classical channel regardless of its quality. In particular, in the limit where P = 1, theorem 4 shows that m = k sending channels have been used, but we cannot explicitly certify the quality of the channel. However, the exact soundness statement, theorem 2, suggests that even in the imperfect case, the test should be able to certify the quality of each individual operation used by the provers.

Estimation view
In this section we interpret our test in the context of estimation in order to obtain measures of confidence in the nodes' ability to perform the test. We assume that the nodes A and B are honest and follow the protocol. Specifically, we use the winning rate R in the teleportation-based ping-pong test, as a figure of merit to estimate the quality of the network. We then provide a consistency check which allows us to compare this to the quality one would expect from combining the individual devices. Furthermore, we use the statistics of the test to estimate the performance of k-round protocols.
Throughout this section we will use a tilde to denote noisy counterparts of operations; for example, T̃_κ will denote a noisy realization of the κ-round teleportation-based ping-pong test T_κ, test 2.

Preliminaries
In this section we introduce mathematical tools which will be useful for (i) checking whether the test is consistent when the honest nodes use devices of a certain quality, section 5.2, and (ii) drawing conclusions about the performance of k-round protocols, sections 5.3 and 5.4.
We describe the quality of individual devices with a noise model. Specifically, we assume that the individual operations used in the test, i.e. memories M_j and gates C_j, have been tested individually for each round j, to obtain an estimate of their performance. More formally, let the quality of a noisy gate C̃_j at round j be described by the average fidelity, $\bar{F}(\tilde{C}_j) = \int d\psi\, \mathrm{Tr}\left[ C_j(|\psi\rangle\langle\psi|)\, \tilde{C}_j(|\psi\rangle\langle\psi|) \right]$, for all j = 1, . . . , k. Furthermore, let the average fidelity have an empirical estimate r_{C_j}, which is known with a certain precision [19]. Here n_{C_j} is the number of repetitions with which the estimate r_{C_j} was obtained.
Similarly, for a noisy quantum memory M̃^T_j at round j, the average fidelity $\bar{F}(\tilde{M}^T_j)$ has an empirical estimate r_{M^T_j} with an associated precision bound. Furthermore, we assume that the nodes can locally and perfectly prepare and measure a quantum state.
The teleportation-based ping-pong test, test 2, is performed a total of n times. Note that one can easily record which executions i were performed for which depths κ, states ψ and strings of Clifford gates c_κ. Then, in analogy to equation (2), we can define the winning rate for a fixed depth κ and string c_κ,

$R_{c_\kappa,\kappa} = \frac{1}{n_{c_\kappa,\kappa}} \sum_{i=1}^{n_{c_\kappa,\kappa}} v_i^{c_\kappa,\kappa},$

where n_{c_κ,κ} is the total number of executions for fixed κ and c_κ, and v_i^{c_κ,κ} is the corresponding random variable, taking values 0 and 1 for 'lose' and 'win' events respectively. Analogously, we can record which executions correspond to a fixed depth κ only. We define

$R_\kappa = \frac{1}{n_\kappa} \sum_{i=1}^{n_\kappa} v_i^{\kappa}$

as the winning rate for a fixed κ. Here n_κ is the total number of executions for depth κ and v_i^κ is the corresponding random variable recording the wins in the test. Now we will relate the above winning rates to the measures of quality of the test. Intuitively, the higher the winning rate, the better the test performs and the less noise is present in the setup. In the remaining part of this section we make that statement rigorous.

Lemma 2. Let the average fidelity of a noisy realization of test 2, T̃_κ, for a fixed depth κ and a fixed string of Clifford gates c_κ be $\bar{F}_{c_\kappa,\kappa}(\tilde{T}_\kappa) = \int d\psi\, \mathrm{Tr}\left[ T_\kappa(|\psi\rangle\langle\psi|)\, \tilde{T}_\kappa(|\psi\rangle\langle\psi|) \right]$, where T_κ is defined as in equation (6). The expected value of the winning rate R_{c_κ,κ} over the set of states X is equal to the average fidelity of the test T̃_κ, $E[R_{c_\kappa,\kappa}]_X = \bar{F}_{c_\kappa,\kappa}(\tilde{T}_\kappa)$.

Idea of the proof. The first step of the proof is to notice that the expected value of the variable v_i^{c_κ,κ} is the probability of success in a single round averaged over all the states in X. The second step is based on relating this quantity to the average fidelity. Here the key idea is to observe that the expression under the trace contains only polynomials of degree 2 in |ψ⟩⟨ψ|. Therefore one can use the 2-design properties of the set X to equate the discrete averaging over the six Pauli states to the continuous Haar averaging over the whole state space in the average fidelity. The details of the proof can be found in appendix F.1.
The above lemma has a simple useful corollary, namely, that the average fidelity and the winning rate R c κ ,κ can be related through the Hoeffding inequality, Before we make a similar connection for the rate R κ , let us define a useful quantity.
Definition 5 (Double-averaged fidelity). Let $\bar{F}_{c_\kappa,\kappa}(\tilde{T}_\kappa)$ be the average fidelity of the teleportation-based ping-pong test, test 2, defined for a fixed depth κ and a fixed string of Clifford gates $c_\kappa$. We define the quantity

$$\bar{F}_\kappa(\tilde{T}_\kappa) = \int dC_1 \cdots dC_\kappa\, \bar{F}_{c_\kappa,\kappa}(\tilde{T}_\kappa)$$

as the double-averaged fidelity. The averaging for every gate $C_j$ is taken according to the Haar measure.
Lemma 3. The expected value of the winning rate $R_\kappa$ in test 2, for a fixed depth κ, taken over the set of states X and the set of Clifford gates, is equal to the double-averaged fidelity of the test $\tilde{T}_\kappa$,

$$E[R_\kappa]_{X,\mathrm{Cliff}} = \bar{F}_\kappa(\tilde{T}_\kappa).$$

The intuition behind the above lemma is that the discrete averaging over the Clifford gates in $E[R_\kappa]_{X,\mathrm{Cliff}}$ is equal to the continuous averaging in the definition of $\bar{F}_\kappa(\tilde{T}_\kappa)$. This statement follows from the unitary 2-design properties of the Clifford set, see appendix F.2 for details.
Finally, the probability that the empirical winning rate $R_\kappa$ differs from the double-averaged fidelity by more than $\epsilon_\kappa$ is bounded by the Hoeffding inequality,  [20],

Consistency check
The bound holds for any 2κ quantum channels such that κ j=1 acos , and cκ,κ is given by equation (18).
Recall that the individual estimates are known with certain confidence. That means that the above consistency check will be satisfied with a certain probability. We state it formally in the corollary below.

Corollary 2.
Given that the estimates of the average fidelities for memories r M T j and gates r C j are known with confidence M T j and C j respectively, the bound from theorem 5 is satisfied by noisy devices with probability at least

Idea of the proof. The probability that the bound (22) is satisfied is equal to unity minus the probability that at least one of the bounds for the individual devices is not satisfied. By elementary properties of probability one arrives at the statement above, see appendix F.3 for details.

Performance of k-round protocols
In this section we investigate the implications of test 2 for the performance of more general k-round protocols $\tilde{P}_k$, see definition 1. We show that their performance can be bounded using the winning rate $R_\kappa$ (section 5.1) in the teleportation-based ping-pong test.
To explore the performance of protocols $\tilde{P}_k$ we consider the diamond distance [21]. However, since Prep and Π are perfect by assumption, the above diamond distance is upper-bounded by $\|\tilde{P}_k - P_k\|_\diamond$, which we fix to be the figure of merit in this section. It can be shown that the diamond distance is related to the average fidelity in the following way [22], where $\bar{F}_{k,g_k}(\tilde{P}_k) = \int d\psi\, \mathrm{Tr}\left[P_k(|\psi\rangle\langle\psi|) \cdot \tilde{P}_k(|\psi\rangle\langle\psi|)\right]$ is the average fidelity of a protocol $\tilde{P}_k$ of a fixed depth k and for a fixed string of gates $g_k$. Note that the average fidelity differs depending on the sequence of gates one chooses to apply. Therefore, to estimate the behavior of the protocol $\tilde{P}_k$ one would have to know the fidelities $\bar{F}_{k,g_k}(\tilde{P}_k)$ for all possible gate sequences $G_1, \ldots, G_k$, which is infeasible in practice. For this reason, it is much more convenient to use the double-averaged fidelity to bound the performance of a protocol $\tilde{P}_k$. We formalize this argument in the following theorem.
Theorem 6 (Performance of k-round protocols). The performance of single-qubit k-round protocols, definition 1, can be bounded in terms of an estimate $R_k$ for the double-averaged fidelity of the k-round teleportation-based ping-pong test, test 2, in the following way where |Cliff| is the size of the Clifford group for dimension 2 and $\epsilon_k$ is given by equation (21). The bound is

Idea of the proof. To prove the theorem, one first needs to observe that the double-averaged fidelity, $\bar{F}(\tilde{P}_k)$, can be lower-bounded by $\bar{F}_{g_k}(\tilde{P}_k)$ minimized over all possible strings of gates $g_k$, see appendix F for details. Moreover, we have that $\bar{F}_k(\tilde{P}_k) = \bar{F}_{\kappa=k}(\tilde{T}_{\kappa=k})$. This follows from the fact that averaging over the Clifford group is equivalent to averaging over the entire unitary group, since the Clifford group forms a 2-design. Furthermore, the equality is possible since we have put $M^T_j \equiv M_j \circ E_j$, and $M^T_j$ encompasses the operations associated with sending (in the test, teleporting) and storing the qubit. Combining the above with equations (20) and (23) yields the desired result.
Finally, observe that the above results can be straightforwardly generalized to bound the performance of protocols $\tilde{P}_K$ of depth K > k. Since the teleportation-based ping-pong test is performed for all 1 ≤ κ ≤ k, we can define a set S such that $\sum_{\kappa \in S} \kappa = K$. Then $\tilde{P}_K$ is the composition of the protocols $\tilde{P}_\kappa$ with κ ∈ S. Using the triangle inequality for the diamond distance, theorem 6 can therefore be rewritten as where $\epsilon_\kappa$ is given by equation (21).

Simulated results
To gain intuition on how the test performs, in this section we consider a few numerical examples. First, we discuss the implications of the consistency check, theorem 5, and articulate the relation between the average fidelity of individual devices and the maximal depth of the test k. Second, we discuss the performance of the test under two common noise models, depolarizing and dephasing noise. Finally, we comment on bounding the performance of the noisy protocols $\tilde{P}_k$ based on numerical results from the teleportation-based ping-pong test.
Assume a test of maximum depth k = 2, where we teleport a single-qubit state at most two times between A and B. Moreover, for simplicity, say that A and B have access to memories and gates of equal fidelities, $r_{M^T_j} = r_{C_j} = r$. Observe that the higher the depth κ of the test, i.e. the more devices one is testing, the higher the individual fidelities should be, see figure 4. Finally, note that the bound used for the consistency check (22) was derived for a generic noise model and was shown to be tight [20]. This means that if one does not have any additional knowledge about the noise present in the devices, then the results presented here cannot be further improved.
Let us now look at two specific noise models. Namely, let us model memories and gates to be (i) single-qubit depolarizing channels, i.e. $D(\rho) = p\rho + (1 - p)\mathbb{1}/2$, and (ii) single-qubit dephasing channels, i.e. $F(\rho) = q\rho + (1 - q)(Z\rho Z^\dagger)/2$, where Z is the Pauli Z gate. Again, in these two cases let us fix the average fidelity estimate r of the individual devices. Figure 5 presents the simulated behavior of the test as a function of the individual estimates r in the two cases. Observe that the test performs according to intuitive expectations: if the noise is modeled as dephasing, the average fidelity of the test is higher than in the case of depolarizing noise, since the dephasing channel subjects any input state only to the Z component of the Pauli noise, whereas the depolarizing channel subjects it to all of the X, Y and Z components. Therefore, we expect 'more' noise when the state is subjected to depolarizing noise.
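The qualitative comparison between the two noise models can be reproduced with a small Bloch-vector calculation. The sketch below is not the simulation behind figure 5; it rests on several simplifying assumptions: the ideal gates are taken to be the identity, so that a depth-κ test reduces to 2κ repetitions of the same noisy channel; the dephasing channel is used in the trace-preserving form ρ → qρ + (1 − q)ZρZ (an assumption about the normalization); and all function names are hypothetical. Both channels are calibrated to the same single-device average fidelity r.

```python
# Bloch vectors of the six Pauli eigenstates (the set X).
PAULI_STATES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def depolarizing(p):
    """D(rho) = p*rho + (1-p)*I/2 shrinks the whole Bloch vector by p."""
    return lambda v: (p * v[0], p * v[1], p * v[2])

def dephasing(q):
    """rho -> q*rho + (1-q)*Z rho Z shrinks x and y by (2q-1) and keeps z."""
    s = 2 * q - 1
    return lambda v: (s * v[0], s * v[1], v[2])

def average_fidelity(channel, repetitions=1):
    """Average over the six Pauli states of the fidelity between the input
    state and the output after `repetitions` applications of `channel`."""
    total = 0.0
    for n in PAULI_STATES:
        v = n
        for _ in range(repetitions):
            v = channel(v)
        # Fidelity between a pure state (Bloch vector n) and a state v.
        total += (1 + sum(a * b for a, b in zip(n, v))) / 2
    return total / len(PAULI_STATES)

def p_from_fidelity(r):
    """Depolarizing parameter with single-use average fidelity r."""
    return 2 * r - 1

def q_from_fidelity(r):
    """Dephasing parameter with single-use average fidelity r."""
    return (3 * r - 1) / 2

r, kappa = 0.95, 2
f_depolarizing = average_fidelity(depolarizing(p_from_fidelity(r)), 2 * kappa)
f_dephasing = average_fidelity(dephasing(q_from_fidelity(r)), 2 * kappa)
```

For depolarizing noise the identity-gate simplification is harmless, because the channel commutes with every unitary. For dephasing noise, interleaved Clifford gates would rotate the dephasing axis, so the number obtained here should be read as indicative only.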
Although in our network model we assume that the state preparation is perfect, it is interesting to see the behavior of the test once imperfect states are used. Figure 5 shows the result of a simulation of the test when the initial state is subjected to a small dephasing noise, such that the fidelity of the input state is 0.9. Note that if one has access to the average fidelity estimate of the noisy channel acting on the initial state, then one can use it in the consistency check (22), simply treating the noise of the state as an additional channel in the protocol.
Let us also comment on the bound from theorem 6. Already for a single qubit one obtains a constant prefactor of $2\sqrt{d(d + 1)} \approx 4.9$. In addition to that, bound (24) contains a factor associated with the size of the Clifford group: for a single qubit |Cliff| = 24. If one considers protocols of maximum depth k = 2, then to obtain a non-trivial bound on the behavior of protocols in the class, the estimate of the double-averaged fidelity must be of order $R_\kappa = 1 - 10^{-5}$. This puts a very high precision requirement on the double-averaged fidelity and, consequently, on the individual devices.
As an example consider the quantum gambling (QG) protocol [23]. In the protocol, A chooses one of the states $\{|0\rangle_z, |0\rangle_x\}$ and sends it to B. After receiving the state, B stores it and communicates classically his guess of the state sent by A. Upon receiving the classical message from B, A communicates back whether B won or lost. After this round of communication B measures the state either in the Z basis or in the X basis. Let the protocol be described by a map $P_{QG}$ which consists of local operations on the state (except measurement and state preparation, as before). Then $P_{QG}$ consists of k = 2 rounds of communication during which B has to store the state. Assume that in the protocol the quantum memory is modeled as a depolarizing channel with fidelity $1 - 10^{-5}$. Then explicit evaluation of the diamond distance $\|\tilde{P}_{QG} - P_{QG}\|_\diamond$ yields the value $6 \cdot 10^{-5}$. On the other hand, if one uses a two-round test to bound the behavior of the protocol, without explicit a priori knowledge about the noise model of the memories, then the bound from theorem 6 has the value 0.7436. However, note that in the quantum gambling protocol one does not perform any gates. Using this explicit knowledge about the protocol one could in principle tailor a teleportation-based ping-pong test without any gates. In this case, there would be no need to average over gates and therefore the bound from theorem 6 would not carry the $|\mathrm{Cliff}|^k$ term. Consequently, the bound could be improved to the value 0.0310.

Figure caption (fragment): … (22). No markers (blue color) correspond to the case where the input state is perfect, whereas the triangle markers (red color) correspond to the case where the input state is dephased to initial fidelity 0.9.
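The numbers quoted in this discussion can be cross-checked with elementary arithmetic. The sketch below only recomputes the single-qubit prefactor and the ratio of the two quoted bound values; it does not evaluate theorem 6 itself. The observation that the ratio is close to |Cliff| = 24, which would be consistent with the $|\mathrm{Cliff}|^k$ term entering under a square root for k = 2, is an inference from the quoted values and not a statement taken from the theorem.

```python
import math

d = 2            # dimension of a single qubit
cliff_size = 24  # |Cliff| for a single qubit, as quoted in the text

# Constant prefactor 2*sqrt(d(d+1)) appearing in the bound.
prefactor = 2 * math.sqrt(d * (d + 1))

# Bound values quoted in the text for the quantum gambling example.
bound_with_gates = 0.7436     # two-round test, gates averaged over
bound_without_gates = 0.0310  # tailored test without gates

# Ratio of the two quoted values, to be compared with cliff_size.
ratio = bound_with_gates / bound_without_gates
```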

More noise
In our network model we have assumed that state preparation, measurement, sending qubits as well as preparing an EPR pair can be done perfectly. In particular, this implies that in our test teleportation is carried out perfectly. However, the test can still be performed without major changes if one wishes to take into account noisy teleportation.
We consider two main noise sources arising in teleportation: noise coming from performing an imperfect Bell measurement and recovery operation, and noise originating from the creation of an EPR pair. In appendix C.1 we show that in a single round j of our test both of these noise types can be absorbed into the noise coming from the memory $\tilde{M}^T_j$, for all j. For the former noise source we assume a noise model where the imperfections follow the Bell measurement but precede the recovery operation. For the latter noise source, we assume that the noise is local for each half of the EPR pair and that it can be modeled as mixed-unitary noise. That is, each half of the EPR pair is subjected to $N(\cdot) = \sum_l p_l U_l^\dagger (\cdot) U_l$, where $U_l$ is a unitary operation and $p_l$ is a probability. Then all the teleportation noise can be included in the noise of the memory and we can carry out the test as described above, i.e. by sending qubits via perfect teleportation.
Similarly to the analysis outlined in the previous paragraph, we can treat the noise of the state preparation as if it arose in the teleportation. Indeed, one can absorb the noise in the initial state similarly to the analysis in appendix C.1. Note that in figure 5 we indicate what one might expect from the test if the initial state is noisy. As for the noise in the final measurement, if we consider that the noisy measurement is described by a noise map N followed by a perfect measurement, then N can be treated as another noisy memory applied to the state before measuring. In this case, the analysis carried out in lemma 7 of appendix D still holds.
Finally, we remark that our test can be extended to multi-qubit settings, where the number of qubits in the k-round protocol is Q. For a detailed description we refer the reader to appendix G.

Conclusions and outlook
In this work we considered the problem of certifying that a quantum network achieves the ability to perform a subset of protocols within a certain stage of development, namely the stage called quantum memory network. We designed the first testing protocol which certifies that nodes have the capability to control and send qubits around the network k times. We provided completeness and soundness statements for our protocol and expressed them in the language of interactive proofs. Moreover, in an honest implementation, we demonstrated that passing our test allows us to estimate statistical quantities about the devices used in the test and to draw conclusions about the performance of other k-round protocols in a quantum network.
An important question is how our estimate of performance for the class of multi-round protocols can be improved. Note that in our simple analysis we bound a very general class of protocols using a single test: we bound the behavior of any unitary gate in terms of the behavior of a small subset of gates. Therefore, it is not surprising that there is a trade-off between the universality of the protocols and the precision of estimating their performance. One improvement could result from designing tests for a more specific (and therefore smaller) class of protocols. Alternatively, tailoring tests using additional knowledge of the underlying noise in a quantum network could improve the bound on the performance of k-round protocols.
Furthermore, as mentioned before, our test does not certify that any universal gate can be implemented. Due to the mathematical structure of the unitary designs that we used, we can only make a statement about the implementability of gates from the Clifford set or from any gate set with 2-design properties. It is, therefore, an open problem how to test a quantum memory in the presence of a gate set powerful enough to generate any unitary operation. Such a universal set is, for example, the Clifford set extended with a T gate [24,25].
In the following we present technical details of our work. We first provide mathematical preliminaries necessary for our further considerations in appendix A. Then, in appendix B we give a detailed mathematical description of the general ping-pong test, test 1, and the teleportation-based ping-pong test, test 2. In appendix C we justify why in the teleportation-based ping-pong test, it is possible to absorb the (possibly noisy) teleportation channel into a memory M T j . Next, we discuss 2-design properties of sets of Pauli states and Clifford gates in appendix D. In appendix E we prove completeness and soundness statements of our test 2. Then, in appendix F we give proofs of statements discussed in the estimation view of our test. Finally, we discuss how to extend our results to Q-qubit protocols in appendix G.

Acknowledgments
We thank J Helsen and G Murta for inspiring discussions and useful comments on this work. We also thank B Dirkse, T Coopmans, M Steudtner and K Goodenough for feedback on the manuscript. This work was supported by STW Netherlands, NWO VIDI grant, ERC Starting grant and NWO Zwaartekracht QSC.

Appendix A. Preliminaries
Communication between nodes of a quantum network can be described by quantum channels. A quantum channel is a completely positive trace-preserving (CPTP) linear map Λ : D(H) → D(H), where D(H) denotes the space of density operators acting on the Hilbert space H. In a realistic setup, quantum channels are not perfect (or ideal), and instead of applying a perfect channel Λ one applies its noisy counterpart $\tilde{\Lambda}$. If the perfect Λ is unitary, then without loss of generality a noisy channel $\tilde{\Lambda}$ can be written as a noise map N followed by the perfect channel Λ, i.e. $\tilde{\Lambda} = \Lambda \circ N$. A sequence of n operations can be represented as a composition of n maps, $\tilde{\Lambda}_n \circ \cdots \circ \tilde{\Lambda}_1$.
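One can quantify the difference between a noisy channel and its perfect implementation using the average fidelity,

$$\bar{F}(\tilde{\Lambda}) = \int d\psi\, \mathrm{Tr}\left[\Lambda(|\psi\rangle\langle\psi|) \cdot \tilde{\Lambda}(|\psi\rangle\langle\psi|)\right],$$

where dψ is the Haar measure on pure states.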
Average fidelity is a quantity which can be accessed empirically and as such it is widely used as a parameter estimating the quality of a quantum channel. One cannot hope, however, to empirically average over the continuum of all pure states. Realistically, to access the average fidelity one can use the properties of so-called quantum state designs. Intuitively, a quantum design is a probability distribution over pure states which replicates the properties of the Haar averaging over the entire space of pure states.

Definition 7 (Projective t-design). A projective t-design is a distribution $\{q_\psi, \psi\}$ over some finite set of states such that

$$\sum_\psi q_\psi \left(|\psi\rangle\langle\psi|\right)^{\otimes t} = \int d\psi \left(|\psi\rangle\langle\psi|\right)^{\otimes t}. \quad \text{(A2)}$$

An example of a projective 2-design for qubits is given by the set X of the six Pauli eigenstates, each chosen with equal probability 1/6. A similar definition can be used when talking about averaging over the unitary group U(d) of dimension d, see [26] for details.
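The 2-design property of the six Pauli eigenstates can be checked numerically. The sketch below uses the frame-potential criterion, a standard characterization that is not taken from this paper: a uniformly weighted set of N states in dimension d is a projective t-design exactly when the average of $|\langle\psi_i|\psi_j\rangle|^{2t}$ over all ordered pairs equals $1/\binom{d+t-1}{t}$. States are represented by their Bloch vectors, and the function names are hypothetical.

```python
from math import comb

# Bloch vectors of the six Pauli eigenstates (the set X).
PAULI_STATES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def overlap_squared(n, m):
    """|<psi_n|psi_m>|^2 for two pure qubit states given by Bloch vectors."""
    return (1 + sum(a * b for a, b in zip(n, m))) / 2

def frame_potential(states, t):
    """Average of |<psi_i|psi_j>|^(2t) over all ordered pairs of states."""
    total = sum(overlap_squared(a, b) ** t for a in states for b in states)
    return total / len(states) ** 2

def is_t_design(states, t, d=2, tol=1e-12):
    """Frame-potential test for a uniformly weighted projective t-design."""
    return abs(frame_potential(states, t) - 1 / comb(d + t - 1, t)) < tol

# The Z eigenstates alone form a basis (a 1-design) but not a 2-design.
Z_STATES = [(0, 0, 1), (0, 0, -1)]
```

With these definitions, `is_t_design(PAULI_STATES, 2)` holds while `is_t_design(Z_STATES, 2)` does not, which illustrates why all three Pauli bases are needed for the averaging argument.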

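Definition 8 (Unitary 2-design). A set Y ⊂ U(d) of unitary matrices is a unitary 2-design if for any quantum channel Λ and any state ρ it holds that [27]

$$\frac{1}{|Y|} \sum_{U \in Y} U^\dagger \Lambda(U \rho U^\dagger) U = \int dU\, U^\dagger \Lambda(U \rho U^\dagger) U,$$

where dU denotes the Haar measure on U(d). An example of a 2-design for the unitary group U(d) is the Clifford group Cliff(d) with uniform probability of each element.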
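For a single qubit the effect of averaging over the Clifford group can be made concrete on the Bloch sphere. The sketch below relies on a standard fact that is not derived in this paper: modulo global phases, the 24 single-qubit Clifford gates act on Bloch vectors as the 24 proper rotations that permute the coordinate axes up to sign. The code restricts attention to the 3×3 Bloch matrix T of a channel (its action on Bloch vectors, ignoring any non-unital shift), and the example matrix and all function names are hypothetical.

```python
from itertools import permutations, product

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def clifford_rotations():
    """The 24 signed permutation matrices with determinant +1."""
    rotations = []
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            m = [[signs[i] if perm[i] == j else 0 for j in range(3)]
                 for i in range(3)]
            if det3(m) == 1:
                rotations.append(m)
    return rotations

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def twirl(T):
    """Average of R^T T R over the 24 Clifford rotations."""
    rotations = clifford_rotations()
    acc = [[0.0] * 3 for _ in range(3)]
    for R in rotations:
        term = matmul(transpose(R), matmul(T, R))
        for i in range(3):
            for j in range(3):
                acc[i][j] += term[i][j] / len(rotations)
    return acc

def average_fidelity(T):
    """Average fidelity to the identity: (1 + tr(T)/3) / 2."""
    return (1 + (T[0][0] + T[1][1] + T[2][2]) / 3) / 2

# Illustrative Bloch matrix of a slightly anisotropic noise channel.
T = [[0.90, 0.01, 0.00],
     [0.00, 0.85, 0.01],
     [0.01, 0.00, 0.80]]
T_twirled = twirl(T)
```

The twirled matrix is proportional to the identity, i.e. it describes a depolarizing channel, and its average fidelity coincides with that of the original matrix. The same result would be obtained from the Haar average over all rotations, which is the content of the 2-design property in this representation.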
Another useful figure of merit for channels is the diamond distance [21].
Definition 9 (Diamond distance). The diamond distance between two channels, $\tilde{\Lambda}$ and Λ, is defined through a distance measure on the space of density operators, maximized over all density operators ρ,

$$\|\tilde{\Lambda} - \Lambda\|_\diamond = \max_\rho \left\|(\tilde{\Lambda} \otimes \mathbb{1})(\rho) - (\Lambda \otimes \mathbb{1})(\rho)\right\|_1,$$

where $\|\cdot\|_1$ is the trace norm. The operational meaning behind the definition of the diamond distance is that it quantifies the worst-case distinguishability of two quantum channels when one is given access to entanglement with an auxiliary system.
From the properties of the diamond distance it follows that, Note that such a relation cannot be easily found for the average fidelity, since, unlike the diamond distance, fidelity is not a metric. Although the diamond distance offers a convenient theoretical description, it is not as practical as the average fidelity. But, since the average fidelity and the diamond distance both estimate the quality of a quantum channel, there exists a relation between the two. Indeed, it can be shown that where d is the dimension of the underlying quantum system.

While performing an experiment, for example estimating the average fidelity, one gathers empirical data. To compare the data with the theoretical expectation one can use Hoeffding's inequality [19]. It states that the probability of the empirical mean and its expectation differing by more than ε is exponentially small in n.

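Lemma 4 (Hoeffding's inequality). If $v_1, \ldots, v_n$ are independent random variables, $0 \le v_i \le 1$, with empirical mean defined as $\bar{v} = \frac{1}{n} \sum_{i=1}^n v_i$, then an upper bound on the probability that the mean of the random variables deviates from its expected value is given by

$$\Pr\left[\,|\bar{v} - E[\bar{v}]| \ge \epsilon\,\right] \le 2 \exp\left(-2 n \epsilon^2\right).$$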
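Hoeffding's inequality translates directly into a rule for how many executions of the test are needed. The sketch below assumes the standard two-sided form of the inequality for variables in [0, 1], with failure probability at most $2\exp(-2n\epsilon^2)$; the function names and the numerical targets are chosen only for illustration.

```python
import math

def hoeffding_delta(n, eps):
    """Upper bound on Pr[|mean - expectation| >= eps] after n executions."""
    return min(1.0, 2 * math.exp(-2 * n * eps ** 2))

def hoeffding_samples(eps, delta):
    """Smallest n for which the Hoeffding bound is at most delta."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

# Example: estimate a winning rate to within 0.01 with 95% confidence.
eps, delta = 0.01, 0.05
n_required = hoeffding_samples(eps, delta)
```

The required number of executions grows as $1/\epsilon^2$ but only logarithmically in $1/\delta$, so tightening the precision is far more expensive than tightening the confidence.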

Lemma 5 (Choi isomorphism). For a map Ω : H S 1 → H S 2 the following identity holds:
where ω Γ S 2 S 1 is a Choi state associated with the map Ω S 1 →S 2 of the form ω Γ

Appendix B. The test-detailed description
In this section we provide a detailed mathematical description of our tests. First we consider the general case of the ping-pong test, test 1. Then we discuss the specific case of the teleportation-based ping-pong test, test 2.

B.1. General ping-pong test
We describe the general test, test 1, as a CPTP map which we denote S κ . We first consider all the registers available to the nodes. We call κ the depth of the test and assume that κ is a natural number upper-bounded by a given k. The time for performing one round j = 1, . . . , κ of the protocol is equal for all the rounds, i.e. Δt = t j+1 − t j = t send + t M + . We describe a round where node A initiates the sending of the state, which implies that j is odd. However, this description is fully symmetric, and for even j it is enough to interchange the registers of A with the registers of B. A sends the qubit |ψ⟩⟨ψ| A in j to node B using the channel E A in j →B j , which takes time upper-bounded by t send . After time t send + t M the verifier chooses a gate according to the distribution p G and gives its classical description |g j ⟩⟨g j | G j to B. B applies the quantum gate that corresponds to the description and that we describe with a CPTP map G G j B j →G j . This takes time . After this, at time t send + t M + the verifier distributes a challenge bit |f j ⟩⟨f j | F j chosen uniformly at random (0 means 'teleport back', 1 means 'output'). Depending on the challenge, B applies IN F j B EPR j →B in j+1 for f j = 0 and OUT F j B j →B out j for f j = 1.

Definition 10 (Honest round j). Round j of a general test, where provers are honest, can be described as whenever the challenge bit is 0, or whenever the challenge bit is 1.
Note that the challenge bits form a string of length κ, f 1 . . . f κ , in registers F 1 . . . F κ , consisting of κ − 1 ones and a single zero bit in the κ-th position. We denote such a string by f κ , i.e. f κ = 1 · · · 10. For simplicity we will use a short notation for multiple registers, e.g. F [1,κ] ≡ F 1 . . . F κ . Similarly, we will denote by g κ a sequence of κ gates chosen by the verifier, each of the gates chosen at the time step defined above. By G [1,κ] ≡ G 1 · · · G κ we denote the κ registers for the choice of a gate.

B.2. Teleportation-based test
whenever the challenge bit is 0, or whenever the challenge bit is 1.
Note that, for simplicity, in the main text we denote Having defined a single round of a protocol we describe the ping-pong teleportation protocol of depth κ. Such a protocol is simply a κ-round teleportation, where first κ − 1 maps have form (B5) and the last map outputs the state and so has the form (B6).

Definition 13.
The teleportation-based ping-pong testing protocol of depth κ for a state |ψ ψ| ∈ X, a sequence of gates c κ ∈ Cliff κ and a string of challenges f κ = 1 · · · 10 is defined as a CPTP map T κ such that applied to the input state where Λ's are defined as in definition 12.

B.3. Measurements
Upon receiving the requested state from either A or B, V must check its consistency with the distributed state, as well as confirm that the desired gates were applied. This can be achieved by projecting the outcomes onto the state C κ • · · · • C 1 (|ψ⟩⟨ψ|) B out κ , which is the original state rotated with κ Clifford channels.
Definition 14 (POVM elements for the node V). Measurements performed by V in the teleportation-based ping-pong test can be described by POVM elements, for all κ = 1, . . . , k. κ denotes here the output register of the κ-th party, depending on the parity either A or B.

B.4. Renaming teleportation channel
Now that we have formalized the testing protocol in detail, we justify the notation for the teleportation channel used in the main text. That is, we show that a teleportation channel with noisy memory acting on |ψ⟩ ⊗ |Φ + ⟩ can be viewed as a channel M T j acting only on |ψ⟩⟨ψ|. Recall definition 12. In a single round j of the protocol A performs a Bell measurement on the state |ψ⟩⟨ψ| A in j and her part of the EPR pair. This action is described by an operator B A in j A EPR j →M , acting on two registers on A's side and producing a classical message m ∈ M which is then sent to B. The initial state Upon receiving the classical message m, B undoes the unitary operations to recover the teleported state. This operation is described by a map R MB EPR Then, the test of depth κ can be described as in the main text

Appendix C. Teleportation and quantum memory

C.1. Absorbing teleportation noise into the memory
As is often done in the estimation literature for quantum computing, see e.g. [8][9][10], we model teleportation as a perfect operation followed (or preceded) by noise. This allows us to consider teleportation as a perfect operation, i.e. with a perfect Bell measurement and recovery operation as well as a perfect EPR pair, and to absorb all the associated noise into the quantum memory.
C.1.1. Noisy operations. Assume a Bell state measurement is followed by a local noise, Assume further that the recovery operation is also noisy, but in this case the map is preceded by the noise, Here the maps N are mixed-unitary channels, i.e. they have the form $N(\cdot) = \sum_l p_l U_l (\cdot) U_l^\dagger$, with $p_l$ being a probability and $U_l$ a unitary. Note that this is not the most general type of noise; however, the most common noise models (e.g. depolarizing, dephasing) can be written this way. Moreover, note that for an EPR pair it holds that (C1) Therefore, using an explicit form of the maps N and the above statement, we can write, In particular, this means that noise acting on the EPR pair, which has the mixed-unitary form, can be absorbed into the memory map,

Appendix D. 2-designs
In this appendix we show that for the ping-pong test, the average of the probability p |ψ, c κ ,κ over the six Pauli states is equal to its average over the whole state space according to the Haar measure. To do so, we use the fact that the uniform distribution over set X is a 2-design [26] and p |ψ, c κ ,κ contains a polynomial of degree 2 in |ψ . Next, we prove a similar statement when averaging over the Clifford group.
Proof of lemma 2. We can write the left-hand side explicitly as, where we explicitly write A and B's input and output registers. Let X, Y ∈ L(H) be linear operators over the Hilbert space. The inner product Note that κ j=1 C j is a unitary channel and therefore, we can write, Now, using Choi-Jamiolkowski theorem, see lemma 5, we can write is a Choi-Jamiolkowski state associated with the map It is now clear that averaging is taken over a polynomial of degree 2 under the trace and we can use properties of a 2-design. Therefore, where we used Choi-Jamiolkowski isomorphism and properties of the trace again. We definē F c κ ,κ = dψ p |ψ, c κ ,kt as the average fidelity.

D.2. Clifford gates
Now we will prove that averaging p |ψ, c κ ,κ over the Clifford set reproduces averaging over the whole unitary set taken according to the Haar measure. p |ψ, c κ ,κ = dψ dC 1 · · · dC κ p |ψ, c κ ,κ . (D9) Proof. Just like in the previous lemma, let us first use cyclicity of the trace, where in the last step we pulled the summation over κ − 1 under the trace. Note that is an unnormalized twirl over Cliff and therefore it commutes with all Clifford gates C ∈ Cliff. By repeating pulling the summation under the trace, we can write, Now we are left with a rather awkward map M T 1 which is not twirled. However, since the Haar measure is invariant under unitary transformations, for all E ∈ Cliff it holds that where in the last line we used the fact that the value of the expression does not depend on E. Now, using cyclicity of the trace and commutativity properties of E, we get (D18) Now we can change discrete averaging to the continuous one by definition of the unitary 2-design, see definition 8 and [27]. We have To get back to the expression for p |ψ, c κ ,κ , we can invert the procedure we just applied, i.e.

Appendix E. Completeness and soundness

E.1. Exact completeness and soundness
To keep this section more compact, we use the notation from the main text. That is, we express test 2 as in equation (6).
Proof of theorem 1. First, we prove that test 2 is exactly correct when the winning threshold P = 1. That is, for honest A and B and for any 1 ≤ κ ≤ k, after κ rounds the state that the verifier obtains at output κ is $C_\kappa \circ \cdots \circ C_1(|\psi\rangle\langle\psi|)$. To prove this, we need to make sure that for all the rounds preceding κ the states at outputs j = 1, . . . , κ are correct. This can be proven by induction. For κ = 1 the verifier measures $C_1(|\psi\rangle\langle\psi|)$. On the other hand, $C_1 \circ M^T_1(|\psi\rangle\langle\psi|) = C_1(|\psi\rangle\langle\psi|)$, since the setup is perfect. Repeating this step inductively, we obtain the correct state for all κ. Hence, P = 1.
Before proving theorem 2 we formally prove a known fact related to the no-cloning theorem [17].

Proof. By Stinespring dilation ∃V A→A BE Ω(·) = Tr E (V(·)V † ). Since Tr B (ρ A B ) = |ψ⟩⟨ψ| A = Tr BE (V |ψ⟩⟨ψ| V † ) we must have by the above lemma that V |ψ⟩⟨ψ| V † = |ψ⟩⟨ψ| A ⊗ |junk⟩⟨junk| BE , and therefore

Proof of theorem 2. Now we prove that test 2 is exactly sound. That is, if the average probability of success P = 1, then nodes A and B have the ability to correctly execute test 2. The intuition behind our proof is that the challenges given by the verifier impose a certain structure on the provers' strategy. We first show that if the nodes win the test with probability 1, then their strategy must produce the correct state at each time step κ. Then, we argue that this implies that the nodes must have passed the state around, and therefore used a quantum channel between them exactly κ times.

Lemma 9.
Let Q c κ ,κ be an arbitrary strategy of the provers, which can depend on the information available throughout the protocol, i.e. depth κ and Clifford string c κ . If the average probability of success in test 2 is P = 1, then for all depths κ = 1, . . . , k, all Clifford strings c κ and all input states ψ ∈ X, Q cκ,κ outputs the correct state.
Proof. This statement is essentially the inverse of the exact completeness statement. Let us explicitly write the average probability of success, This implies that for all states, gates and depths the trace must be equal to 1, and the state at every κ must be exactly the one requested by the verifier.

Lemma 10.
If the average probability of success in test 2 is P = 1, then for all depths κ = 1, . . . , k, all Clifford strings c κ and all input states ψ ∈ X, Q c κ ,κ uses an exact sending channel κ times and applies an operation equivalent to the one described by c κ .
Proof. In our test at every time step κ the provers must produce some state. Since at every time step a state has to be defined, Q cκ,κ can be described by LetΓ A κ−1 andΓ B κ−1 ,B κ denote CPTP maps which act on registers of A and B respectively, and output qubit states, and let Γ A κ−1 and respectively. The above fact together with lemmas 6 and 10 implies that at time steps κ and κ − 1 Moreover, we have put ρ ψ c κ−1 ,κ−1 := Q c κ−1 ,κ−1 |ψ ψ| to denote a joint state of A and B at time step κ − 1. Observe that for all κ, Π κ = C κ • · · · • C 1 (|ψ ψ|) projects onto a pure state. Therefore, for all κ, the states at output registers κ − 1 for A, and κ for B, must be pure, be the joint state of A and B at time step κ − 1, and after applyingΓ A κ−1 on A. Using equation (E9) we have that, which is a pure state on A, and therefore any extension of this state has tensor product form across A and B, in particular, where σ B κ−1 is a state on B independent of ψ by corollary 3. Therefore the (maximum) average fidelity of the state on B's side is, This, together with equation (E6), implies that E cκ,κ is an exact sending channel at time step κ. Since the statement holds for all κ, the provers necessarily use the exact sending channel k times.

E.2. Completeness and soundness
Proof of theorem 3. Completeness. As stated in the main text, we assume that the quality of operations is quantified by average fidelity and that at every round j the quality of operations is the same, i.e. for all j, In the following we bound the average probability of success P in terms of $\bar{\mu}$. Let us write P explicitly, From lemma 7 we have that Observe that $(M^T_j)_{\mathrm{twirl}} = \int dC_j\, C_j^\dagger \circ M^T_j \circ C_j$ is a twirl of the operator $M^T_j$ [28]. Furthermore, twirling any map is equivalent to the action of a depolarizing channel, i.e. $(M^T_j)_{\mathrm{twirl}}(\rho) = D_j(\rho) = p\rho + (1 - p)\mathbb{1}/2$, for some parameter p and any state ρ. Using properties of the depolarizing channel we can write, Additionally, the average fidelity of a twirled map is equal to the average fidelity of the same map without a twirl [28], therefore, P = 1 By assumption, $\forall j\ \bar{F}(M^T_j) = \bar{\mu}$, and If we demand that P ≥ t then $\bar{\mu} \ge h_k^{-1}(t)$.
Proof of theorem 4. Soundness. In the case when the nodes A and B are honest, the soundness statement is the converse of the completeness, see the proof above. Here we prove soundness of test 2 in the case when the nodes are dishonest (m-cheating). Just like before, we will assume that the output for a fixed κ happens at node B. The idea behind this proof is that we bound the average probability of success of the provers when they use a quantum channel between them, and when they do not. More specifically, let ρ A out κ−1 be a state available at A's output at time step κ − 1 and ρ out B κ be a state available at B's output at time step κ. We show that whenever the provers use the channel, the average fidelity between these two states is bounded by 1. However, whenever they do not use the channel, the average fidelity between these two states is at most as large as the average fidelity between the states at time step κ − 1, i.e. ρ A out . This average fidelity is intrinsically bounded by the approximate cloning theorem [18], and here takes the value 5/6. If the provers are m-cheating, they use the channel between them at least m times. We prove that, as a consequence, their overall average probability of winning P is upper-bounded by $\frac{1}{k}\left(m + \frac{5}{6}(k - m)\right)$. When the provers are m-cheating they adopt an arbitrary strategy Q m c κ ,κ which depends on the maximum number of channel uses m between the nodes. It can also depend on all the information available throughout the protocol, i.e. the challenges and gates distributed by the verifier. We assume that the executions of the test are IID (independent and identically distributed) and the probability of winning a single execution i = 1, . . . , n is expressed as The configuration of channel uses, i.e. at which time steps the provers use the channel between them, does not need to be fixed. At each execution, the provers can choose a particular strategy Q m,ν which describes a configuration ν of channel uses between the nodes.
We assume that the provers are non-adaptive and that their strategy does not change throughout an execution $i$. Therefore, whether the provers choose to send the state or not is independent of the information available throughout the protocol, i.e. $\nu$ is independent of $\kappa$ and $c_\kappa$, and we have $q_\nu \ge 0$, $\sum_\nu q_\nu = 1$, such that Note that there are $\binom{k}{m}$ such strategies. Furthermore, let us define $p_{|\nu,\psi,c_\kappa,\kappa} := \mathrm{Tr}\left[ Q^{m,\nu}_{c_\kappa,\kappa}(|\psi\rangle\langle\psi|) \cdot \Pi_\kappa \right]$.
Let us rewrite the average probability of success, equation (7), Now we shift the summation index in the second component of the sum over $\kappa$: instead of running through $(1, 2, \dots, k - 1, k)$, it runs through $(k, 1, 2, \dots, k - 1)$, Let us define $p_{|\psi,c_0,0} := 1$ for round $\kappa = 0$, which one can interpret as simply giving the state to node A and immediately requesting it back. Now, since for $\kappa = k$ it holds that $p_{|\psi,c_k,k} \le 1$, we have $p_{|\psi,c_k,k} \le p_{|\psi,c_0,0}$. Therefore, Now we write the expression as a single summation, The right-hand side of the above equation is bounded by $\frac{5}{6}$ due to the approximate cloning theorem [18]. Therefore, we have There are at most $k - m$ time steps $\kappa = 1, \dots, k$ such that the channel $E$ is not sending. Therefore, using equations (E31) and (E38) we can write (E29)

By the 2-design properties of the Clifford set, lemma 7, we have that , with $n_\kappa$ being the total number of executions for a fixed $\kappa$, we have that $\mathbb{E}[R_\kappa]_{X,\mathrm{Cliff}} = \bar{F}_\kappa(T^{\kappa})$.
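The soundness bound stated in the proof of theorem 4, $\frac{1}{k}\left(m + \frac{5}{6}(k - m)\right)$, is easy to tabulate. The sketch below is only an illustration: the function name `soundness_bound` and the use of exact fractions are choices made here, not part of the original text.

```python
from fractions import Fraction

# Cloning-limited average fidelity quoted in the proof (value 5/6).
CLONING_FIDELITY = Fraction(5, 6)

def soundness_bound(k, m):
    """Upper bound (m + 5/6 * (k - m)) / k on the average winning probability
    of m-cheating provers in a test with k time steps."""
    if k < 1 or not 0 <= m <= k:
        raise ValueError("need k >= 1 and 0 <= m <= k")
    return (m + CLONING_FIDELITY * (k - m)) / k

for m in range(0, 7, 2):
    print(m, soundness_bound(6, m))
```

With `m = k` the bound is trivially 1, and with `m = 0` it reduces to 5/6, so every unit by which `m` falls below `k` lowers the bound by 1/(6k).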

F.3. Proof of corollary 2
Next, we prove that the consistency check, theorem 5, is satisfied with a certain probability, determined by the estimates on the performance of individual devices.
Proof of corollary 2. The probability that the bound (22) is satisfied is equal to the probability that all the individual bounds are satisfied, i.e. Pr where in the last line we used Hoeffding's inequality, equations (12) and (13). Hence, we can write that
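To illustrate how Hoeffding's inequality turns a number of executions into a confidence statement, the sketch below uses the standard two-sided form for n IID outcomes in [0, 1]. The function names are hypothetical. Combining several estimates through a union bound is an assumption of this sketch and is not necessarily the combination used in equations (12) and (13).

```python
import math

def hoeffding_failure_probability(n, eps):
    """Two-sided Hoeffding bound on Pr(|estimate - mean| >= eps) for n IID
    outcomes in [0, 1]."""
    return min(1.0, 2.0 * math.exp(-2.0 * n * eps ** 2))

def executions_needed(eps, delta):
    """Smallest n for which the Hoeffding failure probability is at most delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

def joint_success_lower_bound(failure_probabilities):
    """Union bound (assumption of this sketch): all estimates hold
    simultaneously with probability at least 1 - sum of failure probabilities."""
    return max(0.0, 1.0 - sum(failure_probabilities))

n = executions_needed(eps=0.05, delta=0.05)
print(n, hoeffding_failure_probability(n, 0.05))
```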

F.4. Proof of theorem 6
Here we prove our bound on the performance of k-round protocols in terms of the winning rate $R_\kappa$ in test 2. The core of this theorem is the following lemma, which relates the diamond distance between the ideal and real implementations of a k-round protocol to the double-averaged fidelity.
Lemma 11. The performance of a k-round protocol can be bounded by the double-averaged fidelity in the following way where $d$ is the dimension of the underlying Hilbert space, and $|\mathrm{Cliff}|$ is the size of the Clifford group for dimension $d$.
Proof of lemma 11. To prove the inequality from lemma 11 one needs to relate $\bar{F}_{g^k}(P^k)$ and $\bar{F}(P^k)$. In particular, to preserve the direction of the inequality we want to bound $\bar{F}_{g^k}(P^k)$ from below in terms of $\bar{F}(P^k)$. Firstly, we trivially have that On the other hand, $\bar{F}(P^k) = \int dG_1 \cdots dG_k\, \bar{F}_{g^k}(P^k)$ (F15) $= \int dC_1 \cdots dC_k\, \bar{F}_{c^k}(P^k)$ (F16) where in lines (F16) and (F17) we used lemma 7, in line (F18) we separated the minimum element out of the summation, and in line (F19) we bounded each element under the sum by 1. Now let us relate the minimum over the Clifford group to a minimum over the whole unitary group. Note that the Clifford group rotated by any unitary $U$ remains a Clifford group. Therefore, let us rotate every $C_j$ by a constant $U_j$, $j = 1, \dots, k$, such that the minimum over the Clifford sets corresponds to the minimum over the whole unitary group. Let us write $u^k = U_1, \dots, U_k$, $\bar{F}_{u^k c^{\kappa}_{\min}}(P^k) = \min_{g^k} \bar{F}_{g^k}(P^k)$.
Now we will relate the double-averaged fidelity of a k-round protocol to the double-averaged fidelity of the test. Indeed, we will show that these quantities are equal.

Lemma 12. The double-averaged fidelity of a k-round protocol $P^k$ of depth $k$ is equal to the double-averaged fidelity of the test $T^{\kappa}$ of the same depth, $\kappa = k$: $\bar{F}(P^k) = \bar{F}(T^{\kappa})$ (F23)

Proof. As stated in the main text, the proof of this lemma follows from noticing that the expression for the double-averaged fidelity contains only polynomials of degree 2 in every Clifford gate $C_j$. Therefore, averaging over the Clifford group is here equivalent to averaging over the entire unitary group, since the Clifford group forms a 2-design. Furthermore, the equality is possible since we have put $M^{T}_{j} \equiv M_j \circ E_j$, and $M^{T}_{j}$ encompasses the operations associated with sending and storing the qubit.

The above two lemmas, combined with the Hoeffding bound on $R_\kappa$, equation (21), complete the proof of theorem 6.

Appendix G. Q-qubit protocols
In this section we provide a description of a Q-qubit extension of our class of protocols. The structure of our description is exactly the same as the one from section 3.2 with the difference that all the operations are carried out on more than one qubit.
In a Q-qubit k-round protocol the nodes have a total of Q qubits available. At each round $j = 1, \dots, k$ of the protocol, nodes A and B can send any subset of their local qubits to one another. We denote all of the sending operations in round $j$ by $E_j$. Moreover, the nodes can store local qubits in the quantum memory and apply local gates. We denote these operations by $M^{(q_j)}_{A_j}$ and $G^{(q_j)}_{A_j}$ for node A, and $G^{(Q-q_j)}_{B_j}$ and $M^{(Q-q_j)}_{B_j}$ for node B. Here $q_j$ and $Q - q_j$ denote the number of local qubits on A's and B's side, respectively, at round $j$ after the sending operation $E_j$. Therefore, we can describe a Q-qubit k-round protocol with a map In the presence of noise, we assume the following noise model for Q-qubit k-round protocols:
• The noise on gates is independent of the applied gate;
• The noise from memories, gates and transmission channels acts individually on each qubit;
• For each round j, the qubits are subject to the same kind of noise on node A and node B (noise can differ from round to round).
Formally, we assume the following $M^{(q_j)}$ A bipartite Q-qubit, k-round protocol between any two nodes A and B consists of the following operations: ). As stated before, we assume that the measurement can be performed perfectly.
Steps (b)-(d) are performed in rounds $j = 1, \dots, k$, a total of k times. We denote memories and gates that are used by A and B at the jth round by M respectively. Such a protocol operates on a total number of Q qubits. Note that we model the noise map as a product over the Q qubits.
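One consequence of modelling the noise as a product over the Q qubits can be made concrete. For a map of the form $N^{\otimes Q}$, the entanglement fidelity is the Q-th power of the single-qubit entanglement fidelity, and the average fidelity over all Q-qubit states follows from the standard relation between the two. The sketch below only illustrates this relation. The function name is hypothetical, and it assumes the same single-qubit noise map on every qubit.

```python
def product_noise_average_fidelity(f_single, Q):
    """Average fidelity over all Q-qubit pure states of N^{tensor Q},
    given the single-qubit average fidelity f_single of N.

    Uses F_avg = (d * F_e + 1) / (d + 1), with F_e the entanglement
    fidelity, and F_e(N^{tensor Q}) = F_e(N) ** Q.
    """
    if Q < 1:
        raise ValueError("Q must be a positive integer")
    f_e_single = (3.0 * f_single - 1.0) / 2.0   # invert the relation for d = 2
    f_e_joint = f_e_single ** Q                 # product structure of the noise
    d = 2 ** Q
    return (d * f_e_joint + 1.0) / (d + 1.0)

for Q in (1, 2, 4):
    print(Q, product_noise_average_fidelity(0.9, Q))
```

The Q-qubit value differs from the single-qubit one, so fidelities on the full Q-qubit space cannot simply be replaced by single-qubit fidelities.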
Definition 15 (k-round protocols). Let $\mathcal{H}^{\otimes Q}$ be the Hilbert space of a bipartite quantum network. We define a k-round protocol as a CPTP map of the form $\Pi^{(Q)} \circ P^{k,(Q)} \circ \mathrm{Prep}^{(Q)}$, where:
• $\mathrm{Prep}^{(Q)}$ corresponds to the preparation of Q local qubits $|\psi\rangle_A \in D(\mathcal{H})$ (step (a)),
• $P^{k,(Q)}$ is a map describing k rounds of local operations (memories and gates) as well as sending qubits from A to B (steps (b)-(d)),
• $\Pi^{(Q)}$ is a local measurement of all the local qubits (step (e)).

Now let us describe a test that certifies the above functionality. It is a straightforward extension of the ping-pong test we have discussed before. The idea of the Q-qubit teleportation-based ping-pong test is, instead of teleporting a single qubit, to teleport all Q qubits back and forth between nodes A and B and to sample a random Q-qubit Clifford gate ($\mathrm{Cliff}(2^Q)$) at line 3 of test 2. The initial state of the Q qubits, $|\psi\rangle\langle\psi|^{(Q)}$, is chosen uniformly at random from a 2-design of Q-qubit states. In this case the test can be described with a map $T^{\kappa,(Q)} = \bigcirc_{j=1}^{\kappa} C^{(Q)} \circ N_j^{\otimes Q}$. Based on the average fidelity estimate of this test, $r(T^{\kappa,(Q)})$, one can again check whether memories and gates were used together by satisfying an analog of the bound (22). This can be done provided one has access to estimates of the quality of memories, $\bar{F}(M^{(Q)}_j) = \int d\psi^{(Q)}\, \mathrm{Tr}\left[ (M^{T}_{j})^{\otimes Q}(|\psi\rangle\langle\psi|^{(Q)}) \cdot |\psi\rangle\langle\psi|^{(Q)} \right]$, and gates, $\bar{F}(N^{(Q)}_j) = \int d\psi^{(Q)}\, \mathrm{Tr}\left[ N_j^{\otimes Q}(|\psi\rangle\langle\psi|^{(Q)}) \cdot |\psi\rangle\langle\psi|^{(Q)} \right]$, for all $j$. Note that here we necessarily use the fidelity of $M^{(Q)}_j$ and $N^{(Q)}_j$ evaluated on the space of all Q-qubit states.

Now we can extend theorem 6 to Q-qubit protocols using the noise assumptions on the class of protocols. We arrive at the following statement.
Theorem 7 (Bounding the behavior of Q-qubit k-round protocols). Given that the noise model is the same for all Q qubits at each round j, the performance of any Q-qubit k-round protocol can be bounded in terms of an estimate for the double-averaged fidelity $R(T^{\kappa,(Q)})$ of the Q-qubit test in the following way where $d = 2^Q$ is the dimension of the underlying Hilbert space, and $|\mathrm{Cliff}(d)|$ is the size of the Clifford group for dimension $d$.
The proof of that statement is analogous to the single-qubit case, with the difference that here one uses the properties of the unitary 2-design given by the Clifford group of dimension 2 Q .
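Since the size of the Clifford group enters the statement, it is useful to see how quickly it grows with Q. The sketch below evaluates the standard closed-form count of the Q-qubit Clifford group modulo global phases, $2^{Q^2 + 2Q} \prod_{j=1}^{Q} (4^j - 1)$. Counting modulo phases is an assumption here, and the function name is illustrative. The convention intended for $|\mathrm{Cliff}(d)|$ in the theorem may differ.

```python
import math

def clifford_group_size(Q):
    """Number of Q-qubit Clifford gates, counted modulo global phase."""
    if Q < 1:
        raise ValueError("Q must be a positive integer")
    return 2 ** (Q * Q + 2 * Q) * math.prod(4 ** j - 1 for j in range(1, Q + 1))

for Q in (1, 2, 3):
    print(Q, clifford_group_size(Q))
```

The count grows as $2^{O(Q^2)}$, so any bound that depends on a power of the group size loosens rapidly as the number of qubits increases.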