Guaranteed recovery of quantum processes from few measurements

Quantum process tomography is the task of reconstructing unknown quantum channels from measured data. In this work, we introduce compressed sensing-based methods that facilitate the reconstruction of quantum channels of low Kraus rank. Our main contribution is the analysis of a natural measurement model for this task: We assume that data is obtained by sending pure states into the channel and measuring expectation values on the output. Neither ancilla systems nor coherent operations across multiple channel uses are required. Most previous results on compressed process reconstruction reduce the problem to quantum state tomography on the channel's Choi matrix. While this ansatz yields recovery guarantees from an essentially minimal number of measurements, physical implementations of such schemes would typically involve ancilla systems. A priori, it is unclear whether a measurement model tailored directly to quantum process tomography might require more measurements. We establish that this is not the case. Technically, we prove recovery guarantees for three different reconstruction algorithms. The reconstructions are based on a trace, diamond, and $\ell_2$-norm minimization, respectively. Our recovery guarantees are uniform in the sense that with one random choice of measurement settings all quantum channels can be recovered equally well. Moreover, stability against arbitrary measurement noise and robustness against violations of the low-rank assumption is guaranteed. Numerical studies demonstrate the feasibility of the approach.


I. INTRODUCTION
Recent years have seen significant advances in the precise control of quantum systems. Complex quantum states of systems with an increasing number of degrees of freedom can be prepared and manipulated with high accuracy. In this development, it is important to have tools at hand that allow one to completely characterize which state or process is actually realized in a given experimental setup. The task of reconstructing quantum states or process [1,2] from experimental data is variously called quantum state or quantum process tomography, estimation, or recovery.
The precise characterization of processes is highly relevant, e.g., in the quest for scalable quantum computers. The stringent requirements of the error correction threshold -and the adverse scaling of the error correction overhead in terms of the noise -make it necessary that implementations of quantum gates match their specification extremely closely. We note that full quantum process tomography is distinct from certification protocols or coarser characterization schemes like randomized benchmarking (which reports only a single number: a certain error rate). While this makes the latter type of protocols much cheaper to implement, only process tomography allows one to understand in precisely which way a quantum gate deviates from its specification.
The task of quantum process tomography -important as it is -comes at a high price: This is an unfavourable scaling of the necessary effort with the system size. To learn an unknown unstructured process acting on an n-dimensional quantum systems, m ∼ n 4 expectation values are required. However, common processes exhibit additional structure. Most importantly, quantum gates correspond to unitary processes, which are quantum channels with Kraus rank equal to one. Compressed sensing techniques allow one to estimate channels with (approximately) low Kraus rank from significantly fewer measurements than naively required. It is imperative to make use of this structure.

A. Motivation of our measurement model
The most general quantum process tomography setting allows "coherent measurements" of an unknown channel T . Here, m identical copies of T are available simultaneously. At the same time, only part of the input state may be sent through copies of T , while another part potentially entangled with the former is left unchanged. This in turn allows for performing measurements on T ⊗m ⊗ id n -where id n denotes the identity map on a n-dimensional additional ancillary Hilbert space that may be available -rather than "sequential" measurements on single copies of T . This includes the possibility of choosing global, entangled input states ρ global , and likewise performing correlated measurements on the respective output states (T ⊗m ⊗ id n )(ρ global ).
While potentially powerful, coherent measurements are arguably impractical. This is in particular true when attempting to diagnose errors in implementations of a quantum gates, as these are exactly the building blocks of circuits that would be used to prepare the entangled input state in the first place.
Avoiding this drawback, we adopt an experimentally feasible "sequential" measurement model. What is more, we resort to the situation where the local input states ρ loc are transmitted via individual copies of the unknown channel T in their entirety. This setting is often referred to as "ancilla-free" or "direct" process tomography. Data acquisition is then concluded by performing measurements on the individual output states T (ρ loc ).
We employ the following natural measurement setup: the unknown quantum channel T is applied to pure input states |ψ i ψ i |. Subsequently, the expectation value of an observable A i on the output is measured. Those expectation values are estimated by means of a suitably frequent repetition of the prescription choosing the same measurement setting. Repeating this procedure m times for different input states |ψ i ψ i | and observables A i results in a measurement vector of the form (1) For our theoretical recovery guarantees, the input states and observables are assumed to be sampled independently from certain ensembles -see below for details. We restrict attention to pure input states, as they are expected to yield the most information about T , and also because the preparation of pure input states is easily possible in most experimental setups. We expect it to be straight-forward to generalize this work to take mixed input states into account. Note that while this is the common way ancilla-free process tomography is considered, the final measurement process could potentially be modeled in more detail. In a typical experiment, one does not only learn the expectation value of an observable A, but rather acquires statistics about the POVM associated with its spectral decomposition. Here, we disregard this finer-grained information. We have made this decision to avoid technical complications: The various POVM elements associated with any given setting are clearly not independent -however our proof techniques work best for independently drawn measurements. There are now related theoretical recovery guarantees that can handle some form of dependency -e.g. Refs. [3][4][5][6][7][8]. Applying such techniques to process tomography remains an interesting open problem.

B. Our contribution
We investigate several channel reconstruction protocols that exploit the measured channel to have an (approximately) low Kraus rank. Inspired by compressed sensing, these protocols require considerably fewer measurements of the form (1) than traditional process tomography schemes and contain computationally tractable reconstruction algorithms. We provide rigorous recovery guarantees featuring very desirable properties (uniform, stable, and robust). This analytical work is complemented by a comprehensive numerical analysis of the reconstruction protocols. In contrast to previous compressed sensing results that are applicable to process tomography, our reconstruction protocols specifically address the natural process tomography setup described above. Our focus is on a particularly simple reconstruction: i) A least squares fit of the observed data subject to an additional positivity constraint. We call this approach CPT-fit. A distinct advantage of the CPT-fit is that the algorithm does not depend on any parameters. In particular, no estimate of the true rank or the noise strength is required. This stands in contrast to most compressed sensing-based based reconstruction schemes. Also, the CPT-fit appears to be particularly robust against violations of the assumption of low Kraus rank. In order to link and compare the CPT-fit to more established low-rank matrix reconstruction methods we also investigate other reconstruction protocols. Their analytical analysis is also the basis for the recovery guarantees of the CPT-fit.
ii) The first reconstruction method corresponds resembles a typical low-rank matrix reconstruction protocol: minimize the trace norm subject to convex constraints that take into account acquired data of the form (1). iii) The second recovery algorithm closely resembles the first one, but contains trace preservation as an additional (linear) constraint. iv) Otherwise similar to algorithm iii), this reconstruction protocol replaces the constrained trace norm minimization by a diamond norm minimization. The constraints remain unchanged. The diamond norm is a wellmotivated distance measure for quantum channels. A priori, this does not imply in any way that the diamond norm is also a good choice as a regularizing function for channel estimation. However, Ref. [9] does provide strong analytical and numerical arguments for why one should expect the diamond norm to outperform the trace norm as a regularizer in this setting. The numerical studies conducted in this paper lend further credence to this claim. v) Similar to algorithm iv), but with an additional trace preservation constraint. We provide rigorous performance guarantees for the approaches i), iii) and v). The guarantees are valid for measurement setups where the pure input states |ψ i and the eigenbases of the observables A i are drawn at random from certain ensembles. Technically, we require the state vectors and the eigenbases to form complex 4-designs in a precise sense. Haar-random vectors have this property, but recent results on the fourth moments of the Clifford group [10,11] suggest that there are more structured and explicit ensembles with the same property.
Even if a particular practical setup fails to show the 4design property, our results are still a relevant proof of concept. Indeed, in contrast to previous compressed sensing approaches, our reconstruction guarantees show that the specific mathematical structure present in process tomography does not imply that a larger number of measurement settings is required.
What is more, we provide numerical simulations showing that the recovery algorithms work well for a number of measurement models not having the 4-design property. This includes the paradigmatic case of Pauli-type measurements.
Regarding test processes, we consider generic channels with varying Kraus rank, as well as the Toffoli gate. This is a three-qubit unitary quantum gate that is highly relevant in quantum processing. Moreover, it has been experimentally implemented in various architectures [12][13][14][15].

C. Related work
Recent years have seen considerable advance of compressed sensing tools for the tomographic reconstruction of low-rank quantum states [16][17][18][19][20]. To some extent, these findings have also been adapted to cover process tomography. Such compressed sensing approaches allow for reconstructing processes with low Kraus rank from much fewer expectation values than a naive dimension count would suggest.
Our trace norm minimization (algorithms ii) and iii)) builds on a by now well-established methods for low-rank matrix reconstruction [16]. With the diamond norm minimization (algorithms iv) and v)) we continue a line of research that has been started in Ref. [9] and support it with rigorous performance guarantees.
The CPT-fit has been suggested in Ref. [18]. This work also contains numerical studies that demonstrate the capability of such a reconstruction procedure. We expand on these ideas by conducting additional numerical experiments. But, more importantly, we also prove rigorous recovery guarantees for the CPT-fit.
Ref. [16] presents a process tomography protocol that is based on low-rank matrix reconstruction with random Pauli measurements [21][22][23]. These low rank recovery guarantees can be applied to the Choi-Jamiołkowski representation of a quantum channel, since the rank of this matrix representation equals the Kraus rank of the original channel. On first sight, such an approach requires the use of an ancilla in order to implement the Choi-Jamiołkowski representation physically in a concrete application [24]. However, Ref. [16] also provides a more direct implementation of their protocol that does not require any ancillas. Valid for multi-qubit processes this trick exploits the tensor-product structure of (multi-qubit) Pauli operators. It allows for effectively measuring the Pauli expectation value of a Choi matrix by performing several natural channel measurements of the form (1) and evaluating particular linear combinations thereof in a classical post-processing step. The demerit of this approach is that the number of individual channel measurements required to evaluate a single Pauli expectation value scales with the dimension of the underlying Hilbert space. The measurement model studied here does not require such a coarse-graining: every natural measurement (1) itself already corresponds to a valid measurement instance. This considerably reduces the number of different measurement settings that are required in order to acquire sufficient data.
Randomized benchmarking has also been adapted to allow for process tomography [19]. Again, representing channels by their Choi matrix is a key intermediate step. Ref. [19] adopted randomized benchmarking techniques to obtain a process tomography protocol that is robust towards state preparation and measurement errors. This is a distinct advantage over other protocols that do not share this additional feature. However, it comes at a prize: the analytical performance guarantees in Ref. [19] require an additional "non-spikiness condition" on the channels that are to be re-constructed. This restriction also features in the algorithmic reconstruction as linear constraints. The total number of these additional constraints depends on the concrete measurement ensemble and is lowerbounded by the 4-th power of the underlying Hilbert space dimension, which affects the computational overhead of the reconstruction.
Finally, Ref. [17] also considered process tomography via compressed sensing techniques. However, this method is somewhat different from the other approaches presented here: Instead of assuming low Kraus rank, they consider processes that are element-wise sparse with respect to a known basis. Prior knowledge of this sparsifying basis is a necessary prerequisite for this approach. In turn, techniques from compressed sensing [25,26] are applied (rather than low-rank matrix reconstruction protocols) to reconstruct such processes from a number of measurements that is proportional to the sparsity and depends only logarithmically on the ambient system sizes. While requiring only very few measurement settings, the main disadvantage of this approach are stronger model assumptions (sparsity and knowledge of the sparsifying basis) and measurements that may be challenging to implement in practice.

D. Experimental considerations
In several physically important platforms, process tomography has been experimentally realized [27][28][29], both in the direct and hence ancilla-free [27,29] and ancilla-based [28] reading of the task. These works make use of different preparations and measurements. It should be clear, however, that in several physical architectures, random measurements of the type discussed here can readily experimentally be implemented.
Specifically, Haar random unitary maps have been realized in a 16-dimensional Hilbert space associated with the 6S 1/2 ground state of 133 Cs atoms [30]. This has been achieved by suitably exploiting a time-dependent Hamiltonian evolution giving rise to Haar random unitaries, making use of methods of quantum control. Such random unitaries have been put to use in a tomographic protocol, in an approach of performing quantum tomography based on Haar random unitaries that builds upon earlier theoretical ideas laid out in Ref. [31]. The same idea of generating Haar random unitaries using suitable time-dependent Hamiltonians should readily apply to systems of trapped ions [32], where a suitable type of control can be achieved with present technology. Indeed, in a different context, methods of optimal control have readily been applied to optimize quantum gates for trapped ions [33].
Random unitaries, albeit not Haar distributed, that allow for quantum state tomography of quantum many-body systems have been theoretically considered [34], making use of operations only that can be considered basically feasible in Figure 1. The Choi-Jamiołkowski isomorphism and partial trace in terms of tensor network diagrams (explained in Figure 9). Left: Order-4 tensor M ∈ L(L(V) → L(W)) as a map from experiments with cold atoms in optical lattices [35], such as optical super-lattices and time-of-flight measurements.
In integrated linear optical architectures [36,37], Haarrandom circuits are readily conceivable [38]. It is this type of setting in which the methods presented here are most applicable, even though they apply to any physical system in which unitary 4-designs are feasible. This includes settings in which random circuits [39,40] or randomly time-fluctuating dynamics [41], as will be discussed in more detail in the outlook.

E. Notation and terminology
In this section we introduce some basic preliminaries needed to understand our main results. For any integer n ∈ Z + we use the notation [n] := {1, 2, . . . , n}. The space of linear operators on a vector space V is denoted by L(V).
If V is a Hilbert space, we denote vectors by |ψ ∈ V and its adjoints (conjugate transposes) by ψ |. So the projector onto a normalized |ψ is |ψ ψ |. Moreover, the real vector space of self-adjoint operators is denoted by Herm(V).

Maps on operators
We denote the space of linear maps taking operators to operators by L(C n ) := L(L(C n )). The Choi-Jamiołkowski isomorphism [42,43] J : L(C n ) → L(C n ⊗ C n ) takes such maps to operators on a tensor product space and is given by (see also Figure 1 for a tensor network description) for all A ∈ L(C n ) or, equivalently, if Tr 1 [J(M )] = 1, where Tr 1 : L(C n ⊗ C n ) → L(C n ) denotes the partial trace over the fist tensor factor. The affine subspace of hermiticity and trace preserving maps is denoted by HT(C n ) ⊂ L(C n ). The identity map id k ∈ L(C k ) is a particularly simple element of HT(C n ). Moreover, M ∈ L(C n ) is completely positive if for all k ≥ 1 one has that (M ⊗ id k )(A) is positive semidefinite for every positive semidefinite operator A ∈ L(C n ⊗ C k ). We will use the shorthand notation A 0 to indicate positive-semidefiniteness of A. In fact, a map M is completely positive if and only if (M ⊗id n )(A) 0 (k = n). This in turn is equivalent to J(M ) 0.
The convex subset of completely positive and trace preserving (CPT) maps is denoted by CPT(C n ) ⊂ L(C n ). Importantly, these maps take density matrices to density matrices, even when applied to subsystems. Therefore, they are also called quantum channels and are a very general description of quantum processes, e.g. dynamics.

Norms
and the ∞ -norm is v ∞ := max j∈[n] |v j |. The Schatten q-norm A q of an operator A corresponds to the q -norm of A's singular values arranged in an n-dimensional vector.
Here, we will specifically encounter the following Schatten norms, Note that rank-one projectors |ψ ψ | are unit normalized with respect to any Schatten-p norm. The spectral norm is also an example of an induced norm. This concept can be generalized to norms on maps M ∈ L(C n ), and the induced trace norm amounts to Finally, the diamond norm is a stabilized version of the induced trace norm: This norm is a particularly meaningful and widely appreciated distance measure for quantum processes [44]. Perhaps surprisingly, it can be calculated efficiently [45][46][47] as a semidefinite program. Regarding the reconstruction of structured maps, the diamond norm can serve as a convex surrogate for low Kraus rank [9]. This is the case for channel reconstruction and two of the five reconstruction protocols presented in this work build on this idea. Figure 2. The tensorial structure of one measurement setting: T is mapped to Tr U A0U † T ( |ψ ψ |) .

Spherical and unitary designs
The probability measures on the unitary group, respectively, the unit complex sphere that is invariant under multiplication with any unitary is called Haar measure and is the natural uniform measure on these sets. A complex projective t-design [48][49][50] is a probability distribution µ on the unit sphere that reproduces the moments of the Haar measure up to order t + t, Similarly, a unitary design [51,52] is a probability distribution ν on the unitary group satisfying for all operators X ∈ L(C n ) ⊗t . Typically, designs are considered to be uniform distributions over finite sets.

Measurement terminology
In the compressed sensing literature, a single measurement is usually given by an inner product, which is a Hilbert-Schmidt inner product Tr[AT ( |ψ ψ |)] in our case. In quantum physics terminology, a (ψ, A) would correspond to a measurement setting, whereas a corresponding measurement would be the von-Neumann measurement of the observable A in the state T ( |ψ ψ |) giving rise to an expectation value Tr[AT ( |ψ ψ |)] in the limit of infinitely many von-Neumann measurements. In order to also account for finite sample errors we always allow for additive noise on the obtained expectation values Tr[AT ( |ψ ψ |)].

II. RESULTS
In this section, we explicitly specify our measurement model of natural measurements, explain how we computationally reconstruct the quantum channels from the measurements, state our recovery guarantees, discuss their stability and robustness properties, discuss our numerics on Pauli-type measurements, and derive a sample complexity upper bound from our recovery guarantees.

A. Measurement model
We consider the task of reconstructing a quantum channel T ∈ CPT(C n ) from measurement data of the form (1): The unknown channel receives pure states |ψ i ψ i | as input and subsequently expectation values of observables A i are measured for the output state; see Figure 2 for a pictorial description. This procedure is repeated m times with different measurement settings (input states and observables). Hence, the entire measurement process leads to a measurement vector with single expectation values The vector e ∈ R m denotes additive noise present in the measurement process. In contrast to previous approaches [16,23], no prior assumptions on the nature of this noise corruption are required. Throughout this work, we consider instances of random that are independent instances of a measurement ensemble (A, |ψ ) that meets the following requirements: Definition 1 (4-generic measurement ensemble). We call a measurement ensemble (A, |ψ ) with observable A and state |ψ 4-generic if it fulfils the following criteria: i) The distribution of |ψ in C n is a spherical 4-design (9), i.e., it reproduces the first four moments of the unitarily invariant (Haar) measure on the complex unit sphere. Corresponding expectation values Tr[AT ( |ψ ψ |)] of a quantum channel T ∈ CPT(C n ) are referred to as 4-generic measurement and normalized 4-generic measurement, respectively.
This definition encapsulates a variety of process measurement ensembles. In particular, the generic measurement ensemble ( |ψ and U are Haar random states and unitaries, respectively) meets the requirements by definition. However, our recovery guarantees do not require such a strong notion of randomness in the measurement design: 4-designs are sufficient.
We also expect that the actual channel recovery works similarly well if |ψ and U are chosen from other distributions as well, provided that they cover the complex unit sphere and U (n) "evenly" enough (see Section IV for a further discussion). We confirm this expectation with numerical simulations, see Section II E.

B. Reconstructions
In this section, we lay out how a quantum channel T can be reconstructed from data of the form (11) for measurement settings given by A. We put an emphasis on describing the fitting method T 2 referred to as CPT-fit. To complement this approach, we also investigate the reconstruction methods T * η and T * c η as versions of low-rank matrix reconstruction as well as T η and T c η as reconstruction methods based on the diamond norm.
To start with the former, we minimize the square loss under the model constrained that T is a quantum process to obtain This minimization is essentially a fit under a CPT-constraint and we will call it CPT-fit. It has been suggested and numerically investigated in Ref. [18]. This approach makes use of the measurement data y alone.
More common low-rank matrix reconstruction methods in compressed sensing use the trace norm as a so-called regularizer to favour low-rank solutions [22,[53][54][55]. These reconstructions do not only make use of the data y, but require an a-priori bound η > 0 on the noise e, Applied to the Choi matrix J(T ) of a quantum channel T the usual trace norm regularization leads to the reconstructions These two approaches are very similar. However, the second one contains an additional constraint that enforces hermicity and trace preservation. The diamond norm is well-known in quantum information theory as a measure of distinguishability of quantum channels by expectation values [56,Chapter 11], [57] and can practically be calculated and minimized using a semidefinite program [47]. For non-obvious reasons, it can also be used as a reguralizer for the reconstruction of certain maps on operators [9], leading to the reconstructions T c η := arg min{ T : where η ≥ 0 is again an anticipated bound on the measurement noise. Superior reconstruction properties over simple inversion methods are expected for channels with small Kraus rank with unitary channels being the optimal case. Perhaps surprisingly at first sight, the CPT-fit performs equally well as the other reconstruction methods based on the trace (15), (16) or diamond norm (17), (18) minimization. However, a quantum channel T has a Choi matrix J(T ) with constant trace norm J(T ) 1 = n. Hence, one can omit the trace norm minimization in the the program (16) when the constraint J(T ) 0 is added to that one has T ∈ CPT(C n ). Minimizing the square loss A(T ) − y 2 instead yields the CPT-fit (13).   (15), (16), (18) and (18) is chosen to be ten times machine precision.
We mention once more that existing recovery guarantees for low-rank matrix reconstruction [4,22,23,54] do not apply to the measurement model considered here. Our measurement setting (12) carries a certain product structure (see Figure 2) that is more restrictive than the structure covered in prior works.

C. Noiseless case
For sake of clarity, we first discuss the noiseless case (e = 0) and will discuss stability and robustness in Section II D.

Recovery guarantees I
We have recovery guarantees for the minimizations (16), (18), and (13), which have a trace preservation as a constraint.
Theorem 2 (Uniform recovery guarantees (noiseless case)). Fix r ≤ n and suppose that A : 4-generic measurements which are chosen independently from the measurement ensemble introduced in Definition 1. Then, with probability at least 1 − e −λm , all T 0 ∈ CPT(C n ) with Kraus rank at most r may be reconstructed exactly from noiseless measurements y = A(T 0 ). This exact reconstruction is achieved by each of the minimizations (16), (18), or (13).
Here, C and λ are universal constants (which can, in principle, be extracted explicitly from the proof).
This statement follows from more general results, namely Theorems 20, 21, and 25 below. Stability and robustness are discussed with Theorems 3 and 4.

Remarks :
i) The number of real parameters required to specify a hermiticity preserving map T 0 ∈ L(C n ) is n 4 . If said map is also trace preserving, then n 4 − n 2 many parameters are required. The same holds for T 0 ∈ CPT(C n ). If the Kraus rank of T 0 is rank(J(T 0 )) = r then T 0 is compressible in the sense that only a number of parameter scaling as rn 2 is required and also sufficient to describe T 0 . Therefore, up to the factor ln(n) the scaling (19) of the number of measurements is optimal. ii) The scaling (19) also contains a constant C that we have not bounded explicitly. Numerically, we find for n = 4, 8 that m ≥ 5 r n 2 generic (Haar-random) measurements are sufficient for recovery, see Sections II C 2 and II D 2. Such an approach is very common in compressed sensing: Recovery guarantees, often with unknown constants and logarithmic factors in the scaling of the number of measurements, ensure the functioning of the reconstruction procedure also in the limit of large dimensions. The precise number of measurements and the expected errors in a given setting are often determined numerically. Here, also more special measurement settings can practically be considered. In Section II E, we numerically investigate Pauli-type measurements, which are not covered by our recovery guarantees.

Numerical demonstration
Our variational reconstructions can be recast as standard convex optimization problems (see also Appendix A). These can be solved computationally efficiently and also practically using standard software such as CVX [58,59]. For the reconstructions (16), (17), and (13) that have trace preservation as a constraint we have proven that a number of m ≥ m 0 := C r n 2 ln(n) (20) measurement settings is sufficient to recover quantum channels of Kraus rank r. We expect similar reconstruction properties for the unconstrained trace and diamond norm reconstructions (15) and (17), although our proves do not directly carry over to that case. Indeed, for many situations low-rank matrices can be recovered via trace norm minimization from a number of measurements with such an essentially optimal scaling [20,22,23,54,60]. Different reconstruction procedures with the optimal scaling (20) often have a different constant C. For instance, Ref. [61] shows that a constant of size C 6 provably suffices for Gaussian measurement matrices (even without the ln(n) factor). On the other hand, the results presented in Refs. [22,23] are valid for more structured Pauli measurements and require a much larger constant. Numerical studies typically highlight a similar behaviour.
Exploiting additional prior information -such as the fact that quantum channels preserve both hermicity and traces in the algorithmic reconstructions (16), (18) -can only lead to an improvement. In fact, these constraints also facilitate the mathematical analysis. Besides being able to prove recovery guarantees, we also observe the benefit of such additional constraints numerically: Figure 3(a) shows the recovery rates of different approaches.
This recovery rate is determined as follows. We say that a channel T 0 is successfully reconstructed if the reconstructed channel T rec is close to the original one: T rec − T 0 F ≤ 10 −5 . We have chosen 10 −5 as threshold because we have observed the reconstruction errors to be typically well separated from this value. Varying it changes the curves in the plot only slightly. The precision of the convex optimization software CVX [58,59] in terms of the machine precision eps = 2 −52 ≈ 2.2 · 10 −16 is √ eps ≈ 1.4 · 10 −8 . This somewhat limits the choice of thresholds. In order to obtain the rate shown in the plots, we run 100 independent instances and obtain the rate as the percentage of trials with a successful reconstruction.
Our numerical studies are based on generic measurement ensembles (Haar random input states, Haar random unitaries) and we have chosen A 0 to be a diagonal n × n matrix whose spectrum evenly covers the interval [−1, 1].
Interestingly, the unconstrained diamond norm minimization (17) performs almost as good as its constrained counterpart (18). The heuristic reason for this behaviour is that the diamond norm "favours" maps that satisfy a certain trace preservation condition [9,62], which is fulfilled for CPT maps. Moreover, the so-called descent cone at CPT-maps of the diamond norm is contained in an intersection of descent cones of nuclear norms [9]. If this containment is strict, it also leads to an improved recovery recovery guarantee.
Arguably the simplest reconstruction procedure is the fit under the CPT constraint (13). Here, we observe that this protocol achieves a rate that is very similar to the other constrained procedures (16), (18). Moreover, the CPT-fit (13) clearly has the fastest computation time in our simple implementation in CVX [58,59] Figure 4. Retrieval of random quantum channels T0 ∈ CPT(C 4 ) of different Kraus ranks r = rank(J(T0)) from the CPT-fit (13). The white region corresponds to 100% observed recovery and the black one to 0%. Parameters and measurements are as in Figure 3.
linearly on the Kraus rank r of the channels to be reconstructed. This dependence is confirmed for small r in our numerics for Hilbert space dimension n = 4, see Figure 4. For dimension n = 4 the Kraus rank is r ≤ n 2 = 16 and for r = 16 the number m = dim R (HT(C n )) = n 4 −n 2 = 252 of measurement values are required for the reconstruction. Here, dim R (HT(C n )) denotes the real dimension of the affine space HT(C n ). This observation explains the non-linear behaviour for larger r.

D. Stability and robustness
We consider two types of errors: i) the measurement errors e already indicated in the measurement model (12) and ii) model mismatches capturing violations of the assumption rank(J(T 0 )) ≤ r on the Kraus rank.
For this purpose, we write T ∈ CPT(C n ) as T = T |r +T ¬r where T |r is defined to be a CPT map of Kraus rank r encompassing the r largest eigenvalues [63] of the Choi-matrix J(T ) and T ¬r = T − T |r is a trace-annihilating map (i.e., Tr[T ¬r (X)] = 0 for all X ∈ L(C n )) that is called the model mismatch. In turn we relax our model assumption of low Kraus rank. In the interesting cases, the model mismatch is small.
A reconstruction is called stable if it tolerates measurement errors and it is called robust if it tolerates model mismatches.

Recovery guarantees II
We prove all reconstructions from Theorem 2 to be stable against measurement noise. Moreover, we prove the trace norm minimization (16) and the CPT-fit (13) also to be robust against model mismatches.
Theorem 3 (Stability of the diamond norm reconstruction (18)). Consider normalized 4-generic measurements from Definition 1. Then, under the hypotheses of Theorem 2, reconstruction via the constrained diamond norm minimization (18) is stable towards additive noise in the measurements y = A(T 0 ) + e: for any η ≥ e 2 , the associated reconstruction error obeys The constants C, λ, andc only depend on each other.
A more general version of this theorem -which allows for quantifying the noise strength by any q -norm -is also true; see Corollary 25 below.
Theorem 4 (Stability and robustness, trace norm minimization (16) and CPT-fit (13)). Under the hypotheses of Theorem 2, reconstruction via constrained trace norm minimization (16) is both stable towards additive noise corruption and robust with respect to relaxing the model assumption of Kraus rank r: This bound is valid for any e 2 ≤ η. Similarly, the CPT-fit (13) has an error bounded as (23) Once more, the constants C, λ, andc again only depend on each other.
This statement summarizes two results -Theorems 20 and 21 below -that are slightly more general: they allow for quantifying the reconstruction errors (22) and (23) by any Schatten-p norm with p ∈ [1, 2]. Moreover, the noise strength may be characterized by any q -norm: η ≥ e q .
Corollary 5 (Normalization of the measurements). Similar statements as in Theorems 3 and 4 hold also in the case of 4generic measurements. In this case, in the bounds (21), (22), and (23), η needs to be replaced by η/ A 0 ∞ and A 0 2 by the 2-norm of the traceless part of A 0 .
Remarks : i) Theorem 4 guarantees approximate reconstructions for any quantum channel T 0 ∈ CPT(C n ) without a prior rank constraint. The deviation T 0¬r from a channel of low Kraus rank, i.e., the model mismatch enters the error bound linearly. Such a robustness is desirable for practical applications where the model assumption of low Kraus rank is typically only approximately true. ii) For the constrained norm minimizations (18) and (16) a prior error threshold is required. This additional model selection task is important in actual applications and can be a non-trivial task; see c.f., Ref. [64]. The CPT-fit (13) has the distinct advantage that the reconstruction can be done without such a prior noise estimation.
iii) We emphasize that the dimensional scaling of the error bounds in Eqs. (22), (23) is a consequence of the normalization we adopted. To make this precise, it is useful to view these statements as results about stably reconstructing (particular) Choi matrices J(T ) ∈ L(C n ⊗ C n ) L(C n 2 ). Rewriting the single expectation values (12) as reveals that all individual measurement matrices Choosing a particular normalization influences the signalto-noise ratio (SNR) in (24) and, in turn, the stability assertions. In order to avoid these ambiguities, it is useful to rewrite Eq. (22) for vanishing model mismatch as with d := dim C n 2 . Note that the sum scales like m. Up to our knowledge, all stable low-rank matrix reconstruction guarantees are essentially [65] [20,Theorem 2]. iv) Regarding sample complexity, an order of rn 5 ln(n)/ 2 independent channel evaluations ("samples") are required to achieve a reconstruction error of at most in Frobenius norm; see Section II F below. We expect this scaling to be close to optimal up to ln(n)-factors (at least for channels with very low Kraus rank r). Evidence for this is provided by the discussion above: the stability guarantees in Theorem 3 and Theorem 4 essentially match the best existing results on stable low-rank matrix reconstruction. Among these are two that are applicable to the related problem of quantum state tomography: random Pauli measurements [23] and outer products of standard Gaussian vectors [20]. The sample complexity of the former approach has been determined in Ref. [16]. Moreover, it has been shown to be close to optimal in the sense that it reproduces a fundamental lower bound -valid for any tomographic procedure based on Pauli measurements -up to a single ln(n)factor. Fundamental lower bounds on the sample complexity achievable by any tomographic procedure have been derived in Ref. [66]. Said work also determines the sample complexity associated with measuring outer products of standard Gaussian vectors. Interestingly, this sample complexity matches the fundamental lower bound up to a factor of r ln(n), where r denotes the rank of the density operator in question. Thus, state tomography via lowrank matrix recovery from outer products of Gaussian vectors is close to optimal, at least for states that have very low rank.
We expect that similar results are true for quantum process tomography via low-rank matrix reconstruction of Choi matrices. However, while conceptually similar, the results regarding sample complexities of state tomography procedures [16,66] are not directly applicable to process tomography. We intend to address such an extension in future work.

Numerical example: Reconstructing the Toffoli gate
The Toffoli gate is a three qubit gate that has drawn a lot of attention from theorists as well as experimentalists. It is universal for classical computation and, together with the Hadamard gate, also for quantum computation [67]. Moreover, it has played an important role in the theory of gate sets [68] and can help to significantly reduce the number of gates in quantum algorithms, such as in quantum error correction. It also has been implemented experimentally in nuclear magnetic resonance [12], linear optics [13], trapped ions [14], and in superconducting circuits [15]. Hence, the Toffoli gate is a good candidate to benchmark our process tomography schemes.
We demonstrate uniform recovery, i.e., we draw a fixed number of measurement settings at random and keep them fixed throughout the simulation in Figure 5. Then, we always use the first m of them for reconstructions with m settings.
Since reconstruction via the unconstrained trace norm minimization (15) is numerically inferior to the other reconstructions (Figure 3(a) and (b)) we will not investigate it in all simulations.
There are two types of error sources: (i) imperfect measurements give rise to measurement noise e ∈ R m in our model (12) and (ii) the implemented channel T 0 ∈ CPT(C n ) could have violated the model assumption of a low Kraus rank. Here, we confirm our analytic stability result towards both error sources numerically.
Our theorems put minimal assumptions on the potential noise corruption e ∈ R m . In particular, it does not need to follow a specific statistical model. For sake of our numerical analysis, however, we draw e i.i.d. from a zero-mean Gaussian distribution and scale it to a desired value. Such an error model frequently occurs in practice: When estimating expectation values from observed frequencies such an error model arises naturally in the limit of many measurements per setting.
In order to demonstrate robustness of our reconstructions, we set T 0 to be a convex combination of the Toffoli gate T Toff and a completely depolarizing channel T dep (ρ) = n −1 Tr(ρ)1, where λ ∈ [0, 1]. The depolarizing channel corresponds to a physically relevant error model that maximally violates our model assumption of low Kraus rank; see, e.g., Ref. [69,Chapter 8.3.4].
In order to determine a reasonable number of measurement settings m for the investigation of the noise dependencies e 2 and λ, we first run a uniform reconstruction with different m in the ideal case with λ = 0 and e = 0. Again, we simulate 100 trials with Haar random unitary channels. It turns out that already for m = 320 all reconstructions are successful with extremely high probability, see Figure 5(a). To put this number into perspective, note that for n = 8 a unitary channel is given by 63 real parameters and the full dimension is dim(CPT(C 8 )) = 4032.
We reconstruct T Toff from m = 320 noisy measurements with different values of e 2 without model mismatch (λ = 0), see Figure 5(b). For the trace and diamond norm reconstructions (16), (17), and (18) we set the error parameter η = e 2 + 10 eps, where eps = 2 −52 ≈ 2 · 10 −16 is the machine precision. As predicted by Theorems 3 and 4, the reconstruction error T rec − T 0 F scales linearly in the noise strength e 2 , see Figure 5(b). The CPT-fit (13) has the smallest reconstruction error, see Figure 5(b). If one further increases the number of measurements, then the reconstruction error T rec − T 0 F decreases further. This is guaranteed by Theorems 3 and 4, but not shown here.
In order to probe robustness against model mismatches, we test the reconstruction of T 0 defined in (27) for different values of λ ∈ [0, 1]. For the sake of clarity, we completely suppress additive noise (e = 0) and set the error threshold η = 10 eps. The results are presented in Figure 5(c) and confirm the robustness guaranteed by Theorem 4. The reconstruction error T rec − T 0 F depends roughly linearly on the noise strength e 2 . Here, the diamond norm minimizations (17) and (18), perform worse than the constrained trace norm minimization (16) and the CPT-fit (13).
In the case of measurement noise, the reconstruction error T rec − T 0 F approaches the optimal value e 2 in the limit of large m, see Figure 5(d) left plot. In the case of a model mismatch, the reconstruction error decreases below the mismatch parameter λ and vanishes if m approaches the full dimension of CPT(C n ), see Figure 6 on the right.
We find it worthwhile to share one interesting numerical observation on the unconstrained diamond norm reconstruction (17): The optimal value T rec of the reconstruction seems to decrease with the reconstruction error T rec − T 0 F , see the right column of Figure 5. Hence, this reconstruction procedure does not only yield good approximations to the measured channel T 0 , but its optimal value also provides some indication of the reconstruction error's size.
One can exploit this observation by using both the CPTfit (13) and the unconstrained diamond norm reconstruction (17). The CPT-fit is the fastest reconstruction procedure and yields the smallest error. Complementing this, the diamond norm minimization provides an indication of what the error might be. In the CPT-fit, the reconstructed map T rec is always a quantum channel, i.e., T rec ∈ CPT(C n ). In contrast, the solution of the diamond norm minimization can, in principle, be any map in L(C n ). The latter also holds true for the unconstrained trace norm reconstruction (15), but we could not observe a similar feature of its minimum value.  To,oli gate, m=320 To,oli gate, m=320 are randomly selected once. Subsequently, the same set of measurements is used in every test run. The parameter η is chosen to be ten times machine precision plus the noise strength e 2 . Left: Success rate of the reconstruction over the number of measurement settings m, where the rate is the number of trails out of 100 with small errors T rec − T 0 F ≤ 10 −5 , for reconstructed channels T rec . Right, (a)-(c): The optimal value T η from the minimization (17) over the reconstruction error T η − T 0 F achieved by the unconstrained diamond norm minimization (17). To,oli gate, kek`2 = 0.05 CPT fit Figure 6. Uniform recovery under imperfect conditions as in Figure 5 (b) and (c) for constant error parameters e 2 = 0.05 and λ = 0.05, respectively. The horizontal red line marks these values.
The error bounds from Theorem 4 suggest that observables with larger Frobenius norm have a better noise suppression, as A 0 F appears in the denominator in the error bounds (22) and (23). We also tested this behaviour numerically in order to demonstrate that it is not just a proof artifact, but an actual feature. We choose A 0 to have r A many non-zero eigenvalues, which we evenly distribute in the interval [−1, 1]. These A 0 have a Frobenius norm in the interval [1, 2]. Figure 7 shows the average reconstruction error of the Toffoli gate for nonuniform measurements in dependence of A 0 F and noise strength e 2 = 0.1. This numerical analysis demonstrates that the reconstruction error can indeed be reduced with increasing A 0 F .

E. Pauli measurements
Our recovery guarantees hold for 4-generic measurements. However, these measurements are challenging to implement in many experimental situations. So, how do our recovery schemes perform for more more restricted measurement ensembles? In this section we numerically investigate two practically relevant measurement scenario of Pauli-type measurements.
For quantum state tomography [16], Pauli measurements are proven to satisfy the so-called restricted isometry property for rank-r matrices in L(C d ) for a number of measurement settings scaling as r d log(d) 6 [23].
Motivated by this strong statement, we numerically investigate process measurements that are inspired by a Pauli setting. We denote the set of Pauli strings by P ⊂ U(2 L ), i.e., the set of operators P = σ (1) ⊗ σ (2) ⊗ · · · ⊗ σ (L) with Pauli matrices σ (i) ∈ {1, σ x , σ y , σ z } for i ∈ [L]. We write P ∼ P for a Pauli string that is drawn uniformly at random from P. Then we choose the measurements as where observables P j and input states |ψ j are i.i.d. selected as follows. Each P j ∼ P is a uniformly drawn Pauli string and each state vector |ψ j is a tensor product of uniformly i.i.d. drawn eigenvectors of random Pauli operators {σ x , σ y , σ z } (hence, an eigenvector of the corresponding Pauli string). Numerically, we observe that for random unitary quantum channels our reconstructions perform very similar for these Pauli measurements and the generic (Haar-random) measurements (not shown in the plots). However, for non-generic channels we observe that the two types of measurements lead to different reconstruction behaviours, see Figure 8: For the reconstruction of the Toffoli gate, more Pauli-typemeasurements than generic measurements are required in the case of unconstrained trace norm regularization (15). In contrast, fewer Pauli-measurements are required for the other regularizations.
We have also observed that reconstructions from Paulimeasurements have similar stability properties as the ones in the generic case (not shown in the plots).
The identity quantum channel T 0 = id is an extreme case in the sense that it is sparse in any basis and commutes with all transformations. For generic measurements we have observed that it has the same reconstruction behaviour as Haarrandom unitary T channels. In comparison, in case of Pauli measurements, the unconstrained trace norm reconstruction works worse and the other reconstructions better (not shown in the plots).

F. Sample complexity
The expectation values in our measurements need to be estimated from finite samples leading to a reconstruction error e. Assume we want to estimate the measured channel T 0 ∈ CPT(C n ) in Frobenius norm up to an error e 2 ≤ . Then the sample complexity is the scaling of the optimal number of measurements in the ideal setting, i.e., without model mismatch or measurement errors.
We use the Landau symbols O and Ω to denote the usual asymptotic upper and lower bounds, respectively. The Landau symbolÕ denotes the same scaling as O up to log-factors.

G. Applications to fault tolerant quantum computation
The techniques presented here also have applications for fault tolerant quantum computation. Threshold theorems [70][71][72] give a theoretical guarantee that that quantum computers can be built in principle if the noise strength are below some threshold value. The strength of the errors needs to be quantified in diamond norm distance which is not directly accessible. Instead one typically evaluates average error rates using direct fidelity estimation [73], or randomized benchmarking, see e.g. [74]. However, these two error measures can differ by orders of magnitude [75]. This, in particular, is the case for coherent error sources, such as unitary over/under-rotations [76], where the diamond distance is proportional to the square root of the average error rate. Thus, achieving fault-tolerance thresholds in the presence of unitary noise requires an exceedingly high error control (around 10 −8 for typical threshold levels of a few times 10 −4 ). In contrast, other noise sources typically imply a much more favourable (linear) relation between both error measures.
From a practical perspective, these results are encouraging: Unlike incoherent noise, coherent noise effects can typically be corrected. A necessary subroutine for achieving this goal is to accurately estimate these error channels. Our results considerably simplify this task, in particular for unitary errors which have Kraus rank one. They provide estimation techniques that require considerably fewer measurements than traditional process tomography protocols.

III. ANALYTICAL DETAILS AND PROOFS
Our analytical results follow from by now well-established mathematical proof techniques for low-rank matrix reconstruction [20,60,61]. The main technical contribution of this work is to extend these techniques to natural measurements whose structure deviates from less structured measurement matrices common in low-rank matrix reconstruction.
Starting point of our analysis is a geometric approach to low-rank matrix recovery presented in [61]. It relates the reconstruction error from a constrained trace norm minimization to a certain quantity associated with the measurement map A: the minimum conic singular value; see Definition 8 below. We bound this quantity by invoking Mendelson's small ball method [77,78] -a strong probabilistic tool that depends on certain concentration properties of the measurement ensemble. For 4-generic measurements these are derived using representation theory of the unitary group and general bounds on tensor network contractions besides probabilistic bounds commonly used on low-rank matrix reconstruction.
This geometric analysis results in a reconstruction guarantee for the constrained trace norm minimization (16) that is stable towards additive noise corruption. In turn, the geometric arguments provided in Ref. [9] suggest a strengthening of the obtained error bounds, if one replaces the trace norm by the diamond norm. Theorem 3 -or, more generally: Theorem 24 and Corollary 25 -are consequences of such an approach.
Our second main result -Theorem 4 -assures stability towards noise corruption as well as robustness with respect to violating the model assumption of low Kraus rank. This further improvement is achieved via establishing a robust version of the null space property for 4-generic measurements. We refer to Section III A 4 for a brief introduction. The technical prerequisites for such an approach are largely identical to the ones associated with the more geometric framework outlined above. Mendelson's small ball method, in particular, is again applicable. Hence, relatively little additional effort is required for this approach which has the added benefit of ensuring robustness towards model mismatches.
The remainder of this section is organized as follows: After some preliminaries, we prove a bound on the minimum conic singular value for the case of our measurement setting in Section III B. The proof used a general bound on tensor networks, which we state and prove in Section III C. In Section III D, we state and prove general versions of our main theorems. Fi- nally, in Section III E, we show that our results also hold for a quantum linear optical setting.

A. Preliminaries
Before we come the to proofs we introduce some helpful notation, discuss the minimum conic singular value and a useful bound to it, introduce the null space property and a subsequent recovery guarantee, and explain the use of representation theory for bounding certain moments.

Notation
A vectorization of a tensor is a vector containing all its elements. The Frobenius norm T F of a tensor T is defined to be the 2 -norm of some vectorization of T . Importantly, for an operator A ∈ L(C n ) it holds that A F = A 2 .
The permutation group on k elements is denoted by S k and the unitary group by U(n) ⊂ L(C n ). Its representation on (C n ) ⊗k is denoted by R, so that for σ ∈ S k the representation R(σ) ∈ U (n k ) acts on (C n ) ⊗k by permuting the k tensor copies according to σ.
Affine spaces : In order to deal with the constraint that the reconstructed maps are trace preserving we need the affine version of the usual reconstruction problem [61] in compressed sensing.
It is helpful to extend the basic algebraic operations to sets: E.g., we denote the Minkowski sum of subsets S and R of some vector space by S + R := {s + r : s ∈ S, r ∈ R}. Similarly, we define −R and s + R for some s ∈ S.
An affine space V over the field R is a subset of a vector space over R such that for all λ ∈ R and x, y ∈ V one has (1 − λ)x + λy ∈ V. Then V 0 := V − V is a vector space and V = x + V 0 for any x ∈ V. The linear span lin(V) of V is another vector space containing both V and V 0 .
A map A : V → W between affine spaces V and W is called affine if for all λ ∈ R and x, y ∈ V one has A((1 − λ)x + λy) = (1 − λ)Ax + λAy.
Given an affine map A : V → R m (such as the measurement map defined in (11)) one can extend it linearly to lin(V). This extension will also be denoted by A.
Maps and operators : Any map M ∈ L(C n ) and operators A, ρ ∈ L(C n ) satisfy the identity We will use this identity to write expectation values of the type Tr[AT (ρ)] in terms of the Choi matrix J(T ). By M † we denote the Hilbert-Schmidt adjoint of M ∈ L(C n ) and by M the map which obeys M (X) † = M (X † ) for all X ∈ L(C n ). These two involutions satisfy M † = M † . Moreover, we use the notation We note in passing that CPT maps T ∈ CPT(C n ) have diamond norm T = 1 and Choi matrices are normalized as J(T ) 1 = n.
Truncating Choi matrices : It will be helpful to define the set containing all Choi matrices J HT(C n ) = {X ∈ Herm(C n ⊗ C n ) : Tr 1 (X) = n 1 n }, where Tr 1 denotes the partial trace over the first tensor factor. Consistently with the truncation of maps in L(C n ), the truncation X |r ∈ J HT(C n ) of X ∈ J HT(C n ) to rank r is given by setting all but the largest r singular values of X to zero. The corresponding tail is defined to be X ¬r := X − X |r ∈ J HT(C n ) 0 .

Normalization and centralization of the observables
We assume the fixed observable A 0 to be normalized ( A 0 ∞ = 1) and and centered (traceless): Tr[A 0 ] = 0. Such restrictions somewhat simplify the technical analysis; the following observations highlight that little generalization is lost by imposing them: Observation 6 (Uncentered observables). Fix A 0 ∈ Herm(C n ) and letÃ be the traceless part of A 0 . Then, the associated 4-generic measurement maps A andÃ introduced in Definition 1 do not necessarily coincide. However, the algorithmic reconstructions T rec andT rec of any trace-preserving map T nonetheless coincide for all reconstruction procedures (15), (16), (17), (18), or (13).
Proof. Since any T ∈ CPT(C n ) is trace preserving, we have thatÃ Together with the definition of y andỹ this implies But this means that the corresponding minimizations are the same.
The following observation says that the observable's spectral norm suppresses the noise.

Minimum conic singular value
The usual minimum conic singular value of a map A can be written variationally as the right hand side (RHS) of Eq. (38) with K being the full space and q = 2. In order for A to be invertible, the minimum conic singular value needs to be positive. Moreover, for fixed spectral norm of A, the larger the minimum conic singular of A the more stable A can be inverted. An extension of these basic concepts can be extended to realm of convex analysis, which motivates the following Definition.
Definition 8 ( q -minimum conic singular value). Consider an affine space V where lin(V) is equipped with a norm · . Let A : V → R m be an affine linear map and K ⊂ V 0 be a cone. The minimum singular value of A w.r.t. K, measured in q -norm with q ≥ 1, is where A has been extended to lin(V).
Typically, one chooses q = 2 in order to define the minimum conic singular value [61]. Here, we opt for a more general definition that allows for adjusting stability guarantees towards noise to different types of noise models. For instance one may choose q = 2 for Gaussian noise and q = 1 for Poissonian noise.
The conic singular value of our measurement map A (see Eq. (12)) w.r.t. a cone K ⊂ L(C n ) can be written as We use a method established by Mendelson [77,78] in order to bound the minimum conic singular value. As suggested in Ref. [ (the same for all i) and mean empirical width where i ∈ {−1, 1} are independent uniformly random signs. Then, for any ξ > 0, λ > 0, and q ≥ 1 with probability at least 1 − e −λ 2 /2 .
Using that v 1 ≤ m 1/q v q for any v ∈ C m results in (42) for any q ≥ 1.

Null space property
If an operator X is of rank r one has X 1 ≤ √ r X 2 . Here, we rely on the idea to take a similar inequality to define the notion of effectively rank-r elements. This notion of effective low rank is often enough for low-rank matrix reconstruction. Additionally, one can take into account violations of X being of low rank. This idea is formalized with the null space property (NSP).
Definition 10 (NSP). For q ≥ 1, a linear map A : J HT(C n ) → R m satisfies the q -null space property for rank r with constants µ ∈ ]0, 1[ and τ > 0 if for all X ∈ HT(C n ) 0 We will use the following result from Ref. [60] in our setting. It yields recovery guarantees from the NSP.

Tools from representation theory
In this section we simplify the k-th moments of the random variables U AU † and |ψ ψ |, where U and |ψ are drawn independently from the Haar measures. These derivations will be used when we bound the moments of In order to compute expectation values over the unitary group of the type we will use some basic facts from representation theory. Specifically, we use the decomposition from Schur Weyl duality into irreducible representations (irreps) ρ n λ and π k λ of the unitary group U(n) and the symmetric group S k , respectively. The irreps are labelled by Young diagrams λ with k boxes and at most n rows. Since [U ⊗k , E] = 0 for all U ∈ U(d), Schur's Lemma implies that E can be written as E = λ 1 ⊗ Y λ , where Y λ ∈ ρ n λ . But we also have that [σ, E] = 0 for all σ ∈ S k and Schur's Lemma implies that where each a λ ∈ C and each P λ is the projection onto π k λ ⊗ρ n λ . The coefficients can be calculated as the expansion coefficients This argument also yields that The coefficients are b λ ∝ Tr[P λ F ] and we have |ψ ψ | ⊗k P Sym k = |ψ ψ | ⊗k and P λ P λ = 0 for λ = λ . Together with Tr[F ] = 1 we obtain that where P Sym k is the projector onto the fully symmetric subspace in (C n ) ⊗k . First moments : Setting k = 1 and using that Tr[A 2 ] = A 2 2 and |ψ ψ | 2 = |ψ ψ | yields that and Second moments : The flip operator F ∈ L(C n ⊗ C n ) is given by the linear extension of F |ψ ⊗ |φ := |φ ⊗ |ψ .
The projector onto the symmetric subspace of C n ⊗ C n can be written as P Sym 2 = 1 2 (1 + F). The dimension of the fully symmetric subspace in C n ⊗ C n is
(58) In order to derive the second moment of U AU † we also need the projector onto the anti-symmetric subspace P ∧ 2 = 1 2 (1 − F). From Eq. (48) and Tr[1] = n 2 and Tr[F] = n we obtain E = 2 Tr P Sym 2 A n(n + 1) Evaluating the remaining traces in a basis or using tensor network diagrams yields

B. Our bound on the minimum conic singular value
The following theorem is the main technical result of this work.
Theorem 12 (Bound on λ min (A, K)). For i ∈ [m] let |ψ i be state vectors drawn i.i.d. from the Haar measure on the sphere in C n . Moreover, let (A i , |ψ i ) i∈[m] be a normalized 4-generic measurement ensemble (Definition 1). Denote by A : HT(C n ) → R m the linear map given by the components Moreover, for c µ > 0 denote by the q -minimum conic singular value of A is lower bounded as with probability at least 1 − e −c 2 λ m/5 over A.
Proof of Theorem 12. We choose E in Lemma 9 to be E ν from Lemma 13. Inserting the bounds from Lemmas 14 and 13 into Eq. (42) implies with probability at least 1 − e −λ 2 /2 over A. Choosing ν = ν 0 := n n(n + 1)/ A 2 and using that (n+1)(n−1)/n 2 ≤ 1 and ln( √ 2n) ≤ 3 2 ln(n) for n ≥ 2 (the theorem is trivial otherwise) yields Next, choosing ξ = 1/ √ 10 yields a maximum value greater than 1/5 for ξ 1 − 2ξ 2 2 . This choice yields the bound Furthermore, we set λ = c λ So, we choose m > m 0 with r ln(n)n(n + 1) (69) in order guarantee that the infimum yields a positive value. This choice leads to As the infimum over E ν is homogeneous in ν (i.e., "proportional" to all ν ≥ 0), we obtain for the cone where we have remembered the choice of ν 0 form the beginning of the proof. This bound holds with probability at least 1 − e −λ 2 /2 = 1 − e −c 2 λ m/5 over A for any c λ ∈ ]0, c[.

Upper bound on the mean empirical width Wm
We prove an upper bound on W m defined in Eq. (41) for E being a slice of the cone of effectively rank-r maps.
Lemma 13 (Bound on W m ). For r ∈ Z + , ν > 0, c µ > 0 let and For m ≥ 2n 2 ln(n) let |ψ i let (A i , |ψ i ) i∈[m] be a normalized 4-generic measurement ensemble (Definition 1). Then the mean empirical width (41) is bounded as Proof. By definition (41), W m reads as Since W m (E ν ) = νW m (E) with E := E 1 , it is enough to prove the lemma for ν = 1. With and using the identity (31), the "expectation value" can also be written as We define 80) to arrive at the compact form The application of Hölder's inequality yields Using the definition (74) of K and that J( In order to bound E H ∞ we proceed similarly as in the proof of [20,Proposition 13]. By applying a non-commutative Khintchine inequality (see Theorem 26 in Appendix B) to (80) we obtain Moreover, H 1 always satisfies as A and |ψ are both normalized. Hence, Choosing θ = 1 and proceeding from Eq. (85) we obtain Finally, with Eq. (83), which proves Lemma 13.

Lower bound on the marginal tail function Q ξ
In this section, we prove a lower bound on the marginal tail function (40). (94) for all ξ > 0, where c is an absolute constant.
We will prove this lemma using the Paley-Zygmund inequality. This inequality states that for every non-negative random variable Z ≥ 0 and parameter θ ∈ [0, 1] We will choose Z = |S| 2 so that we can use a lower bound on the second moment and an upper bound on the fourth moment of S. Proof of Lemma 14. As typically done [61], we use the Paley-Zygmund inequality (96) to establish the lower tail bound, Inserting the fourth moment bound (118) and the second moment (100) into (98) we obtain for another absolute constant c > 0 that 2 (n − 1) 2 n 2 (n + 1) 2 (n + 2)(n + 3) where (100) has been used again in the second step. This bound proves Lemma 14.
Actually, we can fully calculate the second moment without any assumptions on Tr[A].
Then S := Tr[U A U † M ( |ψ ψ |)] has the second moment Proof. First, we will derive some identities for certain traces containing 1, F, and M . As M is trace annihilating, i.e., M † (1) = 0, we obtain see Figure 9 for an explanation of the tensor network diagram. Next, using the expressions for the second moments of |ψ ψ | and U AU † , (58) and (60), we obtain We finish the proof by using the trace containing M from the beginning and that Tr[ Lemma 16 (Upper bound on the fourth moment). The random variable S form Lemma 14 has a fourth moment bounded as where c 3 is an absolute constant.
The proof of this lemma uses facts about the symmetric group S 4 all of which are stated in Appendix C.

Proof. The fourth moment is
According to Eq. (51) the average over |ψ yields where we have used that Tr(P Sym 4 ) = d 1 (n) with d 1 (n) = n (n+1)(n+2)(n+3)/24 from Eq. (C9) being the dimension Tr(P Sym 4 ) corresponding to the trivial representation Sym 4 . According to Eq. (48) we have By taking the trace, we obtain the corresponding expansion coefficients where d i (n) = Tr[P i ] is also given in (C9) and each P i is the representation of the central minimal projection p i of S 4 (see (C6)) on (C n ) ⊗4 . Inserting everything into (120), we obtain For a permutation σ ∈ S 4 with representation R(σ) ∈ L((C n ) ⊗4 ) the Hilbert-Schmidt overlap Tr[A ⊗4 R(σ)] only depends on the conjugacy class of σ. We denote the conjugacy class containing permutations composed of each j i disjoint cycles of sizes {k i } i∈[l] by k j1 1 k j2 2 . . . k j l l , e.g., 2 1 ⊂ S 4 are the transpositions. One can conclude, e.g., from tensor network diagrams that for n ≥ 4. For n = 3 we have P 2 = 0 and, hence a 2 = 0. For n = 2 we have P 2 = 0 and P 5 = 0 and, hence, a 2 = a 5 = 0. In both cases, the remaining a i are as stated above. From the submultiplicativity of the Schatten 2-norm follows that A j 2 ≤ A j 2 for j ∈ Z + . Also using the bound we obtain for all i ∈ [5], where c 1 is an absolute constant.
Combining (118) from Lemma 17 below, (128), and (124) yields for some absolute constant c 3 . Proof. We will conclude from Proposition 18 that (131) for some absolute constant c 2 , which implies the lemma.
We consider HT(C n ) 0 ⊂ L(L(X ) → L(Y)) as a subspace, where X = C n = Y are labels for the input and output Hilbert space. The permutation operator R(τ ) permutes and contracts the indices of M 2,2 in Tr[R(σ)M 2,2 (R(τ ))] that correspond to X . Similarly, the permutation operator R(σ) permutes the indices of M 2,2 in Tr[R(σ)M 2,2 (R(τ ))] that correspond to Y, which are contracted subsequently by Tr. So, no X -index is contracted with a Y-index. Hence, Tr[R(σ)M 2,2 (R(τ ))] is a contraction without self-contractions of the ten-

C. General bound on tensor networks
In this section, we prove a simple bound on a fully contracted tensor network, which we used to prove Lemma 17. A tensor is a vector in C n1 ⊗ C n2 ⊗ · · · ⊗ C n K , where X ⊗ Y is denotes the tensor product of vector spaces X and Y. It is often helpful to identify a tensor with its representation t in terms of a product basis of the canonical bases of C ni . Then t is given as an array of numbers, i.e., t ∈ C n1×n2×···×n K .
A tensor network is a set of tensors together with a contraction corresponding to pairs of indices where both indices have the same dimension n i . Instances of such tensor networks are, e.g., the workhorse of powerful simulation techniques for strongly correlated quantum systems [80].
We present a general version of such tensor networks and then prove our bound. For this purpose, we use a notation that is completely disjoint from the other sections. E.g., C will be a contraction instead of a constant.
A pointer to an index of a tensor in C n1 ⊗ C n2 ⊗ · · · ⊗ C n K is just a number k ∈ [K] used to identify the k-th index. A contraction is a linear map given by a set of pairs of pointers P = ({k l , k l }) l∈[L] with even K − I = 2L so that each number k ∈ [K] occurs at most once in at most one of the pairs. In particular, k l = k l for all l ∈ [L]. Moreover, the consistency condition n k l = n k l is required to hold for all l ∈ [L] (in many relevant cases the dimensions n k assume only a very few different values). Let P = {k l , k l } l∈[L] be the set of pointers occurring in the pairs P and I := [K] \ P the other pointers. Then the contraction C(t) of a tensor t ∈ C n1 ⊗ C n2 ⊗ · · · ⊗ C n K is given by the components t α1,α2,...,α K × l∈P δ α k l ,α k l m∈I δ αm,α m , i.e., all index pairs {k l , k l } are summed over and the remaining indices are the indices of C(t).
A tensor network is a set of tensors T = (t j ) j∈J with t j ∈ C n j 1 ×n j 2 ×···×n j K j together with a contraction C of their tensor product t 1 ⊗ t 2 ⊗ · · · ⊗ t J , where we use the convention with K := J j=1 K j . For short, we write C(T ) := C(t 1 ⊗ t 2 ⊗· · ·⊗t J ). We say that a tensor network (T, C) with tensors T = (t j ) j∈J has a self-contraction if there is a tensor t j such that both pointers of one of the pairs {k l , k l } defining C point to indices of t j . We call a tensor network (T, C) closed if C(T ) is a number, i.e., if I is empty.
The relevance of tensor networks comes from the fact that contractions of certain tensor networks (such as Matrix Product States) can be calculated efficiently, while only to store general tensors requires exponentially many parameters in the total number of indices. In general, estimating the outcome of a contraction is a #P-hard problem [81]. However, there is a simple and natural upper bound, which might be well-known. But for sake of a self-contained presentation we will also provide a proof below.
Proposition 18 (Bound on tensor network contractions). Let (T, C) be a tensor network with J ≥ 2 tensors T = (t j ) j∈ [J] and contraction C without self-contractions. Then If a tensor network has a self-contraction this statement can fail to hold. Such an example can be easily constructed by taking the tensors so that the trace of an identity matrix occurs.
Proof idea of Proposition 18. Reshaping t j suitably into matricest j , the tensor network C(T ) can be rewritten as a matrix product oft j ⊗1 sandwiched between vectorizations t 1 and t J of t 1 and t J , Hence, where we have used that A ⊗ B ∞ = A ∞ B ∞ , 1 ∞ = 1, and that the Frobenius norm is the 2 -norm of a vectorization. Then the bound t j ∞ ≤ t j 2 = t j F between the Schatten norms finishes the proof.
In order to formalize this proof idea, we provide some more preliminaries. A contraction C of a tensors with K indices is defined by pairs of pointers {(k l , k l )} l∈ [L] . For any L ⊂ [L] ⊂ [K], we define the partial contraction C L of a tensor t to be the one given by the subset of pointer pairs {(k l , k l )} l∈L . For L 1 ⊂ [K] and L 2 ⊂ [K] with L 1 ∩ L 2 = ∅ we can naturally apply C L2 to C L1 (t) and it holds that C L2 (C L1 (t)) = C L1 (C L2 (t)) and C(t) = C [K]\L (C L (t)).
We will use the following facts about matrices A ∈ C n1×n2 and B ∈ C n2×n3 with dimensions n 1 , n 2 , n 3 ≥ 1: For n 1 = n 3 = 1 one has A ∞ = A 2 and similarly for B. We will also use the identity which holds for arbitrary matrices A ∈ C n1×n2 and B ∈ C n3×n4 .
Any bipartition of the indices of a tensor t yields a class of unitarily equivalent matrices. More specifically, a matricization of a tensor t ∈ C n1×n2×···×n K is a matrix A of which the matrix elements are given by A (i σ(1) ,i σ(2) ,...,i σ(K ) ),(i σ(K +1) ,...,i σ(K) ) = t i1,i2,...,i K (140) for some permutation σ ∈ S K . Two such matricizations given by τ, σ ∈ S K and the same K ∈ [K] are unitarily equivalent if {σ(i)} i∈[K ] = {τ (i)} i∈[K ] , i.e., if σ and τ yield the same bipartition of the pointers [K]. For a tensor t ∈ C n1×n2×···×n K and a disjoint bipartition L∪ R = [K] of the pointer set [K] we denote by t L,R some matricization of t that comes from this bipartition. For any such matricization t L,R holds that A vectorization of a tensor is a matricization that yields a vector, i.e., a matrix with one column or row.
Proof of Proposition 18. It is enough to prove the proposition for the case where the tensor network is closed, i.e., where C(T ) ∈ C. This is so because we can write C(T ) 2 F always as a closed tensor network, where all previous tensors plus their complex conjugates occur. A tensor and its conjugate have the same norm, which shows the reduction argument.
We denote the tensors of the tensor network by t j ∈ C n j 1 ×n j 2 ×···×n j K j where j ∈ [J] and the pointer pairs defining the contraction C by Let us consider first the case J = 2. As there are no selfcontractions, we can relabel the pointer pairs ({k l , k l }) l∈ [K] so that the k l belong to t 1 and the k l to t 2 . Hence, C(T ) is an inner product C(T ) = t 1 t 2 of matricizations of t 1 and t 2 into vectors t 1 and t 2 . Therefore, the Cauchy-Schwarz inequality and the invariance of the Frobenius norm (141) prove the proposition for J = 2.
Now we consider J ≥ 3. If there are contractions {k l , k l } where one of the indices belongs to t 1 and one to t J , for each such l we introduce an auxiliary tensor s l as identity matrix with components s l m k l ,m k l := δ m k l ,m k l that is contracted with t 1 and t 2 and replaces their contraction {k l , k l }. This corresponds to tensoring t 2 ⊗ t 3 ⊗ · · · ⊗ t J−1 with an n k l × n k l identity matrix. Denote by s l1 , s l2 , . . . , s l M the tensors that are introduced in this way. We obtainC from C by, the this modification, so thatC contracts the modified tensor networkT = (t 1 ,T , t J ) with which contain no further contractions between t 1 and t J . Now we have achieved that the pointer pairs connected to t 1 are K 1 := [K 1 ] and the ones of t J are K J := [K] \ [K − K J ], so that they are clearly disjoint.
Hence, we can write In factC K J CK (T ) ⊗ t J →C(T ) is the action of the functional given byC K 1 t 1 ⊗ · , which, in turn, is given by a vectorization t 1 of t 1 . Hence, we can obtain Similarly, t J →C K J CK (T ) ⊗ t J is a linear mapping, where we viewCK (T ) as a linear map contracting the indices K J and having non-contracted indices K 1 . This yields a vectorization t J of t J an a map is represented by a matri-cizationCK (T ) K 1 ,K J ofCK (T ) so that Using Eq. (142) we arrive at (148) By construction, each auxiliary tensor s l is an identity matrix with one index in K 1 and one in K J and the corresponding matricization has unit spectral norm. Using Eq. (139), we obtain where K 1 ⊂ K 1 and K J ⊂ K are those pointer pairs of K 1 and K 2 have no pointer to any s l (there are no contractions between S and T ).
Iterating Lemma 19 and using the bound (138) and Eq. (141) we obtain that This completes the proof.
Proof. We write the matricized partially contracted tensor network as a matrix product, where id L 2 denotes the identity matrix with row indices given by L 2 and matching column indices and similarly for id R1 ; e.g., id {3,5} has the matrix components Using Eqs. (137) and (139) finishes the proof.

D. Proofs of the reconstruction theorems
In this section we prove generalized versions of the theorems presented in Section II providing recovery guarantees for the constrained trace and diamond norm reguralization (16) and (17) and the CPT-fit (13).

Constraint trace norm minimization
In this section we prove a stable and robust reconstruction guarantee for: with q ≥ 1. The reconstruction procedure T * c η introduced in (16) is a special case, namely q = 2, of this class of minimization problems.
Then there are constants C and λ such that the following holds. Fix q ≥ 1, some Kraus rank r, a number of measurement settings m ≥ C r ln(n)n 2 (154) and let p ∈ [1, 2]. Then, with probability at least 1 − e −λm , for all T 0 ∈ CPT(C n ) the solution T * c η,q of the minimization (153) with y = A(T 0 ) + e approximates T 0 with an error The constants C, λ, andc only depend on each other.
Proof. Thanks to Observations 6 and 7 it is enough to prove the theorem of the case where A 0 is traceless (i.e., A 0 =Ã 0 ) and normalized to A 0 ∞ = 1.
We prove this statement by establishing a strong notion of the NSP (44) for Choi matrices of channel differences. The main technical ingredient to prove the NSP is the bound on the minimum conic singular value from Theorem 12. Then Theorem 11 will give the desired result.
When proving the NSP it is helpful to consider two cases. First, operators X ∈ J HT(C n ) satisfying X |r F ≤ µ √ r X ¬r 1 are effectively of high rank and satisfy the NSP by default. Thus, it suffices to consider the case where X |r F ≥ µ √ r X ¬r 1 . Every such matrix satisfies Hence M = J −1 (X) is contained in the cone K from Eq. (62) with c µ = 1+µ µ . For any q ≥ 1, Theorem 12 yields the bound with probability over A at least 1 − e −λm , where and whereC is a constant only depending on λ and µ. Hence, Theorem 11 yields for any µ ∈ ]0, 1[ . The minimization in (153) assures J(T * c η,q ) 1 ≤ J(T 0 ) 1 by construction, because J(T 0 ) is a feasible point of this optimization problem. Moreover, since A(T 0 ) − y = e with e q ≤ η and y − A(T * c η ) q ≤ η due to a constraint in the minimization (16). Choosing µ = √ 5 − 2 leads to (1 + µ) 2 /(1 − µ) = 2. Simplifying the bound (161) with these observations completes the proof.

CPT-fit
In this section, we prove a stable and robust reconstruction guarantee for the following generalization of the CPT-fit (13): where q ≥ 1 arbitrary. Similar to before, the CPT-fit discussed in the introductory section arises from fixing q = 2.
Theorem 21 (Stable and robust reconstruction from CPT-fit). Consider a 4-generic measurement setup as in Theorem 20 and the same constants C and λ. Fix q ≥ 1, some Kraus rank r, a number of measurement settings m ≥ C r ln(n)n 2 (164) and p ∈ [1, 2]. Then, with probability at least 1−e −λm , for all T 0 ∈ CPT(C n ) the solution T q of (163) with y = A(T 0 )+e approximates T 0 with an error (165) The constants C, λ, andc again only depend on each other.
Proof. This proof is analogous to the proof of Theorem 20 with two exceptions: First, J(T q ) 1 = J(T 0 ) 1 = 1, because both are constrained to be Choi matrices of CPT maps. Consequently, their difference vanishes in (161). Secondly, the minimization (163) assures because T 0 is a feasible satisfying y − A(T 0 ) q = e q and T q being the minimizer.

Constrained trace and diamond norm minimization
In Ref. [9] is shown that certain recovery guarantees for trace norm regularization (such as (15) and (16)) automatically imply recovery guarantees for the analogous diamond norm regularization (such as (17) and (18)). This holds when the recovery guarantees can be proven as, e.g., in Ref. [61] via the descent cone of the reguralizer. In this section, we follow this strategy in order to obtain a recovery guarantee for diamond norm reguralization in our quantum process tomography setting.
An error bound from the descent cone : For the proof of Theorem 20 we use an error bound relying on the so-called descent cone of the function that is minimized in the reconstruction. The descent cone of a function is the cone of directions in which the function decreases [61]: Definition 22 (Descent cone). Let V be an affine space and f : V → R be a proper convex function. The descent cone of f at a point x ∈ V is The following error bound is the basis for many recovery guarantees from bounds to the conic singular value from Definition 8. It has been proven in Ref. [82] (where the descent cone is given as a tangent cone) and has later been restated by Tropp [61] for optimizations over a vector space. One can easily see from its proof that it also holds if one optimizes over an affine space and chooses arbitrary q -norms (q ≥ 1) to measure the size of the minimum conic singular value.
Proposition 23 (Error bound for convex recovery, Tropp's version [61,Proposition 2.6]). Let x 0 ∈ V be a signal, A ∈ V → R m be an affine linear measurement map, y = A(x 0 )+e a vector of m measurements with additive error e ∈ R m . Fix q ≥ 1 and let x f η,q be the solution of the optimization for a convex function f : V → R. If e q ≤ η then where λ min (A; D(f, x 0 ), q ) has been defined in (38).
Proof. In the proof of [61, Proposition 2.6] one uses Definition 8 of the conic singular value only for element in V − V.
Generalizing the proof from errors measured in Euclidean norm (q = 2) to any q -norm is also straightforward, provided that the definition of the minimum conic singular value is properly adjusted.
Our recovery guarantee for the diamond norm : First we prove a weaker version of Theorem 20 with a different argument. In turn this reconstruction guarantee for minimization (153) will imply a recovery guarantee for the following generalization of diamond norm reconstruction: T c η,q = arg min{ T : T ∈ HT(C n ), A(T ) − y q ≤ η}.
(169) Note that the minimization (18) discussed in the main text is a special case of (169), where q = 2.
Theorem 24 (Stable reconstruction from trace norm minimization). Consider a measurement setting as in Theorem 20 with possibly different constants C and λ. Fix some Kraus rank r and m ≥ C r ln(n)n 2 , as well as q ≥ 1. Then, with probability at least 1 − e −λm , for all T 0 ∈ HT(C n ) with Kraus rank rank(J(T 0 )) ≤ r the solution T * c η,q of the minimization (153) with y = A(T 0 ) + e approximates T 0 with an error provided that e q ≤ η. The constants C, λ, andc again only on each other.
Proof of Theorem 24. Thanks to Observations 6 and 7 it is enough to prove the theorem of the case where A 0 is traceless (i.e., A 0 =Ã 0 ) and normalized to A 0 ∞ = 1. This proof relies on Proposition 23 where the reconstruction error is bounded in terms of the q -minimum conic singular value of A w.r.t. the descent cone of the trace norm at T 0 .
Then the theorem implies that with probability at least 1 − e −λm the minimum conic singular value λ min of A w.r.t. the descent cone D( J( · ) 1 , T 0 ) is bounded as whereC is a constant only depending on λ. Proposition 23 finishes the proof. Now, the results from Ref. [9], specifically [9, Implication 9] tell us that the diamond norm minimization performs at least as well as the trace norm minimization.
Corollary 25 (Stable reconstruction from diamond norm minimization [9]). Choosing the diamond norm minimization (18) instead of the trace norm minimization (16) in Theorem 24 yields a smaller smaller reconstruction error,

E. Bosonic and fermionic linear optical circuits
It is worth mentioning that the above results largely carry over to another important task which is the tomography of bosonic and fermionic circuits. For bosonic systems, this refers to the tomography of linear optical circuits, which play an important role in quantum information processing, with such circuits becoming available for a large number of modes with the advent of integrated optical circuits [36,37]. For fermionic systems, this applies to what is called fermionic linear optics [83]. Interestingly, the results laid out above readily apply to both situations, once the objects of interest are appropriately identified.
For the bosonic setting, consider n modes associated with bosonic annihilation operators (b 1 , . . . , b n ) and Hilbert space H B n in second quantization. The correlation matrix of a quantum state ρ ∈ S(H B n ) is defined as C ∈ Herm(C n ) with C j,k = Tr(b † j b k ρ). Mode transformations that preserve the boson number are reflected by maps (b 1 , . . . , b n ) → (c 1 , . . . , c n ), where with U ∈ U(n). Such maps are usually referred to as passive transformations, or linear optical transformations in the quantum optical context. Under such mode transformations, correlation matrices transform as Initial correlation matrices reflecting the quantum state can hence be taken as reflecting the situation that mode labelled 1 is first prepared in a Gaussian state satisfying Tr(b † 1 b 1 ρ) = 1, while the others are kept in the vacuum Tr(b † j b j ρ) = 0 for j = 2, . . . , n. This is then conjugated by a Haar random unitary U ∈ U(n), giving formally rise to an identical transformation as considered above, with |ψ i ψ i | ∼ |ψ ψ | are i.i.d. realizations of |ψ ψ | = U † |1 1 | U , with U ∈ U(n) drawn from the Haar measure. Such preparations are optically readily feasible with present technology, as Gaussian states are particularly accessible with common sources.
Random measurements can again be seen as i.i.d. realizations A i ∼ U AU † with U ∈ U(n) being Haar random. Here A does not take the role of the observable itself, but reflect natural homodyne measurements on the level of correlation matrices. Their expectation values are obtained as Tr(AC) for correlation matrices C. So again, while the type of measurement is different and the objects involved take an altered physical role, the map realized is formally identical with with single expectation values for T ( |ψ i ψ i |) := V † |ψ i ψ i | V , with V ∈ U(n) reflecting the unknown linear process. In this way, process tomography of the kind discussed here is applicable to the bosonic setting. This seems particularly important with the advent of monolithic bosonic integrated optical devices [37], as they are, e.g., employed in boson samplers. Fermionic linear circuits associated with fermionic annihilation operators (f 1 , . . . , f n ) have the same structure (on that level). Again, correlation matrices C ∈ Herm(C n ) transform as for U ∈ U(n), and the same preparations are feasible. Here |1 1 | reflects a fermionic Gaussian state in which the first mode contains exactly a single fermion, while the other n − 1 modes contain no fermion. The same type of measurement is therefore again possible.

IV. CONCLUSION
We have proven that quantum processes can be reconstructed from an essentially optimal number of expectation values without the requirement of ancillary quantum systems. Moreover, by an extensive numerical analysis we have (i) demonstrated the practical feasibility of our approach and (ii) that the reconstruction procedures also work for Pauli-type measurement settings. The number of necessary expectation values scales as ∼ r n 2 ln(n), where r is the anticipated Kraus rank of the channel. The reconstructions are stable against measurement noise and robust against violations of the measured quantum channel having the anticipated Kraus rank. In particular, no strict assumptions on the noise level or the Kraus rank are required for a simple fitting procedure (CPT-fit) to be guaranteed to give a good approximation of the measured quantum channel. In several physically feasible and realistic setting, the prescriptions laid out give direct and concrete advice on how to optimally perform quantum process tomography.

A. Outlook
In this outlook, we present a short outline of several aspects that seem to be interesting starting points for future research.
Mixed input states : The first potential generalization concerns the use of pure quantum states in the reconstruction procedure. Numerically, we have observed that our reconstructions work almost equally well when the input states to the channels are mixed. Finding a recovery guarantee following this observation would be a step towards more practical measurements.
The restricted isometry property (RIP) and perspectives for thresholding methods : The RIP can be adapted to our setting. A measurement map A is said to fulfil RIP for Kraus rank r if for all quantum channels T ∈ CPT(C n ) with Kraus rank at most r. The lower RIP bound is -as in our case -typically enough to obtain recovery guarantees for optimization procedures. But their computational cost is often not optimal. For instance, iterative hard [84,85] and soft [86] thresholding algorithms are faster in many instances. But here, recovery guarantees are typically more difficult to prove and also required the upper RIP bound. Analyzing such algorithms for process tomography setting would be an interesting endeavour for future research.
Approximate 4-designs : In order for our recovery guarantees to hold, the input states to the process need to be drawn from a complex 4-design and the output states need to be measured with observables that have unitary 4-designs are eigenbases. A next step in proving more general recovery guarantees could be to prove that -approximate 4-designs are enough, in the sense that the only changes the constants in the recovery guarantee but not directly the reconstruction error. This would significantly lessen the burden for experimental realizations. For low-rank matrix recovery, such a generalization has indeed been proven [20]. Indeed, here only the constants in the recovery guarantee need to be changed in order to allow for approximate designs.
Random Clifford circuits : Building upon this line of thought, the work [10] shows that random Clifford gates are very close to being unitary 4-designs, and that they provide a precise characterization in terms of irreducible representations of the Clifford group. Such tools might be helpful to generate the measurement settings. Here, it would be interesting to see if these new insights can be used in order to prove our recovery guarantees with the input states of the channels being random stabilizer states and the bases of the observables random Clifford unitaries. Having the results on random quantum circuits [39] in mind, we even expect random Clifford circuits of moderate length to be good enough to provide the randomness for our measurements. Approximate unitary 4-designs are also known to be realizable with random quantum circuits of suitable depth, each quantum gate of which is drawn according to the Haar measure [39], but then the constituent quantum gates are no longer Clifford gates, which may again be more challenging, in particular if fault tolerance is to be achieved.
Time-fluctuating dynamics and random circuits : Again further developing this idea, it has also been proven that stochastic Hamiltonian time evolution [41] gives rise to approximate unitary 4-designs. This approach offers a charming perspective: Instead of having to implement an elaborate quantum circuit consisting of precisely controlled unitary gates, one considers suitably time-fluctuating Hamiltonian dynamics. For the purposes of quantum process tomography, the randomly fluctuating classical parameters governing the quantum process need to be (classically) stored. But this seems a small price to pay, given that any need to implement quantum gates to achieve approximate unitary designs is abolished in this fashion. It is the hope that the present work, providing essentially resource optimal schemes for quantum process tomography, inspires such further work on both feasible and economical schemes.
Diamond norm : We have observed numerically ( Figure 5) that the minimum value (the diamond norm of the reconstructed channel) of the unconstrained diamond norm reconstruction (17), [9] is one if the reconstruction error is small and decreases with increasing reconstruction error. It would be interesting to also understand this observation analytically. In this way one can obtain some confidence about reconstructed quantum channels. Of course, also other types of reconstruction certificates would be of great interest.
Frequencies and dependent measurements : In quantum experiments one measures frequencies. These can be com-bined to determine expectation values of observables. We use such expectation values in our reconstruction. However, our reconstruction procedures can, in principle, also receive frequencies as inputs. It is expected that fewer measurement settings suffice in such an approach. This, however, likely comes at the price of more repetitions per setting in order to reduce the statistical error.
Such a measurement setup would be an important instance of compressed sensing with dependent measurements (measurements corresponding to different eigenvalues of one observable are not independent). Unfortunately, these dependencies render the mathematical analysis considerably more challenging. Even in the state tomography setting, there are only first inroads towards this problem [3,4].
We have restricted this work to natural measurements which lead to independence in a natural way. As a first step towards dependent measurements, it would be interesting to numerically compare sample complexities in the cases of our natural measurement setup and the corresponding frequency measurements, with a fair accounting for statistical noise.
Sample complexity : For quantum state tomography, fundamental lower bounds on the sample complexity have been established in Refs. [66] and [16]. They are valid for arbitrary POVM measurements [66] and measuring Pauli observables [16], respectively. Moreover, these works also determine the sample complexity associated with different compressedsensing based state tomography techniques. A comparison with the associated fundamental lower bounds shows a close to optimal scaling (at least for low-rank states).
In contrast to state tomography, very little is known about the sample complexity associated with process tomography. A straightforward adaptation of the results [66] and [16] is hindered by: (i) Typical process tomography measurementssuch as the 4-generic measurements considered here -have neither a Pauli structure, nor can they be interpreted as state POVMs in a strict sense [87]. This makes the task of determining the exact sample complexity associated with a concrete tomographic procedure more difficult. (ii) Due to the trace preservation condition, quantum channels are more restricted than quantum states. Suitable packing nets -a key ingredient in the derivation of the fundamental lower bounds in [16,66] -must fulfil these additional requirements which makes their construction considerably more challenging. Despite these obstacles, we believe that such a generalization is timely and well-motivated.

APPENDIX
In this appendix, we provide some auxiliary statements in order to keep this work largely self-contained. In Appendix A, we provide semidefinite programming formulations of the reconstruction procedures (15), (16), (17), and (18). In Appendix B, we state a non-commutative Khintchine inequality used in the proof of our bound on the conic minimum singular value, more precisely in the proof of Lemma 13 with the bound on the mean empirical width. Then, finally, we provide some facts about the symmetric group S 4 in Appendix C. These facts are also used in the derivation of the bound on the conic minimum singular value in order to bound the fourth moment of our measurements (Lemma 16). Our reconstructions can be implemented as semidefinite programs (SDPs) [9], which can practically be solved using standard software such as CVX [58,59]. Let us consider the reconstruction of a quantum channel mapping operators in L(X ) to operators in L(Y). The minimization (15) can be rewritten as the following SDP, The reconstruction (16) is obtained by adding the constraint T † (1 Y ) = 1 X into this SDP. By only changing the objective function with the spectral norm of partial traces Tr Y over the output space Y of T : L(X ) → L(Y), we obtain the minimization (17) as the following SDP [9,47], Again, the constrained minimization (18) is obtained by adding the constraint T † (1 Y ) = 1 X into the SDP.  In order to bound the fourth moment of the measurement map some facts about the representation of the permutation group S 4 are helpful. In this section, they are summarized and partially derived. Facts that we just state can, e.g., be found in the Wikis [90] and [91].
By k j1 1 k j2 2 . . . k j l l we denote the conjugacy class containing permutations composed of each j i disjoint cycles of sizes {k i } i∈[l] , e.g., 2 2 ⊂ S 4 are products of disjoint transpositions. Corresponding to each conjugacy class there is on irrep. They are given by the Young Frames [91] F 1 := (trivial rep.)    The dimension of the representation F i (in the group algebra) will be denoted by d i (also called degree [91]). These dimensions are given by d i = χ i ([1]). Expanding a character χ in the group algebra yields where · , · is the inner product of the group algebra, C denotes the sum over all conjugacy classes C in S 4 , and ΣC := σ∈C σ the sum over the conjugacy class C. The central minimal projections where we have made the identification σ, · ∼ = σ.