Quantifying the magic of quantum channels

To achieve universal quantum computation via general fault-tolerant schemes, stabilizer operations must be supplemented with other non-stabilizer quantum resources. Motivated by this necessity, we develop a resource theory for magic quantum channels to characterize and quantify the quantum "magic" or non-stabilizerness of noisy quantum circuits. For qudit quantum computing with odd dimension $d$, it is known that quantum states with non-negative Wigner function can be efficiently simulated classically. First, inspired by this observation, we introduce a resource theory based on completely positive-Wigner-preserving quantum operations as free operations, and we show that they can be efficiently simulated via a classical algorithm. Second, we introduce two efficiently computable magic measures for quantum channels, called the mana and thauma of a quantum channel. As applications, we show that these measures not only provide fundamental limits on the distillable magic of quantum channels, but they also lead to lower bounds for the task of synthesizing non-Clifford gates. Third, we propose a classical algorithm for simulating noisy quantum circuits, whose sample complexity can be quantified by the mana of a quantum channel. We further show that this algorithm can outperform another approach for simulating noisy quantum circuits, based on channel robustness. Finally, we explore the threshold of non-stabilizerness for basic quantum circuits under depolarizing noise.


A. Background
One of the main obstacles to physical realizations of quantum computation is decoherence that occurs during the execution of quantum algorithms. Fault-tolerant quantum computation (FTQC) [19,73] provides a framework to overcome this difficulty by encoding quantum information into quantum error-correcting codes, and it allows reliable quantum computation when the physical error rate is below a certain threshold value.
The fault-tolerant approach to quantum computation allows for a limited set of transversal, or manifestly fault-tolerant, operations, which are usually taken to be the stabilizer operations. However, the stabilizer operations alone do not enable universality because they can be simulated efficiently on a classical computer, a result known as the Gottesman-Knill theorem [1,37]. The addition of non-stabilizer quantum resources, such as non-stabilizer operations, can lead to universal quantum computation [11]. With this perspective, it is natural to consider the resource-theoretic approach [21] to quantify and characterize non-stabilizer quantum resources, including both quantum states and channels.
One solution for the above scenario is to implement a non-stabilizer operation via state injection of so-called "magic states," which are costly to prepare via magic state distillation [11] (see also [10,18,20,43,44,51,54]). The usefulness of such magic states also motivates the resource theory of magic states [12,46,58,80,81,88], where the free operations are the stabilizer operations and the free states are the stabilizer states (abbreviated as "Stab"). On the other hand, since a key step of fault-tolerant quantum computing is to implement non-stabilizer operations, a natural and fundamental problem is to quantify the non-stabilizerness or "magic" of quantum operations. As we are at the stage of Noisy Intermediate-Scale Quantum (NISQ) technology, a resource theory of magic for noisy quantum operations is desirable both to exploit the power and to identify the limitations of NISQ devices in fault-tolerant quantum computation.

B. Overview of results
In this paper, we develop a framework for the resource theory of magic quantum channels, based on qudit systems with odd prime dimension $d$. Related work on this topic has appeared recently [69], but the set of free operations that we take in our resource theory is larger, given by the completely positive-Wigner-preserving operations as we detail below. We note here that $d$-level fault-tolerant quantum computation based on qudits with prime $d$ is of considerable interest for both theoretical and practical purposes [4,17,26,38,48].
Our paper is structured as follows:
• In Section II, we first review the stabilizer formalism [37] and the discrete Wigner function [41,42,93]. We further review various magic measures of quantum states and introduce various classes of free operations, including the stabilizer operations and beyond.
• In Section III, we introduce and characterize the completely positive-Wigner-preserving (CPWP) operations. We then introduce two efficiently computable magic measures for quantum channels. The first is the mana of quantum channels, whose state version was introduced in [81]. The second is the max-thauma of quantum channels, inspired by the magic state measure in [88]. We prove several desirable properties of these two measures, including reduction to states, faithfulness, additivity for tensor products of channels, subadditivity for serial composition of channels, an amortization inequality, and monotonicity under CPWP superchannels.
• In Section IV, we explore the ability of quantum channels to generate magic states. We first introduce the amortized magic of a quantum channel as the largest amount of magic that can be generated via a quantum channel. Furthermore, we introduce an information-theoretic notion of the distillable magic of a quantum channel. In particular, we show that both the amortized magic and distillable magic of a quantum channel can be bounded from above by its mana and max-thauma.
• In Section V, we apply our magic measures for quantum channels in order to evaluate the magic cost of quantum channels, and we explore further applications in quantum gate synthesis. In particular, we show that at least four T gates are required to perfectly implement a controlled-controlled-NOT gate.
• In Section VI, we propose a classical algorithm inspired by [61] for simulating quantum circuits, which is relevant for the broad class of noisy quantum circuits that are currently being run on NISQ devices. This algorithm has sample complexity that scales with respect to the mana of a quantum channel. We further show by concrete examples that the new algorithm can outperform a previous approach for simulating noisy quantum circuits, based on channel robustness [69].

A. The stabilizer formalism
For most known fault-tolerant schemes, the restricted set of quantum operations is the stabilizer operations, consisting of preparation and measurement in the computational basis and a restricted set of unitary operations. Here we review the basic elements of the stabilizer states and operations for systems with a dimension that is a product of odd primes. Throughout this paper, a Hilbert space implicitly has an odd dimension, and if the dimension is not prime, it should be understood to be a tensor product of Hilbert spaces each having odd prime dimension.
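Let $\mathcal{H}_d$ denote a Hilbert space of dimension $d$, and let $\{|j\rangle\}_{j=0,\ldots,d-1}$ denote the standard computational basis. For a prime number $d$, we define the unitary boost and shift operators $X, Z \in \mathcal{L}(\mathcal{H}_d)$ in terms of their action on the computational basis:
$$X|j\rangle = |j \oplus 1\rangle, \qquad Z|j\rangle = \omega^{j}|j\rangle, \qquad \omega = e^{2\pi i/d},$$
where $\oplus$ denotes addition modulo $d$. We define the Heisenberg-Weyl operators as
$$T_u = \tau^{-a_1 a_2} Z^{a_1} X^{a_2},$$
where $\tau = e^{(d+1)\pi i/d}$ and $u = (a_1, a_2) \in \mathbb{Z}_d \times \mathbb{Z}_d$.
For a system with composite Hilbert space $\mathcal{H}_A \otimes \mathcal{H}_B$, the Heisenberg-Weyl operators are the tensor product of the subsystem Heisenberg-Weyl operators:
$$T_{u_A \oplus u_B} = T_{u_A} \otimes T_{u_B},$$
where $u_A \oplus u_B$ is an element of $\mathbb{Z}_{d_A} \times \mathbb{Z}_{d_A} \times \mathbb{Z}_{d_B} \times \mathbb{Z}_{d_B}$. The Clifford operators $\mathcal{C}_d$ are defined to be the set of unitary operators that map Heisenberg-Weyl operators to Heisenberg-Weyl operators under unitary conjugation up to phases:
$$U \in \mathcal{C}_d \iff \forall u\ \exists\, \theta, u' : \ U T_u U^\dagger = e^{i\theta} T_{u'}.$$
These operators form the Clifford group. The pure stabilizer states can be obtained by applying Clifford operators to the state $|0\rangle$:
$$\{S_j\} = \{U|0\rangle : U \in \mathcal{C}_d\}.$$
A state is defined to be a magic or non-stabilizer state if it cannot be written as a convex combination of pure stabilizer states.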
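The operator definitions in this subsection can be checked numerically. The following is a minimal sketch for a qutrit ($d = 3$), using only the Python standard library; the helper names (`weyl`, `matmul`, `dagger`, `close`) and the choice of the discrete Fourier gate as the Clifford example are illustrative assumptions, not part of the paper.

```python
import cmath

D = 3  # qutrit; the text assumes an odd prime dimension
OMEGA = cmath.exp(2j * cmath.pi / D)           # omega = e^{2 pi i / d}
TAU = cmath.exp(1j * cmath.pi * (D + 1) / D)   # tau = e^{(d+1) pi i / d}
POINTS = [(a1, a2) for a1 in range(D) for a2 in range(D)]
IDENTITY = [[1.0 if i == j else 0.0 for j in range(D)] for i in range(D)]

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def dagger(a):
    """Conjugate transpose."""
    return [[x.conjugate() for x in col] for col in zip(*a)]

def close(a, b, tol=1e-9):
    """Entrywise comparison of two matrices."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def weyl(a1, a2):
    """T_u = tau^(-a1 a2) Z^a1 X^a2, with X|j> = |j+1 mod d>, Z|j> = omega^j |j>."""
    t = [[0j] * D for _ in range(D)]
    for j in range(D):
        k = (j + a2) % D  # X^a2 sends |j> to |k>
        t[k][j] = TAU ** (-a1 * a2) * OMEGA ** (a1 * k)
    return t

X, Z = weyl(0, 1), weyl(1, 0)
# Discrete Fourier gate, used here as an example of a Clifford unitary.
F = [[OMEGA ** (j * k) / D ** 0.5 for k in range(D)] for j in range(D)]

# Z X = omega X Z
weyl_relation = close(matmul(Z, X), [[OMEGA * x for x in row] for row in matmul(X, Z)])
# Every T_u is unitary.
all_unitary = all(close(matmul(weyl(*u), dagger(weyl(*u))), IDENTITY) for u in POINTS)
# Conjugation by F maps X to Z, i.e. a Heisenberg-Weyl operator to another one.
fourier_maps_x_to_z = close(matmul(matmul(F, X), dagger(F)), Z)
print(weyl_relation, all_unitary, fourier_maps_x_to_z)
```

Nested lists are used instead of a numerical library only to keep the sketch dependency-free; for larger dimensions an array library would be the more natural choice.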

B. Discrete Wigner function
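The discrete Wigner function [41,42,93] was used to show the existence of bound magic states [80]. For an overview of discrete Wigner functions, we refer to [80,81]. See also [31] for a review of quasi-probability representations in quantum theory with applications to quantum information science.
For each point $u \in \mathbb{Z}_d \times \mathbb{Z}_d$ in the discrete phase space, there is a corresponding operator $A_u$, and the value of the discrete Wigner representation of a state $\rho$ at this point is given by
$$W_\rho(u) := \frac{1}{d}\operatorname{Tr}[A_u \rho], \tag{7}$$
where $d$ is the dimension of the Hilbert space and $\{A_u\}_u$ are the phase-space point operators:
$$A_0 := \frac{1}{d}\sum_u T_u, \qquad A_u := T_u A_0 T_u^\dagger.$$
The discrete Wigner function can be defined more generally for a Hermitian operator $X$ acting on a space of dimension $d$ via the same formula:
$$W_X(u) := \frac{1}{d}\operatorname{Tr}[A_u X].$$
For the particular case of a measurement operator $E$ satisfying $0 \le E \le \mathbb{1}$, the discrete Wigner representation is defined as
$$W(E|u) := \operatorname{Tr}[E A_u], \tag{10}$$
i.e., without the prefactor $1/d$. The reason for this will be clear in a moment and is related to the distinction between a frame and a dual frame [32,33,61]. Some useful properties of the set $\{A_u\}_u$ are as follows:
1. $A_u$ is Hermitian;
2. $\sum_u A_u / d = \mathbb{1}$;
3. $\operatorname{Tr}[A_u A_{u'}] = d\,\delta(u, u')$;
4. $\operatorname{Tr}[A_u] = 1$;
5. $X = \sum_u W_X(u) A_u$;
6. $\{A_u\}_u = \{A_u^T\}_u$.
From the second property above and the definition in (7), we conclude the following equality for a quantum state $\rho$:
$$\sum_u W_\rho(u) = 1.$$
For this reason, the discrete Wigner function is known as a quasi-probability distribution. More generally, for a Hermitian operator $X$, we have that
$$\sum_u W_X(u) = \operatorname{Tr}[X],$$
so that for a subnormalized state $\omega$, satisfying $\omega \ge 0$ and $\operatorname{Tr}[\omega] \le 1$, we have that $\sum_u W_\omega(u) \le 1$.
Following the convention in (10) for measurement operators, we find the following for a positive operator-valued measure (POVM) $\{E_x\}_x$ (satisfying $E_x \ge 0$ for all $x$ and $\sum_x E_x = \mathbb{1}$):
$$\sum_x W(E_x|u) = \sum_x \operatorname{Tr}[E_x A_u] = \operatorname{Tr}[A_u] = 1,$$
so that the quasi-probability interpretation is retained for a POVM. That is, $W(E_x|u)$ can be interpreted as the conditional quasi-probability of obtaining outcome $x$ given input $u$.
We can quantify the amount of negativity in the discrete Wigner function of a state $\rho$ via the sum negativity, which is equal to the absolute sum of the negative elements of the Wigner function [81]:
$$\operatorname{sn}(\rho) := \sum_{u : W_\rho(u) < 0} |W_\rho(u)| = \frac{1}{2}\left(\sum_u |W_\rho(u)| - 1\right).$$
By definition, we find that $\operatorname{sn}(\rho) \ge 0$. The mana of a state $\rho$ is defined as [81]
$$\mathcal{M}(\rho) := \log\left(\sum_u |W_\rho(u)|\right) = \log\left(2\operatorname{sn}(\rho) + 1\right).$$
We define the mana more generally, as in [88], for a positive semi-definite operator $X$ via the formula
$$\mathcal{M}(X) := \log\left(\sum_u |W_X(u)|\right) = \log\left(2\operatorname{sn}(X) + \operatorname{Tr}[X]\right).$$
We denote the set of quantum states with a non-negative Wigner function by $\mathcal{W}_+$ (Wigner polytope), i.e.,
$$\mathcal{W}_+ := \{\rho : \rho \ge 0,\ \operatorname{Tr}[\rho] = 1,\ W_\rho(u) \ge 0\ \forall u\}.$$
It is known that quantum states with non-negative Wigner function are classically simulable and thus are useless in magic state distillation [80], which can be seen as the analog of states with positive partial transpose (PPT) in entanglement distillation [45,62]. Motivated by the Rains bound [66] and its variants [30,77,78,83-86] in entanglement theory, the set of sub-normalized states with non-positive mana was introduced as follows to explore the resource theory of magic states [88]:
$$\mathcal{W} := \{\sigma : \mathcal{M}(\sigma) \le 0,\ \sigma \ge 0\}.$$
It follows from definitions and the triangle inequality that $\operatorname{Tr}[\sigma] \le 1$ if $\sigma \in \mathcal{W}$ (alternatively one can conclude this by inspecting the right-hand side of (16)). Furthermore, we define $\hat{\mathcal{W}}_+$ to be the set of Hermitian operators with non-negative Wigner function:
$$\hat{\mathcal{W}}_+ := \{V : W_V(u) \ge 0\ \forall u\}.$$
The Wigner trace norm and Wigner spectral norm of a Hermitian operator $V$ are defined as follows, respectively,
$$\|V\|_{W,1} := \sum_u |W_V(u)| = \sum_u \frac{|\operatorname{Tr}[A_u V]|}{d}, \qquad \|V\|_{W,\infty} := d \max_u |W_V(u)| = \max_u |\operatorname{Tr}[A_u V]|.$$
The Wigner trace and spectral norms are dual to each other in the following sense:
$$\|V\|_{W,1} = \max_C \{|\operatorname{Tr}[VC]| : \|C\|_{W,\infty} \le 1\}, \qquad \|V\|_{W,\infty} = \max_C \{|\operatorname{Tr}[VC]| : \|C\|_{W,1} \le 1\},$$
with $C$ ranging over Hermitian operators within the same space.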
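As a concrete illustration of the quantities in this subsection, the sketch below builds the qutrit phase-space point operators from the Heisenberg-Weyl operators and evaluates the Wigner function of two states: the stabilizer state $|0\rangle$ and the state $(|1\rangle - |2\rangle)/\sqrt{2}$ (often called the "strange state" in the magic-state literature). The helper names and the choice of example states are assumptions made for illustration, and the base-2 logarithm is likewise an assumed convention.

```python
import cmath
import math

D = 3
OMEGA = cmath.exp(2j * cmath.pi / D)
TAU = cmath.exp(1j * cmath.pi * (D + 1) / D)
POINTS = [(a1, a2) for a1 in range(D) for a2 in range(D)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def dagger(a):
    return [[x.conjugate() for x in col] for col in zip(*a)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def weyl(a1, a2):
    """T_u = tau^(-a1 a2) Z^a1 X^a2."""
    t = [[0j] * D for _ in range(D)]
    for j in range(D):
        k = (j + a2) % D
        t[k][j] = TAU ** (-a1 * a2) * OMEGA ** (a1 * k)
    return t

def phase_point_ops():
    """A_0 = (1/d) sum_u T_u and A_u = T_u A_0 T_u^dagger."""
    a0 = [[sum(weyl(*u)[i][j] for u in POINTS) / D for j in range(D)] for i in range(D)]
    return {u: matmul(matmul(weyl(*u), a0), dagger(weyl(*u))) for u in POINTS}

def wigner(rho, ops):
    """W_rho(u) = Tr[A_u rho] / d, real for Hermitian rho."""
    return {u: (trace(matmul(a, rho)) / D).real for u, a in ops.items()}

def wigner_trace_norm(rho, ops):
    """sum_u |W_rho(u)|; the mana is the logarithm of this number."""
    return sum(abs(w) for w in wigner(rho, ops).values())

OPS = phase_point_ops()

# Property 3: Tr[A_u A_v] = d * delta(u, v).
orthogonal = all(
    abs(trace(matmul(OPS[u], OPS[v])) - (D if u == v else 0)) < 1e-9
    for u in POINTS for v in POINTS
)

S = 1 / math.sqrt(2)
psi = [0.0, S, -S]                              # (|1> - |2>)/sqrt(2)
rho_strange = [[a * b for b in psi] for a in psi]
rho_zero = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

w_strange = wigner(rho_strange, OPS)
w_zero = wigner(rho_zero, OPS)
norm_strange = wigner_trace_norm(rho_strange, OPS)
norm_zero = wigner_trace_norm(rho_zero, OPS)
print("mana (log base 2):", math.log2(norm_strange), math.log2(norm_zero))
```

For the second state, the Wigner function has a single negative entry $-1/3$ at the origin, so the sum negativity is $1/3$ and the Wigner trace norm is $5/3$, whereas the stabilizer state has a non-negative Wigner function and zero mana.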

C. Stabilizer channels and beyond
A stabilizer operation (SO) consists of the following types of quantum operations: Clifford operations, tensoring in stabilizer states, partial trace, measurements in the computational basis, and post-processing conditioned on these measurement results. Any quantum protocol composed of these quantum operations can be written in terms of the following Stinespring dilation representation:
$$\mathcal{E}(\rho) = \operatorname{Tr}_{E}\!\left[U(\rho \otimes \rho_E)U^\dagger\right],$$
where $U$ is a Clifford unitary and the ancilla $\rho_E$ is a stabilizer state.
The authors of [2] generalized the set of stabilizer operations to stabilizer-preserving operations, which are those that transform stabilizer states to stabilizer states and which form the largest set of physical operations that can be considered free for the resource theory of non-stabilizerness. More recently, Ref. [69] introduced the completely stabilizer-preserving operations (CSPO); i.e., a quantum operation $\Pi$ is called completely stabilizer-preserving if for any reference system $R$,
$$\forall \rho_{RA} \in \mathrm{Stab} : \ (\mathrm{id}_R \otimes \Pi)(\rho_{RA}) \in \mathrm{Stab}.$$

D. Magic measures of quantum states
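We review some of the magic measures of quantum states in Table I. In particular, the max-thauma of a quantum state $\rho$ is defined as [88]:
$$\theta_{\max}(\rho) := \min_{\sigma \in \mathcal{W}} D_{\max}(\rho\|\sigma),$$
where the max-relative entropy $D_{\max}(\rho\|\sigma)$ was defined in [25].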

A. Completely Positive-Wigner-Preserving operations
A quantum circuit consisting of an initial quantum state, unitary evolutions, and measurements, each having non-negative Wigner functions, can be classically simulated [61]. It is thus natural to consider free operations to be those that completely preserve the positivity of the Wigner function. Indeed, any such quantum operations are proved to be efficiently simulated via classical algorithms in Section VI and thus become reasonable free operations for the resource theory of magic.

Definition 1 (Completely PWP operation)
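A Hermiticity-preserving linear map $\Pi$ is called completely positive Wigner preserving (CPWP) if for any system $R$ with odd dimension, the following holds:
$$\forall \rho_{RA} \in \mathcal{W}_+ : \ (\mathrm{id}_R \otimes \Pi_{A \to B})(\rho_{RA}) \in \mathcal{W}_+.$$
Figure 1 depicts the relationship between stabilizer operations, completely stabilizer-preserving operations, and completely PWP operations.
Definition 2 (Discrete Wigner function of a quantum channel) Given a quantum channel $\mathcal{N}_{A\to B}$, its discrete Wigner function is defined as
$$W_{\mathcal{N}}(v|u) := \frac{1}{d_B}\operatorname{Tr}\!\left[\left((A_A^u)^T \otimes A_B^v\right) J^{\mathcal{N}}_{AB}\right] = \frac{1}{d_B}\operatorname{Tr}\!\left[A_B^v\, \mathcal{N}(A_A^u)\right]. \tag{28}$$
Here the Choi-Jamiołkowski matrix [22,49] of $\mathcal{N}$ is given by
$$J^{\mathcal{N}}_{AB} = \sum_{i,j} |i\rangle\langle j|_A \otimes \mathcal{N}_{A'\to B}(|i\rangle\langle j|_{A'}),$$
where $\{|i\rangle_A\}_i$ and $\{|i\rangle_{A'}\}_i$ are orthonormal bases on isomorphic Hilbert spaces $\mathcal{H}_A$ and $\mathcal{H}_{A'}$, respectively. More generally, the discrete Wigner function of a Hermiticity-preserving linear map $\mathcal{P}_{A\to B}$ can be defined using the same formula in (28), by substituting $\mathcal{N}$ therein with $\mathcal{P}$.
From the definition above and the properties recalled in Section II B, it follows for a quantum channel that
$$\sum_v W_{\mathcal{N}}(v|u) = 1 \quad \forall u, \tag{29}$$
because
$$\sum_v W_{\mathcal{N}}(v|u) = \sum_v \frac{1}{d_B}\operatorname{Tr}\!\left[A_B^v\, \mathcal{N}(A_A^u)\right] = \operatorname{Tr}\!\left[\Big(\frac{1}{d_B}\sum_v A_B^v\Big) \mathcal{N}(A_A^u)\right] = \operatorname{Tr}\!\left[\mathcal{N}(A_A^u)\right] = \operatorname{Tr}[A_A^u] = 1,$$
where the penultimate equality follows from the fact that $\mathcal{N}$ is trace preserving (in fact here we did not require complete positivity or even linearity). Due to the normalization in (29), $W_{\mathcal{N}}(v|u)$ can be interpreted as a conditional quasi-probability distribution. Furthermore, the discrete Wigner function of a channel allows one to determine the output Wigner function from the input Wigner function by propagating the quasi-probability distributions just as one does in the classical case:
Lemma 1 For an input state $\rho_{AR}$ and a quantum channel $\mathcal{N}_{A\to B}$ with respective Wigner functions $W_{\rho_{AR}}(u, y)$ and $W_{\mathcal{N}}(v|u)$, the Wigner function $W_{\mathcal{N}(\rho_{AR})}(v, y)$ of the output state $\mathcal{N}_{A\to B}(\rho_{AR})$ is given by
$$W_{\mathcal{N}(\rho_{AR})}(v, y) = \sum_u W_{\mathcal{N}}(v|u)\, W_{\rho_{AR}}(u, y).$$
Proof. The proof is straightforward:
$$W_{\mathcal{N}(\rho_{AR})}(v, y) = \frac{1}{d_B d_R}\operatorname{Tr}\!\left[(A_B^v \otimes A_R^y)\, \mathcal{N}_{A\to B}(\rho_{AR})\right] = \sum_{u, y'} W_{\rho_{AR}}(u, y')\, \frac{1}{d_B}\operatorname{Tr}\!\left[A_B^v\, \mathcal{N}(A_A^u)\right] \frac{1}{d_R}\operatorname{Tr}\!\left[A_R^y A_R^{y'}\right] = \sum_u W_{\mathcal{N}}(v|u)\, W_{\rho_{AR}}(u, y).$$
All steps follow from definitions and the properties of the phase-space point operators recalled in Section II B. In particular, we made use of the fact that
$$\rho_{AR} = \sum_{u, y} W_{\rho_{AR}}(u, y)\, A_A^u \otimes A_R^y.$$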
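Lemma 1 says that a channel acts on Wigner functions like a (quasi-)stochastic matrix. The sketch below checks this numerically for a single qutrit with a trivial reference system. The noisy gate used as the example channel (a diagonal phase gate followed by depolarizing noise), its parameters `PHASE` and `P_NOISE`, and all helper names are hypothetical choices made only for illustration.

```python
import cmath
import math

D = 3
OMEGA = cmath.exp(2j * cmath.pi / D)
TAU = cmath.exp(1j * cmath.pi * (D + 1) / D)
POINTS = [(a1, a2) for a1 in range(D) for a2 in range(D)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def dagger(a):
    return [[x.conjugate() for x in col] for col in zip(*a)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def weyl(a1, a2):
    t = [[0j] * D for _ in range(D)]
    for j in range(D):
        k = (j + a2) % D
        t[k][j] = TAU ** (-a1 * a2) * OMEGA ** (a1 * k)
    return t

def phase_point_ops():
    a0 = [[sum(weyl(*u)[i][j] for u in POINTS) / D for j in range(D)] for i in range(D)]
    return {u: matmul(matmul(weyl(*u), a0), dagger(weyl(*u))) for u in POINTS}

def wigner(rho, ops):
    return {u: (trace(matmul(a, rho)) / D).real for u, a in ops.items()}

def channel_wigner(channel, ops):
    """W_N(v|u) = Tr[A_v N(A_u)] / d, stored under the key (v, u)."""
    return {(v, u): (trace(matmul(ops[v], channel(ops[u]))) / D).real
            for u in POINTS for v in POINTS}

PHASE, P_NOISE = 0.3, 0.1  # hypothetical gate angle and noise strength
GATE = [[1, 0, 0], [0, cmath.exp(1j * PHASE), 0], [0, 0, 1]]

def noisy_gate(x):
    """Linear map x -> (1 - p) U x U^dagger + p Tr[x] 1/d."""
    rotated = matmul(matmul(GATE, x), dagger(GATE))
    tr = trace(x)
    return [[(1 - P_NOISE) * rotated[i][j] + (P_NOISE * tr / D if i == j else 0)
             for j in range(D)] for i in range(D)]

OPS = phase_point_ops()
w_chan = channel_wigner(noisy_gate, OPS)

S = 1 / math.sqrt(2)
psi = [0.0, S, -S]
rho = [[a * b for b in psi] for a in psi]
w_in = wigner(rho, OPS)
w_out = wigner(noisy_gate(rho), OPS)

# Normalization (29): sum_v W_N(v|u) = 1 for every u.
normalized = all(abs(sum(w_chan[(v, u)] for v in POINTS) - 1) < 1e-9 for u in POINTS)
# Lemma 1: W_{N(rho)}(v) = sum_u W_N(v|u) W_rho(u).
predicted = {v: sum(w_chan[(v, u)] * w_in[u] for u in POINTS) for v in POINTS}
lemma_holds = all(abs(predicted[v] - w_out[v]) < 1e-9 for v in POINTS)
print(normalized, lemma_holds)
```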

Theorem 2
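The following statements about CPWP operations are equivalent:
1. The quantum channel $\mathcal{N}$ is CPWP;
2. The discrete Wigner function of the Choi-Jamiołkowski matrix $J^{\mathcal{N}}$ is non-negative;
3. $W_{\mathcal{N}}(v|u)$ is non-negative for all $u$ and $v$ (i.e., $W_{\mathcal{N}}(v|u)$ is a conditional probability distribution or classical channel).
Proof. 1 → 2: Let us first apply the (stabilizer) qudit controlled-NOT gate $\mathrm{CNOT}_d$ to the stabilizer state $|+\rangle \otimes |0\rangle$ to prepare the maximally entangled state $\Phi_d \in \mathcal{W}_+$. Since $\mathcal{N}$ completely preserves the positivity of the Wigner function, it follows that
$$\frac{J^{\mathcal{N}}_{AB}}{d_A} = (\mathrm{id}_A \otimes \mathcal{N}_{A'\to B})(\Phi_d) \in \mathcal{W}_+.$$
2 → 3: We find that
$$W_{\mathcal{N}}(v|u) = \frac{1}{d_B}\operatorname{Tr}\!\left[\left((A_A^u)^T \otimes A_B^v\right) J^{\mathcal{N}}_{AB}\right] = \frac{1}{d_B}\operatorname{Tr}\!\left[\left(A_A^{u'} \otimes A_B^v\right) J^{\mathcal{N}}_{AB}\right] = d_A\, W_{J^{\mathcal{N}}}(u', v) \ge 0. \tag{39}$$
In the last inequality, we note that $A_A^{u'} = (A_A^u)^T$ and we can always find such $u'$ since $\{A_u\}_u = \{A_u^T\}_u$. That $W_{\mathcal{N}}(v|u)$ is a conditional probability distribution follows from the inequality in (39) and the constraint in (29).
3 → 1: If the channel $\mathcal{N}$ has a non-negative Wigner function, then for an input state $\rho_{AR}$ such that $\rho_{AR} \in \mathcal{W}_+$, it follows from Lemma 1 that
$$W_{\mathcal{N}(\rho_{AR})}(v, y) = \sum_u W_{\mathcal{N}}(v|u)\, W_{\rho_{AR}}(u, y) \ge 0,$$
concluding the proof.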
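As a worked illustration of the criterion in Theorem 2 (an example added here, not taken from the original text), consider the qudit depolarizing channel $\mathcal{D}_p(X) = (1-p)X + p\operatorname{Tr}[X]\,\mathbb{1}/d$ with $p \in [0,1]$. Using only the properties $\operatorname{Tr}[A_v A_u] = d\,\delta(u,v)$ and $\operatorname{Tr}[A_u] = 1$ of the phase-space point operators, its discrete Wigner function evaluates to

```latex
\begin{align}
W_{\mathcal{D}_p}(v|u)
  &= \frac{1}{d}\operatorname{Tr}\!\left[A_v\,\mathcal{D}_p(A_u)\right] \\
  &= \frac{1-p}{d}\operatorname{Tr}[A_v A_u]
     + \frac{p}{d^2}\operatorname{Tr}[A_u]\operatorname{Tr}[A_v] \\
  &= (1-p)\,\delta(u,v) + \frac{p}{d^2} \;\geq\; 0 .
\end{align}
```

Every entry is non-negative and $\sum_v W_{\mathcal{D}_p}(v|u) = (1-p) + p = 1$, so by the third statement of the theorem the depolarizing channel is CPWP for every $p \in [0,1]$: it acts on phase space as a classical channel that leaves the point $u$ unchanged with probability $1-p$ and otherwise resamples it uniformly.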

B. Logarithmic negativity (mana) of a quantum channel
To quantify the magic of quantum channels, we introduce the mana (or logarithmic negativity) of a quantum channel N A→B :

Definition 3 (Mana of a quantum channel)
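The mana of a quantum channel $\mathcal{N}_{A\to B}$ is defined as
$$\mathcal{M}(\mathcal{N}_{A\to B}) := \log \max_u \left\|\mathcal{N}_{A\to B}(A_A^u)\right\|_{W,1} = \log \max_u \sum_v \frac{1}{d_B}\left|\operatorname{Tr}\!\left[A_B^v\, \mathcal{N}(A_A^u)\right]\right| = \log \max_u \sum_v \left|W_{\mathcal{N}}(v|u)\right|.$$
More generally, we define the mana of a Hermiticity-preserving linear map $\mathcal{P}_{A\to B}$ via the same formula above, but substituting $\mathcal{N}$ with $\mathcal{P}$.
In the following, we are going to show that the mana of a quantum channel has many desirable properties, such as
1. Reduction to states (Proposition 3);
2. Additivity under tensor products of channels (Proposition 4);
3. Subadditivity under serial composition of channels (Proposition 5);
4. Faithfulness (Proposition 6);
5. An amortization inequality (Proposition 7);
6. Monotonicity under CPWP superchannels (Theorem 8), which implies monotonicity under completely stabilizer-preserving superchannels.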

Proposition 3 (Reduction to states)
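Let $\mathcal{N}$ be a replacer channel, acting as $\mathcal{N}(\rho) = \operatorname{Tr}[\rho]\sigma$ for an arbitrary input state $\rho$, with $\sigma$ a state. Then
$$\mathcal{M}(\mathcal{N}) = \mathcal{M}(\sigma).$$
Proof. Applying definitions and the fact that $\operatorname{Tr}[A_u] = 1$ for a phase-space point operator $A_u$, we find that
$$\mathcal{M}(\mathcal{N}) = \log \max_u \left\|\mathcal{N}(A_u)\right\|_{W,1} = \log \max_u \left\|\operatorname{Tr}[A_u]\,\sigma\right\|_{W,1} = \log \left\|\sigma\right\|_{W,1} = \mathcal{M}(\sigma),$$
concluding the proof.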
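The reduction to states can also be observed numerically. The sketch below evaluates $\max_u \sum_v |W_{\mathcal{N}}(v|u)|$ for a qutrit replacer channel that prepares the state $(|1\rangle - |2\rangle)/\sqrt{2}$ and compares it with the Wigner trace norm of that state. The helper names and the choice of prepared state are illustrative assumptions, and the base-2 logarithm is an assumed convention.

```python
import cmath
import math

D = 3
OMEGA = cmath.exp(2j * cmath.pi / D)
TAU = cmath.exp(1j * cmath.pi * (D + 1) / D)
POINTS = [(a1, a2) for a1 in range(D) for a2 in range(D)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def dagger(a):
    return [[x.conjugate() for x in col] for col in zip(*a)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def weyl(a1, a2):
    t = [[0j] * D for _ in range(D)]
    for j in range(D):
        k = (j + a2) % D
        t[k][j] = TAU ** (-a1 * a2) * OMEGA ** (a1 * k)
    return t

def phase_point_ops():
    a0 = [[sum(weyl(*u)[i][j] for u in POINTS) / D for j in range(D)] for i in range(D)]
    return {u: matmul(matmul(weyl(*u), a0), dagger(weyl(*u))) for u in POINTS}

def wigner_norm_state(rho, ops):
    """sum_u |W_rho(u)|."""
    return sum(abs((trace(matmul(a, rho)) / D).real) for a in ops.values())

def wigner_norm_channel(channel, ops):
    """max_u sum_v |W_N(v|u)|; the channel mana is the logarithm of this."""
    return max(
        sum(abs((trace(matmul(ops[v], channel(ops[u]))) / D).real) for v in POINTS)
        for u in POINTS
    )

S = 1 / math.sqrt(2)
PSI = [0.0, S, -S]                       # (|1> - |2>)/sqrt(2)
SIGMA = [[a * b for b in PSI] for a in PSI]

def replacer(x):
    """Replacer channel x -> Tr[x] sigma."""
    t = trace(x)
    return [[t * entry for entry in row] for row in SIGMA]

OPS = phase_point_ops()
state_norm = wigner_norm_state(SIGMA, OPS)
channel_norm = wigner_norm_channel(replacer, OPS)
print("state mana:", math.log2(state_norm), "channel mana:", math.log2(channel_norm))
```

Both norms equal $5/3$ for this state, so the two mana values coincide, as Proposition 3 requires.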

Proposition 4 (Additivity)
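For quantum channels $\mathcal{N}_1$ and $\mathcal{N}_2$, the following additivity identity holds:
$$\mathcal{M}(\mathcal{N}_1 \otimes \mathcal{N}_2) = \mathcal{M}(\mathcal{N}_1) + \mathcal{M}(\mathcal{N}_2).$$
More generally, the same additivity identity holds if $\mathcal{N}_1$ and $\mathcal{N}_2$ are Hermiticity-preserving linear maps.
Proof. The proof relies on basic properties of the Wigner 1-norm and composite phase-space point operators, i.e.,
$$\mathcal{M}(\mathcal{N}_1 \otimes \mathcal{N}_2) = \log \max_{u_1, u_2} \left\|(\mathcal{N}_1 \otimes \mathcal{N}_2)(A^{u_1} \otimes A^{u_2})\right\|_{W,1} = \log \max_{u_1, u_2} \left\|\mathcal{N}_1(A^{u_1})\right\|_{W,1} \left\|\mathcal{N}_2(A^{u_2})\right\|_{W,1} = \mathcal{M}(\mathcal{N}_1) + \mathcal{M}(\mathcal{N}_2).$$
This concludes the proof.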

Proposition 5 (Subadditivity)
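For quantum channels $\mathcal{N}_1$ and $\mathcal{N}_2$, the following subadditivity inequality holds:
$$\mathcal{M}(\mathcal{N}_2 \circ \mathcal{N}_1) \le \mathcal{M}(\mathcal{N}_1) + \mathcal{M}(\mathcal{N}_2).$$
More generally, the same subadditivity inequality holds if $\mathcal{N}_1$ and $\mathcal{N}_2$ are Hermiticity-preserving linear maps.
Proof. Consider the following for an arbitrary phase-space point operator $A_u$:
$$\left\|(\mathcal{N}_2 \circ \mathcal{N}_1)(A_u)\right\|_{W,1} = \Big\|\sum_v W_{\mathcal{N}_1}(v|u)\, \mathcal{N}_2(A_v)\Big\|_{W,1} \le \sum_v \left|W_{\mathcal{N}_1}(v|u)\right| \left\|\mathcal{N}_2(A_v)\right\|_{W,1} \le \max_{u'} \left\|\mathcal{N}_1(A_{u'})\right\|_{W,1} \cdot \max_{v'} \left\|\mathcal{N}_2(A_{v'})\right\|_{W,1}.$$
Since the chain of inequalities holds for an arbitrary phase-space point operator $A_u$, we conclude the statement of the proposition.

Proposition 6 (Faithfulness)
Let $\mathcal{N}_{A\to B}$ be a quantum channel. Then $\mathcal{M}(\mathcal{N}) \ge 0$, and $\mathcal{M}(\mathcal{N}) = 0$ if and only if $\mathcal{N} \in \mathrm{CPWP}$.
Proof. To see the first claim, from the assumption that $\mathcal{N}$ is a quantum channel and (29), we find that
$$\sum_v \left|W_{\mathcal{N}}(v|u)\right| = 1 + 2\sum_{v : W_{\mathcal{N}}(v|u) < 0} \left|W_{\mathcal{N}}(v|u)\right| \ge 1. \tag{58}$$
Taking a maximization over $u$ and applying a logarithm leads to the conclusion that $\mathcal{M}(\mathcal{N}) \ge 0$ for all channels $\mathcal{N}$. Now suppose that $\mathcal{N} \in \mathrm{CPWP}$. Then by Theorem 2, it follows that $W_{\mathcal{N}}(v|u)$ is a conditional probability distribution, so that $\sum_v |W_{\mathcal{N}}(v|u)| = \sum_v W_{\mathcal{N}}(v|u) = 1$ for all $u$. It then follows from the definition that $\mathcal{M}(\mathcal{N}) = 0$.
Finally, suppose that $\mathcal{M}(\mathcal{N}) = 0$. By definition, this implies that $\max_u \sum_v |W_{\mathcal{N}}(v|u)| = 1$. However, consider that the rightmost inequality in (58) holds for all channels. So our assumption and this inequality imply that $\sum_{v : W_{\mathcal{N}}(v|u) < 0} |W_{\mathcal{N}}(v|u)| = 0$ for all $u$, which means that $W_{\mathcal{N}}(v|u) \ge 0$ for all $u, v$. By Theorem 2, it follows that $\mathcal{N} \in \mathrm{CPWP}$.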
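The subadditivity and faithfulness statements above can be illustrated on single-qutrit unitary channels. In the sketch below, the discrete Fourier gate serves as an example of a Clifford (hence CPWP) operation, and a diagonal gate with an arbitrarily chosen phase of 0.3 radians serves as an example of a non-Clifford gate; both choices, the helper names, and the base-2 logarithm are assumptions made for illustration.

```python
import cmath
import math

D = 3
OMEGA = cmath.exp(2j * cmath.pi / D)
TAU = cmath.exp(1j * cmath.pi * (D + 1) / D)
POINTS = [(a1, a2) for a1 in range(D) for a2 in range(D)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def dagger(a):
    return [[x.conjugate() for x in col] for col in zip(*a)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def weyl(a1, a2):
    t = [[0j] * D for _ in range(D)]
    for j in range(D):
        k = (j + a2) % D
        t[k][j] = TAU ** (-a1 * a2) * OMEGA ** (a1 * k)
    return t

def phase_point_ops():
    a0 = [[sum(weyl(*u)[i][j] for u in POINTS) / D for j in range(D)] for i in range(D)]
    return {u: matmul(matmul(weyl(*u), a0), dagger(weyl(*u))) for u in POINTS}

def unitary_channel(u):
    """The map x -> u x u^dagger."""
    return lambda x: matmul(matmul(u, x), dagger(u))

def channel_mana(channel, ops):
    """log2 of max_u sum_v |W_N(v|u)|, with W_N(v|u) = Tr[A_v N(A_u)] / d."""
    return math.log2(max(
        sum(abs((trace(matmul(ops[v], channel(ops[u]))) / D).real) for v in POINTS)
        for u in POINTS
    ))

OPS = phase_point_ops()

FOURIER = [[OMEGA ** (j * k) / D ** 0.5 for k in range(D)] for j in range(D)]
PHASE = 0.3  # hypothetical angle; not a multiple of 2*pi/3, so the gate is non-Clifford
GATE = [[1, 0, 0], [0, cmath.exp(1j * PHASE), 0], [0, 0, 1]]

n_f = unitary_channel(FOURIER)
n_g = unitary_channel(GATE)
composite = lambda x: n_g(n_f(n_g(x)))   # serial composition N_g after N_f after N_g

mana_f = channel_mana(n_f, OPS)
mana_g = channel_mana(n_g, OPS)
mana_composite = channel_mana(composite, OPS)
print(mana_f, mana_g, mana_composite)
```

The Clifford gate has zero mana (up to rounding), the non-Clifford gate has strictly positive mana, and the mana of the composite circuit does not exceed the sum of the manas of its three gates.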

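Proposition 7 (Amortization inequality) For any quantum channel $\mathcal{N}_{A\to B}$, the following inequality holds:
$$\sup_{\rho_A} \left[\mathcal{M}(\mathcal{N}_{A\to B}(\rho_A)) - \mathcal{M}(\rho_A)\right] \le \mathcal{M}(\mathcal{N}_{A\to B}). \tag{59}$$
Furthermore, we have that
$$\sup_{\rho_{RA}} \left[\mathcal{M}((\mathrm{id}_R \otimes \mathcal{N}_{A\to B})(\rho_{RA})) - \mathcal{M}(\rho_{RA})\right] \le \mathcal{M}(\mathcal{N}_{A\to B}). \tag{60}$$
Proof. The inequality in (59) is a direct consequence of reduction to states (Proposition 3) and subadditivity of mana with respect to serial compositions (Proposition 5). Indeed, letting $\mathcal{N}'$ be a replacer channel that prepares the state $\rho_A$, we find that
$$\mathcal{M}(\mathcal{N}(\rho_A)) = \mathcal{M}(\mathcal{N} \circ \mathcal{N}') \le \mathcal{M}(\mathcal{N}) + \mathcal{M}(\mathcal{N}') = \mathcal{M}(\mathcal{N}) + \mathcal{M}(\rho_A) \tag{61}$$
for all input states $\rho_A$, from which we conclude (59). By applying the inequality in (61) with the substitution $\mathcal{N} \to \mathrm{id} \otimes \mathcal{N}$, the additivity of the mana of a channel from Proposition 4, and the fact that the identity channel is free (and thus has mana equal to zero), we finally conclude that
$$\mathcal{M}((\mathrm{id}_R \otimes \mathcal{N})(\rho_{RA})) \le \mathcal{M}(\mathrm{id}_R \otimes \mathcal{N}) + \mathcal{M}(\rho_{RA}) = \mathcal{M}(\mathcal{N}) + \mathcal{M}(\rho_{RA}),$$
from which we conclude (60).
A CPWP superchannel $\Theta_{\mathrm{CPWP}}$ is a physical transformation of a quantum channel. That is, the superchannel realizes the following transformation of a channel $\mathcal{N}_{A\to B}$ to a channel $\mathcal{E}_{\hat{A}\to\hat{B}}$ in terms of CPWP channels $\mathcal{P}^{\mathrm{post}}_{BM\to\hat{B}}$ and $\mathcal{P}^{\mathrm{pre}}_{\hat{A}\to AM}$:
$$\mathcal{E}_{\hat{A}\to\hat{B}} = \Theta_{\mathrm{CPWP}}(\mathcal{N}_{A\to B}) := \mathcal{P}^{\mathrm{post}}_{BM\to\hat{B}} \circ (\mathcal{N}_{A\to B} \otimes \mathrm{id}_M) \circ \mathcal{P}^{\mathrm{pre}}_{\hat{A}\to AM}. \tag{65}$$
Theorem 8 (Monotonicity) Let $\mathcal{N}_{A\to B}$ be a quantum channel, and let $\Theta_{\mathrm{CPWP}}$ be a CPWP superchannel of the form (65). Then $\mathcal{M}(\mathcal{N})$ is a channel magic measure in the sense that
$$\mathcal{M}(\mathcal{N}_{A\to B}) \ge \mathcal{M}(\Theta_{\mathrm{CPWP}}(\mathcal{N}_{A\to B})). \tag{66}$$
Proof. The inequality in (66) is a direct consequence of subadditivity of mana with respect to serial compositions (Proposition 5) and faithfulness (Proposition 6). Indeed, we find that
$$\mathcal{M}(\Theta_{\mathrm{CPWP}}(\mathcal{N})) \le \mathcal{M}(\mathcal{P}^{\mathrm{post}}) + \mathcal{M}(\mathcal{N} \otimes \mathrm{id}_M) + \mathcal{M}(\mathcal{P}^{\mathrm{pre}}) = \mathcal{M}(\mathcal{N}),$$
concluding the proof.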

C. Generalized thauma of a quantum channel
In this section, we define a rather general measure of magic for a quantum channel, called the generalized thauma, which extends to channels the definition from [88] for states. To define it, recall that a generalized divergence $\mathbf{D}(\rho\|\sigma)$ is any function of a quantum state $\rho$ and a positive semi-definite operator $\sigma$ that obeys data processing [64,72], i.e., $\mathbf{D}(\rho\|\sigma) \ge \mathbf{D}(\mathcal{N}(\rho)\|\mathcal{N}(\sigma))$ where $\mathcal{N}$ is a quantum channel. Examples of generalized divergences, in addition to the trace distance and relative entropy, include the Petz-Rényi relative entropies [63], the sandwiched Rényi relative entropies [59,92], the Hilbert $\alpha$-divergences [16], and the $\chi^2$ divergences [76]. One can then define the generalized channel divergence [55], as a way of quantifying the distinguishability of two quantum channels $\mathcal{N}_{A\to B}$ and $\mathcal{P}_{A\to B}$, as follows:
$$\mathbf{D}(\mathcal{N}\|\mathcal{P}) := \sup_{\psi_{RA}} \mathbf{D}(\mathcal{N}_{A\to B}(\psi_{RA})\,\|\,\mathcal{P}_{A\to B}(\psi_{RA})), \tag{70}$$
where the optimization is with respect to all pure states $\psi_{RA}$ such that system $R$ is isomorphic to the channel input system $A$ (note that one does not achieve a higher value of $\mathbf{D}(\mathcal{N}\|\mathcal{P})$ by allowing for an optimization over mixed states $\rho_{RA}$ with an arbitrarily large reference system [55], as a consequence of purification, the Schmidt decomposition theorem, and data processing). More generally, $\mathcal{P}_{A\to B}$ can be a completely positive map in the definition in (70). Interestingly, the generalized channel divergence is monotone under the action of a superchannel $\Xi$:
$$\mathbf{D}(\mathcal{N}\|\mathcal{P}) \ge \mathbf{D}(\Xi(\mathcal{N})\,\|\,\Xi(\mathcal{P})).$$
We then define generalized thauma as follows:

Definition 4 (Generalized thauma of a quantum channel)
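The generalized thauma of a quantum channel $\mathcal{N}_{A\to B}$ is defined as
$$\theta(\mathcal{N}) := \min_{\mathcal{E} : \mathcal{M}(\mathcal{E}) \le 0} \mathbf{D}(\mathcal{N}\|\mathcal{E}), \tag{72}$$
where the optimization is with respect to all completely positive maps $\mathcal{E}$ having mana $\mathcal{M}(\mathcal{E}) \le 0$.
It is clear that the above definition extends the generalized thauma of a state [88], which we recall is given by
$$\theta(\rho) := \min_{\sigma \in \mathcal{W}} \mathbf{D}(\rho\|\sigma).$$
We now prove that the generalized thauma of a quantum channel reduces to the state measure whenever the channel $\mathcal{N}$ is a replacer channel:
Proposition 9 (Reduction to states) Let $\mathcal{N}$ be a replacer channel, acting as $\mathcal{N}(\rho) = \operatorname{Tr}[\rho]\sigma$ for an arbitrary input state $\rho$, where $\sigma$ is a state. Then
$$\theta(\mathcal{N}) = \theta(\sigma).$$
Proof. First, denoting the maximally mixed state by $\pi$, consider that
$$\theta(\mathcal{N}) = \min_{\mathcal{E} : \mathcal{M}(\mathcal{E}) \le 0}\ \sup_{\psi_{RA}} \mathbf{D}(\mathcal{N}(\psi_{RA})\|\mathcal{E}(\psi_{RA})) \ge \min_{\mathcal{E} : \mathcal{M}(\mathcal{E}) \le 0} \mathbf{D}(\pi_R \otimes \sigma\,\|\,\pi_R \otimes \mathcal{E}(\pi_A)) = \min_{\mathcal{E} : \mathcal{M}(\mathcal{E}) \le 0} \mathbf{D}(\sigma\|\mathcal{E}(\pi_A)) = \min_{\omega \in \mathcal{W}} \mathbf{D}(\sigma\|\omega) = \theta(\sigma).$$
The first equality follows from the definition. The inequality follows by choosing the input state suboptimally to be $\pi_R \otimes \pi_A$. The second equality follows because the max-relative entropy is invariant with respect to tensoring in the same state for both arguments. The third equality follows because $\pi$ is a free state with non-negative Wigner function and $\mathcal{E}$ is a completely positive map with $\mathcal{M}(\mathcal{E}) \le 0$. Since one can reach all and only the operators $\omega \in \mathcal{W}$, the equality follows. Then the last equality follows from the definition. To see the other inequality, consider that $\mathcal{E}(\rho) = \operatorname{Tr}[\rho]\omega$, for $\omega \in \mathcal{W}$, is a particular completely positive map satisfying $\mathcal{M}(\mathcal{E}) = \mathcal{M}(\omega) \le 0$, so that
$$\theta(\mathcal{N}) \le \min_{\omega \in \mathcal{W}}\ \sup_{\psi_{RA}} \mathbf{D}(\psi_R \otimes \sigma\,\|\,\psi_R \otimes \omega) = \min_{\omega \in \mathcal{W}} \mathbf{D}(\sigma\|\omega) = \theta(\sigma).$$
This concludes the proof.
That the generalized thauma of channels proposed in (72) is a good measure of magic for quantum channels is a consequence of the following proposition:
Theorem 10 (Monotonicity) Let $\mathcal{N}_{A\to B}$ be a quantum channel, and let $\Xi_{\mathrm{CPWP}}$ be a CPWP superchannel of the form in (65). Then $\theta(\mathcal{N})$ is a channel magic measure in the sense that
$$\theta(\mathcal{N}_{A\to B}) \ge \theta(\Xi_{\mathrm{CPWP}}(\mathcal{N}_{A\to B})).$$
Proof. The idea is to utilize the generalized divergence and its basic property of data processing. In more detail, consider that
$$\theta(\mathcal{N}) = \min_{\mathcal{E} : \mathcal{M}(\mathcal{E}) \le 0} \mathbf{D}(\mathcal{N}\|\mathcal{E}) \ge \min_{\mathcal{E} : \mathcal{M}(\mathcal{E}) \le 0} \mathbf{D}(\Xi_{\mathrm{CPWP}}(\mathcal{N})\,\|\,\Xi_{\mathrm{CPWP}}(\mathcal{E})) \ge \min_{\mathcal{F} : \mathcal{M}(\mathcal{F}) \le 0} \mathbf{D}(\Xi_{\mathrm{CPWP}}(\mathcal{N})\,\|\,\mathcal{F}) = \theta(\Xi_{\mathrm{CPWP}}(\mathcal{N})).$$
The first inequality follows from the fact that the generalized divergence of channels is monotone under the action of a superchannel [40, Section V-A]. The second inequality follows from the monotonicity of $\mathcal{M}(\mathcal{N})$ given in Theorem 8 (and which extends more generally to completely positive maps as stated there). This monotonicity implies that $\mathcal{M}(\mathcal{E}) \ge \mathcal{M}(\Xi_{\mathrm{CPWP}}(\mathcal{E}))$ and leads to the second inequality.
A generalized divergence is called strongly faithful [7] if for a state $\rho_A$ and a subnormalized state $\sigma_A$, we have $\mathbf{D}(\rho_A\|\sigma_A) \ge 0$ in general, with $\mathbf{D}(\rho_A\|\sigma_A) = 0$ if and only if $\rho_A = \sigma_A$.
Proposition 11 (Faithfulness) Let $\mathbf{D}$ be a strongly faithful generalized divergence. Then the generalized thauma $\theta(\mathcal{N})$ of a channel $\mathcal{N}$ defined through $\mathbf{D}$ is non-negative and it is equal to zero if $\mathcal{N} \in \mathrm{CPWP}$. If the generalized divergence is furthermore continuous and $\theta(\mathcal{N}) = 0$, then $\mathcal{N} \in \mathrm{CPWP}$.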
Proof. From Lemma 27 in Appendix A, it follows that any completely positive map E subject to the constraint M(E) ≤ 0 is trace non-increasing on the set W + . It thus follows that E A→B (ψ RA ) is subnormalized for any input state ψ RA ∈ W + . By restricting the maximization to such input states in W + , applying the faithfulness assumption, and applying the definition of generalized thauma, we conclude that θ(N ) ≥ 0.
Suppose that N ∈ CPWP. Then by Proposition 6, M(N ) = 0 and so we can set E = N in the definition of generalized thauma and conclude from the faithfulness assumption that θ(N ) = 0.
Finally, suppose that θ(N ) = 0. By the assumption of continuity, this means that there exists a completely positive map E satisfying D(N E) = 0. By Lemma 27 in Appendix A and the faithfulness assumption, this in turn means that N A→B (Φ RA ) = E A→B (Φ RA ) for the maximally entangled state Φ RA ∈ W + , which implies that N A→B = E A→B . However, we have that M(E) ≤ 0, implying that M(N ) = 0, since N is a channel and M(N ) ≥ 0 for all channels. By Proposition 6, we conclude that N ∈ CPWP.
As discussed in [55,71], a generalized divergence possesses the direct-sum property on classical-quantum states if the following equality holds: where p X is a probability distribution, {|x } x is an orthonormal basis, and {ρ x } x and {σ x } x are sets of states. We note that this property holds for trace distance, quantum relative entropy [79], and the Petz-Rényi [63] and sandwiched Rényi [59,92] quasi-entropies sgn(α −1) Tr ρ α σ 1−α and respectively. For such generalized divergences, which are additionally continuous, as well as convex in the second argument, we find that an exchange of the minimization and the maximization in the definition of the generalized thauma is possible: Proposition 12 (Minimax) Let D be a generalized divergence that is continuous, obeys the direct-sum property in (89), and is convex in the second argument. Then the following exchange of min and max is possible in the generalized thauma: Proof. Let E be a fixed completely positive map such that M(E) ≤ 0. Let ψ 1 RA and ψ 2 RA be input states to consider for the maximization. Due to the unitary freedom of purifications and invariance of generalized divergence with respect to unitaries, we can equivalently consider the maximization to be over the convex set of density operators acting on the channel input system A. Define for λ ∈ [0, 1]. Then the state purifies ρ λ A and is related to a purification |ψ λ RA of ρ λ A by an isometry. It then follows that The inequality follows from data processing, by applying a completely dephasing channel to the register R . The last equality again follows from data processing. So the objective function is concave in the argument being maximized (again thinking of the maximization being performed over density operators on A rather than pure states on RA). By assumption, for a fixed input state ψ RA , the objective function is convex in the second argument and the set of completely positive maps E satisfying M(E) is convex.
Then the Sion minimax theorem [74] applies, and we conclude the statement of the proposition.

Remark 1
Examples of generalized divergences to which Proposition 12 applies include the quantum relative entropy [79], the sandwiched Rényi relative entropy [59,92], and the Petz-Rényi relative entropy [63]. The proposition applies to the latter two by working with the corresponding quasi-entropies and then lifting the result to the actual relative entropies.

D. Max-thauma of a quantum channel
As a particular case of the generalized thauma of a quantum channel defined in (72), we consider the max-thauma of a quantum channel, which is the max-relative entropy divergence between the channel and the set of completely positive maps with non-positive mana. Specifically, for a given quantum channel N A→B , the max-thauma of N A→B is defined by where the minimum is taken with respect to all completely positive maps E satisfying M(E) ≤ 0 and is the max-divergence of channels [24]. (More generally, N and E could be arbitrary completely positive maps in (97).) Note that it is known that [7,29] where Φ RA is the maximally entangled state and J N AB is the Choi-Jamiołkowski matrix of the channel N A→B and similarly for J E AB . Due to the properties of max-relative entropy, it follows that Theorem 10 and Propositions 9, 11, and 12 apply to the max-thauma of a channel, implying reduction to states, that it is monotone with respect to completely CPWP superchannels, faithful, and obeys a minimax theorem, so that where N is a quantum channel and Ξ CPWP is a CPWP superchannel. We can alternatively express the max-thauma of a channel as the following SDP: Proposition 13 (SDP for max-thauma) For a given quantum channel N A→B , its max-thauma θ max (N ) can be written as where J N AB is the Choi-Jamiołkowski matrix of the channel N A→B . Moreover, the SDP dual to the above is as follows: Proof. Consider the following chain of equalities: = log min t : where the second equality follows from (98) and the last from the fact that E is completely positive and thus in one-to-one correspondence with positive semi-definite bipartite operators. The dual SDP of θ max (N ) is given by which can be simplified to This concludes the proof.
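Corollary 14 (Max-thauma vs. mana) For a quantum channel $\mathcal{N}_{A\to B}$, its max-thauma does not exceed its mana:
$$\theta_{\max}(\mathcal{N}) \le \mathcal{M}(\mathcal{N}).$$
Proof. The proof is a direct consequence of the primal formulation in (104). By setting $Y_{AB} = J^{\mathcal{N}}_{AB}$, we find that
$$\theta_{\max}(\mathcal{N}) \le \log \max_u \left\|\mathcal{N}_{A\to B}(A_A^u)\right\|_{W,1} = \mathcal{M}(\mathcal{N}),$$
where the last equality follows from (43).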

Proposition 15 (Additivity)
For two given quantum channels N 1 and N 2 , the max-thauma is additive in the following sense: Proof. The idea of the proof is to utilize the primal and dual SDPs of θ max (N ) from Proposition 13. On the one hand, suppose that the optimal solutions to the primal SDPs (105) for θ max (N 1 ) and On the other hand, considering Eq. (96), suppose that the optimal solutions for N 1 and N 2 are E 1 and E 2 , respectively. Noting that M(E 1 ⊗ E 2 ) = M(E 1 ) + M(E 2 ) ≤ 0, and employing (98) and the additivity of the max-relative entropy, we find that This concludes the proof.
The following lemma is essential to establishing subadditivity of max-thauma of channels with respect to serial composition, as stated in Proposition 17 below. We suspect that Lemma 16 will find wide use in general resource theories beyond the magic resource theory considered in this paper. For example, it leads to an alternative proof of [7,Proposition 17].

Lemma 16 (Subadditivity of max-divergence of channels) Given completely positive maps
B→C , E 1 A→B , and E 2 B→C , the following subadditivity inequality, with respect to serial compositions, holds for the max-channel divergence of (97)-(98): where we have made the abbreviations Proof. Recall the "data-processed triangle inequality" from [23]: which holds for P a positive map and ρ, ω, and σ positive semi-definite operators. Note that one can in fact see this as a consequence of the submultiplicativity of the operator norm and the data-processing inequality of max-relative entropy for positive maps: Let us pick where Φ denotes the maximally entangled state. We find that The first equality follows from (98). The first inequality follows from (120) with the choices in (127). The second inequality follows because D max ((id ⊗N 1 )(Φ) (id ⊗E 1 )(Φ)) = D max (N 1 E 1 ), as a consequence of (98), and the channel divergence D max (N 2 E 2 ) involves an optimization over all bipartite input states, one of which is (id ⊗E 1 )(Φ).

Remark 2
The proof above applies to any divergence that obeys the data-processed triangle inequality, which includes the Hilbert α-divergences of [16], as discussed in [7, Appendix A].

Proposition 17 (Subadditivity)
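For two given quantum channels $\mathcal{N}_1$ and $\mathcal{N}_2$, the max-thauma is subadditive in the following sense:
$$\theta_{\max}(\mathcal{N}_2 \circ \mathcal{N}_1) \le \theta_{\max}(\mathcal{N}_1) + \theta_{\max}(\mathcal{N}_2).$$
Proof. This is a direct consequence of Lemma 16 above. Let $\mathcal{E}_i$ be the completely positive map satisfying $\mathcal{M}(\mathcal{E}_i) \le 0$ and that is optimal for $\mathcal{N}_i$ with respect to the max-thauma $\theta_{\max}$, for $i \in \{1, 2\}$. Then applying Lemma 16, we find that
$$D_{\max}(\mathcal{N}_2 \circ \mathcal{N}_1\,\|\,\mathcal{E}_2 \circ \mathcal{E}_1) \le D_{\max}(\mathcal{N}_1\|\mathcal{E}_1) + D_{\max}(\mathcal{N}_2\|\mathcal{E}_2) = \theta_{\max}(\mathcal{N}_1) + \theta_{\max}(\mathcal{N}_2).$$
The equality follows from the assumption that $\mathcal{E}_i$ is the completely positive map satisfying $\mathcal{M}(\mathcal{E}_i) \le 0$, which is optimal for $\mathcal{N}_i$ with respect to the max-thauma $\theta_{\max}$, for $i \in \{1, 2\}$.
Given that, by assumption, $\mathcal{M}(\mathcal{E}_i) \le 0$ for $i \in \{1, 2\}$, it follows from Proposition 5 that $\mathcal{M}(\mathcal{E}_2 \circ \mathcal{E}_1) \le 0$. Since the max-thauma involves an optimization over all completely positive maps $\mathcal{E}$ satisfying $\mathcal{M}(\mathcal{E}) \le 0$, we conclude that
$$\theta_{\max}(\mathcal{N}_2 \circ \mathcal{N}_1) \le D_{\max}(\mathcal{N}_2 \circ \mathcal{N}_1\,\|\,\mathcal{E}_2 \circ \mathcal{E}_1),$$
which is the statement of the proposition.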

Proposition 18 (Amortization inequality)
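For any quantum channel $\mathcal{N}_{A\to B}$, the following inequality holds
$$\theta_{\max}(\mathcal{N}_{A\to B}) \ge \sup_{\rho_A} \left[\theta_{\max}(\mathcal{N}_{A\to B}(\rho_A)) - \theta_{\max}(\rho_A)\right], \tag{135}$$
with the optimization performed over input states $\rho_A$. Moreover, the following inequality also holds
$$\theta_{\max}(\mathcal{N}_{A\to B}) \ge \sup_{\rho_{RA}} \left[\theta_{\max}((\mathrm{id}_R\otimes\mathcal{N}_{A\to B})(\rho_{RA})) - \theta_{\max}(\rho_{RA})\right]. \tag{136}$$
Proof. The inequality in (135) is a direct consequence of reduction to states (see (99)) and subadditivity of max-thauma with respect to serial compositions (Proposition 17). Indeed, letting $\mathcal{N}'$ be a replacer channel that prepares the state $\rho_A$, we find that
$$\theta_{\max}(\mathcal{N}(\rho_A)) = \theta_{\max}(\mathcal{N}\circ\mathcal{N}') \le \theta_{\max}(\mathcal{N}) + \theta_{\max}(\mathcal{N}') = \theta_{\max}(\mathcal{N}) + \theta_{\max}(\rho_A)$$
for all input states $\rho_A$, from which we conclude (135).
To arrive at the inequality in (136), we make the substitution $\mathcal{N} \to \mathrm{id}\otimes\mathcal{N}$, apply the above reasoning, the additivity in Proposition 15, and the fact that the identity channel is free (CPWP), to conclude that the following holds for all input states $\rho_{RA}$:
$$\theta_{\max}((\mathrm{id}_R\otimes\mathcal{N})(\rho_{RA})) \le \theta_{\max}(\mathrm{id}_R\otimes\mathcal{N}) + \theta_{\max}(\rho_{RA}) = \theta_{\max}(\mathcal{N}) + \theta_{\max}(\rho_{RA}),$$
from which we conclude (136).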
To summarize, the properties of θ max (N ) are as follows: 1. Reduction to states: θ max (N ) = θ max (σ) when the channel N is a replacer channel, acting as N (ρ) = Tr[ρ]σ for an arbitrary input state ρ, where σ is a state.

Remark 3
Due to the subadditivity inequality in Proposition 17, the additivity identity in Proposition 15, and faithfulness in (101), the following identities hold which have the interpretation that amortization in terms of arbitrary pre- and post-processing does not increase the max-thauma of a quantum channel.

A. Amortized magic
Since many physical tasks relate to quantum channels and time evolution rather than directly to quantum states, it is of interest to consider the non-stabilizer properties of quantum channels. Having established suitable measures to quantify the magic of quantum channels, it is natural to investigate the ability of a quantum channel to generate magic from input quantum states. Let us begin by defining the amortized magic of a quantum channel:

Definition 5 (Amortized magic)
The amortized magic of a quantum channel $\mathcal{N}_{A\to B}$ is defined relative to a magic measure $m(\cdot)$ via the following formula:
$$m^{\mathcal{A}}(\mathcal{N}_{A\to B}) := \sup_{\rho_{RA}} \left[ m((\mathrm{id}_R\otimes\mathcal{N}_{A\to B})(\rho_{RA})) - m(\rho_{RA}) \right].$$
The strict amortized magic of a quantum channel is defined as
$$\widetilde{m}^{\mathcal{A}}(\mathcal{N}_{A\to B}) := \sup_{\rho_{RA}\in\mathrm{STAB}} m((\mathrm{id}_R\otimes\mathcal{N}_{A\to B})(\rho_{RA})).$$
That is, the amortized magic is defined as the largest increase in magic that a quantum channel can realize after it acts on an arbitrary input quantum state. The strict amortized magic is defined by finding the largest amount of magic that a quantum channel can realize when a stabilizer state is given to it as an input. Such amortized measures of resourcefulness of quantum channels were previously studied in the resource theories of quantum coherence (e.g., [5,29,34,57]) and quantum entanglement (e.g., [6,53,56,75,87]). They have been considered in the context of an arbitrary resource theory in [53,Section 7].

Proposition 19
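Given a quantum channel $\mathcal{N}_{A\to B}$, the following inequalities hold
$$\sup_{\rho_{RA}} \left[\mathcal{M}((\mathrm{id}_R\otimes\mathcal{N}_{A\to B})(\rho_{RA})) - \mathcal{M}(\rho_{RA})\right] \le \mathcal{M}(\mathcal{N}_{A\to B}), \qquad \sup_{\rho_{RA}} \left[\theta_{\max}((\mathrm{id}_R\otimes\mathcal{N}_{A\to B})(\rho_{RA})) - \theta_{\max}(\rho_{RA})\right] \le \theta_{\max}(\mathcal{N}_{A\to B}).$$
Proof. These statements are an immediate consequence of the amortization inequalities for $\mathcal{M}(\rho)$ and $\theta_{\max}(\rho)$ given in Propositions 7 and 18, respectively.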

B. Distillable magic of a quantum channel
The most general protocol for distilling some resource by means of a quantum channel N employs n invocations of the channel N interleaved by free channels [53,Section 7]. In our case, the resource of interest is magic, and here we take the free channels to be the CPWP channels discussed in Section III A. In such a protocol, the instances of the channel N are invoked one at a time, and we can integrate all CPWP channels between one use of N and the next into a single CPWP channel, since the CPWP channels are closed under composition. The goal of such a protocol is to distill magic states from the channel.
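In more detail, the most general protocol for distilling magic from a quantum channel proceeds as follows: one starts by preparing the systems $R_1 A_1$ in a state $\rho^{(1)}_{R_1 A_1}$ with non-negative Wigner function, by employing a free CPWP channel $\mathcal{F}^{(1)}_{\emptyset\to R_1 A_1}$. One then applies the channel $\mathcal{N}_{A_1\to B_1}$, followed by a CPWP channel $\mathcal{F}^{(2)}_{R_1 B_1\to R_2 A_2}$. Continuing the above steps, given the state $\rho^{(i)}_{R_i A_i}$ after the action of $i-1$ invocations of the channel $\mathcal{N}_{A\to B}$ and interleaved CPWP channels, we apply the channel $\mathcal{N}_{A_i\to B_i}$ and the CPWP channel $\mathcal{F}^{(i+1)}_{R_i B_i\to R_{i+1} A_{i+1}}$, obtaining the state
$$\rho^{(i+1)}_{R_{i+1} A_{i+1}} := \mathcal{F}^{(i+1)}_{R_i B_i\to R_{i+1} A_{i+1}}(\mathcal{N}_{A_i\to B_i}(\rho^{(i)}_{R_i A_i})).$$
After $n$ invocations of the channel $\mathcal{N}_{A\to B}$ have been made, the final free CPWP channel $\mathcal{F}^{(n+1)}_{R_n B_n\to S}$ produces a state $\omega_S$ on system $S$, defined as
$$\omega_S := \mathcal{F}^{(n+1)}_{R_n B_n\to S}(\mathcal{N}_{A_n\to B_n}(\rho^{(n)}_{R_n A_n})).$$
Such a protocol is depicted in Figure 2.

Figure 2: The most general protocol for distilling magic from a quantum channel.

Fix $\varepsilon \in [0, 1]$ and $k \in \mathbb{N}$. The above procedure is an $(n, k, \varepsilon)$ $\psi$-magic distillation protocol with rate $k/n$ and error $\varepsilon$, if the state $\omega_S$ has a high fidelity with $k$ copies of the target magic state $\psi$:
$$\langle\psi|^{\otimes k}\, \omega_S\, |\psi\rangle^{\otimes k} \ge 1 - \varepsilon.$$
A rate $R$ is achievable for $\psi$-magic state distillation from the channel $\mathcal{N}$, if for all $\varepsilon \in (0, 1]$, $\delta > 0$, and sufficiently large $n$, there exists an $(n, n(R - \delta), \varepsilon)$ $\psi$-magic state distillation protocol of the above form. The $\psi$-distillable magic of the channel $\mathcal{N}$ is defined to be the supremum of all achievable rates and is denoted by $C_\psi(\mathcal{N})$.

A common choice for a non-Clifford gate is the $T$ gate. The qutrit $T$ gate [47] is given by
$$T = \begin{pmatrix} \xi & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \xi^{-1} \end{pmatrix},$$
where $\xi = e^{2\pi i/9}$ is a primitive ninth root of unity. The $T$ gate leads to the $T$ magic state by inputting the stabilizer state $|+\rangle$ to the $T$ gate. Furthermore, by the method of state injection [39,94], one can generate a $T$ gate by acting with stabilizer operations on the $T$ state $|T\rangle$.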
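In what follows, we use quantum hypothesis testing to establish an upper bound on the rate at which one can distill qutrit $T$ states. The proof follows the general method in [6, Theorem 1] and [5, Theorem 1], which was later generalized to an arbitrary resource theory in [53,Section 7].

Proposition 20
Given a quantum channel $\mathcal{N}$, the following upper bound holds for the rate $R = k/n$ of an $(n, k, \varepsilon)$ $T$-magic distillation protocol:
$$R \le \frac{1}{\log(1 + 2\sin(\pi/18))} \left[\theta_{\max}(\mathcal{N}) + \frac{1}{n}\log\frac{1}{1-\varepsilon}\right].$$
Consequently, the following upper bound holds for the $T$-distillable magic of a quantum channel $\mathcal{N}$:
$$C_T(\mathcal{N}) \le \frac{\theta_{\max}(\mathcal{N})}{\log(1 + 2\sin(\pi/18))}.$$
Proof. Consider an arbitrary $(n, k, \varepsilon)$ $T$-magic state distillation protocol of the form described previously. Such a protocol uses the channel $n$ times, starting from the state $\rho^{(1)}_{R_1 A_1}$ and generating $\rho^{(i)}_{R_i A_i}$, for $i \in \{2, \ldots, n\}$, and $\omega_S$ step by step along the way, such that the final state $\omega_S$ has fidelity at least $1 - \varepsilon$ with $|T\rangle^{\otimes k}$, where $|T\rangle = T|+\rangle$ is the corresponding magic state of the $T$ gate. By assumption, it follows that
$$\mathrm{Tr}[|T\rangle\langle T|^{\otimes k}\,\omega_S] \ge 1 - \varepsilon,$$
while the result in [88] implies that
$$\mathrm{Tr}[|T\rangle\langle T|^{\otimes k}\,\sigma_S] \le (1 + 2\sin(\pi/18))^{-k}$$
for all $\sigma_S \in \mathcal{W}$ with the same dimension as $\omega_S$. Applying the data processing inequality for the max-relative entropy, with respect to the measurement channel
$$(\cdot) \mapsto \mathrm{Tr}[|T\rangle\langle T|^{\otimes k}(\cdot)]\,|0\rangle\langle 0| + \mathrm{Tr}[(I - |T\rangle\langle T|^{\otimes k})(\cdot)]\,|1\rangle\langle 1|,$$
we find that
$$\theta_{\max}(\omega_S) \ge \log[(1-\varepsilon)(1 + 2\sin(\pi/18))^{k}] = \log(1 - \varepsilon) + k\log(1 + 2\sin(\pi/18)).$$
Moreover, by labeling $\omega_S$ as $\rho^{(n+1)}$, we find that
$$\begin{aligned}
\theta_{\max}(\omega_S) &= \sum_{i=1}^{n} \left[\theta_{\max}(\rho^{(i+1)}) - \theta_{\max}(\rho^{(i)})\right] \\
&\le \sum_{i=1}^{n} \left[\theta_{\max}(\mathcal{N}_{A_i\to B_i}(\rho^{(i)}_{R_i A_i})) - \theta_{\max}(\rho^{(i)}_{R_i A_i})\right] \\
&\le n\,\theta_{\max}(\mathcal{N}).
\end{aligned}$$
The first equality follows because $\theta_{\max}(\rho^{(1)}) = 0$ and by adding and subtracting terms. The first inequality follows because the max-thauma of a state does not increase under the action of a CPWP channel. The last inequality follows from applying Proposition 18. Hence,
$$n\,\theta_{\max}(\mathcal{N}) \ge \log(1 - \varepsilon) + k\log(1 + 2\sin(\pi/18)),$$
which implies that
$$\frac{k}{n} \le \frac{1}{\log(1 + 2\sin(\pi/18))} \left[\theta_{\max}(\mathcal{N}) + \frac{1}{n}\log\frac{1}{1-\varepsilon}\right].$$
This concludes the proof.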
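To make the final rearrangement of the proof concrete, the following sketch evaluates the resulting rate bound for given values of the max-thauma, the number of channel uses, and the error. It is a minimal illustration: the function name `max_rate_bound` is hypothetical, and the logarithms are assumed to be base 2 (the bound itself is a ratio, so any base works as long as it is used consistently).

```python
import math

# log(1 + 2 sin(pi/18)), the per-copy value for the qutrit T state quoted in the text
T_STATE_THAUMA = math.log2(1 + 2 * math.sin(math.pi / 18))

def max_rate_bound(theta_max_channel, n, eps):
    """Upper bound on k/n implied by
    n * theta_max(N) >= log(1 - eps) + k * log(1 + 2 sin(pi/18))."""
    if not 0 <= eps < 1:
        raise ValueError("eps must lie in [0, 1)")
    if n < 1:
        raise ValueError("n must be a positive integer")
    return (theta_max_channel + math.log2(1 / (1 - eps)) / n) / T_STATE_THAUMA

# Example: a channel whose max-thauma equals that of one T state, used n = 10 times.
for eps in (0.0, 0.01, 0.1):
    print(eps, round(max_rate_bound(T_STATE_THAUMA, 10, eps), 4))
```

As expected from the formula, the error-dependent term vanishes as $n$ grows, leaving the asymptotic bound $\theta_{\max}(\mathcal{N})/\log(1 + 2\sin(\pi/18))$.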
We note here that similar results in terms of max-relative entropies have been found in the context of other resource theories. Namely, a channel's max-relative entropy of entanglement is an upper bound on its distillable secret key when assisted by LOCC channels [23], the max-Rains information of a quantum channel is an upper bound on its distillable entanglement when assisted by completely PPT preserving channels [8], and the max-k-unextendibility of a quantum channel is an upper bound on its distillable entanglement when assisted by k-extendible channels [52].

C. Injectable quantum channel
In any resource theory of quantum channels, the theory tends to simplify for those channels that can be implemented by the action of a free channel on the tensor product of the channel input state and a resourceful state ([53,Section 7] and [91, Section 6]). The situation is no different for the resource theory of magic channels. In fact, particular channels with the aforementioned structure have been considered for a long time in the context of magic states, via the method of state injection [39,94]. Here we formally define an injectable channel as follows:

Definition 6 (Injectable channel)
A quantum channel $\mathcal{N}_{A\to B}$ is called injectable with associated resource state $\omega_C$ if there exists a CPWP channel $\Lambda_{AC\to B}$ such that the following equality holds for all input states $\rho_A$:
$$\mathcal{N}_{A\to B}(\rho_A) = \Lambda_{AC\to B}(\rho_A\otimes\omega_C).$$
The notion of a resource-seizable channel was introduced in [7,91], and here we consider the application of this notion in the context of magic resource theory:

Definition 7 (Resource-seizable channel)
Let $\mathcal{N}_{A\to B}$ be an injectable channel with associated resource state $\omega_C$. The channel $\mathcal{N}$ is resource-seizable if there exists a free state $\kappa^{\mathrm{pre}}_{RA}$ with non-negative Wigner function and a post-processing free CPWP channel $\mathcal{F}^{\mathrm{post}}_{RB\to C}$ such that
$$\mathcal{F}^{\mathrm{post}}_{RB\to C}(\mathcal{N}_{A\to B}(\kappa^{\mathrm{pre}}_{RA})) = \omega_C.$$
In the above sense, one seizes the resource state $\omega_C$ by employing free pre- and post-processing of the channel $\mathcal{N}_{A\to B}$.
An interesting and prominent example of an injectable channel that is also resource seizable is the channel $\mathcal{T}$ corresponding to the $T$ gate. This channel has the action $\mathcal{T}(\rho) := T\rho T^\dagger$ on an input state $\rho$. It is injectable with associated resource state $\omega_C = |T\rangle\langle T|$, since one can use the method of circuit injection [94] to obtain the channel $\mathcal{T}$ by acting on $|T\rangle\langle T|$ with stabilizer operations. It is resource seizable because one can act on the free state $|+\rangle\langle +|$ with the channel $\mathcal{T}$ in order to seize the underlying resource state $|T\rangle\langle T| = \mathcal{T}(|+\rangle\langle +|)$.
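The $T$ channel example can be checked numerically for a single qutrit. The sketch below builds the nine phase-space point operators, evaluates the discrete Wigner function, and computes the mana of the state $|T\rangle\langle T|$ and of the channel $\mathcal{T}$. It rests on assumptions that should be compared against the definitions earlier in the paper: $A_u$ is built as a displaced parity operator, $W_\rho(u) = \mathrm{Tr}[A_u\rho]/d$, the state mana is $\log\sum_u |W_\rho(u)|$, the channel mana is $\log\max_u \|\mathcal{N}(A_u)\|_{W,1}$, and logarithms are base 2. All function names are hypothetical.

```python
import numpy as np

D = 3
OMEGA = np.exp(2j * np.pi / D)

def phase_space_operators():
    """Qutrit phase-space point operators A_u = D_u P D_u^dagger (P = parity)."""
    X = np.roll(np.eye(D), 1, axis=0)            # X|j> = |j+1 mod 3>
    Z = np.diag(OMEGA ** np.arange(D))           # Z|j> = omega^j |j>
    parity = np.zeros((D, D))
    for j in range(D):
        parity[(-j) % D, j] = 1.0                # P|j> = |-j mod 3>
    ops = []
    for a in range(D):
        for b in range(D):
            disp = np.linalg.matrix_power(Z, a) @ np.linalg.matrix_power(X, b)
            ops.append(disp @ parity @ disp.conj().T)
    return ops

A_OPS = phase_space_operators()

def wigner(op):
    """Discrete Wigner function W(u) = Tr[A_u op] / d of a Hermitian operator."""
    return np.array([np.trace(a @ op).real / D for a in A_OPS])

def mana_state(rho):
    return float(np.log2(np.abs(wigner(rho)).sum()))

def mana_unitary_channel(u):
    """log2 max_u ||U A_u U^dagger||_{W,1} (assumed form of the channel mana)."""
    return float(np.log2(max(np.abs(wigner(u @ a @ u.conj().T)).sum() for a in A_OPS)))

xi = np.exp(2j * np.pi / 9)
T = np.diag([xi, 1, 1 / xi])
plus = np.ones(D) / np.sqrt(D)
t_state = T @ plus                               # |T> = T|+>
rho_T = np.outer(t_state, t_state.conj())
rho_0 = np.diag([1.0, 0.0, 0.0])                 # stabilizer state |0><0|
thauma_T = np.log2(1 + 2 * np.sin(np.pi / 18))   # value quoted in the text

print("mana of |0><0|   :", round(mana_state(rho_0), 6))
print("mana of |T><T|   :", round(mana_state(rho_T), 6))
print("mana of channel T:", round(mana_unitary_channel(T), 6))
print("log(1+2sin(pi/18)):", round(float(thauma_T), 6))
```

Under these assumptions, the channel mana coincides with the mana of the seized state, and both lie above the quoted thauma value, as the inequality between thauma and mana requires.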
As a generalization of the $T$ channel example above, consider the channel $\Delta_p\circ\mathcal{T}$, where $\Delta_p$ is a dephasing channel of the form
$$\Delta_p(\rho) = p_0\,\rho + p_1\, Z\rho Z^\dagger + p_2\, Z^2\rho (Z^2)^\dagger,$$
where $p = (p_0, p_1, p_2)$, $p_0, p_1, p_2 \ge 0$, and $p_0 + p_1 + p_2 = 1$. The channel is injectable with resource state $\Delta_p(|T\rangle\langle T|)$, because the same method of circuit injection leads to the channel $\Delta_p\circ\mathcal{T}$ when acting on the resource state $\Delta_p(|T\rangle\langle T|)$. Furthermore, the channel $\Delta_p\circ\mathcal{T}$ is resource seizable because one recovers the resource state $\Delta_p(|T\rangle\langle T|)$ by acting with $\Delta_p\circ\mathcal{T}$ on the free state $|+\rangle\langle +|$. For such injectable channels, the resource theory of magic channels simplifies in the following sense:

Proposition 21
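Let $\mathcal{N}$ be an injectable channel with associated resource state $\omega_C$. Then the following inequalities hold
$$\mathcal{M}(\mathcal{N}) \le \mathcal{M}(\omega_C), \qquad \theta(\mathcal{N}) \le \theta(\omega_C), \tag{168}$$
where $\theta$ denotes the generalized thauma measures from Section III C. If $\mathcal{N}$ is also resource seizable, then the following equalities hold
$$\mathcal{M}(\mathcal{N}) = \mathcal{M}(\omega_C), \qquad \theta(\mathcal{N}) = \theta(\omega_C). \tag{169}$$
Proof. We first prove the first inequality in (168). Consider that
$$\begin{aligned}
\mathcal{M}(\mathcal{N}) &= \log\max_u \|\mathcal{N}_{A\to B}(A_u)\|_{W,1} \\
&= \log\max_u \|\Lambda_{AC\to B}(A_u\otimes\omega_C)\|_{W,1} \\
&\le \log\max_u \|A_u\otimes\omega_C\|_{W,1} \\
&= \log\max_u \|A_u\|_{W,1}\,\|\omega_C\|_{W,1} \\
&= \log\|\omega_C\|_{W,1}. \qquad (174)
\end{aligned}$$
The first two equalities follow from definitions. The inequality follows from Lemma 28 in the appendix. The third equality follows because the Wigner trace norm is multiplicative for tensor-product operators. The fourth equality follows because $\|A_u\|_{W,1} = 1$ for any phase-space point operator $A_u$. The final expression is equal to $\mathcal{M}(\omega_C)$ by definition.
We now prove the second inequality in (168):
$$\begin{aligned}
\theta(\mathcal{N}) &= \min_{\mathcal{E}:\,\mathcal{M}(\mathcal{E})\le 0} D(\mathcal{N}\,\|\,\mathcal{E}) \\
&= \min_{\mathcal{E}:\,\mathcal{M}(\mathcal{E})\le 0}\ \sup_{\psi_{RA}} D(\mathcal{N}_{A\to B}(\psi_{RA})\,\|\,\mathcal{E}_{A\to B}(\psi_{RA})) \\
&\le \min_{\sigma_C\in\mathcal{W}}\ \sup_{\psi_{RA}} D(\Lambda_{AC\to B}(\psi_{RA}\otimes\omega_C)\,\|\,\Lambda_{AC\to B}(\psi_{RA}\otimes\sigma_C)) \\
&\le \min_{\sigma_C\in\mathcal{W}}\ \sup_{\psi_{RA}} D(\psi_{RA}\otimes\omega_C\,\|\,\psi_{RA}\otimes\sigma_C) \\
&= \min_{\sigma_C\in\mathcal{W}} D(\omega_C\,\|\,\sigma_C) \\
&= \theta(\omega_C).
\end{aligned}$$
The first two equalities follow from definitions. The first inequality follows because the completely positive map $\mathcal{E} = \Lambda_{AC\to B}(\cdot\otimes\sigma_C)$ with $\sigma_C \in \mathcal{W}$ is a special kind of completely positive map such that $\mathcal{M}(\mathcal{E}) \le 0$, due to the first inequality in (168). The second inequality follows from data processing under the channel $\Lambda_{AC\to B}$. The third equality follows because the generalized divergence is invariant under tensoring its two arguments with the same state $\psi_{RA}$ (again a consequence of data processing [92]). The final equality follows from the definition in (73).
The equalities in (169) are a direct consequence of the definition of a resource-seizable channel, the fact that both the mana and the generalized thauma are monotone under the action of a CPWP superchannel (Theorems 8 and 10, respectively), and with $\mathcal{F}^{\mathrm{post}}_{RB\to C}(\mathcal{N}_{A\to B}(\kappa^{\mathrm{pre}}_{RA}))$ understood as a particular kind of superchannel that manipulates $\mathcal{N}_{A\to B}$ to the state $\omega_C$. Furthermore, it is the case that the channel measures reduce to the state measures when evaluated for preparation channels that take as input a trivial one-dimensional system, for which the only possible "state" is the number one, and output a state on the output system (see Proposition 3 and (99)).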
Applying Proposition 21 to the channel $\mathcal{T}$ and applying some of the results in [88], we find that
$$\theta_{\max}(\mathcal{T}) = \theta(\mathcal{T}) = \theta_{\max}(|T\rangle\langle T|) = \theta(|T\rangle\langle T|) = \log(1 + 2\sin(\pi/18)). \tag{182}$$
The notion of an injectable channel also improves the upper bounds on the distillable magic of a quantum channel:

Proposition 22
Given an injectable quantum channel N with associated resource state ω C , the following upper bound holds for the rate R = k/n of an (n, k, ε) T -magic distillation protocol: where . Consequently, the following upper bound holds for the T -distillable magic of the injectable quantum channel N : Proof. Consider an arbitrary (n, k, ε) T -magic state distillation protocol of the form described previously. Due to the injection property, it follows that such a protocol is equivalent to a CPWP channel acting on the resource state ω ⊗n C (see Figure 5 of [53]). So the channel distillation problem reduces to a state distillation problem. Applying Proposition 4 of [88] and standard inequalities for the hypothesis testing relative entropy from [82], we conclude the bound in (183). Then taking limits, we arrive at (184).
Of particular interest is the study of exact gate synthesis of multi-qudit unitary gates from elements of the Clifford group supplemented by $T$ gates. More generally, a fundamental question is to determine how many instances of a given quantum channel $\mathcal{N}'$ are required to simulate another quantum channel $\mathcal{N}$, when supplemented with CPWP channels. That is, such a channel synthesis protocol has the following form:
$$\mathcal{N} = \mathcal{F}_{n+1}\circ\mathcal{N}'\circ\mathcal{F}_{n}\circ\cdots\circ\mathcal{F}_{2}\circ\mathcal{N}'\circ\mathcal{F}_{1}, \tag{185}$$
where each $\mathcal{F}_i$ is a CPWP channel, as depicted in Figure 3. Let $S_{\mathcal{N}'}(\mathcal{N})$ denote the smallest number of $\mathcal{N}'$ channels required to implement the quantum channel $\mathcal{N}$ exactly. Note that it might not always be possible to have an exact simulation of the channel $\mathcal{N}$ when starting from another channel $\mathcal{N}'$. For example, if $\mathcal{N}$ is a unitary channel and $\mathcal{N}'$ is a noisy depolarizing channel, then this is not possible. In this case, we define $S_{\mathcal{N}'}(\mathcal{N}) = \infty$.
In the following, we establish lower bounds on gate synthesis by employing the channel measures of magic introduced previously.

Proposition 23
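For any qudit quantum channel $\mathcal{N}$, the number of channels $\mathcal{N}'$ required to implement it is bounded from below as follows:
$$S_{\mathcal{N}'}(\mathcal{N}) \ge \max\left\{\frac{\mathcal{M}(\mathcal{N})}{\mathcal{M}(\mathcal{N}')},\ \frac{\theta_{\max}(\mathcal{N})}{\theta_{\max}(\mathcal{N}')}\right\}.$$
If the channel $\mathcal{N}'$ is injectable with associated resource state $\omega_C$, then the following bound holds
$$S_{\mathcal{N}'}(\mathcal{N}) \ge \max\left\{\frac{\mathcal{M}(\mathcal{N})}{\mathcal{M}(\omega_C)},\ \frac{\theta_{\max}(\mathcal{N})}{\theta_{\max}(\omega_C)}\right\}. \tag{187}$$
Proof. Suppose that the simulation of $\mathcal{N}$ is realized as in (185). Applying Proposition 5 iteratively, we find that
$$\mathcal{M}(\mathcal{N}) \le n\,\mathcal{M}(\mathcal{N}') + \sum_{i=1}^{n+1}\mathcal{M}(\mathcal{F}_i) = n\,\mathcal{M}(\mathcal{N}'),$$
where the equality follows from Proposition 6 and the assumption that each $\mathcal{F}_i$ is a CPWP channel. Then $n \ge \mathcal{M}(\mathcal{N})/\mathcal{M}(\mathcal{N}')$. Since this inequality holds for an arbitrary channel synthesis protocol, we find that $S_{\mathcal{N}'}(\mathcal{N}) \ge \mathcal{M}(\mathcal{N})/\mathcal{M}(\mathcal{N}')$. Applying Propositions 17 and 11 in a similar way, we conclude that $S_{\mathcal{N}'}(\mathcal{N}) \ge \theta_{\max}(\mathcal{N})/\theta_{\max}(\mathcal{N}')$. If the channel $\mathcal{N}'$ is injectable, then the upper bounds in (168) apply, from which we conclude (187).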
As a direct application, we investigate gate synthesis of elementary gates. In the following, we prove that four T gates are necessary to synthesize a controlled-controlled-X qutrit gate (CCX gate) exactly.

Proposition 24
To implement a controlled-controlled-X qutrit gate, at least four qutrit T gates are required. Proof. By direct numerical evaluation, we find that which means that four qutrit T gates are necessary to implement a qutrit CCX gate.
For NISQ devices, it is natural to consider gate synthesis under realistic quantum noise. One common noise model in quantum information processing is the depolarizing channel: Suppose that a T gate is not available, but instead only a noisy version D p • T of it is. Then it is reasonable to consider the number of noisy T gates required to implement a low-noise CCX gate, and the resulting lower bound is depicted in Figure 4. Considering the depolarizing noise (p = 0.01) and applying Proposition 23, the lower bound is given by

A. Classical algorithm for simulating noisy quantum circuits
An operational meaning associated with mana is that it quantifies the rate at which a quantum circuit can be simulated on a classical computer. Inspired by [61], we propose an algorithm for simulating quantum circuits in which the operations can potentially be noisy. We show that the complexity of this algorithm scales with the mana (the logarithmic negativity) of quantum channels, establishing mana as a useful measure for measuring the cost of classical simulation of a (noisy) quantum circuit. For recent independent and related work, see [67].
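Let $\mathcal{H}_d^{\otimes n}$ be the Hilbert space of an $n$-qudit system. Consider an evolution that consists of the sequence $\{\mathcal{N}_l\}_{l=1}^{L}$ of channels acting on an input state $\rho$. Then the probability of observing the POVM measurement outcome $E$, where $0 \le E \le I$, can be computed according to the Born rule as
$$\mathrm{Tr}[E(\mathcal{N}_L\circ\cdots\circ\mathcal{N}_1)(\rho)] = \sum_{\vec{u}} W(E|u_L) \prod_{l=1}^{L} W_{\mathcal{N}_l}(u_l|u_{l-1})\, W_\rho(u_0),$$
where $\vec{u} = (u_0, \ldots, u_L)$ represents a vector in the discrete phase space and $W(E|\cdot)$ is the discrete Wigner function of the measurement operator $E$ (cf., Eq. (10)). For the base case $L = 1$, this follows from the properties of the discrete Wigner function:
$$\mathrm{Tr}[E\,\mathcal{N}_1(\rho)] = \sum_{u_1} W(E|u_1)\, W_{\mathcal{N}_1(\rho)}(u_1) = \sum_{u_0, u_1} W(E|u_1)\, W_{\mathcal{N}_1}(u_1|u_0)\, W_\rho(u_0).$$
The case of general $L$ follows by induction.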
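Our goal is to estimate $\mathrm{Tr}[E(\mathcal{N}_L\circ\cdots\circ\mathcal{N}_1)(\rho)]$ with additive error. In what follows, we assume that the input state is $\rho = |0^n\rangle\langle 0^n|$ and the desired outcome is $|0\rangle\langle 0|$. This assumption is without loss of generality, since we can reformulate both the state preparation and the measurement as quantum channels.

To describe the simulation algorithm, we define the negativity of quantum states and channels as
$$\mathcal{M}_\rho := \sum_{u} |W_\rho(u)|, \qquad \mathcal{M}_{\mathcal{N}}(u) := \sum_{u'} |W_{\mathcal{N}}(u'|u)|, \qquad \mathcal{M}_{\mathcal{N}} := \max_u \mathcal{M}_{\mathcal{N}}(u).$$
Then a noisy circuit comprised of the channels $\{\mathcal{N}_l\}_{l=1}^{L}$ can be simulated as follows. We sample the initial phase point $u_0$ according to the distribution $|W_{|0^n\rangle\langle 0^n|}(u_0)|/\mathcal{M}_\rho$ and, for each $l = 1, \ldots, L$, we sample a phase point $u_l$ according to the conditional distribution $|W_{\mathcal{N}_l}(u_l|u_{l-1})|/\mathcal{M}_{\mathcal{N}_l}(u_{l-1})$, after which we output the estimate
$$\mathcal{M}_\rho\,\mathrm{Sign}(W_\rho(u_0)) \prod_{l=1}^{L} \mathcal{M}_{\mathcal{N}_l}(u_{l-1})\,\mathrm{Sign}(W_{\mathcal{N}_l}(u_l|u_{l-1}))\; W(E|u_L).$$
This gives an unbiased estimate of the output probability, since each sampling probability multiplied by the corresponding negativity and sign factor reproduces the quasi-probability $W_\rho(u_0)$ or $W_{\mathcal{N}_l}(u_l|u_{l-1})$, so that the expectation of the estimate is the Born-rule sum over $\vec{u}$. Note that $\mathcal{M}_{|0^n\rangle\langle 0^n|} = 1$ since $|0^n\rangle$ is trivially a stabilizer state. Also, $|W(E|u)| \le 1$ for the stabilizer measurement outcome considered here, and $\mathcal{M}_{\mathcal{N}_l}(u_{l-1}) \le \mathcal{M}_{\mathcal{N}_l}$. Therefore, the estimate that we output has absolute value bounded from above by
$$\mathcal{M}_{\to} := \prod_{l=1}^{L} \mathcal{M}_{\mathcal{N}_l}.$$
By the Hoeffding inequality, it suffices to take
$$\frac{2}{\epsilon^2}\,\mathcal{M}_{\to}^2\,\ln\frac{2}{\delta}$$
samples to estimate the probability of a fixed measurement outcome with accuracy $\epsilon$ and success probability $1 - \delta$.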
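The sampling procedure described above can be sketched for a single qutrit as follows. This is an illustrative toy under stated assumptions, not the authors' implementation: phase-space point operators are built as displaced parity operators, the transition quasi-probabilities are taken as $W_{\mathcal{N}}(v|u) = \mathrm{Tr}[A_v\,\mathcal{N}(A_u)]/d$, the test circuit (qutrit Fourier gate, $T$ gate, inverse Fourier gate) is chosen only for demonstration, and the names `estimate` and `hoeffding_samples` are hypothetical.

```python
import math
import numpy as np

D = 3
OMEGA = np.exp(2j * np.pi / D)

def phase_space_operators():
    """Qutrit phase-space point operators A_u = D_u P D_u^dagger (P = parity)."""
    X = np.roll(np.eye(D), 1, axis=0)
    Z = np.diag(OMEGA ** np.arange(D))
    parity = np.zeros((D, D))
    for j in range(D):
        parity[(-j) % D, j] = 1.0
    return [d @ parity @ d.conj().T
            for a in range(D) for b in range(D)
            for d in [np.linalg.matrix_power(Z, a) @ np.linalg.matrix_power(X, b)]]

A_OPS = phase_space_operators()

def wigner_state(rho):
    return np.array([np.trace(a @ rho).real / D for a in A_OPS])

def wigner_effect(effect):
    return np.array([np.trace(effect @ a).real for a in A_OPS])

def wigner_unitary(u):
    """Transition quasi-probabilities W(v|u) = Tr[A_v U A_u U^dagger]/d, indexed [v, u]."""
    return np.array([[np.trace(av @ u @ au @ u.conj().T).real / D for au in A_OPS]
                     for av in A_OPS])

def estimate(w_rho, transitions, w_effect, n_samples, rng):
    """Monte Carlo estimate of Tr[E N_L ... N_1 (rho)] by sampling phase-space paths."""
    total = 0.0
    for _ in range(n_samples):
        neg = np.abs(w_rho).sum()
        u = rng.choice(len(w_rho), p=np.abs(w_rho) / neg)
        weight = neg * np.sign(w_rho[u])
        for w in transitions:
            col = w[:, u]                       # conditional quasi-distribution W(.|u)
            neg = np.abs(col).sum()
            v = rng.choice(len(col), p=np.abs(col) / neg)
            weight *= neg * np.sign(col[v])
            u = v
        total += weight * w_effect[u]
    return total / n_samples

def hoeffding_samples(total_negativity, eps, delta):
    """Number of samples (2/eps^2) * M^2 * ln(2/delta) from the Hoeffding bound."""
    return math.ceil(2 * total_negativity ** 2 * math.log(2 / delta) / eps ** 2)

xi = np.exp(2j * np.pi / 9)
T = np.diag([xi, 1, 1 / xi])
F = np.array([[OMEGA ** (j * k) for k in range(D)] for j in range(D)]) / np.sqrt(D)
circuit = [F, T, F.conj().T]                    # demonstration circuit (assumption)
rho_in = np.diag([1.0, 0.0, 0.0])
effect = np.diag([1.0, 0.0, 0.0])

w_rho = wigner_state(rho_in)
transitions = [wigner_unitary(u) for u in circuit]
w_eff = wigner_effect(effect)

exact_quasi = w_eff @ transitions[2] @ transitions[1] @ transitions[0] @ w_rho
psi = F.conj().T @ T @ F @ np.array([1.0, 0.0, 0.0])
exact_direct = abs(psi[0]) ** 2
mc = estimate(w_rho, transitions, w_eff, 10000, np.random.default_rng(1))
print(exact_direct, exact_quasi, mc)
```

In this toy circuit only the $T$ gate contributes negativity, so the spread of the estimator, and hence the number of samples suggested by `hoeffding_samples`, is governed by the exponentiated mana of that single gate.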
In the description of the above algorithm, we have used the discrete Wigner representation of quantum states, channels, and measurement operators. However, the algorithm can be generalized using the frame and dual frame representation along the lines of the work [61]. Specifically, for any frame $\{F(\lambda) : \lambda \in \Lambda\}$ and its dual frame $\{G(\lambda) : \lambda \in \Lambda\}$ on a $d$-dimensional Hilbert space, we define the corresponding quasiprobability representation of a state, a channel, and a measurement operator respectively as
$$W_\rho(\lambda) := \mathrm{Tr}[F(\lambda)\rho], \qquad W_{\mathcal{N}}(\lambda'|\lambda) := \mathrm{Tr}[F(\lambda')\,\mathcal{N}(G(\lambda))], \qquad W(E|\lambda) := \mathrm{Tr}[E\,G(\lambda)],$$
and the above discussion carries through without any essential change. For simplicity, we omit the details here. Note that for the discrete Wigner function representation that we used in our paper, the correspondence is $F(\lambda) = A_u/d$ and $G(\lambda) = A_u$.

B. Comparison of classical simulation algorithms for noisy quantum circuits
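Recently, the channel robustness and the magic capacity were introduced to quantify the magic of multi-qubit noisy circuits [69]. To be specific, given a quantum channel $\mathcal{N}$, the channel robustness is defined as
$$R_*(\mathcal{N}) := \min\{2p + 1 : \mathcal{N} = (1+p)\Lambda_+ - p\Lambda_-,\ p \ge 0,\ \Lambda_\pm \in \mathrm{CSPO}\},$$
and the magic capacity is defined as
$$C(\mathcal{N}) := \max_{|\phi\rangle \in \mathrm{STAB}} R[(\mathcal{N}\otimes\mathrm{id})(|\phi\rangle\langle\phi|)].$$
They are related by the inequality [69]
$$R(\Phi_{\mathcal{N}}) \le C(\mathcal{N}) \le R_*(\mathcal{N}), \tag{209}$$
where $R(\cdot)$ is the robustness of magic (cf. Section II) and $\Phi_{\mathcal{N}}$ is the normalized Choi-Jamiołkowski operator of $\mathcal{N}$. The authors of [69] further developed two matching simulation algorithms that scale quadratically with these channel measures. Here, we compare their approach with the one described in Section VI A for simulating noisy qudit circuits. Note that neither the proof of (209) nor the static Monte Carlo algorithm of [69] depends on the dimensionality of the underlying system, so those results can be generalized to any qudit system with odd prime dimension. Thus, we consider an $n$-qudit system with the underlying Hilbert space $\mathcal{H}_d^{\otimes n}$, where $d$ is an odd prime.

Consider a noisy circuit consisting of the sequence $\{\mathcal{N}_l\}_{l=1}^{L}$ of channels acting on the initial state $|0^n\rangle$, after which a computational basis measurement is performed. To describe the simulation algorithm based on channel robustness [69], we assume each $\mathcal{N}_j$ has the optimal decomposition with respect to the set of CSPOs
$$\mathcal{N}_j = (1 + p_j)\,\mathcal{N}_{j,0} - p_j\,\mathcal{N}_{j,1},$$
where $R_*(\mathcal{N}_j) = 2p_j + 1$. For any $\vec{k} \in \mathbb{Z}_2^L$, define
$$p_{\vec{k}} := \prod_{j=1}^{L} (1 + p_j)^{1 - k_j}(-p_j)^{k_j}, \qquad \|p\|_1 := \sum_{\vec{k}} |p_{\vec{k}}| = \prod_{j=1}^{L} R_*(\mathcal{N}_j).$$
We sample a $\vec{k} \in \mathbb{Z}_2^L$ from the distribution $|p_{\vec{k}}|/\|p\|_1$ and simulate the evolution $\mathcal{N}_{L,k_L}\circ\cdots\circ\mathcal{N}_{2,k_2}\circ\mathcal{N}_{1,k_1}$ [95]. To achieve accuracy $\epsilon$ and success probability $1 - \delta$, it suffices to take
$$\frac{2}{\epsilon^2}\,\|p\|_1^2\,\ln\frac{2}{\delta}$$
samples.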
To compare it with the mana-based simulation algorithm, we first prove that the exponentiated mana of a quantum channel is always smaller than or equal to the channel robustness. To establish the separation, we introduce the robustness of magic with respect to non-negative Wigner function as follows.
Since $\mathrm{Stab} \subset \mathcal{W}_+$, we have The inequality in (218) follows due to the triangle inequality. The equality in (219) follows since $\Lambda_\pm \in \mathrm{CSPO}$ and then $\|\Lambda_\pm(A_u)\|_{W,1} = 1$ for any $u$. Furthermore, we demonstrate the strict separation between $2^{\mathcal{M}(\mathcal{N})}$ and $R_*(\mathcal{N})$ via the following example. Let us consider the diagonal unitary
$$U_\theta = \begin{pmatrix} e^{i\theta/9} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & e^{-i\theta/9} \end{pmatrix}.$$
Note that the $T$ gate is a special case, given by $U_{2\pi}$. Due to Eq. (214), the separation between $\mathcal{M}(\mathcal{U}_\theta)$ and $\log R_{\mathcal{W}_+}(\Phi_{\mathcal{U}_\theta})$ in Figure 5 indicates that the mana of a channel can be strictly smaller than the channel robustness and magic capacity, i.e., This concludes the proof.
Applying Proposition 25 to the channels $\{\mathcal{N}_l\}_{l=1}^{L}$, we find that Thus for an $n$-qudit system with odd prime dimension, the sample complexity of the mana-based approach is never worse than the algorithm of [69] based on channel robustness. Furthermore, the separation demonstrated in (222) indicates that the mana-based algorithm can be strictly faster for certain quantum circuits. The above example shows that the mana of a quantum channel can be smaller than its magic capacity [69], due to (222), but it is not clear whether this relation holds for general quantum channels.

Figure 5: Comparison between $\mathcal{M}_{\mathcal{U}_\theta}$ and $R_{\mathcal{W}_+}(\Phi_{\mathcal{U}_\theta})$ for $\pi \le \theta \le 2\pi$. The gap indicates that $\mathcal{M}_{\mathcal{U}_\theta}$ is strictly smaller than $C(\mathcal{U}_\theta)$ and $R_*(\mathcal{U}_\theta)$.

A. Non-stabilizerness under depolarizing noise
For near term quantum technologies, certain physical noise may occur during quantum information processing. One common quantum noise model is given by the depolarizing channel: where p ∈ [0, 1] and X, Z are the generalized Pauli operators. Let us suppose that depolarizing noise occurs after the implementation of a T gate. From Figure 6, we find that if the depolarizing noise parameter p is higher than or equal to 0.62, then the channel D p • T cannot generate any non-stabilizerness. That is, the channel D p • T becomes CPWP after this cutoff.
Another interesting case is the CCX gate. Let us suppose that depolarizing noise occurs in parallel after the implementation of the CCX gate. The mana of D ⊗3 p • CCX is plotted in Figure 7, where we see that it decreases linearly and becomes equal to zero at around p ≈ 0.75.
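The qualitative behaviour for the noisy $T$ gate can be reproduced with a short numerical sketch. The depolarizing convention used here, $\mathcal{D}_p(\rho) = (1-p)\rho + p\,\mathrm{Tr}[\rho]\,I/d$, is an assumption made for illustration and may differ from the parametrization behind Figure 6, so the numerical cutoff it produces need not match the value read from that figure. The channel mana is taken as $\log_2\max_u \|\mathcal{N}(A_u)\|_{W,1}$, and all function names are hypothetical.

```python
import numpy as np

D = 3
OMEGA = np.exp(2j * np.pi / D)

def phase_space_operators():
    """Qutrit phase-space point operators A_u = D_u P D_u^dagger (P = parity)."""
    X = np.roll(np.eye(D), 1, axis=0)
    Z = np.diag(OMEGA ** np.arange(D))
    parity = np.zeros((D, D))
    for j in range(D):
        parity[(-j) % D, j] = 1.0
    return [d @ parity @ d.conj().T
            for a in range(D) for b in range(D)
            for d in [np.linalg.matrix_power(Z, a) @ np.linalg.matrix_power(X, b)]]

A_OPS = phase_space_operators()

def channel_mana(channel):
    """log2 max_u sum_v |W(v|u)| with W(v|u) = Tr[A_v N(A_u)] / d (assumed definition)."""
    w = np.array([[np.trace(av @ channel(au)).real / D for au in A_OPS]
                  for av in A_OPS])
    return float(np.log2(np.abs(w).sum(axis=0).max()))

xi = np.exp(2j * np.pi / 9)
T = np.diag([xi, 1, 1 / xi])

def noisy_t(p):
    """T gate followed by depolarizing noise in the assumed convention
    D_p(rho) = (1 - p) rho + p Tr[rho] I / d."""
    def channel(x):
        y = T @ x @ T.conj().T
        return (1 - p) * y + p * np.trace(y) * np.eye(D) / D
    return channel

ps = np.linspace(0.0, 1.0, 21)
manas = [channel_mana(noisy_t(p)) for p in ps]
for p, m in zip(ps, manas):
    print(f"p = {p:.2f}  mana = {m:.4f}")
```

The mana decreases monotonically with the noise strength and vanishes once every transition quasi-probability becomes non-negative, which is the point at which the noisy gate becomes CPWP.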

B. Werner-Holevo channel
An interesting qutrit channel is the qutrit Werner-Holevo channel [89]:
$$\mathcal{N}_{WH}(V) := \frac{1}{2}\left(\mathrm{Tr}[V]\, I - V^{T}\right).$$
In what follows, we find that the Werner-Holevo channel maps any quantum state to a free state in $\mathcal{W}_+$ (a state with non-negative Wigner function), while its amortized magic is given by our channel measure. This also indicates that the ancillary reference system is necessary to consider in the study of the resource theory of magic channels.

Proposition 26
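For the qutrit Werner-Holevo channel $\mathcal{N}_{WH}$, $\mathcal{M}(\mathcal{N}_{WH}(\rho)) = 0$ for any input state $\rho$ (which is restricted to be a state of the channel input system), while Proof. On the one hand, for any input state $\rho$, we find that
$$W_{\mathcal{N}_{WH}(\rho)}(u) = \frac{1}{3}\mathrm{Tr}[A_u\,\mathcal{N}_{WH}(\rho)] = \frac{1}{6}\left(1 - \mathrm{Tr}[A_u\rho^{T}]\right) \ge \frac{1}{6}\left(1 - \|A_u\|_\infty\right) = 0,$$
where the first inequality follows because $\|A_u\|_\infty = \|A_0\|_\infty = 1$, since $A_u = T_u A_0 T_u^\dagger$, the matrix $T_u$ is unitary, and $A_0$ for a qutrit is explicitly given by the following unitary transformation:
$$A_0 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}.$$
On the other hand, we can set $\rho_{RA}$ to be the maximally entangled state, and we find from numerical calculations that Meanwhile, we find from numerical calculations that $\mathcal{M}(\mathcal{N}_{WH}) = \log(5/3)$, which by Proposition 7 means that Similarly, we find from numerical calculations that This concludes the proof.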
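Both parts of this argument can be checked numerically for the qutrit case. The sketch below assumes the standard form $\mathcal{N}_{WH}(V) = (\mathrm{Tr}[V]\,I - V^{T})/(d-1)$ of the Werner-Holevo channel, phase-space point operators built as displaced parity operators, and the channel negativity $\max_u \|\mathcal{N}(A_u)\|_{W,1}$, whose logarithm is the channel mana. Function names are hypothetical.

```python
import numpy as np

D = 3
OMEGA = np.exp(2j * np.pi / D)

def phase_space_operators():
    """Qutrit phase-space point operators A_u = D_u P D_u^dagger (P = parity)."""
    X = np.roll(np.eye(D), 1, axis=0)
    Z = np.diag(OMEGA ** np.arange(D))
    parity = np.zeros((D, D))
    for j in range(D):
        parity[(-j) % D, j] = 1.0
    return [d @ parity @ d.conj().T
            for a in range(D) for b in range(D)
            for d in [np.linalg.matrix_power(Z, a) @ np.linalg.matrix_power(X, b)]]

A_OPS = phase_space_operators()

def wigner(op):
    """Discrete Wigner function W(u) = Tr[A_u op] / d of a Hermitian operator."""
    return np.array([np.trace(a @ op).real / D for a in A_OPS])

def werner_holevo(x):
    """Assumed form of the Werner-Holevo channel: (Tr[x] I - x^T) / (d - 1)."""
    return (np.trace(x) * np.eye(D) - x.T) / (D - 1)

# Part 1: outputs on random pure input states have non-negative Wigner functions.
rng = np.random.default_rng(0)
min_w = np.inf
for _ in range(200):
    psi = rng.normal(size=D) + 1j * rng.normal(size=D)
    psi /= np.linalg.norm(psi)
    rho = np.outer(psi, psi.conj())
    min_w = min(min_w, wigner(werner_holevo(rho)).min())

# Part 2: the channel negativity max_u ||N(A_u)||_{W,1} is nevertheless larger than 1.
negativity = max(np.abs(wigner(werner_holevo(a))).sum() for a in A_OPS)

print("smallest output Wigner value:", min_w)
print("channel negativity          :", negativity)
print("channel mana (log2)         :", np.log2(negativity))
```

The contrast between the two printed quantities illustrates why a reference system is needed: no input on the channel system alone produces negativity, yet the channel itself is not CPWP.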

VIII. CONCLUSION
We have introduced two efficiently computable magic measures of quantum channels to quantify and characterize the non-stabilizer resource possessed by quantum channels. These two channel measures have application in evaluating magic generating capability, gate synthesis, and classical simulation of noisy quantum circuits. More generally, our work establishes fundamental limitations on the processing of quantum magic using noisy quantum circuits, opening new perspectives for the investigation of the resource theory of quantum channels in fault-tolerant quantum computation.
One future direction is to explore tighter evaluations of the distillable magic of quantum channels. We think that it would also be interesting to explore other applications of our channel measures and generalize our approach to the multi-qubit case.