No-go theorems for quantum resource purification II: new approach and channel theory

It has been recently shown that there exist universal fundamental limits to the accuracy and efficiency of the transformation from noisy resource states to pure ones (e.g., distillation) in any well-behaved quantum resource theory [Fang/Liu, Phys. Rev. Lett. 125, 060405 (2020)]. Here, we develop a novel and powerful method for analyzing the limitations on quantum resource purification, which not only leads to improved bounds that rule out exact purification for a broader range of noisy states and are tight in certain cases, but also enables us to establish a robust no-purification theory for quantum channel (dynamical) resources. More specifically, we employ the new method to derive universal bounds on the error and cost of transforming generic noisy channels (where multiple instances can be used adaptively, in contrast to the state theory) to some unitary resource channel under any free channel-to-channel map. We address several cases of practical interest in more concrete terms, and discuss the connections and applications of our general results to distillation, quantum error correction, quantum Shannon theory, and quantum circuit synthesis.


I. INTRODUCTION
Quantum technologies, such as quantum computing, quantum communication, and quantum cryptography, are an exciting frontier of science, owing to their potential to achieve substantial advantages over conventional methods that may spark an important technological revolution. However, quantum systems are inherently highly susceptible to noise and errors in real-world scenarios, which often make them unreliable or difficult to scale up. This poses a serious challenge to realizing the potential power of quantum technologies in practice. The noise problem is particularly pressing at the moment, as we are now at a critical juncture where we are starting to make a real effort to put the theoretically blueprinted quantum technologies into practice [1,2]. In order to ease the effects of noise, we generally need techniques that can "purify" the noisy systems. To this end, methods such as quantum error correction [3] and distillation [4][5][6][7] have been developed and have become central research topics in quantum information.
Behind the power of quantum technologies is the manipulation and utilization of various forms of quantum "resources" such as entanglement [8], coherence [9], and "magic" [7,10]. These different kinds of quantum resources can be commonly understood and characterized using the universal framework of "quantum resource theory" (see, e.g., Ref. [11] for an introduction), which has been under active development in recent years. Recently, Ref. [12] revealed a fundamental principle of quantum mechanics that there exist universal limitations on the accuracy and efficiency of purifying noisy states in general quantum resource theories, by employing one-shot resource theory ideas [13]. However, Ref. [12] is only part of the story and there are two gaps that we would like to fill to make the picture more complete. First, the results there assume the input states to be full-rank and it is not fully understood whether there are no-purification rules when the input state is noisy but not of full rank. Second, the approach developed there is primarily designed for state or static resources, but given that the manipulation of channel or dynamical resources plays intrinsic roles in many scenarios including quantum computation, communication, and error correction, it is also important to understand whether the no-purification principles extend to quantum channels. In this work, we develop a novel approach to establishing fundamental limits of general quantum resource purification tasks, which addresses the above problems. This approach is built upon decompositions of the input that separate out the free parts. As we demonstrate, such decompositions link the weight of the free parts, a key quantity that we call free component, to the optimal error of purification. We apply this approach to both quantum state and channel resource theories.
* fangkun02@baidu.com † zliu1@perimeterinstitute.ca
For state theories, we use the new method to derive new bounds on the error and efficiency of deterministic purification or distillation tasks, which significantly improve those in Ref. [12]. More specifically, the new results lift the full-rank assumption and imply no-purification principles for a broader range of mixed states. Furthermore, they are quantitatively better, and we use several concrete examples to demonstrate the improvements and show that the new bounds are tight in certain simple cases. Next, as a major contribution of this work, we develop a comprehensive no-purification theory for quantum channels (Ref. [12] presents only a zero-error result). Most importantly, there are two key complications of the channel theory that do not come up in the state theory: (i) There are several different ways to define channel fidelity measures; (ii) Multiple instances of channels can be used or consumed in various presumably inequivalent ways, such as in parallel, sequentially, or adaptively. Using the free component method, we derive bounds on the purification errors and costs for all cases. To provide a more concrete understanding, we discuss the roles and features of common noise channels in different types of channel resource theories, and provide guidelines for applying the no-purification bounds to a broad range of fields of great theoretical and practical interest, including distillation, quantum error correction, Shannon theory, and circuit (gate) synthesis.
arXiv:2010.11822v4 [quant-ph] 10 Mar 2022
We emphasize a particularly remarkable and counterintuitive feature of the no-purification principles, which is that they rule out any noisy-to-pure transformation for noisy input states or channels with free component, where the noisy inputs can be much more "resourceful" in terms of common resource measures or operational tasks than the pure targets. This is in sharp contrast with generic (such as pure-to-pure) transformation tasks where the transformability is naturally determined by the resource content in general. Also notably, our theory is applicable to virtually all well-defined resource theories (not even requiring the standard convexity assumption), highlighting the fundamental nature of the no-purification principles.
The paper is organized as follows. In Sec. II, we apply the free component method to state theories, and in particular discuss the improvements over previous results in Ref. [12]. In Sec. III, we establish the no-purification theory for quantum channels using the free component method. We first present general-form results in Sec. III A, and then elaborate on specific scenarios and applications in Sec. III B. Finally, in Sec. IV we summarize the work and discuss future directions.

II. STATE THEORY
We first consider state resource theories, which are built upon the notions of free states and free operations that represent the allowed transformations among states. Here, we consider the most general resource theory framework with the "minimalist" requirement, namely the golden rule that any free operation must map a free state to another free state, or in other words, cannot create resource (see, e.g., Refs. [11,14,15]). This golden rule defines the largest possible set of operations that encompasses any legitimate set of free operations, and thus the fundamental limits induced by it apply universally to any nontrivial resource theory. Also, for mathematical rigor, we assume that the set of free states $\mathcal{F}$ has the following two reasonable, commonly held properties: (i) The composition of free states should be free, namely if $\rho_1, \rho_2 \in \mathcal{F}$ then $\rho_1 \otimes \rho_2 \in \mathcal{F}$; (ii) $\mathcal{F}$ is closed.
The following quantity that we call free component will play a central role in our theory:

Definition 1 (Free component). The free component of quantum state $\rho$ is defined as

$$\Gamma_\rho := \max\{\gamma : \rho - \gamma\sigma \ge 0,\ \sigma \in \mathcal{F}\}.$$

Equivalently,

$$\Gamma_\rho = \max\{\gamma \in [0,1] : \rho = \gamma\sigma + (1-\gamma)\tau,\ \sigma \in \mathcal{F},\ \tau \in \mathcal{D}\},$$

where $\mathcal{D}$ is the set of all density matrices.

That is, the free component is directly related to the "weight of resource" $W$, which is recently studied in general resource theory contexts [16,17], by $\Gamma_\rho = 1 - W_\rho$. Another equivalent form is $\Gamma_\rho = 2^{-\min_{\sigma\in\mathcal{F}} D_{\max}(\sigma\|\rho)}$, where the max-relative entropy is defined by $D_{\max}(\sigma\|\rho) := \log\min\{t : \sigma \le t\rho\}$ if $\mathrm{supp}(\sigma) \subseteq \mathrm{supp}(\rho)$ and $+\infty$ otherwise [18]. Note that, if $\mathcal{F}$ can be characterized by semidefinite conditions (which is quite common, e.g., in coherence theory $\mathcal{F} = \{\sigma : \sigma \ge 0,\ \operatorname{Tr}\sigma = 1,\ \sigma = \Delta(\sigma)\}$, where $\Delta$ is the dephasing channel erasing the off-diagonal entries), then $\Gamma_\rho$ can be efficiently computed by semidefinite programming (SDP) for given $\rho$. In the resource theory of thermodynamics, for Hamiltonian $H$ and inverse temperature $\beta$ the Gibbs (thermal) state $\sigma := e^{-\beta H}/\operatorname{Tr} e^{-\beta H}$ is the only free state, and we thus have a closed-form formula for the free component, $\Gamma_\rho = 1/\lambda_{\max}(\rho^{-1}\sigma)$ (where $\lambda_{\max}$ denotes the largest eigenvalue), if $\mathrm{supp}(\rho) \supseteq \mathrm{supp}(\sigma)$ and zero otherwise [19, Theorem 2].
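As a small illustration of the thermodynamic closed form $\Gamma_\rho = 1/\lambda_{\max}(\rho^{-1}\sigma)$, the following sketch evaluates it for a single qubit using only the Python standard library. The helper names (`gibbs_qubit`, `free_component_thermal`, and so on) and the choice of Hamiltonian $H = \mathrm{diag}(0, 1)$ at $\beta = 1$ are assumptions made for this example, and the code assumes a full-rank input $\rho$.

```python
import cmath
import math

def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse2(m):
    """Inverse of an invertible 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def lambda_max2(m):
    """Largest eigenvalue of a 2x2 matrix whose spectrum is real."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return max(((tr + disc) / 2).real, ((tr - disc) / 2).real)

def gibbs_qubit(energy_gap, beta):
    """Gibbs state of H = diag(0, energy_gap) at inverse temperature beta."""
    w = math.exp(-beta * energy_gap)
    return [[1 / (1 + w), 0.0], [0.0, w / (1 + w)]]

def free_component_thermal(rho, sigma):
    """Gamma_rho = 1 / lambda_max(rho^{-1} sigma), assuming rho is full rank."""
    return 1.0 / lambda_max2(matmul2(inverse2(rho), sigma))

# Illustrative inputs (hypothetical): unit energy gap, beta = 1.
sigma = gibbs_qubit(energy_gap=1.0, beta=1.0)
maximally_mixed = [[0.5, 0.0], [0.0, 0.5]]
gamma_mixed = free_component_thermal(maximally_mixed, sigma)  # (1 + e^{-1}) / 2
gamma_gibbs = free_component_thermal(sigma, sigma)            # the free state itself
```

For the maximally mixed input, the condition $I/2 \ge \gamma\sigma$ is limited by the largest Gibbs weight, which gives $(1 + e^{-1})/2$; for the Gibbs state itself the free component is one, as expected for a free state.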
It can be easily seen that the free component obeys the desirable monotonicity property that it cannot be reduced by free operations.
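Also, let $f_\psi$ denote the maximum overlap of pure state $\psi = |\psi\rangle\langle\psi|$ with free states, namely,

$$f_\psi := \max_{\sigma\in\mathcal{F}} \operatorname{Tr}\sigma\psi = \max_{\sigma\in\mathcal{F}} \langle\psi|\sigma|\psi\rangle.$$

We now prove an improved deterministic no-purification theorem using a method different from Ref. [12], which directly connects the accuracy of purifying a noisy state with its free component.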
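Theorem 4. Given any state $\rho$ and any pure state $\psi$, there is no free operation that transforms $\rho$ to $\psi$ with error smaller than $\Gamma_\rho(1 - f_\psi)$. That is, it holds for any free operation $\mathcal{N}$ that

$$\operatorname{Tr}\mathcal{N}(\rho)\psi \le 1 - \Gamma_\rho(1 - f_\psi).$$

Proof. By the definition of $\Gamma_\rho$, there exist a free state $\sigma \in \mathcal{F}$ and a state $\tau$ such that $\rho$ can be decomposed as follows:

$$\rho = \Gamma_\rho\,\sigma + (1 - \Gamma_\rho)\,\tau.$$

Let $\mathcal{N}$ be any free operation. By linearity,

$$\mathcal{N}(\rho) = \Gamma_\rho\,\mathcal{N}(\sigma) + (1 - \Gamma_\rho)\,\mathcal{N}(\tau).$$

Then it holds that

$$\operatorname{Tr}\mathcal{N}(\rho)\psi = \Gamma_\rho \operatorname{Tr}\mathcal{N}(\sigma)\psi + (1 - \Gamma_\rho)\operatorname{Tr}\mathcal{N}(\tau)\psi \le \Gamma_\rho f_\psi + (1 - \Gamma_\rho) = 1 - \Gamma_\rho(1 - f_\psi),$$

where the inequality follows from $\operatorname{Tr}\mathcal{N}(\sigma)\psi \le f_\psi$, since $\mathcal{N}(\sigma) \in \mathcal{F}$ by the golden rule, and $\operatorname{Tr}\mathcal{N}(\tau)\psi \le 1$.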
As first noted in Ref. [12], we can translate the upper bounds on transformation accuracy into lower bounds on the "amount" of input resources required to achieve a certain target, in particular, the cost of many-copy distillation procedures, which are widely considered for various purposes in quantum computation and information [4][5][6][7]. The above Theorem 4 induces the following general lower bound on distillation overhead.

Corollary 5. Consider distillation procedures represented by a free operation that transforms $n$ copies of noisy states $\rho$ to a target pure state $\psi$ within error $\varepsilon$. Then $n$ must satisfy:

$$n \ge \log_{1/\Gamma_\rho}\frac{1 - f_\psi}{\varepsilon}. \tag{12}$$

Proof. Suppose the transformation is given by the free operation $\mathcal{N}$. Then it holds from Theorem 4 that

$$\varepsilon \ge \Gamma_{\rho^{\otimes n}}(1 - f_\psi).$$

Note that, due to the super-multiplicativity property from Proposition 3, we get $\Gamma_{\rho^{\otimes n}} \ge (\Gamma_\rho)^n$. This gives

$$\varepsilon \ge (\Gamma_\rho)^n (1 - f_\psi),$$

which is equivalent to the above assertion.
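The overhead bound of Corollary 5 is easy to evaluate numerically. The sketch below is an illustration only: the function name `min_copies_lower_bound` and the sample values are hypothetical. It returns the smallest integer $n$ compatible with $\varepsilon \ge (\Gamma_\rho)^n (1 - f_\psi)$, using a logarithmic estimate followed by a direct check so that floating-point rounding at exact boundaries does not shift the answer.

```python
import math

def min_copies_lower_bound(gamma, f_psi, eps):
    """Smallest integer n satisfying eps >= gamma**n * (1 - f_psi)."""
    if not (0.0 <= gamma <= 1.0 and 0.0 <= f_psi < 1.0 and eps > 0.0):
        raise ValueError("need 0 <= gamma <= 1, 0 <= f_psi < 1, eps > 0")
    ratio = (1.0 - f_psi) / eps
    if ratio <= 1.0 or gamma == 0.0:
        return 0            # the bound is trivial and rules out nothing
    if gamma == 1.0:
        return math.inf     # a free input never reaches the target accuracy
    # Start slightly below the logarithmic estimate, then verify directly.
    n = max(0, math.floor(math.log(ratio) / math.log(1.0 / gamma)) - 1)
    while gamma ** n * (1.0 - f_psi) > eps:
        n += 1
    return n

# Hypothetical numbers: free component 1/2, maximum overlap 1/2, error 1e-3.
print(min_copies_lower_bound(0.5, 0.5, 1e-3))  # prints 9
```

As $\Gamma_\rho \to 1$ the returned value grows without bound, matching the divergence of the overhead as the input approaches the set of free states.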
Our new method essentially replaces the minimum eigenvalue of ρ in the corresponding bounds in Ref. [12] (which we refer to as the min-eigenvalue bounds) by its free component, which represents a significant improvement from both qualitative and quantitative perspectives, as detailed in the following.
First, the range of applicability of the no-purification theorem is significantly extended. The proof using the quantum hypothesis testing relative entropy presented in Ref. [12] applies only to full-rank input states. However, Theorem 4 implies that the no-purification rule actually holds more broadly (see also Ref. [20, Proposition 2]):

Corollary 6. There is no free operation that exactly transforms a state $\rho$ to any pure state $\psi \notin \mathcal{F}$ if $\Gamma_\rho > 0$.

Proof. Since $\mathcal{F}$ is closed by assumption and $\psi \notin \mathcal{F}$, we have $f_\psi < 1$. Then due to Theorem 4, the transformation error $\varepsilon > 0$, indicating that exact transformation is impossible.
It is clear that for any pure resource state $\psi \notin \mathcal{F}$ we have $\Gamma_\psi = 0$, so the no-purification bounds can only be nontrivial for mixed states. Meanwhile, it can be immediately seen (e.g. from (b)) that the $\Gamma > 0$ condition is weaker than the full-rank condition. In fact, it holds as long as the support of $\rho$ contains the support of some free state, which is generically the case for mixed states in common resource theories. Also note that the $\Gamma > 0$ condition does not necessarily hold for all mixed states. For a concrete example, consider the coherence theory defined by an orthonormal basis $\{|0\rangle, |1\rangle, |2\rangle, |3\rangle\}$. Consider the state $\rho = (|\psi_1\rangle\langle\psi_1| + |\psi_2\rangle\langle\psi_2|)/2$ where $|\psi_1\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$, $|\psi_2\rangle = (|2\rangle + |3\rangle)/\sqrt{2}$. It can be verified that $\rho$ is mixed but $\Gamma_\rho = 0$, because any decrease of the diagonal entries will render the matrix non-positive-semidefinite. It would be interesting to further understand and characterize the $\Gamma > 0$ condition in specific theories. Furthermore, note that the derivation and results (also the channel versions below) apply to continuous-variable or infinite-dimensional quantum systems: the relevant quantities, the free component $\Gamma$ and the maximum overlap $f$, can be defined likewise (supremum instead of maximum over $\mathcal{F}$), and the proof steps follow. In particular, $\Gamma > 0$, $f < 1$ still indicate no-purification. An elementary continuous-variable example will be given later.
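The claim that this four-level state has vanishing free component can be checked by hand. The following worked derivation is a sketch added for illustration; the symbols $q_i$ for the diagonal entries of a candidate incoherent state are introduced here and do not appear in the original text.

```latex
% rho in the basis {|0>, |1>, |2>, |3>} and a generic incoherent state sigma:
\rho = \frac{1}{4}
\begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix},
\qquad
\sigma = \mathrm{diag}(q_0, q_1, q_2, q_3), \quad q_i \ge 0, \quad \sum_i q_i = 1 .

% rho - gamma*sigma >= 0 requires each 2x2 diagonal block to be positive semidefinite:
\begin{pmatrix} \tfrac14 - \gamma q_0 & \tfrac14 \\ \tfrac14 & \tfrac14 - \gamma q_1 \end{pmatrix} \ge 0
\;\Longrightarrow\;
\left(\tfrac14 - \gamma q_0\right)\left(\tfrac14 - \gamma q_1\right) \ge \tfrac{1}{16},
\quad 0 \le \tfrac14 - \gamma q_i \le \tfrac14 .

% Both factors are at most 1/4, so the product can reach 1/16 only if
\gamma q_0 = \gamma q_1 = 0, \quad \text{and likewise} \quad \gamma q_2 = \gamma q_3 = 0 .

% Summing over i and using normalization of sigma:
\gamma = \gamma \sum_i q_i = 0
\;\Longrightarrow\; \Gamma_\rho = 0 .
```

The same argument shows why rank-one blocks are the obstruction: each block already saturates the determinant condition, leaving no room to subtract a diagonal part.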
We remark that if we only require the purification transformation to succeed with some probability (the probabilistic setting), the $\Gamma > 0$ condition is not sufficient to rule out purification and it seems that the full-rank condition cannot be alleviated. For example, consider the following state with a flag register $F$:

$$\rho = p\,|0\rangle\langle 0|_F \otimes \psi + (1 - p)\,|1\rangle\langle 1|_F \otimes \tau,$$

where $\psi$ is the target pure state and $\tau$ is a state such that $\Gamma_\tau > 0$. Then we have $\Gamma_\rho > 0$ ($\rho$ is not full-rank), but we can obtain $\psi$ with probability $p$ simply by measuring $F$ (which is conventionally free) and postselecting on 0.

Second, the new results are quantitatively better than the corresponding ones in Ref. [12] for full-rank input states. It is first straightforward to see that $\Gamma_\rho \ge \lambda^{\min}_\rho$, where $\lambda^{\min}_\rho$ denotes the minimum nonzero eigenvalue of $\rho$, because $\rho \ge \lambda^{\min}_\rho \cdot I \ge \lambda^{\min}_\rho \cdot \sigma$ for any state $\sigma$ supported on $\mathrm{supp}(\rho)$, where $I$ denotes the identity matrix on $\mathrm{supp}(\rho)$. In sum, the new free component bounds cover the min-eigenvalue bounds. In particular, when the noisy state $\rho$ is close to the set of free states $\mathcal{F}$, the minimum eigenvalue $\lambda^{\min}_\rho$ could still be small but $\Gamma_\rho$ approaches one. This indicates that the free component bounds potentially exhibit much tighter behaviors in the large-error regime, such as when $\rho$ is close to $\mathcal{F}$. Importantly, the distillation overhead bound of Corollary 5 indicates the key behavior that as $\rho$ approaches $\mathcal{F}$, it holds that $n \to \infty$, i.e. the number of copies needed diverges, because $\Gamma_\rho \to 1$. This cannot be deduced from the min-eigenvalue bounds in Ref. [12].

Now we discuss the application of our general bounds in a few important specific scenarios that are of practical interest in diverse manners, showcasing the versatility of our theory. In particular, it is concretely demonstrated that the free component bounds can strictly outperform the corresponding min-eigenvalue bounds in Ref. [12] and, notably, can be tight in key scenarios.
Example 1 (Magic state distillation). Consider $T$-states $|T\rangle = T|+\rangle = (|0\rangle + e^{i\pi/4}|1\rangle)/\sqrt{2}$ contaminated by depolarizing or dephasing noise, given by

$$\tau = (1 - \zeta)\,|T\rangle\langle T| + \zeta\,\frac{I}{2}, \tag{16}$$

where $\zeta$ is the noise rate, as the input. Note that we are interested in $\zeta \in (0, 1 - 1/\sqrt{2})$ so that $\tau$ is a mixed state outside of the stabilizer hull. On the one hand, it can be directly checked that $\lambda^{\min}_\tau = \zeta/2$. On the other hand, $\Gamma_\tau$ is bounded as follows. Consider the free state

$$\bar{\tau} = \frac{1}{\sqrt{2}}\,|T\rangle\langle T| + \Big(1 - \frac{1}{\sqrt{2}}\Big)\frac{I}{2},$$

which sits at the edge of the stabilizer hull closest to $|T\rangle$ (as depicted in Fig. 1). Then by definition we have

$$\Gamma_\tau \ge \max\{\gamma : \tau - \gamma\bar{\tau} \ge 0\} = \max\left\{\gamma : \begin{pmatrix} \alpha & \beta \\ \beta^* & \alpha \end{pmatrix} \ge 0\right\},$$

with $\alpha = \frac{1}{2}(1 - \gamma)$ and $\beta = \frac{1-\zeta}{2}e^{-i\pi/4} - \gamma\frac{1-i}{4}$. By solving the determinant we obtain that

$$\Gamma_\tau \ge (2 + \sqrt{2})\,\zeta > \zeta/2 = \lambda^{\min}_\tau$$

when $\zeta \in (0, 1 - 1/\sqrt{2})$. This implies $\Gamma_\tau > \lambda^{\min}_\tau$, and thus the previous error bound is outperformed for any pure target state by a constant factor. As a sanity check, the bound indeed approaches 1 as $\zeta \to 1 - 1/\sqrt{2}$, in contrast to the $\lambda^{\min}$ bound. This indeed implies the expected phenomenon that the total distillation overhead blows up as $\tau$ approaches the stabilizer hull. In particular, for the standard task of distilling $T$-states, we thus obtain an improved bound on the average overhead following the proof of Theorem 3 in Ref. [12]:

FIG. 1. The cross section of the Bloch sphere through the center perpendicular to the Z axis. The blue square represents the corresponding cross section of the stabilizer hull (STAB). $\bar{\tau}$ is actually $|T\rangle$ subject to $p = 1 - 1/\sqrt{2}$ depolarizing noise and lies on the edge of STAB. $\tau$ is the noisy input state that lies between $|T\rangle$ and $\bar{\tau}$.

Corollary 8. Consider the following common formulation of the magic state distillation task: given $n$ copies of noisy states $\tau$ (defined in Eq. (16)), output an $m$-qubit state $\sigma$ such that $\operatorname{Tr}\sigma_i T = \langle T|\sigma_i|T\rangle \ge 1 - \varepsilon$, $\forall i = 1, \cdots, m$, where $\sigma_i = \operatorname{Tr}_{\bar{i}}\sigma$ is the $i$-th qubit, by some free (stabilizer-preserving) operation. Then $n$ must satisfy:

$$n \ge \log_{1/[(2+\sqrt{2})\zeta]}\frac{1 - (4 - 2\sqrt{2})^{-m}}{m\varepsilon}.$$

Proof. By applying the union bound, we have

$$\operatorname{Tr}\sigma T^{\otimes m} \ge 1 - \sum_{i=1}^{m}\left(1 - \operatorname{Tr}\sigma_i T\right) \ge 1 - m\varepsilon.$$

Then we use the bound on $\Gamma_\tau$ in Eq. (17) and notice that $f_{T^{\otimes m}} = (4 - 2\sqrt{2})^{-m}$ [13,21-23]. By plugging everything into Eq. (12) we obtain the claimed bound.

Example 2 (Coherence theory). Consider the task of recovering the maximally coherent qubit state $|+\rangle$ from its noisy versions in coherence theory (see Fig. 2).

For the depolarizing noise (here the dephasing noise has an equivalent effect), the noisy state is given by $\rho = (1 - \mu)\,|+\rangle\langle +| + \mu\,I/2$, where $\mu$ is the noise rate. Then $\lambda^{\min}_\rho = \mu/2$, and it can be easily calculated that $\Gamma_\rho = \mu$. That is, the new error bound is twice the min-eigenvalue bound for any pure target state.

For the amplitude damping noise, the free component bounds have a more remarkable advantage. Here the noisy state is given by

$$\rho = \frac{1}{2}\begin{pmatrix} 1 + \nu & \sqrt{1 - \nu} \\ \sqrt{1 - \nu} & 1 - \nu \end{pmatrix},$$

where $\nu$ is the damping rate. We numerically solve $\Gamma_\rho$, and compare it with $\lambda^{\min}_\rho$ in Fig. 2(b) (note that the values plotted are all multiplied by a factor 1/2; see below). Note that, as $\nu$ increases, i.e. $\rho$ is more heavily damped towards the free state $|0\rangle$, the error of purification is expected to grow. As can be seen from Fig. 2(b), as $\nu \to 1$, $\lambda^{\min}_\rho$ vanishes and so do the corresponding bounds, but $\Gamma_\rho$ indeed keeps growing, showcasing an important scenario where only the free component bounds are nontrivial.
Let us explicitly consider $|+\rangle$ as the target state. It is known that the optimal fidelity of transforming $\rho$ to $|+\rangle$ by free operations (maximally incoherent operations, MIO) can be solved by the following SDP [24, Theorem 3]:

$$\max\left\{\operatorname{Tr} G\rho : 0 \le G \le I,\ \Delta(G) = I/2\right\}, \tag{19}$$

where $\Delta$ takes the diagonal part of a given matrix. In Fig. 2, we plot the optimal error obtained by the above SDP as well as the free component and min-eigenvalue lower bounds for comparison. In particular, for depolarizing or dephasing noise, the free component error bound turns out to be tight, i.e. is achievable, for any noise rate.
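For a single qubit, the quantities compared in this example can be evaluated without an SDP solver. The sketch below is not the paper's numerical procedure: it relies on closed forms worked out for this illustration only, under the assumptions that the free states are $\mathrm{diag}(q, 1-q)$, that the amplitude damping channel has Kraus operators $\mathrm{diag}(1, \sqrt{1-\nu})$ and $\sqrt{\nu}\,|0\rangle\langle 1|$, and that the feasible $G$ in the fidelity SDP has diagonal entries $1/2$ and off-diagonal modulus at most $1/2$. All function names are hypothetical.

```python
import math

def noisy_plus(kind, rate):
    """Entries (a, b, c) of rho = [[a, c], [c, b]] for the noisy |+> state."""
    if kind == "depolarizing":       # rho = (1 - mu)|+><+| + mu * I/2
        return 0.5, 0.5, (1.0 - rate) / 2.0
    if kind == "amplitude_damping":  # assumed Kraus form, see the lead-in
        return ((1.0 + rate) / 2.0, (1.0 - rate) / 2.0,
                math.sqrt(1.0 - rate) / 2.0)
    raise ValueError(kind)

def lambda_min(a, b, c):
    """Smallest eigenvalue of the 2x2 Hermitian matrix [[a, c], [c*, b]]."""
    return 0.5 * ((a + b) - math.sqrt((a - b) ** 2 + 4.0 * abs(c) ** 2))

def free_component(a, b, c):
    """max gamma with rho - gamma*diag(q, 1-q) >= 0 (qubit closed form)."""
    c = abs(c)
    lo, hi = min(a, b), max(a, b)
    if lo < 0.0 or c * c > a * b * (1.0 + 1e-9):
        raise ValueError("input is not positive semidefinite")
    if c <= lo:
        return a + b - 2.0 * c   # both residual diagonal entries equal |c|
    return hi - c * c / lo       # the smaller diagonal entry is left untouched

def mio_error(a, b, c):
    """1 - max{Tr G rho : 0 <= G <= I, diag(G) = I/2} for a qubit."""
    return 1.0 - ((a + b) / 2.0 + abs(c))

for kind, rate in [("depolarizing", 0.3), ("amplitude_damping", 0.5)]:
    a, b, c = noisy_plus(kind, rate)
    # The overlap of |+> with incoherent states is 1/2, hence the factor 0.5.
    print(kind, mio_error(a, b, c),
          0.5 * free_component(a, b, c), 0.5 * lambda_min(a, b, c))
```

In the depolarizing case the first two printed numbers coincide, which reproduces the tightness statement above; in the amplitude damping case the free component bound stays strictly above the min-eigenvalue bound.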

Example 3 (Constrained quantum error correction).
Here we demonstrate how the state no-purification bounds can be used to find limits on quantum error correction. In particular, we consider the broadly important situations where the error correction procedures are subject to certain constraints (such as stabilizer or Clifford constraints, symmetries) so that resource theory becomes useful. Notice that the decoding procedures are aimed at recovering all logical states from noisy physical states, indicating connections between the no-purification bounds and the overall recovery accuracy. More specifically, we have the general result ($L$, $S$ denote the logical and physical systems, respectively):

Corollary 9. Suppose the decoding operation is free. Then given encoding operation $\mathcal{E}_{L\to S}$ and noise channel $\mathcal{N}_S$ acting on the physical system $S$, the error of the recovery of pure logical state $\psi_L$ obeys

$$\varepsilon \ge \Gamma_{\mathcal{N}_S \circ \mathcal{E}_{L\to S}(\psi_L)}\,(1 - f_{\psi_L}),$$

based on which we directly obtain bounds on measures of the overall accuracy of the code, such as the worst-case error given by maximization over $\psi_L$, and the average-case error given by a certain (e.g. Haar) average over $\psi_L$.
We further remark on the case of covariant (symmetry-constrained) codes, which play fundamental roles in quantum computing and physics and have drawn considerable recent interest [25][26][27][28][29][30][31]. Suppose we consider some compact continuous symmetry group $G$. Based on Lemma 2 in Ref. [29], it can be seen that when the noise channel $\mathcal{N}_S$ is covariant (which is usually the case), we can construct a covariant decoding operation that achieves the optimal error. That is, we can actually remove the freeness assumption of the decoder to apply the no-purification bounds, leading to the following adapted version:

Corollary 10 (Covariant code). Let $G$ be a compact continuous symmetry group. Let $\mathcal{E}_{L\to S}$ be a $G$-covariant encoding operation. Suppose the noise channel $\mathcal{N}_S$ is $G$-covariant. Then Corollary 9 (where the parameters are defined in terms of the $G$-asymmetry theory) holds for any decoder.
See Sec. III B 2 for related discussions and results in the channel setting.

FIG. 2. Comparisons between the optimal achievable error of a standard purification task and the lower bounds induced by $\Gamma$ (this work) and $\lambda^{\min}$ (Ref. [12]) in coherence theory. The task is to recover the maximally coherent qubit state $|+\rangle$ under typical noise channels: (a) depolarizing and dephasing; (b) amplitude damping. The green and blue dashed lines are respectively the free component and min-eigenvalue lower bounds on the error, and the red line is the minimum error achieved by MIO computed by SDP Eq. (19). In (a) the green dashed line actually overlaps with the red line, indicating that the free component error bound is tight.
Example 4 (Continuous variable). Lastly, we provide an elementary example of the application to continuous-variable theories. Consider continuous-variable nonclassicality, a characteristic resource feature in quantum optics that is closely relevant to, e.g., linear optical quantum computation [32] and metrology [33][34][35]. Here the coherent states of light and their probabilistic mixtures are considered free (classical). The coherent state corresponding to complex amplitude $\alpha \in \mathbb{C}$ takes the form

$$|\alpha\rangle = e^{-|\alpha|^2/2}\sum_{n=0}^{\infty}\frac{\alpha^n}{\sqrt{n!}}\,|n\rangle$$

in the number state (Fock) basis $\{|n\rangle\}$. A prototypical type of nonclassical resource states is the (single-mode) squeezed states $|s_r\rangle$ [36,37] generated by the squeezing operator $S(r) := \exp\left[r\left(\hat{a}^2 - (\hat{a}^\dagger)^2\right)/2\right]$ ($\hat{a}$ and $\hat{a}^\dagger$ are, respectively, the annihilation and creation operators) acting on the vacuum state $|0\rangle$, where $r \ge 0$ is the squeezing parameter. It can be calculated that

$$|\langle\alpha|s_r\rangle|^2 = \frac{1}{\cosh r}\exp\left(-|\alpha|^2 - \tanh r\,\mathrm{Re}\left[(\alpha^*)^2\right]\right),$$

using which we obtain

$$f_{s_r} = \sup_{\alpha\in\mathbb{C}}|\langle\alpha|s_r\rangle|^2 = \frac{1}{\cosh r},$$

where we used $\tanh r < 1$. Then, to showcase an example of a no-purification bound, consider the task of distilling some squeezed state $|s_r\rangle$ from noisy state $\rho$ using free, namely classicality-preserving, operations (which, in particular, include passive linear optical operations) [34,38]. Then Theorem 4 directly implies that the transformation error $\varepsilon \ge \Gamma_\rho\,[1 - (\cosh r)^{-1}]$, from which it can be observed that the task indeed becomes more demanding as the squeezing parameter increases. As in the discrete-variable setting, for specific noise models it is often easy to calculate or bound $\Gamma_\rho$ so that the error bound can be further specified.
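The maximum overlap $(\cosh r)^{-1}$ between a squeezed vacuum state and the coherent states can be cross-checked numerically. The following sketch assumes the standard Fock expansion of the squeezed vacuum for the squeezing operator defined above, in which only even photon numbers contribute; the function names, the truncation level, and the sample grid are choices made for this illustration.

```python
import math

def overlap_closed_form(alpha, r):
    """|<alpha|s_r>|^2 from the Gaussian closed form."""
    t = math.tanh(r)
    re_sq = (alpha.conjugate() ** 2).real
    return math.exp(-abs(alpha) ** 2 - t * re_sq) / math.cosh(r)

def overlap_fock_sum(alpha, r, n_max=80):
    """|<alpha|s_r>|^2 from a truncated Fock expansion over |0>, |2>, |4>, ..."""
    t = math.tanh(r)
    s_amp = 1.0 / math.sqrt(math.cosh(r))              # <0|s_r>
    a_amp = complex(math.exp(-abs(alpha) ** 2 / 2.0))  # <alpha|0>
    total = a_amp * s_amp
    for n in range(1, n_max + 1):
        # Ratios of consecutive even-number amplitudes of both states.
        s_amp *= -t * math.sqrt((2 * n - 1) / (2 * n))
        a_amp *= alpha.conjugate() ** 2 / math.sqrt((2 * n) * (2 * n - 1))
        total += a_amp * s_amp
    return abs(total) ** 2

r = 0.8  # hypothetical squeezing parameter
grid = [complex(x / 10, y / 10) for x in range(-20, 21) for y in range(-20, 21)]
f_numeric = max(overlap_closed_form(a, r) for a in grid)  # attained at alpha = 0
f_expected = 1.0 / math.cosh(r)
```

Since the exponent in the closed form is at most $-|\alpha|^2(1 - \tanh r) \le 0$, the maximum over the grid is attained at $\alpha = 0$, and the resulting error bound factor $1 - f_{s_r}$ increases with $r$.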

III. CHANNEL THEORY
We now extend the no-purification theory to quantum channels or dynamical settings. The channel analog of purification is to transform a noisy channel (or noisy channels, as will be discussed) to a unitary (noiseless) channel, or equivalently, to simulate the unitary channel by the noisy ones. The free component approach directly enables us to study these problems in the channel resource theory setting where the resource objects are quantum channels instead of states (note that it is not clear how to fully extend the hypothesis testing approach in Ref. [12] to channels). It is worth noting again that the structure of channel theories is much richer than that of state theories, since multiple instances of channels can be used in different ways, such as in parallel, sequentially, or adaptively. Here, we first present error bounds in the most general forms, and then specifically investigate the adaptive or sequential simulation setting, which represents a fundamental difference from state theories. To demonstrate the practical relevance of the general no-go rules and bounds, we discuss them in more specific contexts, and, in particular, outline the applications to quantum error correction, gate and circuit synthesis, and channel capacities.
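Note that we often specify the input and output systems of channels in the subscripts (a channel $\mathcal{N}$ from system $A$ to system $B$ is denoted as $\mathcal{N}_{A\to B}$, and if the input and output systems are the same system $A$ it is simply denoted as $\mathcal{N}_A$), but when there is no ambiguity we shall omit the labels. Given linear maps $\mathcal{N}$, $\mathcal{M}$, the order $\mathcal{N} - \mathcal{M} \ge 0$ means $\mathcal{N} - \mathcal{M}$ is a completely positive map. To simplify the notation, given some input state $\rho$ on $A$ and reference system $R$, we will also denote the output state of the channel $\mathcal{N}_{A\to B}$ acting on $A$ by

$$\rho_{\mathcal{N}} := (\mathcal{N}_{A\to B} \otimes \mathrm{id}_R)(\rho_{AR}).$$

In particular, the Choi state of $\mathcal{N}$ is given by

$$\Phi_{\mathcal{N}} := (\mathcal{N}_{A\to B} \otimes \mathrm{id}_R)(\Phi_{AR}),$$

where $|\Phi\rangle_{AR} = \sum_j |j\rangle_A |j\rangle_R/\sqrt{d}$ is the maximally entangled state between $A$ and reference system $R$ of the same dimension $d$.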
A. General theory and results

Setups and basic error bounds
For channel resource theories, the building blocks analogous to free states and free operations are free channels and free superchannels, where superchannels map channels to channels. Like the state case, we consider the most general framework where the free superchannels are required only to obey the golden rule that any free superchannel must map a free channel to another free channel. Note again that this golden rule gives rise to the largest possible set of superchannels that encompasses any legitimate set of free superchannels, so that the fundamental limits induced by it apply universally. We also assume the following two commonly held properties of the set of free channels (which we still denote by $\mathcal{F}$): (i) The composition of free channels should be free, where for channels there are two fundamental types of composition, namely parallel composition (represented by tensor product $\otimes$) and sequential composition (represented by $\circ$); that is, if $\mathcal{N}_1, \mathcal{N}_2 \in \mathcal{F}$, then both $\mathcal{N}_1 \otimes \mathcal{N}_2 \in \mathcal{F}$ and $\mathcal{N}_2 \circ \mathcal{N}_1 \in \mathcal{F}$ hold; (ii) $\mathcal{F}$ is closed. We refer readers to, e.g., Refs. [39,40] for more comprehensive discussions of the general framework of channel resource theories.
We now define the channel version of free component as follows.
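Definition 11 (Channel free component). The free component of quantum channel $\mathcal{N}$ is defined as
$$\Gamma_{\mathcal{N}} := \max\{\gamma : \mathcal{N} \ge \gamma \mathcal{M},\ \mathcal{M} \in \mathcal{F}\}.$$
Equivalently,
$$\Gamma_{\mathcal{N}} = \max\{\gamma : \mathcal{N} = \gamma \mathcal{M} + (1-\gamma)\mathcal{R},\ \mathcal{M} \in \mathcal{F},\ \mathcal{R} \in \mathcal{C}\},$$
where $\mathcal{C}$ is the set of all completely positive and trace-preserving maps (quantum channels). Since $\mathcal{N} \ge \gamma \mathcal{M}$ is equivalent to $\Phi_{\mathcal{N}} \ge \gamma \Phi_{\mathcal{M}}$, we also have the relation
$$\Gamma_{\mathcal{N}} = \Gamma_{\Phi_{\mathcal{N}}},$$
where on the RHS, $\Phi_{\mathcal{N}}$ is the Choi state of $\mathcal{N}$ and the free component $\Gamma$ is defined with respect to the set of free states consisting of the Choi states of all free channels.

Similar to the state case, as long as $\mathcal{F}$ can be characterized by semidefinite conditions, the channel free component $\Gamma_{\mathcal{N}}$ can be efficiently computed by SDP. The channel free component also exhibits monotonicity and super-multiplicity properties.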
Proposition 12 (Monotonicity). For any quantum channel $\mathcal{N}$ and free superchannel $\Pi$, it holds that
$$\Gamma_{\Pi(\mathcal{N})} \ge \Gamma_{\mathcal{N}}.$$

For channels, we need to consider sequential composition in addition to parallel composition represented by tensor product. The channel free component is supermultiplicative under both types of composition.
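Proposition 13 (Super-multiplicity). For any quantum channels $\mathcal{N}_1, \mathcal{N}_2$, it holds that
$$\Gamma_{\mathcal{N}_1 \otimes \mathcal{N}_2} \ge \Gamma_{\mathcal{N}_1} \Gamma_{\mathcal{N}_2}, \qquad \Gamma_{\mathcal{N}_2 \circ \mathcal{N}_1} \ge \Gamma_{\mathcal{N}_1} \Gamma_{\mathcal{N}_2}.$$
Proof. Suppose that the maximizations in $\Gamma_{\mathcal{N}_1}$, $\Gamma_{\mathcal{N}_2}$ are, respectively, achieved by $\mathcal{M}_1, \mathcal{M}_2 \in \mathcal{F}$, that is,
$$\mathcal{N}_1 \ge \Gamma_{\mathcal{N}_1} \mathcal{M}_1, \qquad \mathcal{N}_2 \ge \Gamma_{\mathcal{N}_2} \mathcal{M}_2.$$
Since tensor products and compositions of completely positive maps preserve this ordering, we have $\mathcal{N}_1 \otimes \mathcal{N}_2 \ge \Gamma_{\mathcal{N}_1} \Gamma_{\mathcal{N}_2}\, \mathcal{M}_1 \otimes \mathcal{M}_2$ and $\mathcal{N}_2 \circ \mathcal{N}_1 \ge \Gamma_{\mathcal{N}_1} \Gamma_{\mathcal{N}_2}\, \mathcal{M}_2 \circ \mathcal{M}_1$, where $\mathcal{M}_1 \otimes \mathcal{M}_2$ and $\mathcal{M}_2 \circ \mathcal{M}_1$ are free by assumption (i). The claims then follow from the definition of the free component.

Here we are interested in the channel simulation task of transforming a given quantum channel $\mathcal{N}$ to a target unitary channel $\mathcal{U}$ via some superchannel, up to some error that is measured by certain choices of channel distances. Let $F(\rho, \sigma) = \|\sqrt{\rho}\sqrt{\sigma}\|_1^2$ be the Uhlmann fidelity between general states $\rho$ and $\sigma$. Consider the following three typical versions of channel fidelity that are commonly used.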
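• Worst-case (entanglement) fidelity: $F_W(\mathcal{N}, \mathcal{M}) := \inf_{\rho_{AR}} F(\rho_{\mathcal{N}}, \rho_{\mathcal{M}})$, where $\rho_{\mathcal{N}}, \rho_{\mathcal{M}}$ are, respectively, the channel output states of $\mathcal{N}, \mathcal{M}$ defined in Eq. (28), and the optimization includes system $R$. Note that it is equivalent to optimize over pure input states due to the joint concavity of fidelity $F$ [41].
• Choi fidelity: $F_C(\mathcal{N}, \mathcal{M}) := F(\Phi_{\mathcal{N}}, \Phi_{\mathcal{M}})$, where $\Phi_{\mathcal{N}}, \Phi_{\mathcal{M}}$ are, respectively, the Choi states of $\mathcal{N}, \mathcal{M}$.
• Average-case fidelity [42]: $F_A(\mathcal{N}, \mathcal{M}) := \int d\psi\, F(\mathcal{N}(\psi), \mathcal{M}(\psi))$, where the integral is over the Haar measure on the input state space.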
The corresponding versions of infidelity are then
$$\varepsilon_x(\mathcal{N}, \mathcal{M}) := 1 - F_x(\mathcal{N}, \mathcal{M}), \quad x \in \{W, C, A\}.$$
Also, a standard measure of distance between channels is given by the diamond norm distance:
$$\varepsilon_\diamond(\mathcal{N}, \mathcal{M}) := \tfrac{1}{2} \|\mathcal{N} - \mathcal{M}\|_\diamond,$$
where $\|\mathcal{N}\|_\diamond := \sup_{\rho_{AR}} \|(\mathcal{N}_{A\to B} \otimes \mathrm{id}_R)(\rho_{AR})\|_1$. Again, it is equivalent to optimize over pure input states due to the convexity of the trace norm $\|\cdot\|_1$. All the above channel distance measures are symmetric in their arguments. Note that these channel distance measures are commonly used in different scenarios [42]. For example, the worst-case entanglement fidelity and the diamond norm error are commonly used in quantum computation scenarios like circuit synthesis (see, e.g., Refs. [43,44], Sec. III B 4) and approximate quantum error correction (see, e.g., Ref. [45], Sec. III B 2); the Choi fidelity is used in quantum Shannon theory to evaluate the performance of quantum communication (see, e.g., Refs. [46,47], Sec. III B 3); the average-case fidelity is easier to estimate in experiments (see, e.g., Refs. [48][49][50][51]).
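In this work, we are mostly interested in the case where one argument is a unitary channel $\mathcal{U}$. Note that for pure state $\psi$, we have the inequality [3]
$$1 - F(\rho, \psi) \le \tfrac{1}{2} \|\rho - \psi\|_1.$$
Applying the above result to channels, we can conclude
$$\varepsilon_W(\mathcal{N}, \mathcal{U}) \le \varepsilon_\diamond(\mathcal{N}, \mathcal{U}).$$
Also, it is known [42,52] that the average-case fidelity and the Choi fidelity have the following direct relation:
$$F_A(\mathcal{N}, \mathcal{U}) = \frac{d\, F_C(\mathcal{N}, \mathcal{U}) + 1}{d + 1},$$
and thus
$$\varepsilon_A(\mathcal{N}, \mathcal{U}) = \frac{d}{d+1}\, \varepsilon_C(\mathcal{N}, \mathcal{U}),$$
where $d$ is the dimension of the input system. Furthermore, it is clear from definition that
$$F_W(\mathcal{N}, \mathcal{M}) \le F_C(\mathcal{N}, \mathcal{M})$$
for any channels $\mathcal{N}, \mathcal{M}$. To summarize, for the case of comparing with unitary channel $\mathcal{U}$, which is of interest in this work, the four channel distance measures are ordered as follows:
$$\varepsilon_A(\mathcal{N}, \mathcal{U}) \le \varepsilon_C(\mathcal{N}, \mathcal{U}) \le \varepsilon_W(\mathcal{N}, \mathcal{U}) \le \varepsilon_\diamond(\mathcal{N}, \mathcal{U}).$$

We are interested in the task of using channel $\mathcal{N}$ to simulate unitary target channel $\mathcal{U}$ via transformation superchannel $\Pi$. The (different versions of) simulation error is simply given by
$$\varepsilon_x(\mathcal{N} \xrightarrow{\Pi} \mathcal{U}) := \varepsilon_x(\Pi(\mathcal{N}), \mathcal{U}), \quad x \in \{\diamond, W, C, A\}.$$
Also define corresponding versions of the maximum overlap of channel $\mathcal{N}$ with free channels as
$$f^x_{\mathcal{N}} := \max_{\mathcal{M} \in \mathcal{F}} F_x(\mathcal{N}, \mathcal{M}), \quad x \in \{W, C, A\}.$$
Note the following simple fact.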
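As a quick sanity check of how the Choi and average-case measures compare, consider a worked example that is not part of the original text: the qubit depolarizing channel $\mathcal{N}_p(\rho) = (1-p)\rho + p\,I/2$ compared with the identity channel ($d = 2$). The computation below assumes the standard relation $F_A = (d F_C + 1)/(d+1)$ between the two fidelities and the normalized Choi state convention used in this section.

```latex
\begin{align}
F_C(\mathcal{N}_p, \mathrm{id})
  &= \langle \Phi | \big[(1-p)\,\Phi + p\, I/4 \big] | \Phi \rangle
   = 1 - \tfrac{3}{4}p, \\
F_A(\mathcal{N}_p, \mathrm{id})
  &= \int d\psi\, \langle \psi | \big[(1-p)\,\psi + p\, I/2 \big] | \psi \rangle
   = 1 - \tfrac{1}{2}p, \\
\frac{d\,F_C + 1}{d+1}
  &= \frac{2\left(1 - \tfrac{3}{4}p\right) + 1}{3}
   = 1 - \tfrac{1}{2}p = F_A .
\end{align}
```

Hence $\varepsilon_A = p/2 = \tfrac{2}{3}\,\varepsilon_C$ with $\varepsilon_C = 3p/4$, in line with the factor $d/(d+1)$.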
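Proposition 14 (Faithfulness). For any quantum channels $\mathcal{N}$ and $\mathcal{M}$, for $x \in \{W, C, A\}$,
$$F_x(\mathcal{N}, \mathcal{M}) = 1 \iff \mathcal{N} = \mathcal{M},$$
and as a consequence,
$$f^x_{\mathcal{N}} = 1 \iff \mathcal{N} \in \mathcal{F}.$$
Proof. The first equivalence follows from the fact of state fidelity that $F(\rho, \sigma) = 1$ if and only if $\rho = \sigma$. The second equivalence follows since $\mathcal{F}$ is closed by assumption.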
We now present error bounds for these channel error measures. For the Choi and average-case fidelities, note the following linearity property.
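Lemma 15 (Linearity). Let $x \in \{C, A\}$ and $\mathcal{U}$ be a unitary channel. Then $F_x(\mathcal{N}, \mathcal{U})$ is linear in $\mathcal{N}$. That is, given $\mathcal{N} = p\mathcal{N}_1 + (1-p)\mathcal{N}_2$ for $p \in [0, 1]$ and quantum channels $\mathcal{N}_1, \mathcal{N}_2$, it holds that
$$F_x(\mathcal{N}, \mathcal{U}) = p\, F_x(\mathcal{N}_1, \mathcal{U}) + (1-p)\, F_x(\mathcal{N}_2, \mathcal{U}).$$
Proof. Consider the Choi fidelity first. We have
$$F_C(\mathcal{N}, \mathcal{U}) = F(\Phi_{\mathcal{N}}, \Phi_{\mathcal{U}}) = \mathrm{Tr}\, \Phi_{\mathcal{N}} \Phi_{\mathcal{U}} = p\, \mathrm{Tr}\, \Phi_{\mathcal{N}_1} \Phi_{\mathcal{U}} + (1-p)\, \mathrm{Tr}\, \Phi_{\mathcal{N}_2} \Phi_{\mathcal{U}} = p\, F_C(\mathcal{N}_1, \mathcal{U}) + (1-p)\, F_C(\mathcal{N}_2, \mathcal{U}),$$
where the second equality follows since the Choi state $\Phi_{\mathcal{U}}$ is a pure state, and the third equality follows from the linearity of the trace function. Then due to Eq. (43), we conclude that $F_A$ has the same linearity property.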
Collectively, our best bounds are the following.
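Theorem 16. Given any quantum channel $\mathcal{N}$ and any unitary target channel $\mathcal{U}$, it holds for any free superchannel $\Pi$ that
$$\varepsilon_\diamond(\mathcal{N} \xrightarrow{\Pi} \mathcal{U}) \ge \varepsilon_W(\mathcal{N} \xrightarrow{\Pi} \mathcal{U}) \ge \varepsilon_C(\mathcal{N} \xrightarrow{\Pi} \mathcal{U}) \ge \Gamma_{\mathcal{N}} (1 - f^C_{\mathcal{U}}),$$
and
$$\varepsilon_A(\mathcal{N} \xrightarrow{\Pi} \mathcal{U}) \ge \frac{d}{d+1}\, \Gamma_{\mathcal{N}} (1 - f^C_{\mathcal{U}}),$$
where $d$ is the dimension of the input system of $\mathcal{U}$.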
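Proof. The proof is analogous to that of Theorem 4. By the definition of $\Gamma_{\mathcal{N}}$, there exist a free channel $\mathcal{M} \in \mathcal{F}$ and a channel $\mathcal{R}$ such that $\mathcal{N}$ can be decomposed as follows:
$$\mathcal{N} = \Gamma_{\mathcal{N}} \mathcal{M} + (1 - \Gamma_{\mathcal{N}}) \mathcal{R}.$$
Let $\Pi$ be any free superchannel. By the linearity of superchannels,
$$\Pi(\mathcal{N}) = \Gamma_{\mathcal{N}} \Pi(\mathcal{M}) + (1 - \Gamma_{\mathcal{N}}) \Pi(\mathcal{R}).$$
Then for $x \in \{C, A\}$, it holds that
$$\begin{aligned} \varepsilon_x(\mathcal{N} \xrightarrow{\Pi} \mathcal{U}) &= 1 - F_x(\Pi(\mathcal{N}), \mathcal{U}) \\ &= 1 - F_x\big(\Gamma_{\mathcal{N}} \Pi(\mathcal{M}) + (1 - \Gamma_{\mathcal{N}}) \Pi(\mathcal{R}), \mathcal{U}\big) \\ &= 1 - \Gamma_{\mathcal{N}} F_x(\Pi(\mathcal{M}), \mathcal{U}) - (1 - \Gamma_{\mathcal{N}}) F_x(\Pi(\mathcal{R}), \mathcal{U}) \\ &\ge \Gamma_{\mathcal{N}} (1 - f^x_{\mathcal{U}}), \end{aligned}$$
where the third line follows from the linearity property Lemma 15, and the inequality follows from the fact that $F_x(\Pi(\mathcal{R}), \mathcal{U}) \le 1$ and $F_x(\Pi(\mathcal{M}), \mathcal{U}) \le f^x_{\mathcal{U}}$ since $\Pi(\mathcal{M}) \in \mathcal{F}$. The stated bounds then follow from the relation between $F_A$ and $F_C$, which gives $1 - f^A_{\mathcal{U}} = \frac{d}{d+1}(1 - f^C_{\mathcal{U}})$, and from the ordering of the error measures.

Note that the best bounds we can get for all error measures are in terms of the Choi overlap $f^C_{\mathcal{U}}$. A natural question is whether one can directly use $f^W_{\mathcal{U}}$ in the bound for $\varepsilon_W$, which would improve the bound. The problem is that we do not have a linearity property analogous to Lemma 15 for the worst-case fidelity $F_W$, so the third line does not go through.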
As long as the target channel $\mathcal{U} \notin \mathcal{F}$, it is clear by definition that $f^C_{\mathcal{U}} < 1$. That is, for any channel $\mathcal{N}$ satisfying the $\Gamma_{\mathcal{N}} > 0$ condition and any resource unitary channel, all the above error bounds are nontrivial and thus imply a nonzero error.

Multiple channel uses and adaptive channel simulation
Now we discuss the scenario where one takes multiple noisy channels as inputs and intends to simulate some unitary channel, which is analogous to the standard task of distilling high-quality resources from many noisy resources in the state setting. However, the multiple instance setting represents a very important difference between channels and states. The composition of multiple states has a simple parallel structure represented by tensor products. In contrast, multiple channels can be used sequentially and adaptively, which is not simply described by tensor products and may be more powerful than the parallel scheme. Whether the adaptive scheme can outperform the parallel one is a crucial problem in many research areas about quantum channels, such as channel simulation, discrimination, and estimation (see, e.g., Refs. [53][54][55][56][57][58][59][60][61][62][63][64][65][66][67]).
First, note that the parallel use of multiple channels $\mathcal{N}_1, \cdots, \mathcal{N}_n$ is again represented by tensor product and thus can be simply regarded as a single channel $\mathcal{N} = \bigotimes_{i=1}^n \mathcal{N}_i$. Therefore, the results above can be directly applied. In addition to error bounds, using the super-multiplicity property (Proposition 13), we directly bound the cost or overhead of unitary channel simulation, defined by the number of instances of a certain channel needed to simulate some unitary channel, using parallel strategies.

FIG. 3. Quantum comb. Given input channels $\mathcal{N}_1, \cdots, \mathcal{N}_n$, the general map that outputs a channel can be represented by a quantum comb (gray area) realized by channels $\mathcal{P}_1, \cdots, \mathcal{P}_{n+1}$, and the input channels are used by inserting them into the slots.
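Corollary 17 (Parallel simulation cost). Suppose some free superchannel $\Pi$ transforms $n$ instances of noisy channels $\mathcal{N}$ to target unitary channel $\mathcal{U}$ with a certain type of error $\varepsilon_x(\mathcal{N}^{\otimes n} \xrightarrow{\Pi} \mathcal{U}) \le \epsilon_x$, $x \in \{\diamond, W, C, A\}$. Then $n$ must satisfy
$$n \ge \log_{1/\Gamma_{\mathcal{N}}} \frac{1 - f^C_{\mathcal{U}}}{\epsilon_x}$$
for any $x \in \{\diamond, W, C\}$. The bound on $n$ in terms of the average-case error $\varepsilon_A$ is equivalent to that in terms of the Choi error $\varepsilon_C$.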
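To get a feel for the overhead implied by this kind of cost bound, the following sketch evaluates the smallest integer $n$ compatible with an error bound of the form $\epsilon \ge \Gamma_{\mathcal{N}}^n (1 - f^C_{\mathcal{U}})$, which is what Theorem 16 combined with super-multiplicity suggests. The function name and the numerical values are hypothetical; any lower bound on $\Gamma_{\mathcal{N}}$ (such as a noise rate) can be passed in place of the exact free component.

```python
import math


def min_channel_uses(gamma: float, f_choi: float, eps: float) -> int:
    """Smallest integer n compatible with eps >= gamma**n * (1 - f_choi)."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if not 0.0 <= f_choi < 1.0:
        raise ValueError("f_choi must lie in [0, 1)")
    if eps <= 0.0:
        raise ValueError("target error eps must be positive")
    gap = 1.0 - f_choi
    if eps >= gap:
        return 0  # the bound places no constraint on n
    return math.ceil(math.log(gap / eps) / math.log(1.0 / gamma))


if __name__ == "__main__":
    # Hypothetical values: free component 0.5, Choi overlap 0.5 of the target.
    for eps in (1e-1, 1e-2, 1e-4, 1e-8):
        print(f"eps = {eps:.0e}: n >= {min_channel_uses(0.5, 0.5, eps)}")
```

The output grows linearly in $\log(1/\epsilon)$, which is the logarithmic scaling of the cost in the inverse target error.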
Now we consider the adaptive scheme, which represents a more general way to use multiple input channels to simulate an output channel. Here, the action on input channels $\mathcal{N}_1, \cdots, \mathcal{N}_n$ is represented by a "quantum comb" [68,69] $\Pi$ with appropriate dimensions realized by channels $\mathcal{P}_1, \cdots, \mathcal{P}_{n+1}$, and the input channels are inserted into the slots (as depicted in Fig. 3). In resource theory contexts, there is again a golden rule on the combs that a free comb must map free channels to a free channel, that is, if one inserts free channels $\mathcal{N}_1, \cdots, \mathcal{N}_n \in \mathcal{F}$ in the slots of comb $\Pi_n$ then the overall channel $\Pi_n(\mathcal{N}_{[n]}) \in \mathcal{F}$ (where $\mathcal{N}_{[n]}$ is short for the channel collection $[\mathcal{N}_1, \cdots, \mathcal{N}_n]$). Note that, in the case where the comb is realized by free channels $\mathcal{P}_1, \cdots, \mathcal{P}_{n+1} \in \mathcal{F}$ (and the identities on the ancilla systems are considered free), it obviously obeys the golden rule, because axiomatically the composition of free channels is free. However, the converse is not necessarily true, that is, the notion of free combs is more general than free realization.
Note that the channel free component obeys the following monotonicity property under free combs.
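Proposition 18 (Monotonicity). Given any channels $\mathcal{N}_1, \cdots, \mathcal{N}_n$ (collectively denoted by $\mathcal{N}_{[n]}$), it holds that, for any free comb $\Pi_n$ acting on $\mathcal{N}_{[n]}$,
$$\Gamma_{\Pi_n(\mathcal{N}_{[n]})} \ge \prod_{i=1}^n \Gamma_{\mathcal{N}_i}.$$
Proof. Suppose the quantum comb $\Pi_n$ is realized by channels $\mathcal{P}_i$ with $i = 1, \cdots, n+1$, as depicted in Fig. 3. We emphasize that the $\mathcal{P}_i$ are not necessarily free channels themselves; the only requirement here is that the whole comb obeys the golden rule, i.e., $\Pi_n(\mathcal{M}_{[n]}) \in \mathcal{F}$ as long as $\mathcal{M}_i \in \mathcal{F}$ for all $i$. Let $\mathcal{M}_i \in \mathcal{F}$ achieve the maximization in $\Gamma_{\mathcal{N}_i}$. Then
$$\Pi_n(\mathcal{N}_{[n]}) \ge \Big(\prod_{i=1}^n \Gamma_{\mathcal{N}_i}\Big)\, \Pi_n(\mathcal{M}_{[n]}),$$
where the inequality follows from the fact that channel tensorizations and compositions preserve the channel ordering, together with $\mathcal{N}_i \ge \Gamma_{\mathcal{N}_i} \mathcal{M}_i$ by definition. Since $\Pi_n(\mathcal{M}_{[n]}) \in \mathcal{F}$, the claim follows.

Now for input channels $\mathcal{N}_{[n]} = [\mathcal{N}_1, \cdots, \mathcal{N}_n]$, comb $\Pi_n$ and unitary target channel $\mathcal{U}$, the simulation error is defined as
$$\varepsilon_x(\mathcal{N}_{[n]} \xrightarrow{\Pi_n} \mathcal{U}) := \varepsilon_x(\Pi_n(\mathcal{N}_{[n]}), \mathcal{U})$$
for $x \in \{\diamond, W, C, A\}$. By a little tweak of the proofs above, we establish bounds on the error and cost for adaptive simulation, which match those for the parallel case.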
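Corollary 19 (Adaptive simulation error). Given any channels $\mathcal{N}_1, \cdots, \mathcal{N}_n$ (collectively denoted by $\mathcal{N}_{[n]}$) and any unitary target channel $\mathcal{U}$, it holds that, for any free comb $\Pi_n$ acting on $\mathcal{N}_{[n]}$,
$$\varepsilon_\diamond(\mathcal{N}_{[n]} \xrightarrow{\Pi_n} \mathcal{U}) \ge \varepsilon_W(\mathcal{N}_{[n]} \xrightarrow{\Pi_n} \mathcal{U}) \ge \varepsilon_C(\mathcal{N}_{[n]} \xrightarrow{\Pi_n} \mathcal{U}) \ge \Big(\prod_{i=1}^n \Gamma_{\mathcal{N}_i}\Big) (1 - f^C_{\mathcal{U}}),$$
and
$$\varepsilon_A(\mathcal{N}_{[n]} \xrightarrow{\Pi_n} \mathcal{U}) \ge \frac{d}{d+1} \Big(\prod_{i=1}^n \Gamma_{\mathcal{N}_i}\Big) (1 - f^C_{\mathcal{U}}),$$
where $d$ is the dimension of the input system of $\mathcal{U}$.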
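Proof. Simply note that, according to Eq. (70), we have the following decomposition
$$\Pi_n(\mathcal{N}_{[n]}) = \Big(\prod_{i=1}^n \Gamma_{\mathcal{N}_i}\Big)\, \Pi_n(\mathcal{M}_{[n]}) + \Big(1 - \prod_{i=1}^n \Gamma_{\mathcal{N}_i}\Big)\, \mathcal{R}$$
for some channel $\mathcal{R}$, where each $\mathcal{M}_i \in \mathcal{F}$ achieves $\Gamma_{\mathcal{N}_i}$. By following the arguments in the proof of Theorem 16, one can establish similar error bounds where $\Gamma_{\mathcal{N}}$ is replaced by $\prod_{i=1}^n \Gamma_{\mathcal{N}_i}$.

Therefore, we can establish the same bound on the simulation cost for the adaptive scheme.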
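Corollary 20 (Adaptive simulation cost). Suppose some free comb $\Pi_n$ transforms $n$ instances of noisy channels $\mathcal{N}$ to target unitary channel $\mathcal{U}$ with a certain type of error $\varepsilon_x([\mathcal{N}, \cdots, \mathcal{N}] \xrightarrow{\Pi_n} \mathcal{U}) \le \epsilon_x$, $x \in \{\diamond, W, C, A\}$. Then $n$ must satisfy
$$n \ge \log_{1/\Gamma_{\mathcal{N}}} \frac{1 - f^C_{\mathcal{U}}}{\epsilon_x}$$
for any $x \in \{\diamond, W, C\}$. The bound on $n$ in terms of the average-case error $\varepsilon_A$ is equivalent to that in terms of the Choi error $\varepsilon_C$.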
Note that the adaptive strategies may potentially reduce the error or cost of simulation compared to parallel ones, so the adaptive simulation bounds can be regarded as stronger.
A general observation is that the simulation cost asymptotically scales at least as $\Omega(\log(1/\epsilon_x))$ as target error $\epsilon_x \to 0$ even if we allow adaptive usages of the input channels, no matter which kind of error measure $x$ is chosen.

No-purification conditions
Here, we discuss the situations where no-go rules are in place for channel resource purification, i.e. no unitary resource channels can be exactly simulated. For both the cases of single and multiple input channels, the basic statement goes as follows.

Corollary 21.
There is no free superchannel (or comb) that exactly transforms channel $\mathcal{N}$ (or a collection of channels $\{\mathcal{N}_i\}$) to any unitary resource channel $\mathcal{U} \notin \mathcal{F}$ if $\Gamma_{\mathcal{N}} > 0$ (or $\Gamma_{\mathcal{N}_i} > 0$, $\forall i$).
Proof. Since $\mathcal{F}$ is closed by assumption and $\mathcal{U} \notin \mathcal{F}$, we have $f^C_{\mathcal{U}} < 1$. Then due to Theorem 16 the transformation error (in whichever measure) is strictly positive, indicating that the exact transformation is impossible.

Now similar to Proposition 7, we give a series of alternative characterizations of the $\Gamma > 0$ condition for channels, which could be illustrative or useful in certain scenarios:

Worth noting, in the channel theory, the counterparts of min-relative entropy monotones also nicely contrast noisy entities with pure ones.

B. Practical scenarios and applications
The above no-purification rules and bounds are given in general forms so that their range of applicability is as wide as possible. To provide some concrete understanding and guideline of their practical relevance, we now discuss some specific scenarios and applications of interest. We shall start with a general discussion on typical noise models and the corresponding no-purification bounds in the contexts of different kinds of channel resource theories, and then specifically consider the roles of no-purification bounds in the contexts of quantum error correction, quantum communication, and circuit synthesis. Note that the main objective of our discussion here is to establish the frameworks for linking the no-purification principles to these practical problems. We shall mostly present general-form bounds, which are expected to be crude for certain specific resource features, noise models, system features etc., leaving refined analyses elsewhere.

Channel resource theories and practical noises
At a high level, we have the following two major different types of channel resource theories, signified by the role of the identity channel.
• Information preservation theories. In such theories, one is primarily interested in the noise channels and their abilities to simulate noiseless channels so as to preserve or transmit information. Typical scenarios include quantum error correction and quantum communication. A signature of such theories is that the identity channel (between certain systems) is an ideal resource channel, representing no error or loss of quantum information occurring. The set of free channels commonly involves, e.g., certain constant (replacer) channels, which represent complete loss of information. Here the free channels are in general directly induced by physical restrictions on the implementable operations that, e.g., perform the tasks of encoding and decoding.
• Resource generation theories. Such theories are commonly based on some resource theory defined at the level of states (such as entanglement, coherence, magic states). The features of channels and simulation tasks of interest are related to their ability to generate the state resource. Here the set of free channels is derived from state theories and thus obeys the resource non-generating property (for example, the identity channel is axiomatically free).
A typical scenario of this kind is synthesis, where a common task is to simulate, or "synthesize," some complicated target channel by elementary resource channels. See further discussions in the next part.
In some sense, theories of the first kind are intrinsically based on channels, and those of the second kind are induced by state theories. Such classification may help elucidate the interplay between channel and state resource theories. Now we discuss typical noisy channels of interest in these two different kinds of channel resource theories.
First, consider the first kind, i.e., information preservation theories, where the identity channel $\mathrm{id}$ is a resource. Here, the simulation capabilities (capacities) of noise channels themselves are of interest. A general observation is that, for stochastic noise
$$\mathcal{N}_\mu = (1-\mu)\, \mathrm{id} + \mu\, \mathcal{N},$$
where $\mu \in (0, 1)$ is the noise rate, if the noise channel $\mathcal{N}$ is considered free in the theory in consideration, then $\Gamma_{\mathcal{N}_\mu} \ge \mu$, which can be directly used to establish bounds on simulation error and cost. We list a few important noise models that are special cases: (i) Depolarizing noise: $\mathcal{N}(\rho) = I/d$ is just a constant channel that outputs the maximally mixed state; (ii) Erasure noise: $\mathcal{N}(\rho) = |\perp\rangle\langle\perp|$ is also a constant channel that outputs an orthogonal garbage state (for these two cases $\mathcal{N}$ is normally free as it essentially erases information completely); (iii) Dephasing noise: $\mathcal{N} = \Delta$, which erases the off-diagonal entries and thus is typically free in quantum scenarios since all coherence-related information is lost; (iv) Pauli noise: $\mathcal{N}(\rho) = \sum_i \mu_i P_i \rho P_i$, where $\mu_i \ge 0$ for all $i$, $\sum_i \mu_i = \mu$, and the $P_i$'s are non-identity Pauli operators (note that this model encompasses the depolarizing and dephasing noises). Here $\mathcal{N}$ is a stabilizer operation, and thus the global Pauli noise has a free component in the stabilizer theory, leading to limitations on stabilizer codes. We shall demonstrate the connections to quantum error correction in more detail in Sec. III B 2. Quantum communication is another important scenario of this kind, which we shall discuss more specifically in Sec. III B 3.

For the second kind, i.e., resource generation theories, the input channels of practical interest are usually not the noise channels themselves but the resource-generating channels contaminated by noises. For example, consider $\mathcal{N}_\mu \circ \mathcal{G} = (1-\mu)\mathcal{G} + \mu\, \mathcal{N} \circ \mathcal{G}$, where $\mathcal{N}_\mu$ is a stochastic noise and $\mathcal{G}$ is a noiseless resource-generating channel. Also note that, in contrast to the first kind, the theory is commonly built upon a clear notion of free states.
Then a general observation for this case is that if $\mathcal{N}$ always outputs a free state, then $\Gamma_{\mathcal{N}_\mu \circ \mathcal{G}} \ge \mu$. Again, this holds for the depolarizing and erasure noises in normal theories where the maximally mixed state and the garbage state are free (note that the bound can be loose in, e.g., magic theory; see Sec. III B 4). Then by definition, it also applies to dephasing noise in theories where the diagonal states are free (such as coherence and certain asymmetry theories). As mentioned, a particularly important problem in such theories is gate synthesis. In Sec. III B 4, we shall discuss the implications of our general results to practical synthesis problems in more detail.
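The reasoning behind the observation $\Gamma_{\mathcal{N}_\mu \circ \mathcal{G}} \ge \mu$ can be spelled out in one line. The sketch below assumes that $\mathcal{N} \circ \mathcal{G}$ is a free channel whenever $\mathcal{N}$ always outputs a free state (as for replacer-type noise), and uses the definition of the free component as the largest $\gamma$ with $\mathcal{N} \ge \gamma \mathcal{M}$ for some free $\mathcal{M}$; the final line is the resulting error bound for $n$ noisy uses, assuming the general bound $\varepsilon_C \ge \prod_i \Gamma_{\mathcal{N}_i} (1 - f^C_{\mathcal{U}})$.

```latex
\begin{align}
\mathcal{N}_\mu \circ \mathcal{G} - \mu\, \mathcal{N} \circ \mathcal{G}
  &= (1-\mu)\, \mathcal{G} \;\ge\; 0
  && \text{(a completely positive map)}, \\
\Rightarrow\quad \Gamma_{\mathcal{N}_\mu \circ \mathcal{G}}
  &\ge \mu
  && \text{(take } \gamma = \mu,\ \mathcal{M} = \mathcal{N} \circ \mathcal{G} \in \mathcal{F}\text{)}, \\
\Rightarrow\quad \varepsilon_C
  &\ge \mu^{n}\, (1 - f^C_{\mathcal{U}})
  && \text{(for } n \text{ uses of } \mathcal{N}_\mu \circ \mathcal{G}\text{)}.
\end{align}
```

The same argument with $\mathcal{G} = \mathrm{id}$ gives $\Gamma_{\mathcal{N}_\mu} \ge \mu$ for the information preservation theories.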
Notably, certain communication problems and gate synthesis correspond to adaptive channel simulation, which cannot be understood in the single-channel or parallel simulation schemes.

Quantum error correction
As a cornerstone of quantum computing and information [3], quantum error correction (QEC) serves to reduce noise effects and errors in physical systems by encoding the quantum information in a suitable way so that, after noise and errors occur, the original logical information can be restored (decoded). It is clearly important to understand various kinds of limits on QEC. Our results here are relevant to the broadly important scenario where the QEC procedures and codes obey certain rules or constraints. Typical examples include the well-studied stabilizer codes [3,71], and covariant codes [25][26][27][28][29][30][31], which have recently drawn considerable interest in quantum computing and physics. In Sec. II we presented general limits on the QEC accuracy based on understanding the decoding as a purification task. Here the channel framework provides an alternate formulation: Notice that the QEC task is essentially to simulate an identity channel on the logical system; then the channel no-purification bounds induce fundamental limits on this channel simulation task. As a result, we have the following general bounds on the accuracy and cost of constrained QEC when the system is subject to generic non-unitary noises ($L$, $S$ denote the logical and physical systems, respectively):

Corollary 23 (Constrained quantum error correction).
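Suppose that the encoder and decoder are free channels (subject to certain resource theory constraints), composing a superchannel $\Pi$. Then given noise channel $\mathcal{N}_S$ acting on the physical system $S$, the commonly considered overall error measures for approximate QEC $\varepsilon_x$, $x \in \{\diamond, W, C\}$, obey
$$\varepsilon_x(\mathcal{N}_S \xrightarrow{\Pi} \mathrm{id}_L) \ge \Gamma_{\mathcal{N}_S} (1 - f^C_{\mathrm{id}_L}).$$

For example, consider the natural independent noise model where the noise channel $\mathcal{N}$ acts independently and uniformly on each subsystem (e.g., qubit), i.e., the overall noise channel has the form $\mathcal{N}_S = \mathcal{N}^{\otimes n}$. Then
$$\varepsilon_x(\mathcal{N}_S \xrightarrow{\Pi} \mathrm{id}_L) \ge \Gamma_{\mathcal{N}}^n (1 - f^C_{\mathrm{id}_L}),$$
and therefore, to achieve target error $\varepsilon_x(\mathcal{N}_S \to \mathrm{id}_L) \le \epsilon_x$, the number of physical subsystems $n$ obeys
$$n \ge \log_{1/\Gamma_{\mathcal{N}}} \frac{1 - f^C_{\mathrm{id}_L}}{\epsilon_x}.$$
In the case of stochastic noise $\mathcal{N} = (1-\mu)\mathrm{id} + \mu\mathcal{M}$ where $\mathcal{M} \in \mathcal{F}$, $\Gamma_{\mathcal{N}}$ in the above bounds can be replaced by $\mu$.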
As previously noted, this general result applies to the important cases of stabilizer and covariant QEC, which, respectively, correspond to Clifford [71] and symmetry [29] constraints. Note again that, in covariant QEC, under the commonly held assumption that the noise channel $\mathcal{N}_S$ is covariant, we have the stronger conclusion that the error bound Eq. (77) holds for any decoder, meaning that covariant codes are no better than $[\Gamma_{\mathcal{N}_S}(1 - f^C_{\mathrm{id}_L})]$-correctable for any decoder [29, Lemma 2]. For independent Pauli and erasure noises in the stabilizer case, and depolarizing, dephasing, and erasure noises in the covariant case, $\Gamma_{\mathcal{N}}$ can be replaced by $\mu$ in the bounds.
The bounds here are given in the most general forms, indicating universal limitations on the accuracy and cost of constrained QEC schemes for any noise channel with free component like typical global noise channels, which are naturally important but underinvestigated in the context of QEC. It would be interesting to perform more refined analysis of the bounds for specific constraints and noise models, which we leave for future work.

Quantum communication and Shannon theory
The central problem in quantum Shannon theory is to determine the capability of quantum channels to reliably transmit information. Depending on the purpose of transmission (e.g., transmitting classical or quantum information) and the resources that can be used at hand, there are many different variants of channel capacities, each of which corresponds to a channel simulation task in the language of resource theory (see, e.g., Refs. [39,55,[72][73][74][75][76]).
Here we discuss quantum capacities, which correspond to the task of transforming a given channel to an identity channel between two distinct, distant parties (labs). Note that we need to distinguish the identity channel shared between distant labs from the local identity channel whose input and output systems belong to the same lab. The former is regarded as the ideal resource while the latter is completely free. In resource theory language, channel capacities are determined by the choice of free superchannels or combs $\Pi$, which correspond to specific coding strategies. Some important cases include the following [74][75][76]:
• Unassisted code: superchannel $\Pi$ can be decomposed into an encoder $\mathcal{K}_{A\to A'}$ by Alice composed with a decoder $\mathcal{D}_{B'\to B}$ by Bob, i.e., $\Pi(\mathcal{N}) = \mathcal{D}_{B'\to B} \circ \mathcal{N}_{A'\to B'} \circ \mathcal{K}_{A\to A'}$;
• Entanglement-assisted code: superchannel $\Pi$ acts as $\Pi(\mathcal{N})(\rho_A) = \mathcal{D}_{B'\bar{B}\to B} \circ \mathcal{N}_{A'\to B'} \circ \mathcal{K}_{A\bar{A}\to A'}(\rho_A \otimes \omega_{\bar{A}\bar{B}})$ with encoder $\mathcal{K}_{A\bar{A}\to A'}$, decoder $\mathcal{D}_{B'\bar{B}\to B}$ and shared quantum state $\omega_{\bar{A}\bar{B}}$;
• Non-signalling assisted code: superchannel $\Pi$ is non-signalling from Alice to Bob and vice versa;
• Two-way classical-communication-assisted code: quantum comb $\Pi$ can be realized by local operations and classical communication (LOCC) operations $\mathcal{P}_1, \cdots, \mathcal{P}_{n+1}$ between Alice and Bob (see Fig. 3).
Once the free superchannels or combs are set, the set of free channels is then implicitly defined as the channels that can be generated via these superchannels or combs. Note that the first three coding strategies correspond to parallel channel simulation while the last one corresponds to adaptive channel simulation. The performance of quantum communication can be characterized by an achievable triplet $(n, k, \epsilon)$, meaning that there exists a $\Pi$-assisted coding strategy that uses $n$ instances of the resource channel to transmit $k$ qubits, or simulate $\mathrm{id}_{2^k}$ (identity channel on the system of dimension $2^k$), within error $\epsilon$ (here we consider Choi error, which is the standard choice of error measure for quantum communication). Then by Corollary 19, we can obtain the following bounds on these parameters for general quantum communication in the non-asymptotic regime.
Corollary 24 (Quantum communication). Suppose $(n, k, \epsilon_C)$ is an achievable quantum communication triplet by noise channel $\mathcal{N}$ with a $\Pi$-assisted code. Then the Choi error $\epsilon_C$ obeys
$$\epsilon_C \ge \Gamma_{\mathcal{N}}^n \big(1 - f^C_{\mathrm{id}_{2^k}}\big).$$
In other words, the minimum number of channel uses required to enable reliable transmission of $k$ qubits within Choi error $\epsilon_C$ must satisfy
$$n \ge \log_{1/\Gamma_{\mathcal{N}}} \frac{1 - f^C_{\mathrm{id}_{2^k}}}{\epsilon_C}.$$

We now discuss in more detail the two-way assisted quantum capacity, which is of particular importance due to its close relation to the practical scenario of distributed quantum computing and quantum key distribution. Due to the notorious difficulty of adaptive communication strategies and the involved structure of LOCC operations, this quantum communication scenario is not well understood in spite of its practical importance. The corresponding asymptotic setting that assumes infinite access to the resource channels was recently investigated by a relaxation of LOCC operations to the mathematically more tractable PPT operations (see, e.g., Refs. [75][76][77]). In this case, we have the maximum overlap $f^C_{\mathrm{id}_{2^k}} \le 1/2^k$ [78]. As the quantum capacity concerns the maximum number of qubits that can be reliably transmitted per use of the channel, we can equivalently obtain from Corollary 24 a nontrivial trade-off (which can be interpreted as a bound on the non-asymptotic two-way assisted quantum capacity):
$$\epsilon_C \ge \Gamma_{\mathcal{N}}^n \big(1 - 2^{-k}\big).$$
Also note that, since PPT operations are semidefinite representable, $\Gamma_{\mathcal{N}}$ here can be efficiently computed by an SDP in terms of the Choi state $\Phi_{\mathcal{N}}$ of $\mathcal{N}_{A\to B}$. Fitting this into Eq. (82) can help us do analysis beyond the asymptotic treatment and understand the intricate trade-off between different operational parameters of concern.
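The trade-off between the number of channel uses, the number of transmitted qubits, and the Choi error can be explored numerically. The sketch below assumes the bound takes the form $\epsilon_C \ge \Gamma_{\mathcal{N}}^n (1 - 2^{-k})$, obtained by inserting the PPT overlap bound $f^C_{\mathrm{id}_{2^k}} \le 2^{-k}$ into the error bound, and solves it for $k$; the function name and the numbers are hypothetical.

```python
import math


def max_qubits_upper_bound(gamma: float, n: int, eps: float) -> float:
    """Upper bound on k implied by eps >= gamma**n * (1 - 2**(-k)).

    Returns math.inf when eps >= gamma**n, i.e. when the bound does not
    constrain k at all.
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    error_floor = gamma ** n  # limiting value of the bound as k grows
    if eps >= error_floor:
        return math.inf
    # Rearranging: 2**(-k) >= 1 - eps / gamma**n.
    return -math.log2(1.0 - eps / error_floor)


if __name__ == "__main__":
    gamma, eps = 0.5, 0.01  # hypothetical free component and target Choi error
    for n in (1, 2, 4, 8):
        k_max = max_qubits_upper_bound(gamma, n, eps)
        print(f"n = {n}: k <= {k_max:.4f}")
```

For fixed target error, the constraint on $k$ is tight for few channel uses and disappears once $\Gamma_{\mathcal{N}}^n \le \epsilon_C$, which reflects the fact that the bound only limits the accuracy reachable with a finite number of noisy channel uses.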

Noisy circuit synthesis
The problem of approximating some desired transformation by quantum circuits consisting of certain elementary gates, commonly studied under the name of quantum circuit or gate or unitary synthesis (or sometimes known as "compiling"), is crucial to the practical implementation of quantum computation. Depending on the practical setting, it is often the case that some gates are considered particularly costly as compared to other gates, and thus we are mostly interested in the number of costly gates needed for the desired synthesis task. A key observation here is that such synthesis tasks can be formalized as adaptive channel simulation problems, where free gates form a comb and the costly gates are input channels that are inserted into the slots of the comb. A particularly important case is "Clifford+T," where we would like to decompose the target transformation into Clifford gates, which are assumed to be free since they can be rather easily implemented fault tolerantly, and the "expensive" $T$ gates $T = |0\rangle\langle 0| + e^{i\pi/4} |1\rangle\langle 1|$. Note that the $T$ gates are often implemented by "state injection" gadgets [79] that make use of $T$ states produced by magic state distillation (studied in Sec. II), which is a resource-intensive procedure. Therefore, the key figure of merit we would like to optimize is the number of $T$ gates used (namely the "$T$-count"); see, e.g., Refs. [80][81][82][83][84][85][86] for a host of previous studies related to this problem. Notably, resource theory is helpful for finding good bounds on the $T$-count in certain cases [23,87].
Existing literature on the synthesis problem mostly focuses on the noiseless scenario, where the elementary gates are unitary. The noisy nature of practical (especially near-term) devices motivates us to consider the scenario where certain gates are intrinsically associated with noise and such noisy gates are the elementary components of the circuit for synthesis. For example, a key incentive for the Clifford+T model is that the non-Clifford gates are much harder to protect than Clifford gates, so that one may want to consider intrinsically noisy non-Clifford gates (see below). We note that there are fundamental differences between this noisy synthesis setting and the noiseless one, as will be seen later. Now, the central question is how many noisy resource gates are needed to approximate a target unitary. Based on the observation above, which links the synthesis problem to adaptive channel simulation, we establish the following universal lower bounds on such a "noisy gate count" from Corollary 20 (note that for synthesis problems we often use the diamond norm error).
Corollary 25 (Noisy gate count). Consider the synthesis task of simulating a unitary channel $\mathcal{U}$ by a channel (noisy gate) $\mathcal{G}$ and arbitrary use of a set of free channels, which compose a free comb, within diamond norm error $\epsilon$. Then the number of instances of $\mathcal{G}$ needed must satisfy

We now investigate the Clifford+T case specifically, where the T gate is associated with noise, and we are interested in the number of such noisy T gates, or the "noisy T-count". Let $C_n = \{\mathcal{U}_j\}_{j=1}^{N}$ be the $n$-qubit Clifford group consisting of $N$ discrete elements. Let the set of free channels be the convex hull of $C_n$, i.e., $\mathcal{F} = \mathrm{conv}(C_n)$, meaning that we allow mixtures of Clifford gates. Any free channel $\mathcal{M} \in \mathcal{F}$ can be written as a convex combination $\mathcal{M} = \sum_{j=1}^{N} p_j\,\mathcal{U}_j$ with $p_j \ge 0$ and $\sum_{j=1}^{N} p_j = 1$. For the condition $\mathcal{N} \ge \gamma\mathcal{M}$, we can substitute $q_j = \gamma p_j$ and obtain the equivalent condition $\mathcal{N} \ge \sum_{j=1}^{N} q_j\,\mathcal{U}_j$ with $q_j \ge 0$ and $\gamma = \sum_{j=1}^{N} q_j$. Therefore, the free component can be computed by the semidefinite program

$$\Gamma_{\mathcal{N}} = \max\Big\{ \sum_{j=1}^{N} q_j \;:\; \Phi_{\mathcal{N}} \ge \sum_{j=1}^{N} q_j\,\Phi_{\mathcal{U}_j},\ q_j \ge 0\ \ \forall j \Big\}, \tag{85}$$

where $\Phi_{\mathcal{N}}$ and $\Phi_{\mathcal{U}_j}$ are the Choi states of $\mathcal{N}$ and $\mathcal{U}_j$, respectively. As a concrete example, consider the T gate followed by depolarizing noise $\mathcal{N}_\mu(\rho) = (1-\mu)\rho + \mu I/2$ as the elementary channel. Its free component $\Gamma_{\mathcal{N}_\mu\circ T}$ is computed by the SDP Eq. (85) with $N = 24$ (see, e.g., Ref. [88] for an explicit enumeration of $C_1$), and is depicted in Fig. 4(a). When $\mu \ge 1 - \sqrt{3}/3 \approx 0.42$ we see that $\Gamma_{\mathcal{N}_\mu\circ T} = 1$, as $\mathcal{N}_\mu$ compresses the entire Bloch sphere into the stabilizer octahedron so that any output is a stabilizer state. Note that this explicit calculation improves on the general bound $\mu$ discussed in Sec. III B 1. In Fig. 4(b), as an example, we plot the lower bounds on the noisy T-count needed to approximate a CCZ gate, obtained from Corollary 25 (where we used $f^{W}_{\mathrm{CCZ}} \le 9/16$ [23, Eq. (33)]). Recently, Ref. [89, Proposition 26] also gave an expression for the noisy gate counts in the magic theory of odd dimensions using the mana monotone. Note that our result applies to any dimension, and is expected to outperform the mana bound, especially in the small target error regime. In particular, our bound implies a diverging cost as the target error $\epsilon \to 0$, which is in line with intuition, whereas the mana bound cannot capture this divergence.
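As a sanity check on two ingredients of this example, the following Python sketch (standard library only) enumerates the single-qubit Clifford group modulo global phase as the closure of the Hadamard and phase gates, forms the Choi states that would enter the SDP Eq. (85), and evaluates the threshold $1-\sqrt{3}/3$ from the Bloch-sphere picture. The function names, the choice of generators, and the rounding-based phase canonicalization are illustrative assumptions and are not taken from Ref. [88]. The sketch does not solve the SDP itself, which would require an SDP solver, and the threshold check only reproduces the geometric statement about the stabilizer octahedron.

```python
import math
from collections import deque

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def phase_key(u, digits=6):
    """Hashable fingerprint of u that ignores its global phase (hypothetical helper)."""
    flat = [complex(z) for row in u for z in row]
    pivot = next(z for z in flat if abs(z) > 1e-9)  # first nonzero entry
    phase = pivot / abs(pivot)
    # Adding 0.0 turns a rounded -0.0 into 0.0 so that keys look uniform.
    return tuple((round((z / phase).real, digits) + 0.0,
                  round((z / phase).imag, digits) + 0.0) for z in flat)

def enumerate_single_qubit_cliffords():
    """Breadth-first closure of {H, S} modulo global phase."""
    s = 1 / math.sqrt(2)
    hadamard = [[s, s], [s, -s]]
    phase_gate = [[1, 0], [0, 1j]]
    identity = [[1, 0], [0, 1]]
    seen = {phase_key(identity): identity}
    queue = deque([identity])
    while queue:
        u = queue.popleft()
        for g in (hadamard, phase_gate):
            v = matmul(g, u)
            key = phase_key(v)
            if key not in seen:
                seen[key] = v
                queue.append(v)
    return list(seen.values())

def choi_state(u):
    """Choi state of the unitary channel of u, as a 4x4 nested list."""
    # (id ⊗ U)|Phi> with |Phi> = (|00> + |11>)/sqrt(2); index = 2*i + a.
    vec = [u[a][i] / math.sqrt(2) for i in range(2) for a in range(2)]
    return [[x * complex(y).conjugate() for y in vec] for x in vec]

def l1_radius_after_noise(mu):
    """Maximum of |x|+|y|+|z| over the image of the Bloch sphere under N_mu after T."""
    # T only rotates the sphere; depolarizing noise shrinks it to radius 1 - mu.
    # On a ball of radius r the maximum of |x|+|y|+|z| is r*sqrt(3), attained
    # along (1,1,1)/sqrt(3); the stabilizer octahedron is |x|+|y|+|z| <= 1.
    return (1 - mu) * math.sqrt(3)

cliffords = enumerate_single_qubit_cliffords()
choi_states = [choi_state(u) for u in cliffords]
mu_star = 1 - math.sqrt(3) / 3
print(len(cliffords), round(mu_star, 4), l1_radius_after_noise(mu_star))
```

The list `choi_states` plays the role of the $\Phi_{\mathcal{U}_j}$ in Eq. (85); passing it to an SDP solver together with the Choi state of $\mathcal{N}_\mu\circ T$ is left out of this sketch. For $\mu$ below `mu_star`, `l1_radius_after_noise(mu)` exceeds 1, meaning that some outputs leave the stabilizer octahedron.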
Finally, we would like to remark that the noisy synthesis results here are fundamentally different from the existing ones on noiseless synthesis, in spite of some apparent relations. Most notably, it is known that for any universal gate set, the number of gates needed to approximate all unitaries up to error $\epsilon$ (which can essentially be measured by any channel error measure discussed earlier) scales at least as $\Omega(\log(1/\epsilon))$ [90] (note that the well-known Solovay-Kitaev theorem [3,43,44] concerns the upper bound). Although the $\Omega(\log(1/\epsilon))$ scaling is similar to our lower bound on noisy gate counts, there are two key differences: (i) our noisy synthesis result bounds the number of resource gates needed and says nothing about the number of free gates, while the previous noiseless-case result counts the total number of gates; (ii) our noisy synthesis result is universal for any target resource unitary, while the previous noiseless-case result examines the worst case, and there could well be target unitaries with lower or even trivial cost (some target unitaries can be exactly simulated, such as T in Clifford+T). Relatedly, the geometric covering argument used in Ref. [90] is not useful for the noisy case. In general, noiseless and noisy synthesis and gate counts are fundamentally disparate problems contingent on different factors. This can again be seen from Clifford+T, where intricate number-theoretic properties and techniques play decisive roles in the noiseless case [80][81][82][83][84] while being irrelevant in the noisy case.

IV. CONCLUDING REMARKS
We introduced a simple, universal framework for understanding and analyzing the limitations on quantum resource purification tasks that applies to virtually any resource theory, based on the notion of "free component" of noisy resources. We developed the theory in detail for both quantum states and channels. For the state theory, our new results significantly improve over corresponding ones discovered in Ref. [12] in terms of both the regime of the no-purification rules and the quantitative limits. This framework also enabled us to quantitatively understand the no-purification principles for quantum channels or dynamical resources. Specifically, the channel theory involves complications concerning error measures and the possibility of adaptively using multiple resource instances, as compared to the state theory. We demonstrated broad theoretical and practical relevance of our techniques and results by discussing their applications to several key areas of quantum information science and physics. The simplicity and generality of our theory highlight the fundamental nature of the no-purification principles.
Several technical problems are worth further study. First, we considered channel simulation with a single target channel here, but more generally the output can also be a comb [68]; it would be interesting to further study the no-purification bounds for such cases and explore their relevance. Second, we formulated the results in terms of deterministic one-shot transformation and only left preliminary remarks on the probabilistic case; a comprehensive understanding of the probabilistic case is left for future work. Third, it is worth further studying purification tasks for continuous variables, especially resource (e.g., non-Gaussianity) distillation tasks and their applications in optical quantum information processing, given that some sharp distinctions are known concerning the feasibility and behavior of distillation procedures [38,91] between continuous and discrete variables, while the understanding of the full correspondence is still preliminary.
Furthermore, it would be interesting to further analyze our bounds and associated parameters in specific theories and problems. The discussions of applications given here mainly serve to establish the general, conceptual connections and are thus preliminary. Further developments of these connections, taking specific features of the system, resource, noise, etc. into account, may be fruitful. In particular, for the extensively studied topics of quantum error correction and quantum Shannon theory, it would be interesting to further optimize the bounds and compare them with existing results in specific scenarios. Ultimately, we hope that our demonstrations here will spark explorations of further applications or consequences of the no-purification principles in quantum information and physics.
Note added. After the completion of our paper, we became aware that Regula and Takagi independently considered the resource weight and obtained results related to ours which later developed into Ref. [92]. The two papers were arranged to be released concurrently on arXiv.